2016 Postmortem
538's estimates of probability net us almost ZERO INFORMATION
Just look at this graphic of the national poll average and the 538 probabilities
Maybe, MAYBE the 538 model is squeezing some information out at the margins. I bet I could construct a model that does as well or better than 538 simply by taking the differences in the two poll averages and converting it to a direct probability.
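To be concrete, the kind of back-of-the-envelope conversion I have in mind could be sketched like this. The sigma value is a made-up tuning knob for how far the final margin might drift from today's average, not anything 538 publishes:

```python
from statistics import NormalDist

def margin_to_win_probability(dem_pct, rep_pct, sigma=3.0):
    """Map a national poll-average margin to a rough win probability.

    Assumes the final margin is normally distributed around today's
    polled margin with standard deviation `sigma` (in points).
    sigma here is a guess, not an empirically fitted value.
    """
    margin = dem_pct - rep_pct
    # P(final margin > 0) under the assumed normal distribution
    return 1 - NormalDist(mu=margin, sigma=sigma).cdf(0)
```

A tied race maps to 50%, and a 3-point lead with sigma=3 maps to about 84%. That's the whole model.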
vadermike
(1,415 posts)Well, this chart is scary. Nervous about the first debate!
Loki Liesmith
(4,602 posts)Except about Silver's (and any modelers) ability to distinguish signal and noise.
If you want to see who is likely to win, look at the national poll average and see who is ahead. Anything else may just be sophistry.
DemocratSinceBirth
(99,708 posts)I like the moral of the fox and the hedgehog. Nate is becoming a hedgehog. All he knows is how to scare people and generate clicks.
When I look at all the models they suggest to me Hill is a 3-1 to 5-2 favorite.
Do you like boxing, Loki?
Money Mayweather was a 3-1 favorite against Manny Pacquiao and he won. He wanted a boring fight and that's what he got. A boring debate would be good for Hill.
Loki Liesmith
(4,602 posts)I've watched it destroy people. Many of the players are vile. There is a kind of brutal purity to it...your analogy to politics is very apt.
In the end Nate's model will most likely be right. So will everyone else's. Almost no one is wrong at the last minute.
DemocratSinceBirth
(99,708 posts)Hillary is a hell of a lot smarter than me and if I can, in my mind, parry his verbal brickbats I am sure she can.
Will we see Trump on Quaaludes or Crystal Meth tomorrow night?
Loki Liesmith
(4,602 posts)Downside of Low Energy Trump not as bad as downside of High Energy Trump
tblue37
(65,212 posts)The MSM uses any poll that presages a squeaker of an election as a tool for pumping up its horse race narrative.
By claiming all the time that Clinton is losing ground, the MSM helps Trump look like a plausible candidate, and a lot of low-info voters are undoubtedly influenced in his favor by stories that present him as a scrappy underdog defying all expectations to seriously challenge the campaign that is being presented as an invincible juggernaut.
The MSM did the same thing during the GOP primary, and by constantly painting Trump as a "winner" with polls, and then using those polls to justify giving him even more free positive publicity, they finally made him one.
Loki Liesmith
(4,602 posts)I'm not convinced that Nate's model adds significant value to that of the national average.
wncHillsupport
(112 posts)All he does is show huge overlaps where either candidate 'may' win an election. I can make a prediction like that without all the fancy modeling.
- There is a more likely chance Hillary will win.
- However, there is also a pretty good chance that Trump will win.
My rationale: It depends on which group turns out the most voters in swing states and which groups flood the poll booths in 'sure' states to be sure no 'stay at home' group loses their 'sure' state.
Nate is right either way, based on his overlaps. So he can't really be wrong. Big deal.
DemocratSinceBirth
(99,708 posts)I always read about the lack of organization in the Drumpf campaign, but polling suggests supporters of both him and Clinton have been equally likely to be contacted by a campaign.
Both can't be true.
onehandle
(51,122 posts)I encountered the GOP ground game for PA. It's a fucking joke.
PENNSYLVANIA WILL WIN THIS ELECTION.
Loki Liesmith
(4,602 posts)what is the GOP PA ground game like?
Charles Bukowski
(1,132 posts)overperform relative to the final RCP average for this reason. Similar to Obama.
CajunBlazer
(5,648 posts)"I bet I could construct a model that does as well or better than 538 simply by taking the differences in the two poll averages and converting it to a direct probability."
There is no way you could do that unless you are a statistician who has studied the major polls, their biases, and their accuracy over the years. Just because two graphs have similar shapes does not mean you can convert the vertical scale of one graph (national poll results) into the proper vertical scale of the other (probability of winning) without the extensive work which Nate Silver has put into his model.
Silver does not even use national polls in his probability calculations. His work begins with a statistical evaluation of all of the recent polls for every state, culminating in a probability of each candidate winning each state. Those results are then combined statistically into a probability of each candidate winning the Electoral College vote.
Let me briefly explain how your methodology could make you look like a fool. Let's say that Candidate A is leading in the national polls but trails Candidate B in the majority of the big swing states. Your methodology would lead you to believe that Candidate A is ahead, while Candidate B would actually have a much better probability of winning.
The fact that the national poll results seem to correlate with the swing state results is coincidental.
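To make the distinction concrete, here is a minimal sketch of the state-by-state approach. The per-state probabilities and "safe" electoral vote counts below are invented for illustration, not Silver's numbers, and unlike his model this treats states as independent coin flips:

```python
import random

# Hypothetical per-state (win probability, electoral votes) for one candidate;
# illustrative values only, not 538's actual estimates.
SWING_STATES = {
    "PA": (0.75, 20),
    "FL": (0.55, 29),
    "OH": (0.45, 18),
    "NC": (0.50, 15),
}
SAFE_EV = 217  # electoral votes assumed locked in (illustrative)

def simulate_election(n_trials=100_000, seed=42):
    """Estimate P(candidate reaches 270 EV) by simulating each swing
    state as an independent coin flip. A simplification: a real model
    like 538's also accounts for correlated error across states."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        ev = SAFE_EV
        for p_win, votes in SWING_STATES.values():
            if rng.random() < p_win:
                ev += votes
        if ev >= 270:
            wins += 1
    return wins / n_trials
```

Note that the Electoral College probability this produces is not a simple function of the national margin; it depends entirely on where the leads sit.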
Loki Liesmith
(4,602 posts)"who has studied the major polls, their biases, and their accuracy over the years."
Yeah, I couldn't be that, could I?
CajunBlazer
(5,648 posts)Even if the national polls graphs to predict the results of Presidential elections (which we can't of course), without statistics you would have no way of assigning a probability winning equivalent to each change of percentage separation between the two candidates in the national polls without simply guessing.
For instance if Hillary were head by a single percentage point over in the combined national polls over Trump, what would you assign as her probability of winning the election? You could only guess. What would be her probability of winning if she were ahead by two or three points, ... or six or seven? Again, you would have no clue and could only guess, and my guess would be as good as your's so why would anyone pay attention to your guesses.
And all of that avoids the fact that we don't elect Presidents in this country based on the results of national popular vote - we use state by state tabulations to establish Electoral College votes which in turn determine the winner.
So your original statement, "I bet I could construct a model that does as well or better than 538 simply by taking the differences in the two poll averages and converting it to a direct probability," is simply bull manure.
Loki Liesmith
(4,602 posts)I'm typing up my letter of resignation as we speak.
CajunBlazer
(5,648 posts)unblock
(52,107 posts)CajunBlazer
(5,648 posts)unblock
(52,107 posts)BlueInPhilly
(870 posts)And I say all models are wrong. Some just have smaller error than others. A model will NEVER equal reality. Ever. A model with zero error just means an over-fitted model that is too rigid it will not work except for the particular scenario is was fitted for.
So there.
CajunBlazer
(5,648 posts)is equally wrong, especially when the two models are based on a totally different events - in this case a model of the popular vote verses a model of Electoral College results - and only one models the only path to victory.
BlueInPhilly
(870 posts)I said all models have varying degrees of error. You cannot look at one model and equate it with another. There are so many moving pieces to consider.
I do this for a living. How about you?
CajunBlazer
(5,648 posts)and polling has been a hobby for years. However, you don't have to be professional statistician to know that if you wish to as accurately as possible model an event, say the results of votes in the electoral college, it would be wise to model that event rather model the results of a similar, but totally different event such as the popular vote.
I would expect that the above statement is not disputable.
BlueInPhilly
(870 posts)But not sure why you are undermining other people. No one is jealous of Silver - he picked a niche and capitalized on it, so good for him.
Also, sometimes, you have to rely on proxy data that may not exactly be the same as what you want to predict, but may reasonably mimic its behaviour. It has been done and it has had decent results.
I do this for a living. I've seen all kinds of models, and like polls, not all models are created equal.
Foggyhill
(1,060 posts)I don't care how "A+" (sic) a polling outfit is supposed to be.
A 500-person (or smaller) poll where the demos are way off from the actual voting population is, to me, close to garbage level.
The margin of error on such polls is 4.5% 19 times out of 20 (and a lot more 5% of the time).
IF THE RANDOM SAMPLE IS ACTUALLY REPRESENTATIVE OF THE POPULATION POLLED (say likely voters)
If you're off from that by just a few persons, because, say, you use landlines, call during the day, and poll more older voters or fewer minorities (or whatever limitations or biases are inherent in your sampling method), then your result may be way beyond the margin of error and can't be relied on in any way.
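The arithmetic behind that "19 times out of 20" figure is just the standard formula, which assumes a genuinely random sample (exactly the assumption I'm saying fails in practice):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n.
    p=0.5 is the worst case; z=1.96 is the '19 times out of 20' level."""
    return z * math.sqrt(p * (1 - p) / n)

# A 500-person poll: roughly +/- 4.4 points, close to the 4.5% above.
moe_500 = margin_of_error(500) * 100
# A 2,048-person poll would be about +/- 2.2 points.
moe_2048 = margin_of_error(2048) * 100
```

And again: this is the floor on the error, before any sampling bias or question-ordering effects are added on top.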
That's not even taking into account the fact that the way the interview questions are asked can change answers. Asking about the two-way choice before the four-way one will lead to different answers than the reverse (strange but true).
That's not even taking into account biases in likely vs registered voters that overlay this whole mess of polls.
And yes, I've got experience/expertise in stats through engineering/MBA and Communication degrees.
If you see a professionally made poll that polls 2048 people in Pennsylvania that shows Trump on top in registered voters... Then be very worried, otherwise chill out and GOTV.
Yes, it's my first post but been lurking awhile in the Hillary forum (even during the crazy primary).
I'm Canadian and can't vote in this Election, but feel like a concerned third party because the USA is our closest partner.
Loki Liesmith
(4,602 posts)I stated above that national popular vote is a good predictor of electoral vote outcomes. I believe the data bear that out.
https://leftymitt.com/projects/us-election-dynamics/
The point is not that this is the best model we can have, only that Silver's model (and really anybody's, including my own) does not necessarily introduce much new information over and above a simple poll average.
Furthermore, if a given model is actually NOISIER than a poll average, one begins to suspect that the model is INTRODUCING noise into the equation, or at least wonder how that model reacts to new information. For an industry that brags that it is bringing increased certainty to statistical questions, that would seem to be a problem, would it not?
I can't really be bothered to go into much more detail about this.
Loki Liesmith
(4,602 posts)If Silver's model has approximately the same variance as the polls he is incorporating into it over the same interval...how much information gain can his model claim to have?
Assuming a normal distribution, the entropy (uncertainty of an estimate) is a function only of the variance of the distribution.
Given that, if both the poll average and the Silver model are strongly correlated and have more or less the same variance, then the entropy of both distributions should be about the same.
Given their strong correlation that means that Silver's model brings minimal added value to our ability to predict outcomes.
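As a sanity check on that claim, the differential entropy of a normal distribution depends only on sigma. A quick numerical illustration (nothing specific to 538's internals, just the textbook formula):

```python
import math

def normal_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats:
    0.5 * ln(2*pi*e*sigma^2). Note the mean does not appear;
    only the spread of the estimate matters."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
```

So two estimators with the same variance carry the same uncertainty, whatever they are centered on, which is the heart of the argument above.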
BlueInPhilly
(870 posts)He gives higher coefficients for some polls. That is a subjective judgment that needs to be evaluated objectively. We don't know how he separates one poll from another.
So he doesn't necessarily have a normal distribution - he skews it towards some polls, and I bet his methodology is a black box. As your argument pointed out, it wouldn't make his method any more special.
Loki Liesmith
(4,602 posts)For example, he does trendline adjustments to individual polls. If a given poll is a large outlier in one direction in a preceding epoch, giving you a large lead then, and your next poll in the current epoch shows a smaller lead, the current lead is actually counted as a loss. Essentially, you penalize a given poll for finding an outlier in the past. If there were a significant house effect for a given pollster, I'd be kind of OK with this, except that Silver already corrects for house effects, so this amounts to a bit of a double correction. Or if the trendline adjustment were calculated from the aggregate on an epoch... I'd be mostly OK with that too.
Silver has been criticized for introducing too much correlated error into his model, and I think this is a fair criticism.
In any event, the normal distribution example is a toy model. The point is that, treating the 538 model as a black-box transducer of some signal(s), its bitrate is pretty low. The information gain is marginal. Perhaps if the polls closed to <1% nationally the added value would become more clear... it would certainly be easier to evaluate the performance of his secret sauce in that case.
CajunBlazer
(5,648 posts)Pulling information off of the internet does not make you a statistician. And were you a statistician you would never have made your absurd comments to begin with.
There is ample reason to believe that the national polls need not be strongly correlated with a state-by-state statistical analysis of Electoral College results. Just because they appear to be correlated for a particular data set does not mean that they should be for every situation.
There is ample reason to believe that they may not be correlated at all. The national polls are strongly influenced by small samples taken from consistently blue and consistently red states, where the difference in support between candidates can be, and often is, huge. The fact that when one takes these various small samples from each state and combines them into national polls, the graph of the results happens to resemble Silver's graph for an election cycle, is no proof of correlation.
The fact that there is any correlation at all is coincidental - and there is no reason to believe that correlation will continue in the future. Even if we were to accept the proposition that voters in swing states were very much the same as voters of the country as a whole, the two groups are not subject to the same stimuli. Candidates pour vast sums of money into commercials, rallies, and voter turnout drives in swing states, especially in the later stages of a campaign, which the people in other states never experience. There is no reason to believe that, given all this additional stimuli, voters in swing states will continue to behave in the same way as voters across the country.
Bottom line, it is a mistake to assume that correlation in one data set means that correlation will continue across all data sets. And you have not begun to address the issue explored in my other post: that there is no way to assign probabilities of winning to the graphs obtained from national polls.
Loki Liesmith
(4,602 posts)Pulling information off the internet does not make me a statistician. We agree. Why argue?
CajunBlazer
(5,648 posts)and are eager to discredit his work. However, in an effort to voice your displeasure in terms which most DU members could understand, you used a clumsy example and you were called on it. Then instead of admitting that it was a clumsy example as you should have, you tried to overcome the objections with statistical BS you knew didn't apply.
If you want to criticize Silver's work, you should at least have the decency to first understand how he does his calculations. Then you can throw your statistical darts.
Loki Liesmith
(4,602 posts)I'll have a follow up on this as soon as I can rasterize the 538 trendlines, or get some csv files of it...
I still fail to see what's gotten you so riled up, but to each their own.
I don't see how constructive this is until I work up that data.
BlueInPhilly
(870 posts)Last edited Sun Sep 25, 2016, 08:37 PM - Edit history (1)
I can construct my very own model but I do not have the time to gather all the data I need. I don't know if Nate normalizes for different confidence limits and intervals; if he doesn't, he should. Historical performance is only relevant to a certain degree. You cannot drive forward by simply looking at your rear view mirror. You have to consider turning points, changing population distribution, and other attributes that may affect an individual's propensity to vote for a candidate.
I take all these polls with a grain of salt. Not all polls are created equal. I see a number and I immediately look at the underlying data - sorry, second nature. Whilst a 500 sample size is enough for a +/- 5% margin of error at 50%, the required sample size goes up considerably when a poll aims for more granular questions and answers.
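The sample-size side of that calculation is straightforward. This is the usual formula for a simple random sample; real polls with weighting need more:

```python
import math

def required_sample_size(moe, p=0.5, z=1.96):
    """Respondents needed to hit a margin of error `moe` (a fraction,
    e.g. 0.05 for 5 points) at 95% confidence, for a proportion near p."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)
```

About 385 respondents gets you +/- 5 points at 50%, but halving the margin of error quadruples the requirement, which is why subgroup crosstabs out of a 500-person poll are so shaky.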
Remember the adage: All models are wrong but some are right sometimes.
In the case of Nate's model, the volatility of his predictive variables (i.e., disparate polls) probably undermines any semblance of stability, and volatility is the kryptonite of models. We don't like it.
MFM008
(19,803 posts)He had the Seahawks down to win our second super bowl 2 years ago.
Totally winning.
Then came the unexpected throw instead of running the ball at the 2 yard line to win.
Intercepted.
We lose in the last seconds of the game.
Point is.....
Yeah we're still pissed.
They can't see the unexpected...
Or calculate it.
He was wrong.
CajunBlazer
(5,648 posts)We would never have any reason to have an election or play a game.
My point IS he is not perfect.
RAFisher
(466 posts)Do the math and come up with empirical data showing one is better than the other. Using smoothing or not doesn't make a model incorrect. Honestly, I don't even understand what the complaint is. You probably should first read how the 538 model works before claiming you know it's wrong.
Loki Liesmith
(4,602 posts)As soon as I get a csv of 538's numbers I plan to look at
1) distribution of the absolute value of instantaneous derivative of 538 model probabilities vs. the same for huffpost poll trendlines.
2) distribution of 538 model probabilities - trendlines
3) PCA and ICA on both sets together resampled to the same grid.
Unfortunately I'm having some trouble getting one raw data set. Expect that to be rectified shortly.
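For the curious, steps 1-3 would look roughly like the sketch below once both series are resampled onto the same grid. The array names are stand-ins for whatever data I end up with, and the PCA here is a plain SVD on the stacked series rather than anything fancier:

```python
import numpy as np

def compare_series(model_probs, poll_trend):
    """Sketch of the planned comparison, assuming both inputs are
    numpy arrays of probabilities on the same daily grid."""
    # 1) distributions of absolute instantaneous derivatives
    #    (day-to-day jumps) for each series
    model_jumps = np.abs(np.diff(model_probs))
    poll_jumps = np.abs(np.diff(poll_trend))
    # 2) distribution of pointwise differences: model minus trendline
    diffs = model_probs - poll_trend
    # 3) PCA on the stacked, mean-centered series via SVD;
    #    `explained` is the variance share of each component
    stacked = np.vstack([model_probs, poll_trend])
    centered = stacked - stacked.mean(axis=1, keepdims=True)
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()
    return model_jumps, poll_jumps, diffs, explained
```

If the model really tracks the poll average, the first component should swallow nearly all the variance, which is exactly the "minimal added information" claim in quantitative form.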
Cheers,
Loki Liesmith
(4,602 posts)still_one
(92,058 posts)Whether one subscribes to his model or not is another question.
The polls say 25% of the populace are undecided.
While undecideds will play a big part, I personally believe if Democrats come out in full force to vote, we will win.
Who is more motivated?
Persondem
(1,936 posts)I figured you could take a deeper dive into 538's way of doing things than I could have. I too saw a problem with his use of the trendline, as the adjustments did not seem to agree with the recent trend at all. Also, he insists on using a Google-based poll that is clearly an outlier for Trump; it consistently shows Trump with a double-digit lead in FL. Said poll is internet based, has a 29% response rate, and infers state of residency from IP address.
I'll check back. Still curious as to what you come up with by crunching the 538 data.
Loki Liesmith
(4,602 posts)But I do tackle several of your thoughts in the posts above.
I've just written some python code to turn these jpegs into data I can crunch. That's not perfect, but I think it may be the best I can do.
Already converted the huffpo data, will be converting the 538 data while I watch football. Hopefully some answers tonight.
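The jpeg-to-data step amounts to color-thresholding: for each pixel column, find where the trendline's color sits, then rescale to data coordinates. Something like the following sketch (the target color and tolerance are placeholders, not the values any particular chart needs):

```python
import numpy as np

def extract_curve(img, target_rgb, tol=40):
    """img: HxWx3 uint8 array of a chart image. Returns, per pixel
    column, the mean row index of pixels within `tol` (summed channel
    distance) of target_rgb; NaN where the color doesn't appear."""
    dist = np.abs(img.astype(int) - np.array(target_rgb)).sum(axis=2)
    mask = dist < tol  # pixels close to the line's color
    rows = np.broadcast_to(np.arange(img.shape[0])[:, None], mask.shape)
    ys = np.where(mask, rows.astype(float), np.nan)
    return np.nanmean(ys, axis=0)  # pixel y-coordinate of the line
```

Mapping the pixel y-coordinates back to probabilities still requires reading the axis calibration off the chart by hand, which is the fiddly part.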
Foggyhill
(1,060 posts)And yes, I'm upset by the amount of attention this kind of bad science is getting.
But I shouldn't be, really; much of the population has no understanding of stats and polling,
so exploiting this lack of knowledge is kind of expected, I guess (it's certainly not up to scientific standard...).
I do question his ethics...
Loki Liesmith
(4,602 posts)Persondem
(1,936 posts)I checked out the post you link to below. No way I could do the analysis you did, but my thoughts after looking at his ratings for FL were that a lot of what Silver looks at is arbitrary ... or at least poorly explained. Also, he seems to let polls with very questionable results influence his ratings - a la the USC/LA Times and the Google-based surveys. His trend line calculation for FL looked way off as well.
Thank you for taking the time to look at 538's data in detail.
fleabiscuit
(4,542 posts)Donald Trumps Six Stages Of Doom
http://fivethirtyeight.com/features/donald-trumps-six-stages-of-doom/