Presidential assassination, war, video proof of something incredibly heinous (pedophilia?), etc. can absolutely lead to these outcomes. You don't even have to go that far back. Nixon and Reagan flipped states like no-one's business.
I do, however, agree that 538's state-to-state correlation model seems weak.
California and Alabama would only flip during a wave, and that wave would consume any and all states. The fact that 538's model doesn't strongly show that pattern is a failing of it. But a model that inaccurately handles the unlikeliest of events (California flipping while Florida stays blue) is not necessarily a terrible predictor of its primary target (presidential win probabilities).
As a data scientist, I can totally understand Nate's hesitation. Do you impose strong priors on the model to reflect strong domain intuition, or do you build a model that best characterizes the data it is based on? In the presence of infinite data, you should abandon all domain-based priors. For single-digit data points, priors are essential. For any amount of data in between, it is anyone's best guess.
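To make that trade-off concrete, here's a toy sketch (made-up numbers, not anything either forecaster actually does) of how a Beta prior on a win probability gets washed out as data accumulates:

    # Toy Beta-Binomial: the posterior mean blends prior intuition with data.
    def posterior_mean(a, b, k, n):
        # prior Beta(a, b); k successes observed in n trials
        return (a + k) / (a + b + n)

    prior = (8, 2)  # strong domain intuition: roughly an 80% win probability
    for n in (10, 100, 10_000):
        k = n // 2  # the data itself says 50%
        print(n, round(posterior_mean(*prior, k, n), 3))
    # n=10    -> 0.65   (prior dominates)
    # n=100   -> 0.527  (a compromise)
    # n=10000 -> 0.5    (data dominates)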
Something could flip California and Alabama (for example, Trump starts defending Roe v. Wade and in response Biden somehow manages to sound like he's opposing it). This would probably be some latent hidden variable, like whether the candidates are seen as socially conservative, which would affect all states (though California and Alabama would be the most impacted).
Constraining against them won't improve your model's fit (usually by definition), and it doesn't always improve robustness (at least for situations near the average) -- because they're acting to debias the model in ways that you otherwise don't have enough degrees of freedom to address.
A negative correlation here is also potentially historically supported, in the sense that sometimes DEM/GOP candidates are philosophically reversed in some way relevant to the state. As in, "The only way a GOP would get elected in X is if they had the DEM position on subject Y which would make them lose state Z, who cares as much about that subject as X but in the opposite direction."
Now -- it doesn't seem a likely case in this election (e.g. Trump is not (currently) a massively pro-choice Republican), so it probably shouldn't apply here -- but it isn't hard for me to imagine how a negative correlation might show up in the historical data.
The Economist model does exactly that, and all of their correlations are positive.
I recommend reading their methodology, they know what they're doing (I wouldn't say the same about 538). Andrew Gelman has developed some of the Bayesian methods and software that people like Nate Silver use, he's the main author of what's considered a reference book on Bayesian statistics.
It's clear that the models are tuned differently, but from Silver's replies in the PS's, it seems that he's ok with these artifacts being part of the model.
It's very interesting to see how long it takes people to do things. I am amazed that the entire article took one hour to type up; I've spent entire afternoons trying to write shallower pieces of work.
Granted, I could just be misreading this post. :)
https://projects.economist.com/us-2020-forecast/president
You can compare this to the 538 model and see where these two teams and forecasts disagree.
The model says that Trump has a 1 in 10 chance of winning. With a fair 10-sided die it makes sense that you have a 1 in 10 chance of any given side rolling face up. But what is the die that is being rolled in these election statistics? What is the "chance" element that is being predicted?
In the dice toss scenario, we know everything relevant. In the election scenario, we don't.
A model like this is attempting to say "these are the rules we think exist. Based on the rules, and assuming the data is off by some random distribution, here's what we think could happen".
What different forecasters disagree about is what the rules are. For example, the relevance of certain demographic characteristics and the potential variance between polling (conducted prior to the election) and actual election results.
There's a huge amount of assumptions, and forecasters disagree on those assumptions. We have very little historical data (polling is very recent) and even with complete historical data, future elections do not always conform to past elections.
Once again, I’m not an expert, so I recommend looking for additional explanations, if you’re interested.
A program which randomly generates outcomes for each state, based on probability distributions inferred from the polls, and calculates who wins the election given those outcomes. They run the program repeatedly and report the proportion of simulated wins as the probability of winning. https://en.wikipedia.org/wiki/Monte_Carlo_method
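A minimal sketch of the idea (a toy three-state map, illustrative margins, independent normal errors; real forecasts model correlation, turnout, and much more):

    import random

    # state: (electoral votes, mean Biden margin, stdev of polling error)
    states = {"PA": (20, 0.05, 0.04), "FL": (29, 0.01, 0.04), "OH": (18, -0.03, 0.04)}
    NEEDED = 34  # majority of the 67 EVs in this toy map

    N = 100_000
    wins = sum(
        sum(ev for ev, mu, sd in states.values() if random.gauss(mu, sd) > 0) >= NEEDED
        for _ in range(N)
    )
    print(f"Biden wins {wins / N:.1%} of simulated elections")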
So, think of it as saying these facts basically describe a ten sided die. With no other knowledge, the best you have is that you expect it to behave the same as any other ten sided die.
(There are a couple of caveats about election forecasting as opposed to weather forecasting. The first is the "October surprise," a sudden revelation that changes the election. This cycle, it was arguably Trump's covid diagnosis, although that tended if anything to push the results further in the direction they seemed to be going on their own, rather than upset any trend. The second is that, unlike with weather systems, measuring voter behavior (and widespread reporting on these measures) can change people's behavior. The effect of this is hotly contested, but one of the many explanations of Trump's victory in 2016 which hinged on turnout in a few key states is that those states were predicted wins for Clinton, so Clinton voters didn't bother voting. Despite occasional jokes to the contrary, it doesn't rain just to spite the weatherman.)
The frequentist interpretation is roughly that if I go around making my best possible predictions, and we lump together all the things that I predict at 10%, about 1 in 10 of those things happen and the rest don't. But I wouldn't be able to be more specific about which ones in that group are more likely than others.
The Bayesian interpretation is that I can really view the world as flipping coins -- I don't care whether it's due to my lack of knowledge or "true" randomness -- and as far as I can tell, the coin flip involved here is 1 in 10.
We can also use a gambling interpretation. Here's one based on the security of Python's random module. Imagine the following three lotteries I offer you. In lottery A, you get $100 if Trump is elected. In lottery B, you get $100 if the following Python code returns true on my laptop:
random.random() <= 0.09999
In lottery C, you get $100 if this code returns true: random.random() <= 0.10001
If you would rather have lottery A than B, and you'd rather have C than A, then in some sense you believe Trump has a 1 in 10 chance.

Now there's an interesting extra layer to all of this, because it's a model predicting, not a person. In a short space, I would basically say that we've trained models to predict in ways that are not inconsistent with any of the interpretations above, when put into situations where that is testable. Then we use them in situations where it might not be, like this.
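The frequentist reading is the one you can directly test in code, by the way: bucket the stated probabilities and compare against observed frequencies. A sketch, with a simulated forecaster that is perfectly calibrated by construction:

    import random
    from collections import defaultdict

    # Events really do occur at the forecaster's stated probability here.
    forecasts = [(p := random.random(), random.random() < p) for _ in range(100_000)]

    buckets = defaultdict(list)
    for p, happened in forecasts:
        buckets[round(p, 1)].append(happened)

    for p in sorted(buckets):
        hits = buckets[p]
        print(f"predicted ~{p:.0%}: happened {sum(hits) / len(hits):.0%} (n={len(hits)})")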
It's different from probability (1/10 = .1)
At the bottom of the 538 page it says, " If you choose enough unlikely outcomes, we’ll eventually wind up with so few simulations remaining that we can’t produce accurate results. When that happens, we go back to our full set of simulations and run a series of regressions to see how your scenario might look if it turned up more often."
I interpret that as running a regression (linear?) and extrapolating it out to the tail where the conditioning is happening. This should eliminate the issue Andrew is seeing?
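Something like this sketch, maybe (my guess at the mechanics from that one paragraph, using a plain least-squares line and a normal residual assumption; certainly not 538's actual code):

    import numpy as np
    from math import erf, sqrt

    def p_trump_ak_given_trump_nj(nj_margins, ak_margins, min_sims=500):
        """Margins are Biden-minus-Trump across simulations; negative = Trump win."""
        cond = nj_margins < 0                      # condition: Trump wins NJ
        if cond.sum() >= min_sims:                 # enough simulations: just count
            return np.mean(ak_margins[cond] < 0)
        # Fallback: regress AK margin on NJ margin over ALL simulations,
        # then extrapolate out to a Trump-wins-NJ margin (say Trump +2).
        slope, intercept = np.polyfit(nj_margins, ak_margins, 1)
        pred = slope * -0.02 + intercept
        sd = np.std(ak_margins - (slope * nj_margins + intercept))
        return 0.5 * (1 + erf((0 - pred) / (sd * sqrt(2))))  # P(AK margin < 0)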
In that particular WA-MS example, if Trump suddenly took more liberal positions and somehow won WA (e.g., announced he's pro-abortion), he would in fact be more at risk of losing Mississippi. The idea that these two states are in play is already fringe and would require some major ideological (or other third-variable) shifts.
Specifically, that when you get off into weird situations like Trump winning Washington state, it's likely something incredibly weird has happened - something with no historical precedent - so it may actually be saner to assume that almost everything is now backwards and Biden would win a bunch of states he shouldn't, either.
To me, this points to a general willingness in the 538 model to just go "who knows" and build in some room for insane things to happen on the fringes. The Friday podcast episode about the 538 model specifically mentions that they have large/fat tails on their distributions, which make it nearly impossible for anyone to get over a 95% chance of winning on a national level, and these sorts of wild results seem like the outcome of that. If you bake in an assumption that there's always a 5% chance of something crazy happening, that 5% has to be reflected numerically somewhere in the model, and will thus occasionally produce outcomes that look impossible.
The negative correlation makes sense when we think about how difficult it is for everyone in Washington to suddenly turn conservative and everyone in Mississippi to turn liberal. Much more likely, the crazy thing is that the candidate or the circumstances changed in some way.
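On the fat-tails point, the difference is easy to quantify. A Student-t with few degrees of freedom is the usual way to fatten tails (a sketch; I don't know what distribution 538 actually uses):

    from scipy import stats

    # Two-sided probability of a polling error more than 3 scale units out:
    print(f"normal:     {2 * stats.norm.sf(3):.4f}")    # ~0.0027
    print(f"t (3 d.f.): {2 * stats.t.sf(3, df=3):.4f}") # ~0.0577, ~20x the tail mass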
It makes more sense if we ask...if a candidate wins NJ what is the chance they also won AK?
It's a neat possibility to think about though. If there were enough people who did that, it would really depend on the demographics moving and where they're going. It could swing the election either way. I wonder if anyone has found numbers on this and attempted to model it.
Moving out doesn't stop you from voting. I didn't change my voter registration when I moved from San Francisco to China. Years later, back in California, I voted in San Francisco, where I was still registered, despite residing in Hayward.
For verification purposes, they asked me when I voted what my address was. I was allowed to vote despite not knowing my own apartment number.
Nate Silver authored an adjustment to polls used in that model. Polls have more impact if they are more representative of statewide turnout across demographic variables he chose, like “black” and “low income.” This is why his predictions were so accurate for Obama’s 2008 and 2012 elections, and likely why they were so inaccurate in 2016.
Gelman’s own grad student is the only person to have academically published this approach, in a paper about polling Xbox Live users.
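If that's the paper I'm thinking of (Wang et al., forecasting with non-representative polls), the approach is multilevel regression and post-stratification (MRP). The post-stratification half is easy to sketch with made-up numbers; the multilevel regression that estimates support within each demographic cell is the part left out here:

    # A skewed sample (Xbox-style: heavily young) reweighted to census shares.
    poll = {"young": (1800, 0.62), "old": (200, 0.38)}  # cell: (respondents, support)
    pop_share = {"young": 0.35, "old": 0.65}            # known population shares

    raw = sum(n * s for n, s in poll.values()) / sum(n for n, _ in poll.values())
    adjusted = sum(pop_share[c] * s for c, (_, s) in poll.items())
    print(f"raw: {raw:.1%}, post-stratified: {adjusted:.1%}")  # 59.6% vs 46.4%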
These guys sort of make a thing that is the same in many more ways than it is different. The biggest question: why not just share the code?
The only sensible way to predict probabilities that aren't extreme is to tell people how the model works and the figures it is currently spitting out. That's the great thing about these kinds of blog posts: people are kicking the tyres, not just looking at the car.
Nobody predicting a one-off election with a rather special candidate would summarize a 33% chance as equivalent to having no chance.
The mistake in 2016, IMO was a) the extrapolation that came from those polls and b) people paying way too much attention to national polls, which have very little connection to electoral outcomes, given the electoral college.
Also perhaps c) the larger public not “getting” statistics in the way they’ve been presented. The NYT had, if I recall, Clinton at a 90% chance of winning. That still means that one out of every ten runs of that election is a Trump win. But people read “90% chance” as “definite win”. I don’t actually know what anyone should or could do about that.
I find this argument strange, because black turnout was unusually high in 2008. That should have a negative impact on the accuracy of statistical adjustments, not a positive one.
However, we will never know, because they never published the code.
I don't think it's likely but if those polls are indicative of what's actually happening, we're talking about potentially a 2-4 million vote swing in Trump's favor. Here's a link to estimates of voter turnout in 2016 [2].
[1] https://fivethirtyeight.com/features/trump-is-losing-ground-... [2] https://www.pewresearch.org/fact-tank/2017/05/12/black-voter...
We’ve seen that variable before.
That's a valid intuition to have, but you can also clearly make the argument that if Trump wins California you're in such a weird scenario that using the traditional wisdom about correlation is dangerous. The point that 538 have tried repeatedly to make is, firstly, that if you're conservative in your level of confidence you'll give a higher likelihood to outliers, and secondly, that it's not particularly useful to focus on whether X has a 3% or 4% chance.
If Trump wins California, we aren't going to be talking about whether the chance was 3% or 0.3%; we're going to be talking about the nuclear explosion that wiped out 25 million Californians.
By the same logic, the reason that Trump winning Alaska given winning New Jersey is lower than given losing New Jersey is that your sample size is rubbish. The chance of Trump winning Alaska given losing New Jersey is an accurate number; the chance of Trump winning Alaska given winning New Jersey is like asking "how likely is it that Trump wins Alaska given the UK gains US statehood?" If that happens, we're so far outside of what the model thinks can happen that we're just gonna say it's 50:50 - because who the hell knows.
It's not like saying "oh well, if X swing state goes blue, Y will probably follow"; the scenarios in this article are so bizarre that the model should rightly be very cautious, and probably either refuse to give an answer, default to 50:50, or return the unconditional probability, ignoring that data. The implicit bias in this analysis seems to be that if NJ went red, it would be because Trump won by a big margin, but that's not a likely enough scenario to actually get numbers for, and is so unlikely that things like "the Supreme Court threw out all the ballots for inner-city areas" start to become valid possibilities.
A statistical model only has a vague idea of context/the real world. It looks at polls (and probably not really that many polls of Alabama or Mississippi or Alaska) and sees that, statistically, Biden should win 3% of the time or so.
It doesn't have a specific set of world events in mind that would cause that; it just knows that that's how the numbers go, and thus it may produce weird circumstances in the grander results, because it has to make the world match the numbers in these small corners.
So, it seems to me that the entire article is predicated on a faulty conjecture, namely that 538 uses a mixture of a normal distribution with an independent heavy-tailed one. (It's not explicitly stated what the author thinks the base model is, but I think "normal" is a reasonable guess.)
I'd be interested in seeing a reverse-engineering analysis of 538's choice of distribution parameters, and extrapolation from there to see if these pathologies still arise with (much) larger samples.
...
That said, ultimately, the choice of how fat to make the tails is a modeling decision, and how the models behave outside the regime of interest isn't as important as how they behave within the operating region. There are key ways we can evaluate goodness of fit once we have results (e.g. bias, MSE) which we can use to determine just how wrong the model was as a predictor, and chances are pretty good that we won't see, say, Trump winning NJ, so we won't actually be able to validate the tail correlation with the vote in PA. But we will be able to validate the correlation in margin between PA and NJ.
Maybe 538's tails are too fat, and every prediction in the 80-95% range ends up going as predicted. Or maybe they're not fat enough, and some races in the 99% bucket end up going the opposite way. Point is, we won't know for sure which models were the best predictors until we can verify the predictions.
(see: all models are wrong, etc. Newtonian mechanics work great as long as your objects are big and slow, for instance.)
I'm not sure why that kind of interstate correlation should impact predictions?
<incoherent rambling :D> IANAS, but it feels like these correlations were added to compensate for the failure in 2016 to recognize that state A going one way implied that state B would also go that way. It "feels" like a more correct approach would be to compute some kind of error/weakness measure for a state's polls by bringing in those of its geographical neighbors and incorporating the polling error of that entire block vs. prior years. Or something.
The intuition I'm having difficulty conveying is that actual voting correlation is based on neighboring states only because you've got bubbles of ideology that aren't strictly cut along state lines. If strength of opinion in a bubble is going one way, then you'll see that mostly in the state at the center of the bubble, but the bubble still spreads into neighboring states, and a "stronger" bubble could push geographically further into those neighboring states and/or increase the bias in areas inside the bubble. </rambling>
[1] "Always" == most recent history
538 has low positive correlations between states on average, which actually has a big impact: it increases overall uncertainty (and therefore Trump's win probability). Why? If the states are not correlated, you usually end up with a few states going off the rails, like Trump winning Colorado without any nationwide swing.
Edit: Why the downvote? Each of the between-state correlations can be calculated from 40,000 datapoints.
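Concretely, something like this (a sketch; it assumes you have the simulation draws as an n_sims x n_states table, and the file name and column indices are hypothetical):

    import numpy as np

    # rows = ~40,000 simulated elections, columns = state vote margins
    draws = np.loadtxt("simulations.csv", delimiter=",", skiprows=1)
    corr = np.corrcoef(draws, rowvar=False)  # state-by-state correlation matrix

    # The tail behavior from the article, directly:
    nj, ak = 5, 1                            # hypothetical column indices
    trump_nj = draws[:, nj] < 0
    print(np.mean(draws[trump_nj, ak] < 0))  # P(Trump wins AK | Trump wins NJ)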
Additionally, people are getting worked up about a 3% chance of Biden winning Alabama. I mean, what does a 3% chance even mean for a one-off event, compared to a 5% chance or a 0.3% chance? I know full well that it means I should bet $100 if I can get more than a ~$3,300 payout, but the trouble is that this only works out if we bet often enough. (Perhaps often enough on different things.) For a one-off thing, the important part is that it is, with a very high degree of certainty, a loss of $100. So any claims that Biden's chances of winning are too high should be regarded with high suspicion.
Also, I listened earlier to Nate Silver's model talk [0], where he discusses quite a few problems with low-quality polls in some states.
[0] https://fivethirtyeight.com/features/politics-podcast-nation...
There are more than enough data points to determine the between-state error correlations, many of which seem to be very off.
> Additionally, getting worked up about a 3% chance
The weird between-state correlations actually have a large effect: they increase state and nationwide uncertainty, and as a result Trump has a higher chance of winning.
What value does something like FiveThirtyEight add to our democracy, if any? Is the motivation the same as that behind diving deep into baseball stats or Star Wars starship engineering, just “nerding out” for its own sake?
Contrast the voter who never looks at any of these polls with one who keeps up with them daily. Is the latter voter better off in some way? Is this just about trying to read the tea leaves so you can strut and preen later about having been correct, should the dice roll be in your favor?
My concern is that these things are distracting and may actually dissuade some people from voting because they think they “don’t have to.”
Here’s an idea: everyone go vote for whoever you think the best candidate is regardless of what a stack of polls say.
Someone set me straight here: what is the point of all this stuff?
One possible use of polls and election models is for helping people who want to donate to candidates determine which races are closest and where their money is most likely to have an impact.
> Here’s an idea: everyone go vote for whoever you think the best candidate is regardless of what a stack of polls say.
Because the US does not have ranked choice voting in most elections, polls are useful to determine which candidates are viable. If your preferred candidate is only polling at 5%, they are pretty unlikely to win, so you might want to vote instead for whichever of the leading candidates you find most agreeable.
In this system, there are two ways to democratically influence politics by voting:
1. Vote your preferred candidate
2. Withhold your vote from the party more closely aligned with your views, in hopes of helping shift its coalition priorities
If you fall into the second category, accurate forecasting makes a strategic difference.
On the one hand, I think I get your sentiment. On the other: I mean, we are all just solitary individuals floating through this life. Almost all of our important decisions depend, at least in part (or more, in some cases), on the thoughts and actions of others. That's natural, right? Do you do otherwise in your life?
If the above is true, it makes total sense why one out of 7 billion plus people would want to understand the choices of others before making theirs.
Great question, and I share some of your concern, though I can imagine some positive framings in addition to what you wrote. For example, to use an analogy, what’s the point of trying to predict the weather, or trying to predict the stock market? There are lots of reasons including planning ahead for likely outcomes, the ability to protect against losses, and last but not least making money.
I can also imagine that the desire to talk about the potential outcomes is valuable as a social activity, and doesn’t necessarily need to meet a standard of influencing the vote, or adding to our democracy.
> My concern is that these things are distracting and may actually dissuade some people from voting because they think they “don’t have to.”
Of course if your concern is founded, this can go both ways... if the polls show the candidate you favor starting to lose, it could be a call to vote.
If polls are distracting and dissuade voters, then unfortunately election results might do exactly the same or worse. When a state has been solidly red or blue and not purple for 50 years in a row, people do (perhaps rightly so) jump to conclusions about the outcome in advance.
One question we could ask is whether, if voting were made mandatory, would election predictions go away? I’d speculate no.
Also, you can make money betting on the outcome itself. If the odds you get are underpriced relative to an accurate forecast, that's a great bet to take.
Furthermore, these forecasts influence where politicians put their focus. Let's say you're Hillary in '16 and you think Wisconsin is yours despite the forecast showing a narrow lead, maybe you should reconsider.
It's an orgy of false precision.
edit: the entire debate is based on the weird assumption that if a prediction about a particular state is wrong, then pollsters must have systematically gotten middle-class Hispanic women over 40 wrong, and that therefore the odds of other states will change. It's all based on the reification of particular categories that are axiomatically significant for their profession.
> It's an orgy of false precision.
The false precision is pretty obviously coming from you, not the FiveThirtyEight pages that never show more than two (or rarely three) significant figures, and emphasize in every other way they can that the numbers are approximate and uncertain. Have you seen the width of the 80% confidence intervals on their graphs?
As for falsification: all of their predictions are for testable outcomes. We'll always know soon enough who actually wins an election, and which states they won, and by what margin, and who turned out to vote. That's all public record. The only part of the post-hoc analysis that is non-trivial is figuring out how a candidate fared with specific demographic groups. It's imperfect, but between exit polling and precinct-level demographic information and election results, it certainly is possible to detect large pre-election polling errors resulting from inaccurate demographic weighting.
So, wait, you're offended by the election modellers making a prediction, and yet you yourself are making a prediction? What's yours based on? Time machine?
Treating this as a tea-leaf reading (that is, deliberately searching for meaning via free association, without investing it with a truth value) I'm reminded of the "own the libs" meme. I see folks on foxnews.com comments bragging about it; I see lefties complaining about it, but I suspect that it's overblown and not actually a driver behind people's decision-making. But that's what comes up for me when I see "NJ goes Trump" forcing "AK goes Biden".
I'm amused by the resulting thought experiment... if dems started airing "socialists for Trump" campaigns in otherwise safe GOP states, would it move the needle there? Even sillier: if you aired those ads in NJ, would it move the needle in AK?
https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Ne...
This is it in 2016:
https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Ne...
Long Island is really red there. It’s really hard to say how a Democratic stronghold like NYC and somewhere literally a 45-minute train ride away could vote so differently. Long Islanders are not separate from NYCers; they commute to and work in the city.
To your question: could experiments like this work in similar situations across the country, for either side? I think so over the next 50 years as demographics shift (and I don’t think it’s as simple as urban liberals taking over; people do become more conservative as they get older). God knows what dynamic was at work between NYC and Long Island in 2016, but it’s obvious things are in flux.
I’ll make a bold prediction here. If Long Island is that red again, yeah, you better believe the typical rust belt states are staying red.
Doesn't that sound like Berkson's Paradox?
Edit: Why am I being downvoted?
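For anyone unfamiliar: Berkson's paradox is that selecting on an extreme joint outcome (a collider) induces negative correlation between variables that started out independent, which is at least the right flavor for these tail scenarios. A quick demo:

    import numpy as np

    rng = np.random.default_rng(0)
    a, b = rng.normal(size=(2, 200_000))      # independent by construction
    print(round(np.corrcoef(a, b)[0, 1], 3))  # ~0.0
    rare = a + b > 3                          # select on an extreme joint event
    print(round(np.corrcoef(a[rare], b[rare])[0, 1], 3))  # clearly negative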
Likewise, asserting that a 3% chance of California going Trump is absurd is an unreasonable degree of overconfidence. Assuming they maximize expected return, the author is implying that they would be willing to bet that Trump loses California at odds >> 97:3, i.e. presumably they would take a bet where I stake $1 against every $99 they stake. To be critical of a model based on outcomes it predicts with tiny probability, you need truly remarkably biased priors.
I would be more than happy to make this bet with anyone willing to take the other side - as in literally, find a modern middle-man system and I'm game.
You may want to be aware that you have provided me a nontrivial arbitrage opportunity, as the odds on PredictIt are closer to 7:93: https://www.predictit.org/markets/detail/6611
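To spell out the arbitrage (rough numbers, ignoring PredictIt's 10% fee on profits and 5% withdrawal fee, which shrink it considerably):

    # Take the parent's offer ($1 against their $99 that Trump wins CA)
    # and hedge by buying "No Trump wins CA" on PredictIt at ~93 cents.
    stake, opp_stake = 1, 99
    no_price, shares = 0.93, 100

    if_trump_wins = opp_stake - no_price * shares                 # +99 - 93 = +6
    if_trump_loses = -stake + shares * 1.00 - no_price * shares   # -1 + 100 - 93 = +6
    print(f"${if_trump_wins:.2f} / ${if_trump_loses:.2f}")  # ~$6 locked in either way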
However, for my use of 538, I’m perfectly happy to ignore such scenarios (such as Trump taking New Jersey). I can call the election in his favour by myself in these scenarios without needing the model.
Why is there so much fascination with polls to begin with? I understand that there are betting markets, but it seems sort of silly. If you had a 100% accurate poll, for instance, then what would be the purpose of the actual election?
Polling is useful for candidates and gives them ideas on where to target outreach and spending.
For the rest of us, it gives us something to watch. With Presidential campaigns running for almost two years in advance of the actual voting weeks, there’s a huge gap between when the thing starts and when we see results. This way, people have something to fill the time. Even now that voting has started, we are still another 10 days until the voting is done and likely another 7 after that until we have a sufficient count to know who has been elected.
That’s a long time for a populace that’s worried, distracted, and interested, especially since so many of us live in states where—due to the mechanics of a broken election system—we can’t do much to influence the national outcome.
The only way that would work is if he made people more likely to lie about their voting intentions. Now, there may be something about Trump that makes polling methodologies less accurate (notably, many pollsters have started to take into account education, which turned out to be unexpectedly important last time round) but that points just to bad methodology, not inherent unpredictability.
To run a 100% accurate poll would require you to sample every voter, so it would literally be an election.
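And short of sampling everyone, sample size sets the error floor; the familiar ±3-point margin of error corresponds to roughly a thousand respondents (quick check, sampling error only):

    import math

    p, n = 0.5, 1000  # worst-case proportion, typical poll size
    moe = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"95% margin of error: ±{moe:.1%}")  # ±3.1 points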
And apparently columbia.edu does not fulfill those criteria.
538 gave Hillary a 71.4% chance of winning
https://projects.fivethirtyeight.com/2016-election-forecast/
I think that's what he's talking about.
Incidentally, this is something magicians sometimes do. When a trick has gone wrong, they'll make a wild guess. If they're right, the audience is impressed. If they're wrong, they'll brush it aside with some joke and the audience won't notice/mind much. This works for things like card-guessing tricks and pseudo-psychic/cold-reading stuff.