If you spend time on American political Twitter, particularly in the replies to any tweet about a national Presidential poll, a very common refrain is something along the lines of “national polls are meaningless, the states elect the President, not the national popular vote.” This contains a kernel of truth: given a choice between knowing the result of the National Popular Vote (NPV) and knowing the result of the tipping point state ahead of time, you would obviously choose the tipping point state, since that would directly tell you who won the election. But the nation is made up of states, and the incredibly large amount of high quality national polling will probably do a better job of predicting the national result than the average state’s polling will do of predicting that state’s result (though ideally, you’d use a composite of high quality state and national polls to construct a picture that was somewhere between the two). If you’re smart about how you read national polls, you can treat them not just as predictors of the national popular vote, but as a tool to plug into priors about the states to get a rough idea of the picture in less polled states. To illustrate that, I want to talk about a very simple model I made using exclusively national polls, which is useful as a bit of a gut check to compare against state polls.
An extremely simple polling-based model of United States Presidential Elections
First things first, we need a polling average. I spent a lot of time over the last few months thinking about how to construct these, but that requires some more complicated math, which is against the spirit of this. In addition, no matter what I put together, it won’t have nearly the amount of thought and rigor put into it as FiveThirtyEight’s polling averages, so for our purposes here, we will just treat their national polling average as an exogenously determined magic number to plug into the rest of this model. The actual calculation on our part is a very simple one: calculating a partisan lean for each state (plus DC and the Congressional Districts of Maine and Nebraska that get their own detached electoral votes). This is done in this model by simply taking the Democratic Presidential candidate’s margin in that state (negative if they lost it) and subtracting their NPV margin from that number. The fact that the partisan lean is calculated relative to the nation as a whole is important here; doing otherwise would let Obama’s big wins in 2008 and 2012 blind a 2016 model, for instance. Do this in every state + DC + the necessary Congressional Districts for the two elections preceding the one being forecast. Then put together a weighted average of the partisan lean for each state, with 75% of the weight on the election immediately before the one being forecast and 25% on the one before that. You can then add the national polling average to each state’s lean and see broadly what the national polls imply about the upcoming election.
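The arithmetic above fits in a few lines of Python. This is only a sketch of the process as described, not the actual spreadsheet: the function names are my own, and the state in the example is a made-up placeholder rather than a real result.

```python
# Minimal sketch of the partisan lean arithmetic described above.
# All margins are Democratic minus Republican, in percentage points.
# The input numbers are made-up placeholders, NOT real election results.

def partisan_lean(state_margin: float, npv_margin: float) -> float:
    """State margin relative to the nation; positive means more Democratic than the country."""
    return state_margin - npv_margin


def blended_lean(prev_lean: float, older_lean: float, prev_weight: float = 0.75) -> float:
    """Weighted average: 75% on the most recent election, 25% on the one before it."""
    return prev_weight * prev_lean + (1.0 - prev_weight) * older_lean


def projected_margin(lean: float, national_poll_margin: float) -> float:
    """Shift the state's blended lean by the current national polling margin."""
    return lean + national_poll_margin


# Hypothetical state: D+2.0 when the nation was D+4.0, and D+9.0 when the nation was D+7.0.
lean_prev = partisan_lean(2.0, 4.0)          # -2.0
lean_older = partisan_lean(9.0, 7.0)         # +2.0
lean = blended_lean(lean_prev, lean_older)   # 0.75 * -2.0 + 0.25 * 2.0 = -1.0
margin = projected_margin(lean, 3.8)         # national average of D+3.8

print(f"lean {lean:+.1f}, projected margin {margin:+.1f}")
# prints: lean -1.0, projected margin +2.8
```

Note that the hypothetical state voted for the Democrat both times, but its blended lean is still slightly Republican, because it trailed the national margin in the more heavily weighted election.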
Sanity Check: What would this say in 2016?
This map is what that model would say on the morning of election day 2016, with Hillary Clinton leading in the national polls by an average of 3.8 points. A greyed out tossup state is a race of less than 5 points, a light shade of lean D or lean R is a race between 5 and 10 points, a medium shaded likely D or R is a race between 10 and 20 points, and solid D or R is a race of 20 points or more.
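Those shading buckets can be written as a small function. This is a sketch with a function name of my own choosing, and it assumes that a margin sitting exactly on a boundary (5, 10, or 20 points) goes into the stronger category, which the description above leaves open for the 10-point case.

```python
def rating(margin: float) -> str:
    """Map a projected Democratic-minus-Republican margin (in points) to a map category.

    Assumption: a margin exactly on a boundary falls into the stronger bucket.
    """
    size = abs(margin)
    if size < 5:
        return "Tossup"
    party = "D" if margin > 0 else "R"
    if size < 10:
        return f"Lean {party}"
    if size < 20:
        return f"Likely {party}"
    return f"Solid {party}"


# Hypothetical projected margins, not real states.
for m in (2.8, -7.5, 12.0, -25.0):
    print(f"{m:+.1f} -> {rating(m)}")
```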

One thing to note before getting into the rest of this: the 2016 map uses 2008 as the election with the 25% weight for the partisan leans, and 2008 was on a different redistricting cycle than 2012/2016/2020. If I felt it was particularly necessary, I could go into the CDs with electoral votes, look at the specific counties they contain, and weight things that way, but NE-02 and ME-02 are both close enough to their 2000s definitions that I don’t feel that’s necessary. Now to the substantive issues.
The obvious thing to see here is that Hillary Clinton was favored to win in this, but that’s not an indictment of this approach on its own. Hillary Clinton was favored in every look at the election that wasn’t either completely detached from reality or built exclusively on non-polling data. Being extremely overconfident in Hillary Clinton would also be a bit of a sin, but I don’t think this map displays extreme confidence: even a flip of Iowa would’ve cost Clinton victory unless she won a tossup or two.
The issue, then, is the geographic distribution of the electoral votes Clinton was getting. This model is more confident in Iowa than in Virginia or Colorado. There is a reason for this, and that is Barack Obama. Obama had a particular strength in the Midwest. He was based out of Chicago, and despite having ample opportunity as the first black President to talk a lot about race relations in very thoughtful ways, Obama focused on things like healthcare that frankly weren’t as challenging to the priors of Midwestern whites. This led to huge margins in the Midwest that gave him an electoral college advantage: Obama could’ve lost the NPV by 1.0 points and still taken the White House. Hillary Clinton, by contrast, based her campaign in New York and had a campaign largely staffed by younger people who wanted to boost talk about social liberalism; Hillary Clinton was the first major party nominee to actually say “systemic racism,” for instance. This plays great for urban liberals like me, and she was able to win the NPV over Trump, but it concentrated her votes in those urban areas while failing to stop the bleeding in the suburban and rural Midwest, where white voters who were largely sympathetic to things like universal healthcare, but frankly sort of racist and sexist, rejected a campaign fighting from the left on cultural issues instead of economic ones.
Thus, Iowa ended up redder than Texas after being a state Obama carried by more than his NPV margin just four years prior. All of that being said, this is pretty close to FiveThirtyEight’s projections in 2016, with the exceptions of Iowa, which was clearly more right wing there, and Virginia, which was clearly more left wing. So some of the miss comes from the lack of state polls in this model, but a lot of it is that the state polls were also wrong in the same sort of direction. This actually performed better in 2016 backtesting than I expected, and I think that shows its usefulness as a bit of a gut check about the national state of the race.
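The “could’ve lost the NPV by 1.0 points and still won” kind of claim is really a statement about the partisan lean of the tipping point state. Here is a rough sketch of how you could read that off a table of leans, assuming a uniform national shift. The function name and the four toy “states” are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def tipping_point(leans: dict, electoral_votes: dict) -> tuple:
    """Return (state, lean) for the state that delivers the Democrat a majority.

    States are ordered from most to least Democratic-leaning and their electoral
    votes are accumulated until a majority is reached. Under a uniform national
    shift, that state's lean is the Democrat's electoral college cushion.
    """
    needed = sum(electoral_votes.values()) // 2 + 1
    running = 0
    for state in sorted(leans, key=leans.get, reverse=True):
        running += electoral_votes[state]
        if running >= needed:
            return state, leans[state]
    raise ValueError("no majority reachable; check the inputs")


# Toy data: hypothetical states, NOT real leans or electoral vote counts.
toy_leans = {"A": 12.0, "B": 1.0, "C": -3.0, "D": -15.0}
toy_votes = {"A": 10, "B": 6, "C": 8, "D": 5}

state, lean = tipping_point(toy_leans, toy_votes)
print(f"tipping point: {state} at lean {lean:+.1f}")
# prints: tipping point: B at lean +1.0
```

In the toy example the Democrat needs 15 of 29 electoral votes, gets there with state B, and so could lose the national vote by anything less than 1.0 points and still win. A negative lean in the tipping point state would mean an electoral college disadvantage instead.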
What things look like in 2020
Biden is currently up 10.5 points in the FiveThirtyEight national polling average. This is what that corresponds to on a map, with 2016 weighted 75% and 2012 weighted 25%.

This is pretty damn close to a state poll based model. It’s a bit more bullish on Biden in Florida and more bearish on Biden in Arizona than is really implied by the polling in each state, but otherwise this checks out. The tipping points are in the trio of Michigan, Pennsylvania, and Wisconsin, and Biden has a pretty big lead in them. That being said, those leads are the same as the ones Hillary had in the previous model (well, actually a little bigger if you look at the spreadsheet, but the same on the map), and hers rested on the assumption of a Democratic electoral college advantage that turned into a disadvantage. Is this something to note? I think it is, to an extent. It’s important to realize that Trump can still win, but it requires both a shift in polling and a roughly 2016 sized polling error in the states that matter again, and given that polls don’t have a predictable direction of error from cycle to cycle, that’s a risky thing to have to rely on. I think it’s also worth noting that Biden just has a more stable base of states in his column here: VA and FL are lean D, along with a tossup Arizona that, based on state polls, is probably lean D. Being up double digits nationally gives you a lot of paths to victory that are demographically distinct from each other. Losing Michigan may mean that you’ll also lose Wisconsin, but it doesn’t say nearly as much about whether you’ll lose Arizona. The other thing to mention here is of course tossup Texas. Trump has generally led polls in Texas, but by a narrow enough margin that a fairly small error could result in a Biden win. Texas isn’t getting the highest quality polls either, because it’s both hard to poll and not terribly important for the result of the election: if it’s particularly close in Texas, Biden will have already won. FiveThirtyEight puts the chance of this upset at about 30%, which feels about right.
This national polls setup says Biden would need to win the NPV by about 13 points to pull off that Texas upset, which is within the realm of possibility, if unlikely.
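That “about 13 points” figure is just the state’s partisan lean with the sign flipped: under a uniform shift, a state is tied when the national margin exactly cancels its lean. A tiny sketch, where the function name is mine and the lean of roughly R+13 is an assumption read back from the sentence above, not a value taken from the spreadsheet:

```python
def npv_needed(lean: float) -> float:
    """National margin at which a state with this lean is exactly tied (uniform shift)."""
    return -lean


texas_lean = -13.0       # assumption: approximate blended lean implied by the text
national_margin = 10.5   # the national polling lead quoted earlier in the post

needed = npv_needed(texas_lean)
shortfall = needed - national_margin
print(f"needs D+{needed:.1f} nationally, {shortfall:.1f} points beyond the current average")
# prints: needs D+13.0 nationally, 2.5 points beyond the current average
```

A gap of about 2.5 points is why the state shows up as a tossup on the map while still leaning the other way.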
Conclusion
Why did I write this anyway? It feels like I’m making a lot of very banal points. Well, this really was meant as a bit of a response to “national polls are meaningless” people, to show that they can provide a good base level sanity check if other things are leading in confusing directions or just feel too complicated in general.
I might also post updates of the 2020 version of the map if the polls fluctuate noticeably between now and election day, though I should make it clear that I make no claims to this being a particularly great election forecast, merely a very simple one that’s easy to understand. It’s not probabilistic, doesn’t take into account nearly enough information, and assumes the election will be held immediately. Still, it’s useful.

















