Joe Biden became President on January 20th, as you may have noticed. Organizations often ask Americans whether they approve of the incumbent President. These polls are published by many different outlets, and they are more useful when they are collected in one place and averaged.
Well, I am introducing the Biden Approval Ratings Project, an open source polling average for Joe Biden’s approval ratings. I have done some work in the past with polling averages in Excel for smaller-scale use, but now I think I can justify making this a regularly updated feature written in Python. There are still many issues with it. I have to do data entry manually to update it, because one Nathaniel Argent has not yet done the hard work of poll aggregation for me. I can also see a problem with the way I throw out older polls from the same pollster if two repeat pollsters publish on the same day (though I think there are not-too-bad ways to fix this).
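The project’s own code isn’t reproduced in this post. As a rough illustration of the general approach described above (keep only each pollster’s most recent poll, then average), here is a minimal Python sketch. The `Poll` record, the tie-breaking rule, and the plain unweighted mean are all assumptions for illustration. They are not necessarily how the project does it, and this sketch does not claim to fix the same-day issue mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Poll:
    """One approval poll. Hypothetical layout, not the project's real data format."""
    pollster: str
    end_date: date
    approve: float  # percent of respondents approving


def latest_per_pollster(polls: list[Poll]) -> list[Poll]:
    """Keep only the most recent poll from each pollster.

    If one pollster has two polls with the same end date, the one that
    appears later in the input wins (an arbitrary tie-break for this sketch).
    """
    latest: dict[str, Poll] = {}
    for poll in polls:
        kept = latest.get(poll.pollster)
        if kept is None or poll.end_date >= kept.end_date:
            latest[poll.pollster] = poll
    return list(latest.values())


def approval_average(polls: list[Poll]) -> float:
    """Unweighted mean of each pollster's latest approval number."""
    return mean(p.approve for p in latest_per_pollster(polls))


# Made-up example polls, for illustration only.
polls = [
    Poll("Pollster A", date(2021, 1, 22), 54.0),
    Poll("Pollster A", date(2021, 1, 25), 56.0),  # replaces A's older poll
    Poll("Pollster B", date(2021, 1, 24), 53.0),
]
print(approval_average(polls))  # 54.5
```

Dropping superseded polls keeps a single frequent pollster from dominating the average. The cost is that any rule for which poll counts as "latest" has to handle ties explicitly.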
As I write this, Biden’s approval rating stands at 55.2%, and I think that might go up as more high-quality pollsters enter the fray. I will be writing proper documentation for the project and hopefully making the program a bit more automated over time.
Happy new year, fellas! We can put 2020 behind us, and, importantly for this blog, a new American Presidency starts, along with a new Congress, the 117th.
Over the next two years I want to turn this place into much more of an old-school 2000s blog, prioritizing frequent updates over long posts. I want to focus on tracking the actual political work being done on Capitol Hill and in the White House (bills passed, nominations confirmed, and executive orders signed) and evaluating how it will actually affect America and the world, not just the partisan fights getting highlighted on Twitter.
I want to continue working on psephological models ahead of 2022; my goal is to be able to do a real probabilistic Senate forecast for that year’s midterms.
I also want to put in some video work, maybe a weekly update that’s mostly just me talking into a camera. I don’t know; we’ll see.
I will see people back here on January 3rd, when I will try to kick things off with a reflection on what has happened since the election and what to expect in the weeks ahead from the new Congress and the new Biden administration. And as a bonus, on January 2nd you can expect a weirder article attempting a project I’m interested in.
This is just a short post to say that I’ve started working on some research into how closely polls track results in Senate races. It probably won’t be finished before the election and is more of a backburner thing that could be deployed for a 2021 special election or something. That being said, the first state I looked at was New York. Using 538’s collection of polls dating back to 1998, I put together very basic polling averages for each race (adjusted for date, sample size, and partisan sponsorship, but not for actual proven pollster quality). I then measured how the Democratic Senate candidates in New York (Chuck Schumer, Hillary Clinton, and Kirsten Gillibrand) did on election night compared to their poll lead. Here’s a quick graph I did up in R of the results:
The correlation coefficient here is 0.96, though that doesn’t capture the fact that Democrats seem to overperform the polls in New York fairly consistently. I’d need to look at other very blue states to say whether this is just noise. Only one Democrat underperformed the polls: Chuck Schumer’s 2004 campaign, which had only one poll measuring it.
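It’s worth spelling out why a 0.96 correlation can coexist with a consistent overperformance. The correlation coefficient only measures how tightly results follow a straight line through the polls, so a constant shift in one direction doesn’t lower it at all. The minimal Python sketch below uses made-up margins, not the actual New York races, and shows a perfect correlation alongside a 5-point average overperformance. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_overperformance(poll_margins: list[float], results: list[float]) -> float:
    """Average of (actual margin - polling-average margin).

    Positive means the Democrat beat the polls on average.
    """
    return mean(r - p for p, r in zip(poll_margins, results))


# Made-up Democratic margins in points; NOT the actual New York data.
polls = [3.0, 20.0, 30.0]
results = [8.0, 25.0, 35.0]  # every race beats its polls by exactly 5 points

print(round(pearson_r(polls, results), 6))   # 1.0
print(mean_overperformance(polls, results))  # 5.0
```

This is why a correlation and a bias check answer different questions: the first asks whether polls rank races correctly, and the second asks whether they miss in a consistent direction.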
I will do more research into these numbers and will probably tweet about it, but unsurprisingly, polls are undefeated at predicting Democratic victories in very Democratic states. (That said, Schumer was only up by about 3 points in my polling average in 1998 and in fact won by about 10; notably, that was the last competitive Senate race in New York.) I also might update this post with additional findings as time goes on.
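For anyone curious what “adjusted for date, sample size, and partisan sponsorship” can look like in practice, here is a minimal Python sketch of one common style of weighted polling average. It is not necessarily the weighting behind the graph above. The half-life, the square-root sample-size weight, and the flat penalty for sponsored polls are all assumptions chosen for illustration. Another common approach is to shift sponsored polls’ margins rather than down-weight them.

```python
from dataclasses import dataclass
from math import sqrt

# Both adjustment parameters below are hypothetical choices for this sketch.
HALF_LIFE_DAYS = 14.0  # a poll's recency weight halves every 14 days
PARTISAN_WEIGHT = 0.5  # partisan-sponsored polls count half as much


@dataclass(frozen=True)
class RacePoll:
    days_before_election: int
    sample_size: int
    dem_margin: float  # Democratic lead in points
    partisan_sponsor: bool = False


def poll_weight(poll: RacePoll) -> float:
    """Combine recency, sample size, and sponsorship into one weight."""
    recency = 0.5 ** (poll.days_before_election / HALF_LIFE_DAYS)
    size = sqrt(poll.sample_size)  # larger samples count more; sqrt damps the effect
    sponsor = PARTISAN_WEIGHT if poll.partisan_sponsor else 1.0
    return recency * size * sponsor


def polling_average(polls: list[RacePoll]) -> float:
    """Weighted mean Democratic margin across the polls."""
    weights = [poll_weight(p) for p in polls]
    return sum(w * p.dem_margin for w, p in zip(weights, polls)) / sum(weights)


# Made-up polls for a single race, for illustration only.
example = [
    RacePoll(days_before_election=3, sample_size=800, dem_margin=6.0),
    RacePoll(days_before_election=20, sample_size=500, dem_margin=1.0),
    RacePoll(days_before_election=5, sample_size=600, dem_margin=12.0, partisan_sponsor=True),
]
print(round(polling_average(example), 1))
```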
Update: A Twitter thread that goes into a bit more detail is here:
If one spends time on American political Twitter, particularly in replies to tweets about a national Presidential poll, a very common refrain is something along the lines of “national polls are meaningless; the states elect the President, not the national popular vote.” This contains a kernel of truth. Given a choice between knowing the result of the National Popular Vote (NPV) and knowing the result of the tipping point state ahead of time, you would obviously choose the tipping point state, since that would tell you directly who won the election. But the nation is made up of states. The very large amount of high-quality national polling will probably predict the national result better than the average state’s polling predicts that state’s result. (Ideally, though, you’d use a composite of high-quality state and national polls to construct a picture somewhere between the two.) If you’re smart about how you read national polls, you can treat them not just as predictors of the national popular vote but as a tool to combine with priors about individual states. That gives you a rough idea of the picture in less-polled states. To illustrate that purpose, I want to talk about a very simple model I made using exclusively national polls, which is useful as a bit of a gut check to compare against state polls.
An extremely simple polling-based model of United States Presidential Elections
First things first, we need a polling average. I spent a lot of time over the last few months thinking about how to construct these, but that requires some more complicated math, which is against the spirit of this exercise. In addition, no matter what I put together, it won’t have nearly as much thought and rigor put into it as FiveThirtyEight’s polling averages. So for our purposes here, we will just treat those national polls as exogenously determined magic numbers to plug into the rest of this model. The actual calculation on our part is a very simple process of computing a partisan lean for each state, plus DC and the Congressional Districts of Maine and Nebraska that award their own electoral votes. In this model, that is done by taking the Democratic Presidential candidate’s margin of victory in that state and subtracting their NPV margin. Calculating the partisan lean relative to the nation as a whole matters here; doing otherwise would, for instance, let Obama’s big wins in 2008 and 2012 blind a 2016 model. Do this in every state, DC, and the necessary Congressional Districts for the two elections preceding the one being forecast. Then take a weighted average of each state’s partisan lean, with 75% weight on the most recent election and 25% on the one before it. You can then add the national polling average to these leans and see broadly what the national polls imply about the upcoming election.
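The lean calculation above is simple enough to write out directly. Here is a minimal Python sketch of it, assuming all margins are Democratic leads in percentage points. The function names and the example state’s numbers are made up for illustration, not taken from real election results.

```python
def partisan_lean(state_dem_margin: float, national_dem_margin: float) -> float:
    """State lean relative to the nation, in points (positive = more Democratic)."""
    return state_dem_margin - national_dem_margin


def weighted_lean(recent_lean: float, older_lean: float, recent_weight: float = 0.75) -> float:
    """Blend the two most recent leans: 75% most recent, 25% the one before it."""
    return recent_weight * recent_lean + (1 - recent_weight) * older_lean


def projected_margin(lean: float, national_poll_margin: float) -> float:
    """Apply the national polling margin uniformly on top of the state's lean."""
    return lean + national_poll_margin


# Hypothetical state with made-up margins (Democratic lead, in points).
recent = partisan_lean(state_dem_margin=5.0, national_dem_margin=2.0)  # +3.0
older = partisan_lean(state_dem_margin=1.0, national_dem_margin=4.0)   # -3.0
lean = weighted_lean(recent, older)                                     # +1.5
print(round(projected_margin(lean, national_poll_margin=3.8), 1))      # 5.3
```

Because every state gets the same national number added to its lean, this amounts to a uniform national swing. That is what keeps it simple, and it is also the main thing it gets wrong when a realignment like 2016 happens.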
Sanity Check: What would this say in 2016?
This map shows what that model would have said on the morning of election day 2016, with Hillary Clinton leading the national polls by an average of 3.8 points. A greyed-out tossup state is a race within 5 points, a light-shaded lean D or lean R state is a race between 5 and 10 points, a medium-shaded likely D or R state is a race between 10 and 20 points, and a solid D or R state is a race of 20 points or more.
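The rating bands in that legend translate directly into a small function. Below is a minimal Python sketch, assuming the projected margin is a Democratic lead in points. The post doesn’t say which band a margin of exactly 5, 10, or 20 falls into, so the boundary handling here is an assumption.

```python
def rating(dem_margin: float) -> str:
    """Map a projected Democratic margin (points) to a rating band.

    Assumption: a margin of exactly 5, 10, or 20 goes in the less
    competitive band (e.g. exactly 5 is Lean, not Tossup).
    """
    size = abs(dem_margin)
    if size < 5:
        return "Tossup"
    party = "D" if dem_margin > 0 else "R"
    if size < 10:
        return f"Lean {party}"
    if size < 20:
        return f"Likely {party}"
    return f"Solid {party}"


for margin in (3.0, -7.5, 12.0, -25.0):
    print(margin, rating(margin))
```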
One thing to note before getting into the rest of this: the model uses 2008 as the election with the 25% weight for the partisan leans, and 2008 was on a different redistricting cycle than 2012, 2016, and 2020. If I felt it were particularly necessary, I could go into the CDs that award electoral votes, look at the specific counties they contain, and weight things that way. But NE-02 and ME-02 are both close enough to their 2000s definitions that I don’t think it is. Now to the substantive issues.
The obvious thing to see here is that Hillary Clinton was favored to win, but that’s not an indictment of this approach on its own. Clinton was favored in every look at the election that wasn’t either completely detached from reality or built exclusively on non-polling data. Being extremely overconfident in Clinton is also a bit of a sin, and I don’t think this map displays extreme confidence: even losing Iowa would have cost Clinton the election unless she also won a tossup or two. The issue, then, is the geographic distribution of the electoral votes Clinton was getting here. This model is more confident about Iowa than about Virginia or Colorado, and the reason for that is Barack Obama. Obama had particular strength in the Midwest. He was based out of Chicago, and despite having ample opportunity as the first black President to talk about race relations in very thoughtful ways, he focused on things like healthcare that frankly weren’t as challenging to the priors of Midwestern whites. This led to huge margins in the Midwest that gave him an electoral college advantage: Obama could have lost the NPV by 1.0 points and still taken the White House. Hillary Clinton, by contrast, based her campaign in New York. Her staff was largely made up of younger people who wanted to boost talk about social liberalism; Clinton was the first major party nominee to actually say “systemic racism,” for instance. This plays great with urban liberals like me, and she was able to win the NPV over Trump. But it concentrated her votes in those urban areas while failing to stop the bleeding in the suburban and rural Midwest. There, white voters who were largely sympathetic to things like universal healthcare, but frankly sort of racist and sexist, rejected a campaign fighting from the left on cultural issues instead of economic ones.
That is how Iowa ended up redder than Texas in 2016, after being a state Obama carried by more than his national margin just four years prior. All of that being said, this is pretty close to FiveThirtyEight’s 2016 projections, except that FiveThirtyEight had Iowa clearly further right and Virginia clearly further left. So some of this model’s miss comes from its lack of state polls, but a lot of it is that the state polls were also wrong in the same direction. I think this actually performed better in 2016 backtesting than I expected, and I think that proves its usefulness as a bit of a gut check on the national state of the race.
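The electoral college advantage described above can be measured from a map like this one. Stack the states from most to least Democratic, find the tipping point state, and compare its margin to the national margin. Below is a minimal Python sketch using a made-up four-state “country” rather than real results. The function names are hypothetical, and the calculation assumes a uniform national swing.

```python
from typing import Optional


def tipping_point(states: dict[str, tuple[int, float]], needed: int = 270) -> Optional[str]:
    """Return the state that gets the Democrat to `needed` electoral votes.

    `states` maps a name to (electoral votes, projected Democratic margin).
    States are stacked from most to least Democratic; the one that pushes
    the running total to `needed` is the tipping point.
    """
    running = 0
    for name, (votes, _margin) in sorted(states.items(), key=lambda kv: kv[1][1], reverse=True):
        running += votes
        if running >= needed:
            return name
    return None


def electoral_college_edge(states: dict[str, tuple[int, float]], national_margin: float,
                           needed: int = 270) -> float:
    """Tipping-point margin minus national margin.

    Positive values favor the Democrat: under a uniform swing they could
    lose the popular vote by roughly this many points and still win.
    """
    tp = tipping_point(states, needed)
    if tp is None:
        raise ValueError("no combination of states reaches the threshold")
    return states[tp][1] - national_margin


# Made-up four-state "country" with 30 electoral votes, so 16 are needed.
toy = {"A": (10, 20.0), "B": (8, 4.0), "C": (6, -1.0), "D": (6, -15.0)}
print(tipping_point(toy, needed=16))                # B
print(electoral_college_edge(toy, 2.0, needed=16))  # 2.0
```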
What things look like in 2020
Biden is currently up 10.5 points in the FiveThirtyEight national polling average. This is what that corresponds to on a map with 2016 weighted 75% and 2012 weighted 25%.
This is pretty damn close to a model based on state polls. It’s a bit more bullish on Biden in Florida and a bit more bearish on him in Arizona than the polling in each state really implies, but otherwise it checks out. The tipping points are the trio of Michigan, Pennsylvania, and Wisconsin, and Biden has a pretty big lead in them. That being said, his leads there are the same as Hillary’s in the 2016 version of the model (actually a little larger if you look at the spreadsheet, but the same on the map). That is because the 2016 model assumed a Democratic electoral college advantage that turned into a disadvantage. Is this something to note? I think it is to an extent. It’s important to realize that Trump can still win. But it requires both a shift in the polling and a roughly 2016-sized polling error in the states that matter again. Given that polls don’t have a predictable direction of error from cycle to cycle, that’s a risky thing to have to rely on. I think it’s also worth noting that Biden just has a more stable base of states in his column here. VA and FL are lean D, along with a tossup Arizona that, based on state polls, is probably lean D. Being up double digits nationally gives you a lot of paths to victory that are demographically distinct from each other. Losing Michigan may mean that you’ll also lose Wisconsin, but it doesn’t say nearly as much about whether you’ll lose Arizona. The other thing to mention here is, of course, tossup Texas. Trump has generally led polls in Texas, but by a narrow enough margin that a fairly small error could result in a Biden win. Texas isn’t getting the highest-quality polls either, because it’s both hard to poll and not terribly important for the result of the election: if it’s particularly close in Texas, Biden will have already won. FiveThirtyEight puts the chance of this upset at about 30%, which feels about right.
Under this national-polls setup, Biden would need to win the NPV by about 13 points to carry Texas, which is within the realm of possibility, if unlikely.
Conclusion
Why did I write this anyway? It feels like I’m making a lot of very banal points. Well, this really was meant as a bit of a response to the “national polls are meaningless” people. I wanted to show that national polls can provide a good baseline sanity check when other indicators point in confusing directions or just feel too complicated.
I might also post updates of the 2020 version of the map if the polls fluctuate noticeably between now and election day. I should make it clear, though, that I make no claims about it being a particularly great election forecast, merely a very simple one that’s easy to understand. It’s not probabilistic, doesn’t take nearly enough information into account, and assumes the election is held immediately. Still, it’s useful.
I issued New Brunswick (NB) election ratings based on a quantitative model I made privately, and it did pretty well. I talked extensively about how it worked and its strengths in this tweet thread, so I won’t go over all of that again. Instead, I want to talk about where I failed and what steps I plan to take on this topic in the future.
The NB election results are in: PCs with 27, Libs with 17, Greens with 3, and the PA with 2. What did my model project? Exactly that. https://t.co/8OFJF9ltfH
I would say I really screwed up in one specific riding, Shippagan-Lamèque-Miscou. Its previous PC winner had defected from the party, but I still had it as Likely PC; instead it went big for the Liberals. I think this suggests that I need to do some amount of demographic regression in future models, so that they pick up on the fact that “hey, Francophone districts will tend to be way more Liberal on average.” That’s a specifically New Brunswick problem, but the lesson is probably worth carrying over to districts where, say, a Blue Dog Southern Democrat type is retiring and it’s an open seat, in case that might ever be a problem.
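Here is a minimal Python sketch of the kind of demographic regression described above. It fits Liberal vote share against Francophone share across ridings, then uses the fitted value as a baseline for ridings where the previous result may be misleading, such as after a defection or a retirement. The data are made up for illustration, and a single predictor is a big simplification. A real version would use actual census and results data and more than one variable.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up ridings: Francophone share of population vs. Liberal vote share (%).
francophone_share = [0.1, 0.3, 0.5, 0.9]
liberal_share = [25.0, 34.0, 41.0, 60.0]

intercept, slope = fit_line(francophone_share, liberal_share)

# Demographic baseline for a hypothetical heavily Francophone riding. A big gap
# between this and a riding's last result flags it for a closer look.
print(round(intercept + slope * 0.95, 1))
```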
In terms of what I want to do moving forward, I was a bit upset at having to rely on someone else’s pollster ratings for this analysis. So the main thing I will be working on is a pollster ratings algorithm. I’ll start with America, to get a sense of how to do it in a more data-rich, two-party environment, and then apply it to Canada, where it will be more useful as a public resource in a more sparsely analyzed area.
That’s all for now, but I hope people who do read this blog regularly enjoyed seeing me get things pretty close to all correct in this election. It was fun for me as a (probably not actually that impressive) self-confidence booster.