How a 4-year-old startup will call the winners on election night
Another, newer voter data firm will also be hard at work on election night—Washington D.C.-based Decision Desk HQ. DDHQ’s race call team may make some of the toughest, and most important, calls in the firm’s four-year history Tuesday night.
“We’re not picking a stock here that may or may not lose us thousands or millions of dollars . . . we’re calling an election that potentially 150 to 160 million Americans are going to vote on, and it’s a highly emotional election,” says Scott Tranter, the veteran political data scientist who leads DDHQ’s race call team. “It’s an election with lots of misinformation and heated rhetoric, so getting this wrong is a big deal.”
That’s probably an understatement. This year, election officials will process perhaps 10 times as many mail-in ballots as they’re used to because of the coronavirus. One of the candidates, President Trump, has declared that he’ll consider the election illegitimate if he loses, probably claiming fraud or other issues with mail-in ballots. Some experts worry that a winner will be called prematurely on election night before all the mail-in ballots have been counted. In the current highly charged political environment, an incorrect race call that’s later reversed could have frightening effects.
On the other hand, there are scenarios in which the presidential race could be accurately called on Tuesday night or early Wednesday morning. If Donald Trump doesn’t win Florida, for example, his chances of winning narrow sharply, and it becomes possible for Biden to reach 270 electoral votes from states reporting on election night.
Tranter and his six-person team of academics and veteran political operatives have been preparing their models and practicing various election night scenarios for months now. On November 3, they’ll be furiously crunching numbers to bring the first news of the winners to client publications like The Economist, BuzzFeed, and Vox. While it would be advantageous for DDHQ to crunch numbers and correctly call races faster than the AP and Edison Research, Tranter tells me it won’t do so until the chance of being wrong approaches zero.
How to call an election
In normal times, DDHQ staffers would be all together in a room looking at a bunch of computer screens. This year half of them will be working remotely because of the coronavirus. But their work will be largely the same. After the first polling places close at 7 p.m. on the East Coast on Tuesday, voting precincts up and down the Eastern Seaboard will begin tabulating and transmitting their vote tallies to county election officials.
Like its competition at the AP and Edison, DDHQ has its own method of pulling in all the election data being released by counties and states. It contracts with hundreds of stringers who pick up the vote tallies from county election offices in person and then enter the data into DDHQ’s system using a smartphone app.
Some counties in the Northeast call or fax in their tally data. A growing number of counties post their results to a website, which DDHQ automatically scrapes throughout the night. Tranter calls that scraping system DDHQ’s “secret sauce,” because it allows the team to get the live results data into its projection models faster, and ultimately allows them to call races faster.
As the data flows in on election night, the DDHQ team works to clean it, looking for errors, do-overs, or other problems. And it’s a lot of data. DDHQ will be aggregating data on more than 500 federal and state races Tuesday night, including all Senate and House races, gubernatorial races, and, of course, the race for the presidency.
Some of DDHQ’s clients buy only the aggregated race data, and use their own in-house team and their own data models to call races. Other clients, like BuzzFeed, have no such team and rely on DDHQ’s people and models to make the calls.
Hundreds of races, hundreds of spreadsheets
Once the data is clean, it populates into hundreds of DDHQ spreadsheets. Each individual race has its own set of spreadsheets, which feed top-level results data into a series of dashboards. These provide a way for the DDHQ team to glance at the progress of the most important races throughout the night.
Tranter showed me one of the spreadsheets his team will use to track the results of the contest for a Senate seat in North Carolina between Democratic challenger Cal Cunningham and the GOP incumbent Thom Tillis. The large spreadsheet has rows for each county in the state, and perhaps 20 columns representing various aspects of the result. These include the vote tallies for each candidate in the race, percentage counts of votes recorded by the precincts in each county, and benchmark numbers showing how well a candidate is doing relative to past performances by candidates from their party. Other columns represent various scenarios. One column estimates the probability that the second-place candidate can come back and win by the end of the night. As live results data comes in, the values in all of these columns change.
“As we get data coming in at the county level, the model refreshes and remodels itself and outputs new estimates going forward,” Tranter says.
But the vote tally numbers tell only of the votes already counted. To safely call a race, DDHQ needs to know how many votes have yet to be counted. It must be convinced that there are not enough available votes for the second-place candidate to capture the lead before the last vote is counted.
“DDHQ has come up with this methodology to create projected turnout,” Tranter says, “so that when we’re communicating to these publications . . . we can say, ‘Hey . . . Joe Biden has 12 votes, Donald Trump has 8 votes, and we think there are 6 votes left.’”
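To make the arithmetic in Tranter’s example concrete, here is a minimal sketch in Python. It assumes a two-candidate race and a known count of outstanding votes; the function name is hypothetical and is not taken from DDHQ’s system.

```python
def share_needed_to_tie(leader_votes, trailer_votes, remaining_votes):
    """Fraction of the outstanding votes the trailing candidate must win
    just to pull level, assuming only two candidates (an illustrative
    simplification, not DDHQ's model)."""
    if remaining_votes <= 0:
        raise ValueError("no votes outstanding")
    margin = leader_votes - trailer_votes
    # If the trailer wins x of the remaining votes, the leader gets the rest:
    # trailer + x = leader + (remaining - x)  =>  x = (margin + remaining) / 2
    return (margin + remaining_votes) / (2 * remaining_votes)


# Tranter's example: Biden 12, Trump 8, 6 votes left.
share = share_needed_to_tie(12, 8, 6)
print(f"{share:.1%}")  # 83.3%
```

In this toy case the trailing candidate would need five of the six remaining votes just to tie, and all six to win outright.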
Developing these turnout models requires some real tradecraft, and it begins long before election night. The models derive from a wide variety of data sources. DDHQ licenses the voter files from every state, which show who is registered to vote, their party affiliation, and some voting history to suggest the likelihood that they’ll vote in the current election.
The turnout models also factor in the demographic qualities of voters in a county, including factors such as ethnicity, income levels, and education levels. They also consider turnout patterns seen in previous elections. For example, one of the main drivers of Trump’s victory in 2016 was that working-class white males turned out in high numbers. There’s reason to believe that will happen again in many places in the current election.
From this data, DDHQ creates turnout models that reflect low, medium, and high levels of voter participation. Those static turnout models are used as a starting point on election night. The numbers in those models get updated with live data on election night so that they paint an increasingly accurate picture of the real-world turnout. Tranter says that DDHQ developed its turnout models in cooperation with the engineering team at The Washington Post.
The demographic information included with the voter turnout models can make a big difference when the time for calling a race draws closer. Tranter provides an election night hypothetical in which precincts in all but a few counties in a key state such as Florida have already reported their vote tallies.
“It’s a heavy African American county and these are the last 10,000 votes,” Tranter says. “Do we really think Donald Trump can overtake [Biden]?” In that case, the demographic data could have a direct impact on when the race is called. It could mean the difference between declaring a winner on election night or waiting.
Good news, bad news
The huge role played by mail-in ballots this year is both good and bad news for DDHQ and other tabulators, Tranter tells me.
The obvious bad news is that in some important states—like Michigan, Pennsylvania, and Wisconsin—election officials may need several days to count all the mail-in ballots. Each of these states is prohibited from counting absentee ballots before Election Day (although Michigan passed a law saying some election offices may start “processing” them on November 2). In close races, it may be impossible to safely make a call on election night, and DDHQ and other tabulators will be forced to wait until the vote counting is done.
In addition, absentee ballots will make it harder to know for sure how many ballots remain outstanding on election night, because in some states they’re not necessarily associated with a particular precinct. “These counties will say all the precincts are reporting, yet they still have . . . 100,000 votes left to account for because they’re not allocated to a precinct,” Tranter says. This increases DDHQ’s reliance on the assumptions baked into its turnout models.
On the other hand, the preponderance of both mail-in voting and early voting gives DDHQ a chance to get some early clues on the accuracy of its turnout models. For example, the data could provide an early tipoff to DDHQ that some demographic group is coming out in high numbers and may have an outsize influence on the eventual outcome of the election. DDHQ gets data dumps from the states every day showing who has voted (by Wednesday of this week, 75 million people had already voted). This data doesn’t show the voter’s selections, only that they returned a ballot.
The DDHQ team also conducts something like an exit poll. It calls voters or reaches them online, and asks who they voted for. Some will answer, some won’t. But for the ones who do, DDHQ can match their choice with their demographic data from the voter file, and that can be meaningful. “We use that information to give us an idea of where the race is going,” Tranter says.
The big night
On election night, the DDHQ team divides all the races into two groups. The first group contains races that are being closely watched, or races that are likely to be close calls. Cunningham’s Senate bid in North Carolina will be in that group, along with about 12 other Senate races that will determine whether Democrats can flip the Senate blue. Naturally, the presidential race will also be in that group. A second group contains all other races, including House races that aren’t expected to be close, and lesser-known gubernatorial races.
Each race is assigned to a group of three of DDHQ’s six team members. Throughout the night, they’ll be watching dashboards that summarize the completeness of the race data. Tranter says that team members are trained to move through the various races at a quick pace—four to five races every minute. They’re just looking at the top line numbers and determining whether races may be ready to call.
“As this data comes in across all these hundreds of races, we can pick out one and say, ‘Hey, this race has brand-new data and it’s close; it’s worth opening up the North Carolina Senate spreadsheet,’” Tranter says.
The models themselves don’t call the race. It takes one team of three people to look at the data and agree that there’s enough evidence to be 99% certain of the ultimate winner. If one person thinks it’s too risky, they move on.
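The rule described here amounts to a unanimity check. A minimal sketch in Python follows, assuming each of the three reviewers can put a number on their confidence; that numeric framing and every name in the code are illustrative assumptions, since the article describes a human conversation rather than a formula.

```python
CALL_THRESHOLD = 0.99  # the 99% certainty Tranter describes
TEAM_SIZE = 3          # one team of three per race


def ready_to_call(reviewer_confidences, threshold=CALL_THRESHOLD):
    """Return True only if all three reviewers meet the threshold.
    Representing each reviewer's judgment as a number is an assumption
    made for illustration."""
    if len(reviewer_confidences) != TEAM_SIZE:
        raise ValueError("expected exactly three reviewers")
    return all(c >= threshold for c in reviewer_confidences)


unanimous = ready_to_call([0.995, 0.99, 0.999])
one_holdout = ready_to_call([0.995, 0.97, 0.999])
print(unanimous, one_holdout)  # True False
```

A single holdout is enough to block the call, which matches the description of the team moving on when one person thinks a call is too risky.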
Not every race will be a high-pressure call. Tranter says the teams will make some “insta-calls” at the beginning of the night for races that aren’t close. “So what we’re waiting for is just the first post of the night to make sure that there isn’t some ridiculous surprise out there,” Tranter says. If staffers see the data moving in the expected direction, they can call it early on.
As the evening progresses things get more interesting—and more difficult. The truth about the turnout estimates and the results of the high-profile races begin to come into focus. “As we get into the core of the evening—9:30 or 10 o’clock Eastern until about 3 a.m. Eastern—that’s when a whole bunch of polls will be closing,” Tranter says. Starting on the East Coast and moving to the West Coast, the numbers begin rolling in.
As more and more precincts report, county by county, state by state, more and more data becomes available to reality-check DDHQ’s turnout model. Toward the right side of the spreadsheet DDHQ showed me, there are columns that show the remaining votes outstanding for each of the low, medium, and high turnout estimates. Next to each of those columns is another column showing how many of those remaining votes the second-place candidate would need to catch up. At some point that column begins showing that in various counties the number-two candidate would need to capture more than 100% of the remaining votes to win, making the feat impossible.
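The columns described above can be approximated with a short calculation. The Python sketch below assumes a two-candidate race, works at the level of the whole race rather than county by county, and uses made-up vote and turnout figures; none of the names or numbers come from DDHQ’s spreadsheets.

```python
def share_needed(leader_votes, trailer_votes, projected_turnout):
    """Share of the outstanding votes the second-place candidate needs
    to draw level under one turnout scenario (two-candidate assumption)."""
    counted = leader_votes + trailer_votes
    remaining = projected_turnout - counted
    if remaining <= 0:
        return float("inf")  # nothing left to count in this scenario
    margin = leader_votes - trailer_votes
    return (margin + remaining) / (2 * remaining)


# Hypothetical projected total turnout under each scenario.
scenarios = {"low": 100_000, "medium": 110_000, "high": 125_000}
leader_votes, trailer_votes = 55_000, 40_000  # hypothetical tallies so far

needed = {name: share_needed(leader_votes, trailer_votes, turnout)
          for name, turnout in scenarios.items()}

# A comeback is ruled out only if it is impossible in every scenario.
out_of_reach = all(share > 1.0 for share in needed.values())

for name, share in needed.items():
    print(f"{name}: needs {share:.0%} of remaining votes")
print("out of reach in all scenarios:", out_of_reach)
```

With these invented numbers the trailing candidate would need 200% of the remaining votes under the low-turnout scenario, which is impossible, but only 75% under the high-turnout scenario. That gap shows why a turnout estimate the team trusts matters so much before a call is made.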
Once numbers like that start showing up, it may be time for a three-person conversation about whether to call the race. The discussion weighs the chance of being wrong against the desire to call races quickly. Later in the night, for high-profile races, this can require DDHQ to make big decisions at a rapid pace.
“If you’re heading up a call team you want to be 99% sure—that’s what we’re waiting for,” Tranter says. “The key is, to get to 99% sure, you better have a really good turnout estimate that you trust, and you better be the fastest tabulator.”
Tranter tells me that if his team, the AP’s team, and the Edison team each had a month to call a given race they’d all get to the same answer. Ultimately each team will access the same set of inputs to get them to the bottom of the math problem. Capturing those inputs sooner, and accurately modeling unknowns such as total turnout, are the keys to calling a race first.
If on Tuesday night you see Florida being called first on The Economist, BuzzFeed, or Vox, you’ll know that DDHQ’s formula worked.