With summer turning to fall and Election Day less than a month away, Texans following the gubernatorial election and other less prominent but consequential statewide races might already feel bombarded with polling results. There are already more than two dozen listed in the Texas Politics Project 2022 Poll Tracker, and there will still be plenty more before the final votes are cast.
With some of those surveys dating back to the spring of 2021, polling provides a record of the trajectory of attitudes toward the candidates over the breadth of the campaign. But now that we’re in the final stretch of the race, pollsters turn their attention to the difference between the preferences of a broader universe of Texans (like registered voters) and those of the smaller universe of voters who will actually cast a ballot. That shift manifests in estimates of the preferences of “likely voters” in poll results and, subsequently, in media coverage of those results.
Rarely is it made clear to the public consuming surveys what to make of these “likely voters” or even who, exactly, they are. Which is why it’s so important, with the election just around the corner, to offer a quick explainer (“likely voter” surveys produce estimates of what people that pollsters think are likely to vote say they intend to do) and an important reminder: without transparency from the pollster, in particular an explanation of how they arrived at their definition of a likely voter, there’s no way to evaluate those survey results.
So just what is a likely voter? While the question might seem simple, the answer is not, starting with the point that there is no single definition of a likely voter. And to make things more complex (though necessarily so), a likely voter survey is not a survey of the electorate.
Let’s start with the second point, which might be the more confusing of the two (or at least the more conceptual): Isn’t there a lot of false advertising out there if a likely voter survey is not a survey of the electorate?
To understand this point, it’s good to revisit what a public opinion poll is (and is not). Public opinion surveys seek to measure the attitudes of a specific population of people defined at the outset of a survey research project by measuring and aggregating individual attitudes and opinions from among a representative sample of that population. To accurately sample a representative group from a targeted population, pollsters need to know the relevant features of that population, features that often include the joint distribution of age, race and/or ethnicity, and gender. Pollsters need this type of information to ensure that the features of their sample match those same, relevant characteristics in the population. This is key to making an accurate estimate of the preferences, including vote choice, of any group. With this information and a way to reach respondents, in theory, one can survey any defined population – say all adults living in Texas or all registered voters living in Harris County.
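The matching described above is often done through weighting. As a minimal sketch, assuming hypothetical numbers for a single characteristic (real pollsters weight on the joint distribution of several characteristics at once), here is the basic arithmetic of adjusting a sample to match known population shares:

```python
# A minimal sketch of survey weighting: respondents are weighted so that
# the sample's group shares match the population's known shares.
# All numbers below are hypothetical, for illustration only.

# Known population shares for one characteristic (e.g., age group),
# taken from a source like U.S. Census figures.
population_shares = {"18-29": 0.22, "30-44": 0.26, "45-64": 0.32, "65+": 0.20}

# Shares of each group actually obtained in the survey sample.
sample_shares = {"18-29": 0.12, "30-44": 0.24, "45-64": 0.38, "65+": 0.26}

# Each respondent in a group gets weight = population share / sample share,
# so underrepresented groups count for more, overrepresented ones for less.
weights = {g: population_shares[g] / sample_shares[g] for g in population_shares}

for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
```

The point of the sketch is the dependency it makes explicit: the weights on the left-hand side can only be computed if the population shares on the right-hand side are known, which is exactly the information that does not exist for a future electorate.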
Given the necessity of this kind of information for making accurate estimates of preferences, including in the context of an election, we’re immediately faced with a challenge: What share of the 2022 Texas electorate will be Hispanic, female, college educated? What share will come from the state’s seven largest counties? What share will be under 30? The answer to these and other questions like them is a simple one: we don’t know. We have some fairly well-informed guesses based on the results of past elections. But unlike our knowledge of, say, the adult population of Bexar County; Austin, Texas; or the entire state, each informed by U.S. Census figures, this kind of information about the 2022 electorate doesn’t exist because the 2022 electorate doesn’t exist...yet. “The electorate” won’t exist in a definable way until voting closes at the end of November 8, 2022 — including all those persons who voted early, by mail, and in-person on Election Day.
While likely voter surveys may be received (and even reported) as predictions about the vote preferences of the electorate, these surveys are actually estimating the electoral preferences of a subset of registered voters whom the pollster has determined meet an arbitrary (if usually well-reasoned) threshold flagging them as “likely” to turn out in the upcoming election.
It follows that pollsters need to screen for likely voters from among registered voters for two reasons: first, it’s safe to assume that turnout in the 2022 election will be less than 100% of registered voters; second, it’s reasonable to assume that the attitudes of those who are engaged enough to turn out to vote might differ from those who, for one reason or another, are registered to vote, but not terribly engaged. In addition to these general assumptions, turnout in Texas in 2022 is anyone’s guess after some wild deviations from the norm over the last few election cycles.
At this point, it may be worth addressing why pollsters even report the opinions of registered voters at all. While the answer to this question is a longer one that touches on normative ideas about representation and the limitations of democracy, the more relevant answer relates to the ongoing discussion: just as the 2022 Texas electorate doesn’t actually exist until all the votes are counted, likely voters don’t necessarily become likely voters in January of an election year. For many (if not most) perfectly normal people, the election season doesn’t begin until the fall. This is mirrored by the campaigns, which, while active for much of the two years (or more) leading up to an election, save most of their advertising dollars and mobilization efforts for the final few months, when those efforts are most likely to provide the greatest payoff. This is when voters whose participation is not all but certain become more or less likely to vote, and when pollsters begin to shift their focus to these likely voters.
The threshold used to determine whether a survey respondent is a likely voter follows no set standard, though some types of rules are more common than others. This may sound a little haphazard, but two points should ease that concern. The first is the point made above: the electorate doesn’t exist yet, so we can’t make direct estimates of its opinion, and there is therefore no clearly dominant approach for producing the “best” group of likely voters from among registered voters. The second is that each election cycle is different: some have high turnout, some don’t; some draw in many new voters, some are dominated by long-time, habitual voters. And as you’ll see below, different approaches might be expected to produce different likely voter samples that could emphasize the views of one group over another. These samples may reflect the final demographic characteristics of the electorate quite well, or not. These samples may produce a vote estimate that mirrors the final result quite well, or not. And these two indicators of “accuracy” may or may not work in tandem, though we would like them to.
Pollsters generally have three choices at their disposal when it comes to defining a likely voter: take respondents at their word, consider their prior behavior, or model their likelihood to turn out and apply a rule that defines when that estimated likelihood is sufficiently high to classify a respondent as a likely voter. What should be known up front is that any approach is, to a great extent, subjective, based on some mixture of judgment, analysis, and experience.
When pollsters take a voter at their word, they choose from among a number of possible questions asked of the respondent about their likelihood to vote, interest in the election, interest in politics more generally, reported past voting behavior, or some combination of any or all of these. The limitation of self-reporting is that many people say they vote more regularly than they do, claim to be more engaged than they are, and/or overstate their intention or likelihood to vote (because, normatively, they know they should). We know this from comparisons of registered voters’ stated behaviors to actual turnout rates. In short, attitudinal questions alone tend to overstate potential participation. However, if the electorate is likely to include a significant number of new voters, it is essential to rely on stated intention to some degree, since for a significant share of the electorate past behavior may be absent, or may be less predictive of future behavior given the electoral conditions.
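A self-report screen of the kind described above can be reduced to a simple classification rule. The question wordings, response options, and cutoffs below are invented for illustration; they are not any pollster’s actual screen:

```python
# Hypothetical self-report screen: classify a respondent as a "likely voter"
# from stated intention and stated interest. Response options and the cutoff
# are illustrative assumptions, not a real pollster's rule.

def is_likely_voter_self_report(intention: str, interest: str) -> bool:
    """Return True if stated intention and interest both clear a chosen bar."""
    high_intention = intention in ("extremely likely", "somewhat likely")
    high_interest = interest in ("extremely interested", "very interested")
    return high_intention and high_interest
```

Note that every choice in this rule (which questions count, which answers clear the bar, whether both must clear it) is a judgment call by the pollster, which is exactly why disclosure of the screen matters.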
But in general, past behavior tends to be predictive of future behavior. And while the choices that voters make in the voting booth are secret, the act of voting is not. Therefore, another approach to defining likely voters uses a respondent’s own voting history to inform a pollster’s classification. For example, a likely voter may be defined as only those registered voters who have voted in each of the last two elections (2020 and 2018) or who voted in the 2022 primaries. This, for sure, is a good way to find likely voters, but, in a reversal of the point made above, a reliance solely on past behavior could lead to underrepresentation of new voters entering the electorate if those voters make up a significant share, or a significantly increased share, of that electorate. Too great a reliance on past behavior also increases the likelihood of missing a significant shift in the composition of the electorate if, for example, the issue environment or a provocative candidate increases the participation rate, and in turn the vote share, of a key group or groups of voters to the advantage or disadvantage of one candidate or the other.
In addition to these relatively simple, though surprisingly effective, approaches, one can combine information learned about turnout from prior elections with survey and demographic data validated against verified turnout in state records to create a statistical model that incorporates a number of factors to identify a likely voter from among a sample of registered voters.
The factors in these models might include some already discussed as well as other voter characteristics, like attitudinal questions about interest, engagement, and intention along with past voting history, as well as socioeconomic, geographic, or other, more elaborate considerations. While these models can be very accurate, they are also likely to be influenced by the assumptions of the model and the modeler. And when it comes to publicly released polls, the construction of these models can be difficult, if not impossible, to explain to a general audience.
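To make the modeling approach concrete, here is a minimal sketch of a logistic-style turnout score combined with a classification threshold. The features, coefficients, and the 0.5 cutoff are all invented for illustration; a real model would estimate its parameters against validated turnout records:

```python
# A minimal sketch of a model-based likely-voter screen: a logistic-style
# turnout probability built from a few features, then a cutoff rule.
# Features, coefficients, and the threshold are hypothetical.
import math

def turnout_probability(intention_score: float, past_votes: int, age: int) -> float:
    """Hypothetical turnout model: intention on a 0-4 scale, count of recent
    elections voted in, and age. Coefficients are invented for illustration;
    a real model would fit them to validated voter-file data."""
    z = -3.0 + 0.8 * intention_score + 1.1 * past_votes + 0.02 * age
    return 1 / (1 + math.exp(-z))  # logistic function maps score to (0, 1)

def is_likely_voter(prob: float, threshold: float = 0.5) -> bool:
    """Classify as a likely voter when the modeled probability clears a cutoff."""
    return prob >= threshold
```

Both the coefficients and the cutoff are places where the modeler’s assumptions enter, which is the point made above: the model can be very accurate, but its output reflects judgments that are hard to convey in a topline release.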
What are consumers of public polling to do then with this information about likely voters?
First, one should probably disregard (or at least be skeptical of) any poll that makes limited or no effort to explain how it arrived at its likely voter sample, including how it defined “likely voters,” when reporting the results of a likely voter poll. This is basic transparency (and good practice), without which one can’t know who was surveyed.
Once basic disclosure and reasonable skepticism have been satisfied, take the results reported by polls and pollsters in the last few months of the campaign for what they are: estimates of what people that pollsters think are likely to vote say they intend to do. These are usually pretty good estimates of the outcome when considered together, but they’re not an estimate of the electorate. By the time we can accurately estimate the attitudes of the 2022 electorate, we’ll already be talking about 2024.
We find it useful to raise this issue on a biennial, or near biennial, basis with each passing election cycle. To see how our thinking has evolved, here are past attempts at discussing the issue of likely voter surveys (here, here, here, here, for some examples).