Methodology Statement
Sampling and Weighting Methodology for the October 2012 Texas Statewide Study
For the survey, YouGov interviewed 912 respondents between October 15 and 21, 2012, who were then matched down to a sample of 800 to produce the final dataset. The respondents were matched on gender, age, race, education, party identification, ideology, and political interest. YouGov then weighted the matched set of survey respondents to known marginals for the registered voters of Texas from the 2008 Current Population Survey and the 2007 Pew Religious Landscape Survey.
Sampling Frame and Target Sample
YouGov constructed a national sampling frame from the 2007 American Community Survey, including data on age, race, gender, education, marital status, number of children under 18, family income, employment status, citizenship, state, and metropolitan area. The frame was constructed by stratified sampling from the full 2007 ACS sample with selection within strata by weighted sampling with replacement (using the person weights on the public use file). Data on voter registration status and turnout were matched to this frame using the November 2008 Current Population Survey. Data on interest in politics and party identification were then matched to this frame from the 2007 Pew Religious Landscape Survey, using the following variables for the match: age, race, gender, education, marital status, number of children under 18, family income, employment status, citizenship, and state. The target sample of 800 Texas registered voters was selected with stratification by age, race, gender, and education, and with simple random sampling within strata.
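As a rough illustration of "selection within strata by weighted sampling with replacement", the sketch below groups records by stratum and draws from each stratum with probability proportional to a person weight. It is not YouGov's code: the function name, the field names (stratum, pwgt), the per-stratum sample sizes, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_weighted_sample(records, stratum_key, weight_key, n_per_stratum, seed=0):
    """Draw records within each stratum, with replacement, with
    selection probability proportional to the record's weight."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the records by stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)

    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        weights = [m[weight_key] for m in members]
        k = n_per_stratum.get(stratum, 0)
        if k > 0:
            # random.choices samples WITH replacement, proportional to weights.
            sample.extend(rng.choices(members, weights=weights, k=k))
    return sample


# Hypothetical toy records standing in for a public use file.
records = [
    {"id": 1, "stratum": "18-29", "pwgt": 120.0},
    {"id": 2, "stratum": "18-29", "pwgt": 80.0},
    {"id": 3, "stratum": "30-44", "pwgt": 150.0},
    {"id": 4, "stratum": "30-44", "pwgt": 50.0},
    {"id": 5, "stratum": "30-44", "pwgt": 100.0},
]
frame = stratified_weighted_sample(
    records, "stratum", "pwgt", n_per_stratum={"18-29": 2, "30-44": 3}
)
```

Because the draw is with replacement, the same record can appear more than once in the result; a record with a large person weight is simply more likely to be repeated.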
Weighting
Because matching is approximate, rather than exact, and response rates vary by group, the sample of completed interviews normally shows small amounts of imbalance that can be corrected by post-stratification weighting.
Raking, first proposed by Deming and Stephan (1940), adjusts an initial set of weights to match a known set of population marginals, using a method of iterative proportional fitting (see Bishop, Fienberg and Holland, 1975 for details). In this procedure, the weights are adjusted sequentially to match the marginal distribution of each weighting variable. The adjustments are repeated until all marginals are matched. Raking does not require any information about the joint distribution of the variables (though, if such data are available and believed to be important, they can be employed by defining a marginal distribution involving a cross-classification of two variables).
Post-stratification weights are calculated by raking the completed interviews to known marginals for Texas registered voters from the November 2008 Current Population Survey for the following variables: age, race, gender, and education.
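The raking procedure described above can be sketched in a few lines. This is a minimal illustration of iterative proportional fitting, not the production weighting code: the function name, the starting weights of 1.0, the convergence tolerance, and the toy respondents and target shares are assumptions made for the example.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Rake unit starting weights so that the weighted share of each
    category matches the target marginals.

    rows    : list of dicts, one per respondent
    targets : {variable: {category: target share}}; shares sum to 1
    Assumes every target category has at least one respondent.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_gap = 0.0
        # Adjust the weights sequentially, one weighting variable at a time.
        for var, shares in targets.items():
            total = sum(weights)
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Record how far this marginal was from its target before adjusting.
            for cat, share in shares.items():
                max_gap = max(max_gap, abs(current[cat] / total - share))
            # Scale each category so its weighted share equals the target.
            factors = {cat: share * total / current[cat] for cat, share in shares.items()}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
        if max_gap < tol:  # all marginals already matched in this pass
            break
    return weights


# Hypothetical respondents and target marginals (not actual CPS figures).
rows = [
    {"gender": "M", "age": "18-44"},
    {"gender": "M", "age": "45+"},
    {"gender": "F", "age": "18-44"},
    {"gender": "F", "age": "45+"},
    {"gender": "F", "age": "45+"},
    {"gender": "M", "age": "18-44"},
]
targets = {
    "gender": {"M": 0.48, "F": 0.52},
    "age": {"18-44": 0.40, "45+": 0.60},
}
weights = rake(rows, targets)
```

Note that only the one-way marginals are supplied. A joint distribution, if available, could be handled by adding a single variable whose categories are the cross-classified cells.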
Survey Panel Data
The YouGov panel, a proprietary opt-in survey panel, comprises 1.2 million U.S. residents who have agreed to participate in YouGov Web surveys. At any given time, YouGov maintains a minimum of five recruitment campaigns based on salient current events.
Panel members are recruited by a number of methods and on a variety of topics to help ensure diversity in the panel population. Recruiting methods include Web advertising campaigns (public surveys), permission-based email campaigns, partner sponsored solicitations, telephone-to-Web recruitment (RDD based sampling), and mail-to-Web recruitment (Voter Registration Based Sampling).
The primary method of recruitment for the YouGov panel is Web advertising campaigns that appear based on keyword searches. In practice, a search in Google may prompt an active YouGov advertisement soliciting opinion on the search topic. At the conclusion of the short survey, respondents are invited to join the YouGov panel in order to receive and participate in additional surveys. After a double opt-in procedure, in which respondents must confirm their consent by responding to an email, the database checks to ensure that the newly recruited panelist is in fact new and that the address information provided is valid.
Additionally, YouGov augments its panel with difficult-to-recruit respondents by soliciting panelists in telephone and mail surveys. For example, in 2006 and 2010, YouGov completed telephone interviews using RDD sampling and invited respondents to join the online panel. Respondents provided a working email address where they could confirm their consent and request to receive online survey invitations. YouGov also employed registration-based sampling, inviting respondents to complete a pre-election survey online. At the conclusion of that survey, respondents were invited to become YouGov members and receive additional survey invitations at their email address.
The YouGov panel currently has nearly 20,000 panelists who are residents of Texas. These panelists cover a wide range of demographic characteristics.
Sampling and Sample Matching
Sample matching is a methodology for selection of "representative" samples from non-randomly selected pools of respondents. It is ideally suited for Web access panels, but could also be used for other types of surveys, such as phone surveys. Sample matching starts with an enumeration of the target population. For general population studies, the target population is all adults, and can be enumerated through the use of the decennial Census or a high quality survey, such as the American Community Survey. In other contexts, this enumeration is known as the sampling frame, though, unlike in conventional sampling, the final sample is not drawn from the frame. Traditional sampling selects individuals from the sampling frame at random for participation in the study. This may not be feasible or economical because contact information, especially email addresses, is not available for all individuals in the frame, and refusals to participate increase the costs of sampling in this way.
Sample selection using the matching methodology is a two-stage process. First, a random sample is drawn from the target population. We call this sample the target sample. Details on how the target sample is drawn are provided in the Sampling Frame and Target Sample section, but the essential idea is that this sample is a true probability sample and thus representative of the frame from which it was drawn.
Second, for each member of the target sample, we select one or more matching members from our pool of opt-in respondents. This is called the matched sample. Matching is accomplished using a large set of variables that are available in consumer and voter databases for both the target population and the opt-in panel.
The purpose of matching is to find an available respondent who is as similar as possible to the selected member of the target sample. The result is a sample of respondents who have the same measured characteristics as the target sample. Under certain conditions, described below, the matched sample will have similar properties to a true random sample. That is, the matched sample mimics the characteristics of the target sample.
When choosing the matched sample, it is necessary to find the closest matching respondent in the panel of opt-ins to each member of the target sample. YouGov employs the proximity matching method to find the closest matching respondent. For each variable used for matching, we define a distance function, d(x,y), which describes how "close" the values x and y are on a particular attribute. The overall distance between a member of the target sample and a member of the panel is a weighted sum of the individual distance functions on each attribute. The weights can be adjusted for each study based upon which variables are thought to be important for that study, though, for the most part, we have not found the matching procedure to be sensitive to small adjustments of the weights. A large weight, on the other hand, forces the algorithm toward an exact match on that dimension.
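The distance calculation described above can be illustrated with a small sketch. It is only an assumption-laden example, not YouGov's matching algorithm: the per-attribute distance functions, the age scaling, the attribute weights, the toy data, and the greedy one-to-one selection without replacement are all hypothetical choices made to keep the example short.

```python
def age_distance(x, y):
    # Hypothetical d(x, y) for a numeric attribute: scaled absolute difference.
    return abs(x - y) / 100.0


def category_distance(x, y):
    # Hypothetical d(x, y) for a categorical attribute: 0 if equal, else 1.
    return 0.0 if x == y else 1.0


def overall_distance(target, panelist, distance_fns, weights):
    """Weighted sum of the per-attribute distance functions."""
    return sum(
        weights[attr] * fn(target[attr], panelist[attr])
        for attr, fn in distance_fns.items()
    )


def match_sample(target_sample, panel, distance_fns, weights):
    """For each target member, pick the closest still-available panelist.
    Greedy and without replacement (an assumption of this sketch);
    requires at least as many panelists as target members."""
    available = list(panel)
    matched = []
    for t in target_sample:
        best = min(
            available,
            key=lambda p: overall_distance(t, p, distance_fns, weights),
        )
        available.remove(best)
        matched.append(best)
    return matched


distance_fns = {"age": age_distance, "gender": category_distance}
target_sample = [{"age": 30, "gender": "F"}, {"age": 60, "gender": "M"}]
panel = [
    {"id": 1, "age": 58, "gender": "M"},
    {"id": 2, "age": 33, "gender": "F"},
    {"id": 3, "age": 31, "gender": "M"},
]

matched = match_sample(target_sample, panel, distance_fns, {"age": 1.0, "gender": 1.0})
matched_ids = [p["id"] for p in matched]

# Setting the gender weight to zero lets age alone drive the match.
age_only = match_sample(target_sample, panel, distance_fns, {"age": 1.0, "gender": 0.0})
age_only_ids = [p["id"] for p in age_only]
```

With equal weights, the first target member (a 30-year-old woman) is matched to panelist 2; with the gender weight set to zero she is matched to panelist 3, whose age is closer. This shows how the weights control which attributes dominate the match.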