It’s the sampling, stupid! (Part 1)

The folks over at Pew have put up a blog post of sorts that starts what will no doubt be a long, torturous, and, if recent history is any guide, ultimately forgettable series of investigations aimed at trying to determine why, once again, the polls were wrong. The Pew post lays out three potential sources of the error:

  1. The nonresponse bias inherent in (very) low response rates
  2. The “silent Trumper” phenomenon, formerly known as “the Bradley effect”
  3. Flawed likely voter models

The second of these, the “silent Trumper,” has been widely debunked both in general and in the context of this particular election. So let’s put that aside. The sampling process, including likely voter selection, is the heart of the problem. And there we need to look separately at telephone polls relying on probability samples versus online polls that rely primarily on nonprobability samples, if we can even call them that. Let’s start with probability samples.

Probability sampling, with its grounding in basic probability theory, is very robust, but only as long as it meets three assumptions:

  1. The existence of a sample frame containing all or nearly all of the target population.
  2. Random selection of a sample from that frame.
  3. A high response rate when interviewing that sample.

Violate one of these assumptions and you run the risk of introducing biases that are devilishly difficult to measure and correct.

In the US it is relatively easy to assemble a sample frame that covers the entire US population by using a dual frame that combines both landlines and cell phones. It obviously is important to avoid duplicate respondents when the same household is sampled from both frames. This is not rocket science and is done pretty routinely.
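As a concrete (and deliberately simplified) illustration of that deduplication step, here is a minimal sketch that assumes each sampled number can be matched to a household identifier; the field names and records are hypothetical:

```python
# Hypothetical dual-frame deduplication: combine a landline sample and a cell
# sample, keeping only one record per household.

landline_sample = [
    {"household_id": "H001", "phone": "555-0101", "frame": "landline"},
    {"household_id": "H002", "phone": "555-0102", "frame": "landline"},
]
cell_sample = [
    {"household_id": "H002", "phone": "555-0202", "frame": "cell"},  # same household, second frame
    {"household_id": "H003", "phone": "555-0203", "frame": "cell"},
]

seen = set()
combined = []
for record in landline_sample + cell_sample:
    if record["household_id"] not in seen:   # drop the second hit on the same household
        seen.add(record["household_id"])
        combined.append(record)

print(len(combined))  # 3 unique households, not 4 sampled records
```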

The bigger challenge is the response rate. The long-term decline in respondent cooperation is well known and it afflicts a broad spectrum of research methods that require members of the general public to participate in research. Electoral pollsters and market researchers typically make it worse because their constituencies drive them to field periods of just a few days, making it impossible to thoroughly work the sample to maximize response. And so it’s not unusual to end up with a response rate in the single digits, a perfectly good sample spoiled.

We all understand that for the data we’ve generated to be reliable we need to make a fourth assumption: that those who did not respond are not different in important ways from those who did. Or put another way, that if we had achieved a 100% response rate our survey results would be no different than they are with the 10% response rate we have in hand.

None of us is so naïve as to believe that, and so we make some effort to correct the likely bias through weighting. In this we have come to ascribe magical properties to demographics, seemingly believing that if we can weight the data so that it approximates the known distribution (from the Census, for example) of the target population across age, gender, race, and Census region, then we have eliminated bias. As if every person who ticks the same box in each of those categories is just like every other person who ticks those same boxes. A serious investigation to identify all of the potential bias caused by nonresponse is seldom undertaken.
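To make the point concrete, here is a minimal, hand-rolled sketch of the kind of raking (iterative proportional fitting) this demographic weighting typically amounts to. The respondents and target margins below are invented for illustration; real weighting schemes use more variables and Census-based targets:

```python
# A toy raking (iterative proportional fitting) example: adjust respondent
# weights until the weighted sample matches assumed population margins for
# age and gender. All respondents and targets here are made up.

respondents = [
    {"age": "18-34", "gender": "F"},
    {"age": "18-34", "gender": "M"},
    {"age": "35-64", "gender": "F"},
    {"age": "35-64", "gender": "M"},
    {"age": "65+",   "gender": "F"},
]
weights = [1.0] * len(respondents)

targets = {
    "age":    {"18-34": 0.30, "35-64": 0.50, "65+": 0.20},
    "gender": {"F": 0.51, "M": 0.49},
}

for _ in range(50):                      # iterate until the margins settle
    for var, target in targets.items():
        total = sum(weights)
        for category, share in target.items():
            members = [i for i, r in enumerate(respondents) if r[var] == category]
            current = sum(weights[i] for i in members) / total
            if current > 0:
                factor = share / current
                for i in members:
                    weights[i] *= factor

print([round(w, 2) for w in weights])
# The weighted sample now matches the target margins, but nothing guarantees
# that the upweighted respondents resemble the nonrespondents they stand in for.
```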

As the final step, we use the fig leaf of margin of error to describe the accuracy of our estimates, a calculation that is heavily influenced by sample size and totally ignores nonresponse. Worse yet, the media may genuflect to the MOE and then go on to breathlessly report one- and two-percentage-point differences as proof of who is winning the horsey race.
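For reference, here is the textbook calculation behind that margin of error. Note what it uses (the estimate and the number of completed interviews) and what it never sees (everyone who was sampled but never responded). The numbers are illustrative:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Classic 95% margin of error for a proportion: z * sqrt(p*(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 completed interviews reporting 50% support:
print(round(100 * margin_of_error(0.50, 1000), 1))   # ~3.1 points

# The roughly ten thousand numbers dialed to get those 1,000 completes at a
# single-digit response rate never enter the formula, so nonresponse bias is
# simply invisible to the MOE.
```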

Defenders of contemporary telephone surveys that rely on probability samples argue that, despite the obvious challenges, these surveys continue to be pretty accurate across a wide variety of benchmarks drawn from non-survey sources. There are two problems with that. First, it gives up the high ground of theory in favor of an empirical justification. It works until it doesn’t. Second, consumers of these data expect a precision that is increasingly elusive. More on that later.

Online surveys that rely on panels, river, exchanges, etc. present even greater challenges. More on that in my next post.

Reg Baker is Executive Director of MRII.


5 thoughts on “It’s the sampling, stupid! (Part 1)”

  1. Interesting thoughts, and much more to come. Here are a few cents’ worth from me.

    1) Fit for purpose. Most polls try to predict the share of the vote that a party or candidate gets, but many people are more interested in who will win. Clinton won the popular vote, but Trump had a landslide win – so even if pollsters could predict the share of the vote correctly, they would not be fit for purpose if they could not predict (or assign probabilities to) the result.

    2) Another source of potential error that is often mentioned when the polls fail is the late swing. Like the Bradley effect (the shy Tory effect in the UK), this is usually bogus. I guess there is a possibility that the FBI playing at politics may have had an impact: the first announcement that HRC might be a crook, and then the late announcement that she wasn’t, seemed to impact polling results. But did it impact actual intentions? It might be possible to look at early voting in those states that had early voting and compare with states that did not to gain some insight.

    3) When the polls go wrong we often see them sheltering behind their so-called margin of error. If one poll publishes a forecast based on a genuinely random probability sample of 1,000 people, and if there was 100% response, and if there was no weighting of the data, then an estimate of 50% for one party/candidate implies there is a 95% chance the real number is within +/-3 points of that estimate. However, if a second poll is run, and it is fair and independent, there is only a 50% chance its error will be in the same direction (with three polls there is only a 25% chance the random errors will all be in the same direction). And if two such polls both predict 50% and the actual result is 47.1%, the pooled estimate (roughly 2,000 interviews, with a margin of error of about +/-2.2 points) misses by more than its margin of error (see the sketch at the end of this comment). If we look at the poll results, most of them had a margin for Clinton of over 1%. That is not a random sampling margin of error problem, that is a systematic error.

    4) Weighting only works when we know what the weighting interactions are and we do the weighting fairly, and even then it reduces the effective sample size (also illustrated in the sketch at the end of this comment). One of the big problems in the UK’s election polling debacle related to older people. Older people matter, mostly because more of them vote. However, it is really hard to get the over-80s on the phone or online, so many researchers upweight older people, which can be as blunt as upweighting all the over-65s. Upweighting a 68 year old because you are short of 80 year olds is like upweighting an 18 year old because you are short of 40 year olds. We would never assume an 18 year old could be swapped for a 40 year old, yet many companies appear to be treating people over 65 (and in some cases over 60) as interchangeable. There is also the potential for systematic biases – for example, is an 18 year old who answers the phone and completes the survey similar to the 18 year old who does not? Are 80 year olds who answer their mobile phone or do an online survey typical of 80 year olds who do not?

    My feeling is that there should be a legal requirement for pollsters to add a comment along the lines of: “Please note, the way polling is done does not conform to the requirements of sampling theory, so whilst we make every effort to be as accurate and valid as possible, you should not base your views or actions on these findings. Note that over the last 20 studies we have conducted, the mean error has been X, the median error was Y, and the largest error was Z.” I like 20 studies as it has a sentimental link back to the concept of 95% 🙂
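    A small sketch in Python tying points 3) and 4) together: pooling independent polls shrinks the nominal margin of error, while weighting shrinks the effective sample size (using Kish’s approximation). All numbers are illustrative:

    ```python
    import math

    def moe(p: float, n: float, z: float = 1.96) -> float:
        """95% margin of error for a proportion."""
        return z * math.sqrt(p * (1 - p) / n)

    # Point 3: two independent 1,000-person polls pooled together behave like
    # one poll of ~2,000, so the nominal margin of error tightens.
    print(round(100 * moe(0.50, 1000), 1))   # ~3.1 points for a single poll
    print(round(100 * moe(0.50, 2000), 1))   # ~2.2 points for the pooled estimate

    # Point 4: Kish's effective sample size, n_eff = (sum w)^2 / sum(w^2).
    # Upweighting a scarce group (say, the over-80s) inflates the variance of
    # the weights and shrinks n_eff well below the nominal n.
    weights = [1.0] * 900 + [4.0] * 100       # 100 respondents upweighted 4x
    n_eff = sum(weights) ** 2 / sum(w * w for w in weights)
    print(round(n_eff))                        # ~676 effective interviews from 1,000
    print(round(100 * moe(0.50, n_eff), 1))    # the margin of error widens to ~3.8
    ```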

  2. While I’d agree that the Bradley effect may no longer apply in our hopefully post-racial society, the Politico article completely misses the mark on whether social acceptability bias may have been at play in the results that we as an industry collected.

    Borrowing from mass communication theory, the spiral of silence was most certainly at play during this election (see https://en.wikipedia.org/wiki/Spiral_of_silence and https://masscommtheory.com/theory-overviews/spiral-of-silence/).

    From the Wiki link: “The spiral of silence theory suggests that ‘people who have believed that they hold a minority viewpoint on a public issue will remain in the background where their communication will be restrained; those who believe that they hold a majority viewpoint will be more encouraged to speak.’”

    So Politico’s claim that an anonymous, online survey was the perfect opportunity to express one’s support for Trump is wildly inaccurate. The “isolation of the perceived minority view” not only reduced expression of support for Trump, but reduced participation by Trump supporters within surveys of all kinds: telephone, mobile and web.

    After watching the press cast aspersions on Trump and Trump supporters since the primaries began, Trump supporters wouldn’t take a survey, wouldn’t answer the telephone and they certainly wouldn’t express their opinions at work or at home or even outside of their own heads.

    Random probability sampling won’t help us out of the hole that has been dug by the most partisan media environment in history, a body that openly states its perceived duty to express its own point of view over accurately reflecting the points of view of all Americans. (see http://www.nytimes.com/2016/08/08/business/balance-fairness-and-a-proudly-provocative-presidential-candidate.html and, ironically, http://www.nytco.com/who-we-are/culture/standards-and-ethics/ and http://www.nytimes.com/2016/07/24/public-editor/liz-spayd-the-new-york-times-public-editor.html)

    Try not to feel the sting as the press attacks our industry for ‘not getting it right’.

  3. Good points, Dan, although I think perhaps there are two different things at play here. Much of the “silent Trump voter” argument was about classic social desirability bias, which I think the Politico study debunked. What you have described is a different but very real phenomenon with a potentially more dramatic impact, that is, the tendency for people not to participate when they feel alienated or know that their candidate is down in the polls. Nate Silver commented about this at one point but I have not been able to find it again. This was likely a significant issue in this last election, especially given the intense hostility among Trumpers toward the MSM, which feeds so heavily on poll results.
