“There are three kinds of lies: lies, damned lies, and statistics.” The quote, popularized by Mark Twain, who attributed it (erroneously?) to the 19th-century British Prime Minister Benjamin Disraeli, speaks volumes. Facts, as often reported, may be based on statistics. Statistics are based on data – data which is inherently based on a collection of opinions. The danger of statistics is that what we often see reported as fact may not be as reliable as we are led to believe.
Data analysis can be a fuzzy science. But, when used properly, statistics can be invaluable for such purposes as predicting elections, quantifying public opinion, estimating trends, and forecasting the effects of policy change. So why the bad rap?
Political polls are the grease behind campaigns, election forecasts, favorability ratings, and a seemingly infinite number of media reports. Professional pollsters provide the raw data that campaign handlers knead, massage, twist, and spin to generate “fact-based” messages about their candidates or causes.
These pollsters are employed by election campaigns, news media, think tanks, political parties, and special interest groups. A good pollster is revered and sought after by competing elements in the political spectrum.
Not surprisingly, a vast sum of money is spent on polling. Total polling expenditures in the 2012 presidential election were estimated at close to $50 million. While this figure represents only 2-3 percent of total campaign spending, it is still a staggering sum.
When survey results can be used favorably or to prove a point, they become the word of authority; when poll numbers are down, politicians dismiss the results. Some polls, at best, offer nothing more than entertainment value. Others, like those conducted by Gallup, have been respected for over 75 years. How are these polls conducted? How are the statistics generated? And, how can the average voter look behind the numbers and determine the accuracy of polling results?
In the interest of full disclosure, when not writing about politics, I am a quantitative researcher and owner of a survey management firm. I have a reasonably solid understanding of the concepts behind polling and statistics. My firm, however, does not work for any political campaigns.
Survey researchers use specialized jargon to convey the reliability of survey results. Terms such as “statistically significant,” “margin of error,” and “confidence level” help describe and make inferences when analyzing data. These terms are used to extend limited poll results to the larger population. The key question of whether a difference seen in a poll is “statistically significant” can be restated as “is the difference enough to allow for normal sampling error?” So what is this concept of sampling error, and why is there error at all?
We begin with a population, which, in many election polls, may be defined as likely voters. An actual election is not a sample at all; it is a census of those who voted. But, for polling purposes, it is neither practical nor cost-effective to speak with everyone in the target population; instead, a sample is drawn.
The underlying principle which assures polling accuracy is that those interviewed are representative of the target population and that each individual in the population has an equal chance of being selected for interview. This is the science of sampling technique, and when using a scientific sample, a relatively small number of individuals can be used to project to the target population.
A sample of 1,000 scientifically drawn respondents is often used for national projections. Most respected polls are conducted by telephone, taking care to include both land-lines and cellular phones. A combination of random digit dialing, computerized predictive techniques, and subsequent weighting of the sample is employed to yield results which accurately reflect the target population.
After understanding the meaning of a sample, we can return to the question of “error.” A sample of observations, such as responses to a poll, will generally not mirror exactly the population from which it is drawn. As long as variation exists among individuals, sample results will depend on the particular mix of individuals chosen.
“Sampling error” (or conversely, “sample precision”) refers to the amount of variation likely to exist between a sample result and the actual population. A stated “confidence level” (usually 95 to 99 percent) qualifies a statistical statement by expressing how often, over many repeated samples, the sampling procedure would produce a result that falls within the stated margin of error of the true population value.
To say that an observed result is significant at the 95 percent confidence level means that a difference this large would have less than a 5 percent chance of arising from a quirk of the sampling alone if no real difference existed. Put another way, if we repeated the same poll 100 times, we would expect about 95 of the samples drawn to yield results within the margin of error of the true population value.
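The repeated-poll idea can be checked with a small simulation. The sketch below is illustrative only: the true level of support (52 percent), the sample size of 1,000, the number of simulated polls, and the function name `simulate_coverage` are all assumptions chosen for the example, and it models an idealized simple random sample rather than any real polling methodology.

```python
import math
import random


def simulate_coverage(true_share=0.52, sample_size=1000, polls=2000,
                      z=1.96, seed=42):
    """Return the fraction of simulated polls whose result lands within
    the 95 percent margin of error of the true population share."""
    rng = random.Random(seed)
    # Margin of error for a proportion, using z = 1.96 for 95 percent confidence
    margin = z * math.sqrt(true_share * (1 - true_share) / sample_size)
    hits = 0
    for _ in range(polls):
        # Each respondent independently supports the candidate
        # with probability true_share (idealized random sampling)
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        estimate = supporters / sample_size
        if abs(estimate - true_share) <= margin:
            hits += 1
    return hits / polls


coverage = simulate_coverage()
print(f"Share of polls within the margin of error: {coverage:.1%}")
```

Under these assumptions, the printed share should come out close to 95 percent. The remaining polls are not mistakes; they are the unusual samples that random selection occasionally produces.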
“Sampling error” and “confidence levels” work hand-in-hand. A smaller difference may be significant only at a lower confidence level. For example, we might be able to state that we are 95 percent confident that a sample result falls within a certain range of the true population level. However, to be 99 percent confident, we must allow for some broader range around the population level.
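The trade-off between confidence and range can be made concrete with the standard normal-approximation formula for a proportion. This is a minimal sketch assuming a simple random sample of 1,000 respondents and the most conservative case of a 50/50 split; the function name `margin_of_error` is my own, and real polls that weight their samples will typically report somewhat larger margins.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.95, share=0.5):
    """Approximate margin of error for a proportion from a simple random sample."""
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(share * (1 - share) / sample_size)


for level in (0.90, 0.95, 0.99):
    moe = margin_of_error(1000, level)
    print(f"{level:.0%} confidence: +/- {moe * 100:.1f} points")
# 90% confidence: +/- 2.6 points
# 95% confidence: +/- 3.1 points
# 99% confidence: +/- 4.1 points
```

For the same 1,000 interviews, demanding more confidence widens the range: roughly 3 points either way at 95 percent, but about 4 points at 99 percent.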
The key takeaway is that, even under the best of circumstances, there is an acceptable “margin of error” implicit in the results of any poll. The term “error” does not mean “mistake.” A series of coin flips which turns up ten consecutive heads is not “wrong”; it is just unusual (assuming a fair coin!). Polling is not perfect. There have been notable mistakes in predicting elections. Some may have erred due to poor sample design; others may simply have fallen outside the margin of error, as a small share of even well-designed polls inevitably will.
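Just how unusual is the coin-flip example? A quick calculation, assuming a fair coin and independent flips (the helper name `prob_all_heads` is mine), puts a number on it:

```python
from fractions import Fraction


def prob_all_heads(flips, p_heads=Fraction(1, 2)):
    """Probability that every one of `flips` independent tosses comes up heads."""
    return p_heads ** flips


p = prob_all_heads(10)
print(p)         # 1/1024
print(float(p))  # 0.0009765625
```

Ten straight heads should turn up about once in every thousand attempts: rare, but entirely possible, which is exactly the sense in which a sound poll can occasionally land outside its margin of error.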
Another key question is whether other biases may exist in how the data were collected. The wording of the questions, the order of the questions, and leading introductions about the sponsor of the research can all create bias. Non-response error can also be a factor, along with the timing of a poll in relation to current events. An excellent guide to some of the pitfalls of unscientific polls is “20 Questions A Journalist Should Ask About Poll Results” by the National Council on Public Polls.
So what happens from here? While reputable polling organizations provide accurate and unbiased results, how the information is used by other parties is not always so innocent. This is where statistics lie, or at least may offer only a partial truth. A campaign may see data showing that the overall approval rating of its candidate is low, and rather than reporting this fact, it may selectively report that among a certain subgroup, perhaps in urban populations, the approval rating was much higher. Is this accurate? Yes. However, the statistics are clearly being selectively used to support something favorable to the candidate. The art of the “spin” is the domain of political consultants and strategists who take fact-based data and use it to benefit their candidate or cause. Always consider not only the source of polling data, but who is reporting it, and whether it is being selectively used to distort the complete story behind the numbers.
As a general rule, accept what comes from reliable, unbiased sources. Statistics being reported by a candidate, a campaign, or other partisan stakeholders should not be accepted at face value without further scrutiny. Information is easy to come by, and readily verified, but it is critical to differentiate between well-substantiated facts and partisan half-truths. Statistics don’t lie; people do.
A slightly modified version of this article was originally published by IVN.us, a non-profit news platform for independent journalists.