Thursday, April 17, 2008


It seems to be my day for short posts. Matt Yglesias posts about polls, pointing out (based on a Zogby poll that has Obama down only one percentage point in Pennsylvania) that there are perverse incentives for the dramatic, non-intuitive result:
This illustrates a real problem with the public polling game, namely the lack of incentives to get things right. Presumably there's some level of consistent wrongness at which people stop giving you the links, readership, buzz, and whatever else it is you're looking for but it's really not clear where that is. And, indeed, for your average media poll where the objective is to produce an "interesting" article accompanying the poll, you're probably better off being wrong.
That's fine, as far as it goes, but it misses something else important. We all pretty much blow by the sampling error printed with these polls, the plus or minus 4% or whatever. What we fail to appreciate is that the listed error is tied to a confidence level, a number that's never printed. 95% is pretty standard in science, so what a plus-or-minus-4% margin means is that about 95% of polls conducted this way will land within 4 percentage points of reality.
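To see where that "plus or minus 4%" comes from, here's a minimal sketch using the standard formula for the margin of error of a proportion (1.96 standard errors for a 95% confidence level); the function name and the worst-case assumption of 50% support are mine, not from the post:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a poll of n respondents,
    assuming support near p (p=0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of roughly 600 respondents gives the familiar "plus or minus 4%".
print(f"n=600: +/- {margin_of_error(600):.1%}")
```

Note that the margin shrinks only with the square root of the sample size: quadrupling the number of respondents merely halves the error, which is why media polls rarely do much better than a few points.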

This is really important, and I don't think most people have any idea this is how it works. If a poll comes out saying Hillary leads Barack 55%-45% in Pennsylvania, with an error of plus or minus 4%, the tendency is to assume that Hillary's support is anywhere from 51% to 59%, so she'll win for sure.

No, what this says is that there is a 95% probability that Hillary's true support lies between 51% and 59%, and a 5% chance that the true number falls outside that range. If you take 20 polls, you would expect one of them to be blatantly wrong. It is possible, though very unlikely, that you happen to talk to 500 people, 400 of whom are supporters of Hillary. You publish your poll: "Hill's support up to 80%/Clinton's running away with PA." The pundits seize on that number, make their ponderous statements about momentum and electability, possibly even changing the results.
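The one-poll-in-twenty claim is easy to check by simulation. The sketch below (my construction, not from the post) assumes a hypothetical race with true support at exactly 50% and runs 1,000 polls of 500 respondents each, counting how many produce a 95% confidence interval that misses the truth:

```python
import math
import random

random.seed(42)

TRUE_SUPPORT = 0.50  # assumed true level of support (hypothetical)
N = 500              # respondents per poll
POLLS = 1000         # number of simulated polls

misses = 0
for _ in range(POLLS):
    # Each respondent independently supports the candidate with prob. 0.50.
    hits = sum(random.random() < TRUE_SUPPORT for _ in range(N))
    p_hat = hits / N
    # 95% margin of error for this poll's observed proportion.
    moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N)
    if abs(p_hat - TRUE_SUPPORT) > moe:
        misses += 1

print(f"{misses} of {POLLS} polls missed the true value")
```

Run it and roughly 5% of the simulated polls, about one in twenty, report an interval that excludes the true number, even though every poll was conducted perfectly. That is pure sampling error, before any of the design flaws discussed next.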

I'm not going to dwell on other sources of error, like poorly-designed polls - read the story of Literary Digest in 1936 for all you need to know about that.

Keep in mind that the foregoing applies to scientifically designed polls. Internet click polls and local-news phone-in (and IM-in) polls are far worse: because the respondents select themselves, there's nothing real you can determine from them, and they should be regarded as noise.
