by Carl V Phillips
Sorting out truths from lies requires an understanding of the underlying science. Since I am, arguably more than anything else in my professional life, a science teacher, I thought it might be worth posting a few focused lessons on scientific points that are key to understanding our topic area. Someone who reads all of my posts would probably pick up these same points en passant, but I thought I would see whether there is value in periodic posts, each devoted to a single lesson that is not buried in the specifics of a topical discussion.
Most consumers of social science reports, a category that includes epidemiology, are vaguely aware that what they see is based on some sample of the total population. They are seldom aware of quite how much sampling properties affect the results. This is understandable, since those who conduct such science are often equally unaware of it, and even when they understand it in theory, they ignore it when presenting their results.
The error statistics that you commonly see (confidence intervals and such) are based on the potential for purely random sampling error. That is, they offer a rough measure of about how likely it is that luck of the draw produced a misleading result, such as if you flipped a coin 100 times and got a non-representative result of 60 heads. (Note that the numbers you see are really just that — a rough measure. Contrary to common belief, the exact borders of the confidence interval mean nothing of importance, but that is a lesson for another day.) Those statistics are only valid if we assume that the only error is random. Other types of error (non-random sampling bias, measurement error, etc.) ought to be represented in summary error statistics too. My fame in epidemiology is largely due to my work that argues this point and inspired some efforts to create partial solutions, but such efforts have been a failure to date, and so the reader is left having to recognize the unreported error.
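To make the coin-flip illustration concrete, here is a toy simulation (a sketch of my own, not from any study) of how often pure luck of the draw produces a result as lopsided as 60 heads in 100 flips of a fair coin:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def flip_heads(n_flips, p=0.5):
    """Count heads in n_flips of a coin with true heads probability p."""
    return sum(random.random() < p for _ in range(n_flips))

# Simulate many "studies" of 100 fair-coin flips and see how often
# random sampling error alone yields 60 or more heads.
n_studies = 10_000
extreme = sum(flip_heads(100) >= 60 for _ in range(n_studies))

# Binomial theory puts the true probability near 2.8%; the simulated
# share will land close to that.
print(f"Share of studies with >= 60 heads: {extreme / n_studies:.3f}")
```

This is the only kind of error that conventional confidence intervals speak to: if you ran the simulation, you would see how often randomness alone misleads, with no selection bias or measurement error anywhere in the picture.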
In some cases this error represents a relatively minor adjustment in the results. If a study attempts to get a representative sample but seems like it might have failed to do so (e.g., because people with a particular characteristic seem slightly more likely to refuse to participate in a study), the estimated effect will be biased away from the true value. There are ways to try to adjust for this, at least roughly.
But in other cases the sample is so clearly and completely unrepresentative that it is just nonsense to even calculate some of the statistics you see. A good example of that is a recent paper by anti-THR activists that has provoked several comment threads. The authors mined comments in sections of e-cigarette message boards (possibly in violation of terms-of-service) that reported adverse effects that the posters thought might have been caused by their e-cigarette use. This is not an entirely illegitimate or even useless exercise. There is substantial value in compiling adverse event reports to give some idea of what possible outcomes to look for in further research. Indeed, a robust enough collection of the right kind of adverse event reports, if they find a consistent problem, is good evidence the problem is real.
What is not legitimate, however, is to calculate statistics based on that extremely unrepresentative sample. Instead of sticking to what was useful, the authors engaged in various bits of fancy intellectual masturbation with their paltry data, and reported such statistics as what percentage of the reported results were negative rather than positive. Um, yeah. If you search the forum pages that discuss possible adverse events, you are going to find mostly adverse events. I doubt I have to explain why this sampling method cannot produce any useful estimates of how often particular events occur. If you sampled people sitting in the waiting rooms of medical clinics, you would also find them reporting mostly negative health conditions. It would not even make sense to try to estimate the distribution of which negative health states are most common from that sample, since people with some conditions are far more likely to be there than others.
The same principle applies to the many surveys of e-cigarette users. Existing surveys, and all those that are likely to be reported in the near future, consist of convenience samples of users who are highly motivated to respond, and who are either customers of online merchants or are politically/socially active in the vaping community and respond to postings. Surveys that are representative of the population, like those conducted by the US government, have not yet reported data on enough people who tried or used e-cigarettes to draw conclusions about the overall population. Thus, we only know about the practices, habits, history, and success of people who are dedicated to vaping.
Despite this, the results are frequently reported as if they can provide information like how often use of e-cigarettes leads to successful smoking cessation (e.g., the recent report touted by the UK NHS). They cannot do this for obvious reasons: Anyone who found e-cigarettes unappealing, did not continue to buy them, and does not frequent vaping social media is not going to be in the sample. Even among current consumers of e-cigarettes, those motivated to respond to the survey will be biased toward those who are happiest about their experience and most excited about having found a good way to quit smoking. There will be very few responses from, for example, casual vapers who were never regular smokers.
It is even worse than this. We can generally make a guess about who responds to the surveys, as I just did, but we cannot even say with certainty that they are representative of the happiest and most dedicated vapers. In the jargon, we simply do not know the sampling properties. When using a convenience sample of highly motivated volunteers, you really have no idea who your study population represents. Thus it is not really even appropriate to make claims like “among dedicated vapers, X% have completely quit smoking.”
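A toy simulation shows how badly this kind of self-selection can distort a quit-rate estimate. All the numbers here are invented for illustration; they are not data from any survey. The sketch assumes quitters are far more eager to answer a convenience survey than people for whom vaping did not work out:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of people who ever tried e-cigarettes.
# Suppose (invented number) 30% went on to quit smoking entirely.
population = 100_000
true_quit_rate = 0.30

# Assumed response rates: happy quitters respond 40% of the time,
# everyone else only 2% of the time. These are made-up values chosen
# only to show the mechanism.
responders_quit = responders_total = 0
for _ in range(population):
    quit = random.random() < true_quit_rate
    respond_prob = 0.40 if quit else 0.02
    if random.random() < respond_prob:
        responders_total += 1
        responders_quit += quit

# The surveyed rate lands near 90%, triple the true 30%, because the
# sample is dominated by the people most motivated to respond.
print(f"True quit rate:     {true_quit_rate:.0%}")
print(f"Surveyed quit rate: {responders_quit / responders_total:.0%}")
```

Note that no amount of enlarging such a sample fixes this: the bias comes from who responds, not from how many do, which is why the usual error statistics are silent about it.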
This is not to say that the surveys are uninformative. There is a lot of useful information to be gleaned. Nor is it difficult to see the temptation to report some statistics that may be hopelessly biased by the weird sample. (I did not look back to see whether our report — the first published survey of e-cigarette users — was guilty of that. I am pretty confident I did not allow any glaring problems, but it is easy to not be sufficiently careful about acknowledging the limits of the sampling, so I am sure you could call me on something.)
So, if a study sample is a random draw from some identifiable population, but it is really not quite random in important ways, the results are biased but might still be in the right neighborhood. Even if the sample is not random but is based on an identifiable population, there is some hope. But when it is not even possible to describe who the population is that the non-random sample is drawn from, it is pretty much impossible to make sense of any statistics other than in reference to the specific study respondents, which is not very interesting and is pretty much never the way the results are presented.