Following my previous post and my comments regarding a current ill-advised project proposal, I have been asked to further explain why randomized trials are not a useful method for studying THR. I did a far from complete job explaining that point in the previous post because the point was limited to a few paragraphs at the end of a discussion of the public health science mindset. So let me try to remedy that today.
1. Any research method intended to assess causation starts with a contrast in exposures (aka independent variables; treatments) among subjects. Sometimes this is experimental: the researcher controls the exposure and creates the contrast. Sometimes it is observational: some other force creates the contrast and the researcher takes advantage of that. We then look to see if the outcome(s) of interest is different, on average, when the exposure is different, and then use that to infer whether the exposure is causing an outcome and by how much. To do this, we obviously want the contrast in exposures in the research to match the exposure(s) whose effect(s) we are trying to assess. We also want the groups to represent the population we are interested in drawing conclusions about.
2. The next concern is the possibility that the outcome varies between the exposure groups for reasons other than the exposure itself, which is called confounding. A typical example is that people who like nicotine are more likely to smoke and are also more likely to use another tobacco product. So an observation that more snusers smoke than non-snusers does not tell us that one product is causing use of the other; we know that a common factor is causing both to occur in the same people. Usually separated out from this (though it could be called a form of confounding) is selection bias, where our identification of subjects for inclusion in the dataset does not collect quite the same people from both groups. For example, if people who smoked and got a particular disease are easily found in hospitals but people who smoked and did not get the disease are less likely than never-smokers to agree to be part of the comparison group, then the apparent effect of smoking will be increased.
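The confounding story in that snus example can be made concrete with a toy simulation (everything here is invented for illustration: the probabilities, the population size, and the model in which liking nicotine drives both behaviors while neither product causes use of the other):

```python
import random

random.seed(0)

n = 100_000
snus_smokers = snus_total = nosnus_smokers = nosnus_total = 0

for _ in range(n):
    likes_nicotine = random.random() < 0.3  # the common cause
    # Liking nicotine raises the chance of BOTH behaviors; by
    # construction, neither behavior influences the other.
    uses_snus = random.random() < (0.4 if likes_nicotine else 0.05)
    smokes = random.random() < (0.5 if likes_nicotine else 0.1)
    if uses_snus:
        snus_total += 1
        snus_smokers += smokes
    else:
        nosnus_total += 1
        nosnus_smokers += smokes

print(f"smoking rate among snus users:     {snus_smokers / snus_total:.2f}")
print(f"smoking rate among non-snus users: {nosnus_smokers / nosnus_total:.2f}")
# The two rates differ substantially even though there is no causal
# link between the products: the association is pure confounding.
```

A naive reading of the output would conclude that snus use causes smoking, which is exactly the trap described above.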
3. There are also a number of other challenges like properly measuring the outcome (and also the exposure if the study is observational), recording the data correctly, etc.
I numbered those paragraphs for convenient reference. I am focusing on research on people, but the same general points apply to any causal research.
The advantage of experiments (RCTs, randomized controlled trials) is that they (ideally) eliminate the problems in paragraph 2. Study subjects are randomly assigned to one group or the other, eliminating systematic differences between the two groups other than the exposure itself (there will still be random error), and there is no potential for selection bias because the study population is who they are and there is no chance you will be unaware of anyone you wanted to include (though you still may lose some when they fail to report back).
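Why random assignment removes those systematic differences can be seen in a small sketch (the hidden trait, the group sizes, and the cutoffs are all made up for illustration):

```python
import random

random.seed(1)

# Each subject has a hidden trait (say, motivation to quit) that
# affects the outcome whatever the treatment is.
subjects = [random.random() for _ in range(10_000)]  # trait in [0, 1)

# Self-selection: the more motivated gravitate toward treatment, so
# the groups differ on the trait before treatment even begins.
self_treat = [t for t in subjects if t > 0.4]
self_ctrl = [t for t in subjects if t <= 0.4]

# Randomization: a coin flip decides, so the trait balances out
# between groups (up to random error).
rand_treat, rand_ctrl = [], []
for t in subjects:
    (rand_treat if random.random() < 0.5 else rand_ctrl).append(t)

def mean(xs):
    return sum(xs) / len(xs)

print(f"self-selected groups' mean trait: {mean(self_treat):.2f} vs {mean(self_ctrl):.2f}")
print(f"randomized groups' mean trait:    {mean(rand_treat):.2f} vs {mean(rand_ctrl):.2f}")
```

The self-selected groups differ markedly on the trait (a ready source of confounding), while the randomized groups are nearly identical apart from random error.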
But the downsides are everything in paragraph 1, as well as some important limits to what is physically possible. There is a reason I listed the paragraphs in that order, and that is because problems with the issues in paragraph 1 are potentially far more important than those in paragraph 2. Even so, those issues in paragraph 1 are generally ignored in public health, with researchers and readers (including journal reviewers) just taking on faith that they are not a problem. This, to put it bluntly, is batshit crazy.
Of course, part of the problem is that most people doing public health research never had any real science training. But most of the problem is the harm caused by public health science being built on the medical model, as I discussed in the previous post. For experiments comparing clinical treatments, it is generally quite easy to make sure that the intended exposures are exactly what we want to compare. Getting the right population is more of a problem, since these experiments are often limited to the patients who are healthiest (other than having the particular disease that is being treated), and who thus might react differently from average. But in theory, at least, it is possible to overcome that problem too. Thus, the potential problems in paragraph 1 are relatively unimportant compared to the problem of confounding, which can be debilitating in medical research (see the previous post for examples). That is why RCTs are fetishized: medics are taught the engineered oversimplification that they are always the best study design, and public health then adopted that fetish. RCTs are not always best even in clinical medicine, but they usually are. When that simplification is ported to public health research, however, serious problems ensue.
A randomized trial comparing the effectiveness of two smoking cessation interventions gets rid of confounding and selection bias. And it works just fine so long as the different treatments really are like medical treatments, such as “Group A gets Chantix; Group B is just scolded for smoking” and what we want to know is the simple clinical question, “how many more people will quit if assigned to take Chantix?” Even then, there are problems. For example, some in Group A will balk after their first dose and either hide the fact that they are not really taking the drug or leave the study. There are study protocols designed to minimize such problems and deal with them when analyzing the data. But they still mean that even for this simplest possible case, RCTs are far from perfect.
Those problems are trivial compared to what happens when experiments are applied to social science phenomena (studies of real-world phenomena, not clinical interventions; people as actual actors, not just as bodies that are being acted upon by a clinician). Such experiments require enormous simplifications of what is being studied. There is still an ideal experimental design you could imagine, but it is often not possible to do anything that remotely resembles it. As a result, what actually gets done can fail badly due to the challenges in paragraph 1.
The perfect theoretical experiment for anything is to run two histories of the world, one where everyone had one exposure and another where everyone had the alternative we want to compare. So, for example, in one run of the world everyone took drug X to cure a disease and in the next run everyone received surgery Y. Of course that is impossible, but it is really what we are trying to simulate. The best we can do is to divide the population, as randomly as possible, and give some X and some Y.
Consider a very simple experiment you might want to perform in another social science, one that is about as close to the simple clinical situation as you can get: the effect of the level of unemployment insurance on employment decisions. We could simulate the “two runs of the world” ideal experiment by offering one level of insurance to some people and another level to others and observe how many more people are unemployed when benefits are more generous. Of course, we are not going to get away with randomizing benefit levels based on someone’s Social Security number (the resulting social upheaval would interfere with data collection). But we can come pretty close when policy makers set different levels in different jurisdictions for their own reasons — a “natural experiment”. There will be some confounding because the two populations are not exactly the same, but it might be pretty minimal.
But assume the state governments are not cooperating by giving us this natural experiment, and that we have a few million dollars and want to simulate that contrast with a few hundred people. It would not work so well. The exposures are going to be fundamentally different from those experienced in the hypothetical reruns of world history or the natural experiment. The effect of an unemployment insurance rate on one person alone will be different from the effect on him of changing the rate throughout his community. The people who volunteer to participate would be non-representative of the population in ways that would obviously affect their response to the intervention. The mere fact that they were in the experiment and knew it would affect their choices.
Still, this is a far better substitute for the ideal experiment than any experiment about smoking cessation choices would be.
The ideal experiment to measure how much smoking cessation is caused by e-cigarettes would be to run two histories of the world, one where e-cigarettes exist and one where they do not. This is what a real experiment should be trying to mimic, even though clinical researchers do not generally understand that. It would address the question that we are generally most interested in: In the real world, with all its complexities, how many more people are quitting thanks to the availability of e-cigarettes? Given that this is impossible, we need to mimic it. We have some versions of the natural experiment, like the comparative quit rates in England, where e-cigarettes are especially popular, and other countries in Europe where they are not. Of course this is not as clean as the comparison of, say, responses to different levels of unemployment insurance in Ohio and Pennsylvania, because the other differences between the populations are greater. Still, it is not terrible. Exactly this was used to estimate — very roughly, but still better than could otherwise be done — the increased smoking due to the snus ban in Finland, as Brad Rodu reported yesterday (he was not an author of the paper, but apparently should have been).
But now imagine if you wanted to spend 4.5 million dollars to try to make the comparison for a few hundred people in a lab setting. Now it gets really ugly. Unlike the unemployment experiment, the different exposures are much more complicated than promising someone different sized checks if he is unemployed. We cannot place some people in a world that contains e-cigarettes and associated social networks and knowledge, and others in a world that does not. We could mimic this (though this is never what those who are imitating the medical model want to do): We could educate one group all about e-cigarettes and send them into the world to network with vapers and perhaps buy an e-cigarette and try it. That would be a passable substitute for the “different worlds” ideal experiment, since right now many people are still not aware that they are in a world where a multitude of e-cigarettes are available, that many people find them a great substitute for smoking, and that they are estimated to be 99% less harmful than smoking. By creating that different micro-world for some people, we could get a decent proxy for the contrast between the world of 2005 (no e-cigarettes) and the world of 2018 (everyone understands e-cigarettes). That would be useful.
But we still have the problems we had with the unemployment experiment. The population who volunteered for the experiment are not representative of smokers, nor even smokers who are “wanting to quit”, which creates unknown biases. People in the experiment know they are in the study, which is far more important for a behavior like smoking than it is for unemployment. In particular, at any given time, a lot of smokers are on the verge of quitting but need some focusing event to trigger the change, which is why the null-treatment arm of most cessation studies has such a high “success” rate — merely being in the study was that trigger. Of course, the trigger might be the combination of being in the study and receiving a treatment that feels like it ought to work. And the experimental subject would be surrounded by people who did not share his new view of the world, unlike the real world where associated groups of people have shared experiences and knowledge, further distancing the experiment from reality. These are substantial problems already. But it gets worse.
Researchers conducting RCTs are not inclined to wait to see if their subjects find their way to social networks and a vape shop. They need to see what happens right away, before their budget runs out. And they want to exert tight control over their subjects so they can make the study about a specific clean intervention; the C stands for “controlled”, after all (which technically means “there is a comparison group” but this interpretation fits also). They do not want people experiencing the real world, in complex ways, at its real pace. They are in the business of administering a medical treatment.
So any RCT that we would actually see would consist of handing e-cigarettes to the people in one arm of the study to see if they rapidly quit smoking. With this, we have fundamentally departed from the real-world experience of interest, and are now asking the very different question, “If every ‘patient’ who showed up at a clinic asking for ‘treatment’ to quit smoking were given a particular e-cigarette device and liquid (or a choice among a short list) and a regimented set of instructions about using it (and no further social support), how many would be ‘cured’ of smoking?” This is certainly a question we might ask, but it is far from our main question of interest about the real world, though the answers will inevitably be conflated. It is the question a medic would ask (or a “public health” person who was acculturated to think like a medic), thinking of people only in terms of showing up to a clinic and being acted upon.
Moreover, the experiment not only does not look like the real-world contrast of greatest interest, but it does not even look like what sensible clinicians would do. To the extent that real clinicians are recommending that patients/clients/people try e-cigarettes to quit smoking (and it is quietly happening to a limited extent), they are doing it based on an understanding of the real world. We know that much of the effectiveness of e-cigarettes comes from the infinite opportunities for customizing the experience, and useful clinical advice will recommend experimentation with different hardware, flavors, and nicotine strengths. An experimental setting is not going to resemble this, no matter how much effort is put into adding flexibility. Moreover, in the real world, people who successfully switch to e-cigarettes often have social mentoring, not just technical instruction — often a friend who already vapes or a helpful vape shop proprietor — and online social networks. Thus, a real-world clinical intervention would recommend seeking that out, or at least logging on to ECF, whereas a controlled experiment would not tend to push such vague but useful advice, let alone personalize it, because it conflicts with the RCT setting being a comparison of consistent well-defined interventions. Finally, fully transitioning sometimes takes quite a while, and a useful intervention would point that out and offer advice based on it, while a trial will just measure whether someone is still smoking at set near-term dates and count that as the outcome.
Notice that I have only tangentially touched on the problems with such an RCT that get talked about most: the fact that quitting with e-cigarettes is a process that involves personalized experimentation and networking, and no RCT can mimic that. That is because the other problems are actually worse and more intractable. But the more talked-about problems are huge in themselves. There is no well-defined intervention that could be called “getting an e-cigarette”. It depends a lot on what e-cigarette subjects are handed, or which ones they have a choice among, and what they are told about it. The details matter. Now a researcher might attempt to make sure the choices are broad and the instructions about how to use them are good — and that would certainly be better than the alternative — but that does not eliminate the problem. There is no possible way that the clinic regime is going to resemble real-world flexibility.
Someone planning an RCT might protest that they are attempting to deal with all of these problems. But they are not going to be able to pull it off adequately no matter how hard they try, and the effectiveness of e-cigarettes in the experiment is always going to be different from even a good clinical intervention, let alone the real world. It will almost certainly be less effective.
But I have not even gotten to the biggest problem yet.
A critical point that is often overlooked, even among those who recognize the folly of inappropriate RCTs, is that the question “how well would this work if it were assigned to everyone?” is not an interesting question when we are not talking about a medical care guideline or policy that needs to be assigned to everyone and that crowds out other options. Moreover, the answer to that question is almost guaranteed to be misleading because it is easily misinterpreted (including, typically, by the study authors). The value of e-cigarettes, as with any other smoking cessation method, is that it works for some people (particularly including a lot of people for whom other approaches failed — we know that from observational evidence), and often those people have a pretty good idea who they are before they start. This is true for many things in public health, whereas it is not true in medicine — someone’s opinion about the best technique for removing her appendix is not likely to be of much value — which is yet another problem with porting biological experimental methods into real human situations. It might be that 90% of smokers have no interest in e-cigarettes whatsoever, but if 10% do and it works for half of them when nothing else would, that is a huge health and welfare benefit.
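The arithmetic in that last sentence is worth spelling out (the percentages are the hypothetical ones from the paragraph, and the population size is an arbitrary round number, not real data):

```python
# Hypothetical numbers from the text: 90% of smokers have no
# interest in e-cigarettes; 10% do, and switching works for half
# of that interested minority when nothing else would have.
smokers = 1_000_000          # an illustrative population of smokers
interested = 0.10 * smokers  # the self-selecting minority
quit_via_ecigs = 0.50 * interested

print(f"quit thanks to e-cigarettes: {quit_via_ecigs:,.0f}")
print(f"share of all smokers:        {quit_via_ecigs / smokers:.0%}")
```

A trial-style summary would call that a 95% “failure rate”, yet it represents tens of thousands of people who would otherwise have kept smoking.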
Recall that in the previous post I noted that critics of NRTs sometimes criticize them unfairly based on RCT results. It is easy to see why this occurs: Those in “public health” who tout RCTs constantly lie about them, effectively claiming that they will cause most smokers to quit. Critics quite reasonably push back against that. But let us ignore for the moment the marketing propaganda from those who profit from NRT sales and just look at how their availability affects the world, which is what we should really care about. Consider study results that show that 8% of smokers using an NRT become abstinent from smoking compared to 6% in the null treatment arm. This shows (a) that the NRT works for some people and (b) that it does not work for very many people if you just give them to the whole population of smokers who say “I want to quit” (which is a very vague concept and mostly includes people who really would prefer not to quit). But that is fine. It is not like NRTs do any harm (the over-promising about them does cause harm, keeping many smokers locked into a cycle of frustration and misery, but we are setting the lies aside and focusing on the product).
If NRTs were honestly recommended in terms of their benefits and limits, smokers could self-identify as possibly being in the minority for whom they are helpful, and try them. Moreover, when one did not work for someone, that would not represent what is shouted by many THR advocates: “they fail 92% of the time!!!!” Someone tries it, it does not work, and he moves on, recognizing that it was never a sure thing. If that counts as “failure!!!!” then a large portion of what we do every day is failure. The protests are legitimate about the over-hyping, not about the actual value of the products.
I trust you see where I am going with this.
Now imagine adding to that NRT study an e-cigarette arm in which 10% become abstinent. This would tell us that the e-cigarettes work for even more people than the NRT. But it would also — as these results are typically interpreted — suggest that “they fail 90% of the time!!!!” and are really not much different from an NRT. This would not genuinely reflect badly on e-cigarettes, because no one who knows what they are talking about is claiming that they will ever appeal to everyone, let alone that they will frequently cause cessation if you force them into the hands of smokers who are not already contemplating switching. Lots of smokers who apparently would not quit otherwise find e-cigarettes a good way to quit, and that means they work for THR.
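The two framings of such results, the shouted “failure rate” versus the incremental quits over the null arm, can be put side by side using the hypothetical trial percentages from the last few paragraphs (not real study data):

```python
# Hypothetical quit rates from the text (not real study data).
arms = {"null treatment": 0.06, "NRT": 0.08, "e-cigarette": 0.10}

baseline = arms["null treatment"]
for name in ("NRT", "e-cigarette"):
    rate = arms[name]
    failure = 1 - rate       # the "fails 9X% of the time!!!!" framing
    extra = rate - baseline  # quits beyond the null-treatment arm
    print(f"{name}: failure-rate framing {failure:.0%}; "
          f"extra quits per 100 smokers vs null arm: {extra * 100:.0f}")
```

The identical numbers support either story: “almost always fails” or “doubles the extra quits of NRT”. Which one gets told is a matter of interpretation, not data.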
I said imagine doing such a study and how it might be interpreted, but we do not have to imagine it. That is basically what Bullen et al. did, and they found — contrary to common knowledge about the experience of those who have decided to try to quit smoking using e-cigarettes — that e-cigarettes almost always “fail!!!!” and were barely different from nicotine patches. Some e-cigarette advocates clutched at the straw of them scoring slightly higher than patches. But if the study were really the right measure of e-cigarettes, those who say they do not actually matter much for smoking cessation would be right. If we naively defer to the simplistic medical hierarchy of studies, rather than bringing some scientific thinking to the table, that study result trumps everything else we know about e-cigarettes, and makes them look quite unimportant.
Any new RCT is likely to come close to the same results. Perhaps it will do a bit better, offering better products and better training. E-cigarette advocates seeking to explain the poor showing in the Bullen study generally point out that the products used in it were rather low quality. While that is true, the bigger problems are the other ones I described. The same results — “almost always fails!!!!” and barely better than NRT in a clinical setting — are almost inevitable. Indeed, a bit of random error in the “wrong” direction and e-cigarettes will end up looking worse than NRT. For the same reasons that NRT trials do not mean NRTs are bad, this does not mean that e-cigarettes are bad. It certainly does not trump our extensive observation-based knowledge. But that will inevitably be the interpretation in “public health” because of the fetishization of RCTs.
Over ten years ago, Rodu and I mused that if the anti-THR fanatics really wanted to put some effort into “proving” that THR does not work, they would be well advised to do an RCT. Springing smokeless tobacco on a bunch of smokers who think it is just as harmful as smoking and who had never tried it, to see if they are willing to switch to it immediately, would show that, sure enough, smokers do not want to switch. Never mind that the overwhelming observational data shows that lots of people avoid smoking thanks to smokeless tobacco; this would “prove” that this is just not so. Of course with e-cigarettes some of the worst problems there would be eliminated, but it would still be a great way to “show” that e-cigarettes are not all that effective. (Please spare me the “don’t write that! now they will know!” — it is not like the ANTZ cannot figure this out. They are just too lazy to do it and are too risk-averse to do real research that might happen to give them a result they don’t like.)
The naive porting of clinical methods has burdened public health with a fetishization of RCTs. Nowhere is this more clearly a problem than for the study of THR. An RCT in this arena simply does not measure what we want to know. It is almost certain to understate the potential for realistic clinical THR interventions, and is guaranteed to understate the real-world benefits of having e-cigarettes available. And yet the naivety (either legitimate or feigned) of those in “public health” would result in it being interpreted as disproving what we know from much more useful observational data.