by Carl V Phillips
I have composed this at the request of Gregory Conley, who recently had the nightmarish experience of trying to explain science to a bunch of health reporters. It is just a summary, as streamlined as I am capable of, of material that I have previously explained in detail. To better understand the points, see this post in particular, as well as anything at this tag. For a bit more still, search “RCT” (the search window is at the right or at the top, depending on how you are viewing this).
- RCTs, like every other study design, have advantages and disadvantages.
Background: The one fundamental advantage is that they eliminate systematic confounding. A second feature that is sometimes an advantage, but sometimes a disadvantage, is that they study a clearly-defined specific intervention, rather than the collection of similar but not identical exposures that occur in the real world. Clear disadvantages include volunteer subjects being different from the average person, and subjects knowing they are being studied, which affects behavior. The fatal disadvantage in many cases is that it is physically impossible to conduct an RCT that addresses the real questions of interest, and so the RCT addresses a question that is merely sort of similar to them.
- The reason there is a myth that RCTs are always better is that for actual medical treatments for diseases, the advantage is crucial and the disadvantages are not great.
Background: There is a huge problem of confounding-by-indication for medical treatments (roughly: people who seem most likely to benefit from a particular treatment are more likely to receive that particular treatment), so eliminating confounding is particularly important for answering the question, “what would happen if we gave this treatment to everyone?” (The first sketch after this summary puts toy numbers on this.) The requirement of using one tightly-defined intervention is largely an advantage. Most patients will volunteer for studies, so the subject selection is not all that odd, and behavior has limited effect on physical treatment outcomes. What can be studied is pretty close to exactly the question of interest; medical treatment is always an intervention, so an experiment about the effects of an intervention is on-target.
- But just the opposite is true for studies of behavior, where the advantages are of minimal importance, or actually become disadvantages, and the disadvantages constitute fatal flaws.
- The questions we are most interested in for THR are things like, “does the availability of e-cigarettes in the real world — along with real-world knowledge and other realities — cause people who would otherwise have continued to smoke to quit?” An RCT can only answer a fundamentally different question, along the lines of, “what happens when you give the odd smokers who volunteered greater access to and knowledge about e-cigarettes than they had before?” We are interested in the real-world questions, not the effects of concocted interventions that will only occur in the experiments.
Additionally: Those who try e-cigarettes are self-selected — they are those who seem most likely to benefit from them — but this is part of the reality we want to study. That is, eliminating such confounding-by-indication is useful for answering the question “what would happen if you imposed this intervention on every smoker?” but is totally wrong for understanding what happens in the real world, where the self-selection is an important part of the real experience of interest. Thus, the main advantage of RCTs actually becomes mostly a disadvantage when studying real-world phenomena rather than controlled medical treatments. (The second sketch after this summary illustrates the distinction.)
- The most frequently recited concerns about THR RCTs — that they are too expensive and that they do not offer subjects the “right” choices of products — are red herrings. These are not the fundamental problems.
Background: RCTs are certainly more expensive than many other study designs, but that cost is not one of their fatal flaws. The obsession with making sure the intervention includes high-quality products and proper instruction misses the point: The intervention can be made more or less effective, but it is still an intervention that is not at all similar to the real-world experience we are interested in understanding.
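To make the confounding-by-indication point concrete, here is a minimal simulation (all numbers invented for illustration): patients with the best prognosis are also the most likely to be given a treatment, so the naive observational comparison badly overstates the treatment's true effect, while randomized assignment recovers it.

```python
import random

random.seed(0)
N = 100_000

def recovery_prob(mild, treated):
    # Assumed true model: the treatment adds 0.2 to everyone's recovery
    # probability; mild cases recover more often regardless.
    return (0.7 if mild else 0.3) + (0.2 if treated else 0.0)

def treated_minus_untreated(assign):
    outcomes = {True: [], False: []}
    for _ in range(N):
        mild = random.random() < 0.5
        treated = assign(mild)
        outcomes[treated].append(random.random() < recovery_prob(mild, treated))
    return (sum(outcomes[True]) / len(outcomes[True])
            - sum(outcomes[False]) / len(outcomes[False]))

# Observational world: good-prognosis (mild) cases are far more likely to be
# given the treatment -- confounding by indication.
obs = treated_minus_untreated(lambda mild: random.random() < (0.8 if mild else 0.2))
# RCT world: a coin flip decides treatment, regardless of prognosis.
rct = treated_minus_untreated(lambda mild: random.random() < 0.5)

print(f"naive observational estimate: {obs:+.3f}")  # roughly +0.44, badly overstated
print(f"randomized estimate:          {rct:+.3f}")  # roughly +0.20, the true effect
```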
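And to put toy numbers on the self-selection point: suppose (purely hypothetically) that e-cigarettes raise quit probability a lot for the minority of smokers inclined to pick them up, and barely at all for everyone else. The effect experienced by real-world switchers is then very different from the forced-assignment average an RCT estimates; neither number is wrong, but they answer different questions.

```python
# All numbers are invented for illustration.
inclined_share = 0.3         # smokers who would pick up e-cigarettes on their own
boost_if_inclined = 0.40     # extra quit probability e-cigarettes give them
boost_if_indifferent = 0.05  # extra quit probability for everyone else

# Real world: only the inclined self-select into vaping, so the effect
# experienced by actual switchers is the large one.
real_world_effect = boost_if_inclined

# RCT: assignment ignores inclination, so the estimate averages over everyone.
rct_effect = (inclined_share * boost_if_inclined
              + (1 - inclined_share) * boost_if_indifferent)

print(f"effect among real-world self-selectors: {real_world_effect:.3f}")  # 0.400
print(f"RCT forced-assignment average:          {rct_effect:.3f}")         # 0.155
```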
1) RCTs can prove that there exists a mechanism in e-cigarettes that would cause cessation. They can also prove that ecigs are more than NRT.
There are limitations in measuring behavior using artificial interventions, but your recent article about induction is very relevant.
2) You mention lower interest for THR in RCT subjects. This can be dealt with by measuring desire to switch, testing it against cessation results, and then comparing to real-world desire to switch.
3) The limitations mentioned generally bias clinical results against ecigs’ efficacy. But several intervention cohorts show very significant cessation rates despite several preventable hurdles (quality, variety, supply of refills, and follow-up), so good RCT designs would certainly produce impressive evidence.
4) RCTs are very relevant for clinicians recommending THR to the uninitiated. This can have a massive effect.
Ridding our estimates of confounding through RCTs is certainly not something we can afford to miss. A consensus on how to unscramble the scrambled egg of confounding (etc.) in ecig population studies is impossible.
Again, RCTs are immensely convincing in conventional discourse. Even if the world is stupid, why should we let its stupidity lead to false and deadly conclusions?!
Contrary to your cynicism, if RCTs generate enthusiastic results, many in public health would be willing to give a more careful assessment of gateway and toxicity claims.
Some responses, in the order of the comment:
1) RCTs, like other outcome studies, say nothing about mechanism. And there is no proof in science. Rather than showing (not proving!) that ecigs are more than NRT, they actually can only show that ecigs *are* NRT — that is my point: RCTs are good for studying medicines and any intervention so studied is effectively treated like a medicine. Measuring behavior is not the issue; indeed that is the trivial part (you only need to measure whether someone is smoking).
2) In theory this could be done. But in reality it cannot. The available measure might allow you to reduce some confounding, but not nearly all of it (see the sketch after these numbered replies). In any case, you missed my point, which is not that there is confounding that needs to be controlled for, but that the self-selection among real-world switchers is a feature of reality, not a source of estimation error that needs to be controlled for. I think you might be suggesting that you could somehow measure propensity so well that you could make predictions about a particular individual. No chance of that. Also, you again miss the key point: you would still be measuring the impact of assigning the intervention to someone, which is not an interesting question.
3) See immediately previous sentence, as well as the last bullet of the post.
4) I was not addressing the question of whether RCTs can be used to trick people into believing something in spite of the RCTs not really being informative on the point. Granted, if a medic is actually doing whatever it is that was being studied, the study is actually relevant. That seems unlikely, though.
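On the point under 2), here is a rough simulation (all numbers invented) of why measuring “desire to switch” cannot rescue the comparison: if you can only observe a fairly accurate but coarse proxy of motivation, adjusting for it removes some of the spurious association but leaves much of it intact, even though vaping has no effect at all in this toy world.

```python
import random

random.seed(1)
N = 200_000
rows = []
for _ in range(N):
    motivation = random.random()  # the true confounder, never observed directly
    # The survey proxy agrees with a median split of motivation 90% of the time.
    correct = random.random() < 0.9
    proxy = (motivation > 0.5) if correct else (motivation <= 0.5)
    vapes = random.random() < motivation        # motivated smokers self-select
    quits = random.random() < 0.6 * motivation  # motivation alone drives quitting
    rows.append((proxy, vapes, quits))

def quit_gap(data):
    """Quit-rate difference, vapers minus non-vapers."""
    v = [q for _, s, q in data if s]
    nv = [q for _, s, q in data if not s]
    return sum(v) / len(v) - sum(nv) / len(nv)

crude = quit_gap(rows)
# "Adjusted": average the gap within the two proxy strata.
adjusted = (quit_gap([r for r in rows if r[0]])
            + quit_gap([r for r in rows if not r[0]])) / 2

print(f"crude gap (pure confounding; vaping does nothing here): {crude:.3f}")     # ~0.200
print(f"proxy-adjusted gap (still far from zero):               {adjusted:.3f}")  # ~0.124
```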
Do you understand what confounding is and why (in some circumstances) we want to try to eliminate it from our effect estimates? What confounding is there that we cannot afford to be missing?
Same point again about this not being about whether RCTs could be used to persuade people who do not understand them.
There are probably a few people who would genuinely believe that they learned something useful from the RCTs and change their behavior. But I can assure you it will be very few. Do you really think that the liars give reasons for their behavior, rather than rationalizations? You might notice that the conclusions are always the same even as the arguments they are making change. Kind of a giveaway that they are starting with the conclusion and just making up whatever is convenient and plausible to the rubes to put after “…because…”. Moreover, RCTs are pretty much a dream methodology for those who want to show that something does not work well; it is trivially easy to build in design details that pretty much guarantee something fails.
In any case, you are still missing the point. So try this: If 20 RCTs were done and all of them had results that could be interpreted as “e-cigarettes did not work very well or did not work at all”, would you believe that e-cigarettes do not help people quit smoking? How about if they showed that those given e-cigarettes were less likely to quit smoking?
Thanks for the response. (If you feel this is going in circles don’t feel obliged to respond.)
1) Of course RCTs say nothing about mechanism, but they can say that there is *some* potent mechanism here; i.e. this is not homeopathy…
If RCTs show that ECs are better than other forms of NRT, then we can see that the sensory stimuli from ECs are important (something very relevant to know).
2) Come on! Is it plausible that receiving a reduced-harm product would induce cessation relative to control, but actively buying it would have no effect?
3) Indeed, parts of RCTs are different from the real world, and parts are identical or similar. But we aren’t Martians, and if a THR product works in clinical trial settings we have a very good idea as to why.
4) Stronger or weaker, RCTs are certainly evidence at some level. This is not tricking people (unless teaching science to someone unfamiliar with philosophy of science means tricking him).
I’m all ears if you think that we have a population study for which the biases have been accurately accounted for. Mike’s blog presents severe biases in the population studies thus far. You praised Brown et al., which weirdly found lower cessation rates for NRT quit attempts — perhaps not all kinds of “attempts” are equal.
“If 20 RCTs were done and all of them had results that could be interpreted as ‘e-cigarettes did not work very well or did not work at all’, would you believe that e-cigarettes do not help people quit smoking?”
No! Clinical trials are biased *against* THR. They would still make me think that e-cigarettes are not great, though. But the 4 uncontrolled clinical trials available all show that e-cigarettes are great.
The modest results of the 2 available RCTs mislead many. People don’t realize that neither was about substitution; the intervention was only 3 months of EC use.
But that’s a small problem. Caponnetto et al. reported battery life of only 50-70 puffs — nothing close to substitution! Bullen et al. reported that the device delivered less nicotine than stated and that “users consumed on average just over one cartridge per day, delivering around only 20% of the nicotine obtained from cigarette smoking.”
Furthermore, 2 reviews on the device Bullen used report horrible battery life:
http://www.ecigaddict.com/2011/07/elusion-electronic-cigarette-review/
http://www.e-cigarette-forum.com/forum/australia-new-zealand-forum/167561-recommendation-elusions-may-break-bank-though.html
Surely, almost no data will sway Glantz and Chapman, but the mainstream will most certainly be influenced by good RCT results.
In sum, two bad RCT designs are limiting the respect ECs should get in the mainstream health community, while 4 intervention cohorts showing excellent results aren’t mentioned because they aren’t RCTs.
Ok, you inspired me to come up with a teaching example: Sheldrake’s “feeling that you are being stared at” experiments. If you are not familiar, he conducted a bunch of RCTs of whether someone could detect that they are being watched even when they have no way of seeing (etc.) that it is occurring. The results were a small but statistically robust improvement over chance.
What does this tell us about a mechanism? Absolutely nothing. Sheldrake labels it the “morphic field”, but that is an ad hoc description of a construct that could explain the observations, but so could any number of other constructs.
Does it convince people? Not so far as I can tell. Half (roughly) of everyone believes that you can detect when someone is watching you, and the RCTs merely serve to let them say “see!” If the RCTs had gone differently, they would not have stopped believing it, saying (quite validly) that these artificial situations apparently eliminated whatever it is that makes it work. Half of everyone insists it is impossible and just denies that the RCT results are valid (with no real basis, but absolute conviction). Maybe 0.1% of the population actually says “you know, I would not have bought that claim, but the study results are what they are.”
The thing is that in this case, unlike with ecigs, the RCTs are much better evidence than the real world. It is very difficult to figure out if people really are detecting being stared at in the real world, whereas it is easy to detect someone who quit smoking because of ecigs. Thus, Sheldrake’s RCTs give us some measure of something that we do not have any good measure of, whereas ecig RCTs merely provide a poor measure of something we already have a good measure of. In spite of having no other solid evidence, almost no one changes their minds because of Sheldrake’s RCTs. It seems rather less likely that they will change their mind about a claim that has solid real-world evidence.
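As an aside on what “small but statistically robust” means, here is the back-of-envelope arithmetic (with made-up numbers, not Sheldrake’s actual data): a 53% hit rate over 10,000 binary trials is a tiny edge over chance, yet it is wildly significant. Statistical robustness measures the sample size as much as it measures the phenomenon.

```python
from math import erf, sqrt

n, p0, observed = 10_000, 0.5, 0.53     # made-up trial count and hit rate

se = sqrt(p0 * (1 - p0) / n)            # standard error under the chance hypothesis
z = (observed - p0) / se                # z-score of the observed hit rate
p_value = 0.5 * (1 - erf(z / sqrt(2)))  # one-sided normal approximation

print(f"z = {z:.1f}, one-sided p = {p_value:.1e}")  # z = 6.0, p about 1e-09
```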
I think that responds to about half your points.
Your 2) does not seem to be responding to my points under 2). What I think you are claiming is that if forced exposure to ecigs in the context of a cessation clinic and study causes some smokers to quit then it must be that a spontaneous exposure in the real world will too. No, definitely not. This is basically a marketing study conducted with the (inconsequential) trappings of a medical study. Marketers discover all the time that a product that played well in their studies does not sell in the real world. More important, just because the forced intervention did not seem to work for enough people to show up in the statistics definitely does not mean that those choosing it in the real world are not benefitting.
I am rather confused about what biases you claim exist in the observational evidence. I have pointed out the enormous problems with the RCTs, which make their results remarkably close to useless for answering the questions of interest. I have yet to hear any claim that the observational evidence is suspect, other than innuendo that every last smoker who switched to ecigs was about to spontaneously quit and so they did not aid cessation. Short of such a claim, it is difficult to understand what you or anyone else is talking about with the hand-waving talk about biases.
That claim seems to be invoked by people who read once that observational studies have bias that RCTs do not, and they just recite that without having any idea what it means and, especially, having no idea that it does not apply in this case. Yes, observational evidence produces a more biased estimate than does an RCT of X for the uninteresting question “what percentage of volunteer smokers will quit when forcibly exposed to X in a clinical cessation experiment”, but who cares? What, exactly, are these supposed biases regarding our actual questions of interest?
As I said, I agreed that RCT results would probably persuade some people. (And thus if the ANTZ are smart, they will do some RCTs.) Cat posters will also persuade some people. That does not make them useful science.
– Staring experiments are irrelevant to the THR RCT question, because with regard to the staring hypothesis: 1) there is a very high degree of certainty regarding prior plausibility, and 2) we do not know the potential mechanism for a possible staring effect, so we cannot rule out causation from the real-world correlates of staring.
Where the potential mechanism is obvious — in our case mitigation of the costs of cessation — we may judge that the effect is similar in clinical and natural settings. Indeed, receiving an unsolicited e-cig, as opposed to buying one, is unlikely to be a significant causative factor.
– You brought empirical support to your argument, saying that the results of artificial marketing studies are not reflected in the real world.
However, 1) in marketing studies the intervention and the endpoint are usually imaginary or simulated, whereas in THR RCTs both are real 2) well designed marketing studies do indeed reflect reality (of course, not perfectly).
To be sure, I AGREE THAT INTERVENTION THR SUBJECTS DO NOT HAVE THE GREAT POSITIVE ASPECT OF ACTUALLY *CHOOSING* THR, but that makes a positive RCT outcome even MORE convincing.
4 uncontrolled trials show that we will get amazing results from THR clinical trials done right:
http://www.ncbi.nlm.nih.gov/pubmed/25380748
http://www.ncbi.nlm.nih.gov/pubmed/21989407
http://www.mdpi.com/1660-4601/11/11/11220
http://digitalunc.coalliance.org/islandora/search/mods_name_personal_namePart_ms%3A%22Squires,%5C%20Rhonda%5C%20D.%22
Also, I’m sure you are aware of the RCTs on snus with great results.
Furthermore, receiving an ecig in a clinical setting is not mutually exclusive with choosing to use it, so there should certainly be some positive effect in RCTs if ecigs work in the big world. So RCTs can show an effect, even if they underestimate its magnitude.
– You write, “I am rather confused about what biases you claim exist in the observational evidence.”
Glantz put out a list of observational studies to support his view (http://tobacco.ucsf.edu/meta-analysis-all-available-population-studies-continues-show-smokers-who-use-e-cigs-less-likely-quit-smoking). Do you hold that these are not confounded?
Just a short list of possible problems with any design of a population study:
1) Anyone who tries quitting with an ecig is probably more proactive about quitting.
2) Measuring vaping at enrollment biases for treatment failures, as you pointed out in your post about Borderud et al.
3) Measuring vaping at follow-up is equivalent to a cross-sectional study and is subject to reverse-causation bias.
4) Asking if ecigs were used in a previous quit-attempt cannot work if ecig uptake was not stable with time. (Also, that NRT was negatively associated with quitting in Toolkit Study suggests a tremendous bias here.)
Thus, I do not see how you can manage to show that ecigs work from observational studies.
– “if the ANTZ are smart, they will do some RCTs.”
Huh? The available RCTs do not offer great support to ANTZ, in spite of their bad battery life, poor nicotine delivery… Also, what about all that the uncontrolled intervention cohorts suggest?
> I AGREE THAT INTERVENTION THR SUBJECTS DO NOT HAVE THE GREAT POSITIVE ASPECT OF ACTUALLY *CHOOSING* THR, but that makes a positive RCT outcome even MORE convincing.
But the main point here (which is hard to get the conversation to focus on, as my experience tells me is typical): more convincing about *what*? About what happens when you assign volunteers to use ecigs. The issue is not that the aspect is “great positive”, it is that it is what we want to better understand. We already know that many smokers switch to ecigs. So showing that you can induce that result is not interesting. (Recall that we agreed that RCTs might have propaganda value to trick people who do not understand that this reality tells us more than they ever could, but that does not change their minimal contribution to useful science.)
>Where the potential mechanism is obvious — in our case mitigation of the costs of cessation — we may judge that the effect is similar in clinical and natural settings. Indeed, receiving an unsolicited e-cig, as opposed to buying one, is unlikely to be a significant causative factor.
Understanding the mechanism has no bearing on the relative value of an RCT. A medicine administered for a disease, which is accidentally discovered to work for reasons we do not understand at all, can effectively be assessed with an RCT because RCTs work for studying medicines. But in any case, “mitigation of the costs” is not a mechanism. It is basically the last step on the causal pathway that the actual mechanisms are the first steps on.
>You write, “I am rather confused about what biases you claim exist in the observational evidence.”
Note that I wrote that not because I could not identify biases, obviously, but because I did not know what claims you were making and thus could not figure out how to respond. Now I have something to work with:
>Just a short list of possible problems with any design of a population study:
>1) Anyone who tries quitting with an ecig is probably more proactive about quitting.
That is a problem if you are trying to figure out, from real world data, what would happen in an RCT. It is, however, a feature, not a bug, in terms of understanding the real effects. It does depend on the researcher being a moderately competent epidemiologist (by which I mean at the 95th percentile or higher for epidemiologists; no, that is too optimistic — change that to 98th) to use the data properly rather than bungling it.
>2) Measuring vaping at enrollment biases for treatment failures, as you pointed out in your post about Borderud et al.
Yeah, well, that was just stupid. It is not a problem with observational research, it is a problem with the study design. There are a zillion ways to ask the wrong question.
>3) Measuring vaping at follow-up is equivalent to a cross-sectional study and is subject to reverse-causation bias.
Ditto.
>4) Asking if ecigs were used in a previous quit-attempt cannot work if ecig uptake was not stable with time. (Also, that NRT was negatively associated with quitting in Toolkit Study suggests a tremendous bias here.)
I am not sure I understand what you are saying. It is possible to measure (imperfectly, as with everything) someone’s history of quit attempts and history of trying ecigs. This allows us to address a lot of useful questions — not every possible interesting question, but at least some of them, unlike the RCT.
The negative association between trying cessation aids and quitting is not even slightly surprising, given the large number of Phillips-Nissen-Rodu Category 1 people among those who quit and Category 3 people among those who try. That just means that this particular statistic cannot be used to address some questions of interest. It does not make the research useless.
>>“if the ANTZ are smart, they will do some RCTs.”
>Huh? The available RCTs do not offer great support to ANTZ, in spite of their bad battery life, poor nicotine delivery…
I spent a couple of minutes making a list of the ways that someone could design such a trial to fail that had nothing to do with choosing bad hardware to distribute. I stopped at about a dozen and I don’t think I was anywhere near saturation. (No, I am not going to list them. It probably takes tobacco controllers a person-year to figure out what I can do in two minutes, so I will at least make them spend the time.) Some of them would be so glaringly obvious that anyone with any understanding of the science would spot them — but that does not change their propaganda value much. Others are so subtle that ecig proponents doing trials might do them accidentally because they do not really understand their implications.
“We already know that many smokers switch to ecigs. So showing that you can induce that result is not interesting.”
Switchers are poor evidence because they may have quit anyway without the e-cigarettes (unless we rely on their self-awareness when they say that they wouldn’t have otherwise quit).
“Understanding the mechanism has no bearing on the relative value of an RCT. A medicine administered for a disease, which is accidentally discovered to work for reasons we do not understand at all, can effectively be assessed with an RCT because RCTs work for studying medicines.”
But when we have an idea of why the treatment would work if it does, then we can extrapolate to scenarios similar to the one in the RCT — this is my main point.
“It is possible to measure (imperfectly, as with everything) someone’s history of quit attempts and history of trying ecigs. This allows us to address a lot of useful questions — not every possible interesting question, but at least some of them, unlike the RCT.”
There would still be a problem comparing cessation rates when population ecig usage rates are not stable, biasing toward a shorter cessation period after the quit attempt. Also, unaided quitters have an easier time forgetting a failed attempt.
The other thing about RCTs is that eliminating the confounding elements can actually remove some of the things that make cigarette replacement products work in the first place. Smoking is a complex behavioral phenomenon. Which aspect should you test for? Nicotine? The hand-to-mouth experience? The reward sensation of buying new ecigs? How many RCTs would it take to cover all the bases? Yes, it would help if we could find the confounders that impede success, but as this information is already available through epidemiological studies, why spend money replicating it other than to gain the kudos an RCT adds?
Or I could be wrong; it would depend on what result you were looking for: pure knowledge or a solution to a problem. If knowledge, then RCTs all the way?
One of my points is that clinical trials often look at a poor proxy for the exposure of interest, and thus provide a poor measure of it. That will always be the case for smoking cessation unless what you are trying to measure is “what is the effect of giving X to someone who volunteers to have X given to him?”, which is a very weird question.
Some clarification is in order. When you say “confounding”, what you describe is only actually confounding if you are asking what can only be seen as the wrong question. I get the impression you understand that, but what you wrote might confuse people. So the manual accompaniment to puffing would contribute confounding if someone attempted to assess the specific effects of inhaling the chemicals. That might be an interesting study for R&D, but it would be rather pointless from the perspective of public health epidemiology. On the other hand, if you were trying to assess the effect of vaping, then the various aspects of it would be part of the exposure, not a contributor to confounding.
It is true that if you cannot effectively imitate a real-world exposure, then an RCT is not going to give much information about real-world problems. But that does not mean that the RCT will work any better to answer a more abstract basic question. For example, taking up zero-nic vaping (which separates the effects of the action, flavor/feel, and perhaps non-nicotine chemical activity from the effect of nicotine) is still very different from being assigned to use zero-nic ecigs in a trial.
It’s like you read my mind and expressed it far better than I did. That’s exactly what I was struggling to say.