Tag Archives: RCTs

The unfortunate case of the Cochrane Review of vaping-based smoking cessation trials

As many of you are aware, there was a recent major update to the old Cochrane Review of smoking cessation intervention studies (trials) that gave some or all participants e-cigarettes. This report is an unfortunate turn of events. I foresee yet another highly publicized vaping “success” statistic that so hugely underestimates the benefits of vaping that it is really a perfect anti-vaping talking point.

For those not familiar, Cochrane reviews are complicated-seeming simplistic analyses where a bunch of study results are averaged together, using a technique that is typically called “meta-analysis” though is properly described as “synthetic meta-analysis” (as in, synthesizing the results; it is not the only kind of meta-analysis). For those not familiar with that methodology, it is basically junk science if a set of fairly strong conditions is not met, conditions which are far from being met in the present case.

For more on that general observation, see this previous post. I am not going to go into that level of detail again here, but I will summarize. First, just because you can declare a bunch of numbers to be part of the same category and average them together doesn’t mean it makes any sense to do so. A bunch of studies with different interventions, different populations, and other different methods cannot be treated as if they were just one big study of a single phenomenon, even if they can all be described with the same imprecise phrase, like “studies of whether vaping helped people quit smoking.” The analogy I thought of while working on this was asking “what is the average mass of house pets?” Yes, “pets” is a category you can create, and average mass is something you can calculate. But why would you want to know that average? It is a meaningless amalgamation of several clearly different collections of observations. Why would you want to know the average smoking abstinence rate, at a given future moment, of people who were handed some e-cigarettes, with some degree of flexibility in their choice, with some level of information and assistance, for some people, at some place and time over the last ten years? Yes, you can calculate that number, but why would you?

Well, you can calculate it in theory, but in reality you are stuck doing something that is a weak proxy for it. The Cochranoids only pretend to be calculating that number because, of course, measures of all those different combinations of “some” do not exist. Instead what they have is whatever nonsystematic combination of “some”s that someone decided to study and write down in a journal article. It is like trying to assess the average mass of pets by looking at the records of one veterinary practice. Do they specialize in dogs or cats? Whichever types of animals they happen to see is going to be what you measure, not the population average. What’s worse, there is no attempt in Cochrane or the typical synthetic meta-analysis to figure out a population-representative weighting (not that you could even do it in this case, but they never even try). By this I mean that you could bring in an estimate of the relative number of dogs and cats in a population and use that to weight (no pun intended) the average of the data you have for dog and cat averages to get a reasonable estimate for the average of the set of all {dogs, cats}. But no, the Cochrane methodology just weights the average by however many observations happened to be in the studies (analogy: averaging cats and dogs based on how many were in the vet practice’s database, even if they see ten dogs for every one cat).

As I noted, this correction does not work for smoking cessation studies (since they do not represent any real-world practice at all, so there is no real-world weighting to use), but it is still a problem. If the collection of studies included one huge study that used a particularly ineffective vaping intervention, it would drag the average way down. If that same study instead had a low sample size, the estimated average would go up. Just think about it. Can a method that has this property possibly be considered valid science? Consider the analogy again: If the vet practice also sees twenty horses, the average mass shoots up. If it sees only one, the average is pulled up, but not that much.
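
To make the weighting complaint concrete, here is a toy sketch in Python. Every quit rate and sample size below is invented for illustration (none of these numbers come from the Cochrane Review or any real study); the point is only to show how weighting a pooled average by however many people happened to be studied lets one big study swamp the estimate.

```python
# Toy illustration with invented numbers: how weighting a pooled average
# by sample size lets a single large study dominate the estimate.

studies = [
    # (label, observed quit rate, participants) -- all numbers invented
    ("well-designed intervention A", 0.18, 150),
    ("well-designed intervention B", 0.16, 120),
    ("poorly designed intervention C", 0.05, 2000),
]

def pooled_rate(studies):
    """Average the quit rates, weighting each study by its sample size
    (the kind of weighting criticized above)."""
    total_quitters = sum(rate * n for _, rate, n in studies)
    total_people = sum(n for _, _, n in studies)
    return total_quitters / total_people

print(round(pooled_rate(studies), 3))  # 0.064 -- the big weak study dominates

# Shrink the weak study to 50 participants: the pooled "effect" more than
# doubles, even though nothing about any intervention changed.
studies[2] = ("poorly designed intervention C", 0.05, 50)
print(round(pooled_rate(studies), 3))  # 0.152
```

The pooled number is an artifact of who happened to get studied, and in what quantities, not a property of the interventions themselves.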

But even worse, the most common (by head count) pets never visit a vet. The modal pet category, in terms of individuals, is caged critters like fish, rodents, and the occasional lizard or hermit crab. So the vet practice selection methodology is not representative of “pets”, the originally defined category. The analogy is that the Cochrane paper purports to be looking at the effects of vaping-based interventions on smokers, but really it is only looking at the effect of (a few particular) interventions on people who volunteer for smoking cessation trials. Yes, you can redefine the categories to be “pets who see vets” or “people who volunteer for smoking cessation trials”, but altering your scientific question to better fit your data is another pretty good sign that you are doing junk science. Although that is far better than pretending you are pursuing the original question, while actually analyzing data in a way that could only answer the redefined question, which is what usually happens (including in this case).

And then there is the related problem that clinical interventions are not how most smokers are introduced to vaping. So this was never about measuring the effect of vaping on smoking cessation, but the effect of being told to try vaping in a clinical setting on smoking cessation. Those are very different concepts, but the results are interpreted as if they are the former when they are obviously the latter. If you wanted to assess how much vaping reduces smoking in the actual real world, rather than in barely-existent clinical interventions, you would use an entirely different body of evidence. These trials do approximately nothing to help answer that question.

It turns out that a systematic review of vaping-based smoking cessation trials could legitimately help answer some interesting and useful questions. It is just that this paper does not do that. Most notably: What characteristics of an intervention seem to cause the highest rate of smoking abstinence — which e-cigarettes, what advice and support, which types of people, and whatever else the methods reporting from the studies lets you figure out? With that information, you could design better vaping-based clinical interventions (which are not unheard of, though they are too rare to really affect the question of how much vaping reduces smoking). You could also add useful assessments like what future trials should do for best practices (based on current knowledge) and what characteristics they should test to see what seems to work better.

This potential value of the review only serves to reinforce the fundamental failing of what was done. Why, oh why, would you want to take the success rates from the better-practice interventions and average them together with the rates from other interventions? And weight the result based on how many people happen to have been studied using the various methods? And then report that number as if it meant something? My mind just boggles that anyone ever would think this is a useful question to ask.

So I trust we have established that the number they reported is meaningless junk, even for what it purports to be. By the way, that number is a four percentage point increase in successful medium-term smoking abstinence, compared to null or near-null interventions. I buried this because mentioning a scalar in a headline or early in a piece tends to cause the reader to fixate on that number and consider it the main takeaway. It is not. It is meaningless. I urge you to never repeat it.

The reason I mention it at all is to comment on how low it is. If this were really the measure of the smoking cessation benefits of vaping, it would not make a very good case for vaping. Yes, you can spin it as “the prestigious definitive Cochrane Review [cough cough cough] finds vaping is better for smoking cessation than ‘officially recommended’ methods like NRT.” But the magnitude of “better” is so low that it is easy for someone to convincingly make the case that it is not good enough to justify the scourge of teen vaping, or whatever. Or that it is so low that we can just develop some improvement to the ‘officially recommended’ methods that would be even better.

So far, I have only hinted at the main reasons why that number is not a valid measure of how much smoking cessation is caused by vaping. Even if that statistic were a valid measure of what it could measure — “what happens if you use clinical methods to encourage vaping for smokers who are seeking aid to quit” — and not just some bizarrely weighted average of a random collection of often terrible ways of going about that, it would still be a huge underestimate.

There are three main pathways via which vaping causes less smoking: 1) For some people who are actively attempting smoking cessation, it increases their chance of success. 2) For some people who would not otherwise be attempting cessation, it inspires them to try or just do it. 3) It displaces some smoking initiation, replacing it with vaping instead. I am highly cognizant of the failure to understand this distinction because a colleague and I recently finished a review of those “population model” papers about the effects of vaping on future smoking (which hopefully will see the light of day soon). We discovered that almost every one of those papers just ignored 2) and only looked at 1) as a measure of how much cessation would increase. Some of them did this rather overtly (though they never admitted — or apparently even realized — they were accidentally making this assumption), while for others it was implicit. (Some, but not all, also considered 3) separately, but that is not immediately relevant.)

People described by 2) include “accidental quitters” as well as people who decide vaping is tempting and decide to try to quit smoking (switch) because of that. It seems safe to make the educated guess (for that is all we can really do with the data we have) that this has greater total effect than 1). In addition to creating new cessation attempts (which those “population model” papers mostly assume do not happen), vaping gets “full credit” for any resulting cessation, not just credit for the increase in the success rate (another error in the population model papers). That is, even if someone who gave vaping a try would have quit smoking without it — and thus the fact that switching to vaping is a particularly effective way to quit did not even matter — that case of cessation was still caused by vaping.

Like those problematic population models, the Cochrane approach only looks at 1). Everyone studied is doing something to attempt to quit smoking, or at least is going through the motions and signed up for some guided quitting attempt. So half or more of the cessation effect of vaping is being assumed away.
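
The accounting error here is simple enough to show with arithmetic. All the rates below are invented placeholders, not estimates of any real population; the sketch only shows how counting pathway 1) while ignoring pathway 2) can miss most of the effect.

```python
# Invented placeholder rates, purely to illustrate the accounting
# described above -- not estimates of any real population.

population = 100_000          # smokers in some hypothetical population
baseline_attempt_rate = 0.10  # fraction who attempt to quit regardless
baseline_success = 0.15       # success rate without vaping
vaping_success = 0.25         # success rate when switching to vaping

# Pathway 1) -- all that trial-style analyses measure: the *increment*
# in success among people who were attempting to quit anyway.
pathway1 = population * baseline_attempt_rate * (vaping_success - baseline_success)

# Pathway 2): attempts that happen *only because* vaping exists.
# Vaping gets full credit for these quits, not just the increment.
extra_attempt_rate = 0.05
pathway2 = population * extra_attempt_rate * vaping_success

print(round(pathway1), round(pathway2))  # 1000 1250
```

Even with these made-up inputs, the ignored pathway is already larger than the measured one, and that is before counting pathway 3) at all.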

And it gets worse still, for both the whole meta-analysis method and this particular exercise. Behavior is not biology. The Cochrane method is sometimes valid if what is being studied is a biological effect (see the above link for more details of other conditions that must be met), but is hopeless for assessing personal behavior. Why? Because people know themselves and make choices, of course. So in a population (place, time) where vaping is reasonably well known, a smoker who finds it an appealing option is likely to try it, and if she was correct that it was indeed just what she needed, then she is going to quit smoking. She is a category 2) success story, or perhaps category 1) if she was already dedicated to quitting. And then what happens? She doesn’t volunteer for a smoking cessation trial!

That is, the people who are accurately self-aware that they are a particularly good candidate for quitting via vaping just do it, and so do not contribute to the study-measured success rate. It is like the fish in the “average mass of pets” analogy — they never show up to the vet to get weighed into the average. This cuts both ways, of course: Anyone who is self-aware that they just need some nicotine gum also quits and is not in the study to give due credit to nicotine gum. The difference, of course, is that basically no one accurately thinks that.

We also know that people who pursue formal smoking cessation interventions are more likely to be “unable” to quit on their own, which could bias the results in either direction. I.e., it could be that people who just decide to quit are more likely to be helped by vaping, as compared to someone seeking aid, because their baseline success rate is higher and vaping multiplies that. Or vaping might matter less for them because they would be successful even without vaping. But it is almost certainly not exactly the same.

In fairness to the Cochranoids, it was not their task to review category 2) quitters, or selection bias, or other evidence. It was not their job to provide useful information. Their job is to just mechanically average whatever numbers someone hands them. However, that is being rather too fair to them, since they pretended they were measuring something useful. They conclude “There is moderate‐certainty evidence that [vapes] with nicotine increase quit rates….” This claim implies that they are measuring how much quitting is caused by vaping, full stop, not merely how much more likely clinical study volunteers are to be abstinent if they are in the vaping trial arm.

To summarize: If clinical assignment to try vaping really only increases successful smoking cessation (or, more precisely, medium-term abstinence) by four percentage points, it is really not very impressive. But we are pretty sure that is not the case because it is based on a population where many of those most likely to switch have already exited, and it is based on randomly averaging together best practices and poorly designed interventions. Moreover, even if it were right this would only be one of the many pathways from vaping to smoking abstinence, and one of the least important, so who cares that it is low?

On the bright side, most of the headlines and pull quotes I have seen about this fake science say something like “vaping shown to be better for quitting than NRT” or “new study shows vaping helps people quit smoking.” While these stories all seem to be written by someone without a clue about the Cochrane Review, this is a case where three clueless wrongs make a right: At least those vague unquantified messages are correct. (The third wrong is that people have been indoctrinated into thinking that NRT has measurable benefits, so they interpret “better than NRT” as meaning “good” when it really only means “not quite zero”.)

The problem is that after a spate of “this just in!” headlines this month, which will affect almost no one’s beliefs, we can look forward to a few years of this paper being cited as evidence that vaping has a trivial effect on reducing smoking. The four-percentage-point number will be successfully portrayed as definitive and the entirety of the effect of vaping on smoking prevalence. And everyone who is currently suggesting that it is not total junk, because they like the headlines of the day, is helping make that happen.

Sunday Science Lesson: Why people mistakenly think RCTs (etc.) are always better

by Carl V Phillips

I recently completed a report in another subject area which explains and rebuts the naive belief by non-scientists (including some who have the title of scientists but are clearly not really scientists) that some particular epidemiologic study types are always better, no matter what question you are trying to answer. I thought it might be worthwhile to post some of that here, since it has a lot of relevance to studies of THR.

Readers of this page will recall that I recently posted talking-points about why clinical trials (RCTs) are a stupid way to try to study THR. A more detailed version is here and the summary of the summary is: RCTs, like all study designs, have advantages and disadvantages. It turns out that when studying medical treatments, the advantages are huge and the disadvantages almost disappear, whereas when trying to study real-world behavioral choices of free-living people the disadvantages are pretty much fatal and what are sometimes advantages actually become disadvantages. Similarly, some other epidemiologic study designs (e.g., case-control studies) are generally best for studying cancer and other chronic diseases, which are caused by the interplay of myriad factors that occurred long before the event, but are not particularly advantageous for studying things like smoking cessation. Asking someone why he thinks he got cancer is utterly worthless, but asking someone why he quit smoking can provide pretty good data.

Simple talking points on RCTs not being a very useful way to study tobacco harm reduction

by Carl V Phillips

I have composed this at the request of Gregory Conley, who recently had the nightmarish experience of trying to explain science to a bunch of health reporters. It is just a summary, as streamlined as I am capable of, of material that I have previously explained in detail. To better understand the points, see this post in particular, as well as anything at this tag. For a bit more still, search “RCT” (the search window is at the right or at the top, depending on how you are viewing this).

This works in practice, now we just need to see if it works in theory

by Carl V Phillips

The title refers to a classic joke about economists, describing a common practice in the field: Something is observed in the real world — say, the collapse of the Greek economy, insurance prices dropping under the ACA, or people lining up to buy new iPhones in spite of already owning perfectly good old iPhones — and the theoretical economists scramble to figure out if their models can show that it can really happen. In fairness, that way of thinking is not as absurd as it sounds. Developing a theory to explain an observation is good science, so long as it is being done to try to improve our models and thus better understand reality and perhaps make better predictions. Obviously, the ability or inability to work out the model does not change what has happened in reality.

Why clinical trials are a bad study method for tobacco harm reduction

Following my previous post and my comments regarding a current ill-advised project proposal, I have been asked to further explain why randomized trials are not a useful method for studying THR. I did a far from complete job explaining that point in the previous post because the point was limited to a few paragraphs at the end of a discussion of the public health science mindset. So let me try to remedy that today.

How the medicalized history of public health damaged its science too (a science and history lesson)

by Carl V Phillips

This week, in my major essay (and breezy follow-up), I argued that the dominance of hate-filled nanny-staters in public health now is actually a product of medic and technocrat influence more than the wingnuttery itself. The worst problem there has to do with inappropriate goals that stem from a medical worldview morphing into a pseudo-ethic. The seemingly inevitable chain of events created by that pseudo-ethic resulted in public health professionals hating the human beings who we think of as the public because we are a threat to what they think of as the public, which is just the collection of bodies we occupy.

But this is not the only damaging legacy in public health of the thoughtless application of medical thinking. The science itself has also suffered, most notably (though far from only) because of the fetishization of clinical experiments (aka RCTs: randomized controlled trials) and denial of research methods that are more appropriate for public health. This is something I have written and taught about extensively. I will attempt to summarize it in a couple of thousand words.

Sunday Science Lesson: mistaking necessity for virtue in study design

by Carl V Phillips

Yes, I have written versions of this before, but I never tire of the topic, mostly because of how much damage the errors do to science and health policy.  I get reminded of it every time I travel through a European or European-influenced airport.

Most scientific knowledge (which is just a fancy way of saying “knowledge” — I am just coopting the phrase from those who try to imply that the adjective is meaningful) comes from easy observations — e.g., “there are a lot more women than men in this population” requires only looking around.  Sometimes a bit of knowledge of interest gets a bit more complicated and we need to actively use measurement instruments — e.g., “this is heavy” is easy, but “this has a mass of 44.21 kg” requires careful methods and a good scale.  Finally, something that we want to know might be completely beyond our ability to assess without complicated methods — e.g., “does a lifetime of exposure to E double the risk of disease D” requires a complicated statistical analysis of thousands of people.  The point here is that just because those methods are necessary for the latter does not mean they are necessary — or even useful! — for easier observations.

To elaborate on the concept, I will start with my favorite analogy to it:  Airports/stations need to communicate to thousands of people when their plane/train/bus leaves and where to board it, and until we all have reliable connectivity in our pockets (good realtime phone apps personalize our information and can make this all moot), this will continue to be provided using overhead displays.  These were originally written and updated by hand, and then replaced by some amazing and clever mechanical devices, and are now video monitors.  But fundamentally nothing has changed, and that is the problem, because airports are not train stations, or more particularly, flights are not train trips.

Consider what you naturally know and can easily remember when you arrive at an airport or train station.  Most obviously, you know your identity, which is sufficient to find your vessel (using the phone app, though it has always been sufficient to visit the check-in desk), but we still need displays which are quick to access, instantly updated, and always available.  To make the displays usable you need to know something other than your identity.  You surely know where you are going and approximately when you are leaving, and this is all you need to identify your flight.  There are seldom multiple departures from one airport to a particular other airport at close to the same time (particularly since you also easily remember which airline you are flying).  This does not work so well for trains, however, because it can be that almost half the trains leaving a station make a particular stop because every train going a particular direction passes that station and stops.

This also means there is a difference in what can be communicated via the monitors, because planes land in just one place, whereas the same train stops several or dozens of places and not all can be listed.  Thus, train stations are forced to have their passengers drill down further and make an effort to remember something that is not quite so intuitive: the exact minute of the scheduled departure time, which is how they identify the vessels (usually along with one target destination, either the end of the line or the most major station on the way, which is intuitive to remember).

You probably see where I am going with this:  American airports and those following their style display departing flights based on where they are going, alphabetically by city.  This is a great system since everyone knows where they are going and is so skilled at searching by alphabetical order that they can quickly glance to the right range of the list to find the city name.  European-style airports have been designed by people who seem to think they are train stations, and list flights by the minute of departure.  This is a bad system because it requires passengers to make the extra effort to remember or check the exact minute of departure, and to step through a list of ordered numbers with varying gaps, which is much harder than alphabetical order because you cannot use instant intuition like “I am going to Philadelphia, so I will direct my glance to 3/4 of the way through the list”.

Like a complicated cohort study or clinical trial, the train-style listing is a costly necessity under particular conditions.  But such necessity is not a virtue of the method.  “It is needed at train stations, so it is the best we can do there” clearly does not imply “it is always best.”  Similarly, “we cannot figure out whether this exposure causes a .001 chance of that disease without a huge systematic study and a lot of statistical analysis” does not mean “we cannot figure out that e-cigarettes help people quit smoking without such a study”.  Even more absurd is the “reasoning” that leads to: “we cannot figure out which medical treatment works better without a clinical trial” and therefore “we cannot figure out if people like e-cigarettes without a clinical trial”.

Needless to say, the latter statement in each sentence is obviously false, and the proposed equivalences are moronic.  Just because the extra complication and effort is needed to answer a hard quantitative question does not mean that it is needed for an obvious qualitative conclusion.  Anyone who actually understands science at the grade-school level realizes that different research is needed to answer different questions.  It makes a bit more sense to use a clinical trial to try to understand adoption of THR than it does to use a particle accelerator to do it, but not a lot more.

Yet, of course, it is just such innumeracy that appears in the public discourse.  Just as habit leads many people to ignore common sense and insist that train-style displays at airports make sense, “public health” indoctrination also eliminates the common-sense level science that is taught in grade school.  It is reassuring to note that the claims about a particular type of study always being best, or even merely always being needed, are not made by actual scientists.  They always come from political activists or medics, and occasionally from incompetent epidemiologists (not actually a redundant phrase — just close to it).

I think this analysis also extends into dealing with thought-free analogies in regulation, such as “we do X with cigarette regulation and therefore should do it with products that are different in almost every way other than being tobacco” or “we require X for medicines that serve only to eliminate a disease, and therefore should require it for products that people use for enjoyment.”  I will leave that extension as an exercise.

NewZ ecig clinical study, an “I told you so”

by Carl V Phillips

Yesterday I explained why the new clinical trial out of New Zealand should not be touted as important news for e-cigarettes or THR in general.  In addition to the general message that clinical cessation trials are not the right way to study THR products and are just as likely to produce bad results as “good” ones, I pointed out a few particular issues.  First, it was damningly faint praise, claiming that e-cigarettes perform just barely better than nicotine patches, which grossly misrepresents everything we know about their effectiveness.  Additionally, with a plausibly different level of luck (random sampling error) that study would have “shown” that e-cigarettes are less effective than patches.  Of course, such a result would have been no more informative about e-cigarettes than the “good” result was, but that is the point.

Sure enough, no sooner had I finished writing my analysis than anti-THR liar Stanton Glantz pretty much made my point for me.  In a post on his pseudo-blog (not really a blog because he censors any critical discussion) Glantz claimed that the study

found no difference in 6 month quit rates among the three groups.

And in a hilarious bit of “do as I say, not as I do”, opined,

Hopefully this study will get ecig promoters to stop claiming that ecigs are better than NRT for quitting.

Of course, the study showed that e-cigarettes did a bit better.  Glantz probably thinks this bald lie is justified by a common misinterpretation of statistics, wherein different numbers that are not statistically significantly different are incorrectly called “the same”.  Anyone with a 21st century understanding of epidemiology knows that this is not the right thing to say, but since Glantz’s paltry understanding of the science seems to be based on two classes he took three decades ago, perhaps this is simple innumeracy and not a lie.
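
For readers unfamiliar with the statistical point, here is a minimal sketch of it. The arm counts below are invented to resemble a small cessation trial (they are not the actual study data); the sketch shows how two arms can have clearly different quit rates whose difference is nonetheless not “statistically significant” — which means the data cannot rule out zero difference, not that the rates have been shown to be the same.

```python
# Invented counts for illustration (not the actual trial data): two arms
# whose quit rates differ, but not "statistically significantly".
import math

def two_prop_z(quit_a, n_a, quit_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = quit_a / n_a, quit_b / n_b
    p_pool = (quit_a + quit_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided normal p-value via the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, p_value

p_a, p_b, p_value = two_prop_z(21, 290, 17, 295)
print(p_a > p_b)        # True: the point estimates do differ...
print(p_value > 0.05)   # True: ...but not "significantly" at 0.05.
# Calling these two rates "the same" is an inference the test does not license.
```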

Still, he has a point about the numbers not being very dramatic.  The real lie (and a case of innumeracy much worse than using incorrect terminology) is suggesting that this one little flawed artificial study somehow trumps the vast knowledge we have from better sources.  It is quite funny that he, who has made a career out of ignoring evidence, suggests that everyone else should pay attention to this “evidence” and change their behavior.  Not so funny is my role as Cassandra:  If we start touting misleading studies like this one as being great news when they happen to go our way, it is pretty much guaranteed to hurt us rather than help us.

(Glantz goes on to post some utter drivel about the nature of RCTs and what previous evidence shows about e-cigarettes, which I have debunked before and will not bother with here.  After a few decades, you learn to not try to fix every little flaw in a particularly slow student’s writings.)

Of course, Glantz does not have the skills to figure out that this study is flawed.  But he might have had some hope had he actually read it.  Or the press release.  Or even one of the news stories.  Instead, it appears that he just heard some garbled sentence or two about it and wrote his post based on that.  How can we know that?  Because when his post first appeared (screenshot below), it described the comparison as between nicotine gum and e-cigarettes, even though someone who actually spent three minutes studying the material would not have made that mistake.

1st try

Oops. That’s what happens when you don’t do the reading.

Notice that in both the headline and the first sentence he describes the study as using nicotine gum.  Oh, but wait, it gets better.  A few hours later, he changed the first sentence (see screenshot below).  Of course, being who he is, he did not include any sort of statement of correction as an honest researcher or reporter would.  (Quietly fixing a grammar typo or garbled sentence is no big deal — I do that — but when you actually told your readers something wrong and then you try to memory-hole that, rather than actually noting you are making a correction, it is yet another layer of lying.)

2nd try

And this is what happens when you don’t know how to operate your software.

Notice now the first sentence is changed but the headline is still the same.  Did he just not realize he needed to fix that too, or did he have no idea how to change a title on his blog and was desperately calling tech support to try to get them to help hide his error?  Apparently tech support came through, though, because the version you will see if you follow the above link has memory-holed the evidence suggesting he did not even read the study (though you will notice that the link I gave still has “gum” in the URL, but now redirects to the new page where the URL has “patch” in it).

So that is all quite hilarious.  But don’t let it distract you from the main message.  We need to focus on the real sources of knowledge about THR and not buy into a research paradigm that is — often literally — designed to hide THR’s clear successes and benefits.  When e-cigarette advocates embrace studies with bad methods and misleading results (even if they seem to be “good” results), rather than objecting to the bad approach, it hurts the cause.  In this case, even the “good” study can be spun against the truth about THR.