by Carl V Phillips
A few months ago, Borderud, Li, Burkhalter, Sheffer, and Ostroff, from Memorial Sloan-Kettering Cancer Center, published a paper in the peer-reviewed journal, Cancer, that they claimed showed that using e-cigarettes did not help — and indeed hindered — attempts to quit smoking by cancer patients who enrolled in a smoking cessation program. The problem is that it showed no such thing. Instead, what it shows quite clearly is just how bad journal peer review really is in this field.
Shortly after the paper appeared in September, Brad Rodu mentioned to me that he had noticed something very strange about their Table 2 (note that most everything is paywalled, so I am providing download links to copies of the relevant material). We are not talking about some obscure detail here. He noticed that the table showed their main result to be exactly the opposite of what the text said it was. Upon reviewing the results, I shared his confusion about that point and noticed a slightly more subtle (but still extremely obvious) error: their entire result stemmed not from the data they did have, but from one obviously incorrect assumption about data they did not have.
We, along with Rodu’s colleague, Nantaporn Plurphanswat, wrote a letter to the journal (it appears in the journal here, you can read it here). It appeared along with an erratum (not paywalled) that says, “The authors discovered some errors regarding reference group labels in Table 2. The corrected table appears below.” Yeah, right, the authors discovered it. Rodu discusses this and presents some of the basic parameters of the study in his post on the matter.
Let’s think about what this means. There is no realistic chance that this error (inverting the labels in a table) was introduced in the typesetting phase. There is equally little chance that the authors introduced the error themselves between versions. This means that the version of the paper that was reviewed and approved by journal reviewers and editors contained an error such that the numbers in the table showed exactly the opposite of what they were claiming in the text.
Now you might suggest that failing to catch what is basically a typo is not the biggest failure of peer review, and you would obviously be right. On the other hand, this was not as simple as it might sound from what I have written. It was not one of those typos that you glance at and immediately correct in your head because it was obvious what the authors meant. It took me a while of mulling it over before I agreed with Brad’s assessment (reviewing my contemporaneous notes, I notice that I went back and forth on my assessment of what was going on because you have to parse the table title and notes, and the prose, to figure out if their endpoint is smoking or smoking cessation, among other things). Thus it was not something that simply went unnoticed because it was so obviously wrong that every reader mentally corrected it (not that it still should not have been corrected; and as an aside, in the erratum for another paper that appears immediately before this one, the authors apologize for inverting two of the tables — the journal does not apologize for any of these patent errors in its pages). Also recall that this was the main result, in the main results table that any reader would focus on, not some obscure minor point. Of course, journal reviewers are supposed to be looking at the minor points too, but we certainly know better than to believe that.
Given that they failed to notice this problem, it is no big surprise that reviewers and editors also failed to notice that not only was the main result misreported, but it was basically just made up by the authors. We mention that in our letter too, though the word limits in what passes for debate in the public health sciences are absurd (it takes 10 times as many words to debunk junk science as to write it, not 1/10th as many), so you have to read this post to understand the full extent of their errors.
If you look at their first result in Table 2 (you can see it in the erratum, the left-hand columns) they show that smokers in the cessation program who had used e-cigarettes (at all, within the previous 30 days) had a slightly higher smoking cessation rate than those who had not, with just under half of each group being smoking abstinent at the time of follow-up.
But we know — though the authors fail to point out — that older smokers who want to quit and try to switch to e-cigarettes are different from the average smokers who want to quit. They are likely to have tried and failed to quit using other methods. Most important, they are less likely to be among those who have just decided that they would genuinely prefer to just not smoke and merely need some focusing event to allow them to make the switch to abstinence, those we call Category 1 in our recent paper on the topic. People in that category are far more likely to successfully quit smoking than those who need some active assistance like e-cigarettes, which biases any comparisons about successful cessation by method, as we explain in our paper. Thus, the fact that those who were trying e-cigarettes did about as well — indeed a bit better — than those who did not feel like they needed such aid suggests that the e-cigarettes were helpful, though it is obviously not the strongest support for that claim that exists.
But now look to the right, the second set of columns. All of a sudden the authors are claiming that those who had recently used an e-cigarette were only half as likely to have quit smoking. This claim (which appears in the abstract of the paper) is rather different, to say the least. So how did data that showed no measurable difference morph into that? Via an absurd assumption.
Almost half the subjects who enrolled dropped out of the study. This always introduces major complications in interpreting the data. For example, we would expect that those dropping out of a cessation program and study would be either (a) those who successfully quit smoking and so had no reason to stick around or (b) those who decided it was a waste of time. Or both — you could imagine that a lot of the e-cigarette users had been through such programs before and found themselves saying “same old waste of time” but thanks to e-cigarettes (and the shock of being treated for cancer) they did quit smoking. If that were the case, the main results that showed nearly equal quit rates would be underestimating the benefits of e-cigarettes.
But it is worse than that. Far worse. And you would never know that from reading what the authors emphasized and discussed. If you comb through the text you discover that one-third of those who had not recently used an e-cigarette dropped out of the treatment program and study, but a full two-thirds of those who had used an e-cigarette dropped out. The latter drop-out rate basically invalidates any attempt to use this study to assess whether e-cigarette users quit smoking, which is what this paper was about. They simply do not know. They lost almost all of them before they could find out. Perhaps it would have been worthwhile for the authors to just report their data from this study, but the calculations they did based on it were entirely inappropriate; their study produced ignorance, but they dressed it up to look like knowledge.
Rather than admit that, or even merely do the inappropriate calculations based on what data they had, the authors decided to report a result based on the assumption that everyone who dropped out was still smoking. See any problem there? If every e-cigarette user had dropped out of the study, rather than just most of them dropping out, they would have concluded from their assumption that no one who uses e-cigarettes to try to quit ever succeeds. Their assumption would be indefensible even if the drop-out rates had been similar for the reasons noted above — it seems quite possible that the e-cigarette users dropped out because of success in quitting, not failure. But given that the drop-out rates were so radically different (something that the authors should not have buried, but rather should have tried to explain — it is more interesting than the mostly-missing outcome data they had) this assumption simply creates the reported result.
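To see how completely that one assumption drives the result, here is a minimal sketch in Python. The counts are hypothetical — the paper’s actual cell counts are not reproduced here — chosen only to mirror the rough shape described above: about a quarter of enrollees were recent e-cigarette users, two-thirds of them dropped out versus one-third of non-users, and just under half of the completers in each group had quit.

```python
# Hypothetical counts (NOT the paper's actual data), chosen to mirror the
# rough shape described in the post: 400 enrollees, 1/4 recent e-cig users,
# with the differential drop-out rates (2/3 of users lost vs 1/3 of non-users).
ecig_enrolled, non_enrolled = 100, 300
ecig_followed, non_followed = 33, 200   # after 2/3 vs 1/3 lost to follow-up
ecig_quit_obs, non_quit_obs = 16, 94    # "just under half" of completers quit

def quit_ratio(ecig_quit, ecig_n, non_quit, non_n):
    """Ratio of quit proportions, e-cig users relative to non-users."""
    return (ecig_quit / ecig_n) / (non_quit / non_n)

# (a) Observed data only: roughly equal quit rates.
rr_observed = quit_ratio(ecig_quit_obs, ecig_followed, non_quit_obs, non_followed)

# (b) The authors' assumption: every dropout was still smoking.
rr_all_smoke = quit_ratio(ecig_quit_obs, ecig_enrolled, non_quit_obs, non_enrolled)

# (c) The opposite assumption: every dropout had quit.
ecig_lost = ecig_enrolled - ecig_followed
non_lost = non_enrolled - non_followed
rr_all_quit = quit_ratio(ecig_quit_obs + ecig_lost, ecig_enrolled,
                         non_quit_obs + non_lost, non_enrolled)

print(f"observed completers only: {rr_observed:.2f}")
print(f"all dropouts smoking:     {rr_all_smoke:.2f}")
print(f"all dropouts quit:        {rr_all_quit:.2f}")
```

With these stand-in numbers, the ratio sits near 1.0 in the observed data, but the “everyone lost was still smoking” assumption alone drags it down toward one-half — the same data, with the headline result manufactured entirely by the choice of assumption.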
Needless to say, the peer-review process let the authors get away with all of this: they wrote a paper about e-cigarettes’ effects on quitting attempts even though they had such loss-to-followup that they should never have attempted it; they then reported and emphasized a result that was entirely an artifact of an extreme and unlikely assumption; they failed to highlight and analyze the radically different loss-to-followup rate, a far more interesting result than what they reported; and, indeed, they failed to analyze the importance of loss-to-followup and sample selection bias at all.
The authors’ excuse for making their absurd and radical assumption is the claim that this represents an “intention to treat” (ITT) analysis. They could not be more wrong. ITT refers to a data analysis option of comparing the outcomes for two groups in an experiment (RCT) based on what treatment they were assigned, not what treatment they actually got (if, e.g., they were assigned to take a course of a drug but refused to take it after having some side effects, they would be included in the treatment group rather than non-treatment group when calculating ITT statistics). There are reasons for doing such an analysis when analyzing experimental data (though there are also reasons for analyzing the actual treatment rather than the assignment — they answer different questions, so it is not as if one is right and the other is wrong). So, do you see how ITT applies to making assumptions about the outcomes for missing subjects in an observational study? Neither do I. There are no intentions to do anything in an observational study, and dealing with missing outcome data is unrelated to the options for dealing with cases where realized treatment differs from assigned treatment.
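For readers unfamiliar with the distinction, here is a toy sketch of what ITT actually is. The records are entirely hypothetical (nothing to do with this study): a made-up randomized trial where some subjects assigned the drug never took it. ITT compares by assignment; an as-treated analysis compares by what subjects actually did.

```python
# Toy randomized-trial records (hypothetical): each tuple is
# (assigned_to_drug, actually_took_drug, recovered).
records = [
    (True,  True,  True),  (True,  True,  True),
    (True,  False, False), (True,  True,  False),
    (False, False, False), (False, False, True),
    (False, False, False), (False, False, False),
]

def recovery_rate(rows):
    """Fraction of subjects in `rows` who recovered."""
    return sum(r[2] for r in rows) / len(rows)

# Intention-to-treat: group by what subjects were ASSIGNED.
itt_treated = [r for r in records if r[0]]
itt_control = [r for r in records if not r[0]]

# As-treated: group by what subjects actually DID.
at_treated = [r for r in records if r[1]]
at_control = [r for r in records if not r[1]]

print("ITT:       ", recovery_rate(itt_treated), "vs", recovery_rate(itt_control))
print("as-treated:", recovery_rate(at_treated), "vs", recovery_rate(at_control))
```

The two analyses give different numbers because they answer different questions (effect of being offered the treatment versus effect of taking it). Notice that neither one involves inventing outcomes for subjects who are missing from the data — which is what the authors actually did.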
We know that reviewers for medical journals are often just medics rather than scientists, and so are probably in over their heads when reviewing something that, like this, is not a simple RCT of medical treatments. Still, the ITT concept is solidly within the knowledge base of anyone who is qualified to review a paper on a simple RCT. So either the reviewers did not even read the text closely enough to notice this gaffe or they were not even that marginally qualified.
What is worse, the authors refer to their not-actually-ITT assumption — pretty much the most favorable assumption one could make about the missing data to support their anti-ecig claims — as “more conservative” than just analyzing the data they had. Seriously? A genuinely conservative assumption, thanks to those differential drop-out rates, would be to assume that everyone who was lost to follow-up had quit smoking. That would have produced the adjusted estimate that the e-cigarette group was twice as likely to have quit smoking as the others. To go really conservative, they could have assumed that all the e-cigarette users who dropped out had quit while all the others still smoked. While it is difficult to estimate the numerical results for that from what is reported, it would clearly favor e-cigarettes by even more than that twofold figure — something in the range of four times as likely.
[To get more technical: It is not entirely clear that the authors were intentionally lying about doing a conservative analysis, despite actually making an anti-conservative assumption that hugely favored their biases. Another explanation is that their understanding of research is limited to drug trials, where the object can be to show that the treatment works even if you make worst-case assumptions. In that case, unlike the present one, “conservative” means “if you are missing data, assume it is all whatever would make the treatment look worst”. It so happens that they did not even get that right — the more extreme assumption would be that all the lost e-cigarette users were still smoking but all the others had quit. But this is not conservative in the present context because this was not a study where it was appropriate to take an extreme hypothesis (“this does not work at all”) and see if the data is completely incompatible with it even under extreme assumptions. That approach is fine in some cases, so long as you do not (as they did) report the assumption-based results as if they were meaningful other than as extreme sensitivity tests. But you can never do that for a sloppy study like this, as evidenced by the fact that the assumptions simply created the grossly misleading results. The authors basically turned the sloppiness of their study (losing most of the subjects) into their reported result. This was a clear lie/error on the part of the authors that the peer-review process did nothing to correct.
It turns out that such non-conservative “conservative” assumptions have some relationship to the actual ITT concept (e.g., when analyzing by ITT, if the drug does cure the disease but many people stop using it because of side effects, then the analysis understates the effectiveness of the drug if taken, because those who stopped and did not get its benefits are counted against it). But it still should be completely obvious that it is not the same as the ITT concept.]
Also notable is the fact that 75% of the eligible subjects refused to enroll in the smoking cessation program. This introduces huge potential for selection bias which the authors do not acknowledge. The minimal proper response to this is to report whatever is known about those who did not enroll (age distribution, etc.) to see if they look even superficially similar to those who did. Almost certainly they did not, but we will never know because the authors hid this information and the reviewers failed to take the obvious step of telling them to add it. We do know that one-fourth of the subjects who enrolled in the stop smoking program had used an e-cigarette, at least once in the previous 30 days. This is so much higher than the population average that it suggests major selection bias.
Given all that, of course, it is no shock that the paper is rife with other problems. Reading the introduction, you would think that e-cigarettes were medicines whose role in the world is to be imposed on people, and people are just machines to be fixed. No real shock there in a paper by medics (see also), but it spills over into ignoring the overwhelming evidence about the role of e-cigarettes in the world in favor of a few inconsequential RCTs.
Once again, the lesson is that the journal review process does almost nothing to prevent errors, ranging from utterly invalid analyses to out-and-out misreporting in the text. Once again, that is why truth-seeking sciences circulate papers for real peer review, which looks like what appears in this post, rather than just churning them into journals based on the obviously non-useful pabulum that was submitted by the reviewers for Cancer.
[Update P.S.: This is somewhat tangential to the points of this post, but it is worth noting and does add to the content here.
It was called to my attention (h/t Gregory Conley) that Stanton Glantz called Borderud the “best study on e-cigs for cessation so far” and interpreted it as showing that they hurt cessation attempts. So, basically, he is impressed a study of an extremely non-representative population (high age; narrow geography; suffering from cancer and its treatment with all the effects that has), in an extremely unnatural setting, with biases from only 1/4 agreeing to enroll and 1/2 lost to followup, and that did not even really measure the effectiveness of e-cigarettes (it only measured if someone happened to have recently used one, not whether they were using them to try to quit; also note that it would miss those who already succeeded in quitting using e-cigarettes). And the result he highlighted was just made up. I did not go into most of that in this post because I wanted to focus on the bright-line errors of the authors, not the many ways that the study was uninformative about the world.
Glantz explicitly believed that what the authors did was an ITT analysis and showed no sign of noticing that the numbers did not match the prose, in addition to failing to understand the many reasons that the study is generally useless. This is the type of person who “peer-reviews” journal articles about tobacco — someone who cannot even recognize the most glaring errors. Presumably neither Glantz nor the authors had any inkling that perhaps a study of a few self-selected cancer patients in a clinical setting is not the best way to understand people’s consumer choices. Public health research in general is bad, but anti-tobacco “research” is worse still. The one thing that is accurate is that it does, in fact, undergo peer review — the reviewers genuinely are peers of the authors who have little expertise in how to do or understand research. Of course, people think “peer review” means “expert review” and do not see the truth that is hidden in plain sight.]
Is this sort of ‘research’ paper restricted to the area of tobacco, or is it indicative of wider problems? I am thinking of research into things like obesity, alcohol and the like.
That is a useful question. The answer is that basically across health sciences, many of these problems are quite common. So to take the examples from this case, it is common to find papers based on obviously faulty data, authors failing to recognize what is even interesting about their results, using statistical methods that they half (at best) understand and thus probably do wrong (and certainly over-interpret), ignoring selection biases, and publishing conclusions that do not follow from their data. Epidemiology is particularly bad, with most people doing what would basically be an acceptable (not good — merely acceptable) second-semester term paper in 1990 and that is clearly inadequate today — there have been a lot of advances in the field that are ignored by almost everyone doing it. Throw in an element of economics (behavior) and it gets far worse because the people doing this stuff know even less about that than they do about epidemiology.
The journal system is a large part of that problem, as I have noted. It fails to fix the problems and people are rewarded for putting any old crap on the pages of a journal. If there were a norm — as there is in real sciences — of circulating the paper for comments and real peer review, this would never happen. While those of us who can see the problems would obviously get to only a small fraction of the papers, it would change the norms about what is acceptable.
Now once you get into the areas populated by the puritanical busybodies, it gets worse. But mainly in the direction of writing conclusions (and introductions, and discussions) that do not follow from the data, in a consistent and predictable direction. The actual quality of the scientific methods used is every bit as bad as average for public health, but not necessarily all that much worse. You get quite a few more insight-free commentaries that are dressed up as research reports (they throw in some data analysis as an excuse for writing 2000 words about their personal opinions) because you have more opinionated people (and for the case of tobacco, journals that are dedicated to publishing exactly that). Also you get very little participation from the small minority of researchers who know what they are doing in tobacco (basically no one among the Tobacco Control people are competent scientists and others are scared to jump in, leaving just a few of us). You do get some participation from the 2% of workaday epidemiologists who are honest and basically know what they are doing for topics like alcohol and obesity, so the average quality is much higher than with tobacco.
I should add (see Junican’s comment) that anti-tobacco (and anti-sugar and anti-alcohol) has a lot more made-to-order falsehood than the rest of public health. The authors write their conclusions and then come up with some data analysis that is likely to produce something that is not altogether different from what the conclusions say (it is generally rather different though).
Can I get this straight? I have been looking at Table 2 with the two versions side by side. I see that what was flipped was the order of the ‘labels’ in the first column (or it could have been the data alongside the labels). So, in version 1, this (figures in the first column):
E-cig use: 0.95.
No ecig use: 1.00.
should have read:
No ecig use: 0.95.
Ecig use: 1.00.
And the figures in the second column, which read:
No ecig use: 1.
should have read:
No ecig use: 2.
Regarding the figures immediately above (second column), the original said that ecig use was twice as good as no ecig use, and the corrected version says that ecig use is only half as good as no ecig use.
Is that correct?
If it is correct, then I can understand why you pondered for some while, since both versions make you want to say, “Eh? What? The group which completed the course were just about equal, but the group which included the drop-outs was substantially different? That makes no sense.”
It makes you wonder if these error-strewn studies are being produced to order with the intention of confusing the issues.
Yes, that is correct. And, yes, it is rather confusing. Add to that their failure to make clear whether they are talking about endpoint = “cessation” or endpoint = “still smoking” (muddling those in places) and it is even less clear. They could have also fixed the first table by changing it so that the endpoint was “still smoking”.
I actually think this one was more incompetence of the authors (and reviewers and editors) than the intentionally misleading analysis we get from the UCSF crowd. It was not just the sloppiness, but also none of them apparently understanding why the analysis was wrong. The authors responded to our letter (I did not bother to go into that because it is irrelevant to the failure of peer-review) in a way that further suggested they did not even understand the problem. They are so far from able to do good analysis that they do not even understand their errors when someone explains them.
But many of them are definitely produced to order.
I breathed a small sigh of relief that I had, at least, got the right idea. I think that I also see that it does matter whether ecig use is ‘1’ or non-use is ‘1’.
I’m serious about ‘confusing the issues’. As you said, ten times more words are needed to refute some allegation than to make the allegation. Confusion has been around ever since the word ‘bluff’ was coined.
“no one among the Tobacco Control people are competent scientists”
Is that really true? If it is, it has obvious implications for policy advice to government.
I can’t think of any. There are some at the borderline (by both measures): they are still welcome at the anti-tobacco conferences, but are not really supporters of the party line if you look closely; and they do basically competent work, above average for public health, but still not really up to modern standards. But I can think of no one among the serious anti-tobacco people who does even pretty-good-quality science (let alone high-quality), and no one in the fringe-tobacco-control zone who does really high-quality science.
Now in fairness, most people doing pro-THR research fall short of what I would call modern standards for the field. But a decent portion are in that pretty-good range and a few are top scientific thinkers.
Australia has a publicly funded Pharmaceuticals Benefit Scheme as well as a reasonably generously funded public medical and health system. We also have a number of policing bodies that keep a lid on the natural tendency of pharma and related entities to seek subsidies for expensive drugs and/or treatments of doubtful cost benefit. I really pray that our ‘police’ do not rely on junk ‘research’ in making their deliberations.
Is there a way to determine what the impact of the drop-out rate is on the confidence of the comparison between the two treatments? Obviously, in this case, the high drop-out rate would substantially lower the confidence you would have in comparing the two. If only a small percentage dropped out, your confidence in the comparison would be much higher. Is there a mathematical way to calculate the level of confidence and if so, was it/could it be done?
Yes. There are simple calculations that can be done based on “if X% of the people with exposure E that we lost had the outcome, while Y% without E had the outcome” determining “how would our estimate differ from the true value”. Anyone who takes intermediate-level classes in a decent epidemiology program (which is to say, 0.1% of those writing epidemiology journal articles) learns that. I would guess there are web apps that let you just plug in those numbers.
As you note, if relatively few people were lost to follow-up then the magnitude of the adjustment would be much smaller. When you lose as many as they did in this study, the possible range of values is (rough calculation in my head) something like true RR could be anywhere from .2 to 5, rather than the .95 that they found in the data. The 2.0 they reported was just one possibility plucked from that range (not surprisingly, in the direction they favored politically).
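That kind of if-then sweep can be sketched in a few lines of Python. The counts here are hypothetical stand-ins with roughly the study’s shape (not the paper’s actual data), so the resulting range will not match the rough figures above exactly; the point is only that the possible values span both well below and well above 1.

```python
# Sensitivity sweep over the lost-to-follow-up subjects: for each assumed
# quit proportion among lost e-cig users (x) and lost non-users (y),
# recompute the risk ratio. Counts are hypothetical, chosen only to mirror
# the study's rough shape (2/3 vs 1/3 lost, ~half of completers quit).
ecig_followed, ecig_lost, ecig_quit = 33, 67, 16
non_followed, non_lost, non_quit = 200, 100, 94

def adjusted_rr(x, y):
    """Risk ratio if fraction x of lost e-cig users and y of lost non-users quit."""
    ecig_rate = (ecig_quit + x * ecig_lost) / (ecig_followed + ecig_lost)
    non_rate = (non_quit + y * non_lost) / (non_followed + non_lost)
    return ecig_rate / non_rate

# Evaluate over a grid of assumptions, 0% to 100% in steps of 10%.
grid = [i / 10 for i in range(11)]
ratios = [adjusted_rr(x, y) for x in grid for y in grid]
print(f"possible RR range: {min(ratios):.2f} to {max(ratios):.2f}")
```

Even with these made-up numbers the range runs from roughly one-quarter to well over two — any single point in that interval, including the one the authors reported, is just an arbitrary pick from a wide range of possibilities.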
If you want to translate those corrections into a distribution, rather than just a single if-then point, you need to use something like the methods I introduced to the field in my 2003 paper in Epidemiology. No one does that. There has been more attention to dealing with measurement error (Burstyn and others) and confounding than to selection biases.