by Carl V Phillips
Stanton Glantz is at it again, publishing utter drivel. Sorry, that should be taxpayer-funded utter drivel. The journal version is here and his previous version on his blog here. I decided to rewrite the abstract, imagining that Glantz had stayed in the field he apparently trained in, aerospace/mechanical engineering. (For those who do not get the jokes, read on — I explain in the analysis. Clive Bates already explained much of this, but I am distilling it down to the most essential problems and trying to explain them so the reasons for them are apparent and this is not just a battle of assertions.)
Aircraft materials strength: a systematic review and meta-analysis
Sara Kalkhoran(*), MD, Prof Stanton A Glantz, PhD
Published Online, Lancet Materials Engineering(**): 14 January 2016
Travelers increasingly use airplanes for many reasons, including as a supposedly safer alternative to motor vehicle travel, and to smuggle drugs and travel to places they should not go. We aimed to assess whether flying in airplanes is actually safer, irrespective of travelers’ motivation for using them.
Databases of journal publications from the 20th century were searched between April 27, 2015, and June 17, 2015 for any reports related to aircraft materials failures. All information published in other venues and real-world statistics were ignored. Chances of aircraft materials failing were assessed using a random effects meta-analysis. A modification of the ANSI-2538 tool was used, though we ignored the standards that recommended not using it for applications such as this. This meta-analysis is registered (i.e., we filled out a form).
380 studies (of 5770 studies identified) were included in the systematic review. On average, airplane materials were found to be incapable of supporting passenger jet airframes. We did some weak statistical stuff that we are not going to really explain or justify, and the results remained true.
As currently being used, airplanes are not safe.
Detroit Institutes of Health, Carnival Cruise Research Foundation, FDA Center for Tobacco Products(***).
(*She is Glantz’s coauthor on the drivel, and so also is at fault.)
(**Think how lucky we are that there is no such journal.)
(***They fund him now, even though he has never made any apparent contribution to health research, so we can only assume they would have funded him if he had stayed in engineering.)
This “study” has been thoroughly trashed since Glantz first published it a year ago. Most of the criticism has been valid, though it often fails to emphasize the three key points:
1. Meta-analysis is a dubious method under the best of circumstances, and the present circumstances are not even close to good enough. The best circumstances are highly-controlled clinical trials of medical treatments (drugs or procedures) that are always almost exactly the same, administered to people who basically represent the same clearly-defined population (anyone who has disease X), with the same outcome measured (recovery, survival, etc.). Under those circumstances, a lot of small studies can be combined as if they were all part of one big study without causing too many problems.
Even for medical experiments there are often major differences in the exposure that make this combining inappropriate (with an example being clinical trials of smoking cessation). But as soon as you move on to observational studies, it is almost guaranteed that differences among studies make it wrong to pretend that they were just all one big study of exactly the same phenomenon, which was divided into a bunch of smaller datasets (that being the underlying fiction behind this type of analysis). The ANSI reference above is a substitute for Glantz’s reference to the Cochrane meta-analysis process, despite the Cochrane guidelines clearly saying that what Glantz did was inappropriate.
Notice that the studies included in the airplane version of the abstract would have varied hugely. Some would be legitimately useful analyses of the strength of relevant materials under realistic conditions. Some would be about wood. Some would be about primitive systems from the 1920s. Some would be about what happens only under extreme conditions. Some would be about the very rare failures that caused an engine to fall off. Each would measure a different outcome. It is no less absurd to combine whatever statistics each of those produced than it is to combine results from studies that fall under the incredibly broad umbrella of “studies of e-cigarette users quitting smoking”.
(Sidebar for those seeking to understand a bit more: It is useful to know that a proper descriptor of these studies is “synthetic meta-analysis” — that is, the results are synthesized into a single result. “Meta-analysis” alone just means “analysis of analyses”. There are many valid analyses that can be done of previous studies, even when those studies are very heterogeneous, including analytic reviews and statistical analyses of their characteristics to try to explain differences in their results (called “comparative meta-analysis”). Indeed, in the present case, a comparative meta-analysis would have revealed the problems noted in point 2, identifying the characteristics of the faulty studies that drove the results.)
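To see why heterogeneity makes a synthetic pooled number meaningless, consider the standard diagnostic that a competent meta-analyst would run first: Cochran’s Q and the I² statistic. The sketch below uses invented effect estimates (log odds ratios and variances are made up for illustration, not taken from any real study) to show how studies that flatly disagree with each other still yield a tidy pooled average, while I² near 100% tells you the pooling was illegitimate.

```python
# Hypothetical log-odds-ratio estimates with their variances, from five
# imaginary "studies" that plainly disagree -- all numbers are invented
# for illustration only.
effects = [-0.9, -0.5, 0.1, 0.7, 1.2]
variances = [0.02, 0.03, 0.02, 0.04, 0.03]

# Inverse-variance (fixed-effect) pooling: the machine happily produces
# a single number no matter how discordant the inputs are.
weights = [1 / v for v in variances]
pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Cochran's Q measures disagreement among the studies beyond what
# sampling noise would produce; I^2 expresses it as a fraction.
# I^2 near 100% means the studies are not measuring "the same thing",
# so the pooled number describes nothing.
q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q)

print(f"pooled estimate: {pooled_fixed:+.2f}")
print(f"I^2 = {i_squared:.0%}")  # close to 100%: pooling is meaningless
```

The point is not that I² rescues the method; it is that even the method’s own internal warning light would be flashing red for a collection as heterogeneous as “studies of e-cigarette users quitting smoking”.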
This exercise is just utter nonsense from start to finish. That is sufficient to dismiss the whole thing. It does not even matter what studies were included. It does not even matter how the final statistic compares to real-world evidence. No one with a modicum of understanding about scientific methods would do, publish, or endorse this study. You can quit reading now if you want.
2. The studies in the collection that drag down the average, and thus produce the result, suffer from such huge selection bias that they cannot possibly be considered valid. The riff on this in the above abstract is collecting the studies where aircraft materials did fail for some reason, treating them as if they were representative of non-failure conditions. It is a bit more complicated than this, but that is actually a pretty good analogy.
I should note that it is not fair to all of the original authors to describe the studies as biased. In many cases, they did not claim their study could estimate the effect of e-cigarettes on smoking cessation. The selection bias was created by Glantz when he interpreted the data as a measure of that. For at least one of those studies [Update: for at least two of them], the authors specifically responded to Glantz interpreting their results that way, pointing out it was not a valid interpretation. For several of the studies, other authors had pointed this out to him. Thus it is clear that Glantz was misinterpreting the data willfully, rather than accidentally.
Selection bias occurs when researchers select a study population that has different rates of exposure-outcome combinations than the population they are trying to assess. An example that should be familiar to readers is surveying a population of vaping enthusiasts and then trying to assess what portion of people who have used e-cigarettes have quit smoking. Obviously this is not going to work, because you have selected a population that is disproportionately those who did quit; people who tried e-cigarettes and failed to quit, or who use e-cigarettes only casually, are unlikely to be in the survey. Thus such a survey may find that 90% of subjects who became e-cigarette users have quit smoking, even though we know it is closer to one-third of that in the population.
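The arithmetic behind that enthusiast-survey example can be made concrete. The sketch below uses assumed numbers (population size and response rates are invented for illustration) to show how differential willingness to respond manufactures a 90% quit rate out of a true rate of 30%.

```python
# Hypothetical numbers, invented for illustration: a population of
# 1000 smokers who tried e-cigarettes, of whom 30% actually quit.
population = 1000
true_quit_rate = 0.30
quitters = population * true_quit_rate       # 300
non_quitters = population - quitters         # 700

# Assumed response rates: quitters-turned-enthusiasts are far more
# likely to be on the forum and answer the survey than people who
# tried e-cigarettes and gave up.
p_respond_quitter = 0.30
p_respond_non_quitter = 0.015

# Expected composition of the survey sample (fractional counts are
# fine here -- this is expected-value arithmetic, not a simulation).
sampled_quitters = quitters * p_respond_quitter              # 90
sampled_non_quitters = non_quitters * p_respond_non_quitter  # 10.5

observed_quit_rate = sampled_quitters / (sampled_quitters + sampled_non_quitters)
print(f"true quit rate:   {true_quit_rate:.0%}")
print(f"survey estimate:  {observed_quit_rate:.0%}")  # ~90%
```

No amount of sample size fixes this; the distortion comes entirely from who gets into the sample, not from how many.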
Often selection bias is caused by selecting (or defining) the exposed and unexposed groups in different ways, thereby directly creating the biased results. One particular kind of such bias, known as “immortal person-time bias”, occurs when an advantage for one group or the other (in terms of having the outcome of interest) is inherently built into the exposure or its definition. It is a bit subtle in many cases (though not the present one), but it is really fairly intuitive.
Consider a study that tries to assess whether treatment at a tertiary care center improves outcomes for people with Ebola. It compares their survival time to the average person who got Ebola. It should be obvious that those who got tertiary care are a biased sample of everyone who contracts Ebola disease: they lived long enough to get out of the jungle, to a field hospital, and be transported to a tertiary care center. Thus part of their survival time was literally immortal — if they died during that time, they would not be part of the group exposed to tertiary care — and their survival after that exposure is further biased by the fact that they were healthy enough to reach that point. (Triage probably introduces an additional layer of selection bias, since those who are clearly doomed will not get to make the last leg of that journey.)
For the studies that Glantz is so enamored with misinterpreting, the exposed group (e-cigarette users) were selected from those who had “survived” as smokers for a while after trying e-cigarettes. That is, everyone in the study was a current smoker, and the exposure was defined retrospectively (they had already tried e-cigarettes). Their period of smoking after trying e-cigarettes but before being studied was “immortal”, since if they had quit they would have been excluded. (It is a bit different from the Ebola case, since they would not then be put in the comparison group as failure cases, which would further exacerbate the bias, but half the problem is still there.)
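The mechanics of that immortal-time selection can be shown in a toy simulation. Everything below is an assumption made for illustration (the quit propensities, the doubling effect, the giving-up behavior) — it is not a model of any actual study. In this toy world, trying e-cigarettes genuinely helps people quit; yet a study that enrolls only those still smoking a year after trying them, as the studies Glantz pooled effectively did, measures the opposite.

```python
import random

random.seed(0)

# A toy world where trying e-cigarettes genuinely HELPS; all numbers
# are invented assumptions, not estimates from any real study.
N = 200_000

def simulate(tries_ecig):
    """Simulate one smoker over 24 months; return quit month, or None."""
    base = random.betavariate(1, 30)  # heterogeneous quit propensity
    for month in range(24):
        p = base
        # Assumption: e-cigarettes double the monthly quit chance, but
        # triers who have not quit by month 12 give up on them (cf. the
        # point that the exposure definition selects committed smokers).
        if tries_ecig and month < 12:
            p = min(2 * base, 1.0)
        if random.random() < p:
            return month
    return None

ecig = [simulate(True) for _ in range(N)]
none = [simulate(False) for _ in range(N)]

def study_quit_rate(quit_months):
    """The flawed design: enroll only those still smoking at month 12
    (the 'immortal' period), then count quits during months 12-23."""
    enrolled = [m for m in quit_months if m is None or m >= 12]
    quits = sum(m is not None for m in enrolled)
    return quits / len(enrolled)

def true_quit_rate(quit_months):
    """What actually happened over the full 24 months."""
    return sum(m is not None for m in quit_months) / len(quit_months)

print(f"study sees:  e-cig {study_quit_rate(ecig):.1%} "
      f"vs none {study_quit_rate(none):.1%}")  # e-cig looks worse
print(f"truth:       e-cig {true_quit_rate(ecig):.1%} "
      f"vs none {true_quit_rate(none):.1%}")   # e-cig is better
```

The study arm sees a lower quit rate among e-cigarette triers even though the truth is a substantially higher one, because everyone who succeeded before enrollment was defined out of the exposed group, and those who remained were selected for low quit propensity.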
In a press release put together by some British academics to condemn the Glantz paper, Peter Hajek presents this nice analogy when arguing what is basically the same as my point 2:
Imagine you recruit people who absolutely cannot play piano. There will be some among them who had one piano lesson in the past. People who acquired any skills at all are not in the sample, only those that were hopeless at it are included. You compare musical ability in those who did and those who did not take a lesson, find a difference, and report that taking piano lessons harms your musical ability. The reason for your finding is that all those whose skills improved due to the lessons are not in the sample, but it would not necessarily be obvious to readers.
Now it turns out that in the e-cigarette case or the piano case, it might still be that the exposed group does better. That is, it is quite possible that those who took a few piano lessons in the past, but have forgotten all they learned, are still better at music than those who never did. So they would still outperform the no-lesson group on whatever music skills test was conducted, even with the successful piano learners excluded. It is possible that current e-cigarette users who still smoke are more likely to quit over a given period than smokers who are not using e-cigarettes. Indeed, I would predict that is the case on average. However, it turned out not to have been the case in the particular populations in the studies Glantz misrepresents; there were other factors that made those study populations unrepresentative of the whole population of smokers. But the point is that even if the exposed group still had a higher rate of the outcome (i.e., even if e-cigarette users had quit smoking at a higher rate going forward, despite them being selected because they had an immortal history of continuing to smoke after trying e-cigarettes), the result would be biased downward by the removal of those who already succeeded.
Even worse, as was the case for some studies, if the exposed group includes those who tried e-cigarettes but gave up on them, it is now basically just a way of selecting those who are most committed to continuing smoking. The exposure definition does not merely exclude those who succeeded, but it selects those who are less likely than average to quit smoking. It would be like limiting the piano-lesson sample to those who took lessons for years but still could not play Twinkle twinkle little star even then, or restricting the Ebola treatment to those healthy enough to walk to the hospital under their own power.
Meta-analysis is a great tool for hiding the fact that some of the input studies had fatal biases. (Or that the studies were fine for what they really were, but are a hopelessly biased measure for the particular question.) It is safe to say that Glantz made sure that these biased results would dominate the final results of his statistical chicanery before he presented it.
3. The results contradict the real world. We know, from all sorts of robust evidence, that smokers who start using e-cigarettes are quite likely to quit smoking. Millions have done so.
Thus, the arithmetic that makes Glantz’s claim implausible is simple: Very conservatively assume that there are 9 e-cigarette users who did not fully switch from smoking for every 1 who did (it is probably more like 3-to-1 if we define “user” meaningfully). Very liberally assume that each of those e-cigarette users would have had a 10% chance of quitting smoking during the study period if he never tried e-cigarettes. Also assume that of the 90% who did not switch completely, none successfully quit smoking by any means; that is, assume using e-cigarettes completely eliminates your chance of quitting if you do not manage to switch completely. The result even under these absurd assumptions is a wash — there is no net reduction in smoking cessation. The assumptions about these numbers would have to be even more extreme (i.e., more wrong) for it to be possible for e-cigarette use to reduce smoking cessation at all, let alone by the 28% Glantz claims.
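That back-of-envelope calculation, spelled out per 10 smokers who tried e-cigarettes, using exactly the deliberately unfavorable assumptions above:

```python
# The arithmetic from the paragraph above, per 10 smokers who tried
# e-cigarettes, under assumptions deliberately stacked AGAINST e-cigs.
triers = 10
switched = 1                      # conservative: only 1 in 10 fully switched
counterfactual_quit_prob = 0.10   # liberal: their quit chance without e-cigs

# Worst case for e-cigarettes: none of the 9 non-switchers ever quits
# by any means.
quits_with_ecigs = switched + 0
quits_without_ecigs = triers * counterfactual_quit_prob  # 1.0

net_effect = quits_with_ecigs - quits_without_ecigs
print(f"net change in quitting: {net_effect:+.1f}")  # +0.0 -- a wash
```

Even with every assumption bent against e-cigarettes, the worst you can get is zero net effect; a 28% reduction in cessation is arithmetically out of reach.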
Of course, the reality is that e-cigarette users quit smoking at a much higher rate than average, and there is no reason to believe that those who have not fully switched have somehow become less likely to quit. Even if a study is not laughably invalid, as Glantz’s was, if it reaches conclusions that are flatly contradicted by what happens in the real world, it must be wrong. The aircraft materials study could not possibly show that air travel is not safe, because air travel is safe. A real scientist who attempted an honest study and found the results contradicted reality would respond by trying to explain what went wrong, which might be informative about something. They would not claim that reality is overruled by their result.
Of course, the problem is that the Tobacco Wars are not fought based on real science or honesty. That is the fundamental fact that needs to be realized. Glantz is not honest, and he thrives doing fake science because he works in a realm where junk science is the norm, and goes unpunished. Several commentators have expressed amazement that this junk was published in a health journal. But such amazement is either delusional or self-serving. Tons of papers that are published in academic public health journals or medical journals are patently junk, and quite a few are fully as bad as this one. So long as the cult-like belief in the academic journal process persists — so long as those who decry what it produces fail to take the next logical step and make clear that journal publication is not an assurance of quality — the problem of propagandists using that process to their advantage will be undiminished.
There is a general need for a higher level of discourse. Shooting at such a target-rich environment as Glantz’s writings invites sloppiness. Clive’s rebuttal (which is mostly quoted from Legacy) is solid, but it and some other critiques spend most of their time on tangents and details rather than just homing in on the fatal flaws (see my comment on Clive’s blog). The academic commentators properly noted that (a simplified version of) point 3 is sufficient to show the result is wrong. But they argued this based on smoking cessation rates not going down as e-cigarettes have become popular (which would only be valid if e-cigarettes were the only factor that might be affecting smoking cessation rates), rather than doing the simple math that really supports it. They also failed to note that point 1 is also a fatal flaw, perhaps because doing so means they would have to admit that their favorite Cochrane meta-analysis, of e-cigarette-based smoking cessation trials, suffers from the same problems (not nearly to the same degree, but enough that it too is meaningless). Several commentators have referred to the selection bias problem I explain above as “confounding”, but confounding is a fundamentally different issue, and it probably cuts the other way.
(Sidebar: Confounding is when the population itself, not just the sample the researcher selects, has different outcomes in the exposed and unexposed groups that are not caused by the exposure. For example, smokers who try e-cigarettes with the intention of quitting smoking would probably be more likely than average to quit smoking soon even if e-cigarettes did not exist — they were trying, after all. So if we just looked at the quit rates of everyone as they tried or did not try e-cigarettes, the result would have a confounding bias that overestimates the benefits of e-cigarettes.)
I have to say that I watch with some dismay as those who battle the junk science from a scientific perspective let Glantz and his ilk define the terms of engagement, and thus offer replies that do not sharply cut to the heart of the problem. I imagine Clive and others seething, having read to this point, annoyed that I care that someone has made arguments that sound pretty good, but are not actually valid, or has mischaracterized selection bias as confounding, even though its implications are the opposite. “We have to counter soundbite propaganda with soundbite propaganda!” Well, yes. We have to take nibbles wherever we can.
But I find myself reminded of a particular style of high school and college debate competitions from when I was a kid (which I believe has, fortunately, faded away). Ostensibly a policy recommendation had to be supported or defended. But the main rule was that reading an index card with any claim someone had written somewhere made the assertion true for purposes of the game, unless the other side had a contradictory claim in their files they could read. Pointing out that the claim was utterly absurd was not an option, nor was constructing a logical argument. The debate around e-cigarette science, like most debate in public health, has degenerated into a similar stylized game of arbitrary rules. Those games bear the same resemblance to real policy analysis and real science that chess does to real warfare. They are cartoon versions that might be fun, and are a lot easier than the real thing, but nothing more. You might think that it is sufficient to be able to show that the results of a study cannot possibly be true, given what we know about the real world. But the rules of the public health game say that conclusions in published studies trump reality, as in the debate game. And it so happens that Glantz and company are among the best at the game, and have the resources of the U.S. government to build up their files of index cards. The only way to win is to not play, to say it is a silly exercise and that there is such a thing as real scientific reasoning, and this is not it. The present case was a rare case where some of those who play the game did just that, as in the Hajek example above.
But for the most part, Glantz gets to keep playing his game because his most prominent opponents still thrive within the game even though he beats them at it. His detractors who are trying to be heretics within “public health” (which seems impossible) are unwilling to simply concede that health journals will publish anything that basically looks like a research study; that would mean admitting that their whole enterprise is rotten to the core, and no longer being able to claim what they have written must be right because it is in a journal. They are unwilling to admit that entire methodologies and approaches in the field are demonstrably invalid; that would annoy funders and colleagues, and make it rather embarrassing to employ those approaches when they are convenient for supporting their own politics. Also, objecting to conclusions that do not really follow from the data might condemn about 99% of anti-ecig research, but also about 80% of pro-ecig research, and it would make the game so much harder to play.
“But, Carl,” I can hear people yelling at their computers, “you can’t be idealistic, calling for a scientific approach to public health. You just have to beat them according to the rules of the game.” All I can say is, how’s that working out for you?