Category Archives: Science lesson

Sunday Science Lesson: Debunking the claim that only 16,000 smokers switched to vaping (England, 2014)

by Carl V Phillips

When this journal letter (i.e., short paper), “Estimating the population impact of e-cigarettes on smoking cessation in England” by Robert West, Lion Shahab, and Jamie Brown, came out last year, most of us said “wait, wot?” The authors estimated that in 2014, about 16,000 English smokers became ex-smokers because of e-cigarettes (a secondary analysis offered 22,000 as an alternative estimate). But according to official statistics, that year saw an increase of about 160,000 ex-smokers who were vapers in the UK (the year-over-year increase for 2015 versus 2014). In addition, there were about 170,000 more ex-smokers who identified as former vapers. Since those former vapers are no longer counted among the ex-smokers who vape in 2015, they need to be added back. So after roughly adjusting for the different populations (England is 80% of the UK population), the year-over-year increase in English ever-vapers among ex-smokers appears to be nearly 200,000. Thus West et al. are claiming, in effect, that the vast majority of people who went from smoking to vaping did not quit smoking because of vaping.

My calculation is rough, and for several reasons it may be a bit high (e.g., the measured points in 2015 and 2014 demarcate a year that falls slightly later in calendar time than 2014 itself, and the rate of vaping initiation was increasing over time). But we are still talking about well over 100,000 new ex-smoker vapers. Probably closer to 200,000. So this would mean that about 90% of new ex-smoker vapers either would have quit smoking that year even without vaping, had quit tobacco entirely and only later took up vaping, or are not “real quitters” (i.e., they were destined to start smoking again before they would “count” as having quit, which is not well defined, though the authors seem to use one year as the cutoff). This seems rather implausible, to say the least.
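The “about 90%” figure is simple division, and it holds up across the range of plausible denominators. Here is a minimal sketch; the three denominators are illustrative points within the rough range given above (well over 100,000, probably closer to 200,000), not new data, and `share_not_credited` is a hypothetical helper name.

```python
# Reality-check sketch: what share of new ex-smoker vapers does West et al.'s
# 16,000 leave un-credited to vaping? Denominators are illustrative points in
# the rough range discussed in the post, not new data.
WEST_ESTIMATE = 16_000


def share_not_credited(credited: int, new_ex_smoker_vapers: int) -> float:
    """Fraction of new ex-smoker vapers who, by implication, did not quit because of vaping."""
    return 1 - credited / new_ex_smoker_vapers


for new_vapers in (100_000, 150_000, 200_000):
    share = share_not_credited(WEST_ESTIMATE, new_vapers)
    print(f"{new_vapers:,} new ex-smoker vapers -> {share:.0%} not credited to vaping")
```

This prints 84%, 89%, and 92% respectively, so the implied share stays in the vicinity of 90% wherever in that range the true number falls.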

This is an extraordinary claim on its face given what we know about the advantages of quitting by switching, and more so given that more detailed surveys of vapers (example) show almost all respondents believe they would still be smoking had they not found e-cigarettes. It must be noted that most respondents to those surveys are self-selected vaping enthusiasts who differ from the average new vaper, and that a few of them might be wrong and would have quit anyway. But the disconnect is still far too great for West’s weak analysis (really, assumptions) to come close to explaining.

I never bothered to comment on the paper at the time it came out because the methodology was so weak and the result so implausible that I did not think anyone would take it seriously. But the tobacco wars seldom meet a bit of junk science they do not like. In this case, Clive Bates asked me to examine the claim (and contributed some suggestions on this analysis and post) because some tobacco controllers have taken to saying “e-cigarettes only caused 16,000 people to quit smoking in England! so we should just prohibit people from using them!”

The proper responses to this absurd assessment and demand, in order of importance, are:

  1. It would not matter if they caused no one to quit smoking. It is a violation of the most fundamental human rights to use police powers to prohibit people from vaping if they want to. People have a right to decide what to do with their bodies. Moreover, in this particular case, you cannot even make the usual drug war claims that users of the product are driven out of their minds and do not understand the risks and the horrible path they will be drawn down: Vaping is approximately harmless, most people overestimate the risks, and it leads to no horrible path. It is outlandish — frankly, evil — to presume unto oneself the authority to deny people this choice.
  2. But even if you do not care about human rights and only care about health outcomes or whatever “public health” people claim to care about, causing a “mere” 16,000 English smokers to quit annually is quite the accomplishment. There is no plausible basis for claiming any recent tobacco control policy has done as much. Since there is no measurable downside, this is still a positive. Also, the rate of switching probably could be increased further with sensible policies and truthful communication of relative risks.
  3. The rough back-of-the-envelope approach used in the paper could never provide a precise point estimate even if the inputs were optimally chosen. But the inputs were not well chosen. The analysis included errors that led to a clear underestimate. When a back-of-the-envelope result contradicts a reality check, we should assume that reality got it right.

So I am taking up here what is really a tertiary point.

Back of the envelope calculations

West et al. carried out a back-of-the-envelope calculation, a simple calculation based on convenient approximations that is intended to produce a quick rough estimate. It happens to have glaring errors, but I will come back to those. Crude back-of-the-envelope calculations have real value in policy analysis. I taught students this for years. In my experience, when there is a “debate” about the comparative costs and benefits of a policy proposal, at least half the time a quick simple calculation shows that one is greater than the other by an order of magnitude. The simple estimate can illustrate that the debate is purely a result of hidden agendas or profound ignorance, and can also eliminate the waste of unnecessary efforts to make precise calculations.

When doing such an analysis, it is ideal if you get the same result even if you make every possible error as “conservative” as is plausible (i.e., err in the direction that favors the losing side of the comparison). West’s analysis would thus be useful if it were presented as follows: “Some people suggest that the health cost from vaping experienced by new vapers outweighs the reduction in the health cost from smoking cessation that vaping causes. Even if we assume that vaping is 3% as harmful as smoking, the total health risk of additional vapers (the annual increase) would be on the order of the risk for about 5000 smokers. Our extremely conservative calculation yields on the order of 20,000 smokers quitting as a result of vaping. So even with extreme assumptions, the net health effect is clearly positive.”

But the authors did not claim to be offering an extremely conservative underestimate for purposes of doing such a calculation. They implicitly claimed to be providing a viable point estimate. And that requires a more robust analysis rather than rough-cuts, and best point estimates rather than worst-case scenarios. It also requires a reality check about what would have to be true if the ultimate estimate were true, namely that almost everyone who switched from smoking to vaping did not stop smoking because of vaping.

West’s estimation based on self-identified quit attempts

The crux of their calculation is the following: Their surveys estimate that 900,000 smokers self-identify as having attempted to quit smoking using e-cigarettes (please read this and similar statistics with an implicit “in this population, during this period” and I will stop interjecting it). They then assume that 2.5% of them actually did quit smoking because of e-cigarettes.

Where does the 2.5% come from? It is cited to, and seems to be based mainly on, the results of the clinical trials where some smokers were assigned to try a particular regimen of e-cigarettes; the 2.5% is an estimate of how much higher their quit rate was than that of those assigned to a different protocol.

Before turning to the problems with using trial results, consider the second paper they cite as a basis for the 2.5% figure, one by their own research group. How they got from that paper’s results to 2.5% is unfathomable. That paper was a retrospective study of people who had tried to quit smoking using various methods and found that those reporting using e-cigarettes were successful about 20% of the time, which beat out the two alternatives (unaided and NRT) by 5 and 10 percentage points. If they had used ~20% instead of ~2.5% their final result would have been up in the range that would have passed the reality check. So what were they thinking?

I cannot be certain, but am pretty sure. It appears they only looked at differences in cessation rates and not the absolute rates, so the 5 or 10 rather than the full 20. Several things they wrote make it clear this is how they were thinking. This is one of several fatal flaws in their analysis. There are two main pathways via which e-cigarettes can cause someone to quit smoking (which means it would not have happened without them): E-cigarette use can cause a quit attempt to be successful when that same quit attempt would not have otherwise been successful, or it can cause a quit attempt (ultimately successful) that would not have otherwise happened. West et al. are pretty clearly assuming that the second of these never happens. I am guessing that the authors did not even understand they were making a huge — and clearly incorrect — assumption here.

Causing quit attempts accounts for a large portion of cases where e-cigarettes caused smoking cessation. Indeed, in my CASAA survey of vapers (not representative of all vapers, but a starting point), 11% of the respondents were “accidental quitters”, smokers who were not even actively pursuing smoking cessation, but who tried e-cigarettes and were so enamoured that they switched anyway. Add to these the smokers who had vague intentions of quitting but only made a concerted effort thanks to e-cigarettes and probably about half of all quit attempts using e-cigarettes do not replace a quit attempt using another method. So if half the 900,000 made the quit attempt because of e-cigarettes and 20% succeeded, we have, right there, a number that is consistent with the reality check I proposed.
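The arithmetic behind that last sentence, as a minimal sketch. The inputs are the rough figures from the paragraphs above (West et al.’s 900,000 attempts, the guess that about half were caused by e-cigarettes, and the ~20% absolute success rate from the retrospective paper); the constant names are illustrative.

```python
# Sketch of the rough alternative described above (assumed inputs, not new data):
# half of the self-reported e-cigarette quit attempts would not have happened
# otherwise, and those attempts succeed at the ~20% absolute rate from the
# retrospective study.
QUIT_ATTEMPTS_WITH_ECIGS = 900_000  # West et al.'s survey-based figure
SHARE_OF_ATTEMPTS_CAUSED = 0.5      # rough guess: "probably about half"
ABSOLUTE_SUCCESS_RATE = 0.20        # ~20% success among e-cigarette users in that paper

caused_quits = QUIT_ATTEMPTS_WITH_ECIGS * SHARE_OF_ATTEMPTS_CAUSED * ABSOLUTE_SUCCESS_RATE
print(f"Quits caused via new attempts alone: {caused_quits:,.0f}")
```

That prints 90,000, before counting any effect on the other half of the attempts.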

Of course they did not use that 20%, and it does seem too high. What they did was assume that 5% would have succeeded in an unaided quit attempt without e-cigarettes — and all the same people would have made that attempt — and so 7.5% (5%+2.5%) actually succeeded when using e-cigarettes. But if half never would have made that attempt then a full 7.5% of them should be counted as being caused to quit by e-cigarettes, which more than doubles the final result (“more than” because their final subtraction, below, would not double but should actually be reduced).
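To see why dropping the “everyone would have tried anyway” assumption roughly doubles the result, here is a minimal sketch of the pre-subtraction step. The rates are West et al.’s (5% unaided, 7.5% with e-cigarettes); the 50% split is the rough guess from the previous paragraph, and `attributable_quits` is a hypothetical helper, not anything from the paper.

```python
ATTEMPTS = 900_000
UNAIDED_RATE = 0.05  # West et al.'s assumed unaided success rate
ECIG_RATE = 0.075    # 5% unaided + 2.5% excess


def attributable_quits(attempts: int, ecig_rate: float, unaided_rate: float,
                       share_would_have_tried_anyway: float) -> float:
    """Quits credited to e-cigarettes, before West et al.'s final subtraction step.

    Smokers who would have tried unaided anyway are credited only with the excess
    (ecig_rate - unaided_rate); smokers whose attempt was itself caused by
    e-cigarettes are credited with the full ecig_rate.
    """
    anyway = attempts * share_would_have_tried_anyway
    caused_attempts = attempts - anyway
    return anyway * (ecig_rate - unaided_rate) + caused_attempts * ecig_rate


west_style = attributable_quits(ATTEMPTS, ECIG_RATE, UNAIDED_RATE, 1.0)
half_caused = attributable_quits(ATTEMPTS, ECIG_RATE, UNAIDED_RATE, 0.5)
print(f"{west_style:,.0f} vs {half_caused:,.0f}")  # 22,500 vs 45,000
```

The sketch stops before the final subtraction; as noted above, that step should also shrink once the assumption is dropped, which is why the corrected result more than doubles.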

As for why they did not use that 20%, I suspect (though they do not say) that when looking at the numbers from that paper, West et al. focused not only on the differences (the error I just discussed) but on the “adjusted” rates of how much more effective e-cigarettes were than the other methods, which were considerably lower than the numbers I quoted from the paper above. This too is an error. Public health researchers think of “adjusting” (attempting to control for confounding) as something you just do, a magical ritual that always makes your result better. This perception is false for many reasons, but a particularly glaring one in this case: The adjusted number is basically the measure of how helpful e-cigarettes would have been, on average, if those who tried to switch to them had the same demographics as smokers using other cessation methods. Smokers who try to switch to e-cigarettes have demographics that predict they are more likely to succeed in switching than the average smoker. Of course they do! People know themselves (a fact that seems to elude public health researchers). The ones who tried switching were who they were; they were not a random cross-section of smokers. So it seems that West et al. effectively said “pretend that instead of self-selecting for greater average success, those who tried to switch were chosen at random, and instead of using the success rate for the people who actually made that choice, we will use instead the number that would have been true if they were random.”

[Caveat: The attempt to control for confounding could also correct for the switchers having characteristics that make them more likely to succeed in quitting no matter what method they tried. So some of the “adjustment” is valid — but only for those who would have tried anyway — but much of it is not.]

Clinical trials

That last point relates closely to the other “evidence” that was cited as a basis for that 2.5% figure, and appears to have dominated it: the clinical trials.

Clinical trials of smoking cessation are useless for measuring real-world effects of particular strategies when they are chosen by free-living people. At best they measure the effects of clinical interventions. But in this case, these rigid protocols are not even a good measure of the effect of real-world clinical interventions in which smoking cessation counselors try to most effectively promote e-cigarettes by meeting people where they are and making adjustments for each individual. I have previously discussed this extensively.

A common criticism is that the trials directed subjects toward relatively low-quality e-cigarettes. That is one problem. More important, the trials did not mimic the social support that would come from, say, a friend who quit smoking using e-cigarettes and is offering advice and guidance. The inflexibility of trials does not resemble the process of trying, learning, improving, asking, and optimizing that real-world decisions entail. Clinical trials are designed to measure biological effects (and even then they have problems), not complex consumer choices.

But it is actually even worse than that. A common failing in epidemiology is not having a clue about what survey respondents really mean when they answer questions. There is no validation step in surveys where pilot subjects are given an open-ended debriefing of how they interpreted a question and what they really meant by their answer. (I always do that with my surveys, but I am rather unusual.) So consider what a negative response to “tried to quit smoking with e-cigarettes” really means. If a friend shoved an e-cigarette into a smoker’s hand and said “you should try this”, but she refused to even try it, she would undoubtedly not say she tried to quit smoking with e-cigarettes. But in a clinical trial, if that were her assignment, she would be counted among those who used e-cigarettes to try quitting, thus pulling down the success rate.

If she tried the e-cigarette that was thrust at her, but did not find it promising, chances are she would not say in a survey that she tried quitting using e-cigarettes. (She might, but given the lack of any reporting about piloting and validation of these survey instruments, we can only guess how likely that is.) If she passed that first hurdle, of not rejecting e-cigarettes straightaway, but used them sometimes for a few days or weeks, she might or might not say she tried quitting using e-cigarettes. But if she actually quit using e-cigarettes, she would undoubtedly count herself among those who tried to quit using e-cigarettes. I trust you see the problem.

It is the same problem that is common in epidemiology when you read, say, that 20% of the people who got a particular infection died from it. This usually means that 20% of the people who got sick enough from it to present for medical care and get diagnosed died, but countless others had mild or even asymptomatic infections. Everyone in the numerator (died in this case, quit in the case of e-cigarettes) is counted but an unknown and probably very large portion of those in the denominator (got the infection, were encouraged to try an e-cigarette) are not. Clinical trial results are (at best) analogous to the percentage you would get if you did antibody tests in the population to really identify who got the infection. This turns out to be the right way to measure the percentage of infected who die. But if you then applied that percentage to the portion who presented for medical treatment, you would be underestimating the number of them who would die. That is basically what West et al. did. Their 900,000 are those for whom e-cigarettes seemed promising enough to be worth seriously trying as an alternative, but they applied a rate of success that was (again, at best) a measure of the effect on everyone, including those who did not consider them promising enough to try.
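A toy illustration of that numerator/denominator mismatch, as a sketch. Every number below is hypothetical, chosen only to make the arithmetic easy to follow; none comes from the trials or the surveys.

```python
# Toy illustration of the numerator/denominator mismatch. Every number here is
# hypothetical, chosen only to make the arithmetic easy to follow.
encouraged = 1_000      # smokers handed an e-cigarette (trial-style denominator)
rejected_quickly = 600  # never seriously tried; would not report "tried to quit with e-cigarettes"
quits = 80              # successful switches, all from among the serious triers

serious_triers = encouraged - rejected_quickly  # survey-style denominator: 400

trial_rate = quits / encouraged       # 0.08: effect averaged over everyone assigned
survey_rate = quits / serious_triers  # 0.20: success among those who report trying

# Applying the everyone-assigned rate to the self-reported triers understates quits.
estimated_quits = serious_triers * trial_rate
print(f"estimated {estimated_quits:.0f} vs actual {quits}")
```

It prints “estimated 32 vs actual 80”: the same quits, divided by a larger denominator and then multiplied back by a smaller one, shrink by the ratio of the two denominators.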

This would be a fatal flaw in West’s approach even if the trials represented optimal e-cigarette interventions, providing many options among optimal products, and the hand-holding that would be offered by a knowledgeable friend, vape shop, or genuine smoking cessation counseling effort. The trials did not, and so they underestimated even what they might have been able to measure.

Final step

As a final step, West et al.’s approach debits e-cigarettes with an estimated decrease in the use of other smoking cessation methods caused by those who tried e-cigarettes instead. These are the methods that are believed to further increase the cessation rate above the unaided quitting that West debited across the board (the major error discussed above). We can set aside deeper points about whether estimates of the effects of these methods, created almost entirely by people whose careers are devoted to encouraging these methods, are worth anything. West et al. assume that those methods would have had average effectiveness had they been tried by those who instead chose vaping. They also still assume that every switching attempt would have been replaced by another quit attempt in the absence of e-cigarettes, as discussed above. This lowers their estimate from 22,000 to 16,000. But a large portion of smokers who quit using e-cigarettes do so after trying many or all of those other methods, often repeatedly. Assuming those methods would have often miraculously been successful if tried one more time makes little sense.

As a related point that further illustrates the problems with their previous steps, recall that the 2.5% is their smoking cessation rate in excess of that of those who tried unaided quitting or some equivalently effective protocol. But it seems very likely that the average smoker who tries to switch to e-cigarettes has already had worse success with that other protocol than has the average volunteer for a cessation trial. This is the “I tried everything else, but then I discovered vaping” story. I am aware of no good estimate for this disparity, but if the average smoker who tried to switch were merely 1 percentage point less likely than average to succeed with the other protocol (e.g., because she already knew that it did not work for her), then the multiplier should have been 3.5% (7.5%-4% rather than 7.5%-5%). This is trivial compared to the error of using the incredibly low estimated success rate suggested by the trials in the first place, of course, but that little difference alone would have increased West’s estimate by 40%. This illustrates just how unstable and dependent on hidden assumptions that estimate is, even apart from the major errors.
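The 40% figure can be checked directly. A minimal sketch, using West et al.’s 7.5% e-cigarette success rate and the hypothetical 1-point lower unaided rate from the paragraph above; `excess_rate` is an illustrative helper name.

```python
# Sensitivity of the e-cigarette "excess" success rate to the assumed unaided rate.
ECIG_SUCCESS = 0.075  # West et al.'s 5% unaided + 2.5% excess


def excess_rate(unaided_success: float) -> float:
    """Success rate credited to e-cigarettes above the assumed unaided rate."""
    return ECIG_SUCCESS - unaided_success


west_excess = excess_rate(0.05)      # 2.5 percentage points
adjusted_excess = excess_rate(0.04)  # 3.5 points if switchers were 1 point worse at unaided quitting

print(f"increase in final estimate: {adjusted_excess / west_excess - 1:.0%}")  # 40%
```

Because the final estimate scales linearly with this multiplier, a one-point change in an unmeasured input moves the result by 40%.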

Returning to the reality check

But lest we get lost in the details, the crux is still that West implicitly concluded that the vast majority of those who switched from smoking to vaping did not quit smoking because of vaping. The authors never reflect on how that could possibly be the case. They do, however, offer an alternative analysis, in what are effectively the footnotes, that gives the illusion of responding to this problem without actually doing so. They write:

The figure of approximately 16,000–22,000 is much lower than the population estimates of e-cigarette users who have stopped smoking (approximately 560,000 in England at the last count, according to the Smoking Toolkit Study). However, the reason for this can be understood from the following….

What follows is even weirder than their main analysis.

West’s “alternative” analysis

They actually start with that 560,000. That is inexplicable since it is possible to estimate the year-over-year change in 2014, as I did, rather than working with the cumulative figure. The 560,000 turns out to be well under half what you get if you add the current vapers and ex-vapers among ex-smokers from the statistics I cite above. So their number already incorporates some unexplained discounting from what appears to be the cumulative number. But since I am baffled by this disconnect, I will just leave this sitting here and proceed to look at what they did with that number.

As far as I can understand from their rather confusing description of their methods here, their first step is to eliminate those who were already vaping by 2014, and thus did not switch in 2014. That makes sense, though it would have been easier to just start with that. When they do this, they leave themselves with 308,000. So they started with something much lower than what you get from the statistics I looked at, and ended up with something that is half-again higher than the rough estimate from those statistics. Um, ok — just going to leave that here too. But the higher starting figure makes it even more difficult for them to explain away the reality check.

Their next step is the only one that seems valid. They estimate that 9% of ex-smokers who became vapers did so sometime after they had already completely quit smoking, and subtract them. This is plausible. An ex-smoker who is dedicated to never smoking again still might see the appeal of consuming nicotine in a low-risk and smoking-like manner again. (Note that this should be counted as yet another benefit of e-cigarettes, giving those individuals a choice that makes them better off, even though the “public health” types would count it as a cost because they are not being proper suffering abstinents. It might even stop them from returning to smoking.)

Of course, this only makes a small dent. So where does everyone else go? Most of them go here:

It has to be assumed on the basis of the evidence [6, 7] that only a third of e-cigarette users who stopped smoking would not have succeeded had they used no cessation aid

…and here:

It is assumed that, as with other smoking cessation aids, 70% of those recent ex-smokers who use e-cigarettes will relapse to smoking in the long term [11]

This takes them down to 28,000.
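The chain of reductions can be reproduced as a minimal sketch, assuming the three steps are applied multiplicatively in the order described (the paper’s exact arithmetic may differ). The inputs are the 308,000, 9%, one-third, and 70% figures quoted above.

```python
START = 308_000  # West et al.'s figure after removing those already vaping before 2014

after_prior_quitters = START * (1 - 0.09)          # drop the ~9% who quit smoking before vaping
after_counterfactual = after_prior_quitters / 3    # "only a third ... would not have succeeded"
after_relapse = after_counterfactual * (1 - 0.70)  # assumed 70% long-term relapse

print(f"{after_relapse:,.0f}")  # 28,028
```

Nearly all of the shrinkage comes from the last two multipliers (one third, then 30%), which together discard 90% of what remains; the further 6000 they subtract at the end lands them near the top of their 16,000–22,000 range.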

Taking the latter 70% first, any limitations in relying on a single source for this estimate (another West paper) are overshadowed by: (a) There is no reason to assume switching to vaping will work as poorly, by this measure, as the over-promising and under-delivering “approved” aids that fail because they do not actually change people’s preferences as promised. Indeed, there is overwhelming evidence to the contrary. (b) Many of those in the population defined by “started vaping that year and were an ex-smoker as of the end of the year” have already experienced a lot of the “long term”. That is, if we simplify to the year being exactly calendar 2014, some people joined that population in December, and thus a (correct, undoubtedly much lower than 70%) estimate of the discounting between “smoking abstinent for a week or two thanks to e-cigarettes” and “abstinent at a year” (a typical measure for “really quitting” as noted above) is appropriate. But some joined the population in January and are already nearly at the long term. On average, they will have been ex-smokers for about six months, and being abstinent at six months is a much better predictor of the long run than the statistic they used (which, again, is wrong to apply to vaping). Combining (a) and (b) makes it clear that this is a terrible estimate.
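The “about six months” follows from a simple averaging argument. A minimal sketch, assuming new ex-smoker vapers join evenly across the 12 months and each joins mid-month (a simplification, since the text notes the rate of vaping initiation was rising over time).

```python
# Assumes joiners are spread evenly across the 12 months and each joins mid-month.
# January joiners have been abstinent ~11.5 months by year-end, December joiners ~0.5.
months_abstinent_at_year_end = [12 - (month + 0.5) for month in range(12)]
average = sum(months_abstinent_at_year_end) / len(months_abstinent_at_year_end)
print(average)  # 6.0
```

If initiation was accelerating through the year, the true average would be somewhat below six months, but still far from the “just quit” baseline the 70% relapse figure describes.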

As for the first of those major reductions, references 6 and 7 do not actually provide any reason that “only a third…has to be assumed”. Those are the same references they cite for the 2.5% above. So this is just a reprise of the 2.5% claim, and suffers from the same errors I cited above.

You see what they did there, right? The reality check I offered is “your results imply that 90% of new ex-smoker vapers did not quit because of vaping; can you explain that?” Either anticipating this damning criticism or by accident, they provided their answer: “Yes, we assume — based on nothing that remotely supports the assumption — that 70% of them would have quit anyway (and 9% were already ex-smokers, and some other bits).”

This step basically sneaks in the same fatal assumptions from their original calculation but is presented as if it offers an independent triangulation that responds to the criticism that their original calculation has implausible implications. Here is a pretty good analogy: Someone measures a length with a ruler that is calibrated wrong by a factor of ten. They are confronted with the fact that a quick glance shows that their result is obviously wrong. So they make a copy of their ruler and “validate” their results with an “alternative” estimation method.

Oh, and at the end of this they knock off another 6000 using what appears to be double counting, but at this point who really cares?


Their first version of the estimate is driven mainly by their assumption that attempting to switch to vaping is close to useless for helping someone quit smoking compared to unaided quitting, and also that all those who attempted to switch would have tried unaided quitting in the absence of e-cigarettes. There are also other errors. Their second version is based on the “reasoning” that because we have assumed that attempting to switch to vaping is close to useless, most of those whom we have observed actually switching to vaping must not have really quit smoking because of vaping, and so (surprise!) we get approximately the same low estimate.

So nowhere do they actually ever address the reality check question:

Seriously? You are claiming that almost everyone who ventured into one of those weird vape shops, who spent hundreds of pounds on e-cigarettes, who endured the learning curve for vaping, who ignored the social pressure to just quit entirely, and who decided to keep putting up with the limitations and scorn they faced as a smoker and would still face as a vaper, that almost all of them were someone who was going to just quit anyway? You are really claiming that almost all of them said, “You know, I think I will just quit buying fags this week. Oh, wait, you mean I instead could go to the trouble to learn a new way of quasi-smoking and spend a bunch of money on new stuff and keep doing what I am doing even though I am really over it and ready to just drop it? Where do I sign up?” Seriously?

Reality. Check. (And mate.)

For what it is worth, if you asked me to do a back-of-the-envelope estimate for this, I would probably go with something like the following:

There were about 200,000 new vaping ex-smokers. It seems conservative to assume that about half of them quit smoking due to vaping. 100,000. Done.
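For completeness, the same envelope in code, set against West et al.’s figure. A minimal sketch; the 50% share is the educated guess stated above, not a measured quantity.

```python
NEW_VAPING_EX_SMOKERS = 200_000
SHARE_CAUSED_BY_VAPING = 0.5  # educated guess, deliberately conservative
WEST_ESTIMATE = 16_000

estimate = NEW_VAPING_EX_SMOKERS * SHARE_CAUSED_BY_VAPING
print(f"{estimate:,.0f}, about {estimate / WEST_ESTIMATE:.2f}x West et al.'s figure")
```

It prints “100,000, about 6.25x West et al.'s figure”. Even halving the guessed share again would leave the estimate at roughly three times theirs.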

That is obviously very rough, and the key step is just an educated guess. But an expert educated guess is often far better than fake precision based on obviously absurd numbers that just happen to have appeared in a journal (as a measure of something — in this case, not even the same thing). In this case, it has far better face validity than West et al.’s tortured machinations.

[Update, 4 Oct:

Since this was posted, two other flaws in the West analysis have become apparent. The first comes from my Daily Vaper article based on the lessons from this, a terse presentation of the many ways in which vaping causes smoking cessation. That is worth reading in its own right if you are interested in this stuff. What occurred to me when writing that was that I was too charitable in just saying “ok fine” about the dropping of all ex-smokers who had become vapers after already quitting smoking. For some of them, taking up vaping caused them to not return to smoking. So a few of them should actually be counted. (One might make the semantic argument that the claim is about how many were caused to quit, not how many were caused to be (i.e., become or remain) ex-smokers, so they really do not count. But it is still worth mentioning.)

The second flaw came up in the comments, thanks to Geoff Vann. He figured out an internal inconsistency in the West approach. Basically, if their base methodology (assumptions, etc.) is applied to their step that removed the established vaping ex-smokers from that 560,000, it turns out that you cannot remove nearly as many as they do remove. You can see the details in the comment thread. Internal inconsistencies are always interesting because even if someone denies the criticisms from external knowledge and analysis (which are really far more damning), they cannot complain about being held to their own rules!]



What is Tobacco Harm Reduction?

by Carl V Phillips

In response to a couple of recent requests and my schooling of FDA in a recent Twitter thread, it seems time for me to again write a primer on the meaning of tobacco harm reduction (THR). Rather than return to a previous version I have written, I am doing this from scratch. This seems best given the evolution of my thinking and changing circumstances.

The key phrase, of course, is “harm reduction”, with “tobacco” denoting the particular area it is applied to. This is important: THR is not a concept that stands apart from HR. It means “the principles of harm reduction, applied to the use of tobacco and nicotine products, and other products that tend to get lumped in with them” (see my previous post for an explanation of that last bit and some other useful background about the current politics). Indeed, when my university research and education group was trying to decide on a name and URL in 2005, it was far from obvious that this was the right term, and we considered others (e.g., “nicotine harm reduction”). While the first prominent use of “THR” appeared in 2001, it was far from established as a common term. (There is probably some endogeneity here, of course; if we had chosen a different term, that might have ascended instead.) In any case, the key to answering “what is THR” is asking “what is HR” rather than thinking it is something different.

FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (postscript)

by Carl V Phillips

For completion of this series (with this footnote), the following is what I submitted to FDA. My comment does not yet(?) appear on the public docket as of this writing. But I got a confirmation (conf code 1k1-8xfb-dhwh if you want to search for it later). It has a bit of extra content beyond what I already presented.

I know a few of you urged me to rewrite my analysis in a more, er, formal manner. While I understand your reasoning, I chose not to take time from my other obligations to do that. I honestly think it does not make any difference. I am reasonably confident that FDA “fulfills” their obligation to consider all the comments by having a low-level staffer read each one, without reporting anything of substance up the chain, so they can check a box that says they read and considered each of them. If this proposed rule is not withdrawn for political reasons or as a result of the various procedural problems, then whoever is pursuing a lawsuit to strike it down can enjoy my essays as they mine them for substance. (Shameless plug: Of course, if they would like to hire me to formalize anything, I am quite good at that.) Besides, I might manage to embarrass that staffer who reads it into going into a more honorable line of work.

The content follows:

The primary purpose of this comment is to demonstrate that FDA’s assessment of the supposed benefits of this rule (115 fatal cancers averted per year) is fatally flawed for approximately half a dozen reasons, each one of which is sufficient to invalidate it. I have published the analysis in the following three blog posts, which I incorporate into this comment by reference:

(I have also attached printouts of them for completeness, but I would suggest reading the online versions with live links.)

The implication of that analysis is that there is no scientific basis for claiming that any disease incidents will be prevented by this rule, let alone the specific quantity claimed by FDA as the rule’s justification. Based on this alone, the rule should be withdrawn.

This analysis should not be interpreted as implying that if, counterfactually, the 115 figure were actually science-based, then it would justify the rule. There is no analysis of the negative health impacts from driving smokeless tobacco users to smoking when their preferred products are banned. The absence of this analysis is another sufficient reason for withdrawal of the rule. Moreover, even if there were a legitimate reason to believe there were health benefits, and even if there were no health costs, justifying this rule would require a cost-benefit analysis that considered the welfare loss to consumers and other costs. The absence of this analysis is yet another sufficient reason for withdrawal of the rule.

Finally, given the lack of cost-benefit analysis of any sort, there obviously is no justification for choosing the particular quantitative standard in the proposed rule (even apart from the fact that it appears to be 1/4 of the intended quantity). This makes the choice of the standard arbitrary and capricious. It appears it must have been chosen with an eye to which particular winners and losers it would create, as I presented in this footnote to the previous analysis here (incorporated into this comment by reference and also attached):

While not central to the main point of this comment, this is a further problem with the legitimacy of this rulemaking.

Sunday Science Lesson: toxicology and “the chains” in American football

by Carl V Phillips

Those of you who read my series on fatal flaws in FDA’s proposed rule about limiting the nitrosamine NNN in smokeless tobacco (and presumably anyone reading this quick little tangent read those important and carefully crafted posts) might have tripped up over an oddity from the third post in the series. I quoted this from FDA’s proposed rule about how their key number, used for estimating the risk of cancer caused by some quantity of NNN, was calculated:

As defined by the EPA guidelines, the cancer slope factor (CSF) is “an upper bound (approximating a 95percent [sic] confidence limit) on the increased cancer risk from a lifetime exposure to an agent.”

I noted (you can read the original for more detail) this means that when FDA estimated the dose-response for NNN, they did not use the point estimate generated by the underlying study, but inflated it by an arbitrary fudge factor (which is not actually an upper bound, as claimed, but is still much higher than the point estimate). This is obviously an error. There are arguments that using such inflation factors when setting standards (e.g., how much of a potentially toxic substance a facility is allowed to emit) is appropriate, to err on the side of caution. But an inflation factor, creating a number higher than what the data suggests is the best estimate, obviously does not give us the best estimate for the actual dose-response. I also observed that the model used to translate the data from rodent megadose studies into an estimate for the effects of realistic human exposures was fraught with huge, undoubtedly incorrect, assumptions that made the final result nearly worthless, even apart from this.

So you might be asking why such lousy models and arbitrary fudge factor rules even exist. They are clearly grossly inappropriate for what FDA was doing — no ambiguity there. But presumably they serve some purpose, or they would not exist.

I found myself flashing back to when I was ten or twelve years old and a fan of American football. There is a process in that game that occasionally occurs, in which a very close judgment has to be made about whether the offensive team advanced the ball the required ten yards to get a “first down”. (That is all you need to know. Obviously most American readers will know more details. Also, I realize I do not even know whether what I describe is still done at professional levels, given that it could be replaced by imaging and computers, but is at least still presumably done in high school games.) At that point, two officials run in from the sidelines carrying “the chains”, a pair of posts connected by a ten-yard chain. One of them places one end at the starting point for the required ten yards of progress. The other then pulls the chain taut and observes whether the current placement of the ball is a little past his post or a little short of it.

You might wonder why. It is no easier to identify the exact starting point and measure from it than to just identify the exact ending point needed for the first down. Since the play will usually have moved the ball sideways, it is not as if someone can just remember the exact blade of grass the ball was on at the start; it is necessary to eyeball the corresponding point on an imaginary line across the field. Also, the ball is not a single point. And the current placement of the ball after the last play was somewhat arbitrary too. So why not just eyeball the spot that is ten yards further (using as a guide, in either case, the markings of yards that are painted on the field, though not necessarily exactly on the line the ball is on)? Further contemplation reveals that the answer lies in game theory.

If one official has to eyeball a target point near where the ball is sitting on the ground, either just ahead of it or just behind it, he is full-on deciding whether to award the first down. That creates a huge amount of pressure and also creates an enormous potential for exercise of any bias that official is feeling for whatever reason. It could be nefarious bias. But it could be an innocent moral struggle such as, “I denied them the last close call that could have gone either way, so I owe them this one that could go either way.” Or it could be an attempt at beneficence in violation of the procedural rules like, “where the ball is sitting is short of my estimated spot, but my colleague who decided where to place the ball after the last play really should have put it further forward and I can fix his error.” But when the first official eyeballs his spot ten yards back, he cannot be sure whether an inch one way or another even matters and can just do it mechanically without all those inconvenient thoughts cropping up. Of course, the colleague could exercise nefarious bias when he chooses where to pick his spot; an inch forward or an inch back are both plausible estimates of the starting point. But the complicated mechanism reduces the temptation to exercise such bias somewhat, and strongly reduces effects of the “I owe them this one” or “I can fix it” factor.

Regulators setting an allowable level of potentially harmful effluent, contaminant, or ingredient also have to draw a line. The right place to draw the line is hugely uncertain, both in terms of what levels are actually harmful and the political decision about what level of harm should be allowed (this contrasts with the American football analogy). Getting it right is pretty much impossible. Still, issues like those facing the football referee can be avoided. If regulators are allowed to draw the line when looking at exactly where the ball is sitting, as it were, they are deciding such things as “this product is fine, but its leading competitor is banned,” or “the facilities operated by our boss’s biggest campaign donor all just squeak in under the line.” That would not be good.

So instead they create a rule that says “make an estimate based on this crazy dubious model and then inflate the result by this predefined arbitrary factor, and draw a line based on that.” This does not eliminate directional bias (intentionally trying to be more or less stringent) in defining the models or inflation factors, or in interpreting the underlying data. But it does help avoid someone saying “hey, if I just bump this limit down from 7.5 to 6.8, I can really stick it to that company that I have always hated.” Since the proper line is enormously uncertain, that would be easy to do.

For the same reason, it does not matter so much that many of the steps in the defined process are just silly. You can still get outcomes where experts largely agree that the standard spit out by the sketchy complicated (but well-defined) process is way too low or too high. But even then, at least it offers a starting point for debate that was not just someone capriciously making up a number. Most of the time, the genuine uncertainty is sufficient that the result of the process might actually be the optimal number.

Circling back to the FDA, it is worth noting that their proposed rule in no way resembles this clumsy, but arguably justifiable, process. They were not following a rule that spit out a quantitative standard that, while probably non-optimal, was at least non-arbitrary. No, they misused elements of this process to (inaccurately!) estimate the effects of their proposed standard. But their standard itself was still an arbitrary and capricious number that was pulled out of the air. This was done with the clear view of exactly which products would make the cut, which would have to be re-engineered, and which would be banned. This is exactly the bright-line decision about who wins and who loses that those football and normal regulatory rules are designed to prevent.

Well, I should say that FDA thought they had a clear view of exactly which products would be affected. As noted in the first post in my series, they actually made a factor-of-four arithmetic error that means far more products would be affected and far more banned than they intended. But the point is still that they were misusing the trappings of a process that is designed to avoid exactly such picking-and-choosing, while still trying to engage in arbitrary picking-and-choosing.

FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (part 3)

by Carl V Phillips

In Part 1 of this series, I described FDA’s proposed rule that would require smokeless tobacco products (ST) to have no more than 1 ppm of NNN (a tobacco-specific nitrosamine or TSNA) dry weight. I discussed some of the political and policy implications of this, and reasons why the rule will probably not survive. I also noted that almost no current products meet that standard, and that American-style ST probably cannot meet it. Despite the proposed rule probably being mooted, I noted there is still value in examining just how bad the ostensibly scientific analysis behind it is. In Part 2, I noted that FDA’s estimate that the standard would save 115 lives per year is premised on their estimate for the risk of oral cancer caused by ST use. But, in fact, the evidence does not support the claim that ST use causes any oral cancer risk. I then focused on why, even if one believes there is some such risk, the method used to calculate FDA’s quantitative estimate is utter junk science.

So far, none of that has addressed NNN itself, and how meeting the NNN standard would affect the carcinogenicity of ST, if it is carcinogenic. It turns out that this part of FDA’s analysis is even worse than that discussed in Part 2.

Estimating the health effect of a quantitative standard for an exposure is a matter of estimating the relevant range of the dose-response curve, along with knowing how much people’s dosage would change. That is, you need estimates like, “N people use product X, which has 5 ppm NNN, which causes Y risk per person, versus the Z risk per person from 1 ppm, so multiply N by (Y-Z)….” With such numbers we could estimate the effect of an adjustment in the NNN concentration.
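Written out, the back-of-the-envelope calculation is a single multiplication. The sketch below is only illustrative: the function name and the numbers in the usage line are hypothetical placeholders, not estimates from FDA or anyone else. It also quietly assumes every user simply stays with the reformulated product, which is exactly the behavioral assumption questioned in the next paragraph. The arithmetic is trivial; everything hinges on having real values for Y and Z.

```python
def cases_averted(n_users: int, risk_current: float, risk_standard: float) -> float:
    """Expected cases averted: N * (Y - Z).

    n_users: N, number of people using the product (hypothetical input).
    risk_current: Y, per-person lifetime risk at the current NNN level.
    risk_standard: Z, per-person lifetime risk at the proposed standard.
    Assumes (hypothetically) that every user keeps using the altered product.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return n_users * (risk_current - risk_standard)


# Made-up inputs purely to show the mechanics: 1,000 users, Y = 0.002, Z = 0.001.
print(cases_averted(1_000, 0.002, 0.001))
```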

In reality, it is not that simple. In Part 1, I pointed out that most products could not just have their NNN concentration “adjusted” like that, and that they would have to be fundamentally changed, effectively eliminated and replaced in the market (perhaps if FDA had not made the arithmetic error noted in Part 1, that would only be “some” rather than “most”). Many consumers of the eliminated or fundamentally altered products would not be happy with the new option. Some would just quit, eliminating the Y risk as well as any other risk from using the product (setting aside that, as far as we know, both are nil; remember, we are down that rabbit hole here). Some would switch to smoking, creating a risk that is orders of magnitude greater than anything discussed so far, making all of the details moot: the net health impact would be an increase in risk.

But that is the simple practical criticism of this madness, one that hinges on questions of consumer behavior (an area where FDA’s analyses are consistently absurd, but they always manage to trick their audience into accepting their assertions). That is not what I am doing here, though I suppose I just did it in one paragraph. My goal is to point out that the FDA core claims about benefits here are based on junk science, setting aside the enormous costs that would dwarf them anyway. So returning to my point here, what basis do we have for estimating Y, Z, and other points along the dose-response curve?


Absolutely nothing.

Indeed, we do not even know that NNN in ST affects cancer risk at all.

As I mentioned in Part 1, if you are only familiar with the rhetoric about this topic, and not the science, you would be forgiven for not knowing that the assertion there is any such effect is based only on heroic extrapolations and assumptions. You might further surmise that since FDA claims that this reduction would reduce cancer deaths by 115 per year (note: not “about 100”, but as precise as 115), there is not only evidence that NNN in ST causes cancer, but there is also so much evidence that we can precisely estimate a dose-response.

What we know about NNN and cancer is based on biological theory (we have evidence that some nitrosamines cause cancer in humans), and the effects of exposing rats, hamsters, and other critters — species whose propensity to get cancer from an exposure is often radically different from ours, and even from one another’s — to megadoses of NNN. Those toxicology studies do suggest that NNN exposure probably causes cancer in humans, in a big enough dose, and under the right circumstances. Of course, that is also true for almost everything. When IARC, the cancer research arm of the WHO, made their blatantly-political decision to declare NNN a known human carcinogen, they did so in violation of their own rule that there has to be some actual human exposure evidence before making such a declaration. There is not. But even if someone believes that NNN in ST does cause cancer in humans, the rodent megadose data obviously does not tell us anything about the effect of the reduction in dosage imposed by this rule.

Stepping back, it is useful to understand the potential legitimate use of toxicology studies like those. They — or, better, in vitro studies of cells that are actually similar to the human body and do not require sociopathic torturing of innocent animals — are useful for giving us a heads-up that a chemical or combination of chemicals might be carcinogenic or poisonous. This might be a good reason to undertake the more difficult search for epidemiologic evidence that the real-world version of the exposure is causing the bad outcome. Or at least a reason to pursue the in-between step of looking for biological evidence of harm from the real-world exposure in humans. It might even be sufficiently compelling to prohibit introducing a novel exposure, acting before we can even get any human data.

If toxicology studies of a chemical all fail to produce a bad outcome, this strongly suggests that the exposure will not cause the harm, so long as that failure is consistently confirmed using various toxicology methods (claims that a single toxicology study shows that an exposure is harmless, which are currently appearing in the pro-vaping rhetoric, are misguided). But getting a bad outcome in a particular toxicology study does not mean that the real-world exposure actually does cause harm. The pattern in the toxicology has to be far better than what we have for NNN before such a conclusion is justified, including getting the effect at reasonably realistic exposure levels and fairly consistently across a variety of methods.

Consider an analogy: We are interested in knowing whether there is life on other planets, but actually going there to take a look is rather difficult. We have a much cheaper tool in our toolbox, however, which is to use modern telescopes to see if light scatter suggests a water-rich atmosphere. Of course, that is far short of observing life; it would be insane to say “we saw evidence of water, so there must be life there!” But since the versions of life that we understand require there to be enough water, seeing that creates the intriguing possibility of life. Failing to find water tends to rule out the possibility of life as we know it.

Another legitimate use of toxicology is to tell us why an exposure is causing harm. Of course, this should mean there is evidence of harm, not just some wild assumption that there is harm. Continuing the analogy, pretend that someone looked at the light scatter around Mars and claimed they saw enough water to support life: “Aha, this shows that the canal-building civilization is water-based life as we know it.” Um, but you do know that early 20th century telescopes debunked that 19th century canals myth, right? Also we have had numerous close observations of the planet and little labs driving around on the surface. Your hint about the possibility of life is utterly pointless given that we have much better information about the reality.

I have often described the TSNA toxicology research, which inexplicably continues to this day, as an attempt to identify which chemical pathways cause a cancer outcome that does not actually occur. As with Mars not having canals, we know that ST use does not cause a measurable risk for cancer, and therefore the NNN and other TSNAs in ST are not causing a measurable risk (unless we think that other aspects of the ST exposure prevent exactly as much cancer as the TSNAs cause, something that no one is seriously proposing). One possibility that has been seriously proposed — e.g., by Brad Rodu, whose work I cited in Parts 1 and 2 — is that something else in ST, perhaps antioxidants, directly negates whatever cancer-causing effect the TSNAs might have if we were exposed to them alone (which does not happen at a level beyond a few stray molecules). Indeed, when the exposure is tobacco extract, those rodent studies fail to show the carcinogenic effect from NNN, or anything else in ST for that matter, a fact that is conveniently glossed over.

So how did we end up with the “fact” (which I suppose should be called the fake news in current parlance) that NNN and other TSNAs in ST cause cancer? It basically comes down to circular reasoning, or perhaps it is figure-eight reasoning since there are two circles as well as a few other fallacies. It goes something like this (and I am really not exaggerating):

“Given that we have only seen an effect in megadose rat studies, how can we really be sure that TSNAs at the relevant dosage and in a realistic exposure cause cancer?”

“Because smokeless tobacco causes cancer, and it contains TSNAs.”

“But [even setting aside that we do not know that is true] how could you know it was the TSNAs causing it?”

“Because we know TSNAs cause cancer.”

“Um, isn’t that so transparently circular that even tobacco control’s useful idiots will see right through it?”

“There is more. We know that higher-TSNA products cause more cancer risk.”

“Ah, now that sounds like actual evidence. Please explain.”

“US products have higher TSNA levels than Swedish products, and US studies show a cancer risk while Swedish studies do not.” [Note: see appendix to this dialogue, below.]

“But didn’t you read Part 2 of this series? That contrast does not appear in studies of modern US products, but only from a few studies of an archaic type of product.”

“Yes, exactly. That product was very high in TSNAs, and its cancer effects were off the charts compared to modern products. Case closed.”

“There are no measurements of the TSNA levels of those archaic products. How do you know they had high TSNA levels?”

“Isn’t it obvious? They must have, because they caused cancer and TSNAs cause cancer.”

Loopity loopity loop.

In fairness, there are honest observers, including Brad Rodu, who hypothesize that this is indeed the reason the archaic products apparently caused cancer. But this is just a hypothesis, and it cannot be tested. Indeed, we cannot even replicate the basis for claiming those products caused cancer in the first place. It basically comes down to a single study from the 1970s — not exactly overwhelming evidence.

A bit more useful background: In the 2000s, the anti-ST crusaders in and funded by the US government (CDC and NCI, before FDA joined the game) fought a rearguard action against the evidence that had emerged from Sweden that ST was approximately harmless. Part of this was insisting that the higher levels of TSNAs in US products meant that the Swedish evidence was not informative. It was political bullshit on its face. Still, I wrote an analysis over a decade ago that showed that the ST products that produced those null results in Sweden had about the same TSNA levels as then-current US products. (This was based on limited analytic chemistry from before 2000. There were only a handful of TSNA concentration studies in the public record. But there was enough to show this.) TSNA levels in all styles of ST products were and are decreasing over time. It might have been true that 1990 US products were materially more hazardous than 1990 Swedish products (which showed no measurable risk) because they had higher TSNA levels. But mid-2000s US products had low enough TSNA levels that this would have no longer been true. This leads to the appendix for the dialogue. We could imagine this variation:

“US products have higher TSNA levels than Swedish products, and US studies show a cancer risk while Swedish studies do not. Also there is a time trend, wherein TSNA levels have been dropping in both US and Swedish products, and older studies found elevated cancer risks, while newer ones do not.”

“Part 2 of this series dismisses your first sentence. But the second sentence makes some sense, though it might just be because the older studies used really primitive methodology. Still, you have a prima facie valid point there, unlike all your other complete bullshit. But, hey, doesn’t that also mean you are conceding the fact that modern ST products do not cause any measurable cancer risk, even if older products might have?”

“Er, no. We never said that. We never made any claim about time trends despite it being the most scientifically defensible argument we have. Strike all that from the record.”

Summarizing this, we have only unsupported hypotheses and circular reasoning behind the claim that NNN in ST causes any of the (quite possibly zero) cancers caused by ST. Given this, we obviously know nothing about how much cancer a particular concentration of NNN causes. That is sufficient to show that FDA’s claim cannot possibly be science-based. But I am sure you share my curiosity about how FDA took this complete lack of information and turned it into the conclusion that exactly 115 lives per year would be saved by this regulation.

Here it is (from the proposed regulation):

….increase in oral cancer risk of 116 percent among smokeless tobacco users compared with never users. We then reduce this value by 65 percent based on toxicological evidence relating the estimated average reduction in the dose of NNN to lifetime cancer risk under the proposed standard. The result is a reduction in the estimated relative risk of oral cancer to 1.41 under the proposed product standard. FDA used the following calculation: (1 + (2.16−1) × (1−0.65) = 1.41) for this determination.

Thanks, guys, for showing us how to do that arithmetic so I did not have to find a third grader to ask. The important bit of showing their work, of course, is about justifying the inputs. In the introduction, FDA refers the reader to section IV.C for the basis for the .65 figure. It is really section IV.D, because, hey, just because you spent a million dollars writing a regulation that is potentially devastating for industry and millions of consumers does not mean you should bother to have someone edit it. It turns out the assumption is that the dose-response is linear across all quantities, and under that assumption the effects observed from megadoses in rodents give a dose-response that translates into .65. The generic problems with this include the fact that the linear (also known as “one hit”) model of carcinogenesis has long since been dismissed as invalid, the folly of extrapolating orders of magnitude beyond the observed data, and the little matter that rodents are not people.

It gets worse still when you look at the equation that FDA used to calculate the fictitious linear trend. (And I am not referring to the fact that they actually cut-and-pasted the equation in their document as an image from some low-res PDF of someone else’s document. This is not a scientific flaw, of course, but it does suggest the proposed rule was written by people who have so little education and experience in science that none of them had ever learned how to typeset a simple equation.) The equation builds in the assumption that a very high exposure for a short time (e.g., what the rats experienced) has the same effect as the same total exposure stretched out over many years. This is the linearity assumption taken to the extreme. It not only assumes linearity for each parameter — i.e., increasing years of exposure, increasing quantity per exposure, or increasing number of exposures per day by Y% increases risk by Y% — which is completely unsupportable and almost certainly wrong. It also assumes a multiplicative effect for all interactions, which is also unsupportable and almost certainly wrong. For those who did not follow that, I will explain its major implication: The assumption is that a given lifetime quantity, X, of NNN exposure creates the exact same total cancer risk whether it is consumed all in one day, or one month, or spread out over 70 years. It is the same whether an ongoing exposure takes place all at once each Monday morning or it is spread evenly throughout the week. Moreover, if you increase X by 10% it increases the risk by 10% no matter how the consumption is spread out. On top of all that, if someone’s body mass is 10% lower, his risk from X is always about 11% higher, since the risk scales with the inverse of body mass. If his mass is 99.963% lower (i.e., he is a hamster and not a human) then the risk is increased exactly 2720-fold.
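To make those implications concrete, here is a minimal sketch of a model with the properties just described. It is a simplification, not FDA's actual equation (which is not reproduced here): it assumes risk equals a slope factor times total lifetime dose divided by body mass, with the timing of doses playing no role. The function name, the slope value, and the dose figures are hypothetical; the 30 g and 75 kg body masses are the ones mentioned in the next paragraph.

```python
import math


def linear_risk(doses_mg: list[float], body_mass_kg: float, slope: float) -> float:
    """Excess lifetime risk under a purely linear, multiplicative model.

    Only the total dose matters: the number, order, and spacing of the
    entries in doses_mg are ignored entirely, and risk scales as 1 / mass.
    """
    total_dose_mg = sum(doses_mg)
    return slope * total_dose_mg / body_mass_kg


SLOPE = 0.01        # hypothetical slope factor, risk per (mg/kg) lifetime dose
HUMAN_KG = 75.0
RODENT_KG = 0.03    # 30 g

one_day = linear_risk([70.0], HUMAN_KG, SLOPE)              # 70 mg all at once
seventy_years = linear_risk([1.0] * 70, HUMAN_KG, SLOPE)    # 1 mg per year for 70 years
ten_pct_more = linear_risk([77.0], HUMAN_KG, SLOPE)         # X increased by 10%
rodent = linear_risk([70.0], RODENT_KG, SLOPE)              # same X, tiny body

print(one_day == seventy_years)                        # timing is irrelevant
print(math.isclose(ten_pct_more / one_day, 1.1))       # risk scales with dose
print(math.isclose(rodent / one_day, HUMAN_KG / RODENT_KG))  # and with 1 / mass
```

Every line of biology that might make a single enormous dose different from a lifetime of small ones is simply absent from a model of this form, which is the point of the paragraph above.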

Such simplifying assumptions about linearity and multiplicativity are not terrible if you are interpolating (i.e., you have data from both sides of the quantity you are assessing and you are trying to fill in the middle) or are extrapolating a little bit beyond the range of your data. But in this case they are extrapolating orders of magnitude beyond the rat data. Weeks of exposure rather than decades, 30 g bodies rather than 75 kg, and crazy large doses. And, of course, there is the little matter of assuming that a different exposure pathway in a different species has the same effect as ST exposure in humans. The huge extrapolation means that the slightest departure of the assumptions from reality (and it is safe to say that the departures are more than slight) makes the final estimate complete garbage.
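A toy illustration of why extrapolating far beyond the data is so fragile: suppose, purely hypothetically, that the true dose-response were only slightly non-linear, with risk proportional to dose raised to the power 1.2 rather than 1. A straight line through zero that matches the risk perfectly at the experimental megadose will then overstate the risk at a dose 1,000 times smaller by a factor of 1000^0.2, roughly 4. The exponent, the constant, and the doses are all assumptions chosen to show the mechanics; nothing here is an estimate of the actual NNN dose-response.

```python
def true_risk(dose: float, k: float = 1e-6, power: float = 1.2) -> float:
    """Hypothetical 'true' dose-response: only mildly non-linear."""
    return k * dose ** power


def linear_extrapolation(dose: float, calib_dose: float) -> float:
    """Straight line through the origin fitted to a single high-dose point."""
    slope = true_risk(calib_dose) / calib_dose
    return slope * dose


MEGADOSE = 1_000.0   # hypothetical experimental dose (arbitrary units)
REALISTIC = 1.0      # hypothetical real-world dose, 1,000 times smaller

# At the calibration dose the line fits the "data" perfectly...
fit_at_data = linear_extrapolation(MEGADOSE, MEGADOSE) / true_risk(MEGADOSE)
# ...but far from it, a small departure from linearity becomes a large error.
overstatement = linear_extrapolation(REALISTIC, MEGADOSE) / true_risk(REALISTIC)

print(f"ratio at the megadose: {fit_at_data:.3f}")
print(f"overstatement at the realistic dose: {overstatement:.2f}-fold")
```

In general the error factor here is (calib_dose / dose) ** (power - 1), so it grows with every additional order of magnitude of extrapolation; if the true curve had a threshold, the linear prediction at low doses would be wrong by an unbounded factor.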

It gets worse. The key parameter is what is multiplied by the total lifetime units of exposure in order to estimate risk, which FDA calls the “cancer slope factor” or CSF if you want to search for it in the document. For this, they rely entirely on a 1992 estimate from the California EPA, which itself was based on the results of a 1983 paper that looked at what happens when hamsters were given huge doses of NNN dissolved in their drinking water. Yes, really. FDA’s number ignores the ~99% of the relevant research that has been done in the last three decades, and it was obviously pretty sketchy even in 1992 given that it was based on a study whose real information value (about actual human exposures) was approximately nil. Moreover, there is this:

As defined by the EPA guidelines, the cancer slope factor (CSF) is “an upper bound (approximating a 95percent [sic] confidence limit) on the increased cancer risk from a lifetime exposure to an agent.”

So apparently (the methods are reported so poorly that it is hard to be certain) they not only based this key number on evidence — to use the word rather loosely — from a single ancient toxicology study, but they did not even use the actual estimate that was generated from that. Rather, they used a larger number generated via an arbitrary process. The upper bound of a 95% confidence interval is a completely meaningless number in this context. There is an argument (which many would call dubious) that some arbitrary inflation of the point estimate like this should be used in “abundance of caution”-based regulations. (Update: More on this in my follow-up post.) But it is not an estimate of the actual effect. I know this seems like an arcane technical point in the context of everything else, but I cannot stress enough what an enormous failure of legitimate science this is (assuming they did what it sounds like they did). This would mean, for example, that if there had been fewer observations collected in that 1983 study, but it had still supported exactly the same point estimate, FDA would be claiming some larger number of lives saved, like 125 per year rather than 115.
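The sample-size point can be sketched under a simplifying assumption: a normal approximation in which the standard error of the slope estimate shrinks with the square root of the number of animals. This is not the specific statistical procedure California EPA used (which, as noted, is poorly reported); the point estimate, the per-subject standard deviation, and the sample sizes are hypothetical. It shows only that the same point estimate produces a larger "upper bound" when the study is smaller.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def upper_bound(point_estimate: float, sd_per_subject: float, n: int) -> float:
    """Upper limit of a normal-approximation 95% CI for a slope estimate."""
    standard_error = sd_per_subject / math.sqrt(n)
    return point_estimate + Z_95 * standard_error


POINT = 1.0   # hypothetical slope point estimate (arbitrary units)
SD = 2.0      # hypothetical per-subject standard deviation

# Same point estimate every time; only the study size changes.
for n in (400, 100, 25):
    print(n, round(upper_bound(POINT, SD, n), 3))
```

Feeding the upper bound rather than the point estimate into a linear risk model means the smaller study would generate more "lives saved", even though the best estimate of the effect is identical in all three cases.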

When presenting this number, and practically admitting it is junk (despite using it to calculate their estimate of 115 to three significant figures), FDA writes:

FDA welcomes public comment on whether there is a more robust CSF available for NNN.

This is a classic bit of anti-scientific rhetorical strategy. Anyone answering that question as phrased is implicitly conceding that the estimate FDA used has some validity. Respondents are effectively conceding that if they cannot make a compelling case that some other number is better, then FDA’s number was appropriate to use. When a question’s phrasing builds in invalid assumptions, or when it assumes away the really important questions (“Have you stopped beating your wife?”), the response needs to unask it, not answer it. So here is my unasking answer to their welcoming of public comment:

The number FDA used has absolutely no hint of validity. Moreover, there is no robust, or even remotely plausible basis for generating this “CSF”; any number used here might as well be made-up from thin air. That said, given that ST does not seem to cause oral cancer in the first place, the best default estimate is zero. There is no legitimate basis for concluding an estimate of zero is wrong. Oh, and also if you are going to use a junk-science extrapolation from rodent studies, you should at least calculate this number based on all such studies to date. If you are not capable of doing that analysis, and instead are limited to using the approach any middle-school student would use if confronted with this question (run a search and blindly transcribe whatever someone once wrote), then you have no business regulating anything!

I’ll take a deep breath here, because that is still not all. Look back at that grade-school arithmetic they showed us. Notice any assumptions embedded in it? Yes, that’s right, they assumed that all the cancer risk that they claim is caused by ST is caused by NNN, and thus a .65 reduction in the risk from NNN exposure is a .65 reduction in total risk. Wait, what? FDA did some hand-waving in their document about reductions in NNN also carrying along reduction in another TSNA, NNK, but they never tried to justify the claim that the (supposed) cancer risk was all due to NNN or even NNN plus NNK. How could they?
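FDA's arithmetic is easy to reproduce, and writing it as a function makes the embedded assumption visible. The parameter fraction_from_nnn is a hypothetical addition, not something in FDA's document: it is the share of the claimed excess risk attributed to NNN. FDA's formula, 1 + (2.16 − 1) × (1 − 0.65), corresponds to fixing that share at 1.

```python
def adjusted_relative_risk(rr_baseline: float, nnn_reduction: float,
                           fraction_from_nnn: float = 1.0) -> float:
    """Relative risk after cutting the NNN-attributable excess risk.

    rr_baseline: relative risk for users vs. never users (FDA used 2.16).
    nnn_reduction: proportional reduction in NNN-caused risk (FDA used 0.65).
    fraction_from_nnn: hypothetical share of the excess risk (rr_baseline - 1)
        caused by NNN; FDA's formula implicitly sets this to 1.0.
    """
    excess = rr_baseline - 1.0
    remaining_excess = excess * (1.0 - nnn_reduction * fraction_from_nnn)
    return 1.0 + remaining_excess


print(round(adjusted_relative_risk(2.16, 0.65), 2))       # 1.41, FDA's figure
print(round(adjusted_relative_risk(2.16, 0.65, 0.5), 2))  # 1.78 if only half were from NNN
```

With any share below 1, the claimed reduction in risk (and the 115 figure built on it) shrinks in proportion; FDA never offers a basis for choosing 1.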

Effectively, FDA has just declared that they believe that whatever cancer risk (at least oral cancer risk) is caused by ST consumption, it is all caused by TSNAs and no other molecules contribute any cancer risk. They never suggested this was a simplifying assumption. This could have some amusing implications. The next time you see one of those anti-scientific bits of propaganda about ST containing 27 carcinogenic chemicals (or whatever number they are making up that day), you can reply that FDA has declared that at least 25 of those do not actually cause cancer. On the other hand, we should probably not push this too hard. I am guessing that, given all the other errors, the authors of this rule did not understand their own arithmetic sufficiently to know they were implicitly declaring this to be true.

Returning to the life on Mars metaphor, and the dialogue motif, the “logic” behind the FDA analysis would map to something like the following:

“From my light-scatter observations, I have concluded that had the water density in the martian atmosphere been X, instead of the Y I observed, the civilization that built the canals would not have collapsed just after helping humans build the pyramids, but would have thrived for 1,150 more years.”

“Wait, what? There are no canals. There was no civilization. Ancient extraterrestrial visitation stories are just silly claims by people who do not understand science and technology. The rovers and other Mars exploration have already shown that if there is or was anything we might call life, it has had no perceptible impact, let alone built a civilization. There is not enough water to support an ecosystem now, and was not enough 5000 years ago. But even if there had been a civilization, there is obviously no basis for estimating how atmospheric water density affected it, let alone a way to predict its demise to three significant figures based on one observation. As a minor point, I am not sure from what you said whether you meant Mars years or Earth years, but I am guessing you do not even know they are different.”

I am not being hyperbolic when I say FDA’s proposed rule comes across as parody. It reads like someone concocted it in order to ridicule a collection of faulty common practices and reasoning in public health science, creating cartoon versions to highlight problems that are often subtle. Please reassure us, FDA, that this was intentional. Even more so, those of you at the Center for Tobacco Products might want to reassure your colleagues elsewhere in FDA that this is not what their once-respectable agency has come to.

Alternatively, perhaps it was really a joke by outgoing officials, hoping for a *popcorn* moment when the new administration tried to defend the rule in court. Or maybe it was just a Dadaesque tribute to the day it was issued. I realize these do not seem like terribly likely explanations, but they are more plausible than believing that anyone with a modicum of scientific expertise thought that this hot mess was legitimate analysis.

FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (part 2)

by Carl V Phillips

In the previous post, I gave some background about the new proposed rule from FDA’s Center for Tobacco Products (CTP) that would cap the concentration of the tobacco-specific nitrosamine (TSNA) known as NNN allowed in smokeless tobacco products (ST). Naturally, I think you should read that post, but to follow the scientific analysis which begins here, you do not need to.

Before getting to the even worse nonsense about NNN itself, it is worth addressing CTP’s key premise here: They claim that ST causes enough cancer risk, specifically oral cancer, that reducing the quantity of the putatively carcinogenic NNN could avert a lot of cancer deaths.

Readers of this blog will know that the evidence shows ST use does not cause a measurable cancer risk. That is, whatever the net effect of ST use on cancer (oral or otherwise), it is not great enough to be measured using the methods we have available. That does not necessarily mean it is zero, of course. Indeed, it is basically impossible that any substantial exposure has exactly zero (or net zero) effect on cancer risk. But even if all the research to date had been high-quality and genuinely truth-seeking — standards not met by much of the epidemiology, unfortunately — there is no way that we could detect a risk increase of 10% (aka, a relative risk of 1.1) or, for that matter, a risk decrease of 10%. Realistically, we could not even detect 30%. For some exposure-disease combinations it is possible to measure changes that small with reasonable confidence (anyone who tries to tell you that all small relative risk estimates should be ignored does not know what he is talking about). But it is not possible for this one, at least not without enormously more empirical work than has been done.

Despite that, FDA bases the justification for the rule on the assumption that ST causes a relative risk for oral cancer of 2.16 (aka, a 116% increase), or a bit more than double. This eventually leads to their estimate that 115 lives will be saved per year. Before even getting to their basis for that assumption, it is worth observing just how big this claimed risk is. (I will spare you a rant about their absurd implicit claims of precision, as evidenced in their use of three significant figures — claiming precision of better than one percent — to report numbers that could not possibly be known within tens of percent. I wrote it but deleted it and settled for this parenthetical.)

A doubling of risk, unlike a change of 10% or 30%, would be impossible to miss. Almost every remotely useful study would detect an increase. Due to various sources of imprecision, some would have a point estimate for the relative risk of 1.5 (aka, a 50% increase) and some 3.0, but very few would generate a point estimate near or below 1.0. Yet the results from most published studies cluster around 1.0, falling on both sides of it.
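A rough way to see why a doubling is easy to spot while a 10% change is not is a standard large-sample sample-size approximation for an unmatched case-control study. This Python sketch is illustrative only: it assumes a rare disease (so the odds ratio approximates the relative risk), 5% exposure prevalence among controls, equal numbers of cases and controls, a two-sided 5% test, and 80% power. It accounts for random error only, so it understates what real studies would need once confounding and measurement problems are considered.

```python
from math import ceil, log
from statistics import NormalDist


def cases_needed(odds_ratio: float, p_exposed_controls: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases (plus an equal number of controls) needed to
    detect `odds_ratio` in an unmatched case-control study.

    Uses the large-sample (Woolf) variance of log(OR). Only random sampling
    error is considered; confounding, measurement error, and other systematic
    errors are ignored, so real requirements would be larger.
    """
    odds_controls = p_exposed_controls / (1 - p_exposed_controls)
    odds_cases = odds_ratio * odds_controls
    p_exposed_cases = odds_cases / (1 + odds_cases)
    # Var(log OR) is approximately this quantity divided by the number of cases.
    var_per_case = (1 / p_exposed_cases + 1 / (1 - p_exposed_cases)
                    + 1 / p_exposed_controls + 1 / (1 - p_exposed_controls))
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * var_per_case / log(odds_ratio) ** 2)


for rr in (1.1, 1.3, 2.0):
    # Rare disease assumed, so the odds ratio approximates the relative risk.
    print(f"RR {rr}: about {cases_needed(rr, 0.05):,} cases needed")
```

Under these assumptions, a relative risk of 1.1 requires tens of thousands of cases, while a doubling needs only a few hundred, which is why a true doubling would show up in almost every remotely useful study.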

You would not even need complicated studies to spot a risk this high. More than 5% of U.S. men use smokeless tobacco. The percentages are even higher, obviously, for ever-used or ever-long-term-used, which might be the preferred measure of exposure. This would show up in any simple analysis of oral cancer victims. With 5% exposed, doubling the risk would mean about 10% of oral cancer cases among nonsmoking males would be in this minority. A single oral pathology practice that just asked its patients about tobacco use would quickly accumulate enough data to spot this. It is not quite that simple (e.g., you have to remove the smokers, who do have higher risk) but it is pretty close. The point is that the number is implausible.

In Sweden, ST use among men is in the neighborhood of 30% (and smoking is much less common). A doubling of risk for any disease that is straightforward to identify, like oral cancer and most other cancers, would be much more obvious still. But no such pattern shows up. The formal epidemiology also shows approximately zero risk. Most of the ST epidemiology is done in Swedish populations, basically because relatively common exposures are much easier to study.
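The arithmetic behind the 10% figure, and the corresponding figure for Sweden, takes only a few lines. This Python sketch uses the 5% and 30% prevalence figures from the text, assumes smokers have already been removed, and normalizes the risk among the unexposed to 1. The function name is illustrative.

```python
def exposed_share_of_cases(prevalence: float, relative_risk: float) -> float:
    """Fraction of cases that occur among exposed people, given exposure
    prevalence and a relative risk (risk among the unexposed normalized to 1)."""
    exposed = prevalence * relative_risk
    unexposed = (1 - prevalence) * 1.0
    return exposed / (exposed + unexposed)


for label, prevalence in [("U.S. men (~5%)", 0.05), ("Swedish men (~30%)", 0.30)]:
    null_share = exposed_share_of_cases(prevalence, 1.0)
    doubled_share = exposed_share_of_cases(prevalence, 2.0)
    print(f"{label}: {null_share:.0%} of cases if no effect, "
          f"{doubled_share:.0%} if risk doubles")
```

With these inputs, the sketch reports about 10% of cases among exposed U.S. men if risk doubles (versus 5% with no effect), and about 46% among Swedish men (versus 30%).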

So how could someone possibly get a relative risk estimate of more than double?

The answer is that they created the absurd construct, “all available U.S. studies” and then took an average of all such results. (They actually used someone else’s averaging together of the results. They cite two papers that did such averaging and — surprise! — chose the higher of the results, though that hardly matters in comparison to everything else.) This is absurd for a couple of reasons which are obvious to anyone who understands epidemiologic science, but not so obvious to the laypeople that the construct is designed to trick.

You might be thinking that it is perfectly reasonable to expect that different types of ST pose different levels of risk. Indeed, that seems to be the case (however, the difference is almost certainly less than the difference among different cigarette varieties, despite the tobacco control myth, mentioned in Part 1, that they are all exactly the same). But nationality obviously does not matter. Should Canadian regulators conclude that nothing is known about ST because there are no available Canadian studies? This is like assessing the healthfulness of eating nuts by country; the difference is not about nationality but mostly about what portion of those nuts are peanuts (which are less healthful than tree nuts). If the category of nuts is to be divided, the first cut should be health-relevant categories of nuts, not nationality. Nutrition researchers and “experts” are notoriously bad at what they do, but few would make a mistake like this one.

The error is particularly bad in this case: It turns out the evidence does not show a measurable difference in risk between the products commonly used in the USA and those commonly used in Sweden. The data for all those is in the “harmless as far as we can tell” range. But it appears that an archaic niche ST product, a type of dry powdered oral snuff, that was popular with women in the US Appalachian region up until the mid-20th century, posed a measurable oral cancer risk. It turns out that a hugely disproportionate fraction of the U.S. research is about this niche product — disproportionate compared to even historical usage prevalence, let alone the current prevalence of about nil. There is nothing necessarily wrong with disproportionate attention; health researchers have perfectly good reasons to study the particular variations on products or behaviors that seem to cause harm. Also, it is much easier to study an exposure if you can find a population that has a high exposure prevalence, in this case Appalachian women from the cohorts born in the late 19th and early 20th centuries.

It is not the disproportionate attention that is the problem. The problem is the averaging together of the results for the different products. Even if that might have some meaning if the average were weighted correctly, it was very much not weighted correctly.

The 2.16 estimate was derived using the method typically called meta-analysis, though it is more accurately labeled synthetic meta-analysis since there are many types of meta-analysis. It consists basically of just averaging together the results of whatever studies happen to have been published. Even in cases that are not as absurd as the present one, this is nearly always junk science in epidemiology. The problems, as I have previously explained on this page, include heterogeneity of exposures, diseases, and populations, which are assumed away; failure to consider any study errors other than random sampling error; and masking of the information contained in the heterogeneity of the results. To give just a few examples of these problems: Two studies may look at what could be described in common language as “smokeless tobacco use”, but actually be looking at totally different measures of quite different products. Similarly, one study might look at deaths as the outcome and another look at diagnoses, which might have different associations with the exposure. A study might have a fairly glaring confounding problem (e.g., not controlling for smoking), but get counted just the same, obscuring its fatal flaw as it is assimilated into the collective. One study might produce an estimate that is completely inconsistent with the others, making clear there is something different about it, but it still gets averaged in.
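To make the mechanics concrete, here is a minimal Python sketch of one common form of this averaging, fixed-effect inverse-variance pooling of log relative risks. It also computes Cochran's Q and I², the usual heterogeneity statistics. The study results are invented for illustration and are not taken from the ST literature; whether the meta-analyses FDA relied on used exactly this weighting scheme is not assumed here.

```python
from math import exp, log


def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling of log relative risks.

    `studies` is a list of (relative_risk, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled relative risk, Cochran's Q, I^2 as a fraction).
    """
    z95 = 1.959964  # two-sided 95% standard normal quantile
    log_rrs, weights = [], []
    for rr, ci_low, ci_high in studies:
        se = (log(ci_high) - log(ci_low)) / (2 * z95)  # SE of log RR from the CI width
        log_rrs.append(log(rr))
        weights.append(1 / se ** 2)  # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return exp(pooled), q, i2


# Invented results, not from the ST literature.
near_null = [(0.90, 0.60, 1.35), (1.10, 0.75, 1.60),
             (1.00, 0.70, 1.43), (0.95, 0.65, 1.39)]
with_outlier = near_null + [(4.20, 2.60, 6.80)]

for label, data in [("near-null studies", near_null), ("plus one outlier", with_outlier)]:
    rr, q, i2 = pool_fixed_effect(data)
    print(f"{label}: pooled RR {rr:.2f}, Q {q:.1f}, I^2 {i2:.0%}")
```

With these invented inputs, the four near-null studies pool to about 1.0 with no measurable heterogeneity. Adding the one discordant study pulls the pooled estimate up to about 1.2 and drives I² well above 50%. The single pooled number conveys none of that, and nothing in the arithmetic asks whether the outlier measured the same exposure.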

But beyond all those serious problems with the method in general, all of which occur in the present case, this case is even worse. It is worse in a way that makes the result indisputably wrong for what FDA used it for; there is simply no room for “well, that might be a problem but…” excuses. It is easy to understand this glaring error by considering an analogy: Imagine that you wanted to figure out whether blue-collar work causes lung disease. This might not be a question anyone really wants an answer to, but it is still a scientific question that can be legitimately asked. Now imagine that to try to answer it, you gather together whatever studies happen to have been published in journals about lung disease and blue-collar occupations. As a simplified version of what you would find, let us say that you found two about coal miners, one about Liberty ship welders, one about auto body repair workers, one about secretaries, and two about retail workers. So you average those all together to get the estimated effect on lung disease risk of being a blue-collar worker.

See any problem there? If you do, you might be a better scientist than they have at FDA.

Obviously the mix of studies does not reflect the mix of exposures. Why would it? There is absolutely no reason to think it would. Notwithstanding current political rhetoric, only a minuscule fraction of blue-collar workers are in the lung-damaging occupations at the start of the list. The month-to-month change in the number of retail jobs exceeds total jobs in coal mining. But the meta-analysis approach is to calculate an average that is weighted by the effective sample size of each study, with no consideration of the size of the underlying population each study represents. The proper weighting could easily be done, but it was not done in my analogy, nor in the ST estimate FDA used (nor, for that matter, in almost any meta-analysis). If all the studies in our imaginary meta-analysis have about the same effective size, this average puts more weight on the <1% of the jobs that cause substantial risk than on the majority that cause approximately zero risk. (Assume that you effectively controlled for smoking, which would be a major confounder here, creating the illusion that even harmless blue-collar jobs cause lung disease, as is also a problem with ST research.)
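Here is a minimal Python sketch of the weighting problem in the blue-collar analogy. Every number in it is invented for illustration: the relative risks, workforce shares, and study counts. The sketch assumes each study has about the same effective size, so weighting by study amounts to weighting by study count. It also assumes a common baseline risk and averages on the relative-risk scale for simplicity, so the workforce-weighted figure is the population-average risk relative to that baseline.

```python
# Invented inputs for the blue-collar analogy:
# (occupation, relative risk of lung disease, share of the workforce, published studies)
occupations = [
    ("coal mining", 3.0, 0.002, 2),
    ("Liberty ship welding", 4.0, 0.0005, 1),
    ("auto body repair", 1.8, 0.005, 1),
    ("secretarial work", 1.0, 0.1, 1),
    ("retail", 1.0, 0.8925, 2),
]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rrs = [rr for _, rr, _, _ in occupations]
# Meta-analysis style: each study counts equally (same effective size assumed).
by_studies = weighted_mean(rrs, [n for _, _, _, n in occupations])
# Population style: each occupation counts by its share of the workforce.
by_population = weighted_mean(rrs, [share for _, _, share, _ in occupations])
print(f"weighted by published studies: {by_studies:.2f}")
print(f"weighted by workforce share:   {by_population:.2f}")
```

With these made-up inputs, the study-weighted average comes out a little above 2, while the workforce-weighted average is about 1.01. The reason is that the high-risk occupations make up under 1% of the hypothetical workforce but account for four of the seven studies.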

As previously noted, it is not only possible, but almost inevitable that studies will focus on the variations of exposures that we believe cause a higher risk. No one would collect data to study retail workers and lung disease. If they have a dataset that happens to include that data, they will never write a paper about it. (This is a kind of publication bias, by the way. Publication bias is the only one of the many flaws in meta-analysis that people who do such analyses usually admit to. However, they seldom understand or admit to this version of it.)

It turns out that this same problem is no less glaring in the list of “all available U.S. studies” of ST. In that case, about 50% of the weight in the average is on the studies of the Appalachian powdered dry snuff[*], which accounts for approximately 0% of what is actually used. Indeed, the elevated risk from the average is almost entirely driven by a single such study (Winn, 1981), which is particularly worth noting because this study’s results are so far out of line with the rest of the estimates in the literature. A real scientific analysis would look at that and immediately say that study cannot plausibly be a valid estimate of the same effect being measured in the other studies; it is clearly measuring something else or the authors made some huge error. Thus it clearly makes no sense to average it together with the others.

[*] As far as we can tell. The reporting of methods in the studies was so bad (presumably intentionally in some cases) that they did not report what product they were observing. We know that the Winn study subjects used powdered dry snuff because she admitted it in a meeting some years later, and this was transcribed. She has made every effort to keep that from getting noticed in order to create the illusion that the products that are actually popular cause measurable risk. For some of the other studies we can infer the product type from gender and geography (i.e., women in particular places tended to be users of powdered dry snuff, not Skoal).

It is amusing to note what Brad Rodu did with this. Recall that the over-represented powdered dry snuff was used by Appalachian women. So effectively Brad said, “ok, so if you are going to blindly apply bad cookie-cutter epidemiology methods rather than seeking the truth with scientific thinking, you should play by all the rules of cookie-cutter epidemiology: you are always supposed to stratify by sex” (my words, not his). It turns out that if you stratify the results from “all available U.S. studies” by sex (or gender, assuming that is what they measured, which is close enough), there is a huge association for women (relative risk of 9) and a negative (protective) association for men. ST users in the USA are well over 90% male. Brad has some fun with that, doing a back-of-the-envelope calculation to show that if you apply that 9 to women and zero risk to men, you get only a small fraction of the supposed total cases claimed by FDA. And this is a charitable approach: If you actually applied the apparent reduced risk that is estimated for men, the result is that ST use prevents oral cancer deaths on net.
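The structure of that kind of back-of-the-envelope calculation can be sketched in Python. This is not Brad Rodu's actual calculation. The user count, the 5% female share, and the sex-specific baseline rates are placeholders chosen for illustration; the female baseline is simply assumed to be half the male one. Excess cases are computed as users × baseline rate × (RR − 1), with men's RR set to 1, which is the charitable version described above. Any value below 1 for men would make their term negative.

```python
def excess_cases(users: float, baseline_rate: float, relative_risk: float) -> float:
    """Cases attributable to use: users x baseline rate x (RR - 1).
    Negative values would mean cases prevented."""
    return users * baseline_rate * (relative_risk - 1)


# Placeholder inputs (hypothetical, not taken from FDA or from Rodu).
TOTAL_USERS = 1_000_000
FEMALE_SHARE = 0.05  # consistent with users being "well over 90% male"
BASELINE = {"male": 1.0e-4, "female": 0.5e-4}  # hypothetical annual rates
users = {"male": TOTAL_USERS * (1 - FEMALE_SHARE), "female": TOTAL_USERS * FEMALE_SHARE}

# One pooled relative risk applied to everyone.
pooled = sum(excess_cases(users[s], BASELINE[s], 2.16) for s in users)
# Sex-specific relative risks: 9 for women, 1 (no excess) for men.
stratified = (excess_cases(users["female"], BASELINE["female"], 9.0)
              + excess_cases(users["male"], BASELINE["male"], 1.0))
print(f"pooled RR 2.16 for everyone: {pooled:.1f} excess cases/year")
print(f"RR 9 for women, 1 for men:   {stratified:.1f} excess cases/year")
print(f"ratio: {stratified / pooled:.0%}")
```

With these placeholder inputs, the stratified figure is 20 excess cases per year against about 113 for the pooled approach.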

Notice that in my blue-collar example, you would also get a large difference by sex, with almost all the elevated risk among men. Of course, there is no reason to expect that sex has a substantial effect on either of these, or most other exposure-disease combinations. Results typically get reported as if any observed sex difference is real, but that is just another flaw in how epidemiology is practiced. The proper reason for doing those easy stratifications is to see if they pop out something odd that needs to be investigated, not because any observed difference should be reported as if it were meaningful. When there is a substantial difference in results by sex for any study where the outcome is not strongly affected by sex (e.g., not something like breast cancer or heart disease), it might really be an inherent effect of sex, but it is much more likely to be a clue about some other difference. Maybe it shows an effect of body size or lifestyle. Or perhaps the “same” exposure actually varied by sex. In the ST and blue-collar cases, we do not have to speculate: it is obvious the exposure varied by sex.

The upshot is not actually that when assessing the average effect, you should stratify the analysis by sex (though it is hard not to appreciate the nyah-nyah aspect of doing that). It is that averaging together effects of fundamentally different exposures produces nonsense. If there is a legitimate reason to average them together (which is not the case here), the average needs to be weighted by prevalence of the different exposures, not by how many studies of each happen to have appeared in journals.

It gets even worse. I put a clue about the next level of error in my blue-collar example: the shipyard welders worked on Liberty ships. In the 1940s, ship builders had very high asbestos exposures, the consequences of which were not appreciated at the time. Today’s ship welders undoubtedly suffer some lung problems from their occupational exposures, but nothing like that. Similarly, regulations and better-informed practices have dramatically reduced harmful exposures for coal miners and auto body workers. In other words, calendar time matters. Exposures change over time, and the effects of the same exposure often change too, with changes in nutrition, other exposures, and medical technology. There are no constants in epidemiology. (That last sentence, by the way, is a good six-word summary of why meta-analyses in health science are usually junk.)

One of the meta-analysis papers FDA cites splits the study results into studies from before 1990 and studies from after. It turns out that the older group averages out to an elevated risk, while the later ones average out to almost exactly the null. This is true whether you look at just U.S. studies or studies of all Western products. Does this mean that ST once caused risk, but now does not? Perhaps (a bit on that possibility in Part 3). Some of it is clearly a function of study quality; I have pored over all those papers and some of the data, and the older ones, done to the primitive standards of their day, make today’s typical lousy epidemiology look like physics by comparison. A lot of this difference is just a reprise of the difference between the sexes: the use of powdered dry snuff was disappearing by the 1970s or so (basically because the would-be users smoked instead). In case it is not obvious, if you have a collection of modern studies that show one result and a smaller collection of older studies that show something different, you should not be averaging them together.

In short, a proper reading of the evidence does not support the claim that ST causes cancer in the first place. But even if someone disagrees and wants to argue that it does, that 2.16 number is obviously wrong and based on methodology that is fatally flawed three or four times over. That is, even if one believes that ST causes oral cancer, and even if one believes it could double the risk (setting aside that such a belief is insane), relying on this figure makes the core analysis that justifies this regulation junk science.

The next post takes up the issue of NNN specifically.

Time to stop measuring risk as “fraction of risk from smoking”?

by Carl V Phillips

I ran across a tweet touting a press release out of the Global Forum on Nicotine (GFN) meeting (a networking meeting, mostly of e-cigarette boosters) that made the claim that snus is 95% less harmful than smoking. This was variously described as being based on “new data”, “new data analysis” and “the latest evidence”, but with no further explanation of where the number came from. Since the presenter was Peter Lee, those of us who know who’s who can surmise that it is a statistical summary of existing published studies, because that is what Peter does. There is nothing necessarily wrong with that (though for reasons I will explain in an upcoming post, it is potentially suspect in this context), but it is certainly not new data or the latest evidence.

Oh, and it is clearly wrong.

What is peer review really? (part 9 — it is really a crapshoot)

by Carl V Phillips

I haven’t done a Sunday Science Lesson in a while, and have not added to this series about peer review for more than two years, so here goes. (What, you thought that just because I halted two years ago I was done? Nah, I consider everything I have worked on since graduate school to be still a work in progress. Well, except for my stuff about what is and is not possible with private health insurance markets; reality and the surrounding scholarship have pretty much left that as dust. But everything else is disturbingly unresolved.)