
FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (postscript)

by Carl V Phillips

For completion of this series (with this footnote), the following is what I submitted to FDA. My comment does not yet(?) appear on the public docket as of this writing. But I got a confirmation (conf code 1k1-8xfb-dhwh if you want to search for it later). It has a bit of extra content beyond what I already presented.

I know a few of you urged me to rewrite my analysis in a more, er, formal manner. While I understand your reasoning, I chose not to take time from my other obligations to do that. I honestly think it does not make any difference. I am reasonably confident that FDA “fulfills” their obligation to consider all the comments by having a low-level staffer read each one, without reporting anything of substance up the chain, so they can check a box that says they read and considered each of them. If this proposed rule is not withdrawn for political reasons or as a result of the various procedural problems, then whoever is pursuing a lawsuit to strike it down can enjoy my essays as they mine them for substance. (Shameless plug: Of course, if they would like to hire me to formalize anything, I am quite good at that.) Besides, I might manage to embarrass that staffer who reads it into going into a more honorable line of work.

The content follows:


The primary purpose of this comment is to demonstrate that FDA’s assessment of the supposed benefits of this rule (115 fatal cancers averted per year) is fatally flawed for approximately half a dozen reasons, each one of which is sufficient to invalidate it. I have published the analysis in the following three blog posts, which I incorporate into this comment by reference:

https://antithrlies.com/2017/06/26/fdas-proposed-smokeless-tobacco-nitrosamine-regulation-innumeracy-and-junk-science-part-1/

https://antithrlies.com/2017/06/29/fdas-proposed-smokeless-tobacco-nitrosamine-regulation-innumeracy-and-junk-science-part-2/

https://antithrlies.com/2017/07/02/fdas-proposed-smokeless-tobacco-nitrosamine-regulation-innumeracy-and-junk-science-part-3/

(I have also attached printouts of them for completeness, but I would suggest reading the online versions with live links.)

The implication of that analysis is that there is no scientific basis for claiming that any disease incidents will be prevented by this rule, let alone the specific quantity claimed by FDA as the rule’s justification. Based on this alone, the rule should be withdrawn.

This analysis should not be interpreted as implying that if, counterfactually, the 115 figure were actually science-based, then it would justify the rule. There is no analysis of the negative health impacts from driving smokeless tobacco users to smoking when their preferred products are banned. The absence of this analysis is another sufficient reason for withdrawal of the rule. Moreover, even if there were a legitimate reason to believe there were health benefits, and even if there were no health costs, justifying this rule would require a cost-benefit analysis that considered the welfare loss to consumers and other costs. The absence of this analysis is yet another sufficient reason for withdrawal of the rule.

Finally, given the lack of cost-benefit analysis of any sort, there obviously is no justification for choosing the particular quantitative standard in the proposed rule (even apart from the fact that it appears to be 1/4 of the intended quantity). This makes the choice of the standard arbitrary and capricious. It appears it must have been chosen with an eye to which particular winners and losers it would create, as I presented in this footnote to the previous analysis here (incorporated into this comment by reference and also attached):

https://antithrlies.com/2017/07/09/sunday-science-lesson-toxicology-and-the-chains-in-american-football/

While not central to the main point of this comment, this is a further problem with the legitimacy of this rulemaking.

Sunday Science Lesson: toxicology and “the chains” in American football

by Carl V Phillips

Those of you who read my series on fatal flaws in FDA’s proposed rule about limiting the nitrosamine NNN in smokeless tobacco (and presumably anyone reading this quick little tangent read those important and carefully crafted posts) might have tripped up over an oddity from the third post in the series. I quoted this from FDA’s proposed rule about how their key number, used for estimating the risk of cancer caused by some quantity of NNN, was calculated:

As defined by the EPA guidelines, the cancer slope factor (CSF) is “an upper bound (approximating a 95percent [sic] confidence limit) on the increased cancer risk from a lifetime exposure to an agent.”

I noted (you can read the original for more detail) this means that when FDA estimated the dose-response for NNN, they did not use the point estimate generated by the underlying study, but inflated it by an arbitrary fudge factor (which is not actually an upper bound, as claimed, but is still much higher than the point estimate). This is obviously an error. There are arguments that using such inflation factors when setting standards (e.g., how much of a potentially toxic substance a facility is allowed to emit) is appropriate, to err on the side of caution. But an inflation factor, creating a number higher than what the data suggests is the best estimate, obviously does not give us the best estimate for the actual dose-response. I also observed that the model used to translate the data from rodent megadose studies into an estimate for the effects of realistic human exposures was fraught with huge, undoubtedly incorrect, assumptions that made the final result nearly worthless, even apart from this.

So you might be asking why such lousy models and arbitrary fudge factor rules even exist. They are clearly grossly inappropriate for what FDA was doing — no ambiguity there. But presumably they serve some purpose, or they would not exist.

I found myself flashing back to when I was ten or twelve years old and a fan of American football. There is a process in that game that occasionally occurs, in which a very close judgment has to be made about whether the offensive team advanced the ball the required ten yards to get a “first down”. (That is all you need to know. Obviously most American readers will know more details. Also, I realize I do not even know whether what I describe is still done at professional levels, given that it could be replaced by imaging and computers, but is at least still presumably done in high school games.) At that point, two officials run in from the sidelines carrying “the chains”, a pair of posts connected by a ten-yard chain. One of them places one end at the starting point for the required ten yards of progress. The other then pulls the chain taut and observes whether the current placement of the ball is a little past his post or a little short of it.

You might wonder why. It is no easier to identify the exact starting point, and measure from it, versus just identifying the exact ending point needed for the first down. Since the play will usually have moved the ball sideways, it is not as if someone can just remember the exact blade of grass the ball was on at the start; it is necessary to eyeball the corresponding point on an imaginary line across the field. Also, the ball is not a single point. And the current placement of the ball after the last play was somewhat arbitrary too. So why not just eyeball the spot that is ten yards further (using as a guide, in either case, the markings of yards that are painted on the field, though not necessarily exactly on the line the ball is on)? Further contemplation reveals that the answer lies in game theory.

If one official has to eyeball a target point near where the ball is sitting on the ground, either just ahead of it or just behind it, he is full-on deciding whether to award the first down. That creates a huge amount of pressure and also creates an enormous potential for exercise of any bias that official is feeling for whatever reason. It could be nefarious bias. But it could be an innocent moral struggle such as, “I denied them the last close call that could have gone either way, so I owe them this one that could go either way.” Or it could be an attempt at beneficence in violation of the procedural rules like, “where the ball is sitting is short of my estimated spot, but my colleague who decided where to place the ball after the last play really should have put it further forward and I can fix his error.” But when the first official eyeballs his spot ten yards back, he cannot be sure whether an inch one way or another even matters and can just do it mechanically without all those inconvenient thoughts cropping up. Of course, the colleague could exercise nefarious bias when he chooses where to pick his spot; an inch forward or an inch back are both plausible estimates of the starting point. But the complicated mechanism reduces the temptation to exercise such bias somewhat, and strongly reduces effects of the “I owe them this one” or “I can fix it” factor.

Regulators setting an allowable level of potentially harmful effluent, contaminant, or ingredient also have to draw a line. The right place to draw the line is hugely uncertain, both in terms of what levels are actually harmful and the political decision about what level of harm should be allowed (this contrasts with the American football analogy). Getting it right is pretty much impossible. Still, issues like those facing the football referee can be avoided. If regulators are allowed to draw the line when looking at exactly where the ball is sitting, as it were, they are deciding such things as “this product is fine, but its leading competitor is banned,” or “the facilities operated by our boss’s biggest campaign donor all just squeak in under the line.” That would not be good.

So instead they create a rule that says “make an estimate based on this crazy dubious model and then inflate the result by this predefined arbitrary factor, and draw a line based on that.” This does not eliminate directional bias (intentionally trying to be more or less stringent) in defining the models or inflation factors, or in interpreting the underlying data. But it does help avoid someone saying “hey, if I just bump this limit down from 7.5 to 6.8, I can really stick it to that company that I have always hated.” Since the proper line is enormously uncertain, that would be easy to do.

For the same reason, it does not matter so much that many of the steps in the defined process are just silly. You can still get outcomes where experts largely agree that the standard spit out by the sketchy complicated (but well-defined) process is way too low or too high. But even then, at least it offers a starting point for debate that was not just someone capriciously making up a number. Most of the time, the genuine uncertainty is sufficient that the result of the process might actually be the optimal number.

Circling back to the FDA, it is worth noting that their proposed rule in no way resembles this clumsy, but arguably justifiable, process. They were not following a rule that spit out a quantitative standard that, while probably non-optimal, was at least non-arbitrary. No, they misused elements of this process to (inaccurately!) estimate the effects of their proposed standard. But their standard itself was still an arbitrary and capricious number that was pulled out of the air. This was done with the clear view of exactly which products would make the cut, which would have to be re-engineered, and which would be banned. This is exactly the bright-line decision about who wins and who loses that those football and normal regulatory rules are designed to prevent.

Well, I should say that FDA thought they had a clear view of exactly which products would be affected. As noted in the first post in my series, they actually made a factor-of-four arithmetic error that means far more products would be affected and far more banned than they intended. But the point is still that they were misusing the trappings of a process that is designed to avoid exactly such picking-and-choosing, while still trying to engage in arbitrary picking-and-choosing.

FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (part 3)

by Carl V Phillips

In Part 1 of this series, I described FDA’s proposed rule that would require smokeless tobacco products (ST) to have no more than 1 ppm of NNN (a tobacco-specific nitrosamine or TSNA) dry weight. I discussed some of the political and policy implications of this, and reasons why the rule will probably not survive. I also noted that almost no current products meet that standard, and that American-style ST probably cannot meet it. Despite the proposed rule probably being mooted, I noted there is still value in examining just how bad the ostensibly scientific analysis behind it is. In Part 2, I noted that FDA’s estimate that the standard would save 115 lives per year is premised on their estimate for the risk of oral cancer caused by ST use. But, in fact, the evidence does not support the claim that ST use causes any oral cancer risk. I then focused on why, even if one believes there is some such risk, the method used to calculate FDA’s quantitative estimate is utter junk science.

So far, none of that has addressed NNN itself, and how meeting the NNN standard would affect the carcinogenicity of ST, if it is carcinogenic. It turns out that this part of FDA’s analysis is even worse than that discussed in Part 2.

Estimating the health effect of a quantitative standard for an exposure is a matter of estimating the relevant range of the dose-response curve, along with knowing how much people’s dosage would change. That is, you need estimates like, “N people use product X, which has 5 ppm NNN, which causes Y risk per person, versus the Z risk per person from 1 ppm, so multiply N by (Y-Z)….” With such numbers we could estimate the effect of an adjustment in the NNN concentration.
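To make that bookkeeping concrete, here is a minimal sketch in Python. Every number in it is a placeholder invented for illustration (the user count and both per-person risks are hypothetical, not estimates from FDA or anyone else). The only thing taken from the paragraph above is the form of the calculation, N × (Y − Z).

```python
def cases_averted(n_users: int, risk_at_current: float, risk_at_standard: float) -> float:
    """Expected cases averted if every user's risk dropped from Y to Z.

    n_users          -- N, the number of people using product X
    risk_at_current  -- Y, per-person risk at the current NNN concentration
    risk_at_standard -- Z, per-person risk at the proposed 1 ppm standard
    """
    return n_users * (risk_at_current - risk_at_standard)


# Hypothetical inputs, chosen only to show the mechanics.
n = 1_000_000   # N: users of the product (made up)
y = 0.00010     # Y: risk per person at 5 ppm (made up)
z = 0.00004     # Z: risk per person at 1 ppm (made up)

print(cases_averted(n, y, z))
```

The calculation is only as good as Y and Z, which is exactly where the trouble lies.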

In reality, it is not that simple. In Part 1, I pointed out that most products could not just have their NNN concentration “adjusted” like that, and that they would have to be fundamentally changed, effectively eliminated and replaced in the market (perhaps if FDA had not made the arithmetic error noted in Part 1, that would only be “some” rather than “most”). Many consumers of the eliminated or fundamentally altered products would not be happy with the new option. Some would just quit, eliminating the Y risk as well as any other risk from using the product (setting aside that, as far as we know, both are nil; remember, we are down that rabbit hole here). Some would switch to smoking, creating a risk that is orders of magnitude greater than anything discussed so far, making all of the details moot: the net health impact would be an increase in risk.

But that is the simple practical criticism of this madness, one that hinges on questions of consumer behavior (an area where FDA’s analyses are consistently absurd, but they always manage to trick their audience into accepting their assertions). That is not what I am doing here, though I suppose I just did it in one paragraph. My goal is to point out that FDA’s core claims about benefits here are based on junk science, setting aside the enormous costs that would dwarf them anyway. So returning to my point here, what basis do we have for estimating Y, Z, and other points along the dose-response curve?

None.

Absolutely nothing.

Indeed, we do not even know that NNN in ST affects cancer risk at all.

As I mentioned in Part 1, if you are only familiar with the rhetoric about this topic, and not the science, you would be forgiven for not knowing that the assertion there is any such effect is based only on heroic extrapolations and assumptions. You might further surmise that since FDA claims that this reduction would reduce cancer deaths by 115 per year (note: not “about 100”, but as precise as 115), there is not only evidence that NNN in ST causes cancer, but there is also so much evidence that we can precisely estimate a dose-response.

What we know about NNN and cancer is based on biological theory (we have evidence that some nitrosamines cause cancer in humans), and the effects of exposing rats, hamsters, and other critters — species whose propensity to get cancer from an exposure is often radically different from ours, and even from one another’s — to megadoses of NNN. Those toxicology studies do suggest that NNN exposure probably causes cancer in humans, in a big enough dose, and under the right circumstances. Of course, that is also true for almost everything. When IARC, the cancer research arm of the WHO, made their blatantly-political decision to declare NNN a known human carcinogen, they did so in violation of their own rule that there has to be some actual human exposure evidence before making such a declaration. There is not. But even if someone believes that NNN in ST does cause cancer in humans, the rodent megadose data obviously does not tell us anything about the effect of the reduction in dosage imposed by this rule.

Stepping back, it is useful to understand the potential legitimate use of toxicology studies like those. They — or, better, in vitro studies of cells that are actually similar to the human body and do not require sociopathic torturing of innocent animals — are useful for giving us a heads-up that a chemical or combination of chemicals might be carcinogenic or poisonous. This might be a good reason to undertake the more difficult search for epidemiologic evidence that the real-world version of the exposure is causing the bad outcome. Or at least a reason to pursue the in-between step of looking for biological evidence of harm from the real-world exposure in humans. It might even be sufficiently compelling to prohibit introducing a novel exposure, acting before we can even get any human data.

If toxicology studies of a chemical all fail to produce a bad outcome, this strongly suggests that the exposure will not cause the harm, so long as that failure is consistently confirmed using various toxicology methods (claims that a single toxicology study shows that an exposure is harmless, which are currently appearing in the pro-vaping rhetoric, are misguided). But getting a bad outcome in a particular toxicology study does not mean that the real-world exposure actually does cause harm. The pattern in the toxicology has to be far better than what we have for NNN before such a conclusion is justified, including getting the effect at reasonably realistic exposure levels and fairly consistently across a variety of methods.

Consider an analogy: We are interested in knowing whether there is life on other planets, but actually going there to take a look is rather difficult. We have a much cheaper tool in our toolbox, however, which is to use modern telescopes to see if light scatter suggests a water-rich atmosphere. Of course, that is far short of observing life; it would be insane to say “we saw evidence of water, so there must be life there!” But since the versions of life that we understand require there to be enough water, seeing that creates the intriguing possibility of life. Failing to find water tends to rule out the possibility of life as we know it.

Another legitimate use of toxicology is to tell us why an exposure is causing harm. Of course, this should mean there is evidence of harm, not just some wild assumption that there is harm. Continuing the analogy, pretend that someone looked at the light scatter around Mars and claimed they saw enough water to support life: “Aha, this shows that the canal-building civilization is water-based life as we know it.” Um, but you do know that early 20th century telescopes debunked that 19th century canals myth, right? Also we have had numerous close observations of the planet and little labs driving around on the surface. Your hint about the possibility of life is utterly pointless given that we have much better information about the reality.

I have often described the TSNA toxicology research, which inexplicably continues to this day, as an attempt to identify which chemical pathways cause a cancer outcome that does not actually occur. As with Mars not having canals, we know that ST use does not cause a measurable risk for cancer, and therefore the NNN and other TSNAs in ST are not causing a measurable risk (unless we think that other aspects of the ST exposure prevent exactly as much cancer as the TSNAs cause, something that no one is seriously proposing). One possibility that has been seriously proposed — e.g., by Brad Rodu, whose work I cited in Parts 1 and 2 — is that something else in ST, perhaps antioxidants, directly negates whatever cancer-causing effect the TSNAs might have if we were exposed to them alone (which does not happen at a level beyond a few stray molecules). Indeed, when the exposure is tobacco extract, those rodent studies fail to show the carcinogenic effect from NNN, or anything else in ST for that matter, a fact that is conveniently glossed over.

So how did we end up with the “fact” (which I suppose should be called the fake news in current parlance) that NNN and other TSNAs in ST cause cancer? It basically comes down to circular reasoning, or perhaps it is figure-eight reasoning since there are two circles as well as a few other fallacies. It goes something like this (and I am really not exaggerating):

“Given that we have only seen an effect in megadose rat studies, how can we really be sure that TSNAs at the relevant dosage and in a realistic exposure cause cancer?”

“Because smokeless tobacco causes cancer, and it contains TSNAs.”

“But [even setting aside that we do not know that is true] how could you know it was the TSNAs causing it.”

“Because we know TSNAs cause cancer.”

“Um, isn’t that so transparently circular that even tobacco control’s useful idiots will see right through it?”

“There is more. We know that higher-TSNA products cause more cancer risk.”

“Ah, now that sounds like actual evidence. Please explain.”

“US products have higher TSNA levels than Swedish products, and US studies show a cancer risk while Swedish studies do not.” [Note: see appendix to this dialogue, below.]

“But didn’t you read Part 2 of this series? That contrast does not appear in studies of modern US products, but only from a few studies of an archaic type of product.”

“Yes, exactly. That product was very high in TSNAs, and its cancer effects were off the charts compared to modern products. Case closed.”

“There are no measurements of the TSNA levels of those archaic products. How do you know they had high TSNA levels?”

“Isn’t it obvious? They must have, because they caused cancer and TSNAs cause cancer.”

Loopity loopity loop.

In fairness, there are honest observers, including Brad Rodu, who hypothesize that this is indeed the reason the archaic products apparently caused cancer. But this is just a hypothesis, and it cannot be tested. Indeed, we cannot even replicate the basis for claiming those products caused cancer in the first place. It basically comes down to a single study from the 1970s — not exactly overwhelming evidence.

A bit more useful background: In the 2000s, the anti-ST crusaders in and funded by the US government (CDC and NCI, before FDA joined the game) fought a rearguard action against the evidence that had emerged from Sweden that ST was approximately harmless. Part of this was insisting that the higher levels of TSNAs in US products meant that the Swedish evidence was not informative. It was political bullshit on its face. Still, I wrote an analysis over a decade ago that showed that the ST products that produced those null results in Sweden had about the same TSNA levels as then-current US products. (This was based on limited analytic chemistry from before 2000. There were only a handful of TSNA concentration studies in the public record. But there was enough to show this.) TSNA levels in all styles of ST products were and are decreasing over time. It might have been true that 1990 US products were materially more hazardous than 1990 Swedish products (which showed no measurable risk) because they had higher TSNA levels. But mid-2000s US products had low enough TSNA levels that this would have no longer been true. This leads to the appendix for the dialogue. We could imagine this variation:

“US products have higher TSNA levels than Swedish products, and US studies show a cancer risk while Swedish studies do not. Also there is a time trend, wherein TSNA levels have been dropping in both US and Swedish products, and older studies found elevated cancer risks, while newer ones do not.”

“Part 2 of this series dismisses your first sentence. But the second sentence makes some sense, though it might just be because the older studies used really primitive methodology. Still, you have a prima facie valid point there, unlike all your other complete bullshit. But, hey, doesn’t that also mean you are conceding the fact that modern ST products do not cause any measurable cancer risk, even if older products might have?”

“Er, no. We never said that. We never made any claim about time trends despite it being the most scientifically defensible argument we have. Strike all that from the record.”

Summarizing this, we have only unsupported hypotheses and circular reasoning behind the claim that NNN in ST causes any of the (quite possibly zero) cancers caused by ST. Given this, we obviously know nothing about how much cancer a particular concentration of NNN causes. That is sufficient to show that FDA’s claim cannot possibly be science-based. But I am sure you share my curiosity about how FDA took this complete lack of information and turned it into the conclusion that exactly 115 lives per year would be saved by this regulation.

Here it is (from the proposed regulation):

….increase in oral cancer risk of 116 percent among smokeless tobacco users compared with never users. We then reduce this value by 65 percent based on toxicological evidence relating the estimated average reduction in the dose of NNN to lifetime cancer risk under the proposed standard. The result is a reduction in the estimated relative risk of oral cancer to 1.41 under the proposed product standard. FDA used the following calculation: (1 + (2.16−1) × (1−0.65) = 1.41) for this determination.

Thanks, guys, for showing us how to do that arithmetic so I did not have to find a third grader to ask. The important bit of showing their work, of course, is about justifying the inputs. In the introduction, FDA refers the reader to section IV.C for the basis for the .65 figure. It is really section IV.D, because, hey, just because you spent a million dollars writing a regulation that is potentially devastating for industry and millions of consumers does not mean you should bother to have someone edit it. It turns out the assumption is that the dose-response is linear across all quantities, and under that assumption the effects observed from megadoses in rodents give a dose-response that translates into .65. The generic problems with this include the fact that the linear (also known as “one hit”) model of carcinogenesis has long since been dismissed as invalid, the folly of extrapolating orders of magnitude beyond the observed data, and the little matter that rodents are not people.

It gets worse still when you look at the equation that FDA used to calculate the fictitious linear trend. (And I am not referring to the fact that they actually cut-and-pasted the equation in their document as an image from some low-res PDF of someone else’s document. This is not a scientific flaw, of course, but, it does suggest the proposed rule was written by people who have so little education and experience in science that none of them had ever learned how to typeset a simple equation.) The equation builds in the assumption that a very high exposure for a short time (e.g., what the rats experienced) has the same effect as the same total exposure stretched out over many years. This is the linearity assumption taken to the extreme. It not only assumes linearity for each parameter — i.e., increasing years of exposure, increasing quantity per exposure, or increasing number of exposures per day by Y% increases risk by Y% — which is completely unsupportable and almost certainly wrong. It also assumes a multiplicative effect for all interactions, which is also unsupportable and almost certainly wrong. For those who did not follow that, I will explain its major implication: The assumption is that a given lifetime quantity, X, of NNN exposure creates the exact same total cancer risk whether it is consumed all in one day, or one month, or spread out over 70 years. It is the same whether an ongoing exposure takes place all at once each Monday morning or it is spread evenly throughout the week. Moreover, if you increase X by 10% it increases the risk by 10% no matter how the consumption is spread out. On top of all that, if someone’s body mass is 10% lower his risk from X is always increased by 10%. If his mass is 99.963% lower (i.e., he is a hamster and not a human) then the risk is increased exactly 2720-fold.
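The implication can be sketched in a few lines of Python. This is not FDA's equation, which is not reproduced here. It is a toy model with the properties just described: risk is proportional to the total lifetime quantity divided by body mass, and nothing else matters. The function name, the slope value, and the lifetime quantity are all made-up placeholders. The 2720 ratio is the figure from the paragraph above.

```python
def toy_risk(daily_dose: float, days: float, body_mass_kg: float, slope: float = 1.0) -> float:
    """Toy linear, multiplicative model: risk = slope * total dose / body mass."""
    total_dose = daily_dose * days
    return slope * total_dose / body_mass_kg


X = 25_550.0          # hypothetical lifetime quantity (arbitrary units)
HUMAN_KG = 75.0
DAYS_70Y = 70 * 365   # 25,550 days

# Same total quantity X, consumed in one day versus spread over 70 years.
all_in_one_day = toy_risk(X, 1, HUMAN_KG)
over_70_years = toy_risk(X / DAYS_70Y, DAYS_70Y, HUMAN_KG)
print(all_in_one_day == over_70_years)  # the model cannot tell them apart

# Increase X by 10% and the risk rises by 10%, however it is spread out.
print(toy_risk(X * 1.1, 1, HUMAN_KG) / all_in_one_day)

# Shrink the body by a factor of 2720 and the risk is multiplied by 2720.
print(toy_risk(X, 1, HUMAN_KG / 2720) / all_in_one_day)
print(round((1 - 1 / 2720) * 100, 3))  # percent reduction in mass that ratio implies
```

A model that cannot distinguish a single massive dose from seventy years of trace exposure is not describing biology.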

Such simplifying assumptions about linearity and multiplicativity are not terrible if you are interpolating (i.e., you have data from both sides of the quantity you are assessing and you are trying to fill in the middle) or are extrapolating a little bit beyond the range of your data. But in this case they are extrapolating orders of magnitude beyond the rat data. Weeks of exposure rather than decades, 30 g bodies rather than 75 kg, and crazy large doses. And, of course, there is the little matter of assuming that a different exposure pathway in a different species has the same effect as ST exposure in humans. The huge extrapolation means that the slightest departure of the assumptions from reality (and it is safe to say that the departures are more than slight) renders the final estimate complete garbage.

It gets worse. The key parameter is what is multiplied by the total lifetime units of exposure in order to estimate risk, which FDA calls the “cancer slope factor” or CSF if you want to search for it in the document. For this, they rely entirely on a 1992 estimate from the California EPA, which itself was based on the results of a 1983 paper that looked at what happens when hamsters were given huge doses of NNN dissolved in their drinking water. Yes, really. FDA’s number ignores the ~99% of the relevant research that has been done in the last three decades, and it was obviously pretty sketchy even in 1992 given that it was based on a study whose real information value (about actual human exposures) was approximately nil. Moreover, there is this:

As defined by the EPA guidelines, the cancer slope factor (CSF) is “an upper bound (approximating a 95percent [sic] confidence limit) on the increased cancer risk from a lifetime exposure to an agent.”

So apparently (the methods are reported so poorly that it is hard to be certain) they not only based this key number on evidence — to use the word rather loosely — from a single ancient toxicology study, but they did not even use the actual estimate that was generated from that. Rather, they used a larger number generated via an arbitrary process. The upper bound of a 95% confidence interval is a completely meaningless number in this context. There is an argument (which many would call dubious) that some arbitrary inflation of the point estimate like this should be used in “abundance of caution”-based regulations. (Update: More on this in my follow-up post.) But it is not an estimate of the actual effect. I know this seems like an arcane technical point in the context of everything else, but I cannot stress enough what an enormous failure of legitimate science this is (assuming they did what it sounds like they did). This would mean, for example, if there had been fewer observations collected in that 1983 study, but it had still supported exactly the same point estimate, FDA would be claiming some larger number of lives saved, like 125 per year rather than 115.
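The point about sample size can be illustrated with a minimal sketch in Python. It uses the textbook normal approximation for the confidence interval of a proportion, which is an assumption made for illustration and not necessarily how the California EPA or FDA computed their bound. The tumor proportion and the sample sizes are invented.

```python
import math


def upper_95(p_hat: float, n: int) -> float:
    """Upper end of a two-sided 95% normal-approximation interval for a proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat + 1.96 * standard_error


p_hat = 0.30  # hypothetical point estimate, identical in every scenario

# Same point estimate, different numbers of observations.
for n in (80, 40, 20):
    print(n, round(upper_95(p_hat, n), 3))
```

The point estimate never changes, but the "upper bound" climbs as the study gets smaller. Anything computed from that bound rewards having less data.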

When presenting this number, and practically admitting it is junk (despite using it to calculate their estimate of 115 to three significant figures), FDA writes:

FDA welcomes public comment on whether there is a more robust CSF available for NNN.

This is a classic bit of anti-scientific rhetorical strategy. Anyone answering that question as phrased is implicitly conceding that the estimate FDA used has some validity. Respondents are effectively conceding that if they cannot make a compelling case that some other number is better, then FDA’s number was appropriate to use. When a question’s phrasing builds in invalid assumptions, or when it assumes away the really important questions (“Have you stopped beating your wife?”), the response needs to unask it, not answer it. So here is my unasking answer to their welcoming of public comment:

The number FDA used has absolutely no hint of validity. However, there is no robust, or even remotely plausible basis for generating this “CSF”; any number used here might as well be made-up from thin air. That said, given that ST does not seem to cause oral cancer in the first place, the best default estimate is zero. There is no legitimate basis for concluding an estimate of zero is wrong. Oh, and also if you are going to use a junk-science extrapolation from rodent studies, you should at least calculate this number based on all such studies to date. If you are not capable of doing that analysis, and instead are limited to using the approach any middle-school student would use if confronted with this question (run a search and blindly transcribe whatever someone once wrote), then you have no business regulating anything!

I’ll take a deep breath here, because that is still not all. Look back at that grade-school arithmetic they showed us. Notice any assumptions embedded in it? Yes, that’s right, they assumed that all the cancer risk that they claim is caused by ST is caused by NNN, and thus a .65 reduction in the risk from NNN exposure is a .65 reduction in total risk. Wait, what? FDA did some hand-waving in their document about reductions in NNN also carrying along reduction in another TSNA, NNK, but they never tried to justify the claim that the (supposed) cancer risk was all due to NNN or even NNN plus NNK. How could they?
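The embedded assumption can be written out as a short sketch. The 0.65 figure is the one quoted above; the parameter `nnn_share_of_risk` and the alternative value of 0.5 are hypothetical, introduced only to show what the arithmetic silently fixes at 1.0.

```python
def total_risk_reduction(nnn_risk_reduction, nnn_share_of_risk):
    """Fractional reduction in total risk from reducing only the NNN part.

    nnn_share_of_risk is the (unknown) fraction of the claimed cancer
    risk that is attributable to NNN; it is an illustrative parameter.
    """
    return nnn_risk_reduction * nnn_share_of_risk


# What the FDA arithmetic implies: all of the claimed risk is from NNN.
implied_by_fda = total_risk_reduction(0.65, nnn_share_of_risk=1.0)

# Hypothetical alternative: only half of the claimed risk is from NNN.
if_half_from_nnn = total_risk_reduction(0.65, nnn_share_of_risk=0.5)

print(implied_by_fda, if_half_from_nnn)
```

Any share below 1.0 scales the claimed benefit down proportionally, which is why the unstated assumption matters.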

Effectively, FDA has just declared that they believe that whatever cancer risk (at least oral cancer risk) is caused by ST consumption, it is all caused by TSNAs and no other molecules contribute any cancer risk. They never suggested this was a simplifying assumption. This could have some amusing implications. The next time you see one of those anti-scientific bits of propaganda about ST containing 27 carcinogenic chemicals (or whatever number they are making up that day), you can reply that FDA has declared that at least 25 of those do not actually cause cancer. On the other hand, we should probably not try to push this too hard. I am guessing that, given all the other errors, the authors of this rule did not understand their own arithmetic sufficiently to know they were implicitly declaring this to be true.

Returning to the life on Mars metaphor, and the dialogue motif, the “logic” behind the FDA analysis would map to something like the following:

“From my light-scatter observations, I have concluded that had the water density in the martian atmosphere been X, instead of the Y I observed, the civilization that built the canals would not have collapsed just after helping humans build the pyramids, but would have thrived for 1,150 more years.”

“Wait, what? There are no canals. There was no civilization. Ancient extraterrestrial visitation stories are just silly claims by people who do not understand science and technology. The rovers and other Mars exploration have already shown that if there is or was anything we might call life, it has had no perceptible impact, let alone built a civilization. There is not enough water to support an ecosystem now, and was not enough 5000 years ago. But even if there had been a civilization, there is obviously no basis for estimating how atmospheric water density affected it, let alone a way to predict its demise to three significant figures based on one observation. As a minor point, I am not sure from what you said whether you meant Mars years or Earth years, but I am guessing you do not even know they are different.”

I am not being hyperbolic when I say FDA’s proposed rule comes across as parody. It reads like someone concocted it in order to ridicule a collection of faulty common practices and reasoning in public health science, creating cartoon versions to highlight problems that are often subtle. Please reassure us, FDA, that this was intentional. Even more so, those of you at the Center for Tobacco Products might want to reassure your colleagues elsewhere in FDA that this is not what their once respectable agency has come to.

Alternatively, perhaps it was really a joke by outgoing officials, hoping for a *popcorn* moment when the new administration tried to defend the rule in court. Or maybe it was just a Dadaesque tribute to the day it was issued. I realize these do not seem like terribly likely explanations, but they are more plausible than believing that anyone with a modicum of scientific expertise thought that this hot mess was legitimate analysis.

FDA’s proposed smokeless tobacco nitrosamine regulation: innumeracy and junk science (part 2)

by Carl V Phillips

In the previous post, I gave some background about the new proposed rule from FDA’s Center for Tobacco Products (CTP) that would cap the concentration of the tobacco-specific nitrosamine (TSNA) known as NNN allowed in smokeless tobacco products (ST). Naturally, I think you should read that post, but to follow the scientific analysis which begins here, you do not need to.

Before even getting to the even worse nonsense about NNN itself, it is worth addressing CTP’s key premise here: They claim that ST causes enough cancer risk, specifically oral cancer, that reducing the quantity of the putatively carcinogenic NNN could avert a lot of cancer deaths.

Readers of this blog will know that the evidence shows ST use does not cause a measurable cancer risk. That is, whatever the net effect of ST use on cancer (oral or otherwise), it is not great enough to be measured using the methods we have available. That does not necessarily mean it is zero, of course. Indeed, it is basically impossible that any substantial exposure has exactly zero (or net zero) effect on cancer risk. But even if all the research to date had been high-quality and genuinely truth-seeking — standards not met by much of the epidemiology, unfortunately — there is no way that we could detect a risk increase of 10% (aka, a relative risk of 1.1) or, for that matter, a risk decrease of 10%. Realistically, we could not even detect 30%. For some exposure-disease combinations it is possible to measure changes that small with reasonable confidence (anyone who tries to tell you that all small relative risk estimates should be ignored does not know what he is talking about). But it is not possible for this one, at least not without enormously more empirical work than has been done.

Despite that, FDA bases the justification for the rule on the assumption that ST causes a relative risk for oral cancer of 2.16 (aka, a 116% increase), or a bit more than double. This eventually leads to their estimate that 115 lives will be saved per year. Before even getting to their basis for that assumption, it is worth observing just how big this claimed risk is. (I will spare you a rant about their absurd implicit claims of precision, as evidenced in their use of three significant figures — claiming precision of better than one percent — to report numbers that could not possibly be known within tens of percent. I wrote it but deleted it and settled for this parenthetical.)

A doubling of risk, unlike the change of 10% or 30%, would be impossible to miss. Almost every remotely useful study would detect an increase. Due to various sources of imprecision, some would have a point estimate for the relative risk of 1.5 (aka, a 50% increase) and some 3.0, but very few would generate a point estimate near or below 1.0. Yet the results from most published studies cluster around 1.0, falling on both sides of it.
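A rough sketch of why such results would be rare, considering random error only (which, as discussed later in this post, is far from the only error that matters). It assumes each study's estimated log relative risk is normally distributed around the true value; the standard error of 0.4 is a hypothetical placeholder, not a figure from any actual study.

```python
from math import log
from statistics import NormalDist


def prob_estimate_at_or_below_null(true_rr, se_log_rr):
    """P(point estimate <= 1.0), assuming the estimated log(RR) is
    Normal(log(true_rr), se_log_rr). Random sampling error only."""
    return NormalDist(mu=log(true_rr), sigma=se_log_rr).cdf(0.0)


ASSUMED_SE = 0.4  # hypothetical standard error of log(RR) for one study

p_if_fda_is_right = prob_estimate_at_or_below_null(2.16, ASSUMED_SE)
p_if_no_effect = prob_estimate_at_or_below_null(1.0, ASSUMED_SE)

print(f"true RR 2.16: {p_if_fda_is_right:.3f}")
print(f"true RR 1.00: {p_if_no_effect:.3f}")
```

Under these assumptions only a few percent of studies would land at or below 1.0 if the true relative risk were 2.16, compared with half of them if there were no effect.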

You would not even need complicated studies to spot a risk this high. More than 5% of U.S. men use smokeless tobacco. The percentages are even higher, obviously, for ever-used or ever-long-term-used, which might be the preferred measure of exposure. This would show up in any simple analysis of oral cancer victims. With 5% exposed, doubling the risk would mean about 10% of oral cancer cases among nonsmoking males would be in this minority. A single oral pathology practice that just asked its patients about tobacco use would quickly accumulate enough data to spot this. It is not quite that simple (e.g., you have to remove the smokers, who do have higher risk) but it is pretty close. The point is that the number is implausible.
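The "about 10%" figure follows from simple arithmetic, sketched here. The function name is mine; the inputs are the 5% prevalence and doubled risk used above, and the sketch ignores the smoking complication just mentioned.

```python
def exposed_share_of_cases(prevalence, relative_risk):
    """Fraction of all cases that occur among the exposed, given the
    exposure prevalence and the relative risk among the exposed."""
    cases_exposed = prevalence * relative_risk
    cases_unexposed = 1.0 - prevalence
    return cases_exposed / (cases_exposed + cases_unexposed)


share_if_doubled = exposed_share_of_cases(0.05, 2.0)    # about 0.095
share_if_no_effect = exposed_share_of_cases(0.05, 1.0)  # 0.05, same as prevalence

print(f"{share_if_doubled:.3f} vs {share_if_no_effect:.3f}")
```

Users would appear among cases at nearly twice their share of the population, which is the kind of gap a simple tally of patients would reveal.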

In Sweden, ST use among men is in the neighborhood of 30% (and smoking is much less common). A doubling of risk for any disease that is straightforward to identify, like oral cancer and most other cancers, would be much more obvious still. But no such pattern shows up. The formal epidemiology also shows approximately zero risk. Most of the ST epidemiology is done in Swedish populations, basically because relatively common exposures are much easier to study.

So how could someone possibly get a relative risk estimate of more than double?

The answer is that they created the absurd construct, “all available U.S. studies” and then took an average of all such results. (They actually used someone else’s averaging together of the results. They cite two papers that did such averaging and — surprise! — chose the higher of the results, though that hardly matters in comparison to everything else.) This is absurd for a couple of reasons which are obvious to anyone who understands epidemiologic science, but not so obvious to the laypeople that the construct is designed to trick.

You might be thinking that it is perfectly reasonable to expect that different types of ST pose different levels of risk. Indeed, that seems to be the case (however, the difference is almost certainly less than the difference among different cigarette varieties, despite the tobacco control myth I mentioned in Part 1, the claim they are all exactly the same). But nationality obviously does not matter. Should Canadian regulators conclude that nothing is known about ST because there are no available Canadian studies? This is like assessing the healthfulness of eating nuts by country; the difference is not about nationality but mostly about what portion of those nuts are peanuts (which are less healthful than tree nuts). If the category of nuts is to be divided, the first cut should be health-relevant categories of nuts, not nationality. Nutrition researchers and “experts” are notoriously bad at what they do, but few would make the mistake FDA did.

The error is particularly bad in this case: It turns out the evidence does not show a measurable difference in risk between the products commonly used in the USA and those commonly used in Sweden. The data for all those is in the “harmless as far as we can tell” range. But it appears that an archaic niche ST product, a type of dry powdered oral snuff, that was popular with women in the US Appalachian region up until the mid-20th century, posed a measurable oral cancer risk. It turns out that a hugely disproportionate fraction of the U.S. research is about this niche product — disproportionate compared to even historical usage prevalence, let alone the current prevalence of about nil. There is nothing necessarily wrong with disproportionate attention; health researchers have perfectly good reasons to study the particular variations on products or behaviors that seem to cause harm. Also, it is much easier to study an exposure if you can find a population that has a high exposure prevalence, in this case Appalachian women from the cohorts born in the late 19th and early 20th centuries.

It is not the disproportionate attention that is the problem. The problem is the averaging together of the results for the different products. Even if that might have some meaning if the average were weighted correctly, it was very much not weighted correctly.

The 2.16 estimate was derived using the method typically called meta-analysis, though it is more accurately labeled synthetic meta-analysis since there are many types of meta-analysis. It consists basically of just averaging together the results of whatever studies happen to have been published. Even in cases that are not as absurd as the present one, this is close to always being junk science in epidemiology. The problems, as I have previously explained on this page, include heterogeneity of exposures, diseases, and populations, which are assumed away; failure to consider any study errors other than random sampling error; and masking of the information contained in the heterogeneity of the results. To give just a few examples of these problems: Two studies may look at what could be described in common language as “smokeless tobacco use”, but actually be looking at totally different measures of quite different products. Similarly, one study might look at deaths as the outcome and another look at diagnoses, which might have different associations with the exposure. A study might have a fairly glaring confounding problem (e.g., not controlling for smoking), but get counted just the same, obscuring its fatal flaw as it is assimilated into the collective. One study might produce an estimate that is completely inconsistent with the others, making clear there is something different about it, but it still gets averaged in.

But beyond all those serious problems with the method in general, all of which occur in the present case, this case is even worse. It is worse in a way that makes the result indisputably wrong for what FDA used it for; there is simply no room for “well, that might be a problem but…” excuses. It is easy to understand this glaring error by considering an analogy: Imagine that you wanted to figure out whether blue-collar work causes lung disease. This might not be a question anyone really wants an answer to, but it is still a scientific question that can be legitimately asked. Now imagine that to try to answer it, you gather together whatever studies happen to have been published in journals about lung disease and blue-collar occupations. As a simplified version of what you would find, let us say that you found two about coal miners, one about Liberty ship welders, one about auto body repair workers, one about secretaries, and two about retail workers. So you average those all together to get the estimated effect on lung disease risk of being a blue-collar worker.

See any problem there? If you do, you might be a better scientist than they have at FDA.

Obviously the mix of studies does not reflect the mix of exposures. Why would it? There is absolutely no reason to think it would. Notwithstanding current political rhetoric, only a minuscule fraction of blue-collar workers are in the lung-damaging occupations at the start of the list. The month-to-month change in the number of retail jobs exceeds total jobs in coal mining. But the meta-analysis approach is to calculate an average that is weighted by the effective sample size of each study, with no consideration of the size of the underlying population each study represents. The proper weighting could easily be done, but it was not in my analogy nor in the ST estimate FDA used (nor almost ever). If all the studies in our imaginary meta-analysis have about the same effective size, this average puts more weight on the <1% of the jobs that cause substantial risk than on the majority that cause approximately zero risk. (Assume that you effectively controlled for smoking, which would be a major confounder here creating the illusion that even harmless blue-collar jobs cause lung disease, as is also a problem with ST research.)
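The difference between the two weightings can be sketched with the imaginary studies from the analogy. Every number below is hypothetical (the relative risks, the workforce shares, and the assumption of equal study sizes), and for simplicity this averages relative risks directly, whereas published meta-analyses typically pool on the log scale with inverse-variance weights.

```python
# occupation: (hypothetical relative risk, number of equal-sized studies,
#              hypothetical share of the workforce)
occupations = {
    "coal miners": (3.0, 2, 0.003),
    "Liberty ship welders": (4.0, 1, 0.001),
    "auto body repair workers": (2.0, 1, 0.004),
    "secretaries": (1.0, 1, 0.292),
    "retail workers": (1.0, 2, 0.700),
}


def weighted_mean(pairs):
    """Weighted arithmetic mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


# What the cookie-cutter method does: weight by how much got published.
study_weighted = weighted_mean(
    [(rr, n_studies) for rr, n_studies, _ in occupations.values()]
)

# What the question requires: weight by how many people have each exposure.
prevalence_weighted = weighted_mean(
    [(rr, share) for rr, _, share in occupations.values()]
)

print(f"weighted by studies:    {study_weighted:.2f}")
print(f"weighted by prevalence: {prevalence_weighted:.2f}")
```

With these made-up inputs the study-weighted average is above 2 while the prevalence-weighted average is barely above 1, even though both are computed from exactly the same study results.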

As previously noted, it is not only possible, but almost inevitable that studies will focus on the variations of exposures that we believe cause a higher risk. No one would collect data to study retail workers and lung disease. If they have a dataset that happens to include that data, they will never write a paper about it. (This is a kind of publication bias, by the way. Publication bias is the only one of the many flaws in meta-analysis that people who do such analyses usually admit to. However, they seldom understand or admit to this version of it.)

It turns out that this same problem is no less glaring in the list of “all available U.S. studies” of ST. In that case, about 50% of the weight in the average is on the studies of the Appalachian powdered dry snuff[*], which accounts for approximately 0% of what is actually used. Indeed, the elevated risk from the average is almost entirely driven by a single such study (Winn, 1981), which is particularly worth noting because this study’s results are so far out of line with the rest of the estimates in the literature. A real scientific analysis would look at that and immediately say that study cannot plausibly be a valid estimate of the same effect being measured in the other studies; it is clearly measuring something else or the authors made some huge error. Thus it clearly makes no sense to average it together with the others.

Note:
[*] As far as we can tell. The methods reporting in the studies was so bad — presumably intentionally in some cases — that they did not report what product they were observing. We know that the Winn study subjects used powdered dry snuff because she admitted it in a meeting some years later, and this was transcribed. She has made every effort to keep that from getting noticed in order to create the illusion that the products that are actually popular cause measurable risk. For some of the other studies we can infer the product type from gender and geography (i.e., women in particular places tended to be users of powdered dry snuff, not Skoal).

It is amusing to note what Brad Rodu did with this. Recall that the over-represented powdered dry snuff was used by Appalachian women. So effectively Brad said, “ok, so if you are going to blindly apply bad cookie-cutter epidemiology methods rather than seeking the truth with scientific thinking, you should play by all the rules of cookie-cutter epidemiology: you are always supposed to stratify by sex” (my words, not his). It turns out that if you stratify the results from “all available U.S. studies” by sex (or gender, assuming that is what they measured — close enough), there is a huge association for women (relative risk of 9) and a negative (protective) association for men. ST users in the USA are well over 90% male. Brad has some fun with that, doing a back-of-the-envelope to show that if you apply that 9 to women and zero risk to men, you get only a small fraction of the supposed total cases claimed by FDA. And this is a charitable approach: If you actually applied the apparent reduced risk that is estimated for men, the result is that ST use prevents oral cancer deaths on net.

Notice that in my blue-collar example, you would also get a large difference by sex, with almost all the elevated risk among men. Of course, there is no reason to expect that sex has a substantial effect on either of these, or most other exposure-disease combinations. Results typically get reported as if any observed sex difference is real, but that is just another flaw in how epidemiology is practiced. The proper reason for doing those easy stratifications is to see if they pop out something odd that needs to be investigated, not because any observed difference should be reported as if it were meaningful. When there is a substantial difference in results by sex for any study where the outcome is not strongly affected by sex (e.g., not something like breast cancer or heart disease), it might really be an inherent effect of sex, but it is much more likely to be a clue about some other difference. Maybe it shows an effect of body size or lifestyle. Or perhaps the “same” exposure actually varied by sex. In the ST and blue-collar cases, we do not have to speculate: it is obvious the exposure varied by sex.

The upshot is not actually that when assessing the average effect, you should stratify the analysis by sex (though it is hard not to appreciate the nyah-nyah aspect of doing that). It is that averaging together effects of fundamentally different exposures produces nonsense. If there is a legitimate reason to average them together (which is not the case here), the average needs to be weighted by prevalence of the different exposures, not by how many studies of each happen to have appeared in journals.

It gets even worse. I put a clue about the next level of error in my blue-collar example: the shipyard welders worked on Liberty ships. In the 1940s, ship builders had very high asbestos exposures, the consequences of which were not appreciated at the time. Today’s ship welders undoubtedly suffer some lung problems from their occupational exposures, but nothing like that. Similarly, regulations and better-informed practices have dramatically reduced harmful exposures for coal miners and auto body workers. In other words, calendar time matters. Exposures change over time, and the effects of the same exposure often change too, with changes in nutrition, other exposures, and medical technology. There are no constants in epidemiology. (That last sentence, by the way, is a good six-word summary of why meta-analyses in health science are usually junk.)

One of the meta-analysis papers FDA cites breaks out the study results between studies from before 1990 and after that. It turns out that the older group averages out to an elevated risk, while the later ones average out to almost exactly the null. This is true whether you look at just U.S. studies or studies of all Western products. Does this mean that ST once caused risk, but now does not? Perhaps (a bit on that possibility in Part 3). Some of it is clearly a function of study quality; I have pored over all those papers and some of the data, and the older ones — done to the primitive standards of their day — make today’s typical lousy epidemiology look like physics by comparison. A lot of this difference is just a reprise of the difference between the sexes: the use of powdered dry snuff was disappearing by the 1970s or so (basically because the would-be users smoked instead). In case it is not obvious, if you have a collection of modern studies that show one result and a smaller collection of older studies that show something different, you should not be averaging them together.

In short, a proper reading of the evidence does not support the claim that ST causes cancer in the first place. But even if someone disagrees and wants to argue that it does, that 2.16 number is obviously wrong and based on methodology that is fatally flawed three or four times over. That is, even if one believes that ST causes oral cancer, and even if one believes it could double the risk (setting aside that such a belief is insane), relying on this figure makes the core analysis that justifies this regulation junk science.

The next post takes up the issue of NNN specifically.

Time to stop measuring risk as “fraction of risk from smoking”?

by Carl V Phillips

I ran across a tweet touting a press release out of the Global Forum on Nicotine (GFN) meeting (a networking meeting, mostly of e-cigarette boosters) that made the claim that snus is 95% less harmful than smoking. This was variously described as being based on “new data”, “new data analysis” and “the latest evidence”, but with no further explanation of where the number came from. Since the presenter was Peter Lee, those of us who know who’s who can surmise that it is a statistical summary of existing published studies, because that is what Peter does. There is nothing necessarily wrong with that (though for reasons I will explain in an upcoming post, it is potentially suspect in this context), but it is certainly not new data or the latest evidence.

Oh, and it is clearly wrong.

What is peer review really? (part 9 — it is really a crapshoot)

by Carl V Phillips

I haven’t done a Sunday Science Lesson in a while, and have not added to this series about peer review for more than two years, so here goes. (What, you thought that just because I halted two years ago I was done? Nah — I consider everything I have worked on since graduate school to be still a work in progress. Well, except for my stuff about what is and is not possible with private health insurance markets; reality and the surrounding scholarship has pretty much left that as dust. But everything else is disturbingly unresolved.)

Extraordinary claims

by Carl V Phillips

Fairly often (e.g., in the previous post) I make reference to the concept that extraordinary claims require extraordinary evidence. That is, if something seems extremely unlikely based on a great deal of accumulated knowledge or an understanding about how the world works, and you wish to claim it is true, you really need to have done some tight work. It is a good principle in science. Research does not produce scientific knowledge without adherence to principles like this (note that there are no “rules” in science, so we have to make do with evolved principles).

Today I am thinking of that in terms of a new study that was reported in this BBC story, “E-cigarettes ‘help more smokers quit'” (quotes from there).

Sunday Science Lesson: Misconceptions about the gateway effect

by Carl V Phillips

Disturbingly close to 100% of writings about whether there is a gateway effect among tobacco products, particularly about e-cigarettes being a gateway to smoking, are nonsense. That includes most metas on the topic, where someone tries to explain what mistakes others are making. Perhaps addressing a few simple misconceptions can clear some of this up (and it is a favor to a few of my tweeps — I hope I have addressed all the points our conversation left hanging). I have written about all of this much more extensively and comprehensively (see this in particular), but not in soundbite form.