by Carl V Phillips
Many of you will have already seen or heard about a paper by Farsalinos et al., in which they review some case series data from China and observe that for hospitalized COVID-19 patients, the recorded smoking prevalence is far lower than would be expected given the population prevalence. The US CDC also released data a couple of days ago that shows the same pattern. If the data is representative and accurate (but note that there are compelling reasons to question whether either of those is true), this strongly suggests that smoking is hugely protective against COVID-19 infection and/or the resulting disease progressing to the point that hospitalization is required.
We are not talking at the level of “well I guess smokers get a bit of compensation this year for all the health costs of smoking.” This is at the level of “everyone should take up smoking for a few months until the pandemic abates.” The protective effect implied by the data is absolutely huge.
As you might guess, the usual suspects are doing everything they can to hide and deny this. Indeed, even those reporting the statistics are doing that. My biggest criticism of the Farsalinos et al. paper is that they misinterpret what their results imply, failing to report this implication and erroneously suggesting that their results show no association.
It is certainly true that it is an extraordinary claim that requires better evidence than we have. As already noted, the quality of both the Chinese and US data is suspect. Still, to assume that something is not true, and that evidence that suggests it is true must therefore be wrong, merely because it would be unfortunate if it were true (in one’s personal view of How The World Should Be), is not exactly scientific thinking.
What really floors me is the fact that people do not want this to be true. We are desperately trying to slow the spread and reduce the severity of a disease we cannot cure or vaccinate against. We are suffering enormous costs in order to do that. How great would it be if everyone could reduce their risk of getting a serious case of the disease by 80% by smoking a few packs?
Needless to say, for anyone other than sociopathic monomaniacs, it would be great. Alas, I doubt it is really true. Still, that is what the statistics suggest.
As for those statistics, well…
The above link is to a working paper version of the paper. I cannot emphasize strongly enough that it is great that the authors posted their paper immediately, and sought comments on it, rather than engaging in the usual horrible health research practice of just sending it to a journal where it is kept secret up until the day that it is etched in stone with all of its flaws. [Update: I have now posted a review attached to the posted paper. It mostly just points the reader back to this post.] I strongly commend the authors for this. After doing that, they submitted it to a journal, and I was asked to review it. (No secrets are being disclosed here. This journal is one of the good ones that does not keep reviewers and reviews secret, and as already noted, the authors did the right thing and already made the paper public.)
The paper is important and timely, the authors have made clear they want public comments, and I think the review is educational. Thus I am posting the review here. Note that the current version of the paper at the link is v.13 (which might change again by the time you read this), while I think the version I reviewed is v.10 (at least it is not the current version, which just went up). If you care, the working paper server lets you look at previous versions, though it probably does not matter much which version you read. You can probably make sense of the review having just read the abstract or heard about the paper, though reading it is not much of a burden (ten minutes or so – it is short).
As I try to remind my readers when I post a review, please keep in mind that my reviews might be what you imagine journal reviews always look like. In reality, the typical review in public health is 1/10th as long, does not even try to review the technical content and instead is based mostly on whether the reviewer likes the conclusions, misses most of the glaring errors in the paper, and offers advice that would probably make the paper worse rather than better. (For a whole lot more on that, see our paper on that topic.)
This is not an everyday paper. If it were a typical paper, one that used the same methods for some relatively unexciting question, this review would have a tone of “well, ok, but you need to do X and you should not do Y, and there are some other little problems that need to be explicitly dealt with, but sure whatever, it is just the little thing that it is.” That is not an option with this paper.
To summarize my observations before continuing: The result of this analysis is, if true, enormously important. But there is so much uncertainty about the data and so much fundamental material missing from the analysis that we cannot conclude anything based on what is presented. The authors need to do a lot more analysis if they are going to present these results. If they think they can stand by the true implications of what they are reporting, they should do that and present their arguments. If they do not think they can do that, they need to report that fact, and so not report their results as they do, and to definitely not present conclusions that are not supported by their analysis as they do now. (There are also some specific issues that I address.)
The authors are presenting us with evidence that says — if accepted as even roughly valid at the level they are presenting it — that smoking is HUGELY protective against COVID-19 colonization and/or the resulting disease progressing to hospitalization. The magnitude is sufficient that it would not be a joke to recommend people take up smoking until the pandemic subsides (contrary to the authors’ obligatory and dutiful recitation of the “generalized advice to quit smoking”). That is an extraordinary claim (at least in practical terms, whether it is biologically extraordinary or not), and thus demands a more complete analysis than appears in the paper.
To respond immediately to the pre-positioned retort to these observations: It is not acceptable to hide behind the rhetoric of “but we never say that!” (I suppose technically it is not really rhetoric in the paper, but rather the rhetorical equivalent of empty space in art, but it is a bright beacon even in the form of its omission.) The authors assert that their results merely suggest that smoking does not have detrimental effects on COVID-19 outcomes. But this would be like Hill, Doll, et co. reporting their classic results with the conclusion “there is no evidence that smoking protects against lung cancer.” Either this paper — with its results as presented — shows that smoking is protective against COVID-19 with near certainty, or it does not show anything at all because it is massively flawed to some unknown degree. People are already saying this paper provides evidence of a protective effect. They are right if we assume the analysis is even roughly legitimate.
This is especially true in light of the just-reported U.S. CDC statistics that show a similar strong protective association. (Specific review comment: The authors need to cite that report and discuss it in their paper.) It is reasonable to be of the opinion that both of these sets of statistics are misleading for reasons discussed here. But the triangulation certainly should shift posteriors at least somewhat toward believing the implications of the present paper are really true.
The authors cannot hide from this. Trying to do so is disingenuous.
Had the results really amounted to “we explored the hypothesis that smokers are at higher risk and found data that actually points slightly in the other direction”, then they could have written the easy paper that they did. It is sometimes possible to legitimately say (and by ‘say’ I mean actually say it, explicitly), “our results point somewhat toward causation in a particular direction, but we recognize this would be an extraordinary claim” along with some combination of “the support our statistics offer for such an affirmative claim is modest” and/or “we identify sources of possible error that would plausibly be of the magnitude to create the observed departure from the null” and then land on “but while we cannot claim causation in the direction suggested by our statistics, those statistics make it difficult to believe there is substantial net causation in the other direction.”
That is not possible here. Authors cannot choose to declare that their results are not what they are just because they are uncomfortable stating the implications of those results.
The association in the protective direction is not modest, and there is no exploration of the magnitude of errors that might have been skewing things in that direction. There is no attempt to quantify the sources of error, and indeed they are not even effectively identified. If the claim is that there are errors that are sufficient to adjust the observed hugely protective association down to merely “it appears there is no causation in the other direction”, then those errors are sufficiently large and uncertain that nothing whatsoever can be learned from what appears in the paper. Either the potential errors must be explored and quantified rather precisely or the data should be declared to be suspect to such an unknown degree that these numbers should not be reported at all. There is no legitimate way to do something in between. What appears in the paper — effectively “we make the unsupported assumption that the errors in the data perfectly cancel out the magnitude of the huge association we report, and thus we interpret our findings as supporting a null relationship” — is illegitimate.
I expect it is frustrating for the authors that this rare opportunity to produce a stunning result — to be part of what would be, if true(!), the biggest discovery about smoking and health in decades — demands so much more work. But here they are. If this were a jaw-dropping bench discovery, the researchers would work their lab 24/7 to recheck every last detail, replicate, test, etc. There is no such recipe here, but something equivalent needs to be done.
For what it is worth, my quick assessment upon first seeing these results was that the quality of the data is so hopeless that nothing at all can be made of it, and thus this report can only mislead. But that was not a strong prior and I accepted that I might be wrong. Indeed, I was actively excited by the possibility that I was wrong, at both the “scientific curiosity” level and the “this could have huge practical implications” level. After carefully reading what appears now, as well as available commentary about it, my prior did not really move. The authors need to try to convince themselves that their quantitative results are informative and, if they succeed, then try to convince the reader. If they cannot find a reason to be convinced that the data is valid, then they should morph this paper into a report about why they think no one should repeat this quantitative analysis because it is misleading. It is a major epistemic error and arguably a logical fallacy to declare, in effect, that the results support the null hypothesis because they are based on data that is not solid enough to support some other conclusion.
The authors need to have someone on their team who can research Chinese language documents and someone who is familiar with Chinese medical records. Perhaps they do already, but that individual has not contributed what is needed here. For cranking out a typical workaday pub that dredged up an uninteresting association, just downloading the ten papers that happen to be in English and happen to be easy to find, pretending that these represent all available knowledge, is weak methodology but not necessarily a fatal problem. But this very non-boring paper is about a phenomenon that is only a few months old (i.e., an even smaller portion of human knowledge appears in academic journal articles than is typical) and relies on data coming out of a country that …well, let’s say has more than its share of controversy about the validity of research papers, to say nothing of its official statistics. It is not adequate to just take the English-language journal articles and treat them as if they were the available data about Chinese COVID-19 hospitalization.
Someone needs to dig into other reports that may have been written in Chinese, original versions of the papers that form the dataset, etc. They need to see if there is any public discussion about statistics like this and what similar statistics are available in some form. They need to see if there is a samizdat questioning the validity of the internationally reported results. They need to see what Chinese commentators said about this association (surely, if it really exists, someone noticed it and wrote something). They need to review what Western China watchers have said about the Chinese COVID-19 data (it seems safe to assume that the English-language publications did not go out without the approval of the Chinese government).
It is not beyond imagination that the data used here is fabricated. It is definitely plausible that it was intentionally cherrypicked (at the level of deciding what would be reported, not by the present authors) in support of some goal. This should be acknowledged and addressed to the extent possible.
Similarly, someone needs to assess whether the Chinese medical records data is valid. Those of us familiar with US data, upon seeing the aforementioned US CDC report, immediately started thinking, “hmm, ad hoc medical records are known to often fail to collect information on covariates that are unlikely to affect treatment decisions, and sometimes the record keeping methods then just default those to the negative” and also “given all the talk about triage and rationing treatment, it is easy to imagine smokers — who are aware of the medical discrimination they face even under normal circumstances — hiding their smoking status upon interview to try to avoid being denied treatment”. What are the chances that these, or something else like that, happened in China? The authors need to try to figure this out.
The overarching point here is that there is only one potential source of error (if we assume that the data is not intentional disinformation) that really matters here: exposure misclassification. The authors need to make this clear to the reader rather than having a single sentence about it buried in a paragraph in the Discussion. Someone “quitting” smoking because of disease onset and self-identifying as a former smoker is one aspect of this, so it is good the authors make some attempt to address this. But it seems likely to be a pretty trivial contribution to the misclassification compared to mere faulty recording, and thus odd that the authors devote outsized attention to it while basically ignoring the major sources of exposure misclassification.
The Discussion alludes to confounding by SES, but any such effects are likely to be pretty trivial. The authors should run some numbers to demonstrate this (to themselves, and then communicate to the reader). Merely making the qualitative observation “such confounding might exist” is not good enough.
Similarly, the authors allude to unknown age distributions of the sample. One big problem with this is that they do not even report the statistic that this would affect, the EV of smoking prevalence for each sample (which they need to report — see below). The deeper problem is that it does not seem like it could really matter, given the prevalence distribution they report; this is a red herring. The authors should run some numbers (they demonstrate that they have the inputs they need) to show themselves that any effects of this would be trivial, and then report this fact to the reader.
Other commentators have proposed a few other sources of error in these results. None of them seem worth much attention when compared to the major issue. E.g., smokers disproportionately lacking access to healthcare in this population, which seems unlikely to be true to a sufficient degree to matter. But if the authors believe this or any other source of uncertainty is worth even mentioning, then they should run some plausible numbers to inform themselves about how much it could possibly matter and report what they found. Unquantified hand-waving mentions of potential bias are always bad methodology, and this problem really matters in the present paper.
Even worse, these authors, as well as other commentators, have reported observations about detrimental effects of smoking as if they attenuate the implications of the observed association. But the exact opposite is true. (Other commentators have also mentioned the face-touching risk that might be associated with smoking, which has the same implications.) In the Discussion, the authors allude to other diseases that are caused by smoking and that are believed to increase the risk or severity of COVID-19 infection. But they then continue on to a backward interpretation of the implications of this. Assuming that smoking has detrimental effects regarding COVID-19 infection via these pathways, then this would mean that (once again, if the data is legitimate) smoking per se is even more protective against COVID-19.
That is, the reported data shows a protective net effect. If valid, it shows whatever beneficial effects smoking causes minus the detrimental effects from diseases that were caused by the previous decades of smoking. This would further recommend that people take up smoking for a few months right now, for the time where the benefits are available but not long enough to cause any substantial risk of disease. It should be clear from what appears here that I am not leaping to that conclusion. But this is the top-line implication of the observations about detrimental pathways from smoking to COVID-19 infection, as reported in the paper and elsewhere. Those observations clearly do not attenuate the observed association as claimed by the authors and other commentators. (It would really be more complicated than that. Perhaps only past decades of smoking, not current smoking, is protective. But the point is that the authors and others seem to be searching for reasons to downplay the benefit the reported results imply exists, and in so doing, they are making claims that are exactly opposite of what the claims about detrimental pathways suggest.)
The authors go on to get this even more specifically wrong when they say that the posited detrimental pathways from smoking to COVID-19 outcomes means that nothing can be said about the effects of smoking on hospitalization risk. The (net) effects of smoking on hospitalization risk are exactly what their data measure (as usual, assuming it is valid). Note that the authors phrase this claim as “no recommendation can be made”; these are technically defensible weasel words because the results are so uncertain — because of possible misclassification — that it is arguable that no recommendations can be made based on this dataset. But this has nothing to do with the immediate context of the claim, that there may be detrimental pathways.
To reiterate, until the potentially fatal source of error — exposure misclassification — is addressed, there is little point in even mentioning the others. But at least if they are going to be mentioned, their implications should not be misstated.
One interesting observation from other commentators is that we should expect a much larger representation of females in the sample if smoking were protective, given that most males in this population smoke and almost no females do. The versions of this observation I have seen are facile (it requires various “all else equal” assumptions that should be stated, among other things). But it has rough validity and important implications. If the analysis is accepted at face value, this would mean the conclusion would have to be something like, “smoking is protective for the men in this population (the exposure is too rare to judge whether it is protective for the women also); also apparently women in this population are at much lower risk compared to international statistics, so much so that the protective effect for men is not enough to make up all of this difference.” Needless to say that is, once again, a huge “if”. The seemingly better conclusion from observing the surprisingly low representation of females in the sample is “since we seem to need a very convoluted story to explain what was observed, this is further reason to believe that the input data is so flawed as to be uninformative.” An alternative version is “we really have no idea what the reported data represents because we have to come up with seriously wild stories to explain it.”
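The sex-composition point can be checked with back-of-envelope arithmetic. The sketch below uses loudly labeled placeholder assumptions — a 50/50 sex split, smoking prevalences of roughly 50% for men and 2% for women, and a hypothetical 80% risk reduction for smokers — none of which are figures from the paper:

```python
PROTECTION = 0.2  # hypothetical relative risk for smokers, if the effect were real

def hospitalized_share_female(prev_m=0.50, prev_f=0.02, rr=PROTECTION):
    """Expected female share of hospitalizations, assuming a 50/50 population
    sex split and that smoking scales hospitalization risk by `rr`."""
    # Relative hospitalization weight per sex = smokers * rr + nonsmokers * 1
    weight_m = prev_m * rr + (1 - prev_m)
    weight_f = prev_f * rr + (1 - prev_f)
    return weight_f / (weight_m + weight_f)

share = hospitalized_share_female()  # comes out above 60% female
```

Under these made-up inputs, hospitalized cases should skew well past 60% female if the protective effect were real, which is the rough logic behind the commentators' observation.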
Specific issues (most of which are moot should the authors choose the option of reporting that no sense can be made of this data):
Whichever of the above potential systematic biases are important, it should be apparent to the authors that random sampling is not the (or even a) major contributor to the uncertainty in the results. Thus, the habitual bold reporting of random error confidence intervals is even worse for this analysis than it normally is in epidemiology. Naive readers (i.e., about 99% of readers) look at those and interpret them as a measure of uncertainty — all of it — which is clearly misleading here. It is even worse when authors make the mistake of averaging results together to produce an aggregate (see below). The authors need to make an explicit statement, early and boldly, not as a subtle implication of vague observations in the Discussion, that these CIs are really not informative (except for the fact that they argue against pooling the data — see below). Even better would be to suppress reporting them at all (at least not in the one table that everyone is going to look at) because they can only serve to mislead.
The authors should not average together the various datasets they are using. This is not nearly as bad as for the typical junk science “meta-analysis” of observational studies, in which the exposure measures, outcome measures, and populations obviously vary wildly. There is potentially sufficient homogeneity of exposure, outcome, population, and methodology here to justify it, unlike most of the time such averaging is done. But there is still too much heterogeneity (and even more inadequate reporting to even judge if there is heterogeneity) in those papers, which the present authors just gloss over. The association observed in the largest block of data (the first paper in the table), when compared to any or all of the next few larger ones, appears to be statistically incompatible with the assumption that these are measures of the same phenomenon. That assumption is the justification for averaging, and if the assumption cannot be justified then averaging should not be done.
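The statistical-incompatibility claim can be checked with a standard two-proportion z-test: if two case series are really measures of the same phenomenon, their smoking proportions should be compatible with a common underlying proportion. The counts below are hypothetical, not taken from the paper's table:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test: are x1/n1 and x2/n2 plausibly draws of the
    same underlying proportion? Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the homogeneity assumption
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts (smokers / hospitalized patients) for two case series
z, p = two_prop_z(x1=55, n1=1085, x2=11, n2=140)
```

A small p-value across the major blocks of data would mean the homogeneity assumption — the justification for averaging — fails, which is the review's point.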
[BLOG UPDATE: Wow, I really understated just how bad a methodology the averaging together was. I did not study the source material when I wrote the review. I have done so now. Many of the papers did not count up “hospitalized COVID-19 patients” as the paper claims; some restrict the population to those with particular conditions while for others it is not clear everyone was hospitalized. Far worse still, a few of the papers did not report “current smokers” status as the paper claims, but combined former and current smokers or only counted heavy smokers. For many of the others, it is unclear who was counted in the smoking column. Yet the authors just averaged these all together pretending they were the same measure. No no no no no!]
In their main table, the authors need to calculate and report the expected value of the number of smokers in each row, based on population prevalences (i.e., the EV if the smoking data were from a stratified random sample of the population) using stratification by whatever covariates are available from the original paper. This can at least consider the gender breakdown of each sample, and age distributions and any other available demographics should be used. This is the null-effect baseline the reader needs to compare to the main reported statistic, and it is missing. The reader can roughly calculate this based on what is reported in the text (though only for the gender distribution, not any other covariates), but should not have to do it themselves.
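The missing null-effect baseline is simple arithmetic once the stratified prevalences are in hand. A minimal sketch, with placeholder prevalences and a hypothetical sample composition (not figures from the paper or Chinese survey data):

```python
# Assumed population smoking prevalences by sex -- placeholders for illustration
SMOKING_PREV = {"male": 0.50, "female": 0.02}

def expected_smokers(n_male, n_female, prev=SMOKING_PREV):
    """Null-effect EV: number of smokers expected if the sample were a
    sex-stratified random draw from the population."""
    return n_male * prev["male"] + n_female * prev["female"]

ev = expected_smokers(n_male=600, n_female=400)  # hypothetical case series
observed = 58  # hypothetical observed smoker count, far below the EV
```

With finer strata (age bands, region), the same sum-of-products extends naturally; the point is that this EV, not a bare prevalence, is the comparator the reader needs.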
I would, however, advise against taking the typical next step of calculating and reporting the resulting relative risk. Omitting this is (and thus would continue to be) a departure from standard practice; almost every paper in the epidemiology literature reports the equivalent statistic and calls it their main result. But in this case, any reader who is not expert enough to instantly calculate that in their head should probably not be distracted by the captivating bright-line number (this is often true of other epidemiology literature also). Should the authors not heed this advice and choose to report relative risk, they should definitely not follow the typical bad practice of reporting an OR; ORs are a misleading statistic when comparing proportions. It should be reported as a proportion (risk) ratio or, better still, difference.
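The OR-versus-RR point is easy to illustrate with made-up proportions. When the proportions being compared are not tiny, the odds ratio diverges from the risk ratio and overstates the effect:

```python
def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))

# Made-up proportions: 5% smokers among cases vs. 25% expected
rr = risk_ratio(0.05, 0.25)   # 0.20
orr = odds_ratio(0.05, 0.25)  # ~0.16 -- further from 1.0 than the RR
```

Here the OR makes the apparent protection look even stronger than the risk ratio does, which is exactly the misleading behavior the review warns about when comparing proportions.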
Regarding the confluence of the two previous observations, at least one previous commentator has recommended that the authors pool the data and compare the result to the EV for the Chinese population as a whole. I would argue that this is misguided for two reasons: First, as noted, the pooling should not be done at all. Second, even if it is, the EVs should be based on as much information from each of the reports as possible, and thus calculated individually.
The methods reporting is inadequate. The authors report what they personally did. But everything hinges on the upstream data collection and recording methods. The authors mention a few random bits of this, but they need to systematically report to the extent they can figure it out (from the paper and any background published elsewhere) how each dataset was collected. To the extent that important upstream methodology choices are not known (not reported in the original papers and not found via further research), this needs to be noted specifically and explicitly.
Due to the inadequate methods reporting, it is not clear if it is possible to stratify any of the input datasets by gender/sex (nor is it even clear whether the reporting was for gender or for sex, though this is a relatively inconsequential point). If it is possible, it should be done. Ideally, the entire main analysis would be restricted to men, given the small exposure prevalence for women at the population level. To the extent that it is possible to stratify any of the data subsets, the stratified results should be reported and analyzed.
“This preliminary analysis, assuming that the reported data are accurate, does not support the argument that current smoking is a risk factor….” Even apart from the core problem with this paper (claiming that a strong association supports a null conclusion) this statement is semantically wrong. “Risk factor” does not mean “increases risk”.
Later in the Discussion, the authors observe that they are only reporting data from hospitalized cases, and then assert that therefore “no conclusions can be drawn” about less severe cases. This is clearly wrong. It would only be true if someone has no beliefs about the relationship between factors that affect the risk of severe cases and factors that affect the risk of less severe cases. It is unimaginable that this is true. Perhaps the authors want to say that they prefer to not try to extrapolate, without erroneously suggesting it is impossible. One way to do that is to not mention the point at all.
The last two sentences, about e-cigarettes, are a non sequitur. If the authors wish to offer a bit of discussion about how smoking is a cocktail of exposures (lung irritants, lung toxins, drug delivery of nicotine and other chemicals, etc.) and then note that we only have data about smoking and thus no idea which of these are causing the observed effects from smoking, that would be worthwhile. (Of course, that only makes sense if the conclusion is reached that the data suggest something is being caused.) But there should not be a context-free mention of another exposure that has a subset of the properties of the smoking exposure (plus some different properties).
A possible way out of this
Having thought about this for a day, I can see a way that the authors can salvage this to produce a legitimate analysis that still serves their (commendable) political goals. It is fairly apparent that the political goal here is to push back against the attempts to twist and cherrypick other data in order to use COVID-19 as an excuse to further their attacks on smoking and smokers. This is a valid and admirable goal. And this data, whatever its biases and errors, can be deployed in support of this mission.
[Aside: Assuming I am right about the mission of this project, the authors’ claim that they have no conflicts of interest is a lie. (Note that [this journal’s] COI policy wisely does not limit its definition of COIs to financial interests, as some journals make the mistake of doing.) If the goal of an analysis is to make a particular point that one personally wishes to make, this is among the biggest conflicts of interest that an author can ever have. It needs to be noted as a COI.]
To use this data to get a legitimate analysis that fulfills the goal, the authors can do the following:
- Fix the specific problems noted above (in some cases by just leaving something out).
- Change the core narrative theme to some paraphrase of this: “There are a lot of claims going around that smoking increases risks from COVID-19. These are generally based on cherrypicking and unsupported guesses. They ignore the data that points in the opposite direction like that out of the US and out of China, the latter of which we present here. Anyone wishing to claim that smoking increases risks needs to acknowledge and respond to these statistics that suggest the effects are hugely in the opposite direction.”
- Strip out all the ornaments that add no knowledge and that suggest that the Chinese data that is being used is valid enough to calculate them. That includes the random error statistics (CIs) and the pooled analysis. Including these does far more harm than good.
- Report the results (smoking prevalence vs. EV of smoking prevalence) honestly. Openly state what it shows if taken as accurate and unbiased: that smoking is hugely protective.
- Address what this implies if true. Do not play the game of asserting what one is “supposed to” always say about smoking. This was submitted to a journal run by harm reductionists not anti-smoking fanatics; take advantage of that. Honestly report that if taken at face value, these results would suggest that taking up smoking for a few months would cause a net health benefit.
(If the authors are not willing to report the obvious implications of their results, then they have no business reporting those results at all.)
- Only after having said that, if so desired (this is optional), go on to say that “we think this is probably not really the case because the data is so unreliable”. Note the phrasing there — personal and subjective. It is not ok to hide this subjective assessment in faux-objective weasel words. It is a subjective assessment by a few people, not a result of the analysis. Of course, add any concrete observations that further research reveals, about there being other data that was not reported, that the case series reports were specifically criticized by someone, that the Chinese government is biasing external reporting, etc.
- Note boldly and clearly that the data quality is so uncertain that it is difficult to have much faith in its implications, or to be confident of any other conclusions that follow from it. Reiterate at that point that, nevertheless, those who would claim that smoking increases COVID-19 risk still need to deal with this elephant in the room to make their claims. Note that this does not excuse the authors from trying to figure out as much as they can about the data quality. Saying “here is what we did to try to sort out what this data really means, but just could not figure it out” is different from saying “we did not bother to even try to understand our data, and thus we do not understand it”.
  - Note the multiple sources of fundamental doubt about this data: Normal misclassification at the recording level. The clearly biased sample (whatever happened to end up in English-language journals, which is only a sliver of the data that exists in China; the low prevalence of females). Possible games by the Chinese authorities to intentionally bias this reporting.
  - Mention that there are other possible sources of bias (the confounding and such) but they are of such small consequence that they are not even worth addressing given the data quality problems.
- Emphasize that the data inherently captures the net effects of smoking on COVID-19 outcomes, not just protective causal pathways. Thus observations about detrimental pathways are not a reason to dismiss the observed association — they are already baked into it. That is, the authors need to not only fix the error that they made on this point, but explicitly point out that it is an error to think that way (because it appears to be a strangely common mistake to make). Thus, whatever those who desire to indict smoking might have to criticize about these results, noting that there are detrimental causal pathways is not one of them, since they are already part of this. (However, it is legitimate to say “I believe there are such clear detrimental pathways that it seems implausible that the net effect is beneficial, which is one reason to believe this data is biased.”)
[Update: I left out my conflict of interest statement that I submitted with the review. I know a lot of you are fans of my COI statements, so here it is:
I really despise people who are trying to use the pandemic as an excuse to pursue their attacks on tobacco product users.]