by Carl V Phillips
I started rereading Richard Feynman’s corpus on how to think about and do science. Actually I started by listening to an audiobook of one of his collected works because I had to clear my palate, as it were, after listening to a lecture series from one of those famous self-styled “skeptic” “debunkers”. I tried to force myself to finish it, but could not. For the most part, those pop science “explainer” guys merely replace some of the errors they are criticizing with other errors, and actually repeat many of the exact same errors. The only reason they make a better case than those they choose to criticize is that the latter are so absurd (at least in the strawman versions the “skeptics” concoct) that it is hard to fail.
Feynman made every legitimate point these people make, with far more precision and depth.
Of course, you can find most of Feynman’s important insights written countless times by countless others. But the difference is that there is simply no excuse for anyone who claims to be a scientist to not have read Feynman. More important, I seldom read others’ attempts to explore “what proper science looks like” without noticing something that is badly oversimplified, over-generalized, or out-and-out wrong. The popular press versions (as well as those written for medics and public health people) are thick with those problems. Modern writers can apply these ideas to a specific topic of immediate interest (read: don’t ignore me just because Feynman was better :-), but there is a good case to be made for finding bedrock in the classics.
When Feynman criticized pseudoscience, he tended to draw his examples from psychology and education theory; this was back when public health researchers were legitimate scientists trying to figure things out in spite of the challenges. But if you swap out a few nouns in his lectures and essays, you can create a compendium of everything that is fundamentally wrong with today’s health research — and that is particularly bad in public health research.
Consider a recurring theme, the need to rule out (or at least seriously explore) alternative explanations for an observation before reaching a particular causal conclusion. This is rather obvious in the abstract, but its implications range from the patent to the very subtle. An example somewhere in the middle of that range is a story he recounts: a psych graduate student who had read that X causes Y decided to investigate whether X’, a variation on X, also causes Y. He advised her to replicate the original experiment first, to make sure she could reproduce the results that imply X causes Y. That way, if she then found that X’ does not seem to cause Y, it would support the claim that changing X to X’ makes the difference, rather than leaving her with no idea whether something else she did differently changed the result. (The story continued that she excitedly tried to take the advice, but her advisor would not let her “waste” resources repeating something that had already been done. Sound familiar?)
To take a rather more obvious case, it would be the epitome of anti-science to compare jurisdictions where fewer people smoke with those where more smoke, observe that smokers are more aggressively trying to quit in the former populations, and from that alone conclude — contrary to both common sense and what is generally accepted — that when there are fewer remaining smokers, they are less dedicated to smoking than those who have already quit were. That is, instead of the population of smokers “hardening” down to those least inclined to quit, it somehow magically distills down to people who are more inclined to quit. That is obviously an extraordinary conclusion, contrary to everything we know about human behavior, and so would require extraordinary evidence to support it. But even setting that aside, the described association alone is not even ordinary evidence in support of the conclusion. (You can pause here and come up with at least one alternative explanation for the observed association.)
Consider an analogous story: you observe that when the consumption of Coca Cola in a country starts to drop, you tend to see further, bigger drops. You conclude that once some people choose to quit drinking it, those who are still drinking it are less interested in drinking it than those who already quit. Seriously? Did you happen to notice that your results were driven by Syria, Russia, and Venezuela? Ok, even ignoring the extremes, did it not even occur to you that whatever factors were causing the initial drop might still exist, causing further usage cessation (or quantity reduction, or serious consideration of cessation) in spite of the remaining consumers being the greater fans of Coke? Did it not strike you that even if there genuinely is positive feedback (that is, a drop in consumption itself causes an increased interest in cessation by others), there are far more plausible explanations for it than concluding that the remaining consumers are less interested in continuing consumption than those who already quit? Possibilities include more robust competition from expanding alternative products, changing social attitudes that create pressure to quit, or the reduction in easy access as sources of supply disappear.
Of course, that would occur to you, dear reader. Stanton Glantz is another story. The incredibly stupid analysis of smoking I describe and analogize above was published by Glantz (in a “public health” journal, obviously). His claim, and the straightforward debunking of it, are recounted by Brad Rodu here. Brad and his postdoc, Nantaporn Plurphanswat, redid the Glantz analysis using variables that were in the very same dataset Glantz used, which offered measures of differing anti-smoker punishments (taxes, etc.) across jurisdictions. For those interested in deeper wonkery, they also used jurisdiction fixed effects to try to account for changing norms. You will not be surprised at all to learn that Glantz’s results disappeared when they controlled for these other factors.
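For readers who want to see what that kind of reanalysis looks like in practice, here is a minimal sketch in Python. The data, the variable names (state, tax, prevalence, quit_rate), and the model specification are all invented for illustration; this is not Rodu and Plurphanswat’s actual analysis. It simply shows how adding a policy control and jurisdiction fixed effects (the C(state) dummies) can make a naive association evaporate.

```python
# Hypothetical sketch: panel regression of quit behavior on smoking prevalence,
# first naively, then with a tax control and jurisdiction fixed effects.
# All data here are synthetic; variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for s in [f"S{i}" for i in range(20)]:           # 20 hypothetical jurisdictions
    norm = rng.normal()                           # unobserved local anti-smoking norm
    for year in range(2010, 2020):
        tax = rng.uniform(0, 5)                   # policy "punishment" measure
        prevalence = 25 - 2 * norm - tax + rng.normal(scale=2)
        quit_rate = 5 + 2 * norm + 0.8 * tax + rng.normal()  # no real prevalence effect
        rows.append((s, year, prevalence, tax, quit_rate))
df = pd.DataFrame(rows, columns=["state", "year", "prevalence", "tax", "quit_rate"])

# Naive model: quit rate regressed on smoking prevalence alone.
naive = smf.ols("quit_rate ~ prevalence", data=df).fit()

# Fuller model: add the tax control plus C(state) dummies (jurisdiction fixed
# effects) to absorb stable local differences such as norms.
fuller = smf.ols("quit_rate ~ prevalence + tax + C(state)", data=df).fit()

print("naive coefficient on prevalence: ", round(naive.params["prevalence"], 3))
print("fuller coefficient on prevalence:", round(fuller.params["prevalence"], 3))
```

In this toy data the quit rate is driven entirely by local norms and taxes, yet the naive regression “finds” that lower prevalence predicts more quitting; the fuller model’s coefficient sits near zero, which is the true value. Whether the same thing happens with real data is exactly the question such a reanalysis asks.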
The critical issue here is not that tobacco controllers lie about what data shows (we know that) or that Brad is a better scientist than Glantz (well, duh!). It is not that Glantz’s claim is wrong (that was already obvious). It is the fact that Nantaporn and Brad had to do this in the first place. Moreover, Brad was a bit hesitant about making the effort because real science like this is hard to publish in public health.
Robert West, whose journal published the debunking, has been quoted about a million times by vaping activists for his comment on another Glantz paper: “Publication of this study represents a major failure of the peer review system in this journal.” That is certainly true of the present Glantz paper also. But it is also not really interesting, because it is also true of 99% of all papers in public health that include causal conclusions (West’s journal being no exception). I am not talking about the near-ubiquitous policy recommendations (which are always garbage) or tangential conclusory statements that could not possibly be supported by the study, but specifically about the failure to properly examine alternative explanations for an observed association before settling on one particular causal story. Even a perfunctory analysis of this is relatively rare, let alone the careful drilling down that a real scientist would employ.
The problem is not merely a failure of journal reviewers and editors to magically eliminate junk via (nonexistent) gatekeeping. The problem is that researchers in these fields seldom have real scientific training or experience, and so they do not even realize they are doing (and reviewing, and publishing) junk science; nor is there any career disadvantage for producing it. The problem is not merely with those who intentionally seek to mislead, like Glantz. The failure is far more fundamental than either lying or the notoriously sloppy behavior of the journal process. It is the almost complete dissociation of public health discourse from scientific thinking.
Feynman’s point was not that someone else should be encouraged to come along later and figure out the alternative explanation and make a case for it (though that certainly should also be facilitated, which it is not in health science, where critical analysis is discouraged or ignored). His point — and that of any decent fifth-grade science course — is that exploring alternative explanations is the duty of the original researchers. If they are not doing that, they are not doing science. You do not have to drill down to the “public health” people who seek to mislead to find massive failure to do this across health sciences.
It strikes me that a good rule of thumb is that if someone writes more than one paper a year that draws nontrivial causal conclusions, then they are probably not really doing science. I am not talking about just reporting results of studies and making trivial observations about them; those papers can be cranked out as fast as field research minions can do the work. Nor am I talking about calculations or critical analyses — a good scientist could do one of those a week in this area if he really wanted to. I am talking about the brass ring of science: offering a genuinely strong case for some specific, substantial cause-and-effect conclusion about the world (whether the subject matter is people or particles). It takes a lot of work to explore the myriad alternative explanations for an observation sufficiently to justify a particular conclusion. If someone did not spend many months on it, he probably did not do it at all.
And yet most papers in health journals that report an observed association assert a specific causal conclusion. Sometimes authors hedge those with weasel-words, but that does not change the fact that they made the claim. (Even worse, papers where some possible association is absent often explicitly conclude the negative about causation, but that is another story.) Many authors crank out ten such papers a year. Simply put, they are not behaving like scientists. The reason that extremists like Glantz can get away with blatant abuse of scientific methods for purely political purposes is not some occasional failure of the journal process. It is because what they are doing is only slightly different from what the median author in the field is doing.