How the medicalized history of public health damaged its science too (a science and history lesson)

by Carl V Phillips

This week, in my major essay (and breezy follow-up), I argued that the dominance of hate-filled nanny-staters in public health now is actually a product of medic and technocrat influence more than the wingnuttery itself. The worst problem there has to do with inappropriate goals that stem from a medical worldview morphing into a pseudo-ethic. The seemingly inevitable chain of events created by that pseudo-ethic resulted in public health professionals hating the human beings who we think of as the public because we are a threat to what they think of as the public, which is just the collection of bodies we occupy.

But this is not the only damaging legacy in public health of the thoughtless application of medical thinking. The science itself has also suffered, most notably (though far from only) because of the fetishization of clinical experiments (aka RCTs: randomized controlled trials) and denial of research methods that are more appropriate for public health. This is something I have written and taught about extensively. I will attempt to summarize it in a couple of thousand words.

It can be argued that modern public health science got its start with the formalization of new observational epidemiology methods by Hill, Doll, Wynder (who may have “borrowed” his insights from the Nazi-era Germans), and colleagues in the 1950s. Observational studies are at the heart of epidemiology, the science of public health, as they are with most social sciences because realistic experiments are not possible like they are in, say, chemistry. It turns out that experiments are possible in clinical medicine also, and work quite well there too — often better than any other source of evidence.

You can see where this is going if you read the previous essay.

Just as the thoughtless porting of clinical goals (“fix this body; that is all that matters”) crippled public health’s political view and ultimately destroyed its compassion, porting the much simpler scientific context of clinical medicine damaged the ability of the field to understand science. It is ridiculous how much of my career I have spent responding to the damage that this has done (though I guess I cannot complain when I get paid for it).

It is easy to find a zillion sources for the naive and incorrect claims that an RCT is at the top of some hierarchy of study types or is the gold standard. Both of these are totally wrong. “Gold standard” refers to something that is exactly right and which other measures can be calibrated against. Obviously this is not accurate since RCTs are just one of many imperfect methods; they are sometimes best but nothing is gold in this world. Less obvious, but equally true, is that there is no hierarchy of study types. Under some circumstances one study type provides better information, and under other circumstances another does. It depends both on the question being asked (e.g., when the question is, “is it possible that X occurs?” then a single case study is the perfect study design) and on physical realities (e.g., if it is not possible to control the exposure, then an experiment is obviously not going to be useful). The typical naive hierarchy puts RCTs on top, followed by the various systematic observational studies (which are, even more absurdly, sometimes themselves ordered), followed by case series, individual case studies, and “expert opinion”.

Why is this mythology so pervasive? It is a classic case of offering a simplified point to an audience that is not going to understand the more complex reality, but is in need of something. The target audience is people who start out so misguided that they need some guidance, but who are not going to sit still long enough to really understand at a deep level. It is akin to telling a toddler that pulling the cat’s tail makes her sad. It may be inappropriate to project the feeling “sad” onto a cat. Strictly speaking we are concerned with the concept of simple physical pain (which may not make an impression on the sympathetic but pre-moral toddler), plus the possibility of terror that might result in behavior changes for the cat that make her life less rich (which would be far too complicated for the kid to understand). But “sad” might motivate the right behavior change. And it is not so far off from the technical reality that it is totally a lie. Of course, if the kid grows up thinking “sad” is a complete and precise description of the effects of abusing critters, some bad results might ensue. But chances are he will not stick with this simplification as he learns to deal with the complex real world. Substitute physicians for the toddler, and remove the bit about growing up, and it explains why RCTs are fetishized (that is, they are considered to have magical properties that they are associated with but do not actually possess), along with other such simplifications.

To start with, medics have a bad habit of thinking they can learn the relevant science through personal experience. (Note that this and similar observations here are based on an extensive literature on the topic and the reported wisdom of those who educate physicians, which once included myself, as well as my own concurring observations.) That is, they think they are basically like plumbers. Plumbers acquire a useful set of specialized skills and knowledge through some technical training and then an apprenticeship and continuing experience. That works well for them to figure out how to deal with the variety of situations and outcomes they face. Physicians are trained and acculturated in a similar fashion, which was about right a hundred years ago. Of course, most medics would resent the comparison, and they are certainly right that they face a much more complicated list of situations than plumbers. More importantly, the cause and effect in their world, and even the outcomes themselves, are much harder to judge.

The result of these differences is that the approach that works great for plumbers fails for medics. Each individual’s experience is woefully inadequate to judge, say, which of two drugs is better to treat a particular patient. The plumber learns that a particular fitting often fails if it is not angled just right because he can see the failure happen. The physician may not even know whether a drug worked for a particular patient (unless it is for a condition with no spontaneous remissions and the drug is the only intervention attempted) and would need to see hundreds of such patients, administering one treatment to some and the other to the rest and keeping careful track of successes, to see a statistical difference in the effectiveness. This is unlikely to happen. More likely is that she will fixate on a small number of apparent successes and think those are sufficient evidence (“I have seen this work many times”). Thus, it needs to be hammered into medics that they should generally defer to expert assessments of the body of formal research findings rather than substituting what they think they have learned from their own experience.
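To put a rough number on "hundreds of such patients", here is a minimal sketch of the standard normal-approximation sample-size formula for comparing two proportions. The success rates (50% versus 60%, and 50% versus 55%), the 5% two-sided significance level, and the 80% power target are illustrative assumptions rather than figures from any study, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate patients needed per treatment to detect p_a vs p_b.

    Uses the usual normal-approximation formula for a two-sided test
    comparing two independent proportions with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p_a + p_b) / 2                        # pooled rate under "no difference"
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / (p_a - p_b) ** 2)


# Hypothetical drugs: A works for 50% of patients, B for 60%.
n_moderate_gap = patients_per_arm(0.50, 0.60)
# A smaller true difference (50% vs 55%) needs far more patients.
n_small_gap = patients_per_arm(0.50, 0.55)
print(n_moderate_gap, n_small_gap)
```

Under these assumed rates the answer comes out at several hundred patients per drug, and it grows quickly as the true difference shrinks. That is far more carefully tracked cases of one condition than a single practice is likely to generate.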

This last bit explains the greatest absurdity of those naive hierarchy lists, which is putting “expert opinion” at the bottom of the list. This is simply silly since real expert opinion is the ultimate (indeed, only) source of interpretation of the evidence. Study results do not speak for themselves. But the purpose of these lists is to convince medics to stop paying attention to the highly inexpert opinion that comes from their limited experience, and it is easier to sell that to them if their experience is referred to as expert opinion. Once written down, this gets misinterpreted as suggesting that the synthesis of all available evidence by the real experts (the best source of organized knowledge) is unimportant.

As a related digression, when I talk to reporters about THR, the standard estimate that smoke-free tobacco products are roughly 99% less harmful than smoking usually is part of the conversation. Sometimes they will ask “where does that number come from?” or “whose estimate is that?” I explain that it is mine, that it traces back to calculations I did about smokeless tobacco epidemiology in 2006, and that we extrapolate it to e-cigarettes, which are probably about as low risk or at least close to it based on what we know about the chemistry of the exposure. Reporters are usually quite curious, so I explain more, including how this number is certainly not precise, but that it has stood the test of time, never being seriously challenged and being accepted by the other experts on the topic. In other words, it is expert opinion — mine and others’. What else could it possibly be? But it is not about one battery of calculations I did a decade ago. It is because the experts synthesize those calculations as well as all other available relevant knowledge, and come to the conclusion that it is about right. The key here is assessing all of the available formal study evidence and synthesizing it. No individual study result can substitute for that.

Returning to the narrative, the next step, after overcoming the urge to think that their limited and imperfectly-measured experience constitutes sufficient research, is getting the medics to not pay undue attention to minimally useful evidence. Again this has to be simplified because most of them are not scientists and do not understand scientific inquiry, but we are trying to get them to be science-based. A few are good scientists in addition to being medics, of course (and they understand the points I am making here and find them as frustrating as I do), but the goal is to get the rest of them to behave as if they understood the science. Like people in most professions, they are so busy with the day-to-day activities and technical details of their practice that they are unlikely to engage in much analysis. So they are told that case studies are the lowest form of study (just ahead of “expert opinion”), with collections of case studies just above that. When a medic reads a captivating case report about some utterly improbable series of events (reported because it was unique and thus interesting), we do not want him trying to recreate this in his practice. In general, it is not a good idea for any of us to base our assessment of how the world normally works on a one-off story we have heard. So the rule of thumb is a pretty good idea.

But this simplification — almost always good advice for someone engaged in the biological side of medicine — has been turned into a fetish. There is even a pejorative term for case-based evidence (“anecdotes”, of course). The thing is, case studies / testimonials / anecdotes are often quite useful when studying public health topics. To take the obvious example, the testimonials about the miraculous role that e-cigarettes played in the life of many people who tried a dozen times to quit smoking using every “approved” method are very informative. They do not tell us what portion of smokers e-cigarettes might work for, but they do tell us that the described phenomenon does happen. And because there are so many of them, it tells us it happens a lot, not just rarely. This works because for human behavioral phenomena — those that incorporate preferences, feelings, and volition — such individual experiences can be informative to the individual and thus to those who they report that experience to. People, unlike molecules or plants, are capable of contemplating what they are experiencing and what they are deciding, and reporting details about it. They are not always right, of course, because none of us fully understand ourselves. However, they are generally pretty accurate with observations like “I tried to quit smoking using every pharmaceutical out there” and “I quit for a month five different times, but I was always so miserable I started again”, along with “after a few weeks of using e-cigarettes and finding a flavor I liked, I was happy to never smoke again, and I haven’t done so for two years.” When thousands of people tell stories like that, you have learned a lot, and it is knowledge that is unlikely to be generated by any study design that is supposedly higher on that mythical hierarchy.

But the public health people, stuck in the misleading simplification that originates in medicine (and is generally good advice in that comparatively simple realm), say “no, that is not good evidence about anything because those are anecdotes, and anecdotes are never good evidence.” If asked to explain why case studies are not good evidence about anything, or specifically about the existence of people for whom e-cigarettes are apparently the only way to quit smoking, they have no answer. (“It was on a list of rules we had to memorize for a test in school, and I have never questioned it” is an answer that few will admit.) They have no answer because it is wrong. They could answer, say, “I know that a couple of anecdotes about someone recovering from gastric ulcers after adding large quantities of nutmeg to their diet do not mean medics should recommend nutmeg as a treatment.” And that would be sound reasoning; biological conditions spontaneously change, and it is easy to be superstitious about the cause. But that observation about biological interventions does not generalize to studies of human behavior.

There is another misguided generalization of this “anecdotes are not informative” myth that I have dealt with extensively, specifically in the context of the health problems caused by industrial wind turbines (electric generators) near people’s homes. There are countless testimonials of people who experienced a particular set of health problems when wind turbines started operating in their neighborhood, and also found that the problems abated when they spent time away, and then once again occurred when they returned to the exposure. People who do not understand science and believe the simplification from the faux-hierarchy dismiss these as “just anecdotes”. But not only are these “anecdotes” useful, but they are actually near-perfect experimental evidence, which is generally the most compelling evidence when the experiment actually measures what you are interested in, as it does in this case. The individual experiences, even though they are the testimony of a single person, involve changing the exposure and observing the effects, which defines experiment and provides very compelling evidence. (For those interested in more details, I wrote that up in the article available here.) You will probably notice a similarity with individuals’ smoking cessation experiments, wherein someone tries a particular intervention, such as switching to e-cigarettes, and can definitely figure out whether it works (for that individual) from personal observation.

Returning to the simplified advice to medics, the other disastrous failure in medical research comes when observational studies are so confounded as to be useless. The most obvious example comes when we look at an experimental treatment (drug, surgical technique, whatever) that is only used in circumstances where the standard method looks hopeless. If we naively compare outcomes, it can look like the experimental method is terrible compared to the standard approach, even if it is genuinely a more effective treatment, because it is attempted only for patients who are in particularly bad shape. (Note that you might recognize the similarities of this to the critiques that West and Hajek recently leveled at a junk-science study that claimed e-cigarettes do not help cancer patients quit smoking.) Conversely, if the new option is preferred only by the best physicians working in the best facilities, then it might look like it is better than it is. The solution to this is an experiment (RCT) that randomizes who gets what treatment, and carries it out under similar circumstances. And thus the simplistic advice to the medics is that the RCT is the best study method.

The main advantage (for relevant purposes, the only advantage) of an RCT is that it eliminates the systematic confounding such as that described in the previous paragraph. (Confounding is the problem of people who are getting one treatment differing from those getting the other, for reasons other than the treatment itself, and thus having different outcomes that are not caused by the treatment.) But this needs to be balanced against some serious disadvantages of the method. Conveniently, it turns out that those disadvantages are usually pretty minimal for a real clinical intervention. (Though there are exceptions to this, like the mess that the huge Women’s Health Initiative randomized trial of hormone replacement therapy turned into.) The disadvantages are, however, enormous when we try to do experiments in social sciences like public health. This is where that unfortunate history of public health (being a naive port of clinical ways of thinking into a social science, with those in the field not even recognizing that they are doing social science) really becomes a disaster for the science side of public health, just as it became for the ethical and political side.
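The confounding described above, where an experimental treatment is tried mostly on patients who already look hopeless, can be sketched with a small simulation. Everything numeric here is an assumption chosen for illustration: half of patients are severe, recovery chances are 30% (severe) and 80% (mild), the new treatment adds ten percentage points for everyone, and in the observational setting it goes to 90% of severe patients but only 10% of mild ones. The function names are hypothetical.

```python
import random


def recovery_rates(n_patients, assign_new, rng):
    """Simulate recoveries; return (rate on new treatment, rate on standard)."""
    counts = {True: [0, 0], False: [0, 0]}  # got_new -> [recovered, total]
    for _ in range(n_patients):
        severe = rng.random() < 0.5
        got_new = assign_new(severe, rng)
        # Assumed baseline chance of recovery: 30% if severe, 80% if mild.
        p_recover = 0.30 if severe else 0.80
        if got_new:
            p_recover += 0.10  # assumed: the new treatment genuinely helps everyone
        counts[got_new][0] += rng.random() < p_recover
        counts[got_new][1] += 1
    return tuple(counts[k][0] / counts[k][1] for k in (True, False))


def by_indication(severe, rng):
    # Observational world: the new treatment goes mostly to hopeless-looking cases.
    return rng.random() < (0.90 if severe else 0.10)


def randomized(severe, rng):
    # RCT: a coin flip decides, regardless of how sick the patient is.
    return rng.random() < 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
obs_new, obs_std = recovery_rates(100_000, by_indication, rng)
rct_new, rct_std = recovery_rates(100_000, randomized, rng)
print(f"observational: new={obs_new:.2f} standard={obs_std:.2f}")
print(f"randomized:    new={rct_new:.2f} standard={rct_std:.2f}")
```

With these made-up numbers the naive observational comparison should make the new treatment look much worse than the standard one, while the randomized comparison should recover something close to the assumed ten-point benefit. The sketch covers only the clinical case, where randomization costs little. It says nothing about the disadvantages of experiments in a social science.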

RCTs became a fetish in public health because they are often (not always) the best study design in clinical medicine. So we have RCTs of NRT products that are interpreted as saying NRTs are nearly useless. This is actually not quite fair, because in public health we are asking different questions than we do in clinical medicine. If the question is the clinical question “if I ‘administer’ this ‘treatment’ (NRT) to a ‘patient’ presenting with this ‘disease’ (smoking), is it likely to ‘cure’ her?”, then the answer is no. The RCTs show a dismal success rate. But if the question is the real public health question, “does having NRTs available for people who want to quit smoking do any good at all?”, then the answer is yes. On the other side of that is the absurdity, “never mind that smokeless tobacco is obviously responsible for ten to twenty percent less of the male population in Sweden smoking, we do not have an RCT that shows it works.” It boggles the mind that people dismiss clear real-world evidence for lack of a completely inappropriate experiment.

However, I honestly think that the willingness to dismiss the real-world evidence is not entirely politicized posturing. People in public health have been so damaged by the medical simplification that they really think that an RCT would be better evidence than real-world observations. Similarly, an RCT of e-cigarette use cannot tell us much of anything we do not already know, and could never be a proper measure of the real-world phenomenon of e-cigarette use in any case. But still there are suggestions that we somehow need such trials.

THR advocacy in the face of the “public health” establishment is saddled with not just the political burden associated with THR being an “impure” behavior that they hate, but also the entrenched anti-science that public health mis-learned from its origins in clinical medicine.

[Update: the next post applies this specifically to research on THR.]

25 responses to “How the medicalized history of public health damaged its science too (a science and history lesson)”

  1. Pingback: How the medicalized history of public health da...

  2. For me, what is most disturbing about using an RCT to test the efficacy of e-cigarettes as a smoking cessation device is not only that it is completely inappropriate, but that I’m fairly certain the federal government right now is paying some researcher(s) millions of dollars to do exactly that. Moreover, it wouldn’t surprise me at all if such a trial revealed no significant difference between e-cigarettes and whatever controls (patch, cold turkey) they used. By randomly assigning people to treatment arms they’re washing out the real-world factor that’s likely driving smokers’ success with e-cigarettes: the ability to self-select their treatment through trial and error. Thus, the RCT could, by design alone, provide a completely misleading result. Clear the dance floor for the ANTZ if this happens.

    • Yes, I totally agree. For the reasons you note (and a couple of others) an RCT is likely to “show” that e-cigarettes don’t “work” very well, just as they show that NRT does not work very well. And, yes, I would have to guess that the ANTZ know this (though I am not sure they are not too lazy to take advantage of it — I don’t recall seeing anything pop up on, though it is not like I monitor it carefully). The Bullen study did a very nice job (sarcasm) of grossly underestimating the true value of ecigs, just as one would have expected.

  3. Just a small suggestion: I really think these blog pieces are excellent and I understand that such issues often require in-depth, lengthy posts. I am not suggesting you shy away from detailed, intellectual discussion of them – but what would be really useful in my opinion would be also having a summary of the post in two, three or four points, perhaps at the start or near the top of the post. I think this would be handy for people who simply do not have the time to read all of it, or who are put off by length, which is a shame, because you are consistently hitting the right target in each post.

    • Thanks. And yeah, that would have some value. But (of course)…

      I hesitate to do that for a number of reasons. What readers consider the key points of the complicated posts varies. I am constantly surprised by what passage someone quotes when recommending the post on twitter. But an author cannot decide how the reader will relate to his work. Also, frankly, I do not want to make it easy to not read the post (such summaries are supposed to be guidance, but we all know that we often just read that instead). I know, that half-contradicts the previous sentence and is rather pushy. But I really think there is some value in not just the conclusions of what I write, but how you get there. Working through that the hard way can teach the reader to use such reasoning in other cases. Also, sometimes I write essays and stories, rather than technical papers, and I would hate to give away the surprise twist :-). Even when I write technical papers, the abstract is always a grudging last step for me. Finally, and most important, my goal here is to write analyses that are convincing in their own right, not argue from authority (“this summary statement must be true because Phillips wrote it”). Of course I have no delusion that my authorship does not give the words extra power, and they are not just out there anonymously or by some random writer, but still I want them to carry as much of the weight as possible.

      I try to mix in some easy and breezy posts along with more complex ones (this week just happens to have been extra complex).

  4. Thanks for your reply. You make some very good and important points. And now you mention it, I realise that I sort of suspected that you also have wider writing interests – it comes through clearly in your approach, style and insights.

    Also, amongst other things, THR is of course a political issue, and I think the approach you take with regard to the psychology and rationalisations of anti-THR individuals and organisations, would also be valuable in an understanding of many other political issues. Such subtle and detailed political analysis is virtually non-existent in the (UK) media, which I find frustrating. Too many issues are considered straightforward, when in fact they are often complex, with certain motives and interests overlapping and merging and not fitting so easily into existing social and cultural stereotypes of party and political allegiance.

    I think in this and my previous comment, I may have managed to appear both pedantic and patronising, although that was not of course my intention. I know you didn’t say that, it is just my opinion.

    Looking forward to more posts. Thanks.

    • No offense taken. There is a good legitimate case to be made for what you suggested. I just decided that the case against it is a bit stronger.

  5. Okay, so I should have read your bio on this website first (which I’ve just done). Now I understand a bit more, including about your extensive experience and interests, something that would have served me well before commenting, to be honest.

  6. Carl, I’ve been enjoying your last few dissertations immensely. As a nurse on the front lines, being subject to the vagaries of “evidence based practice” it is refreshing to read someone with your acumen expound on the topic regarding public health.
    A couple notes from the trenches:
    1. As I detailed in the take down of the WHO FCTC COP#6, there appears to be a penchant mitigating selection of evidence vis. Evidence tainted by the allegiance of the researcher. In my former field of engineering we called it the NIH (Not Invented Here) problem. Those shaping policy simply do not admit evidence that is NIH or vetted and reviewed positively by their cabal. This form of group confirmation bias must be exposed and countermeasures proposed.
    2. We’ve seen this bandied about: “The plural of anecdote is not data.” You address this briefly in this piece but the research methods associated with empirical data need to be more fully discussed. So many of those in the fight are unaware of DeGroot’s validation of empirical data or that the credibility of observation matters. This cuts both ways too. The use of increasing calls to poison control centers is just as anecdotal, as there is no validation test of the observed phenomenon (i.e.: was the call related to an actual event, a question or concern regarding exposure or potential exposure, etc.).
    Thanks for all you do. Your assistance is a valuable teacher.

    • Thanks. I agree that NIH Syndrome (the concept is understood across fields) is a big problem too. A more complete scholarly study of what I discussed would indeed need to take it into consideration as another anti-scientific force that is pointed in the same direction. I think you are also suggesting that NIHS extends to methodology, and I agree with that. But I think that fits perfectly into my narrative: a different explanation for exactly the same anti-science bias.

      I have written numerous times in testimony (and I think in this blog and also perhaps in that article I linked) that it is true that “the plural of anecdote is not data”, but that is only because you do not need the “plural”. One anecdote is data. (And, of course, so are many.) It may not be data that answers a particular question of interest, but it is definitely data, and anyone who thinks otherwise is not a scientist. If the question is “has it rained here in the last week?” then the anecdote “I got caught in the rain here yesterday” is not only data, but definitive data (unless of course it is wrong — but that same caveat goes for any data).

      • Surely a single anecdote is datum, and the plural of anecdote IS data?

        • Carl V Phillips

          No, definitely not. I suppose if the anecdote consisted entirely of “I am.” then it would be a datum. Well, actually even then it would convey the stated existence as well as “I write in English” plus perhaps “I have a strange sense of humor and am dangerously close to being a philosopher.” A real anecdote, say one of the many testimonials about quitting with ecigs, probably contains information about someone’s smoking history, their history of quit attempts, and their experience with e-cigarettes, as well as other information, and each one of those categories contains numerous facts within it. Definitely very plural.

        • I’ve had this conversation with a professor of English and she told me that “data” has become, through the natural evolution of the language, both a singular and plural noun. Those who insist on calling a single piece of information “datum” are insisting on a level of precision that most people have abandoned as being an unnecessary distinction to make.

        • Carl V Phillips

          I tend to agree. Though its natural usage in English has become neither singular nor plural, but collective, like water. “Datum” is clear in its meaning, though the natural English has most of us using something like “a single piece of data” or “a single observation” when we want to specify that. I am not bothered when someone writes “the data show…” and sometimes I do it. I get very annoyed when some editor “corrects” my use of “the data shows….”

  7. I am suggesting that NIH applies to the source and the methodology, yes. Noting that it appears that most, but not all, of the WHO FCTC COP#6 was taken from Glantz’s work in December without even an acknowledgment that other information existed and was presented a priori provides some evidence of the effect at the global policy level.
    I appreciate the time to discuss the validity of anecdotal evidence. I am mostly concerned with the necessary means to validate or invalidate such evidence. As you point out RCT’s are held as the ersatz “Gold Standard” for completely inappropriate reasons in this case. It is the presence of now thousands of cases of success with THR as evidenced in personal testimonies that is being discredited. The validation of these testimonies comes not from their number or their source (observer) but from their content vis. “Tried all other methods, e-cigs worked”. There is admission that smoking cessation was desired and unsuccessfully achieved until a newer intervention was available. The testimony is as valid as reporting the temperature reading from a thermometer. Yet this empirical evidence is continuously discredited. Even when surveys (op.cit.) are done they are ignored (more NIHS?).
    So the question then becomes what effective countermeasures are left? PH appears to be coated in Teflon.

    • They are not teflon. It becomes evident how quickly their chrome peels off when they are challenged. It is just that few have done it for a while — at least not those who were easy to ignore. And perhaps that had its advantages. They were left on their own so far that they made themselves bloated easy targets. It is now time to counterattack.

  8. Thank you Carl, that offers encouragement to continue. :-)

  9. Pingback: Why clinical trials are a bad study method for tobacco harm reduction | Anti-THR Lies and related topics

  10. Pingback: Mike Siegel inappropriately blames the failure of his ill-advised research plans on others | EP-ology

  11. As always your research on this, C.V. Phillips, is extraordinary.

    You do leave out one thing that also fuels the nonsensical Catch-22 which is the “disease theory of addiction” promoted, the message carried, by 12-Step sycophants and the recovery group movement. Its business arm is, of course, the addiction treatment industry.

    After all, the nonsense about “addiction” had to come from somewhere.

    Nearly a decade ago I was going to be a social worker specializing in substance dependency but I aborted that once I realized that the movement and the industry spawned from it is actively, religiously, pro-addiction instead of anti-addiction.

    There’s much more but I wanted to at least pique your interest in that little nugget of info. Needless to say, it is the key link between the various prohibitionists out there. Explore that and you’ll find much more that deserves exposure.

    The history of 12-Step and the crap that spawned from it is guaranteed to get that blood boiling: The inmates have been running the asylum since the first day, One Day At A Time.

    You wanted the link between the temperance movement and medicine? There you go,

    — Mark B.

    • Carl V Phillips

      I tend to focus on the research side of things (what we know about what does what) rather than the therapy side (to use that term maximally broadly). There is a lot I don’t know about the latter and would be happy to learn more. I am aware, of course, that the treatment side is very focused on bright-line notions where there are no bright lines. That is perfect for both the treatment industry and the manufacturing industry. For alcohol they worked out the perfect cartel: There are just a few people who are “alcoholics” and should never be allowed to drink a drop ever. They belong to the treatment industry. Everyone else is safe and can do whatever they want, so they belong to the manufacturers.

  12. Pingback: Mike Siegel still doesn’t understand what is wrong with his study “plan” | EP-ology

  13. Pingback: Dear Public Health: the public despises you, so you are probably doing it wrong | Anti-THR Lies and related topics

  14. Pingback: What is peer review really? (part 8 – the case of Borderud et al.) | Anti-THR Lies and related topics

  15. Pingback: Economic innumeracy in public health, with an emphasis on tobacco harm reduction | Anti-THR Lies and related topics
