by Carl V Phillips
There has always been a lot of confusion about what counts as a death from smoking, or from the current pandemic, a war, or most anything else. Events of 2020 have caused a lot more people to realize they are confused about what it means. Typically people just recite numbers they hear without pausing to ask what they could possibly mean. Deaths-from-smoking statistics are recited like a factoid one might hear about particle physics. In both cases a moment’s thought would reveal to most people that they really have no idea what the number even means. But unlike with a lot of physics, it is possible for most anyone to understand what the death counts mean and how they are properly estimated.
As is sometimes the case when trying to make sense of science, intuitive appeal to obvious simple metaphors tends to prevent rather than facilitate understanding. For physics, that might mean trying to apply everyday understanding of the normal scale physical world to the quantum level. In economics, personal finance is extremely misleading about macroeconomics (e.g., a household that spends money it does not have is incurring a debt it must repay, whereas a sovereign that creates and spends fiat money is not incurring a debt it must repay but rather is changing the rules of its economy). For death counts, the misleading metaphor is trying to extrapolate from simple “obvious sufficient cause” cases.
When someone is instantly killed by massive trauma (car crash, gunshot, whatever), we all tend to agree that the trauma was “the” cause of death and it is easy to tally up the occurrences. Here we have a sufficient cause: change any other characteristics of the person and situation, and the massive trauma would still have been fatal. It is visible. It is very easy to imagine the counterfactual situation where all else in the world is basically unchanged, but the trauma was narrowly avoided. Under the near-miss scenario, the person’s health and longevity are probably not affected at all.
This is not a good model for understanding almost any other cause of death.
Indeed, it is not even a perfect model for counting up deaths from trauma. Note the use of scare quotes on “the”. A common inclination, which is always wrong, is to think of events as having one particular cause. Every death (every event) has an infinite list of causes, tracing back through the evolution of life on Earth and to the formation of the universe. There is no rule for choosing which are the causes that “count”, let alone which is “the” cause. That is entirely a matter of choice and context.
There are some rules of thumb for deciding which causes to count. Typically we ignore all of the necessary causes except when being silly or perhaps poetic. We do not bother to mention that someone’s birth and survival up until yesterday are among the causes of their death today. Even removing those, though, there are still countless complications even in the simple cases.
One immediate implication of this is that anyone who tries to argue about these matters with phrases like “that is not the real cause” rather than “we tend to not count it in the tallies of cause X if Y is the case” is demonstrating they do not understand the topic. The ubiquitous example these days is those who insist that COVID-19 was not “the” cause of someone’s death because it was “really” some existing comorbidity. (This is pretty much always motivated by their political commitment to pretending COVID-19 or policies around it are not so bad.) You have probably also seen nonsense along the lines of “you never see X written on a death certificate, so therefore you cannot claim X kills anyone”, where X is smoking, or ETS, or lack of access to adequate medical care, or whatever. This “reasoning” is utter nonsense for any X.
A death from trauma was caused by the trauma. But it might have also been caused by the fact that the trauma took place in a remote rural area, whereas had it occurred in the city, quick EMT response and the proximity of an operating room would have saved the person. In casual conversation we would just say the person died of the particular trauma without mentioning the geography. However, if the question was “how is survival (i.e., the probability of dying at any particular age) affected by living in a rural area?”, then this death would count as an extra death caused by being rural. That it was caused by the particular trauma is not relevant to that question. But wait, you might ask, how could we possibly count such deaths — it is not as if we can identify every such event, let alone determine whether the ruralness made the difference? Yes, exactly. Read on.
So in the case of the death of someone with COVID-19 who also has COPD, the cause of death is very likely both SARS-CoV-2 and COPD. And whatever gave them COPD in the first place. And whoever gave them SARS-CoV-2. And the fact that their great grandparents had children, and so on. And of course such deaths should be attributed to the pandemic, though tallying individual events turns out to not be the right measure of deaths from the pandemic (read on). You have probably seen silly bickering about people “dying with” COVID-19 rather than dying from it. But unless the death was caused by a car crash (and perhaps even then), everyone dying while suffering from COVID-19 (which refers to having actual disease, as opposed to merely being colonized with SARS-CoV-2) died from COVID-19, whatever else also caused their death. Same with smoking — smoking causes the death of approximately every smoker.
Wait, what? How can that be?
It is all about the “when”
It turns out the real question is how could that not be. This becomes apparent when you pause and realize that an exposure or event causing someone’s death does not mean it is responsible for them ever dying. You can ask what caused a particular tree to grow rather than never growing, and you can ask what caused it to fall last year, but you cannot sensibly ask why it ever fell rather than standing until the heat death of the universe. Causing a death obviously has to mean causing the death at a particular time rather than later. Almost everyone whose body is damaged by decades of smoking or by a case of COVID-19 died at the particular time rather than later because of that damage. This is true whatever other diseases or conditions they had. If they had some other serious disease, then it is almost certain that both the other disease and the smoking/COVID-19 caused that death at the particular time. Even if that other disease had them on a path to an early death in the absence of smoking/COVID-19, the ravages of smoking/COVID-19 still caused them to die then.
There is a misguided urge to simplify things, as with analogies in particle physics or macroeconomics, and think that every death is like a quick death from trauma. In that confused mindset, there must be a “main” cause, and if we “know” it was the other disease then it was not smoking or COVID-19. This is simply wrong. However one might want to decide on what is the “main” cause (there is no rule or definition there, and it must vary by the question that is being asked), it does not negate the other causes. Consider the logical extension of this confusion, straying into that silly/poetic zone I mentioned: Being born sets someone inexorably on a path toward death, so that is the main cause, and thus nothing else counts as their cause of death.
Of course, we want to draw a line somewhere and say the death has to be at least X time earlier than it otherwise would have been before we tally that cause for whatever purpose, with X being at least a day and perhaps a year. We do not want to count it as a death from smoking if someone died of a gunshot wound a minute sooner than they would have because their arteries had been damaged by smoking.
Where do we draw that line? Don’t look to the epidemiology textbooks for an answer. They invariably use thoughtless language that refers to the exact moment. Even one second sooner counts, making the stated definitions pretty useless. No one really tries to answer this question, either to offer a general rule or in a particular case when they are trying to count up deaths from a cause.
So how do we count?
As is often the case, practice solves the problem created by the failings of theory. The answer to the existential definitional question about what was being measured is “whatever is implicit in the method they used to measure it”. Questions of causation are always about the counterfactual. If the death would not have occurred at about that time in the counterfactual reality where the exposure (disease, behavior, event) had been absent, then the exposure is a cause. What is meant by “at about that time” (that minute? that year?) is implicitly defined by the method used to assess the counterfactual.
How do we figure out how many people were killed by a broad-acting event, like the SARS-CoV-2 pandemic, a war, or Hurricane Maria? There is basically only one way, which is to look at the number of deaths during the period of the event (including its ripple effects) and compare that to an estimate of the number that would have occurred in that population absent the event. The latter is usually the death toll for the same population during the previous comparable period, such as the last year of peace before a war or the previous September 15 through the end of November for the hurricane.
Perhaps you have seen one of the comparisons of total deaths during March or April 2020 (or whatever period) compared to the average deaths for that period for the past few years. These have been published for New York, and other places, with the original contributions from Italian towns. The logic is simple (and correct): We would expect about the same number of deaths this year barring the major mortality event. Thus the excess this year compared to last year is attributable to the pandemic.
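To make the arithmetic concrete, here is a minimal Python sketch of that comparison. Every death count in it is a made-up placeholder rather than real data, and the function name `excess_deaths` is just an illustrative choice.

```python
from statistics import mean

def excess_deaths(observed, baseline_counts):
    """Observed deaths minus the average of the comparison periods."""
    return observed - mean(baseline_counts)

# Hypothetical totals for the same month in three earlier years (not real data).
march_baseline = [5100, 4900, 5000]
# Hypothetical total for that month during the event (not real data).
march_event = 7400

excess = excess_deaths(march_event, march_baseline)
print(f"Expected about {mean(march_baseline):.0f}, "
      f"observed {march_event}, excess about {excess:.0f}")
```

Nothing in this calculation requires knowing who any of the extra people were. It only needs the two totals.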
It turns out that for those comparisons that have been reported, the deaths caused by the pandemic are really about double the official statistics that come from what is toted up by hospitals. This is not even slightly surprising. Widespread upheavals always kill a lot more people than those who show up at the hospital clearly suffering from a case of upheaval. The proper estimates of deaths from the U.S. annihilation of Iraq or from Hurricane Maria hitting Puerto Rico are on the order of ten times the count that was manually toted up from specifically attributed deaths.
Of course, reactions to these assessments are political. Those who would like to downplay a genocide or government malfeasance go with a “what foul sorcery is this?” type faux-argument: “We have the names of a thousand people that died here from this, but you are telling me there were 10,000? Ha! Then you can tell me their names, right? Who were they? How do you know they died of this? Show me the death certificate that says that!” Needless to say this fails to understand how all research on populations and causation is done. And by “populations” I don’t just mean groups of humans. I mean anything that looks at rates within collections of anything: people, plants, cancer cells, atomic nuclei. Estimates of changes in rates or prevalences are pretty much never made by specifically identifying which particular individual had a different outcome because of the exposure. They are made by comparing the totals in the exposed and unexposed states. (In fairness to the people who believe this nonsense, as opposed to the many who know full-well their faux-argument is a lie, they probably never had a decent science class in their life. And they have never heard of the Dunning-Kruger effect, and thus are confident they don’t exhibit it.)
It is pretty easy to see that this comparison is the only way to do the estimate. For the cases of war, hurricane, or pandemic, there are some deaths that upon inspection would presumably be attributed to the event, but they are never inspected by anyone who is keeping count (e.g., someone dies in their home after being hit by shrapnel and is buried by the family; someone dies from COVID-19 without receiving treatment, but the tallies only include people with biologically diagnosed infections and postmortem tests are not done). But there are also many deaths that would never be apparent from such inspection but were indeed caused by the event (e.g., someone died of a heart attack that was caused by the stresses created by the event, or from a foodborne disease caused by a loss of refrigeration, or because they did not have access to some medicine or other supply that would have kept them alive).
There is nothing specific about epidemiology here. Indeed, examples from other population studies are even more clear. When we estimate how much a frog population has declined due to pollution, there is no census of frogs that died from the pollution. When we measure how much a chemical reaction is sped up by increasing temperature, we literally cannot know which particular molecules were created only because of the higher temperature. Or if that is not clear enough, when we lower the temperature, we obviously cannot possibly know which nonexistent molecules were not created due to the lower temperature. Or to take a similar example that is dear to most of those who are inclined to make the political “what sorcery is this?” faux-argument, if an anti-immigration policy is estimated to have reduced immigration by 10,000, we do not base that estimate on a list of people who would have immigrated but for the policy.
Deaths caused by the current pandemic include people who were thriving and healthy but succumbed to COVID-19, people who were slowly dying of something else and died sooner because they got COVID-19, uninfected people who avoided getting needed medical care or could not get medicines or other supplies due to the crisis, and stressed and depressed people who killed themselves or just slipped away because of the troubling times. It also includes on the subtraction side of the ledger some deaths that would have occurred from car crashes (the driving did not happen) or seasonal influenza (which was nearly wiped out by the minimal hygienic behaviors people started doing, pointing out what a nasty grubby people we are normally). The only way to estimate the total is by looking at the total. There is no possible way to identify the individual cases.
Anyway, there was a question pending (“how much sooner?”), but I could not answer it until I presented this. We can circle back to it now: How much sooner does a death have to occur due to a cause before that cause “counts”? The answer is that if the comparison is March 2020 versus March 2019, then we are talking on the order of a couple of weeks (if a death was pulled forward from April, it counts, but if it just shifted within March by a few days, let alone a few minutes, then it does not). If the comparison is the year(s) of an event versus the year before it started, then we are talking on the order of many months. Note that this is necessarily rough, neither precise nor constant. Someone who dies just one day sooner than they would have will be counted if that day happens to be the very last day of the event period. Someone who dies earlier during March 2020 is not counted, even if it is 29 days earlier, so long as the death would still have occurred within the month. It tends to average to about half the period.
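The “about half the period” figure can be checked with a few lines of Python. This is only a sketch under a simplifying assumption that is not in the text above: that deaths are spread evenly across the days of the month. The function name `min_shift_to_count` is made up for illustration.

```python
DAYS_IN_MARCH = 31

def min_shift_to_count(day_of_death, period_length=DAYS_IN_MARCH):
    """Fewest days a death on this day of the month must have been pulled
    forward for it to have come from after the end of the month."""
    return period_length - day_of_death + 1

# One entry per day of the month, assuming deaths are spread evenly (an assumption).
shifts = [min_shift_to_count(day) for day in range(1, DAYS_IN_MARCH + 1)]
average_shift = sum(shifts) / len(shifts)

print(f"Death on March 1 needs a shift of {shifts[0]} days to count")
print(f"Death on March 31 needs a shift of {shifts[-1]} day to count")
print(f"Average threshold: {average_shift} days")
```

Under that assumption the threshold runs from a full month down to a single day, and averages out to a little over two weeks.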
The question of “how much sooner?” is thus not answered based on a theory about what is “right” to count. There is no such theory. The question is simply answered in a practical sense of “this is what was measured in this particular case”.
Oh, and if you find yourself thinking, “but the number of deaths was never going to be exactly the same from one year to the next, so not all the change is due to the event”, then: (a) You are correct, of course; such estimates are never exact. (b) I have some bad news for you about the precision of pretty much every scientific statistic or claim you ever see. And even most of the non-quantitative claims. This source of uncertainty is utterly trivial compared to many others.
So that works for events, but how do you count deaths for ongoing causes?
Perhaps instead you are now thinking, “hey, you promised you would talk about smoking, but you cannot just compare the year in which smoking happened to the previous year in which it did not.” Nor can you make a similar comparison for something like that example of “living in a rural area” or most other statuses, since those do not vary much from year to year and the effects are often cumulative anyway. For such questions we still need to make a similar comparison, but it needs to be versus other people rather than versus the same population the year before.
These estimates are usually made by looking at the mortality rates of those with the behavior/condition and those without it. We still have to deal with the “how much sooner?” issue — we cannot just count total deaths. So the usual way to look at it is differences in age-specific mortality. (Since ages are always measured in years, this makes the usual answer to the “how much earlier?” question the same as it was for a year-over-year comparison.)
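As a rough illustration of comparing age-specific mortality, the following Python sketch multiplies the difference in death rates by the number of people with the behavior or condition in each age band. All rates, population sizes, and age bands are invented for the example, and the sketch makes no attempt to deal with confounding.

```python
# Hypothetical annual deaths per 100,000 people, by age band (invented numbers).
rate_exposed = {"40-49": 400, "50-59": 1000, "60-69": 2500}
rate_unexposed = {"40-49": 300, "50-59": 700, "60-69": 1800}
# Hypothetical number of exposed people in each age band (invented numbers).
exposed_population = {"40-49": 100_000, "50-59": 80_000, "60-69": 50_000}

def excess_deaths_by_age(exposed, unexposed, population):
    """Extra deaths per age band: rate difference times exposed population."""
    return {
        age: (exposed[age] - unexposed[age]) * population[age] / 100_000
        for age in population
    }

by_age = excess_deaths_by_age(rate_exposed, rate_unexposed, exposed_population)
total = sum(by_age.values())

for age, extra in by_age.items():
    print(f"{age}: about {extra:.0f} extra deaths per year")
print(f"Total: about {total:.0f} extra deaths per year")
```

The output is a count of extra deaths at each age, not a list of the individuals who died because of the exposure.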
Once again we are not estimating changes in population rates by identifying specific individuals who had different outcomes due to the behavior or disease. That is impossible. We do not identify the 43-year-olds who died at that age because they lived in a rural area or the 71-year-olds who died at that age because they smoked. We just count how many extra there are compared to what we would expect based on the comparison group. To the extent that there is identification and tallying (declarations of “the” cause of someone’s death) it is more misleading than informative. We never really know which ones they were. “The” cause of death on a death certificate is a fundamentally unscientific concept.
Unfortunately the comparison across different people is not nearly as clean and legitimate as those year-over-year comparisons (recall that “I have some bad news for you” quip). People with a particular behavior or status are not random. The comparison from one year to the next is a very good proxy for the imaginary experiment where we look at the exact same people under different circumstances. Comparisons of different groups of contemporaries who have some different characteristic… well, not so much. So we have to deal with confounding, “controlling for”, and all that. This is especially true for a behavior like smoking, which is strongly associated with all sorts of other choices and various circumstances.
So how do they do this for smoking?
Well that is not entirely true. Calculations using these methods exist, and they try (almost certainly not very successfully) to use adjustment variables to remove the effects of confounding (the fact that the people who are smokers and nonsmokers would still differ even if smoking did not exist). From this we get numbers like “smokers live an average of 7 fewer years than if they had not smoked”, which can be estimated legitimately this way. (Can be estimated legitimately. I am not suggesting that there is sufficient skill or honesty among those who do this research that this actually happens.)
But those are not the numbers you hear all the time, claims like “400,000 Americans die from smoking every year.” (Or whatever the “official” claim is this week — 480,000? I don’t pay much attention because it is meaningless.) These numbers are not estimated using the normal comparisons. Instead, they are based on taking a list of diseases that smoking causes [supposedly*] and tallying up across them the following: estimate what portion of the cases of that disease are caused by smoking, and then multiply by some declaration of how many people die from that disease.
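In code, that toting-up method amounts to a single weighted sum. The sketch below uses invented placeholders for both the attributed percentages and the declared death counts. It shows the shape of the calculation, not any agency's actual inputs.

```python
# Each entry: (disease, percent of its deaths attributed to smoking,
# declared number of deaths "from" that disease). All values are hypothetical.
disease_table = [
    ("lung cancer", 90, 1000),
    ("COPD", 90, 800),
    ("heart disease", 50, 3000),
]

def tallied_deaths(table):
    """Sum of attributed share times declared deaths across the disease list."""
    return sum(percent * deaths for _, percent, deaths in table) / 100

total = tallied_deaths(disease_table)
print(f"Tallied 'deaths from smoking': {total:.0f}")
```

Note that the result depends entirely on the two inputs for each disease, and no comparison of mortality between smokers and nonsmokers appears anywhere in it.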
There are, um, a few problems with this. There is the problem that the estimates of the portion are characterized by study bias, publication bias, and cherrypicking, so every entry in the summation is inflated. [*For most entries on the list, smoking clearly does cause some cases, though the portion is probably overestimated. For some, it is not entirely clear that smoking causes them at all, and you get positive estimates only because of these biases. Thus the “supposedly”.]
But set that aside, because it is not the point here. Someone could use this method without trying to bias the results, even though that apparently has never been attempted. But it would still not be valid, or even meaningful. Notice the phrasing “some declaration of how many people die from the disease.” We can count how many people get a particular cancer or COPD or whatever, so long as we define “getting it” as “having it medically diagnosed”, which is what is always done. But how many people die from it? Such statistics are not terribly meaningful. It is possible to create such numbers using the methods described above, getting estimates of increases in age-specific mortality, and then to parse that against the ages that smokers die and how many of them have the disease, and what portion of those cases for that cohort are attributable to smoking, and so on. That would be an interesting analytic challenge that someone could spend their whole career on.
(Narrator: “No one does that.”)
In practice, the death counts in that toting up come from “official” declarations of “the” cause of death. So we are back to the “death certificate” problem. Does this tell us whether someone who was suffering from lung cancer, which had a 90% chance of having been caused by their smoking, died a year earlier than they would have, or a month, or whatever? No. We just know they died while suffering the effects of lung cancer. So it counts as a 0.9 toward the “498,262 deaths per year” claim. Oh, but she also had two other cancers, which had a 20% and a 40% chance of being caused by her smoking, and COPD (90%) and heart disease (50%), so actually it contributes 2.9 deaths toward that total. Ok, it is not quite that bad, but there is a serious double-counting problem also.
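The arithmetic of that deliberately exaggerated example is easy to reproduce. This Python sketch just sums the attributed percentages from the paragraph above for the one hypothetical person. The disease labels for the two “other cancers” are placeholders, and as noted, real tallies are not quite this bad.

```python
# Attributed percentages for one hypothetical person, taken from the example above.
attributed_percent = {
    "lung cancer": 90,
    "other cancer A": 20,
    "other cancer B": 40,
    "COPD": 90,
    "heart disease": 50,
}

# Naively adding every disease's share lets one death count more than once.
contribution = sum(attributed_percent.values()) / 100
print(f"One death contributes {contribution} to the tally")
```

A single death can never legitimately add more than 1 to a death count, which is what makes a sum like this a double-counting problem.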
But here’s the biggest problem: She almost certainly died sooner than she would have because of each of those diseases. But even if she had not had those diseases (or even if not one of them was really caused by smoking — there is a tiny probability of that), she still died sooner than she would have because she smoked, almost without a doubt. Smoking damages the body in many different ways, making it less likely to keep functioning one more minute, no matter what else happens. If she had died in the hospital due to injuries from a car crash, smoking would have been a cause of the death happening at that moment.
So according to the epidemiology textbook definition, if you want to estimate the number of smokers who died from smoking in a given year (by so much as one second sooner), you need only count the number of smokers who died. Basically every smoker dies at least a bit sooner because of smoking. But if you don’t smoke, don’t feel too smug about this. About 99% of us die (by that definition) from not eating an optimal diet. And also from not getting the right exercise.
As already noted, this makes the whole concept of counting deaths due to a particular cause pretty much meaningless unless a “how much sooner?” period is specified or implied by the research methodology. This is not done for the 4xx,xxx deaths/year claims, rendering that statistic utterly meaningless.
The ultimate irony here: Estimates like “George W Bush et al. killed a million Iraqis as a result of their war of aggression” are treated as highly controversial sorcery, even though they are based on a very solid methodology. Meanwhile, statistics like 4xx,xxx deaths/year from smoking are recited as if they are solid facts, when they are really utterly meaningless.
The main takeaway for life in 2020 is: When you see a statistic for pandemic deaths that is based on year-over-year comparisons, and it disagrees with the official counting-up numbers, believe the former not the latter.