by Carl V Phillips
I have been having an ongoing conversation with Kristin Noll-Marsh about how statistics like relative risks can be communicated in a way that allows most people to really understand their meaning. There is more there than I can cover in a dozen posts, but I thought I would at least start it. I have created the tag “methodology” for these background discussions about how to properly analyze and report statistics (“methodology” is epidemiologist-speak for “how to analyze and report data”).
Most statistics about health risks are reported in the research literature as ratio measures. That is, they are reported in terms of changes from the baseline, as in a risk ratio of 1.5, which means take the baseline level (the level if the exposures that are being discussed are absent) and multiply by 1.5 to get the new level. This is the same as saying a 50% increase in risk. It turns out that these ratios are convenient for researchers to work with, but are inherently a terrible way to report information to the public or decision makers. There is really no way for the average person to make sense of them. What does “increased risk, with an odds ratio of 1.8” mean to most people? It means “increased risk”, full stop.
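The conversion described above fits in a few lines of Python. This is a minimal sketch, and the function names are hypothetical, chosen only for illustration. A risk ratio is applied by multiplying the baseline, and the same ratio can be restated as a percentage change. Neither form tells you the resulting level until a baseline is supplied.

```python
def apply_risk_ratio(baseline: float, ratio: float) -> float:
    """Return the new risk level: the baseline multiplied by the ratio."""
    return baseline * ratio


def ratio_as_percent_change(ratio: float) -> float:
    """Restate a ratio measure as a percentage change from the baseline."""
    return (ratio - 1.0) * 100.0


# A risk ratio of 1.5 says the same thing as "a 50% increase in risk".
print(ratio_as_percent_change(1.5))  # 50.0

# Only with a baseline (here an illustrative 2%) does it become a level.
print(round(apply_risk_ratio(0.02, 1.5), 4))  # 0.03
```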
Every health reporter who puts risk ratios in the newspaper with no further context should be fired (some of you will recall my Unhealthful News series at EP-ology). But the average person should not feel bad because it is likely that the health reporter — and most supposed experts in health — cannot make any more sense of it either.
The biggest problem is that a ratio measure obviously depends on the size of the baseline. When the baseline is highly stable and relatively well understood, then the ratio measure makes sense. This is especially true when the deviation from the baseline is actually better understood than the absolute quantities. So, for example, we might learn that GDP increased by 2% during a year. Few people have any intuition for how big the GDP even is, so if that were reported as “increased by $X billion” rather than the ratio change, it would be useless. Of course, that 2% is not terribly informative without context, but the context is one that many people basically know or that can easily be communicated (“2% is low by historical standards, but better than the recent depression years”).
By contrast, to stay on the financial page, you might hear that a company’s profits increased by 10,000% last year. Wow! Except that might mean that they profited $1 the year before and got up to $101 last year. Or it might be $1 billion and $101 billion. The problem is that the baseline is extremely unstable and not very meaningful. This contrasts yet again with a report of revenue (total sales) increasing by 50%, which is much more useful information because a company’s sales, as opposed to profits, are relatively stable and when they change a lot (compared to baseline), that really means something concrete.
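The profits example can be made concrete with a small Python sketch (the function name is hypothetical). Strictly, a 10,000% increase multiplies the baseline by 101, because the percentage is measured on the change, so $1 becomes $101. The headline percentage is identical on a $1 baseline and a $1 billion baseline, while the absolute change differs by a factor of a billion.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, measured relative to the old (baseline) value."""
    return (new - old) / old * 100.0


# The same headline percentage on two very different baselines (in dollars).
for old, new in [(1, 101), (1e9, 101e9)]:
    pct = percent_change(old, new)
    print(f"${old:,.0f} -> ${new:,.0f}: {pct:,.0f}% increase, "
          f"absolute change ${new - old:,.0f}")
```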
So returning to health risk, for a few statistics we might want to report, the baseline is a stable anchor point, but not for most reported statistics. It is meaningful to report that overall heart attack rates are falling by about 5% per year. The baseline is stable and meaningful in itself (the average across the whole population), and so the percentage change is useful information in itself. This is even more true because we are talking about a trend so that any little anomalies get averaged out. By contrast, telling you that some exposure increases your own risk of heart attack by about 5% per year is close to utterly uninformative, and indeed probably qualifies as disinformative.
As I mentioned, ratio measures (in forms like 1.2 or 3.5) are convenient for researchers to use. You probably also noticed me playing with percentage reporting, using numbers you seldom see like 10,000%. This brings us to the reporting of risk ratios in the form of percentages as a method of lying — or if it is not lying (an intentional attempt to make people believe something one knows is not true), it is a grossly negligent disregard for accurate communication.
Reporting a risk ratio of 1.7 for some disease may not mean much to most people, but at least that means it is not misleading them. There is a good way to explain it in simple terms, something like, “there is an increase in risk, though less than double”. If the baseline is low (if the outcome is relatively uncommon) then most people will recognize this to be a bad thing, but not too terribly bad. So the liars will not report it that way, but rather report it as “a 70% increase”. This is technically accurate, but we know that it is very likely to confuse most people, and thus qualifies as lying with the literal truth. Most people see the “70%” and think (consciously or subconsciously), “I know that 70% is most of 100%, and 100% is a sure thing, so this is a very big risk.”
(As a slightly more complicated observation: When these liars want to scare people about a risk, they prefer that a risk ratio come in at 1.7 rather than a much larger 2.4. This is because “70% increase” triggers this misperception, but “140% increase”, while still sounding big and scary, sends a clear reminder that the “almost a sure thing” misinterpretation cannot be correct.)
The problem here is that people — even fairly numerate people when working outside areas they think about a lot — tend to confuse a percent change and a percentage point change. When the units being talked about are percentages (which is to say, probabilities, as opposed to the quantities of money like the above examples) that are changing by some percentage of that original percentage, this is an easy source of confusion that liars can take advantage of. An increase in probability by 70 percentage points (e.g., from a 2% chance to a 72% chance) is huge. An increase of 70 percent (e.g., from 2% to 3.4%) is not, so long as the baseline probability is low, which it is for almost all diseases for almost everyone.
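The distinction between a percent change and a percentage-point change can be shown side by side in Python. This is a sketch with hypothetical function names, using the 2% baseline from the example above, with probabilities stored as fractions (0.02 = 2%).

```python
def increase_by_percent(p: float, percent: float) -> float:
    """Relative change: raise probability p by `percent` percent OF ITSELF."""
    return p * (1 + percent / 100)


def increase_by_points(p: float, points: float) -> float:
    """Absolute change: add `points` percentage points to probability p."""
    return p + points / 100


baseline = 0.02  # a 2% chance, as in the example in the text
print(f"+70 percent:           {increase_by_percent(baseline, 70):.1%}")  # 3.4%
print(f"+70 percentage points: {increase_by_points(baseline, 70):.1%}")   # 72.0%
```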
There seems to be more research on this regarding breast cancer than other topics (breast cancer is characterized by an even larger industry than anti-tobacco that depends on misleading people about the risks, and there is also more interest in the statistics among the public). It is pretty clear that when you tell someone an exposure increases her risk of breast cancer by 30%, she is quite likely to freak out about it, believing that this means there will be a 1-in-3 chance she will get the disease as a result of the exposure.
Reporting the risk ratio of 1.3 will at least avoid this problem. But there are easy ways to make the statistic meaningful to someone — assuming someone genuinely wants to communicate honest information and not to lie with statistics to further a political goal or self-enrichment. The most obvious is to translate the relative risk into absolute risks (the actual probabilities, without reference to a baseline), or similarly to report the risk difference (the change in the absolute risk), rather than the ratio or percentage. This is something that anyone with a bit of expertise on a topic can do (though it is a bit tricky — it is not quite as simple as a non-expert might think).
Reporting absolute changes is what I did when I used the example of 2% changing to 3.4% (or, for a risk ratio of 1.3, changing to 2.6%). The risk difference when going from 2.0% to 3.4% would be 1.4 percentage points, or put another way, you would have a 1.4% chance of getting the outcome as a result of the exposure. Most people are still not great at intuiting what probabilities mean, but they are not terrible. At least they have a fighting chance. (Their chances are much better when the probabilities are in the 1% range or higher, rather than the 0.1% range — once we get below about 1% intuition starts to fail badly.)
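The translation from a risk ratio to absolute risks and a risk difference can be sketched in Python. This assumes, as the simple examples in the text do, that the ratio applies directly to a known unexposed baseline (here the illustrative 2%); as noted above, doing this properly for a real exposure is trickier. The function name is hypothetical.

```python
def absolute_risks(baseline: float, ratio: float) -> tuple[float, float, float]:
    """Translate a risk ratio into absolute terms.

    Returns (unexposed risk, exposed risk, risk difference), all as probabilities.
    Assumes the ratio is a risk ratio applied to the unexposed baseline.
    """
    exposed = baseline * ratio
    return baseline, exposed, exposed - baseline


for rr in (1.7, 1.3):
    base, exposed, diff = absolute_risks(0.02, rr)
    print(f"RR {rr}: {base:.1%} -> {exposed:.1%}, "
          f"risk difference {diff * 100:.1f} percentage points")
```

Run as written, this reports 2.0% to 3.4% (a 1.4-point difference) for a ratio of 1.7, and 2.0% to 2.6% (a 0.6-point difference) for 1.3.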
To finish with an on-topic example of the risk difference, what does it mean to say that smoke-free alternatives cause 1% of the risk of serious cardiovascular events (e.g., heart attack, stroke) that smoking causes? [Note that this comparison is yet another meaning of “percent” than those talked about above — even more room for confusion! Also, this is in the plausible range of estimates, but I am not claiming it is necessarily the best estimate.] It means that if we consider a man of late middle age whose nicotine-free baseline risk is 5% over the next decade, and whose risk as a smoker is 10%, then his risk as a THR product user would be 5.05% (the baseline plus 1% of the 5-percentage-point excess caused by smoking). Moreover, this should still be reported as simply 5% (no measurable change) since the uncertainty around the original 5% is far greater than that 0.05-percentage-point difference.
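The THR arithmetic can be written out as a short Python sketch. It reads “1% of the risk” as 1% of the excess risk that smoking adds over the baseline, which is the reading that reproduces the 5.05% figure; the function name is hypothetical, and the numbers are the illustrative ones from the text, not estimates.

```python
def reduced_harm_risk(baseline: float, smoker_risk: float, share_of_excess: float) -> float:
    """Risk for a product that causes a given share of smoking's EXCESS risk.

    baseline: risk with no nicotine use; smoker_risk: risk as a smoker;
    share_of_excess: fraction of the smoking-caused excess risk (0.01 = 1%).
    """
    return baseline + share_of_excess * (smoker_risk - baseline)


# Illustrative figures from the text: 5% ten-year baseline, 10% as a smoker,
# and a product causing 1% of the excess cardiovascular risk of smoking.
risk = reduced_harm_risk(0.05, 0.10, 0.01)
print(f"{risk:.2%}")  # 5.05%
```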
Carl, I’ve often discussed the EPA Report figure that gets interpreted as “If you are exposed to secondhand smoke your risk of lung cancer goes up by 19%.”
First of all, of course, that leaves out the vital information that the figure comes from a working LIFETIME of exposure … not the exposure you get once in a while walking through a doorway or when the couple on the blanket next to yours lights up at the beach.
Secondly, it is derived from an exposure period largely based on workplace exposures of the 1950s through 1970s (due to the 20- to 30-year lag time involved and the post-1980 dates of the studies) when workplace air-management largely consisted of little more than an open window or two and 50% of the population smoked: smoke conditions you’d rarely encounter outside of a truly dive-level bar today.
And thirdly, it fails to communicate the base rate of lung cancer in nonsmokers, which seems to be roughly 0.4%. This last consideration would mean that a working lifetime of 40 years in an old smokey environment would increase your LC chances from about 4 in a thousand up to about 5 in a thousand. That’s an increase of one lung cancer for every 40,000 worker-years of exposure if I’ve got my stats hat on correctly.
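The commenter's arithmetic can be checked with a short Python sketch. It takes the comment's figures as given (a 0.4% baseline, a 19% increase, a 40-year working lifetime) without endorsing them, and the function name is hypothetical. Without intermediate rounding the result comes out nearer 52,600 worker-years per extra case; the 40,000 figure follows from rounding 4.76 in a thousand up to 5.

```python
def worker_years_per_extra_case(baseline: float, ratio: float, years: float) -> float:
    """Worker-years of exposure per one additional case.

    baseline: lifetime risk without exposure (0.004 = 0.4%);
    ratio: risk ratio for the exposure (1.19 = "19% increase");
    years: length of exposure behind that ratio.
    """
    extra_risk_per_person = baseline * (ratio - 1)
    return years / extra_risk_per_person


unrounded = worker_years_per_extra_case(0.004, 1.19, 40)
# Rounding "4.76 in a thousand" up to "5 in a thousand" is the same as an
# extra risk of 1 in 1,000 per 40-year working lifetime:
rounded = 40 / 0.001
print(f"{unrounded:,.0f} worker-years (unrounded); {rounded:,.0f} (rounded)")
```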
If the theorized risks were reported as “one cancer for every 40,000 worker-years of exposure” do you think we’d see people opting to stand out in a windy rainstorm so as to avoid the cancer risk from standing under a bus shelter with a nasty smoker?
As I said, I need a dozen posts to really cover all of this. This post was basically about what you mention in your 4th and 5th sentences. Your 2nd and 3rd sentences each require entirely separate lessons in themselves (and they are rather different points). The 6th-8th sentences go a different direction with the question of how to optimally communicate. Once you get down below the 1% range (as I mentioned in the post) the absolute and difference measures start to be misunderstood, and so something like what you suggest is needed.
As for the bus shelter bit, I have always pointed out that the health argument against modest ETS exposure is based on junk science, but there is a perfectly legitimate aesthetic argument. Someone might quite rationally stand out in the rain to avoid the smell of the smoke.
“there is a perfectly legitimate aesthetic argument. Someone might quite rationally stand out in the rain to avoid the smell of the smoke.” On that, I’d fully agree, but remembering back into the 1960s and ’70s, that sort of avoidance behavior, either for aesthetic reasons or for health-reactive asthmatic type reasons, was almost non-existent. *I* happened to hate the smell of smoke as a kid, and I remember being universally regarded as “weird” because of that, with people reassuring my grandmother that “he’ll grow out of it…”
Michael, who evidently grew out of it… LOL!
I was reading that having B-positive blood increases your risk of pancreatic cancer by 72%. Pancreatic cancer accounts for 1.3% of all cancers. So what are the odds we’re talking about here? I’m a little confused.
That is a good question. Risk numbers that are communicated to the public definitely should not be in terms of risk ratios, which are pretty much impossible to interpret. Exactly what is best is up for some debate, but the obvious alternative is absolute risk (the probability that this will happen to you). So even the numbers you present (which, I should note, I am not endorsing — I am just taking them and using them) are not enough. A 70% increase in something that is 1.3% of cancers would bring it up to about 2.2% of cancers. But what we really want to know is the chance of it happening, which would require converting that into “chance the average person gets the disease × (1 + percent increase for the high-risk population)”.
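The rough conversion in that reply can be sketched in Python (the function name is hypothetical). It uses the reader's figures without endorsing them. The lifetime risk of the disease is not given anywhere in this discussion, so the second example uses a placeholder value that is not a real statistic. Treating the population average as the baseline is itself an approximation, since the average already includes the higher-risk group.

```python
def exposed_risk(average_risk: float, percent_increase: float) -> float:
    """Rough conversion: average risk x (1 + percent increase / 100).

    Treats the population average as the baseline, which is only an
    approximation because the average already includes the exposed group.
    """
    return average_risk * (1 + percent_increase / 100)


# The reader's figures (not endorsed): 1.3% of cancers, a 72% increase.
print(f"{exposed_risk(0.013, 72):.1%} of cancers")  # 2.2%

# What the reader actually wants needs the average person's chance of getting
# the disease, which is not given in the text. The value below is a
# PLACEHOLDER for illustration only, not a real statistic.
placeholder_lifetime_risk = 0.01
print(f"{exposed_risk(placeholder_lifetime_risk, 72):.2%} chance")  # 1.72%
```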
Paula, this is something I see in the area of smoking all the time. For example, if, for the moment, we take the EPA Report estimate of a lifetime of workplace smoke exposure in the 1940s through 1970s as increasing lung cancer risk by 19%, what does that mean?
Given the baseline nonsmokers’ risk of lifetime lung cancer of 0.4%, and taking the lifetime workplace exposure to be 40 years, it means that such extremely heavy exposure (remember: this data was mainly based on 1950s/60s type workplace exposures) would create ONE lung cancer for every 40,000 worker-years of exposure.
That offers a slightly different picture than warning someone that if they happen to be near a smoker at some point, their lung cancer risk “increases” by 20%, eh?
Additionally of course, once you throw in a correction for decent modern ventilation arrangements and reduced population smoker concentration a fair adjusted figure might be more on the order of one lung cancer for every 400,000 worker-years of exposure. And, if you required the EPA to accept the standard “95% Confidence Interval” as the normal scientific rule like normal scientists, that 400,000 years would expand to at least four million years of daily exposure.
Sound a bit less scary than a “20% increase”?