Sunday Science Lesson: What is “meta-analysis”? (and why was Glantz’s inherently junk?)

by Carl V Phillips

The recent controversy (see previous two posts) about Stanton Glantz’s “meta-analysis”, which ostensibly showed — counter to actual reality — that e-cigarette users are less likely to quit smoking than other smokers, has left some readers wanting to better understand what this “meta-analysis” thing is, and why (as I noted in the first of the above two links) Glantz’s use of it was inherently junk science.

What is “meta-analysis”?

Synthetic meta-analysis consists of synthesizing the results of previous quantitative estimates of ostensibly(!) the same phenomenon into a single result, theoretically to get a better measure than is otherwise obtainable. For epidemiology, particularly including research about medical therapies, and other social sciences, this generally means averaging together various study results, each of which was already an average across the population that was studied.

In what follows, except when I note otherwise, I am writing about that particular type of meta-analysis, which is what is usually meant when the term is used in health sciences. I will leave out the modifier, synthetic, and the scare quotes, which I used to emphasize that this jargon does not mean what the word literally means. (In a sidebar in the first linked post, I briefly describe the other study methods that fall under the rubric “meta-analysis”.)

There are two basic approaches (though, of course, there are countless variations of these, and hybrid versions):

1. Taking the statistical results of each of the studies (e.g., the estimated odds ratios) and creating a weighted average of them (weighting based on the size of the studies).

2. Obtaining the original data from each study and pooling all the individual observations into a single dataset.

These theoretically accomplish the same thing, though the second offers several advantages in the rare instances it can be done (e.g., adjustment for confounding can be done by estimating the effect of “confounder variables” across all the data, rather than just accepting the adjusted result from each study).
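
To make the first approach concrete, here is a minimal sketch in Python with invented numbers. Each study’s log odds ratio is weighted by the inverse of its variance, which is the usual way “weighting by study size” is operationalized, since larger studies have smaller variance; nothing here reflects any real study.

```python
import math

# Hypothetical study results: (odds ratio, lower 95% CI, upper 95% CI).
# The numbers are invented for illustration only.
studies = [
    (0.70, 0.50, 0.98),
    (0.85, 0.60, 1.20),
    (0.60, 0.35, 1.05),
    (0.90, 0.75, 1.08),
]

weighted_sum = 0.0
total_weight = 0.0
for or_, lo, hi in studies:
    log_or = math.log(or_)
    # Approximate the standard error of the log OR from the reported CI:
    # a 95% CI spans about 2 * 1.96 standard errors on the log scale.
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    weight = 1.0 / se**2          # inverse-variance weight: bigger study, bigger weight
    weighted_sum += weight * log_or
    total_weight += weight

pooled_log_or = weighted_sum / total_weight
pooled_se = math.sqrt(1.0 / total_weight)
print("pooled OR:", round(math.exp(pooled_log_or), 3))
print("95% CI:", round(math.exp(pooled_log_or - 1.96 * pooled_se), 3),
      "to", round(math.exp(pooled_log_or + 1.96 * pooled_se), 3))
```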

The unstated fiction that justifies a meta-analysis (or, I should say, supposedly justifies this) is the “separated at birth” assumption: We pretend that all of these studies were actually bits of one large study, but the datasets were divided up and sent to different researchers to analyze. Since each of these smaller subsets will produce less reliable results, due to random error, than the (fictitious) whole dataset, the synthesis is intended to put them back together and get the more reliable result. That is what this kind of meta-analysis can accomplish, and nothing more.

Three things should be immediately apparent from this:

1. If it is absurd to pretend the studies were parts of a single large study, the whole enterprise is absurd. I will address that at length below.

2. Garbage in, garbage out. This is the simplest criticism of meta-analyses. If there are serious problems in the original studies (for purposes of answering the question at hand), the meta-analysis does nothing to fix them. It just incorporates their results and thus enshrines the biases. Indeed, it is really worse. It is “garbage in, garbage papered over”, because the meta-analysis not only does not fix the issues but it hides them. Someone reading the original faulty/biased/inappropriate-for-purpose study would at least have a chance of recognizing the problem, but someone just looking at the meta-analysis would not.

3. The only advantage of this method is averaging out random errors. In some sense this is a subset of point 2, but it is important enough to separate out. There are many different ways an epidemiologic study’s results can differ from the true value it is trying to estimate, but the only such error that is ever quantified in 99.99% of public health papers is random sampling error. Those confidence intervals you see are a heuristic quantification of roughly how much uncertainty there is from random error — i.e., the chance that bad luck of the draw resulted in an odd sample of the population. Confidence intervals ignore confounding, selection bias, measurement error (the data not representing the true values), and other problems. A common question about confidence intervals is “how likely is it that the true value falls outside of that range?” It turns out that this is rather complicated to answer, even if there were no other errors, and so most answers offered are wrong. But it is fairly easy to answer it in light of those other errors: The real answer is “extremely likely.”

I explain this because the single benefit of synthesizing those supposedly separated-at-birth studies is to reduce random error — it effectively creates a larger sample, and the larger the sample, the smaller the probability that the estimate is far from the true value due to chance alone. It averages out the errors from pure bad luck that produces unrepresentative samples of the population in each of the smaller studies. If a study had a major selection bias problem, it is possible that other studies in the collection could have a random scattering of selection bias impacts that average out. But much more likely is that other studies have bias in the same direction, caused by similar selection issues. This is even more true with confounding bias, since confounding is a characteristic of the underlying population, which should(!) be the same for the various studies. The “residual confounding” (confounding that is not “controlled for” with covariates) might not be quite so homogeneous, but it will probably be the case that the bias across studies is mostly in the same direction. Some measurement error (e.g., a typo in coding the data) is random, but some is not (e.g., people consistently underreporting how much alcohol they drink). Thus, the non-random errors are extremely unlikely to be remedied by the meta-analysis. They are just buried and enshrined in the summary statistic.
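
A toy simulation (all numbers invented) illustrates the point: averaging many studies shrinks the random scatter, but a bias shared by the studies passes straight through to the pooled estimate.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.10     # the quantity we wish we were estimating
SHARED_BIAS = 0.15     # e.g., selection bias pushing every study the same way
N_STUDIES = 20
NOISE_SD = 0.20        # random sampling error within each study

# Each study's estimate = truth + shared bias + its own random error.
estimates = [TRUE_EFFECT + SHARED_BIAS + random.gauss(0, NOISE_SD)
             for _ in range(N_STUDIES)]

pooled = statistics.mean(estimates)
print("true effect:         ", TRUE_EFFECT)
print("single-study example:", round(estimates[0], 3))
print("pooled estimate:     ", round(pooled, 3))
# The pooled estimate sits near TRUE_EFFECT + SHARED_BIAS, not near TRUE_EFFECT:
# the averaging removed much of the random noise but none of the shared bias.
```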

Why would you want to do meta-analysis?

The short answer, for anything in the realms covered by this blog, is: You wouldn’t. Not if your goal was truth-seeking and you wanted to do valid science, that is. Even apart from the specific problems described below, it simply serves no legitimate purpose.

To see this, consider a simplified example where synthetic methods make enormous sense. Astrophysicists are trying to tease out an extremely subtle signal from somewhere in the sky. They would do well to synthesize the data from many nights of observations from their telescope, and also observations from the same area and spectrum from other telescopes. It is possible that each one of these studies, considered alone, would produce nothing useful because there is too much random noise in the data, but when combined they produce useful information. Each study of the sky is observing the same phenomenon, so that condition is met, and the problem (in this story) is that random error from every single study overwhelms any signal.

Now consider the closest analogy to that in the world of epidemiology. We are trying to figure out whether drug X or drug Y is a more successful treatment for a disease. Imagine that a hundred hospitals across the country were interested in this, so they randomized patients with the disease to X or Y and recorded their outcomes. The problem is that each hospital only treated ten patients, and so for each of those reports, random error (e.g., by pure chance, assigning three patients who were doomed to not recover to X, but only two to Y) overwhelms the small difference in effect we are hoping to measure. A person just reading through the collection of reports would not be able to sort out the signal, as with the astrophysics case. But a meta-analysis could combine them all, as if they were a single study of a thousand people, which could be enough to estimate the difference in the effects of the drugs.
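
A quick simulation of that hypothetical (the recovery probabilities are invented, chosen only for illustration) shows why pooling helps in this situation and only this situation: any single ten-patient report is all noise, but the pooled comparison can see the difference.

```python
import random

random.seed(2)

P_RECOVER_X = 0.60    # hypothetical true recovery probability on drug X
P_RECOVER_Y = 0.50    # hypothetical true recovery probability on drug Y
N_HOSPITALS = 100
PATIENTS_PER_ARM = 5  # ten patients per hospital, five per drug

x_recovered = y_recovered = 0
for _ in range(N_HOSPITALS):
    x_recovered += sum(random.random() < P_RECOVER_X for _ in range(PATIENTS_PER_ARM))
    y_recovered += sum(random.random() < P_RECOVER_Y for _ in range(PATIENTS_PER_ARM))

# Any single hospital (5 vs 5 patients) is dominated by noise, but the pooled
# 500-vs-500 comparison can resolve the 10-percentage-point difference.
print("pooled recovery rate, drug X:", x_recovered / (N_HOSPITALS * PATIENTS_PER_ARM))
print("pooled recovery rate, drug Y:", y_recovered / (N_HOSPITALS * PATIENTS_PER_ARM))
```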

So how often do we actually have a situation like that for medical experiment data? Very rarely.

What we have instead are data from ten, maybe twenty, rarely fifty, such experiments that each included more subjects. Each produces enough information — imperfect, as always, of course — that it can be interpreted without meta-analysis tricks. Take a look at this distribution of study results Glantz reported. The figure in that post is a standard representation of a review of study results (leaving out the last row, which is the scientifically meaningless synthesis of the other rows). The rows are a list of studies. The center point of each row’s graphic, to the right, is the point estimate from that study, with the error bar representing how much propensity for random error the study had (the size of the grey blob for each study conveys the same information). The exact size of the bar has an arcane meaning that can just be ignored by non-experts, but everyone can understand that the wider the bar, the smaller the study, and thus the greater probability of more random error.

Now pretend this figure summarizes the collection of decent studies testing drug X versus drug Y from the above story. You can look at that and immediately say “almost all the results are to the left, meaning X (let’s say) did better, so it is pretty clear that X works better. That is what we know right now.” But as long as you do not make the mistake of synthesizing all your results into the bottom row, thus throwing away most of the available information, you can say more. For example: “There are a couple of studies on the other side of the null. One is small, so that could have been just some extreme random error, though it would have to be really extreme. But for the larger one, with its lower likelihood of much random error, that is really not plausible. Something needs to be explained, rather than just pretending these studies differ due only to random error and lumping them together.”

In fact, if you know how the error bars are calculated and see how far apart they are, you can observe that the outlier study (“West”) and the third on the list (“Vickerman”) produce results that are utterly incompatible with the “part of the same large study separated at birth” assumption. That is, it is implausible that they were both reasonably unbiased studies of the same phenomenon. The probability of seeing a pair of results from studies of that size that differ so much by chance alone is down in the range of “might have never happened, even once, in the history of all medical research.” Thus we have no business assuming they differed due to chance and just throwing the results together. We need to think about why those studies that favored drug Y did so. Maybe it will reveal circumstances where drug Y really is better. Maybe there is something identifiably wrong with the study called “West” that means we should not be using its results at all. Or maybe that thinking would tell us, “hey, ‘West’ turns out to be the only one here that actually measures what we are interested in; all the others were really measuring something else or were hopelessly biased. So it alone, rather than a combination of all the studies, gives us our best current estimate.”
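
For readers who want to see the arithmetic behind that claim, here is a sketch of the standard check, using invented numbers that merely stand in for two widely separated results like those (they are not the actual figures from the studies): if two studies really were “separated at birth,” the difference between their log odds ratios should be roughly normal, with variance equal to the sum of their variances.

```python
import math

def log_or_and_se(or_, ci_lo, ci_hi):
    """Recover log(OR) and its approximate standard error from a reported 95% CI."""
    return math.log(or_), (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)

# Invented results standing in for two far-apart studies; not the real numbers.
study_a = log_or_and_se(0.45, 0.35, 0.58)   # strongly favors one side
study_b = log_or_and_se(1.80, 1.30, 2.49)   # strongly favors the other

diff = study_a[0] - study_b[0]
se_diff = math.sqrt(study_a[1]**2 + study_b[1]**2)
z = diff / se_diff
# Two-sided p-value for "both studies estimate the same underlying quantity,
# differing only by random error" (normal approximation).
p = math.erfc(abs(z) / math.sqrt(2))
print("z =", round(z, 2), " p =", p)
# A p-value this tiny says chance alone is not a credible explanation;
# the "separated at birth" assumption fails for this pair.
```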

Notice that all of this information is lost in the meta-analysis summary. Meta-analysis, then, serves the Orwellian language role of preventing particular lines of thinking because there is no available vocabulary to build the thoughts upon. If a meta-analysis is not serving the purpose of seeking an otherwise unobservable signal within a lot of noise (as in the astrophysics example), then it is destroying signal that does exist. Well, not destroying it, of course. It is still possible to go back and use that information. But it is hidden to consumers of the meta-analysis result. More to the point, using that other information is what should have been done. That other information tells us not just that the studies should not have been combined as if they were separated at birth, but tells us we need to figure out which of these clearly contradictory studies were faulty measures and why. (In this case it was most of those toward the left, due to selection bias, as I discussed previously.)

The more complicated statistical methods are, the easier they are for liars to use to produce a result that actually contradicts the real evidence. Imagine that there were “public health”-type liars in astrophysics, trying to concoct new supernovas that do not really exist. The statistical analysis of all those 1s and 0s the telescopes produce is so arcane that no one reading the press release could ever catch them at it. This could only be done by another expert who reanalyzed the original data or replicated the studies (though that is reasonably likely to happen in that field, unlike in public health). Most statistics in epidemiology are easy, and tell us little more than we can learn from a simple cross-tab of the study results. A critical thinker with high-school-level science can do that. So meta-analysis is a liar’s dream, offering a chance to bury that information beyond reach.

So, you might ask, why is meta-analysis ever done? Surely there must be reasons other than reverse-engineering a result that supports a personal policy preference, as Glantz did. There are various reasons. In descending order of legitimacy (or, rather, ascending order of illegitimacy) they are:

1. Our assessment of the quantity of interest might really be very close to some razor’s-edge decision point, and we need to make a decision. This is a broadening of the legitimate use of meta-analysis from that “hundreds of tiny experiments” example. What we really need to do is decide whether drug X or Y is going to be used from this point forward, but the results of the 15 (decent-sized and not apparently hopelessly biased) studies are normally distributed around it being a tie. This is apparently the result of random error. It is so close to a tie that (as with the astrophysics study) we cannot just spot the difference by thinking through the body of evidence. We need to break the tie as best we can to make a decision, and a meta-analysis can do that. But it is important to note that the real scientific assessment should remain “it is really too close to be sure, given the limits of our evidence, but as best we can tell, X has the edge”, rather than “the meta-analysis proves X is better.”

2. To dumb things down. At this point, we have already left the realm of scientific legitimacy and are serving other purposes. When reporting pre-election survey results, the news media often do a meta-analysis of several surveys (they would call it “averaging” them, which is an accurate description), to be able to provide a point estimate of support for each candidate. This is done for entertainment purposes. Those who are serious about truth-seeking, like people working for the campaigns to analyze survey results (if they are competent), are not going to just look at the simplistic averages reported on television. They are going to try to make sense of the full corpus of original information. The differences among the surveys contain information too.

Is there any legitimate reason for dumbing down the tests of drugs X and Y under the scenario where we pretended the Glantz figure represented those studies? No. Decisions should be made by people who are expert enough to make sense of the distribution — that most studies clearly favored X, though one outlier strongly supported Y. Averaging the studies together subtracts, rather than adds, information. If they need a soundbite for the media after making their decision, they can go with, “Almost every study favors X. One major study favored Y, and this has been a source of controversy. But the other evidence leads us to conclude that study must have gotten it wrong.”

3. To achieve “statistical significance”. This is purely a legal game, not real scientific inquiry. As with the debate game metaphor I offered in the first post in this collection, sometimes there are artificial rules for games that ape scientific truth-seeking but impose departures from it. Imagine that there are twenty studies that each produced an odds ratio of 0.8, but each is small enough that the result is not “statistically significantly” different from the null. A scientific analysis of that information would say “the OR seems to be about 0.8, and we should make decisions accordingly.” But a legalistic game — like a drug approval process or product liability trial — might have a rule that without a “statistically significant” result, all that evidence does not “count”. So someone does the meta-analysis to spit out a “statistically significant” synthetic point estimate to adhere to these rules.
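
Here is the arithmetic of that game, sketched with twenty invented studies that all report OR = 0.8 with a confidence interval that crosses the null:

```python
import math

# Twenty invented studies, each reporting OR = 0.8 with a 95% CI that crosses 1,
# i.e. none is "statistically significant" on its own.
studies = [(0.80, 0.55, 1.16)] * 20

num = den = 0.0
for or_, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    w = 1.0 / se**2
    num += w * math.log(or_)
    den += w

pooled = math.exp(num / den)
pooled_se = math.sqrt(1.0 / den)
ci_lo = math.exp(num / den - 1.96 * pooled_se)
ci_hi = math.exp(num / den + 1.96 * pooled_se)
print(f"pooled OR = {pooled:.2f}, 95% CI {ci_lo:.2f} to {ci_hi:.2f}")
# The pooled CI no longer crosses 1: the synthesis "achieves significance"
# without adding any information beyond "the OR seems to be about 0.8".
```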

Legalistic rules are not necessarily a bad thing for society. Rules for strict adversarial systems (like games, approval processes, or trials) cannot be as flexible as science, which is characterized by doing whatever works to figure out the truth. Criminal defendants are often found not guilty because of rules of evidence, even though it is obvious to anyone who is truth-seeking based on the evidence that they are guilty. Those rules might serve a greater purpose of discouraging police misconduct or protecting innocent defendants who the prosecution is trying to railroad. Of course it is also possible to argue that a particular rule does not serve the greater social good. But what is definitely true is that these rules do not exist to best seek the truth in a scientific way. They are designed to try to create a game that is pretty good at getting to the truth in spite of many players involved being willing to do or say anything to win. That is, rules of the legal game inevitably depart from truth-seeking scientific behavior, and so should not be confused with it.

[Aside: Such confusion is rife, of course. In real scientific thinking, “statistical significance” is merely a rule of thumb about whether to gather more data to reduce random error, or to move on to figuring out what the data means. It is an arbitrary line with no inherent importance. I am using the scare quotes to point out that this term sounds like it is a lot more significant(!) than it really is. For decades in epidemiology, it has been agreed (among those who really understand the science) that if you speak of statistical significance when talking about your results, you are doing something wrong.

Reporting your confidence intervals, to give readers a quick rough heuristic for how big the random error problem might be, is useful. But the mere fact of whether that confidence interval overlaps the null or not (which is the same as the result being statistically significant) is of no important consequence. Imagine one good honest study that estimated, say, e-cigarette use increases the risk of lung cancer by 20%. It would still be our best estimate of the risk even if the confidence interval overlapped the null. It would be out-and-out wrong to say that because of the lack of statistical significance, the study does not suggest there is increased risk.]
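
To put numbers on that aside, here is a minimal sketch with an invented 2x2 table that produces a 20% risk increase whose confidence interval overlaps the null:

```python
import math

# A hypothetical 2x2 table: 60/1000 cases among the exposed, 50/1000 among the
# unexposed, giving a relative risk of 1.20.
a, n1 = 60, 1000   # exposed: cases, total
b, n0 = 50, 1000   # unexposed: cases, total

rr = (a / n1) / (b / n0)
se_log_rr = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)
ci_lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
ci_hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI {ci_lo:.2f} to {ci_hi:.2f}")
# The interval crosses 1.0, so the result is not "statistically significant",
# yet 1.20 remains the best single estimate of the risk this study offers.
```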

4. Because we can. There are tens of thousands of semi-skilled researchers in public health searching for ways to get publications to further their careers. The scientific value of the work does not really matter. Simplistic meta-analyses are trivial exercises that do not require any of the hassle of doing new field research, let alone tough scientific thinking. Just do a literature review and run the results through some software. And yet everyone writing in the area will cite it. In fairness, those who can add a bit of tough thinking can do better than the most simplistic meta-analyses, like Glantz’s, but it is still a “because we can” motivation.

5. To hide the flaws (or mere heterogeneity) in a collection of studies behind a summary statistic. This is the most scientifically illegitimate reason for a meta-analysis, and it is clearly the reason for Glantz’s.

Characteristics that make a meta-analysis invalid for any purpose

Apart from the fact that there are vanishingly few cases where a meta-analysis could serve a serious scientific goal, there are often affirmative reasons why it is actively wrong to do it. Much of this has been covered already, but it is worth highlighting.

Meta-analysis does not work if the studies are attempts to measure different phenomena. Glantz threw into his mix studies of people who happened to have encountered and used e-cigarettes and studies of people who were encouraged to use e-cigarettes after volunteering for a clinical smoking cessation trial. Whatever the effects of each of these two very different exposures, there is clearly no reason to believe they would be the same. If studies of one got results that were different from studies of the other, it would not be because of random errors that need to be averaged out, but because they were measuring the effects of different exposures. Suggesting otherwise because “both measure effects of e-cigarettes” is the same as the astrophysics meta-analysis combining data from different areas of the sky because “they are all measures of the sky.” (Anyone who cites smoking cessation trial results as if they inform about the real-world effects of e-cigarettes is making the same mistake — see what I have written before.)

But worse than that, it is vanishingly rare that any two non-clinical epidemiologic studies ever measure the same phenomenon. A study of American smokers’ use of e-cigarettes and one of British smokers’ use of e-cigarettes are studying different phenomena. Even more so for other populations. They should produce different results, apart from random or nonrandom error, and averaging them together produces something as meaningless as “what is the average household square footage across a group of 1000 Americans, 400 Brits, 300 French, and 200 Dutch.”

And it is even worse than that. Maybe different populations of Westerners do have similar effects for the phenomenon in question. But we can be sure we are studying populations who are going to have different effects from the exposure if one study is of people who recently had heart surgery and are really motivated to quit smoking, another is of random smokers, another is of volunteers for a cessation study, another is of smokers who are so desperate about their inability to quit that they call a quitline, and so on. And more so when some of the studies look at populations in 2010 and others in 2014, a difference that would not matter if you were studying the effects of vitamin intake on a cancer but that is huge for e-cigarettes and behavior.

Then there is the problem that the exposures vary. “Used e-cigarettes” is not a well-defined exposure. It can obviously vary tremendously, and so studies that select or define the exposure differently are measuring different phenomena. The easiest illustration can be found in cessation trials. While cessation trials could not possibly measure the real world impacts that Glantz purports to be trying to measure, they could theoretically be made fairly compatible with one another. But they are not and will not be, and so meta-analysis of them alone is inappropriate. Each will offer subjects different products. Far more important, each will offer somewhat different levels of information and persuasion to subjects, and these differences will not even be reported in the methods. (And, of course, the populations will be different too.) The effects of these experiments will differ because the experiments differ, not merely because of random error that can be averaged out.

Now you might recall that Glantz ran a weak “sensitivity analysis” looking at how the results differed across some of the most glaring of these differences. But who cares? That is not the point. The point is that there is no conceivable legitimate meaning to a statistical average of their results, whatever any sensitivity analysis showed.

A more legitimate reply to these concerns is that if you have enough studies that sort of represent some real-world distribution of exposures and populations, then the average could mean something. In theory, a bunch of cessation clinics that provide their patients with e-cigarettes and advice about them might collect data about outcomes. Each study would be a different exposure in a different population, but collectively they might represent the breadth of the general phenomenon of clinics engaging in that practice, and so the average result might represent some real average from the world. But nothing in sight looks like that — and it would take some serious thinking to ascertain if something did, not just throwing whatever came along into a statistical soup — so that is moot.

Notice that what was just discussed is not about the original studies being faulty. All of these problems still exist even if each study were as good as it could possibly be at measuring the effect of interest, for its specific definition of the exposure, in its particular population, measured in its particular way. An additional layer of fatal flaw is added when they are not.

In particular, consider the selection bias problem that I explained in the previous posts. Because of selection bias, most of the studies Glantz used in his meta-analysis were terrible measures of the supposed phenomenon of interest in the first place. Some were biased by the original authors’ intent, while others never purported to offer a measure of what Glantz used them for (and the authors told him so); he created the selection bias by interpreting them that way. Obviously averaging together a bunch of incommensurate results that are terrible measures of the phenomena of interest is even worse than merely averaging together incommensurate results. Once again, the role of meta-analysis in this case is to hide the fact that they are terrible measures behind statistical games.

Implications

Glantz’s meta-analysis is not just junk science because of details about the studies, though those are problems in themselves. It is junk science because there are probably not even two of the studies in his collection that are similar enough to average together, let alone all of them. I cannot imagine there ever being behavioral studies of tobacco use that could be legitimately combined into a meta-analysis, nor any scientific reason for wanting to do so.


19 responses to “Sunday Science Lesson: What is “meta-analysis”? (and why was Glantz’s inherently junk?)”

  1. Thank you Carl, concise and insightful lesson.
    Indeed, it then speaks volumes that Glantz did the sensitivity analysis in the first place. Clearly he did so not because he was attempting to control for the variables between the studies, but because he knows that, in providing this statistical sleight-of-hand, those unfamiliar with the truth would buy his method as legitimate when it is nothing more than modern charlatanism.
    Unfortunately the ones that are duped by this dishonesty are the very ones who (will) set policy. They are reporters, legislators, and regulators who need only this statistical tom-foolery to justify a political position favorable to their own funding sources.

    • Carl V Phillips

      Yes, I agree, of course. And my view is that the only way to do anything about this is to make it difficult — rather than trivial as it is now — to use public health science methods to produce nonsense. The first step toward that is to challenge public health’s undeserved reputation for doing legitimate and honest science. Only then will non-experts even start asking the right questions. Of course, there is plenty of room to dispute my perception that this would be effective — I obviously am inferring deeply, based on limited historical evidence. But I notice that no one does dispute this. Instead, they simply assert “we just have to do X because it is quicker and easier”, when X has been tried and seems to not really work. (And then they get mad when I point out that X does not really seem to work.)

      • Carl
        I get the feeling that most of what they publish is not really intended to be read at all; rather it’s more like a balsa-wood theater prop ‘sword’ to be waved around at press conferences, to impress and scare the ‘natives’.

  2. What about Glantz’s claim that he corrected for the difference between studies of those trying to quit and those not trying to quit by “looking at the numbers, and they were the same” ?

    • Carl V Phillips

      That is one of the (few) things he ran his sensitivity analysis on. If the data were otherwise valid, the result was such that one might say “hmm, I would have expected a bigger difference there, but I guess not.” But since the data are utterly invalid and incommensurate, there is really nothing interpretable in any of the statistics.

      In any case, those being different populations still would mean their results should not be averaged together. Even if the point estimate for each was about the same as the other, they are still not the same phenomenon. The rate of population growth in Haiti and the Alaska state sales tax rate are about the same too, but this does not mean it makes any sense to average them together.

  3. For every single new case of Emphysema and/or COPD., this bastard should be noted!

  4. or…. For every single new case of Emphysema and/or COPD., this hack should be noted!

  5. Roberto Sussman

    I comment on your astrophysical examples.

    You are right on saying that a well posed meta-analysis would be surveying the same region of the sky (at different times) for detecting a weak signal from a given identified source in that sky region.

    On the supernovae. A “public health”-type liar in astrophysics, trying to concoct new supernovas that do not really exist would be easily detected. Supernovae are well classified and catalogues are publicly available. Also, the supernovae used in fitting cosmological parameters are of a special type: SN type Ia (I explain this below). If the liar simply invents an SN Ia and places it in the Hubble diagram (Luminosity distance vs red shift) as an extreme outlier, then the theoretical cosmological model would be questioned, and there would be hundreds of cosmologists asking for details on this SN, such as its catalogue classification. If the fake SN is not placed as an outlier then the liar may get away with mischief, but the consequences would likely be minor and would not affect the statistics (perhaps could tilt the statistics to reduce the likelihood curves of some model). Still, there is a lot of data scrutiny so the liar would be risking a lot of discredit for little or no gain.

    Now, why supernovae? because they are very bright and thus can be used to estimate very large distances (up to 300 million light years), where individual “normal” stars cannot be distinguished from the galaxy image. These distance estimations together with red shift (light becoming redder because of the Universe expansion) are needed to probe theoretical cosmological models.

    Why SN of type Ia and not other types of SN? The reason is that SN Ia are (with some quirks and objections) accepted by the community as being good “standard candles”, which are sources whose absolute luminosity can be well determined. Luminosity is energy emitted by a light source (an astronomical object) per unit time, which decreases with the square of the distance. We observe sources with different luminosity (or brightness in different frequencies), but to estimate the distance we need to distinguish between a tenuous source being close and a very bright source being far. For this purpose, we define the absolute luminosity: the luminosity of the source if placed at 10 parsecs (about 30 light years). To estimate the absolute luminosity of a source you need to understand very well its physical properties. SN of type Ia occur from the explosion of white dwarfs in binary systems, and thus the energy output of these explosions has small variation, which means that they will yield very similar values of absolute luminosity and can be used reliably (with lesser error) to estimate distances. Other SN, for example those from the explosion of massive stars, are brighter than SN Ia but are not good standard candles, since the masses of these “progenitor” stars have a wide variation and thus they would produce very different absolute luminosities for different masses, and we cannot estimate these masses for stars in distant galaxies. For similar reasons galaxies themselves are not good standard candles.

    A comparison with the meta-analysis of medical sciences is difficult because in cosmology we rely on observations that occur at hundreds of millions of light years, are (largely) infrequent and thus cannot be “repeated” at will. In medicine, the researcher can always interact with, modify or control the samples. In a sense, a meta-analysis is done in cosmology when plotting the luminosity distance as a function of red shift with different samples of supernovae Ia (i.e. obtained from different observations), but they involve only SN of type Ia, so biasing factors are largely controllable. So, this is not the type of methodologically flawed meta-analysis used by Glantz. A Glantz-like meta-analysis would be using assorted supernovae of different types, ignoring their different physical properties and their widely different absolute luminosity, and then announcing the results by statements like “we used supernovae to show X or Y”. No cosmologist would trust results based on luminosity distances of generic supernovae.

    Evidently, the Glantz-like dictatorial political-ideological agenda strangling “Public Health” does not occur in cosmology. But then, the average Joe or Jane is much more impacted by “Public Health” telling them that e-cigs increase “cancer risks” than by cosmology telling them that supernovae show the Universe undergoing accelerated expansion.

    • Carl V Phillips

      Ok, so the scorecard is:
      -Meta-analysis analogy — check (I did know that or would not have used the example).
      -Supernova was the wrong choice of far-away phenomenon to use as an example of something that could be convincingly concocted — however, see below. (It was arbitrary. I just pulled that one out of the sky.)
      -Suggestion that absurdly false claims would be easily caught in that science — check.

      Additional observations:

      -By choosing supernova, I picked something that is too momentous in its implications to pass unchecked. There are discoveries that are mere catalog entries, where fiction could be possible, but there is too much to learn from an observed supernova. So that presents an interesting compare/contrast: If you think about it, a (real) discovery that e-cigarettes cause people to keep smoking would probably be the most important public health result of the year. I mean, what could possibly compete with that as news, other than a breakthrough cure for something. A product/phenomenon that we think is on its way to causing 10%, maybe 20%, of smokers to quit is found to cause them to not quit? Huge.

      So in a real science and honest field, a lot of people would check the claim before going to the newspapers with it. I doubt someone would go to the newspapers with a “discovery” from a new supernova that the universe is 20% older than previously believed (did I make up a plausible story, RS?) before their nearly unbelievable breakthrough was thoroughly examined. And that is a case where it does not even matter if people learn this soon-to-be-disproven claim, since it does not affect their choices, as RS noted. The fact that such a momentous claim can avoid any serious scrutiny is as much a problem as the junk science in the first place. It is not unlike the fact that the proposed FDA ban of e-cigarettes is sneaking through as if it were some minor technical regulation or adjustment, not the second biggest domestic policy of the decade.

      -I appreciate the extension of my point about heterogeneity of the observations.

      -Re being able to replicate on demand, that does work for medical treatments, as noted. However, it is often not possible for population studies and is basically never done. Studying the same population again is hard, unless it was part of the same study design. Also, there is just so much to study (as with the sky), and people are just floundering around picking random things to do (which I hope is not the same), so it is seldom that two studies really home in on the same point, even when their titles imply they do.

      • Roberto Sussman

        Cosmologists do go to the science newspapers (sometimes even to general media) reporting studies that challenge the accepted results. There are recent studies showing that SN Ia may not be as ideal standard candles as we think. There are also claims that more distant objects (quasars) may be used as standard candles probing larger distances (i.e. older cosmic times) than SN Ia, and possibly predicting a paradigm breakthrough. But since the authors of these studies know they are under close scrutiny, they tend to proceed with caution and mention that these are “promising” but still premature or incomplete studies. Paradigm breaking studies have to pass a lot of tests before they are accepted by the community. This may be a lengthy process. There was a lot of hype last year on the indirect discovery of gravitational waves and indirect confirmation of the theory of inflation. However, these discoveries were based on a weak signal and the noise from nearby dust sources was grossly underestimated. It all went down once it was confirmed that this noise made these estimations completely unreliable. While journalistic hype is unavoidable, the check and balance mechanisms work reasonably well.

        The current cosmic paradigm (the “concordance” model) has so far prevailed when tested by many precise independent data. Still, there is now an increasing minority undercurrent of cosmologists (myself included) who are questioning at least some of the assumptions of the paradigm. Some influential groups of cosmologists have made of the Concordance model a sort of orthodoxy. Since they sit in committees and editorial boards of journals, they decide who gets published and who gets grants. Still, there is nothing comparable to the “Public Health” strangulation of dissenting views by committees directed by Glantz types. If dissenting views are theoretically well grounded and they pass the many precise observational tests, the studies will be published and a change or modification of the paradigm eventually happens and is accepted by the mainstream. It may be a long process because we are trained to be hyper-skeptical, but it happens.

        On the Universe being 20% older. Yes, you made a plausible story: the Universe could not be 20% younger than its currently estimated age value (13.7 Giga year) because cosmic age cannot be less than the ages of oldest stars, which are roughly 11-12 Giga years old. However, this extra 20% would not result from the SN Ia studies because the SN Ia only sample up to half of the Universe age. So, a different cosmic age that is consistent with ages of oldest stars could result from estimations brought by using quasars as standard candles. Though the goodness of quasars as standard candles is still under discussion and is plagued by a host of problems.

  6. Wow that was absolutely brilliant again, I have an even better understanding now of how Glantz’s study isn’t worth the paper it’s written on thank you, I also agree with everyone’s comment here to, however reading your blogs, to me as a layman you come across as someone that’s not so much pro or anti but someone that seeks the truth, and you’ve highlighted to me a massive problem, more so than are ecigs better than smoking which people like Glantz are making it harder day by day, but your reply to rbnye and the quote ” the first step forward that is to challenge public health’s undeserved reputation for doing legitimate and honest science”, you make me wonder about that when you said previously about scientist unwilling to challenge other scientist in public health, and I believe you was kinda saying that if science kinda follows the political will that can help their careers (please correct me if I got it wrong), if this is the case before we can get good reliable information that we can all trust, I feel we have a very steep hill to climb :-)

    • Carl V Phillips

      I am not quite sure I understand the question. I was saying that within the public health community, junk scientists thrive. Also, anyone who tries to complain about junk will usually be punished for it, an equally important point I have not made so explicit. That does not mean there will not be people within public health who have the scientific chops, courage, and selflessness to try. There are quite a few real scientists scattered around, doing what they can from very small podiums. But, yes, it is a very steep hill to seriously try to do anything about it. Thus, it tends to be only those who actively oppose the public health agenda who are willing to make the effort. They can be motivated by both truth and worldly outcomes.

  7. sorry Carl for my writing skills but your reply was spot on thank you :-)

