by Carl V Phillips
After I was rudely interrupted by two of the most prolific weeks in anti-THR lies in recent memory, followed by a week of recovering from that, I am finally getting back to this series. You will recall that in Part 1 I reviewed the history of journal peer review, and pointed out why it is largely obsolete now, as well as why its current general failure in health science fields is inevitable. In Part 2, I started presenting a list of myths about peer review (in no particular order), with Myth 1: Peer reviewers have access to more information than any other reader of the paper.
Submyth 1a. Peer reviewers can vouch for the accuracy of the data and how it is represented.
Any data can be fundamentally wrong (miscoded, etc.), and it is almost never possible for anyone other than the original researchers (and sometimes not even them) to check whether that is the case.
That is only the simplest and most obvious problem here.
As noted when I discussed Myth 1, reviewers only see what any other reader of the paper sees. The rare conscientious reviewer of a paper based on a publicly available dataset could check the original to see if it is being accurately represented, but this almost never happens. If it is not public data, the reviewer not only cannot access the data, but generally cannot learn what it really says. When I am asked to review a paper based on non-public survey data I usually respond by saying that the authors should include an appendix with the survey instrument (the data would be better still — this would be provided in most serious sciences — but I do not even bother to ask), and that I will really review the paper after that revision, but that I am not going to try to assess something when I do not even know what the variables really mean. Not once has an editor ever required this be done and sent me the paper to re-review — they just chalk up my “review” as being done and move on.
(On the upside, after a few of those, where I ask for the opportunity to properly analyze the paper — more examples follow — the editors stop bothering to ask me to review for them.)
This is not an idle request. When the authors define a response to a question to mean, say, “this user of e-cigarettes is considering taking up smoking”, what does it mean? It means very different things if the question is posed as the isolated neutral “Are you considering switching from e-cigarettes to smoking?” versus asking “Are you ever tempted to try a real cigarette?” after a series of leading “push poll” questions about whether the subject is worried about the antifreeze in e-cigarettes and whether they are worried that vaping looks silly.
Indeed, the three-week delay from the time I drafted the above paragraph provided us a pretty good illustration of that very question: CDC and FDA put out a “peer-reviewed” paper (which I did a much better job of reviewing here; see also this followup) that defined teenagers as intending to start smoking if they did not give the unrealistic “definitely no” answer to two different questions about the possibility of smoking. If they merely answered “probably not” to just one of them, they were defined as intending to start smoking. In this case, the authors actually disclosed their bizarre redefinition of “intend” in the paper. But in many cases, this would not be made clear to the reviewers, who would have to guess at how the authors defined their variables. (Since it was disclosed, it obviously represents a very different failure of the journal’s review process: the authors were allowed to get away with this blatant absurdity.)
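To make the stakes concrete, here is a minimal sketch, using made-up answers (not the actual CDC/FDA data), of how the same five survey responses produce wildly different “intent to smoke” rates depending on the coding rule:

```python
# Hypothetical illustration only: five made-up respondents, each answering
# two questions about future smoking on a scale of
# "definitely not" / "probably not" / "probably yes" / "definitely yes".
answers = [
    ("definitely not", "definitely not"),
    ("definitely not", "probably not"),
    ("probably not", "definitely not"),
    ("probably not", "probably not"),
    ("probably yes", "probably not"),
]

def affirmative_coding(a, b):
    """Code as intending to smoke only if either answer is affirmative."""
    return "yes" in a or "yes" in b

def anything_but_no_coding(a, b):
    """Code as intending to smoke unless BOTH answers are 'definitely not'
    (the style of redefinition described in the text)."""
    return not (a == "definitely not" and b == "definitely not")

strict_rate = sum(affirmative_coding(a, b) for a, b in answers) / len(answers)
broad_rate = sum(anything_but_no_coding(a, b) for a, b in answers) / len(answers)

print(f"affirmative-answer coding: {strict_rate:.0%} intend to smoke")   # 20%
print(f"anything-but-definitely-no coding: {broad_rate:.0%}")            # 80%
```

Identical answers, a fourfold difference in the headline number. A reviewer who never sees the coding rule cannot tell which figure the paper is reporting.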
So, when FDA declares that 1 in 5 nonsmoking students is “curious” about little cigars, was the question, “Have you seriously contemplated embracing the bold delight that is smoking little cigars?” as they are implying, or was it merely some equivalent of “Has any thought about little cigars ever passed through your head, even momentarily?”
It matters a lot. But the reviewer, like the reader, generally does not know the answer. Even if he is conscientious enough to make the effort to try to find out, he is very unlikely to get an answer.
Submyth 1b. Peer reviewers can assess the validity of statistical models.
Once again, what you see in the journal is all the reviewers see. Exactly what statistical model was used? How did the model details affect the result? The reported methods and the footnotes in tables provide only a fraction of the information you need to answer that question. Those models produce dozens or hundreds of other statistics which, if reported, would provide substantial insight about the model and how its details affected results. These are basically never reported in health science as they would be in other fields.
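To illustrate the point, here is a toy sketch (simulated data, fit by hand, not any real paper’s model): even the simplest one-variable regression generates diagnostics beyond the headline coefficient, and a reader shown only that coefficient has no way to judge the fit.

```python
import math
import random

# Simulated toy data: true slope 0.3, plus substantial noise.
random.seed(1)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.3 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares by hand.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Diagnostics that a single fit produces "for free" but that rarely
# appear in health-science papers.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)
tss = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - rss / tss
resid_sd = math.sqrt(rss / (n - 2))
slope_se = resid_sd / math.sqrt(sxx)

print(f"headline number: slope = {slope:.3f}")
print(f"often unreported: SE = {slope_se:.3f}, "
      f"R^2 = {r_squared:.3f}, residual SD = {resid_sd:.3f}")
```

A low R-squared or a large residual SD would tell the reader the model explains very little of the variation, which is exactly the kind of context a headline estimate strips away.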
You really have no idea unless you get the data and fiddle with it until you almost[*] match the results in the paper. I have done this sort of forensic epidemiology many times for trial testimony (without the aid of trial discovery, it is generally impossible to get the data — honest serious scientists share data; “public health” researchers generally do not), and I can tell you that it is far more work than any journal reviewer ever puts into doing a review. Indeed, you would have to be insane to bother to do such an analysis (even if you could get the data) for a review, because even if you found a fatal flaw, it would probably be ignored by the editors or, if not, the authors would just submit it to another journal which would, with probability approximately 1.0, pick less conscientious reviewers.
[*The “almost” refers to the fact that you almost never manage to match the results in the paper exactly. The original authors frequently do some data “cleaning” that they never report, or exclude some observations without explanation, or they just mistype some numbers in their tables — you never really know for sure.]
But even if you can figure out the model and it is basically defensible, there is still a huge problem. As I (example, example) and others have pointed out at great length, there are many different seemingly-valid ways to analyze a dataset to address what are vaguely the same questions. If you try many of these options and report only the one you like best, pretending it is the only analysis you tried, you can almost always make the data appear to fit the claims you want to make far better than if you pick a model more honestly. Such model shopping is standard practice in “public health” research. There are even software tools for doing it. Honest researchers respond to this obvious hazard of the trade in various ways. They can make the cleaned final dataset available for others to try different models; they can report the results of sensitivity analyses; they can report the various results that different defensible models produce (not just the crude, unadjusted correlations alongside their chosen cooked model, though even reporting the crude results is not always done, but the results of other multivariate models as well). Needless to say, none of these practices will sound very familiar, because they are basically never done in tobacco research or “public health” more generally.
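The cost of model shopping can be quantified. The simulation below is a hedged sketch (all names and parameters are invented for illustration): it generates data with no real effect at all, then compares an honest analyst who pre-specifies one model against a shopper who tries ten candidate predictors and reports only the one with the smallest p-value.

```python
import math
import random

random.seed(42)

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def p_value_from_r(r, n):
    """Two-sided p-value for correlation r with n observations,
    via the normal approximation to the Fisher z-transform."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

n_obs, n_models, n_trials = 100, 10, 500
honest_hits = shopped_hits = 0
for _ in range(n_trials):
    y = [random.gauss(0, 1) for _ in range(n_obs)]
    pvals = []
    for _ in range(n_models):
        x = [random.gauss(0, 1) for _ in range(n_obs)]  # pure noise: no effect
        pvals.append(p_value_from_r(correlation(x, y), n_obs))
    honest_hits += pvals[0] < 0.05      # one pre-specified model
    shopped_hits += min(pvals) < 0.05   # report the "best" of ten

print(f"false-positive rate, pre-specified model: {honest_hits / n_trials:.0%}")
print(f"false-positive rate after model shopping: {shopped_hits / n_trials:.0%}")
```

The honest analyst hits a spurious “significant” result about 5% of the time, as advertised; the shopper does so roughly 40% of the time (about 1 − 0.95^10), despite there being nothing to find. And this toy version shops over only ten options; real analysts have far more dimensions to vary.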
When I am asked to review a paper where the reasons for choosing the particular model details are never explained (which is basically any time something other than very simple statistics is reported), I ask to see a revision of the paper where the authors explicitly state whether this was a predetermined model that they applied once to the data and reported the results, or whether they tried different models with the data to see which one they liked best. If it is the latter, I ask that the results of the other model runs be reported, at least in an appendix. Once again, no editor has ever required the authors to provide this information.
Anyone who has ever worked with data, let alone has ever taken a decent class on proper ways to work with data, knows that this matters. I suspect that most of those who fetishize journal peer review have no idea how big a problem this can be and how it probably makes much of the published quantitative “public health” research little more than editorializing dressed up with some numbers. They apparently have no idea that journal review seldom does anything whatsoever to address such problems.
(to be continued)