by Carl V Phillips
[Update: Index of this series:
- What is peer review really? (part 2)
- What is peer review really? (part 3)
- What is peer review really? (part 4 — a case study)
- What is peer review really? (part 4a – case study followup)
- What is peer review really? (part 5)
- What is peer review really? (part 6)
- What is peer review really? (part 7 – an amusing aside)
- What is peer review really? (part 8 – the case of Borderud et al.)
- What is peer review really? (part 9 — it is really a crapshoot)
Some related posts:
- New Phillips-Burstyn-Carter working paper on the failure of peer review in public health
- Peer review – are they really even trying anymore?
- On the dangers of trusting in peer review
- SRNT believes research should be replicated (when they don’t like the results)
- A real peer review of Hughes et al paper on teenage use of ecigs
- Sunday Science Lesson: So much of what is wrong with public health, in one short rejection letter
- The failures of peer review do not begin with the journal – more on the Popova-Ling fiasco
- Post publication peer-review: Correction to Burstyn (2014) and related matters
- New study shows that if you have an MI, you should hope you use tobacco
- Letter re fatal flaws in Schober et al. paper on environmental vapor
- post with links to some additional example reviews I wrote
You can also click on the “peer review” tag for more still.]
In this series I am going to jot down some observations I am building into more formal presentations about the nature of peer review in the health sciences, and in the tobacco subfield in particular. This is, in part, motivated by my observations about the FDA’s apparent relationship with the corpus of scientific evidence in CASAA’s comment on the e-cigarette deeming regulation, in which they acted as if anything stated in a peer-reviewed journal article must be true, whereas the rest of human knowledge does not even exist. This is not just a problem of false negatives (failing to recognize the vast majority of the useful scientific information that does not appear in journals), though that is the worst problem. It is also a matter of false positives — they apparently believe that publication in a “peer-reviewed journal” confers some claim of accuracy (not only on the research results but on every last offhand opinion in the introduction) that excuses them from acquiring real expertise.
I should point out for readers who may not know that this is not just a random observer’s rant about a failing of peer review (though such rants — so long as they are written by someone who basically understands the systematic problems and is not taking the naive tone of “look at this bad exception to the wonderful institution of peer review” — are often very good and quite appropriate, and are frequently amusing). I approach this from the perspective of extensive expertise on the topic, having been an author of many papers in public health as well as fields with higher standards, a reviewer in public health and in more serious fields, an editor of several public health journals, and the creator and editor-in-chief of an epidemiology journal (which tried to do it better), and having made a formal study of the nature of peer review in the field and published a few papers about it.
First, it is useful to realize that the formal journal review system currently practiced by almost all health journals is not some bit of timeless ancient wisdom. It was a relatively recent invention and has already been made obsolete by technology and unworkable by changing circumstances.
Throughout most of the history of literacy, scientific publication was relatively unedited. This was not such a problem because most of those who read the work for anything other than idle curiosity (i.e., as a building block for more science, or in rare instances for policy-making) tended to be sufficiently expert (or could consult someone who was) to judge the work. Also, the barriers to publishing were sufficient that someone had to be highly motivated to believe they had something worth reading, as well as being successful enough to have some access to financing (I trust I do not have to explain to people who are currently staring at a computer what happens when the barriers to publishing drop to zero).
When learned societies started creating journals, one or a few editors served as gatekeepers for everything that appeared, and often solicited it. This worked because a few top polymaths could know enough about most every science to be able to separate the grain from the total chaff. As a model, just think of the NYRB or most any other serious but non-ivory-tower publication today — those of us who trust the editors to pick good topics to cover and good authors to write about them will read those publications (in all our copious spare time – hah!). This is not to say that science always worked great during this era, obviously, but it was not a bad system for the scale of the endeavor of the time.
By the middle of the 20th century, this system no longer scaled. So much science was being produced that necessitated having advanced and specialized knowledge to assess it that a handful of editors could not judge most of what was being produced, even in the subdivided fields that they now worked in. An elegant solution was that a few experts on the particular paper could be identified and asked for their opinions, so that they played the role that the polymath editors once did. Of course, in either of those systems it was quite possible for one or three or even ten reviewers to be wrong about the claims in a paper. Reasons include: (a) key existing beliefs of even the greatest experts might be wrong, (b) something was wrong with the research that even the author did not realize and thus failed to communicate, (c) something was wrong with the research that the author intentionally did not disclose to the reviewers and thus they could not vet it, (d) the reviewers were not really the experts who should have been reviewing the work, (e) the reviewers based their assessment on something other than the quality of the work (worldly politics, personal politics of the field, laziness).
Back when the system was still scaled properly and honest, (d) and (e) would not be so bad; today they are fatal flaws. But even back then there were objections. Most famously, Einstein strenuously objected to having his papers subject to this newfangled “sending out to reviewers” thing. And he had a point. Any system that might not allow Einstein (late in his career) to publish whatever he wanted would be a bad system. Perhaps the second or third time he tried to publish his grocery list or dry-cleaning ticket, it might be time to cut him off, but that is about the level of deference he should have gotten.
Einstein’s objections were based on much weaker concerns than exist in health science now. Peer review is supposed to serve the dual purposes of improving a paper thanks to the feedback of experts and gatekeeping. Chances are that he would have gotten a few suggestions for improvements (which might have been genuinely useful) and the paper would have been published whatever he chose to do with those. This contrasts with current health science peer review where there is rarely a substantive useful suggestion (I can think of only one occasion where I fundamentally improved a paper based on peer review comments) and overworked and inadequately-expert editors simply use it as a voting process to decide which of the many submissions they receive to publish (secondary, of course, to the political preference of the editors and their desire to publish particular types of papers regardless of what reviewers say).
By the end of the 20th century, the formal journal peer review system was already the wrong scale for health science and many other fields. The number of people whose collective expertise is needed to really assess a paper is typically dozens or even hundreds, and so a sample of two or three of them was never going to be adequate. The breadth and depth of fields is such that an editor or two could not reliably identify even two such people, except at highly specialized subfield journals. The editors are typically so far from being expert in a specific topic that they cannot even recognize who the real experts are. And the quantity of health science papers is such that every qualified reviewer would be required to do ten or twenty or fifty of these unrewarded anonymous projects annually, which has resulted in many unqualified reviewers and general sloppiness on the part of reviewers.
The solution to this is the next logical step in scaling, made possible by modern technology: working papers and crowdsourcing the review process to the whole interested community. This is what is done in serious sciences, but not health science. Physics, statistics, and others have working paper archives that serve (better) the purpose that paper journals did in the 20th century. In economic science, important papers are published in working paper series, and by the time they appear in a journal (if they ever do) it is possible to cite responses and derivative papers based on them. Indeed, this is a much more accurate interpretation of the term “peer review” than is the fetishistic process practiced by archaic journals like those in health science.
Moreover, as with any set of rules that is in place for a long time, gaming the rules became an institution in itself. Rules are usually a poor substitute for honesty, and in complicated arenas where there are great incentives for dishonesty, the rules are forced to become extremely complex and require aggressive enforcement (consider: tax laws, banking regulations). The journal review system simply does not rise to the demands imposed by the incentives created by ideologically-motivated (and to a much lesser extent, profit-motivated) dishonesty in public health and medicine. (This is not to say there are not such incentives in some subfields of other sciences, but they are so overwhelming in health science that it is a whole different world. In particular, there are lots of ways to make money and affect the world by discovering something in many sciences, but health sciences are among the few where putting it in a journal actually affects whether that happens).
Because the fetish peer review system has (undeserved) imprimatur in the courts and among non-expert policymakers and opinion leaders, there are strong incentives to get such status in pursuit of personal rewards or personal political goals, rather than actually doing useful science. (Contrast: There basically are no rewards for producing physics or statistics that are not right, and so there is no incentive to publish junk. There are sometimes rewards for producing junk economics, but it matters little whether it appears in a journal. If I asked you to name one area outside of health where there is a highly influential ideological battle that depends substantially on publishing in journals, I would guess >90% of you would name …stop and think of an example before finishing reading this sentence… global warming. There just are not many other examples.)
Thus, countless health journals were created whose only real purpose was to stamp “peer-reviewed journal article” on papers that served the goals of politics and occasionally profit (the latter often being in the form of getting products approved, but also in maintaining government funding gravy trains). Every journal with “tobacco” in the title, along with most with “public health”, and indeed a large proportion of those in health in general serve that purpose.
I will continue this series by addressing some of the common apparent misunderstandings about what peer review does.