by Carl V Phillips
Returning to this series, I previously explored Myth 1, that peer reviewers have access to more information than any other reader of the paper. On to Myth 2 (and, again, the order of this presentation is based on narrative convenience, not necessarily importance).
Myth 2: Health science reviewers have the skills and incentive to do the job they are assumed to be doing.
Now I do not want to overstate that and imply that it is never true. But it is seldom true. Usually the reviewers of a paper for a health science journal are not very good scientists, if they can be called scientists at all, and have no particular reason to care whether the analysis is done correctly or to turn in a good performance when reviewing it.
Recall the example from Part 4 in this series, which looked at the reviews of the Popova-Ling paper. Recall that the paper is severely flawed in numerous ways that I pointed out. Yet the first review consisted of only two comments, one calling for adding an irrelevant reference and the other disputing a grammatical choice. The second review contained more words but still failed to challenge any of the flaws, and in fact asked for changes that made a couple of them worse.
These reviewers were not outliers in terms of quality. I would estimate that about a quarter of the time the journal reviews of a public health paper are of that quality or worse, and this rises to half when the topic is tobacco products. (I base this on having seen many hundreds of such reviews, and while I have not kept a formal count, I think this is a pretty good estimate.) It is extremely rare that reviews are genuinely good. Still, it was disappointing that what is arguably the best journal with “public health” in the title allowed them to count as peer review, and the editors also failed to catch the glaring flaws.
There are numerous reasons for this pattern of low-quality reviews. I will try to cover as many as I can. First, let’s consider who is chosen to review a paper. Most often the reviewers are random people who happen to have written on the subject matter before (i.e., the editor looks for them in the references in the paper). Other times they are hand-selected people who have written on the subject matter and whom the editor happens to know personally. Perhaps you already see the problem here: Writing about the subject matter does not necessarily mean that someone is an expert on whether research was done and reported well. After all, that “qualification” would include every news reporter and blogger who had written about the topic too.
But, you might counter, unlike news reporters, aren’t people doing research and writing about it supposed to know something about how to do good research? Yes, that would be nice. And it is true in most fields. But not in health sciences.
Submyth 2a: Most people doing health science have the qualifications we would normally ascribe to scientists.
Graduate-school level training in most research fields includes a broad and deep understanding of the science, built from the basics up. But a large portion of those doing research in health sciences are either medics or people with non-scientific public health training.
Medics usually have a decent undergraduate level understanding of biological sciences but no graduate-school-level scientific education (i.e., they have mostly learned science factoids but never studied how to actually think like a scientist). Few have any background whatsoever in the social sciences they are dabbling in when they address public health. (In fairness to medics, almost all those I had as students were much better than almost all the purely public health people. But I saw only the self-selected sample who proactively sought out a class or mentor devoted to teaching about how to think like a scientist.)
People trained in public health do not necessarily have any scientific background at all. At the MPH level they simply do not learn anything about science. The MPH degree that is a supposed credential of many study authors (and, thus, study reviewers) typically consists of one (maybe two) classes each of freshman-level statistics, epidemiology, toxicology, and cost-effectiveness analysis, but really focuses on the history and ideology of public health special interest activism. Not that having an MPH disqualifies someone from being a real scientist, but it certainly is not a positive statement about his abilities.
[Note: For those who may not know, epidemiology -- the primary science of public health -- is the study, usually quantitative, of diseases and their causes (which includes cures) in populations. The majority of it is observational research, and is just an application to health questions of the greater science of what might be called "human metrics", which includes parts of econometrics, political science, etc. However, few of those doing epidemiology have any clue about this latter fact, and the quality of their work would generally be considered appalling in those other sciences. Epidemiology also includes experimental research (e.g., clinical trials) that is relevant to quantifying the causes or cures of diseases.]
MS degree training in epidemiology and other public health sciences tends to be real scientific training, of course, though the quality varies. In the case of epidemiology, it is generally quite bad. Most students are merely trained in how to conduct decent field studies (which does give them one up on the medics, who generally do not know even that, though they try to do them anyway), but really have no understanding of how to think about the results. Doctoral programs typically consist of doing the MS and then hanging out doing a field study or two without ever learning much more. Lacking good role models, the students simply are taught to imitate the typical low quality of what they see others publishing. There are a few exceptions at a handful of good epidemiology PhD programs, but they are rare exceptions.
Because of the above, most or all of the following statements are true of the vast majority of public health researchers, and thus reviewers of such papers for journals:
- they never think seriously about formulating hypotheses and how they might test them, and thus it does not even occur to them when conclusions do not follow from data,
- they have no idea what the software and statistics they are using really do, and thus are not able to judge whether they were right for a particular application,
- they genuinely do not realize that trying a bunch of variations on the statistical model to get the “best” result produces bad science, and thus do not question whether the authors of another paper did so,
- they can list common problems with epidemiology studies (selection bias, confounding, etc.) and have a general idea of how to make them less bad than they might be when doing a study, but really have no serious understanding of them or their implications, and thus cannot assess whether they appear to be a serious problem in a study.
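The third point in the list above can be made concrete with a small simulation. This is only a minimal sketch under assumptions of my own choosing, not a description of any particular paper: the exposure and outcome are generated independently (so the true effect is zero), the "variations on the model" are arbitrary subgroup restrictions based on unrelated coin-flip covariates, and significance is judged with a large-sample z test. The function names `abs_z` and `false_positive_rates` are hypothetical.

```python
import random
from math import sqrt


def abs_z(exposed, unexposed):
    """Absolute two-sample z statistic for a difference in means
    (large-sample normal approximation)."""
    def mean_var(values):
        m = sum(values) / len(values)
        v = sum((y - m) ** 2 for y in values) / (len(values) - 1)
        return m, v

    m1, v1 = mean_var(exposed)
    m0, v0 = mean_var(unexposed)
    return abs(m1 - m0) / sqrt(v1 / len(exposed) + v0 / len(unexposed))


def false_positive_rates(n_studies=1000, n=300, n_specs=10, seed=1):
    """Return (rate for one pre-specified analysis, rate for best-of-n_specs)."""
    rng = random.Random(seed)
    honest = searched = 0
    for _ in range(n_studies):
        # Assumption: exposure and outcome are independent, so the true effect is zero.
        exposure = [rng.random() < 0.5 for _ in range(n)]
        outcome = [rng.gauss(0, 1) for _ in range(n)]
        # Specification 0 is the full-sample analysis; the others restrict to
        # arbitrary subgroups defined by unrelated coin-flip covariates.
        specs = [[True] * n]
        specs += [[rng.random() < 0.5 for _ in range(n)] for _ in range(n_specs - 1)]
        z_values = []
        for keep in specs:
            exposed = [y for y, x, k in zip(outcome, exposure, keep) if k and x]
            unexposed = [y for y, x, k in zip(outcome, exposure, keep) if k and not x]
            z_values.append(abs_z(exposed, unexposed))
        honest += z_values[0] > 1.96        # "significant" at the nominal 5% level
        searched += max(z_values) > 1.96    # report whichever specification looks best
    return honest / n_studies, searched / n_studies


if __name__ == "__main__":
    honest_rate, searched_rate = false_positive_rates()
    print(f"pre-specified analysis:    {honest_rate:.3f}")
    print(f"best of 10 specifications: {searched_rate:.3f}")
```

Under these assumptions the first rate should sit near the nominal 5%, while the second should be several times larger, even though nothing real is there to find. A reviewer who does not understand this has no reason to ask how many specifications the authors tried before settling on the one they reported.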
(Note that I am still setting aside both the fact that many “public health” people are not interested in scientific truths and are really pursuing personal political agendas, and the fact that reviewers have little incentive to work very hard. I will come back to those. This is all about the limited abilities of the reviewers, even assuming they are trying to do a good job.)
To sum that up, most people doing health science research are what we would normally call techs, not scientists. They know how to operate the machinery to produce outputs, as it were, and possibly[*] are good at that, but do not have the skills to understand either the machinery or what might have gone wrong with it in a particular case. This is made worse by the fact that the rules of thumb about how to operate the machinery (e.g., “just throw whatever variables you have into the statistical model and you have controlled for confounding”) are often wrong.
[*In many cases, they are not good at even that. When a skilled scientist, or even very good tech, gets a chance to look at the nuts-and-bolts of what a typical public health researcher has done, it is not unusual to find serious problems at the tech level, such as errors in the software code. Recall from Myth 1 that the journal reviewers never get to see such errors even if they are qualified to notice them.]
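One way the "throw in every variable" rule of thumb quoted above can fail is when one of the variables is caused by both the exposure and the outcome (often called a collider). The following is a minimal sketch of that one failure mode under an assumed data-generating process of my own choosing; it is an illustration, not a claim about any specific study. The function names and the variable called `collider` are hypothetical.

```python
import random


def slope_crude(x, y):
    """OLS slope of y on x alone."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_adjusted(x, c, y):
    """OLS coefficient on x when y is regressed on both x and c
    (closed-form solution of the two-predictor normal equations)."""
    n = len(x)
    mx, mc, my = sum(x) / n, sum(c) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    scc = sum((a - mc) ** 2 for a in c)
    sxc = sum((a - mx) * (b - mc) for a, b in zip(x, c))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scy = sum((a - mc) * (b - my) for a, b in zip(c, y))
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)


def simulate(n=5000, seed=1):
    """Return (crude slope, slope after 'adjusting' for the collider)."""
    rng = random.Random(seed)
    exposure = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: the outcome is independent of the exposure (true effect is zero).
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: a third variable is caused by BOTH exposure and outcome.
    collider = [x + y + rng.gauss(0, 1) for x, y in zip(exposure, outcome)]
    return slope_crude(exposure, outcome), slope_adjusted(exposure, collider, outcome)


if __name__ == "__main__":
    crude, adjusted = simulate()
    print(f"crude slope:    {crude:+.3f}")
    print(f"adjusted slope: {adjusted:+.3f}")
```

With these assumed variances the crude slope should be close to zero, which is the truth, while the "adjusted" slope should be noticeably negative (about -0.5 in expectation). Adding the extra variable creates a bias rather than removing one, which is exactly the kind of thing someone who only knows how to operate the machinery would not think to check.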
There is a subset of those doing epidemiology (typically called methodologists) who are deeply cognizant of the nature of the science itself and who pursue improving it. The methodologists are generally ignored by the masses in “public health” who prefer to just crank out results. In the vocabulary of a more serious science, this would be described as the techs operating the lab equipment and publishing papers without any supervision by the actual scientists, and indeed often not even realizing that there are real scientists in their field. Needless to say, one of the reviewers of any epidemiology paper should be a genuine methodologist, but I would be surprised if that were the case as much as 5% of the time.
So for most reviewers in this area, even when they truly care about making the paper better — including identifying and challenging flaws in it — and are willing to put in the time it takes to make it better, they lack the skills to do so. There are far too few people with the right skills, compared to the deluge of papers that need to be reviewed, and they generally have better things to do.
Consider an analogy: Imagine you wanted to figure out whether a particular lawn mower was a high-quality piece of equipment, but your strategy was simply “pick two people who have experience operating lawn mowers, give it to them, and ask them if it is good.” They would probably catch serious problems, like “it does not start” or “the wheels are square”, and catching such glaring errors is basically all we can count on from the journal peer-review process. (And we cannot even always count on that — I will come back to that in this series when I talk about the failure of the journal process to provide even basic reality-checks.)
They could not tell you whether the engine is so badly assembled that it will break after 10 hours’ use, or whether the blade spins at an optimal speed. They could probably tell you whether the blade was sharp and spins in the right direction if you asked them specifically, but they might not think to check (the peer review process in academia is kind of like teaching — there is no manual or systematic education about how to do it, so those who never had a good mentor will probably fail at it). The mower reviewers are likely to be unduly impressed by cosmetic details and bells-and-whistles. If there is some genuine innovation in the particular design that is beyond their experience, they are unlikely to even understand it, and may just complain about the unfamiliarity.
Obviously there are people in the world who can assess all the details of a mower and report on them. Indeed, an experienced operator of mowers who has a solid knowledge of how they work (which is, of course, only a fraction of all operators), if he devoted a lot of time and attention to the unit in question, could figure out most of the good and bad points about it. Even then, though, he could probably never recognize that, for example, the engine had been redesigned to run more efficiently. Still, that would be pretty good in the real world of lawn mowers, where there are only a few hundred varieties, mass-produced with good quality control, and each can be tried out by experienced product reviewers and tested extensively. But now imagine that every unit was produced by hand, with an ad hoc design, often by people whose only real skill is using a mower, and each had to be evaluated individually. Not a pretty picture. There would not be nearly enough qualified reviewers, and the reviewers would have to have the skill to assess problems that they could safely assume do not exist in the real world.
Many of my readers may be misled about quite how ugly the picture is because I make it look easy to “get under the hood” of papers and point out their flaws and other implications. But that is a relatively rare, arcane skill, which includes not just knowing how to do good research, but also understanding the many ways in which research can go wrong, coupled with a sixth-sense intuition for knowing where to dig for clues about problems (which may just be a matter of luck about how one’s brain is wired). I happen to be very good at such evaluations. I have made a serious study of research methods and specifically of the ways in which research can go wrong. Others with that background also tend to be good reviewers, but there are not that many of us. Anyone who really understands the science can look at my analysis of a study and immediately see it is right, but even most of those individuals could not necessarily generate the insights themselves; it is a particular skill.
So do I do a lot of reviews for journals? No. I review a lot of papers because authors ask me to look at them (that is the world of real peer review among people who actually care whether their papers are accurate). But journal editors in the “public health” space generally do not care much about getting a high-quality and thorough review. Indeed, as I mentioned earlier in the series, they tend to not like getting them. I frequently identify information that is missing from the papers, without which I cannot do a complete review, or fatal flaws that have to be corrected before there is any hope the paper will be right. I ask for the authors to resubmit a new version that corrects those problems before I evaluate the details (no point rearranging deck chairs, and all). Editors never — literally never — bother to make that happen. They are in the business of making money for their journal by cranking out papers or are working as volunteers and cannot be bothered with it. They either just accept or reject the paper, ignoring my attempt to improve it, and they avoid asking me to review again because my attempts to do something substantive interfere with their business model.
Of course, I usually just refuse to do the review, except in the rare cases where I think the research is valuable enough to be worth really getting right and I believe the authors genuinely want to get it right. Why should I bother, after all?
I will continue on that theme, about the lack of incentives to do decent reviews, in the next entry in this series.