by Carl V. Phillips
[Update: I have submitted a comment to BMC Public Health that is based on this post. My copy of it can be viewed here.]
[Update: The comment has now been accepted by the journal and appears, attached to the original article, here.]
I interrupt the flow of this series, in which I am currently laying out some common myths about journal peer-review, to provide a motivational case study that makes many points better than any abstract principles can. As I discussed in the previous post, which built on what Clive Bates had already written, a newly published article by Popova and Ling was unethical and misleading, fraught with anti-THR lies. But here is the good news: It was published in a BioMed Central (BMC) journal. While BMC still basically practices the 20th-century version of peer-review that I have pointed out to be a failure, they do not keep it an anonymous black-box like most journals do. (This is a huge improvement over the standard health science practice, enough so that when I started a journal, I chose to do it at BMC, though it is still far short of other fields’ real peer review, as I have discussed previously in this series.) Thus, we can review not only the paper, but the “peer-reviews” that caused it to be published.
To start with, I am going to go ahead and do a quick peer-review report of my own. As you will have learned earlier in this series, journal reviewers do not have access to anything other than the paper itself, so I am reviewing the exact same paper the reviewers could review (albeit the final version, not the original, which has one interesting implication that you will see below). If you read the posts by Clive and me you probably know enough about the paper to make sense of this, even if you did not read the paper itself (which you can find here if you want to). The order of the bullets is for narrative convenience, not order of importance; a journal might require a particular structure (e.g., the one used by BMC that you see below), but I wrote this in logical thought blocks separated by line breaks.
- The paper repeatedly refers to “warning labels” but includes graphic images which are emotional manipulation, not warnings. A warning conveys information, while gory photographs are emotional violence, designed to manipulate people, not inform them. Thus, the description of what is being studied is not accurate.
- The study involved deceiving some of the study subjects with those disturbing visual images and attached messages that products had risks that they do not have (namely that e-cigarettes cause oral cancer, for which there is literally no supporting evidence, and that smokeless tobacco causes oral cancer, which has clearly been shown to be either false or to be true only at such a low rate it is inconsequential and below limits of detection). This deception could lead someone (the subject or someone they influence) to smoke instead of using a low-risk alternative. The subjects were also offered a free sample from among a group of smoke-free tobacco products, but after indicating their choice were told they would not be given it after all. Nor, apparently, were they given any substitute to make up for the reneging on this promise, such as giving them the retail purchase price of the chosen product. Indeed, they were effectively scolded and told that selecting anything was bad behavior.
- Such deception and abuse of study subjects, which is directly harmful to study subjects and potentially harmful to the reputation of human subjects research in general, can only be justified by extremely valuable research, which this is not. Indeed, it is almost valueless. All that the results show — and all that they could have ever shown — is that people who are told that risks are higher believe that risks are higher, and those who are told that risks are lower believe that. The methodology allows no quantification. Thus there is no possible way that deceiving people — or for that matter, even taking people’s time to participate — could be justified as ethical human subjects research.
- Moreover, it appears that there was no post-experiment briefing to tell subjects that some of the messages were deceptive, which there can be no excuse for.
- Based on principles of research ethics that have been unchanged since Nuremberg, results from such unethical studies should not be published no matter what their value (which in this case is approximately nil). Even when the fruit of the poisonous trees is valuable (as it is not in this case), allowing it to be used encourages the planting of other poisonous trees.
- The study methodology itself is so badly designed as to be nearly useless. The legitimate purpose of warning labels or other risk communication is to move people closer to believing the truth. But since the authors used an uncalibrated scale for measuring subject perceptions, we do not know whether a score of 7 means that they are overestimating or underestimating a particular risk. At the very least the authors should have calibrated their scale at the top end by asking the subjects to rate the risk from smoking. Far better would have been to calibrate the other end by asking them about a well-understood risk that is in the neighborhood of the estimated risk from smoke-free tobacco products, such as commuting by car. That could be used for the basis of estimating whether the subjects seemed to over- or under-estimate the risk from such products.
- Notwithstanding the aforementioned fatal flaw, it seems safe to conclude that subjects’ ranking of the small (and, indeed, entirely speculative) risks from smokeless tobacco products, in the range of 7 to 7.5 on a 9-point scale, indicates that they overestimate those risks. That is certainly consistent with other research on that point (which the authors do not even mention — see below). The rating of e-cigarette risk in the range of 5, while not as clearly wrong, still seems likely to represent an overestimate. Thus, any labeling that tends to reduce the perceived risk is actually moving people in the right direction (clearly for smokeless tobacco, and probably for e-cigarettes). And yet the authors conclude that the labels that move people toward a more accurate perception are bad and those that move them further away from the truth are good.
- By far the most policy-relevant result is that the current mandated labels on smokeless tobacco caused the assessments of those products to move slightly further away from reality (i.e., increase the already overestimated risk), indicating that those labels are misleading.
- What is worse, the authors obfuscate their analysis, offering no connection between the study results and their discussion and conclusions about them. Had they said, “we believe that public policy should take any action that causes people to believe tobacco products have higher risk, even if they already dramatically overestimate the risks, and therefore….”, then at least they would have a complete argument from observation to conclusion. But they do not say that, presumably because they know it would be indefensible. Thus they inappropriately hide that fundamental premise, even as they use it as the crux of their argument and as their implicit excuse for why their lack of useful calibration does not matter.
- Perhaps the authors actually believe that the products in question are truly higher risk as the subjects perceived and/or that there is scientific justification for claiming they cause oral cancer. I can see no possible way that they could defend such beliefs. But if such beliefs are the justification for their premises, then they should have stated them and defended them, rather than burying such dubious claims in unstated assumptions, hoping the reader would not notice what they were doing.
- The inappropriateness of that hidden premise also illustrates the fundamental inappropriateness of the entire tone of this paper. The authors did a field study and reported the result (setting aside for the moment that it was unethical and the methodology was terrible), and that is what this research report should be about. They offered no assertions, let alone analysis, of what constitutes reasonable social goals for communication about risks in this arena, nor about what we know about the real-world effects of using these labels outside an artificial research situation. They did not even review the other similar research that has been done in artificial situations. Yet their focus is not on reporting their study results — which is all they can legitimately do — but on making broad pronouncements about policies that they neither analyzed nor justified. Their conclusions do not even remotely follow from their analysis.
- The authors’ intention of writing this as a political opinion piece dressed up as a research report, rather than reporting actual science, is well illustrated in their introduction. They do not provide any background about: previous research on perceptions of risks from the products they are studying; evidence about the actual risks from the products; or what is known from the broader science about how labels like the ones they tested affect people in real life. This is the proper background information for this research report. Instead, the background consists of an extended discussion of the history of the imposition of labeling in the policy arena, whose existence should be briefly mentioned as motivation for the research, but the details are irrelevant to the study itself.
- The introduction begins with background on the health effects of smoking, which the authors pretend is relevant to their study of perceived risks of smoke-free tobacco products by referring to this as the risks from “tobacco”. This is blatantly, and presumably intentionally, misleading and should be replaced with a legitimate analysis of the estimated health effects of the products they are actually studying.
- Further on methodology, the authors do not justify their methodological choice of studying adults who do not use cigarettes or the products being studied. This seems to be a completely inappropriate choice, given that the target audience for warnings or other labels on tobacco products consists of the users of tobacco products. It is obvious that their subjects, as compared to the target audience for labels, are (a) more likely to already believe products are high risk, (b) more likely to believe any further negative claims about the products because they are unaware of the truth (particularly compared to e-cigarette users), and perhaps (c) less likely to be reassured by accurate information about lower risks. This flaw, by itself, calls into question any worldly scientific conclusions that are drawn based on this analysis, even if they are limited to an analysis of the results and not the political preferences of the authors.
- One of the four labels tested in the study says “FDA approved” with the U.S. FDA logo. There is no possibility that any such label will ever (legally) appear on any tobacco product. And yet the authors provide no background information to put reactions to this in perspective and offer no justification for even including it. Presumably the authors believe that this somehow advances their political advocacy, but they do not even make a case for that.
- In general, the methods are inadequate for the reader to understand what was really done. For a perceptions experiment like this, myriad details matter, ranging from the details of the experience (Were the subjects seated alone in a small room with only a table, a researcher, and the reflective side of a one-way mirror, or were they seated comfortably together in a living-room setting? If the latter, were they allowed to talk with one another? Etc.), to how the questions were laid out. Critical is whether the new “information” the subjects were given was presented as being true by people in white coats, or whether they were just casually asked “imagine you saw this….” None of this is reported.
- The authors explicitly deny that they have any competing interests, in blatant violation of the journal’s policy. BMC policy says, “Authors should disclose any financial competing interests but also any non-financial competing interests…” and gives examples of the latter. These authors have a documented history of political advocacy against smoke-free tobacco products and tobacco harm reduction, and thus clearly have a political/ideological conflict of interest whose magnitude is on par with the greatest financial conflict of interest (i.e., being employed by an organization with a stake in the outcome of the research). This is demonstrated in the paper itself, particularly in the above-noted implicit objective of wanting to cause people who already overestimate the risks to further overestimate the risks. The authors also work for an institution whose senior personnel consistently take strong stands against the use of any tobacco product and against harm reduction. Even if the authors think themselves immune to the influence of this, it creates a clear perception of conflict of interest that also must be disclosed according to BMC rules.
So that is what a legitimate peer review of Popova and Ling would look like (or it might have been phrased explicitly in terms of what needed to be done, rather than simply noting the problems; that is a matter of style). Perhaps that is what you imagine that real journal reviews look like. To that, I say, with apologies and all due respect, hahahahaha. Let’s take a look at what the real reviews said (and keep in mind these are reviews written by people who have to sign their reviews and know they will be published; you can imagine how much worse it is when there is not even that reputational check on the quality).
The first reviewer, Israel Agaku, is a dentist who works for the CDC as a tobacco control advocate. He wrote:
Well conducted and written study. Authors should kindly address the following:
1. For comparative purposes, the authors should discuss how the proposed FDA graphic warning labels align with the World Health Organization’s Framework Convention on Tobacco Control, as well as the newly revised European Commission’s Tobacco Product Directive.
2. Please, change “ads” to “advertisements” throughout the paper
What else did he write in this scientific peer-review? Nothing. That was literally all — an instruction to add a bit of background that is not actually relevant to the research, and an objection to word choice. To this peer-reviewer, the complete train-wreck of a study — which suffered from terrible methodology and garbled reporting, even setting aside the fact that the paper was mostly unrelated political editorial — was so “well conducted and written” (presumably he means that it was a well conducted study and a well written report, but could not be troubled to write the extra two words) that there was no reason to change a thing.
That recommendation constitutes half of the “peer-review” that BMC Public Health considered before publishing this paper. The other half was by Saida Sharapova, who also works for the CDC, and whose background is in medicine. She wrote rather more in her initial review and brief follow-up after revision (Agaku did not follow up after the revision), so I will not be quoting every word. But if you care to follow the links, you can see that I am not leaving out any bits where she caught any of the problems I noted in my review. She begins,
The authors have attempted to provide much needed evidence to support FDA regulation of the tobacco products other than combustible cigarettes. The study is timely and addresses and [sic] important knowledge gap.
So it was as evident to her as it was to me that the authors wrote this paper specifically to try to justify regulation, even though there is no analysis of regulation or of what is justified public policy in the paper. Thus, the editors were told by a reviewer (just in case they could not notice it themselves) that this paper was really about making policy pronouncements even though it did not analyze policy, and yet they published it anyway. Of course, unlike me or anyone else who cares about the integrity of the scientific literature, Sharapova was not bothered by this. Indeed she endorsed it and, like the original authors, basically did not care about the details of the field study that was the excuse for doing it.
It is worth noting that Sharapova and Agaku also each lied that they did not have any competing interests in spite of receiving their paychecks from an organization that is intent on demonizing smoke-free tobacco products, and in spite of Sharapova explicitly stating here that she is personally of the opinion that more regulation is good.
Her assessment of the methods did not identify any of the real problems, and merely consisted of:
1) Methods section of the abstract does not provide adequate information about the study design. In particular, it is not clear what represents ‘control’, and what were the outcomes researched.
2) Methods section does not mention comparing perceived harm of non-combustible and combustible tobacco products. However, in the table 3, figure and Results we are presented with data for cigarettes. Either remove ‘Cigarettes’ column from table 3, figure 2 and Results or include into study objectives, as well as Methods and Discussion sections.
She is right that the Methods section did not (and still does not) provide adequate information. However, quibbles about what constitutes the “control” have nothing to do with that. The relevant control (i.e., comparison measure) is the “before” perception, which the “after” perception can be measured against. The inclusion by the authors of a supposed “control” group (who were shown an unrelated ad before the “after” measure) is of no consequence.
Her second recommendation was actively harmful. It seems that the original paper actually did have cigarettes in the results, as you can see here, and the authors indeed removed it in response to this comment, as they confirmed in their response to the reviewers. (Note that I wrote my review not knowing this.) While that did not fully calibrate the arbitrary scale (as I noted, more useful would be a low-risk calibration), it contributed quite a bit of useful information. In particular, from that we know that the average rating, on the 9-point scale, for cigarettes was around 8, barely higher than the 7 to 7.5 for smokeless tobacco. This confirms my belief that we should interpret the latter results as gross overestimates of the risk (not that there was any real doubt about it).
It turns out that it is not at all unusual for substantive recommendations in health journal reviews to make the paper worse rather than better.
Sharapova’s other “major compulsory revisions” were:
3) In Discussion section, authors emphasize statistical significance of increased perceived harm and decreased positive attitudes and are dismissive of the statistical significance of increased openness to trying alternative tobacco products, though effect sizes in all three are similarly small.
4) In Conclusion section, the major claim that “lower risk” label is similar with “FDA approved” label is not supported by the data. The effect sizes are small and inconclusive, sample sizes of the groups are too small to provide adequate power to detect such a small change. The practically significant results of the study are increase in perceived harm of e-cigarettes after exposure to graphic warning label, and reduced positive attitude towards e-cigarettes after exposure to current and graphic warning labels, which support the claim the “Regulatory agencies should consider implementing graphic warning labels for smokeless tobacco and investigate use of warning labels for e-cigarettes”. This study definitely warrants a bigger, stronger designed study of the warning labels for alternative tobacco products.
Point 3 is a complete yawn. Obviously quibbles about statistical significance make no difference whatsoever. You might think the fact that none of the authors’ conclusions related to the statistical analysis of the study results would be a clue about that.
The first part of point 4 is basically correct, though again inanely trivial. The authors did not remove the incorrect claim, as you can see in the final paper. Instead, as you can see in their response, they employed the standard tactic of pretending to agree with the change and making some change that was related to the text in question, but not actually making the “mandatory revision”. This is yet another common problem with journal peer-review: Even if a recommendation is valid, it is easy to pretend to accept suggestions while not actually accepting them, counting on the reviewers to not check the revised version. Indeed, the authors actually made it worse by adding a common bit of innumerate nonsense which effectively claimed that though their result was not really statistically robust, if they had done a bigger study it would have been. (Gee, if you are so sure what a big study would show, why even do research?) And, sure enough, when Sharapova wrote her follow-up comment on the revised version it included only a single copy editing suggestion, with no apparent recognition that they ignored her valid (albeit unimportant) “mandatory revision”.
The last part of her point 4 was a demand to make the conclusions of the paper wander even further from what was supported by the study, stating the authors’ unsupported political opinions even more strongly. Needless to say, the authors complied with that one.
Finally, a few selections from Sharapova’s minor suggestions:
5) Newer publication is available to support “Tobacco use remains the leading cause of preventable death in the United States” claim, e.g. The Health Consequences of Smoking – 50 Years of Progress, A Report of the Surgeon General, 2014.
So not only did she notice that the authors were using irrelevant studies of smoking to make claims about smoke-free products, but she actually suggested another one.
10) The sample has about 60% of college educated people. It would be interesting to see a discussion of how it might have affected the study results.
She recognizes that the sample might be misleading, but fails to recognize that the main reason for this is that it is restricted to people who chose to never use the products in question. Instead, she only notices the deck-chairs level detail about them not representing that irrelevant subpopulation very well.
11) Discussion section, ‘It is worth noting that although participants were non-users of tobacco (who had not smoked more than 100 cigarettes or used smokeless tobacco more than 20 times in their lifetime), over 8% reported trying alternative tobacco products and over 4% were current (past month) users.’ It is not clear what do the current users consume.
That is actually a legitimate catch. I noticed this weirdness too, though I did not bother to even mention it, given how trivial it was compared to the real problems. Still, it further illustrates she was aware of the issue I just noted, and still did not find it to be a problem. It turns out that the answer to her question as asked is in the next sentence, so the authors just blew this off with reference to that. But this still does not add up after reading that next sentence, and Sharapova did not follow up.
And there we have it. The rest of what she had to say was even less consequential than what I quoted.
You might think that this was an extreme case. To summarize, what we saw here was: one reviewer not even assessing the content and just making his recommendation about whether to publish based on what he thought of the conclusions; the second doing basically the same in terms of her recommendation to publish; the second reviewer making some comments, unlike the first, but failing to note any of the real problems; and the second reviewer demanding a change that actually made the paper substantially worse.
But this is not an extreme case. I would say, based on extensive experience with reviews in public health journals, that this comes in at about the 40th percentile for quality in public health journal reviews. Maybe it is as low as the 30th percentile. So it is worse than average, but it is far from being an outlier. For tobacco-related reviews in public health, it is the modal experience, and comes in at about the 60th percentile, maybe even 70th.
So, who is feeling good about the institution of journal peer-review?