Can We Trust Published Scientific Research?

The core principle of evidence-based medicine is that not all evidence is created equal. Controlled scientific studies provide more reliable evidence than anecdotes because they contain controls for universal sources of error in human perception and judgment. This is a major feature that distinguishes science from pseudoscience, and real science-based medicine from faith-based practices like homeopathy and energy medicine, in which belief and “seeing with your own eyes” always trump controlled research evidence.

The fact that controlled research is better than haphazard observation does not, of course, mean that observation is always wrong. Nor does it mean that the results of every scientific study are true. In fact, when proponents of alternative therapies try to claim that their practices actually are scientifically validated, they often parade lots of scientific studies that seem to support these claims. When skeptics point out the flaws in these studies that undermine their conclusions, this is often seen as cheating, as closed-minded rejection of evidence in favor of CAM. However, critical appraisal, the evaluation of the quality and limitations of scientific studies, is another core principle of evidence-based medicine, and it applies to all research, regardless of the kind of hypothesis under study. Scientists, unlike proponents of CAM, are generally their own toughest critics, because rigorous criticism is necessary to weed out error and eventually uncover the truth about nature.

One of the leading scientists working to identify the weaknesses in published scientific reports, and to develop strategies for correcting them, is John Ioannidis at Stanford University.

Ioannidis, J. Why most published research findings are false. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124.

Despite the inflammatory title of the article, it is actually a tightly reasoned and mathematically rigorous look at specific sources of error in the published scientific literature. Knowing the features that suggest the results of a particular article may not be reliable helps one determine the degree of confidence it is appropriate to have in a particular conclusion. Here are the features Ioannidis identifies as most significant:

The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
The smaller the effect sizes…
The greater the number and the lesser the selection of tested relationships…
The greater the flexibility in designs, definitions, outcomes, and analytical modes…
The greater the financial and other interests and prejudices…
The hotter a scientific field (with more scientific teams involved)…

Many of these variables are well known to be associated with false positive results. And while they reduce our confidence in the literature of many fields, the CAM literature is generally weaker on most of these factors than the literature of mainstream scientific medicine. In particular, CAM studies are seldom replicated, usually involve small numbers of patients, often show very small, marginally significant effects, and involve considerable preconceptions or bias on the part of investigators, which the studies do little to control.
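To make the first items on that list concrete, here is a minimal sketch of the positive predictive value (PPV) formula from the Ioannidis paper, in its simplest form without the bias term: PPV = (1 − β)R / (R − βR + α). Here R is the pre-study odds that a tested relationship is true, α is the type I error rate, and 1 − β is the statistical power. The function name and the example values of R and power are hypothetical, chosen only for illustration; they are not figures taken from the paper.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Chance that a statistically significant finding is actually true.

    Implements PPV = (1 - beta) * R / (R - beta * R + alpha),
    where power = 1 - beta and R = prior_odds.
    """
    true_positive_share = power * prior_odds   # (1 - beta) * R
    false_positive_share = alpha               # alpha per unit of false hypotheses
    return true_positive_share / (true_positive_share + false_positive_share)


# Hypothetical scenarios (illustrative values, not from the paper):
# a well-powered test of a plausible hypothesis versus an underpowered
# test of a long-shot hypothesis.
scenarios = [
    ("plausible, well powered", 1.0, 0.8),
    ("long shot, underpowered", 0.1, 0.2),
]
for label, prior_odds, power in scenarios:
    ppv = positive_predictive_value(prior_odds, power=power)
    print(f"{label}: PPV = {ppv:.2f}")
```

Under these assumed inputs, the same "significant" result is far less trustworthy when the study is small (low power) and the hypothesis was unlikely to begin with.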

Many of these factors hinge on the degree of individual judgment or flexibility allowed in a study. The more choices the investigators have to make in the design, conduct, and analysis of a research study, the more likely their own biases are to influence the results. Another article from a few years ago specifically addresses this issue:

Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011 Nov;22(11):1359-66. doi: 10.1177/0956797611417632. Epub 2011 Oct 17.

The conclusions of this paper reinforce those of Dr. Ioannidis and also emphasize a point I make here frequently: when bias and error creep into scientific research, they are far more likely to create the false impression of a positive result than a negative one. In other words, scientists, like all human beings, see what they want and expect to see, and if a scientific study allows these desires and expectations to influence the results, these results will tend to confirm the investigators’ beliefs even when they are false.

…flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not.

This is, as always, true in all areas, but it is a particular problem in CAM research where investigators almost always begin with strong, nearly unshakeable faith in their beliefs and where research often has poor controls for bias generally.
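One simple way to see how flexibility inflates false positives is the standard calculation for multiple comparisons: if an investigator can try k different analyses and report whichever one reaches significance, the chance of at least one false positive is 1 − (1 − α)^k. The sketch below assumes the k analyses are statistically independent, which is a simplification (real analytic choices are usually correlated, so the actual inflation will differ), and the function name is hypothetical. It is not the simulation method used by Simmons and colleagues.

```python
def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive among num_tests
    independent tests of true null hypotheses, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Illustrative numbers of analyses an investigator might try.
for num_tests in (1, 5, 20):
    rate = familywise_false_positive_rate(num_tests)
    print(f"{num_tests:>2} analyses tried: {rate:.1%} chance of a false positive")
```

Even under this idealized assumption, a nominal 5% error rate grows quickly once several outcomes, subgroups, or stopping points are available to choose from.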

The point, of course, is not to suggest that scientific studies are useless and we should all go back to believing whatever we like based on our own experiences or anecdotes told to us by others. The point is that understanding the nature and severity of the weaknesses in scientific research gives us power: power to avoid excessive confidence in our conclusions and power to correct the weaknesses that reduce this confidence.

Both Dr. Ioannidis and the authors of the article in Psychological Science offer concrete measures for improving the reliability of published research. Simmons and colleagues offer these suggestions:

These measures mostly involve more transparency in the reporting of how studies are conducted. This should both help us identify weaknesses that might reduce confidence in the results and also encourage authors to address these in the design and conduct of the trial, since they know they will have to disclose them later.

In a recent paper, Dr. Ioannidis also offers some advice for improving the quality of published scientific research:

Ioannidis JPA (2014) How to Make More Published Research True. PLoS Med 11(10): e1001747. doi:10.1371/journal.pmed.1001747

His suggestions are wide-ranging, from improved statistical practices to changing the financial and career incentives for scientists. These address both the way in which personal bias influences results and some of the sources of that bias.

Box 1. Some Research Practices that May Help Increase the Proportion of True Research Findings

Large-scale collaborative research
Adoption of replication culture
Registration (of studies, protocols, analysis codes, datasets, raw data, and results)
Sharing (of data, protocols, materials, software, and other tools)
Reproducibility practices
Containment of conflicted sponsors and authors
More appropriate statistical methods
Standardization of definitions and analyses
More stringent thresholds for claiming discoveries or ‘‘successes’’
Improvement of study design standards
Improvements in peer review, reporting, and dissemination of research
Better training of scientific workforce in methods and statistical literacy

Anything human beings do is imperfect, and this applies to science as much as anything. Often, when I suggest that science is a more reliable guide to how nature works, and what is or is not safe and effective medicine, people object that science conflicts with their personal beliefs or experiences, and they trust their gut or their eyes more than controlled data. But these sources of information contain very little in the way of controls for human error. The evidence of history, and the clear improvement in our health and well-being since we began to apply scientific methods, demonstrate that science is dramatically superior to such methods as a means of gathering knowledge.

It is true, however, that even the best methods we have for obtaining knowledge are still imperfect and still leave some room for bias to enter into our conclusions. Relying on science does not mean blindly trusting the results of every single study. Evidence-based medicine requires that we carefully and critically evaluate individual studies and the process of scientific research as a whole, always seeking to identify and reduce error. Merely criticizing or dismissing science as imperfect is not, in itself, useful. Such criticism must be sufficiently specific and focused to allow for strategies of improvement, and must contain at least an implicit recognition that science is still the best tool we have for understanding how nature works.