A study published in the now-defunct Journal of Serendipitous and Unexpected Results described an experiment in which researchers showed several pictures of people interacting to a subject who sat inside an fMRI machine that measured its brain activity.
What makes the experiment somewhat unusual is that the subject was an Atlantic Salmon — 18 inches long, 3.8 pounds and dead, very dead. After taking measurements, the scientists cooked and ate it.
And yet, even this dead fish produced results suggesting that its brain actively responded to the stimuli with 99.9 percent certainty.
These results are, to put it mildly, absolutely absurd, yet the study has taken on a life of its own, earning accolades from other scientists as well as the prestigious Ig Nobel Prize, which honors achievements that “first make people laugh, then make them think.” That’s because the point of the experiment had nothing to do with measuring the brain waves of dead fish: it was a demonstration that researchers can’t blindly trust statistical methods.
In this particular case, the fMRI generates such an astounding amount of data, about 130,000 points per measurement, that some of the noise will appear to be signal and produce apparent correlations out of pure coincidence.
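The arithmetic of this trap is easy to see in simulation. The sketch below (my numbers, loosely inspired by the fMRI setup, not taken from the actual study) tests 130,000 "voxels" of pure random noise and counts how many clear a p &lt; 0.001 threshold anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: 130,000 independent "voxels" of pure noise,
# 20 scans each, tested against zero at an uncorrected p < 0.001
# threshold. No real signal exists anywhere in this data.
n_voxels, n_scans, alpha = 130_000, 20, 0.001

noise = rng.normal(size=(n_voxels, n_scans))
t, p = stats.ttest_1samp(noise, popmean=0.0, axis=1)

false_positives = int((p < alpha).sum())
print(false_positives)  # roughly alpha * n_voxels, i.e. on the order of 130
```

Even at a threshold a reader might consider stringent, chance alone hands back over a hundred "active" voxels, which is why fMRI analyses require corrections for multiple comparisons.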
The same kind of problem permeates research, often without scientists realizing it. In order to publish, scientists generally need to run statistical tests that produce a probability value, referred to as a p-value, below .05. This indicates that there is less than a 5 percent chance of getting results at least as extreme if there is no real relationship between the two things being compared.
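That definition can be checked directly by simulation. In this sketch (parameters are my own choices), two groups are drawn from the same distribution, so any "difference" detected is spurious, and the test flags one about 5 percent of the time, exactly as advertised:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two groups drawn from the SAME distribution: no real difference exists.
# Run many such null experiments and see how often p dips below .05.
n_experiments, n_per_group = 2_000, 30
hits = 0
for _ in range(n_experiments):
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    hits += p < 0.05

print(hits / n_experiments)  # hovers near 0.05, as the definition promises
```

A 5 percent false-alarm rate sounds tolerable for one test; the dead fish shows what happens when that same 5 percent is applied across thousands of tests at once.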
But formulas can’t take human error or potential bias into account.
With the dead fish study, the scientists didn’t state in advance what they expected to find; they simply went looking and found a result where none existed. This is like throwing a dart at the wall and drawing a target around it.
The extent of the problem can best be summarized by the title of a theory paper published in 2005: “Why Most Published Research Findings Are False.” Nearly 10 years old, the paper still ranks as the second-most viewed on the PLOS Medicine website. As the paper explains, if researchers don’t take into account how likely a hypothesis is to be true before running an experiment, they will produce a lot of results that fizzle out upon closer inspection.
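The paper's core argument boils down to a few lines of arithmetic. In this sketch the prior, power, and significance level are illustrative values of my own choosing, not numbers from the paper:

```python
# Suppose only 1 in 10 hypotheses researchers test is actually true.
prior = 0.10   # fraction of tested hypotheses that are true
power = 0.80   # chance an experiment detects a true effect
alpha = 0.05   # chance of a false positive when there is no effect

true_positives = prior * power          # 0.08 of all experiments
false_positives = (1 - prior) * alpha   # 0.045 of all experiments

# Of everything that comes out "significant," what fraction is real?
ppv = true_positives / (true_positives + false_positives)
print(round(ppv, 2))  # 0.64 -- over a third of the "findings" are false
```

Shrink the prior or the power a little and the positive predictive value falls below one half, which is the sense in which most published findings can be false even when every individual test is run honestly.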
The science community is slowly becoming aware of how bad this “p-hacking” problem is. Studies have looked at results across several journals and discovered a much larger clump of papers with results on or around that magic “.05” number than we would expect by chance alone. And this clump is larger than it was 40 years ago, partially because computers are more readily available now. Rather than having to break out the slide rule or reserve processor time, scientists can calculate p-values on their laptops while taking data, stopping early if they get the results they want or throwing out data points that push the number too high. Some have even cynically described the results of several papers as little more than a fairly accurate measure of the researchers’ bias.
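How badly does "stopping early if they get the results they want" distort things? The sketch below (a hypothetical peeking protocol with batch sizes I chose for illustration) collects data on a nonexistent effect, re-tests after every batch, and stops the moment p drops below .05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical "peeking" protocol: sample a null effect in batches,
# re-run the test after every batch, and stop as soon as p < .05.
def peeking_experiment(max_n=100, batch=10, alpha=0.05):
    data = []
    while len(data) < max_n:
        data.extend(rng.normal(size=batch))  # pure noise, no real effect
        _, p = stats.ttest_1samp(data, popmean=0.0)
        if p < alpha:
            return True  # a "significant" result gets claimed
    return False

false_alarm_rate = sum(peeking_experiment() for _ in range(1_000)) / 1_000
print(false_alarm_rate)  # well above the nominal 5 percent
```

Giving yourself ten chances to cross the threshold roughly triples the false-positive rate, even though every individual test is computed correctly.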
For this reason, it is important for those outside the scientific community to take newspaper headlines with a healthy dose of skepticism. Just because scientists found a result doesn’t make it so. Also, the question shouldn’t be “Is there an effect?” It should be “What is the size of the effect?” Overpowered studies can produce statistical certainty for minute effect sizes, but if an effect is so small that it takes millions of subjects to uncover, even if it is genuine, how significant can it really be?
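The gap between statistical and practical significance is easy to demonstrate. In this sketch I assume a genuine but minuscule effect of 0.02 standard deviations and a sample of a million subjects, both numbers invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A real but tiny effect: the true mean sits 0.02 standard
# deviations away from zero. With a million subjects, the test
# detects it with overwhelming statistical confidence.
effect, n = 0.02, 1_000_000
sample = rng.normal(loc=effect, size=n)
t, p = stats.ttest_1samp(sample, popmean=0.0)

print(p < 0.05)        # True: unambiguously "significant"
print(sample.mean())   # yet the effect itself is practically negligible
```

The headline would read "effect found"; the effect size, a fiftieth of a standard deviation, is the number that actually matters.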
One of science’s greatest strengths is that it isn’t decided by a single experiment but by many that converge on the same answer. If several experiments manage to replicate the results of the dead fish study, maybe we need to rethink our views on the thought processes of post-mortem salmon. Until then the only major conclusion the research can lead us to is that when properly prepared, store-bought fish can be quite tasty.