Science journals commonly use p-values, a statistical index used in analyzing research data, to decide whether papers are worthy of publication. However, even scientists frequently misuse p-values, according to a statement released by the American Statistical Association.
Contrary to popular belief, a p-value does not measure whether a hypothesis is true or whether the data were produced by random chance alone.
Neither a p-value nor statistical significance measures how large an effect is or whether a result is important. The statement also stressed that no scientific conclusion, business decision or policy decision should rely solely on whether a p-value passes a specific threshold.
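To make that concrete, here is a minimal sketch in Python using simulated data. It is not drawn from the ASA statement or any real study; it simply generates thousands of experiments in which there is no real effect by construction, and roughly 5 percent of them still clear the conventional 0.05 cutoff, which is why a small p-value on its own cannot certify that a hypothesis is true or that the data were not produced by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_alarms = 0

for _ in range(n_experiments):
    # Both groups come from the same distribution, so the null
    # hypothesis of "no difference" is true by construction.
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:
        false_alarms += 1

# Roughly 5 percent of experiments are "significant" by chance alone.
print(f"Experiments with p < 0.05 despite no real effect: "
      f"{false_alarms / n_experiments:.1%}")
```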
“No single index should substitute for scientific reasoning,” the statement concluded.
Donald Berry, a biostatistics professor at the University of Texas MD Anderson Cancer Center in Houston who contributed to the statement, wrote in a commentary that the misuse of p-values has had real medical and economic consequences.
Berry said almost everyone has difficulty understanding p-values, and that many people do not realize they are misinterpreting them.
“If you were to imagine the best scientist in the world, [winner of multiple] Nobel prizes, walking with the best reporter in the world — and I know candidates for both — and I ask them what a p-value is, they would tell me, and I would say it’s wrong,” Berry said. “It’s wrong in a very serious way that is detrimental to the field. [A p-value] is fundamentally un-understandable.”
Ronald Wasserstein, executive director of the American Statistical Association and lead author of the statement, said this misunderstanding often leads to decisions that accept certain methodologies as better when they are not and that reject promising methodologies because they fail to meet certain requirements.
Wasserstein said a p-value is a commonly used metric that summarizes the strength of evidence in an experiment, while statistical significance indicates that a p-value is small enough to suggest a notable result. He said, however, that statistical significance has unfortunately become an arbitrary threshold for deciding whether research is publishable.
“There is a difference between something being statistically significant and scientifically meaningful,” Wasserstein said.
Wasserstein gave an example of a survey that attempted to find out whether students in two different degree programs spend different amounts of time in the library. After interviewing students in program A and program B and estimating how much time students in each program spend in the library, Wasserstein said his statistical analysis produced a small p-value. The finding was therefore statistically significant, meaning there is a real difference between the time spent in the library by students in program A and those in program B.
However, students in program A spend only 11 more minutes per week in the library, on average, than those in program B.
“You have to ask yourself, ‘Even if this is statistically significant, does the [11-minute difference] really matter?’” Wasserstein said. “‘Does that [finding] really accomplish anything? How accurate are the data to begin with anyway?’”
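Wasserstein's library example is easy to reproduce with simulated numbers. The Python sketch below assumes an 11-minute average gap, with sample sizes and standard deviations chosen purely for illustration (they are not from the actual survey); with a couple of thousand students per program, the p-value comes out tiny even though the practical difference is negligible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical weekly library time in minutes; all numbers are
# illustrative assumptions, not the survey's actual data.
program_a = rng.normal(loc=311, scale=60, size=2000)
program_b = rng.normal(loc=300, scale=60, size=2000)

t_stat, p_value = stats.ttest_ind(program_a, program_b)

print(f"Mean difference: {program_a.mean() - program_b.mean():.1f} minutes per week")
print(f"p-value: {p_value:.1e}")
# With samples this large, an ~11-minute gap is easily "statistically
# significant," but whether it matters scientifically is a separate question.
```

The point of the sketch is not the particular numbers but that, once samples are large enough, sample size rather than practical importance is what drives the p-value toward zero.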
Michael Mahometa, lead statistical consultant and lecturer for the Department of Statistics and Data Sciences, citing a principle in the ASA statement, said scientists may be able to mitigate the issue with transparency in scientific reporting and openness in research data.
“P-value can be, as we have seen, misinterpreted — but it’s really not the p-value’s fault,” Mahometa said. “What about all those findings that are not reported because they were ‘not significant’? [Those results] still tell part of the scientific story, yet for some reason they have been deemed unimportant.”