Monday, December 29, 2014

The crisis in p-hacking

Here is a new article in American Scientist:
The Statistical Crisis in Science

Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don't hold up.

Andrew Gelman, Eric Loken

There is a growing realization that reported “statistically significant” claims in scientific publications are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for “probability”) is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p-value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear.
It is a good article, but one of the authors, Gelman, admits:
Russ correctly noted that the above statement is completely wrong, on two counts:

1. To the extent the p-value measures “confidence” at all, it would be confidence in the null hypothesis, not confidence in the data.

2. In any case, the p-value is not not not not not “the probability that a perceived result is actually the result of random variation.” The p-value is the probability of seeing something at least as extreme as the data, if the model (in statistics jargon, the “null hypothesis”) were true. ...

Russ also points out that the examples in our paper all are pretty silly and not of great practical importance, and he wouldn’t want readers of our article to get the impression that “the garden of forking paths” is only an issue in silly studies.
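To make the corrected definition concrete, here is a minimal simulation sketch in Python. The sample size, effect, and test statistic are hypothetical choices for illustration; the point is that the p-value is the probability, computed assuming the null hypothesis is true, of a statistic at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: a sample of n = 30 values; the null hypothesis
# says they are drawn from a normal distribution with mean 0.
n = 30
observed = rng.normal(loc=0.3, scale=1.0, size=n)  # data with a small true effect
observed_stat = abs(observed.mean())               # two-sided test statistic

# The p-value is the probability, IF the null were true, of a statistic
# at least as extreme as the one observed; it is not "the probability
# that a perceived result is actually the result of random variation."
null_draws = rng.normal(loc=0.0, scale=1.0, size=(100_000, n))
null_stats = np.abs(null_draws.mean(axis=1))
p_value = (null_stats >= observed_stat).mean()

print(f"observed |mean| = {observed_stat:.3f}, simulated p-value = {p_value:.4f}")
```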
I guess it was just an editing error, but it is amazing that a major science magazine can hire expert statisticians to explain what is wrong with p-values and still get the definition wrong in the first paragraph.
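The garden of forking paths itself is easy to demonstrate by simulation. Below is a sketch (the covariates and the menu of comparisons are invented for illustration): every simulated data set is pure noise, yet an analyst who tries a few data-dependent comparisons and reports whichever one clears p < 0.05 ends up with far more than 5 percent false positives.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical forking paths: no real effect exists, but the analyst may
# compare group 0 vs. group 1, young vs. old, or the whole sample vs. zero,
# and reports whichever comparison happens to reach significance.
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    data = rng.normal(size=40)            # pure noise: the null is true
    group = rng.integers(0, 2, size=40)   # an arbitrary binary covariate
    age = rng.integers(0, 2, size=40)     # another arbitrary covariate

    p_values = [
        stats.ttest_ind(data[group == 0], data[group == 1]).pvalue,
        stats.ttest_ind(data[age == 0], data[age == 1]).pvalue,
        stats.ttest_1samp(data, 0.0).pvalue,
    ]
    if min(p_values) < 0.05:              # report the "best" forking path
        false_positives += 1

print(f"nominal rate 0.05, realized rate {false_positives / n_experiments:.3f}")
```

With only three overlapping comparisons the realized false-positive rate already runs well above the nominal 5 percent, which is why "statistically significant" claims produced this way routinely fail to hold up.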
