If you can’t see the image, these are the steps you should follow to commit a common FAIL in statistics:
1. Look at the data.
2. Make a hypothesis based upon what you've observed.
3. Run standard statistical tests as if you had chosen your hypothesis independent of the data.
4. Publish your paper without mention of step #2 (i.e., that your analysis was data-dependent). The sketch below simulates exactly this recipe.
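To make the damage concrete, here is a minimal simulation. This is my sketch, not anything from the original image or from Gelman and Loken: the group count, sample size, and seed are arbitrary choices of mine. Every group is drawn from the same null distribution, so any "significant" difference is a false positive by construction.

```python
# A minimal simulation of the "look first, test later" FAIL.
# Assumptions (mine): 8 groups of n=20, all drawn from the same
# N(0, 1) null, across 10,000 simulated "studies".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_groups, n = 10_000, 8, 20
alpha = 0.05

honest_rejections = 0  # hypothesis fixed before seeing the data
forked_rejections = 0  # hypothesis chosen after peeking at the data

for _ in range(n_sims):
    # No true effect anywhere: every group comes from the same distribution.
    groups = rng.normal(0.0, 1.0, size=(n_groups, n))
    means = groups.mean(axis=1)

    # Pre-registered analysis: always compare groups 0 and 1.
    if stats.ttest_ind(groups[0], groups[1]).pvalue < alpha:
        honest_rejections += 1

    # Data-dependent analysis: "notice" the most extreme pair in the
    # data, then test it as if it had been chosen ahead of time.
    hi, lo = np.argmax(means), np.argmin(means)
    if stats.ttest_ind(groups[hi], groups[lo]).pvalue < alpha:
        forked_rejections += 1

print(f"pre-registered false-positive rate: {honest_rejections / n_sims:.3f}")  # ~0.05
print(f"data-dependent false-positive rate: {forked_rejections / n_sims:.3f}")  # far above 0.05
```

The pre-registered comparison rejects at roughly the nominal 5% rate; the peek-first analysis rejects many times more often, even though nothing real is going on.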
While not all statistics fails are of this kind, many are. The replication crisis is built, in part, on this problem. As Gelman and Loken put it in their article:

> Unwelcome though it may be, the important moral of the story is that the statistically significant p-value cannot be taken at face value — even if it is associated with a comparison that is consistent with an existing theory.
The problem is not p-values in themselves. They are just a tool, and while we can argue over whether it is the right tool for the job, that argument would miss the point here: lack of openness in the process of science leads to misleading, even outright false, claims being taken as "truth" by those of us reading the papers later.
Often, researchers are NOT doing this consciously! They are simply falling prey to the downsides of a bad method they are hardly aware they are using.
Openness and transparency about every step of the process are the key. Data-dependent analysis is sometimes appropriate, but readers should know how the process was conducted end-to-end. This is similar to the push for good documentation in computer programming, so that the next person isn't totally lost when reading your code. Science, like programming, is algorithmic: it's a process. The more documentation we have of how each step was taken, the better.
Nothing in science should be taken at face value. That goes for the entire process, not just p-values.
Now go lift something heavy,