The 0.05 significance threshold for p-values, popularized by Ronald Fisher in the early 20th century, remains a central fixture of statistical practice. It has also become a subject of ongoing debate, with critics frequently arguing that the threshold is subjective, arbitrary, or even misleading.
The claim of arbitrariness is not incorrect in a literal sense. The figure of 0.05 is indeed a convention rather than a mathematically privileged constant. However, this critique misses the central purpose of the cutoff. The 0.05 threshold was never intended as a metaphysical truth about evidence, nor as a categorical endorsement of an alternative hypothesis whenever the line is crossed. Its function is regulatory. It serves as a normative filter for discourse: data that do not meet at least this level of rarity under the null hypothesis are excluded from serious consideration.
The rule says: "results weaker than this shall not even enter the conversation."
It does NOT say: "results at this level must be accepted as true."
Fisher's own words make the point directly:
"Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level." (Ronald Fisher, 1926)
This distinction matters. The 0.05 threshold is not a guarantee of correctness but a convention for excluding noise, much like a quality-control gate. Without such a filter, research would be overwhelmed by spurious correlations and chance findings.
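To make the gate metaphor concrete, here is a minimal sketch of the rule as a pass/fail check. The names (ALPHA, passes_gate) are illustrative only, not any standard library API:

```python
# A minimal sketch of the "quality-control gate" reading of the threshold.
# ALPHA and passes_gate are illustrative names, not a standard API.

ALPHA = 0.05  # conventional cutoff; 1/32 (0.03125) is the stricter variant discussed below

def passes_gate(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True if a result clears the evidential filter.

    Clearing the gate only admits the result into the conversation;
    it does not certify the alternative hypothesis as true.
    """
    return p_value < alpha

# Example: a p-value of 0.04 enters the discussion, 0.2 does not.
print(passes_gate(0.04))  # True
print(passes_gate(0.20))  # False
```

The gate is deliberately dumb: it encodes a minimum standard for attention, nothing more.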
A stronger filter may, in fact, be preferable. One proposal is to set the threshold at 1/32 (p = 0.03125). This value is directly interpretable through a coin-toss analogy: the probability of observing five heads in a row with a fair coin is 1/32. Few would regard four consecutive heads as remarkable; five may be granted a bare modicum of attention, though even that run is hardly striking. By analogy, results that do not reach the 1/32 benchmark should be discarded without further debate or explanation.
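The arithmetic behind the analogy is simple to verify: the probability of n consecutive heads with a fair coin is (1/2)^n, which the short sketch below evaluates for four and five tosses:

```python
# Coin-toss arithmetic behind the proposed 1/32 cutoff (a sketch, not a prescription).
# The probability of n consecutive heads with a fair coin is (1/2)**n.

for n in (4, 5):
    p = 0.5 ** n
    print(f"{n} heads in a row: p = {p} = 1/{2 ** n}")

# Output:
# 4 heads in a row: p = 0.0625 = 1/16   (fails both 0.05 and 1/32)
# 5 heads in a row: p = 0.03125 = 1/32  (sits exactly at the proposed benchmark)
```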
In this framing, the threshold is not about enshrining a mystical significance level but about maintaining a minimum standard of evidence to prevent the proliferation of weak claims. The 0.05 rule is already a pragmatic boundary; adopting a more stringent one, such as 0.03125, may further strengthen the reliability of accepted findings.
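As a rough illustration of that effect, the simulation below (a sketch, relying on the standard fact that p-values are uniformly distributed when the null hypothesis is true) counts how many purely-null results slip past each gate: roughly 5% at the conventional cutoff versus roughly 3.1% at 1/32.

```python
# Illustrative simulation: how many null-only results pass each threshold.
# Under a true null hypothesis, a well-calibrated test yields uniform p-values,
# so the pass rate is simply the threshold itself (up to sampling noise).
import random

random.seed(0)
n_tests = 100_000
p_values = [random.random() for _ in range(n_tests)]

for alpha in (0.05, 1 / 32):
    passed = sum(p < alpha for p in p_values)
    print(f"alpha = {alpha:.5f}: {passed} of {n_tests} null results pass "
          f"(~{passed / n_tests:.1%})")
```

The difference looks modest per test, but across thousands of published comparisons it translates into a substantially smaller flow of chance findings entering the literature.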