Originally published October 11, 2016
Here is an excerpt:
What matters to a scientific observer is how often you’ll be wrong if you claim that an effect is real, rather than being merely random. That’s a question of induction, so it’s hard. In the early 20th century, it became the custom to avoid induction, by changing the question into one that used only deductive reasoning. In the 1920s, the statistician Ronald Fisher did this by advocating tests of statistical significance. These are wholly deductive and so sidestep the philosophical problems of induction.
Tests of statistical significance proceed by calculating the probability of making our observations (or more extreme ones) if there were no real effect. This isn’t an assertion that there is no real effect, but rather a calculation of what would be expected if there were no real effect. The postulate that there is no real effect is called the null hypothesis, and the probability is called the p-value. Clearly the smaller the p-value, the less plausible the null hypothesis, so the more likely it is that there is, in fact, a real effect. All you have to do is decide how small the p-value must be before you declare that you’ve made a discovery. But that turns out to be very difficult.
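To make the calculation described above concrete, here is a minimal sketch in Python. The coin-flipping scenario, the numbers, and the function name `one_sided_p_value` are illustrative assumptions and do not come from the excerpt. The null hypothesis is that the coin is fair, and the p-value is the probability of seeing the observed number of heads, or more, if that were true.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """Probability of at least `heads` heads in `flips` flips of a fair coin.

    Hypothetical helper for illustration: the fair coin plays the role of
    the null hypothesis ("no real effect").
    """
    if not 0 <= heads <= flips:
        raise ValueError("heads must be between 0 and flips")
    # Count the outcomes at least as extreme as the one observed...
    extreme = sum(comb(flips, k) for k in range(heads, flips + 1))
    # ...out of the 2**flips equally likely outcomes under the null.
    return extreme / 2**flips


# Assumed example data: 60 heads in 100 flips.
p = one_sided_p_value(60, 100)
print(f"p = {p:.4f}")
```

With these assumed numbers the p-value comes out at roughly 0.028. Note that this is the probability of the data given a fair coin, not the probability that the coin is fair, which is why deciding what to conclude from it is the hard part.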