Statistical Significance

In general, a measure of how confidently an observed event or difference between two or more groups can be attributed to a hypothesized cause. The p-value is the most commonly encountered way of reporting statistical significance. The (frequentist) interpretation of a p-value of 0.05 is that, if you repeated the experiment a very large number of times, you would expect that result, or a more extreme one, 5 per cent of the time by chance alone. More formally, one forms a null hypothesis about what the underlying data or relationships are. The null hypothesis is typically that something is not present: that there is no effect, or that there is no difference between the populations comprising the experimental group and the controls in an experiment. One then calculates the probability of observing those data if the null hypothesis is correct, using an appropriate statistical test (which will depend on the shape of the distribution of the sampled variables). If the p-value is small (0.05 is the conventional threshold), the result is said to be 'statistically significant'; that is, the observed data would be unlikely if the null hypothesis were true. Note that this is not the same as saying the null hypothesis itself is unlikely to be true. The precision of an estimated value is not the same thing as its statistical significance.
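The logic described above can be illustrated with a toy example (hypothetical numbers, not from the text): suppose a coin lands heads 9 times in 10 flips, and the null hypothesis is that the coin is fair. A minimal sketch in Python, using an exact two-sided binomial test:

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided p-value for observing k successes in n trials
    under a null hypothesis of success probability p."""
    # Probability of each possible count under the null hypothesis
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # p-value: total probability of outcomes at least as unlikely
    # as the observed count k
    return sum(pr for pr in probs if pr <= probs[k])

pv = two_sided_binomial_p(9, 10)  # ≈ 0.0215, below the conventional 0.05
```

Because every outcome's null probability is enumerated exactly, no distributional approximation is needed; for continuous data one would instead choose a test matched to the distribution of the sampled variable, as the entry notes.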

Clinical significance and policy significance are entirely different from statistical significance. One can have highly statistically significant estimates of quantities that are wholly irrelevant clinically, biologically or in terms of public policy. One reason is that an effect may be highly statistically significant yet so small in absolute terms as to be of no practical interest. Conversely, an important clinical difference may fail to reach statistical significance, for example in a small study. Cf. Statistical Power.
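The first point, that a tiny effect becomes highly significant given enough data, can be sketched with hypothetical numbers: a mean difference of 0.01 standard deviations between two groups of a million subjects each yields a vanishingly small p-value, though an effect that size would rarely matter clinically. A minimal illustration using a two-sample z-test:

```python
from math import sqrt, erfc

def two_sided_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two
    equal-sized groups with known common standard deviation."""
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference
    z = mean_diff / se                 # z statistic
    return erfc(abs(z) / sqrt(2.0))    # 2 * P(Z >= |z|) under the null

# Hypothetical: a 0.01-SD difference, one million subjects per group
p = two_sided_z_p(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)
# p is on the order of 1e-12: 'highly significant', yet the effect is tiny
```

The same 0.01-SD difference in groups of 100 subjects gives a p-value far above 0.05, which also illustrates the converse point: a real difference can fail to reach significance when the study is underpowered.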