2 min read

To P or not to P? This is a question.

Murtaugh, P.A. (2014). In defense of P values. Ecology, 95, 611–617.

Burnham, K.P. & Anderson, D.R. (2014). P values are only an index to evidence: 20th- vs. 21st-century statistical science. Ecology, 95, 627–630.

Ecologists usually argue that using confidence intervals and/or differences in AIC is better than using p-values since p-values usually lead to binary decisions (yes/no). However, this is mainly because of issues from procedures of practitioners rather from mathematical sound. It is straightforward that p-value and confidence intervals are calculated in a similar way. In addition, Murtaugh (2014) demonstrated that p-value and delta AIC actually have one-to-one relationships when we talk about nested linear models with normal errors (Fig. 2 in the paper). p value and AIC
As a result, most criticisms of p-value also apply with delta AIC and confidence intervals. The choice of cutoffs when one uses delta AIC in model selections is as arbitrary as the choice of significance levels of p-value. One advantage of delta AIC is that we can use it to compare non-nested models, where p-value is of no help. Based on these, Murtaugh argued that:

  • P-values, confidence intervals and delta AIC all have their places in modern statistical practice.
  • Instead of debating which one is the best, we should pay more attention on experimental designs.
  • When report p-value, always report the estimated effect size.

On the other hand, Burnham and Anderson (2014) disagreed with Murtaugh (2014), opening with: >We were surprised to see a paper defending P-values and significance testing at this time in history.

They argued that p-values are conditional on the null hypothesis being true and include probabilities of data (theoretical distribution) never observed. As a result, the sample spaces of the observed data and the theoretical data are “irrelevant”, which violates the likelihood principle. They also criticism that the fig.2 above only apply to the simplest situation: nested linear model differing by only one parameter. However, in real life, we usually need to handle with more complex issues.

In their opinion, R. A. Fisher went too far when he (and others) showed that the differences of log likelihood was distributed asymptotically as chi square distribution. Inferential data analysis given the data should be based directly on the likelihood ratio. There is no need for a null hypothesis, asymptotic distribution of a test statistics, alpha level, p-value and statistical significance.

The goal of 21st century is to base inference on all the models weighted by their model probabilities (model averaging).