Why ANOVA is not the choice for non-normal data

Reading notes on Stroup, Walter W., “Rethinking the Analysis of Non-Normal Data in Plant and Soil Science”, Agronomy Journal 107, 2 (2015), pp. 811.

Some history: Fisher and Mackenzie (1923) published the first ANOVA results. Nelder and Wedderburn (1972) introduced generalized linear models, a major departure in how non-normal data are approached. Breslow and Clayton (1993) and Wolfinger and O’Connell (1993) integrated mixed model and generalized linear model theory and methods. The following two decades saw intense development of GLMM theory and methods.

ANOVA rests on three assumptions: independent observations (vs correlated observations), normally distributed data (vs non-normal data), and homogeneous variance (vs heterogeneous variance). However, non-normal data are common in practice, e.g. counts (Poisson or negative binomial), time to flowering (exponential or gamma), continuous proportions such as leaf area affected (beta), and quadrats affected out of n quadrats observed (binomial). For all of these non-normal distributions, the variance depends on the mean. Thus, if data are non-normal, chances are their variances are not homogeneous. Traditionally, the Central Limit Theorem is invoked: it assures that the sampling distribution of means will be approximately normal if the sample size is large enough. Standard variance-stabilizing transformations are then used to deal with heterogeneous variances, e.g. log(count + 1), sqrt(small_count + 3/8), count^(2/3), asin(sqrt(proportion)). GLMMs extend linear model theory to accommodate data that may be non-normal, have heterogeneous variance, and be correlated. From the GLMM point of view, ANOVA is antiquated or even obsolete.
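The mean-variance link and the effect of a square-root transformation can be checked with a small simulation. The following is a minimal sketch in Python using only the standard library. The Poisson sampler, the means 2, 8 and 32, the sample size, and the seed are illustrative assumptions, not values from Stroup (2015).

```python
import math
import random
import statistics


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method.

    Adequate for the small means used here; not meant for large mu.
    """
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def summarize(mu, n=20000, seed=1):
    """Simulate n counts and compare raw and sqrt-transformed variances."""
    rng = random.Random(seed)
    counts = [poisson_draw(rng, mu) for _ in range(n)]
    # sqrt(count + 3/8) is one of the transformations listed in the notes.
    sqrt_counts = [math.sqrt(c + 3 / 8) for c in counts]
    return {
        "mean": statistics.fmean(counts),
        "var_raw": statistics.variance(counts),
        "var_sqrt": statistics.variance(sqrt_counts),
    }


# Hypothetical treatment means, chosen only to span a wide range.
results = {mu: summarize(mu) for mu in (2, 8, 32)}
for mu, r in results.items():
    print(mu, round(r["mean"], 2), round(r["var_raw"], 2), round(r["var_sqrt"], 3))
```

In this sketch the raw variance should track the mean, while the variance of the transformed counts should stay roughly constant across means. That is all a variance-stabilizing transformation promises; it says nothing about what the transformed means estimate.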

Stroup (2015) showed that, for count data, ANOVA on untransformed data, ANOVA on log- or sqrt-transformed data, and the GLMM all control Type I error adequately, but the GLMM has more power to detect treatment differences. For discrete proportion data, untransformed ANOVA yields estimates of the marginal \(p_i\) but not the correct standard errors; the GLMM yields estimates of the conditional \(p_i\) with correct standard errors; and the arcsine-transformed ANOVA provides estimates of neither.

Take a binomial example: the ith treatment in the jth block has \(N_{ij}\) yes-no observations, with probability \(p_{ij}\) of a yes response on any given observation in the ijth unit. Three distributions are relevant to the analysis of these experimental data.

1. The distribution of block effects (random effects). Blocking is a design strategy to ensure that units within blocks are as similar as possible. Variability among blocks is expected, and we assume the blocks are representative of the blocks we could have used. Thus variation among blocks is assumed to follow a normal distribution: \(b_j\sim NI(0,\sigma_{B}^{2})\) (normally and independently distributed).
2. The distribution at the unit level: observations in the ijth unit are \(Binomial(N_{ij},p_{ij})\). This distribution is conditional on the random effects: \(y_{ij}|b_j\sim Binomial(N_{ij},p_{ij})\), i.e. the observations, given that they are in the jth block, are binomially distributed with parameters \(N_{ij}\) and \(p_{ij}\).
3. The distribution of the data actually observed: the marginal distribution. When we say we have binomial data, we are referring to the distribution of the observations conditional on the ijth unit. The distribution of the observed data, i.e. the marginal distribution, is most likely not binomial.

We cannot observe the first two distributions directly; the only one we observe is the third. This is not an issue if the first two are normal, as the third will then also be normal. For non-normal data, however, the marginal distribution of the observed data can be quite different from the conditional distribution we have in mind, and our usual intuitions can mislead us. The fundamental problem of analyzing non-normal data is that what we want to estimate or test (in this example, the treatment effects on \(p_{ij}\) of binomial data) involves parameters of distributions that we cannot directly observe. In other words, the information we want is camouflaged in a complex observed marginal distribution. GLMMs can extract that information from the observations we have; ANOVA and regression cannot.
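To see how far the observed marginal distribution can drift from a binomial, the three-layer setup can be simulated directly. This is a rough sketch under stated assumptions: a logit link, one unit per block, and made-up values for the conditional probability, the block standard deviation, and N. None of these come from Stroup (2015).

```python
import math
import random
import statistics


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_marginal(p_cond=0.2, sigma_b=1.0, n_trials=20, n_units=20000, seed=7):
    """Simulate y | b ~ Binomial(n_trials, expit(eta + b)) with b ~ N(0, sigma_b^2).

    Assumes a logit link and one unit per block (both illustrative choices).
    Returns the marginal proportion, the observed variance of y, and the
    variance a plain binomial with that marginal proportion would have.
    """
    rng = random.Random(seed)
    eta = math.log(p_cond / (1.0 - p_cond))  # linear predictor at b = 0
    ys = []
    for _ in range(n_units):
        b = rng.gauss(0.0, sigma_b)  # unobserved block effect
        p = expit(eta + b)           # unobserved conditional probability
        ys.append(sum(rng.random() < p for _ in range(n_trials)))
    p_marg = statistics.fmean(ys) / n_trials
    observed_var = statistics.variance(ys)
    binomial_var = n_trials * p_marg * (1.0 - p_marg)
    return p_marg, observed_var, binomial_var


p_marg, observed_var, binomial_var = simulate_marginal()
print(round(p_marg, 3), round(observed_var, 2), round(binomial_var, 2))
```

Under these assumed settings the observed counts should be clearly more variable than a binomial with the same mean proportion, and the marginal proportion should sit above the conditional value of 0.2. Only the simulated `ys` correspond to what an experimenter would actually see.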

The GLMM conditional estimate asks: “if I take an average member of the population, meaning a member whose block effect is \(b_j=0\), what is the estimated binomial probability?” (think of the median). The marginal estimate asks: “if I average across all the members of the population, what is the mean proportion?” (think of the mean). Which one to use depends on your question.
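The gap between the two estimates can be made concrete by averaging the conditional probability over the block-effect distribution. The sketch below assumes a logit link and uses a plain Riemann sum over a normal density; the function name `marginal_probability`, the grid settings, and the conditional probability of 0.1 are all hypothetical choices for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_probability(eta, sigma_b, n_grid=4001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma_b^2).

    Uses a normalized Riemann sum on a standard-normal grid from
    -width to +width; a simple approximation, not a production integrator.
    """
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(n_grid):
        z = -width + i * step
        w = math.exp(-0.5 * z * z)  # unnormalized normal density
        total += w * expit(eta + sigma_b * z)
        weight_sum += w
    return total / weight_sum


eta = math.log(0.1 / 0.9)   # linear predictor giving a conditional p of 0.1
conditional = expit(eta)    # probability for a block with b_j = 0
for sigma_b in (0.0, 0.5, 1.0, 2.0):
    print(sigma_b, round(conditional, 4), round(marginal_probability(eta, sigma_b), 4))
```

With no block variance the two quantities coincide. As the assumed block variance grows, the marginal probability is pulled toward 0.5 while the conditional probability stays put. With a monotone link such as the logit, the conditional value is the median of the block-specific probabilities, which is why the notes compare it to a median and the marginal value to a mean.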

Stroup (2015) argues that, for binomial data, ANOVA with or without transformation should be considered unacceptable for publication. If the marginal mean best addresses the research objectives, the correct approach requires an alternative formulation of the GLMM, namely generalized estimating equations (GEEs, Zeger et al. 1988). A GEE replaces the random effects in the linear predictor with a working variance and correlation structure and replaces the distribution with a quasi-likelihood. Assuming equal N for all experimental units, the beta GLMM is the preferred method if the marginal mean is the appropriate target. For unequal N, use the GEE.

In sum, Stroup’s (2015) main take-home message: for non-normal data, ANOVA, with or without transformation, won’t work. The loss of accuracy and power is too great. GLMMs and, in some cases, GEEs are the methods of choice.