Maximum likelihood estimation of the normal distribution

The probability density function of the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Suppose we have the following $n$ i.i.d. observations: $x_1, x_2, \ldots, x_n$. Because they are independent, the likelihood of observing these data is:

$$f(x_1, x_2, \ldots, x_n \mid \sigma, \mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$$

$$\begin{aligned}
\log f(x_1, x_2, \ldots, x_n \mid \sigma, \mu) &= \log\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}\right] \\
&= n\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
&= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
\end{aligned}$$
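As a quick sanity check on this closed form, the sketch below (with an arbitrary made-up sample and arbitrary parameter values) compares the sum of per-observation log densities against the simplified expression; the two should agree to floating-point precision:

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log density of N(mu, sigma^2) at x
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(xs, mu, sigma):
    # Closed form: -n/2 log(2*pi) - n log(sigma) - sum((x-mu)^2) / (2 sigma^2)
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma) - ss / (2 * sigma ** 2)

xs = [1.2, -0.3, 0.8, 2.1, 0.0]   # arbitrary sample
direct = sum(normal_logpdf(x, 0.5, 1.3) for x in xs)
closed = log_likelihood(xs, 0.5, 1.3)
print(abs(direct - closed) < 1e-9)   # prints True
```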

Let's write $L$ for $\log f(x_1, x_2, \ldots, x_n \mid \sigma, \mu)$, then set

$$\frac{\partial L}{\partial \mu} = -\frac{1}{2\sigma^2}\frac{\partial}{\partial \mu}\sum_{i=1}^{n}(x_i-\mu)^2 = 0$$

Solving this equation, we get $-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(2\hat\mu - 2x_i) = 0$.

Because $\sigma^2$ is larger than zero, $\hat\mu = \frac{\sum_{i=1}^{n} x_i}{n}$.
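The result says the sample mean maximizes the likelihood in $\mu$. A minimal numerical check (arbitrary made-up sample, $\sigma$ fixed at 1 for illustration) confirms that perturbing $\hat\mu$ in either direction only lowers the log-likelihood:

```python
import math

xs = [1.2, -0.3, 0.8, 2.1, 0.0]   # arbitrary sample
mu_hat = sum(xs) / len(xs)        # MLE of mu: the sample mean

def log_lik_mu(mu, sigma=1.0):
    # Log-likelihood as a function of mu, with sigma held fixed
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

# mu_hat should beat any perturbed value
print(all(log_lik_mu(mu_hat) >= log_lik_mu(mu_hat + d)
          for d in (-1.0, -0.1, 0.1, 1.0)))   # prints True
```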

Similarly, let

$$\frac{\partial L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^3} = 0$$

I realized that it would be better to derive the maximum likelihood estimator of $\sigma^2$ instead of $\sigma$. Thus

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\hat\mu)^2}{n}$$
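This divide-by-$n$ estimator is exactly the population variance of the sample. A small sketch (arbitrary made-up data) checks the formula against the standard library's `statistics.pvariance`, which also divides by $n$:

```python
import statistics

xs = [1.2, -0.3, 0.8, 2.1, 0.0]   # arbitrary sample
mu_hat = sum(xs) / len(xs)

# MLE of sigma^2: average squared deviation from the sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

# Agrees with the divide-by-n (population) variance, not the divide-by-(n-1) sample variance
print(abs(sigma2_hat - statistics.pvariance(xs)) < 1e-9)   # prints True
```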

But this MLE of $\sigma^2$ is biased. A point estimator $\hat\theta$ is said to be an unbiased estimator of $\theta$ if $E(\hat\theta) = \theta$ for every possible value of $\theta$. If $\hat\theta$ is not unbiased, the difference $E(\hat\theta) - \theta$ is called the bias of $\hat\theta$.

We know that $\sigma^2 = \operatorname{Var}(X) = E(X^2) - (E(X))^2$, so $E(X^2) = \operatorname{Var}(X) + (E(X))^2$.

Then

$$\begin{aligned}
E(\hat\sigma^2) &= \frac{1}{n} E\left(\sum_{i=1}^{n}(x_i-\hat\mu)^2\right) = \frac{1}{n} E\left(\sum x_i^2 - n\hat\mu^2\right) = \frac{1}{n} E\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \\
&= \frac{1}{n}\left\{\sum E(x_i^2) - \frac{1}{n} E\left[\left(\sum x_i\right)^2\right]\right\} \\
&= \frac{1}{n}\left\{n(\sigma^2 + \mu^2) - \frac{1}{n}\left[n\sigma^2 + (n\mu)^2\right]\right\} \\
&= \frac{1}{n}\left\{n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right\} = \frac{n-1}{n}\sigma^2 \neq \sigma^2
\end{aligned}$$

The bias is $E(\hat\sigma^2) - \sigma^2 = -\frac{\sigma^2}{n}$. In fact, the unbiased estimator of $\sigma^2$ is $s^2 = \frac{\sum_{i=1}^{n}(x_i-\hat\mu)^2}{n-1}$. But the fact that $s^2$ is unbiased does not imply that $s$ is unbiased for estimating $\sigma$: the expected value of the square root is not the square root of the expected value. Fortunately, the bias of $s$ is small unless the sample size is very small, so there are good reasons to use $s$ as an estimator of $\sigma$.
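The bias result can be seen empirically. The Monte Carlo sketch below (sample size, trial count, and seed are all arbitrary choices) repeatedly draws $n = 5$ observations from a standard normal and averages both estimators; the divide-by-$n$ estimator should come out near $\frac{n-1}{n}\sigma^2 = 0.8$, while $s^2$ should come out near $\sigma^2 = 1$:

```python
import random

random.seed(0)
n, trials = 5, 100_000
mu_true, sigma_true = 0.0, 1.0

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu_true, sigma_true) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased_sum += ss / n          # MLE: divide by n
    unbiased_sum += ss / (n - 1)  # s^2: divide by n - 1

print(round(biased_sum / trials, 1))    # ≈ (n-1)/n * sigma^2 = 0.8
print(round(unbiased_sum / trials, 1))  # ≈ sigma^2 = 1.0
```

Averaged over many trials, the gap between the two estimators is exactly the $-\sigma^2/n$ bias derived above.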