The probability density function of the normal distribution is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
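Just to make the formula concrete, here is a quick Python sketch of this density (the function name `normal_pdf` is my own choice; the comparison against `scipy.stats.norm.pdf` is only a sanity check):

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-3, 3, 7)
assert np.allclose(normal_pdf(xs, mu=0.5, sigma=2.0),
                   norm.pdf(xs, loc=0.5, scale=2.0))
```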
Suppose we have the following $n$ i.i.d. observations: $x_1, x_2, \ldots, x_n$.
Because they are independent, the likelihood of having observed
these data is:
$$f(x_1, x_2, \ldots, x_n \mid \sigma, \mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$$
$$\begin{aligned}
\log f(x_1, x_2, \ldots, x_n \mid \sigma, \mu) &= \log\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}\right] \\
&= n\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
&= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
\end{aligned}$$
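Here is a small Python sketch of this simplified log-likelihood, checked against a direct sum of log-densities (the names `log_likelihood` and `x` are my own choices):

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    """L = -n/2*log(2*pi) - n*log(sigma) - sum((x - mu)^2) / (2*sigma^2)."""
    n = len(x)
    return (-n / 2 * np.log(2 * np.pi)
            - n * np.log(sigma)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100)

# Direct computation: sum of log-densities at mu = 1, sigma = 2.
direct = np.sum(np.log(np.exp(-(x - 1.0) ** 2 / 8.0) / (2.0 * np.sqrt(2 * np.pi))))
assert np.isclose(log_likelihood(x, 1.0, 2.0), direct)
```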
Let’s denote $\log f(x_1, x_2, \ldots, x_n \mid \sigma, \mu)$ by $L$,
and set:
$$\frac{dL}{d\mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}\frac{\partial (x_i-\mu)^2}{\partial \mu} = 0$$
Solving this equation, we get
$$\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(2\hat{\mu} - 2x_i\right) = 0$$
Because $\sigma^2$ is strictly greater than zero, the factor $\frac{1}{2\sigma^2}$ can be divided out, which gives
$$\hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}$$
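As a quick numerical sanity check (reusing `log_likelihood` and the sample `x` from the sketch above), the log-likelihood at the sample mean should beat any perturbed value of $\mu$:

```python
# L is a downward parabola in mu, so the sample mean must dominate
# any perturbed mu regardless of which sigma we plug in.
mu_hat = np.mean(x)
for eps in (-0.5, -0.1, 0.1, 0.5):
    assert log_likelihood(x, mu_hat, 2.0) > log_likelihood(x, mu_hat + eps, 2.0)
```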
Similarly, let
$$\frac{dL}{d\sigma} = -\frac{n}{\sigma} + \sum_{i=1}^{n}(x_i-\mu)^2\,\sigma^{-3} = 0$$
I realized that it would be better to get the maximum likelihood estimator
of $\sigma^2$ instead of $\sigma$. Thus
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \hat{\mu})^2}{n}$$
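To double-check both closed-form estimators, we can compare them with `scipy.stats.norm.fit`, which also computes maximum likelihood estimates (again reusing the sample `x` from above):

```python
from scipy.stats import norm

mu_hat = np.mean(x)
sigma2_hat = np.sum((x - mu_hat) ** 2) / len(x)   # note the n denominator

loc, scale = norm.fit(x)          # returns the MLEs of mu and sigma
assert np.isclose(mu_hat, loc)
assert np.isclose(sigma2_hat, scale ** 2)
```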
But this MLE of $\sigma^2$ is biased. A point estimator $\hat{\theta}$ is said to be an unbiased estimator
of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value
of $\theta$. If $\hat{\theta}$ is not unbiased, the difference $E(\hat{\theta}) - \theta$ is
called the bias of $\hat{\theta}$.
We know that
$$\sigma^2 = \operatorname{Var}(X) = E(X^2) - (E(X))^2 \;\Rightarrow\; E(X^2) = \operatorname{Var}(X) + (E(X))^2$$
Then
$$\begin{aligned}
E(\hat{\sigma}^2) &= \frac{1}{n}E\left(\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right) = \frac{1}{n}E\left(\sum x_i^2 - n\hat{\mu}^2\right) = \frac{1}{n}E\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \\
&= \frac{1}{n}\left\{\sum E(x_i^2) - \frac{1}{n}E\left[\left(\sum x_i\right)^2\right]\right\} = \frac{1}{n}\left\{\sum\left(\sigma^2+\mu^2\right) - \frac{1}{n}\left[n\sigma^2 + (n\mu)^2\right]\right\} \\
&= \frac{1}{n}\left\{n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right\} = \frac{n-1}{n}\sigma^2 \neq \sigma^2
\end{aligned}$$
where we applied the same identity to the sum: $E\left[\left(\sum x_i\right)^2\right] = \operatorname{Var}\left(\sum x_i\right) + \left(E\sum x_i\right)^2 = n\sigma^2 + (n\mu)^2$ by independence.
The bias is $E(\hat{\sigma}^2) - \sigma^2 = -\frac{\sigma^2}{n}$. In fact, the unbiased estimator of
$\sigma^2$ is $s^2 = \frac{\sum_{i=1}^{n}(x_i-\hat{\mu})^2}{n-1}$.
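A small Monte Carlo sketch makes the bias visible: averaging $\hat{\sigma}^2$ over many samples lands near $\frac{n-1}{n}\sigma^2$, while $s^2$ lands near $\sigma^2$ (all names and parameter values here are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, trials = 10, 4.0, 200_000
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))

# Sum of squared deviations from each sample's own mean.
dev2 = np.sum((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(np.mean(dev2 / n))        # ~ (n-1)/n * sigma2 = 3.6  (biased MLE)
print(np.mean(dev2 / (n - 1)))  # ~ sigma2 = 4.0            (unbiased s^2)
```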
But the fact that $s^2$ is unbiased does not imply that $s$ is
unbiased for estimating $\sigma$: the expected value of the square
root is not the square root of the expected value. Fortunately, the
bias of $s$ is small unless the sample size is very small. Thus
there are good reasons to use $s$ as an estimator of $\sigma$.
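Continuing the simulation above, the mean of $s$ indeed falls slightly below $\sigma$ (by Jensen's inequality, $E(s) \le \sqrt{E(s^2)} = \sigma$), but the gap is small even at $n = 10$:

```python
# s is biased low for sigma, but only slightly: for n = 10 the mean
# of s comes out around 1.95 against sigma = 2.0.
s = np.sqrt(dev2 / (n - 1))
print(np.mean(s), np.sqrt(sigma2))
```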