The probability density function of the normal distribution is:
\[
f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
\]
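As a quick numerical sanity check (a minimal sketch; NumPy/SciPy and the particular values of \(\mu\), \(\sigma\), and \(x\) are arbitrary choices for illustration), this formula can be compared against scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 2.0               # arbitrary example parameters
x = np.linspace(-4.0, 7.0, 5)      # a few arbitrary evaluation points

# Density computed directly from the formula above
manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Density from SciPy (norm is parameterized by loc = mu, scale = sigma)
print(np.allclose(manual, norm.pdf(x, loc=mu, scale=sigma)))  # True
```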
Suppose we have the following \(n\) i.i.d. observations: \(x_{1},x_{2},\dots,x_{n}\).
Because they are independent, the joint density of the observed data (the likelihood) is:
\[
f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu)=\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}}=(\frac{1}{\sigma\sqrt{2\pi}})^{n}e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}}
\]
\[\begin{array}{cl}
\log(f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu)) & =\log((\frac{1}{\sigma\sqrt{2\pi}})^{n}e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}})\\
& =n\log\frac{1}{\sigma\sqrt{2\pi}}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\\
& =-\frac{n}{2}\log(2\pi)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}
\end{array}\]
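The closed form on the last line can be checked numerically; the sketch below (again assuming NumPy/SciPy, with an arbitrary simulated sample) compares it with the direct sum of log-densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=100)   # arbitrary simulated i.i.d. sample
n = x.size

# Closed form: -n/2 log(2*pi) - n log(sigma) - sum((x_i - mu)^2) / (2 sigma^2)
closed = (-n / 2 * np.log(2 * np.pi)
          - n * np.log(sigma)
          - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

# Direct sum of the individual log-densities
direct = norm.logpdf(x, loc=mu, scale=sigma).sum()

print(np.isclose(closed, direct))  # True
```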
Let’s write \(\mathcal{L}\) for \(\log(f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu))\), and set:
\[
\frac{d\mathcal{L}}{d\mu}=\frac{d}{d\mu}\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right]=0
\]
Carrying out the differentiation and evaluating at the maximizer \(\hat{\mu}\), this becomes
\[
\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(2\hat{\mu}-2x_{i})=0
\]
Because \(\sigma^{2}\) is strictly positive, the factor in front can be dropped; rearranging gives \(n\hat{\mu}=\sum_{i=1}^{n}x_{i}\), so
\[
\hat{\mu}=\frac{\sum_{i=1}^{n}x_{i}}{n}
\]
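As a sanity check (a rough sketch; the simulated data, the fixed \(\sigma\), and the use of scipy.optimize.minimize_scalar are illustrative assumptions), the numerical maximizer of \(\mathcal{L}\) over \(\mu\) coincides with the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.5, size=200)   # arbitrary simulated sample
sigma = 1.5                          # held fixed while maximizing over mu

# Negative log-likelihood as a function of mu (mu-free constants dropped)
neg_log_lik = lambda mu: np.sum((x - mu) ** 2) / (2 * sigma ** 2)

mu_hat = minimize_scalar(neg_log_lik).x
print(np.isclose(mu_hat, x.mean()))  # True (up to optimizer tolerance)
```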
Similarly, let
\[
\frac{d\mathcal{L}}{d\sigma}=-\frac{n}{\sigma}+\sum_{i=1}^{n}(x_{i}-\mu)^{2}\sigma^{-3}=0
\]
It is more convenient to state the maximum likelihood estimator of \(\sigma^{2}\) rather than of \(\sigma\). Multiplying the equation above by \(\sigma^{3}/n\), solving for \(\sigma^{2}\), and substituting \(\hat{\mu}\) for \(\mu\), we get
\[
\hat{\sigma}^{2}=\frac{\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}}{n}
\]
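A similar check (again a sketch with arbitrary simulated data) shows that this formula is exactly the divide-by-\(n\) sample variance, i.e. numpy.var with its default ddof=0:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.5, size=200)   # arbitrary simulated sample

mu_hat = x.mean()
sigma2_hat = np.sum((x - mu_hat) ** 2) / x.size   # MLE: divide by n

# numpy.var uses ddof=0 (divide by n) by default, matching the MLE
print(np.isclose(sigma2_hat, np.var(x)))  # True
```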
But this MLE of \(\sigma^{2}\)
is biased. A point estimator \(\hat{\theta}\) is said to be an unbiased estimator of \(\theta\) if \(E(\hat{\theta})=\theta\)
for every possible value
of \(\theta\)
. If \(\hat{\theta}\)
is not unbiased, the difference \(E(\hat{\theta})-\theta\)
is
called the bias of \(\hat{\theta}\)
.
We know that
\[
\sigma^{2}=Var(X)=E(X^{2})-(E(X))^{2}\Rightarrow E(X^{2})=Var(X)+(E(X))^{2}
\]
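The same identity applied to the sum \(\sum_{i=1}^{n}x_{i}\), which has mean \(n\mu\) and, by independence, variance \(n\sigma^{2}\), gives the other expectation needed in the derivation below:
\[
E\left[\left(\sum_{i=1}^{n}x_{i}\right)^{2}\right]=Var\left(\sum_{i=1}^{n}x_{i}\right)+\left(E\left[\sum_{i=1}^{n}x_{i}\right]\right)^{2}=n\sigma^{2}+(n\mu)^{2}
\]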
Then
\[
\begin{array}{cl}
E(\hat{\sigma}^{2}) & =\frac{1}{n}E(\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2})\\
& =\frac{1}{n}E(\sum x_{i}^{2}-n\hat{\mu}^{2})\\
& =\frac{1}{n}E(\sum x_{i}^{2}-\frac{(\sum x_{i})^{2}}{n})\\
& =\frac{1}{n}\left\{ \sum E(x_{i}^{2})-\frac{1}{n}E\left[(\sum x_{i})^{2}\right]\right\} \\
& =\frac{1}{n}\left\{ \sum(\sigma^{2}+\mu^{2})-\frac{1}{n}\left[n\sigma^{2}+(n\mu)^{2}\right]\right\} \\
& =\frac{1}{n}\left\{ n\sigma^{2}+n\mu^{2}-\sigma^{2}-n\mu^{2}\right\} \\
& =\frac{n-1}{n}\sigma^{2}\\
& \neq\sigma^{2}
\end{array}
\]
The bias is \(E(\hat{\sigma}^{2})-\sigma^{2}=-\frac{\sigma^{2}}{n}\). In fact, the usual unbiased estimator of
\(\sigma^{2}\)
is \(s^{2}=\frac{\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}}{n-1}\)
.
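A quick Monte Carlo experiment (only a sketch; the sample size, number of replications, and parameter values are arbitrary) illustrates the bias: the divide-by-\(n\) estimator falls short of \(\sigma^{2}\) by roughly \(\sigma^{2}/n\) on average, while \(s^{2}\) is centered on \(\sigma^{2}\):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000   # small n makes the bias visible

samples = rng.normal(mu, sigma, size=(reps, n))
sigma2_mle = samples.var(axis=1, ddof=0)    # divide by n
s2 = samples.var(axis=1, ddof=1)            # divide by n - 1

print(sigma2_mle.mean())   # ~ (n-1)/n * sigma^2 = 3.2
print(s2.mean())           # ~ sigma^2 = 4.0
```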
But the fact that \(s^{2}\)
is unbiased does not imply that \(s\)
is
unbiased for estimating \(\sigma\)
. The expected value of the square
root is not the square root of the expected value. Fortunately, the
bias of \(s\)
is small unless the sample size is very small. Thus
there are good reasons to use \(s\)
as an estimator of \(\sigma\)
.
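Continuing the simulation above (again only a sketch with arbitrary settings), the downward bias of \(s\) shrinks quickly as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, reps = 2.0, 50_000

for n in (5, 30, 200):
    s = rng.normal(0.0, sigma, size=(reps, n)).std(axis=1, ddof=1)
    print(n, s.mean())   # biased low, but approaches sigma = 2.0 as n grows
```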