
As the title states, I am looking for an example (if any exists) of a distribution for which setting the gradient of the (log-)likelihood function with respect to the parameters to zero is not enough to ensure we have an MLE. I mean cases where checking for a negative definite Hessian is crucial and might lead one to reject first-order solutions.

More generally, are there cases where the second-order conditions can be skipped? For example, when computing the MLE of a normal distribution, I have rarely seen people check the nature of the Hessian...

  • There are several examples of likelihood functions with saddle points. A classic one is a bivariate normal with a functional relationship between the means (see Solari 1969). Others are the so-called "stochastic frontier" models (see Waldman 1981).
    – Durden
    Commented 2 days ago
  • Are you asking for cases where the MLE is not at a point with zero gradient, or where a point with zero gradient is not the MLE? The current answer seems to address the first point. Commented 2 days ago
  • Any example where the MLE is found on the boundary of the parameter space will serve. There are many, as attested by a substantial literature studying the asymptotic distribution of the MLE in such cases. This question really is about how students tend to forget that finding zeros of the derivative of a function is only part of finding an extremum.
    – whuber
    Commented 2 days ago
  • Also, re "I have rarely seen people checking the Hessian's nature": most probably that involves least squares, in which case you are also dealing with a convex function @MysteryGuy. Commented 2 days ago
  • Perhaps not quite what you seek, but an example where the gradient of the log-likelihood being zero is insufficient is finding the MLE of the mode of a triangular distribution with support on (0,1) and an intermediate mode. In that case, the likelihood has cusps, and any maximum will occur at one of those. The zeros of the derivative of the log-likelihood are local minima; between the cusps the second derivative of the likelihood is positive. There are a couple of posts that mention this distribution and that MLE issue here.
    – Glen_b
    Commented 2 days ago

2 Answers


Here is the likelihood function for fitting a Cauchy distribution with scale $\gamma = 1$ and unknown location $\lambda$ to the three observations $x_1, x_2, x_3 = 0, 6, 6$.

You can see that there are two points (a local maximum around 0.4 and a local minimum around 1.7) where the gradient is zero but which are not the global maximum.

[Figure: likelihood function for the Cauchy model with observations $0, 6, 6$]

Related question: Maximum likelihood estimator of Cauchy distribution but with a catch
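
If it helps, here is a quick numerical check (a sketch in Python using numpy and scipy, not part of the original answer) that brackets the three zeros of the score function and classifies each stationary point by its curvature:

```python
# Sketch: stationary points of the Cauchy(lambda, 1) log-likelihood
# for the observations 0, 6, 6.
import numpy as np
from scipy.optimize import brentq

x = np.array([0.0, 6.0, 6.0])

def loglik(lam):
    # log-likelihood up to an additive constant
    return -np.sum(np.log1p((x - lam) ** 2))

def score(lam):
    # derivative of the log-likelihood with respect to the location lam
    return np.sum(2.0 * (x - lam) / (1.0 + (x - lam) ** 2))

# The score changes sign in each of these brackets (checked by evaluation).
for a, b in [(0.0, 1.0), (1.0, 3.0), (5.0, 6.0)]:
    r = brentq(score, a, b)
    h = 1e-4  # numerically estimate the second derivative at the root
    curv = (loglik(r + h) - 2 * loglik(r) + loglik(r - h)) / h ** 2
    kind = "local max" if curv < 0 else "local min"
    print(f"stationary point {r:.3f}: loglik = {loglik(r):.3f} ({kind})")
```

Only the root near 6 attains the largest log-likelihood; the stationary point near 0.4 is merely a local maximum and the one near 1.7 is a local minimum.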

  • Interesting, and simple as well! +1
    – MysteryGuy
    Commented 2 days ago
  • The extra maxima don't go away with increasing sample size, either. Though the spurious minima and maxima do diverge to infinity, so the stationary point reached by optimisation starting at the median is a reliable estimator in larger samples. Commented 2 days ago
  • Isn't there a global maximum at 6?
    – Davidmh
    Commented 23 hours ago
  • @Davidmh Yes, the global maximum is around 6, but there is another local maximum around 1 and a local minimum around 2 where the gradient is zero but the value is not the MLE. Commented 23 hours ago

Two quick examples that aren't contrived:

  • When $X_1, \ldots, X_n \sim \mathrm U[0, \theta)$ with $\theta > 0$:

    It is an easy exercise to check that the $n$th order statistic is the MLE, and yet the derivative of the likelihood cannot be zero there (see the sketch after this list).

  • When $X_1, \ldots, X_n \sim \mathrm{Lap}(\theta)$ with $\theta > 0$:

    A unique MLE exists at the sample median when $n$ is odd, but again the log-likelihood is not differentiable at any of the observed values (also illustrated in the sketch below).
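
A small numerical illustration of both bullets (a sketch; the seed, sample sizes, and parameter values below are invented for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- U[0, theta): the likelihood is theta**(-n) for theta >= max(x) and 0
# otherwise, so it is strictly decreasing where it is positive; the maximum
# sits at the boundary theta_hat = max(x), where the one-sided derivative
# -n / theta**(n+1) is nonzero.
x = rng.uniform(0, 2.5, size=20)
theta_hat = x.max()                     # the n-th order statistic
for t in theta_hat + np.array([0.0, 0.01, 0.1, 0.5]):
    print(f"theta = {t:.3f}  loglik = {-len(x) * np.log(t):.4f}")

# --- Laplace(theta): the log-likelihood -sum |x_i - theta| (up to a
# constant) is piecewise linear with kinks at the observations; with n odd
# its maximum is at the sample median, where it is not differentiable.
y = rng.laplace(loc=1.0, size=7)
grid = np.linspace(y.min(), y.max(), 1001)
loglik = np.array([-np.abs(y - t).sum() for t in grid])
print(f"argmax on grid: {grid[loglik.argmax()]:.3f}, median: {np.median(y):.3f}")
```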


For a situation where the likelihood has a stationary point that nonetheless fails to be an MLE, consider the location family generated by the following density (written for $\theta = 0$; the member at $\theta$ is $f(x - \theta \mid 0)$):

$$f(x\mid 0)=\begin{cases}\frac{1}{4\exp(\vert x\vert)\sqrt{\pi\vert x\vert}}, & x\in(-\infty,0)\\ \frac{1}{2\pi\sqrt{x(1-x)}},& x\in (0,1)\\ \frac{1}{4\exp(x-1)\sqrt{\pi(x-1)}},&x\in(1,\infty)\\ 0,&x\in\{0,1\};\end{cases}$$

though apparently intimidating, the density was constructed so that there is a local minimum in $(0,1)$ and so that the graphs over $(-\infty, 0)$ and $(1, \infty)$ are mirror images of each other.

When $n = 1$, the likelihood function has the same shape as the density. Hence, even though a stationary point exists, it is a local minimum; moreover, the likelihood is unbounded as $\theta$ approaches $x$ or $x - 1$, so no MLE exists.

[Figure: graph of the density $f(x \mid 0)$, showing the interior local minimum on $(0,1)$]
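
A short numerical check (a sketch, using a hypothetical single observation $x = 0.5$, so the stationary point falls at $\theta = 0$):

```python
# Sketch: with one observation x, the likelihood L(theta) = f(x - theta | 0)
# has the shape of the density; its stationary point is a local minimum and
# L is unbounded as theta -> x or theta -> x - 1, so no MLE exists.
import numpy as np

def f(u):
    # the theta = 0 density, evaluated at u = x - theta
    if u < 0:
        return 1.0 / (4.0 * np.exp(-u) * np.sqrt(np.pi * (-u)))
    if 0.0 < u < 1.0:
        return 1.0 / (2.0 * np.pi * np.sqrt(u * (1.0 - u)))
    if u > 1.0:
        return 1.0 / (4.0 * np.exp(u - 1.0) * np.sqrt(np.pi * (u - 1.0)))
    return 0.0  # u in {0, 1}

x = 0.5
for theta in [-0.49, -0.25, 0.0, 0.25, 0.49]:
    print(f"theta = {theta:+.2f}  L(theta) = {f(x - theta):.4f}")
# Smallest at the stationary point theta = 0; grows without bound as
# theta -> 0.5 or theta -> -0.5.
```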

--

Reference:

$\rm[I]$ Joseph P. Romano and Andrew F. Siegel, Counterexamples in Probability and Statistics, Wadsworth, 1986: examples 8.14, 8.16, 8.17.

  • Okay, I agree with your examples, but I was specifically referring to cases where at least some stationary points exist ^^
    – MysteryGuy
    Commented Jun 30 at 12:31
  • @MysteryGuy Since there is only one sample observation, the likelihood is just the pdf, and so it inherits the stationary property of the pdf. Commented Jun 30 at 12:38
  • I imagine that the authors of your reference had fun writing a book about counterexamples! 🤪
    – Galen
    Commented 2 days ago
