Recently, one of my students asked me about optimization routines in R. He told me he that R performed well on the estimation of a time series model with different regimes, while he had trouble with a (simple) GARCH process, and he was wondering if R was good in optimization routines. Actually, I always thought that mixtures (and regimes) was something difficult to estimate, so I was a bit surprised…

Indeed, it reminded me some trouble I experienced once, while I was talking about *maximum likelihooh estimation*, for non standard distribution, i.e. when optimization had to be done on the log likelihood function. And even when generating nice samples, giving appropriate initial values (actually the *true* value used in random generation), each time I tried to optimize my log likelihood, it failed. So I decided to play a little bit with standard optimization functions, to see which one performed better when trying to estimate mixture parameter (from a mixture based sample). Here, I generate a mixture of two gaussian distributions, and I would like to see how different the mean should be to have a high probability to estimate properly the parameters of the mixture.

The density is here proportional to

The true model is , and being a parameter that will change, from 0 to 4.

The log likelihood (actually, I add a minus since most of the optimization functions actually *minimize* functions) is

> logvraineg <- function(param, obs) {

+ p <- param[1]

+ m1 <- param[2]

+ sd1 <- param[3]

+ m2 <- param[4]

+ sd2 <- param[5]

+ -sum(log(p * dnorm(x = obs, mean = m1, sd = sd1) + (1 – p) *

+ dnorm(x = obs, mean = m2, sd = sd2)))

+ }

The code to generate my samples is the following,

>X1 = rnorm(n,0,1)

> X20 = rnorm(n,0,1)

> Z = sample(c(1,2,2),size=n,replace=TRUE)

> X2=m+X20

> X = c(X1[Z==1],X2[Z==2])

Then I use two functions to optimize my log likelihood, with identical intial values,

> O1=nlm(f = logvraineg, p = c(.5, mean(X)-sd(X)/5, sd(X), mean(X)+sd(X)/5, sd(X)), obs = X)

> logvrainegX <- function(param) {logvraineg(param,X)}

> O2=optim( par = c(.5, mean(X)-sd(X)/5, sd(X), mean(X)+sd(X)/5, sd(X)),

+ fn = logvrainegX)

Actually, since I might have identification problems, I take either or , depending whether or is the smallest parameter.

On the graph above, the x-axis is the difference between means of the mixture (as on the animated grap above). Then, the red point is the median of estimated parameter I have (here ), and I have included something that can be interpreted as a *confidence interval*, i.e. where I have been in 90% of my scenarios: the**black** vertical segments. Obviously, when the sample is not enough heterogeneous (i.e. and rather different), I cannot estimate properly my parameters, I might even have a probability that exceed 1 (I did not add any constraint). The blue plain horizontal line is the *true* value of the parameter, while the blue dotted horizontal line is the initial value of the parameter in the optimization algorithm (I started assuming that the mixture probability was around 0.2).

The graph below is based on the second optimization routine (with identical starting values, and of course on the same generated samples),

(just to be honest, in many cases, it did not converge, so the loop stopped, and I had to run it again… so finally, my study is based on a bit less than 500 samples (times 15 since I considered several values for the mean of my second underlying distribution), with 200 generated observations from a mixture).

The graph below compares the two (empty circles are the first algorithm, while plain circles the second one),

On average, it is not so bad…. but the probability to be far away from the tru value is not small at all… except when the difference between the two means exceeds 3…

If I change starting values for the optimization algorithm (previously, I assumed that the mixture probability was 1/5, here I start from 1/2), we have the following graph

which look like the previous one, except for small differences between the two underlying distributions (just as if initial values had not impact on the optimization, but it might come from the fact that the surface is nice, and we are not trapped in regions of local minimum).

Thus, I am far from being an expert in optimization routines in R (see here for further information), but so far, it looks like R is not doing so bad… and the two algorithm perform similarly (maybe the first one being a bit closer to the *true*parameter).

Very heplful, thanks!

Hi Arthur!

I’ve just read the above article and the one about the multivariate model with Gumbel copula you wrote in another website.

These 2 articles/scripts interest me because I have to estimate the 2 covariance matrices and a mixing parameter of a bivariate normal and a log-normal distribution by maximizing the log-likelihood. The maximization really has to be performed with the non-linear minimization routine (nlm). The 2 means of the distribution are known and equal to 1.

I’ve already tried the 2 ways below to do it for the bivariate NORMAL distribution but both methods give me an important amount of warnings and illogical results…

In the first one, I defined the bivariate normal distribution with the command dmvnorm

x<-rnorm(6000, 2.4, 0.6)

x <- matrix(c(x), ncol=1)

y<-rlnorm(6000, 1.3,0.1)

y <- matrix(c(y), ncol=1)

XY <- cbind(x,y)

L <- function(par,x,y) {

return (-sum(log(par[4]*dmvnorm(XY, mean=c(1,1), sigma= matrix(c(par[1], par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2] ),nrow=2, ncol=2)) )))

}

par.start<- c(0.5, 0.5 ,0.5 ,0.5)

result<-nlm(L,par.start,y=y,x=x, hessian=TRUE)

par.hat <- result$estimate

In the second one, I wrote step by step the bivariate normal distribution in order to have each parameter separately (not in a matrix) but it didn’t work as well:

x<-rnorm(6000, 2.4, 0.6)

y<-rlnorm(6000, 1.3,0.1)

L <- function(par,x,y) {

return (-sum(log((1-par[4])*( (1/(2*pi*par[1]*par[2]*sqrt(1-par[3])))*exp( (-1/2*(1-par[3]^2))* ((y-1)/par[2])^2 +((x-1)/par[1])^2 – 2*(y-1)*(x-1)/(par[2]*par[1]) )) )))

}

#par [1]= sigma_x , par [2]= sigma_y par [3]= rho_xy par[4] is a mixing parameter. The final step of my calculation will be to have a mixture of bivariate normal and log-normal distributions.

par.start<- c(0.5, 0.5 ,0.5 ,0.5)

result<-nlm(L,par.start,y=y,x=x, hessian=T)

par.hat <- result$estimate

I'm really new in R so I don't know if my scripts make sense or not…

I'm wondering if you see any errors or missing code in my 2 scripts?

Do you recommand me a special package to estimate the covariance matrix of multivariate distribution?

Thanks in advance for your help and avdices

Best regards,

Gladys Hertzog

Master student in environmental engineering

ETH Zurich, Switzerland

Merci pour votre billet! C’est très apprécié! En fin de compte, pour l’estimation numérique d’un modèle GARCH en minimisant la fonction de logvraisemblance x -1, il n’y a pas de problème avec R. C’était moi qui avait mal tenu compte des contraintes (donc ce n’était pas R qui était poche, mais plutôt moi!). On peut utiliser “nlminb” ou “constrOptim” qui sont des optimisateurs avec lesquels on peut spécifier des contraintes et ils fonctionnent bien dans le cas GARCH. En général, “nlminb” était plus rapide mais à quelques rares occasions ne trouvait pas le minimum absolu, tandis que “constrOptim” convergeait toujours vers la bonne solution.

Je vais essayer de fitter un modèle RS-GARCH et je vous donnerai des nouvelles si une optimisation avec EM est nécessaire ou non pour l’optimisation.

Hi Arthur,

you do realise that this is an ill-posed optimisation problem, because the likelihood diverges to +infinity?

Also, the log-likelihood has typically many local modes, so you results depends mostly on your choice of a starting point…

For mixtures, the standard approach for calculating the MLE is to use EM, not a standard optim procedure.

All the best,

Nicolas Chopin

REPONSE: hi Nicolas… I agree, but the point is that I have seen many people using that (in practice)…. And so far, it is not that bad actually…. So I guess the best would be to see how an EM algorithm performs one the same kind of samples… ok, I’ll start to work on that new post 🙂