Consider a (simple) Poisson regression . Given a sample where , the goal is to derive a 95% confidence interval for given , where is the prediction. Hence, we want to derive a confidence interval for the *prediction*, not the *potential observation*, i.e. the dot on the graph below

> r=glm(dist~speed,data=cars,family=poisson) > P=predict(r,type="response", + newdata=data.frame(speed=seq(-1,35,by=.2))) > plot(cars,xlim=c(0,31),ylim=c(0,170)) > abline(v=30,lty=2) > lines(seq(-1,35,by=.2),P,lwd=2,col="red") > P0=predict(r,type="response",se.fit=TRUE, + newdata=data.frame(speed=30)) > points(30,P1$fit,pch=4,lwd=3)

i.e.

Let denote the maximum likelihood estimator of . Then

where is Fisher information of (from standard maximum likelihood theory). Recall that

where computation of those values is based on the following calculations

In the case of the log-Poisson regression

Let us get back to our initial problem.

**confidence interval for the linear combination**

A first idea to get a confidence interval for is to get a confidence interval for (by taking exponential values of bounds, since the exponential is a monotone function). Asymptotically, we know that

thus, an approximation for the variance matrix of will be based on , obtained by plugging estimators of the parameters.

Then, since as an asymptotic multivariate distribution, any linear combination of the parameters will also be normal, i.e.

has a normal distribution, centered on , with variance where is the variance of . All those quantities can be easily computed. First, we can get the variance of the estimators

> i1=sum(predict(reg,type="response")) > i2=sum(cars$speed*predict(reg,type="response")) > i3=sum(cars$speed^2*predict(reg,type="response")) > I=matrix(c(i1,i2,i2,i3),2,2) > V=solve(I)

Hence, if we compare with the output of the regression,

> summary(reg)$cov.unscaled (Intercept) speed (Intercept) 0.0066870446 -3.474479e-04 speed -0.0003474479 1.940302e-05 > V [,1] [,2] [1,] 0.0066871228 -3.474515e-04 [2,] -0.0003474515 1.940318e-05

Based on those values, it is easy to derive the standard deviation for the linear combination,

> x=30 > P2=predict(r,type="link",se.fit=TRUE, + newdata=data.frame(speed=x)) > P2 $fit 1 5.046034 $se.fit [1] 0.05747075 $residual.scale [1] 1 > sqrt(V[1,1]+2*x*V[2,1]+x^2*V[2,2]) [1] 0.05747084 > sqrt(t(c(1,x))%*%V%*%c(1,x)) [,1] [1,] 0.05747084

And once we have the standard deviation, and normality (at least asymptotically), confidence intervals are derived, and then, taking the exponential of the bounds, we get confidence interval

Based on that technique, confidence intervals are no longer centered on the prediction. But who cares ?

**delta method**

Actually, those who like to use “more or less” expressions for confidence intervals will not like non centered intervals. So, an alternative is to use the delta method. Instead of writing (again) something on the theory, we can use a package which computes that method,

> estmean=t(c(1,x))%*%coef(reg) > var=t(c(1,x))%*%summary(reg)$cov.unscaled%*%c(1,x) > library(msm) > deltamethod (~ exp(x1), estmean, var) [1] 8.931232 > P1=predict(r,type="response",se.fit=TRUE, + newdata=data.frame(speed=30)) > P1 $fit 1 155.4048 $se.fit 1 8.931232 $residual.scale [1] 1

The delta method gives us (asymptotic) normality, so once we have a standard deviation, we get the confidence interval.

Note that those quantities – obtained with two different approaches – are rather close here

> exp(P2$fit-1.96*P2$se.fit) 1 138.8495 > P1$fit-1.96*P1$se.fit 1 137.8996 > exp(P2$fit+1.96*P2$se.fit) 1 173.9341 > P1$fit+1.96*P1$se.fit 1 172.9101

**bootstrap techniques**

And a third method (but far from what I expect to teach on that course) is to use bootstrap techniques to about those results based on asymptotic normality (we have only 50 observations). The idea is to sample from out dataset, and to run a log-Poisson regression on those new samples, and to repeat a lot of time,

Do you miss dividing n in I (fisher information matrix)? Is var(\beta) = 1/(n*I^-1)?

Thanks a lot, crystal clear.

How does the variance formula change when we have more than one explanatory variable? Is there any closed form for various number of variables?

Wonderfull! But there are a lot of missing pictures.

Everything is very open with a very clear description of the issues.

It was definitely informative. Your site is very useful.

Thanks for sharing!

Got the idea that GLM has the ability to predict confidence bounds. Thank you for explanation. Very helpful!