I want to predict a fitted poisson glm on newdata, given that it was fitted using offset=log(Exposure), but I get confused with the inclusion of the term "offset" inside of predict.glm(). I do not understand if it is necessary to include it and, if I want to include it, how should I do that with an equal impact as the offset used in fitting?
To clear my doubts, I created a dataset from scratch and I decided to compare three possible approaches:
- predict(glm.model, data=newdata, type="response", offset=log(Exposure));
- predict(glm.model, data=newdata, type="response", offset=Exposure);
- predict(glm.model, data=newdata, type="response");
However, when comparing the results, I was surprised to find all three approaches resulted in the same predictions. Conclusion: offset term in predict.glm() is ignored? I still ask this, because I am a new to R and my procedure to check it may be wrong.
Also, even though similar questions may have been posted, answers are scarce and inconsistent with this result.
set.seed(1)
exp<-c(1:100)/100
dummy<-rep(c(0,1), each=50)
y<-floor(rpois(100,dummy*exp*10)) #count data
DTtest<-as.data.frame(exp)
DTtest$D<-dummy
DTtest$y<-y
gtest<-glm(y~D, offset=log(exp), family=poisson(link="log"), data=DTtest, trace=TRUE)
DTtest1<-predict(gtest, newdata=DTtest, type="response", offset=log(exp)) #predict1
DTtest2<-predict(gtest, newdata=DTtest, type="response", offset=exp) #predict2
DTtest3<-predict(gtest, newdata=DTtest, type="response") #predict3
sum(DTtest1==DTtest2)
sum(DTtest2==DTtest3)
sum(DTtest1==DTtest3)
As a result for the sums I got: 100, 100, 100, which is equal to the length of the data used to predict (I also used DTtest as newdata). This means all predictions have the same value between the 3 approaches.
I was expecting the values to differ due to different impact of the offset explicit in predict(), but the result says otherwise.