
5.8) Generalized Linear Models


Fitting generalized linear models is easy: there's a function glm() that works much like lm() but also has a family= argument. If you omit it, the default is family=gaussian (i.e. normal errors with an identity link) and you get the same fitted model as with lm() (though with slightly different output). But if, for example, you want to do a binary logistic regression, specify family="binomial". The family sets the two things which must be specified for a generalized linear model: the error distribution (here binomial) and the link function (logit, which is the default link for the binomial family). The binomial version of glm() is a bit special, in that you can give your response variable either as a single column of 1s and 0s, or as a 2-column matrix with one row per batch of trials, where the first column holds the number of successes (1s) in that batch and the second column holds the number of failures (0s).
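As a minimal sketch of the points above, the following fits the same data with lm() and with glm() using the default family, then fits a logistic regression in both response formats. All data here are simulated or made up purely for illustration, and the variable names (dat, dose, successes, failures) are assumptions, not part of any real dataset.

```r
set.seed(1)

# Simulated binary data: y is 0/1, with success probability depending on x
n   <- 200
x   <- rnorm(n)
y   <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
dat <- data.frame(x = x, y = y)

# With no family= argument, glm() uses gaussian and matches lm()
fit_lm    <- lm(y ~ x, data = dat)
fit_gauss <- glm(y ~ x, data = dat)
all.equal(coef(fit_lm), coef(fit_gauss), tolerance = 1e-6)

# Binary logistic regression: response is a single column of 1s and 0s
fit_bin <- glm(y ~ x, family = "binomial", data = dat)
summary(fit_bin)

# Grouped form: a 2-column matrix of (successes, failures), one row per batch
# (hypothetical counts out of 20 trials at each dose)
dose      <- c(1, 2, 3, 4, 5)
successes <- c(2, 5, 9, 14, 18)
failures  <- 20 - successes
fit_grp   <- glm(cbind(successes, failures) ~ dose, family = "binomial")
coef(fit_grp)
```

The coefficients from either binomial fit are on the logit (log-odds) scale, so exp(coef(fit_bin)) gives odds ratios.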

If you have count data and want to fit a Poisson model, set family="poisson". This sets the error distribution to Poisson and the link function to log. To find out more about these options, type ?family (because family() is also a function).
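A Poisson fit might look something like the sketch below. The counts are simulated for illustration only, and the names x, counts and fit_pois are assumptions.

```r
set.seed(2)

# Simulated count data whose log-mean is linear in x
n      <- 100
x      <- runif(n, 0, 2)
counts <- rpois(n, lambda = exp(0.3 + 0.8 * x))

# Poisson regression; the log link is the default for this family
fit_pois <- glm(counts ~ x, family = "poisson")
summary(fit_pois)

# Confirm which error distribution and link were used
family(fit_pois)$family
family(fit_pois)$link

# Coefficients are on the log scale; exponentiate for multiplicative effects
exp(coef(fit_pois))
```

Because of the log link, each exponentiated slope is the factor by which the expected count is multiplied when that predictor increases by one unit.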

If you need to fit a negative binomial model (typically for overdispersed count data, where the variance is larger than the mean), glm() has no built-in family for this, but there is a function called glm.nb() in the package called MASS. MASS is one of the recommended packages that ships with standard R installations, so it should already be installed; type library(MASS) (or use the Packages menu) to get access to its functions. Then try ?glm.nb.
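As a rough sketch, assuming MASS is installed, a negative binomial fit could look like this. The overdispersed counts are simulated for illustration, and the names x, counts and fit_nb are assumptions.

```r
library(MASS)

set.seed(3)

# Simulated overdispersed counts: negative binomial with dispersion size = 1.5
n      <- 300
x      <- runif(n, 0, 2)
counts <- rnbinom(n, size = 1.5, mu = exp(0.5 + 0.7 * x))

# glm.nb() takes a formula like glm(), but needs no family= argument;
# it estimates the dispersion parameter theta along with the coefficients
fit_nb <- glm.nb(counts ~ x)
summary(fit_nb)

coef(fit_nb)   # regression coefficients on the log scale
fit_nb$theta   # estimated dispersion parameter
```

Smaller values of theta indicate stronger overdispersion; as theta grows large, the model approaches an ordinary Poisson regression.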