Introduction to generalised linear models (GLMs)
Recall the simple linear regression model from Module 3:
\[Y_i = \alpha + \beta_1x_i + \epsilon_i\] where
\[\epsilon_i \sim \text{Normal}(0,\sigma^2).\]
Here, for observation \(i\):
- \(Y_i\) is the value of the response,
- \(x_i\) is the value of the explanatory variable,
- \(\epsilon_i\) is the error term: the difference between \(Y_i\) and its expected value,
- \(\alpha\) is the intercept term (a parameter to be estimated), and
- \(\beta_1\) is the slope: the coefficient of the explanatory variable (a parameter to be estimated).
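To make this concrete, here is a minimal sketch in Python (using numpy and statsmodels, with hypothetical parameter values) that simulates data from this model and then recovers \(\alpha\) and \(\beta_1\) by least squares:

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from the model (hypothetical values: alpha = 1.5, beta_1 = 0.8, sigma = 2)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 2.0, size=50)   # epsilon_i ~ Normal(0, sigma^2)
y = 1.5 + 0.8 * x + eps             # Y_i = alpha + beta_1 * x_i + epsilon_i

# Fit by ordinary least squares
X = sm.add_constant(x)              # design matrix with an intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                   # estimates of alpha and beta_1
```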
We also saw an alternative, equivalent specification of this model in Module 3, which attributes the randomness directly to the response variable rather than to the error term \(\epsilon_i\):
\[Y_i \sim \text{Normal}(\alpha + \beta_1 x_i, \sigma^2).\]
That is, we assume the \(i^{th}\) observation’s response, \(Y_i\), comes from a normal distribution with mean \(\mu_i = \alpha + \beta_1 x_i\) and variance \(\sigma^2\).
In this case we assume that
- the \(i^{th}\) observation’s response, \(Y_i\), comes from a normal distribution,
- the mean of \(Y_i\) is a linear combination of the explanatory terms,
- the variance of \(Y_i\), \(\sigma^2\), is the same for all observations, and
- each observation’s response is independent of all the others.
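A short simulation sketch of this second specification (again with hypothetical parameter values): each \(Y_i\) is drawn directly from a normal distribution whose mean depends on \(x_i\), with no separate error term. Data produced this way have exactly the same distribution as data produced by the error-term specification above.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta1, sigma = 1.5, 0.8, 2.0   # hypothetical parameter values
x = rng.uniform(0, 10, size=50)
mu = alpha + beta1 * x                # mean of Y_i is linear in x_i
y = rng.normal(loc=mu, scale=sigma)   # Y_i ~ Normal(mu_i, sigma^2): constant variance, independent draws
```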
But what if we want to be a little more flexible and move away from some of these assumptions? What if we want to rid ourselves of the restriction to normal errors? The answer: generalised linear models (GLMs).
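As a preview, here is a minimal sketch (using Python's numpy and statsmodels, with hypothetical parameter values) of fitting one such model, a Poisson regression for count data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical count data: log(mu_i) = 0.3 + 0.9 * x_i
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=100)
mu = np.exp(0.3 + 0.9 * x)            # mean on the original scale
y = rng.poisson(lam=mu)               # Y_i ~ Poisson(mu_i): a non-normal response

# Fit the GLM: Poisson response with (default) log link
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)                     # estimates of the intercept and slope
```

Here the normal distribution is replaced by a Poisson distribution, and the mean is related to the linear predictor through a log link rather than the identity; these are exactly the two ingredients that GLMs let us choose.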