Introduction to generalised linear models (GLMs)
Recall the simple linear regression model from Module 3:
\[Y_i = \alpha + \beta_1x_i + \epsilon_i\] where
\[\epsilon_i \sim \text{Normal}(0,\sigma^2).\]
Here, for observation \(i\):
- \(Y_i\) is the value of the response,
- \(x_i\) is the value of the explanatory variable,
- \(\epsilon_i\) is the error term: the difference between \(Y_i\) and its expected value,
- \(\alpha\) is the intercept term (a parameter to be estimated), and
- \(\beta_1\) is the slope: the coefficient of the explanatory variable (a parameter to be estimated).
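To make this concrete, here is a minimal sketch in Python (using numpy and statsmodels, with hypothetical parameter values) that simulates data from this model and then recovers \(\alpha\) and \(\beta_1\) by least squares:

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from the model (hypothetical values: alpha = 1.5, beta_1 = 0.8, sigma = 2)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
eps = rng.normal(0, 2.0, size=50)   # epsilon_i ~ Normal(0, sigma^2)
y = 1.5 + 0.8 * x + eps             # Y_i = alpha + beta_1 * x_i + epsilon_i

# Fit by ordinary least squares
X = sm.add_constant(x)              # design matrix with an intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                   # estimates of alpha and beta_1
```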
We also saw an alternative, equivalent specification of this model in Module 3, which attributes the randomness directly to the response variable rather than to the error term \(\epsilon_i\):
\[Y_i \sim \text{Normal}(\alpha + \beta_1 x_i, \sigma^2).\]
That is, we assume the \(i^{th}\) observation’s response, \(Y_i\), comes from a normal distribution with mean \(\mu_i = \alpha + \beta_1 x_i\) and variance \(\sigma^2\).
In this case we assume that
- the \(i^{th}\) observation’s response, \(Y_i\), comes from a normal distribution,
- the mean of \(Y_i\) is a linear combination of the explanatory terms,
- the variance of \(Y_i\), \(\sigma^2\), is the same for all observations, and
- each observation’s response is independent of all the others.
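A short simulation sketch of this second specification (again with hypothetical parameter values): each \(Y_i\) is drawn directly from a normal distribution whose mean depends on \(x_i\), with no separate error term. Data produced this way have exactly the same distribution as data produced by the error-term specification above.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta1, sigma = 1.5, 0.8, 2.0   # hypothetical parameter values
x = rng.uniform(0, 10, size=50)
mu = alpha + beta1 * x                # mean of Y_i is linear in x_i
y = rng.normal(loc=mu, scale=sigma)   # Y_i ~ Normal(mu_i, sigma^2): constant variance, independent draws
```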
But what if we want to be a little more flexible and move away from some of these assumptions? What if we want to rid ourselves of the restriction to normal errors? The answer: generalised linear models (GLMs).
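As a preview, here is a minimal sketch (using Python's numpy and statsmodels, with hypothetical parameter values) of fitting one such model, a Poisson regression for count data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical count data: log(mu_i) = 0.3 + 0.9 * x_i
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=100)
mu = np.exp(0.3 + 0.9 * x)            # mean on the original scale
y = rng.poisson(lam=mu)               # Y_i ~ Poisson(mu_i): a non-normal response

# Fit the GLM: Poisson response with (default) log link
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)                     # estimates of the intercept and slope
```

Here the normal distribution is replaced by a Poisson distribution, and the mean is related to the linear predictor through a log link rather than the identity; these are exactly the two ingredients that GLMs let us choose.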