Linear Discriminant Analysis (LDA)

LDA is a supervised learning technique: the main goal is to predict some feature of interest using one or more variables (the predictors).

Example in R

The data

Some variables can predict which group a patient belongs to

Possible classification rules?

Carrying out LDA

Some similarity to regression
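
The similarity to regression is mostly in the interface: like lm(), the lda() function from the MASS package takes a formula with the response on the left and the predictors on the right. The sketch below is a minimal illustration under stated assumptions: it uses the built-in iris data as a stand-in (not the diabetes data of these notes), and the object name fit is illustrative.

```r
library(MASS)  # provides lda()

# Stand-in data: iris, NOT the diabetes data used in these notes.
# The response is a factor of group labels; the predictors go on the
# right-hand side, just as in lm(y ~ x1 + x2, data = ...).
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# Printing the fit shows the prior probabilities, the group means, the
# coefficients of the linear discriminants and the proportion of trace.
print(fit)
```

Unlike regression, the response here is categorical, and the fitted coefficients define discriminant functions rather than a predicted mean.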

Components of diabetes_lda

  • diabetes_lda$prior gives the prior probabilities of belonging to each group. By default these reflect the proportions of membership in the data:

\(\rightarrow\) a randomly chosen subject has probability 0.52 of coming from group 3

  • diabetes_lda$means gives the means of each predictor in each group:

  • Proportion of Trace gives the percentage separation achieved by each discriminant function

  • diabetes_lda$scaling contains the linear discriminant functions (i.e., the linear combination of variables giving best separation between groups):

i.e.,

  • LD1: \(-0.00446 \times \text{insulin} - 0.00578 \times \text{glutest}\)

  • LD2: \(-0.01591 \times \text{insulin} + 0.00481 \times \text{glutest}\)
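
The components listed above can be inspected directly on any object returned by lda(). The sketch below again assumes the iris data as a stand-in for the diabetes data, and the names fit, prop_trace, centre and scores are illustrative. It also recomputes the proportion of trace and the LD scores by hand, to show how the coefficients in $scaling are used.

```r
library(MASS)

# Stand-in fit on iris (assumption: the diabetes data are not available here)
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

fit$prior    # prior probabilities (by default, the group proportions)
fit$means    # mean of each predictor within each group
fit$scaling  # coefficients of LD1 and LD2

# Proportion of trace, recomputed from the singular values stored in the fit
prop_trace <- fit$svd^2 / sum(fit$svd^2)
prop_trace

# LD scores by hand: centre the predictors at the prior-weighted mean of the
# group means, then multiply by the coefficient matrix.
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
centre <- colSums(fit$prior * fit$means)
scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# Should agree with the scores returned by predict()
all.equal(scores, predict(fit)$x, check.attributes = FALSE)
```

Because of the centring step, the scores returned by predict() differ from the raw linear combinations by a constant shift; the separation between groups is unaffected.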

How well does LDA do on training data?

The misclassification rate is therefore
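
As a sketch of how such a rate can be computed, the example below cross-tabulates actual against predicted groups and takes the off-diagonal share. It assumes the iris data as a stand-in for the diabetes data; the names fit, pred_class, confusion and misclass_rate are illustrative.

```r
library(MASS)

# Stand-in fit on iris (not the diabetes data from these notes)
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# With no newdata, predict() classifies the training observations
pred_class <- predict(fit)$class

# Confusion matrix: rows are the actual groups, columns the predicted groups
confusion <- table(actual = iris$Species, predicted = pred_class)
confusion

# Correct classifications sit on the diagonal; everything else is an error
misclass_rate <- 1 - sum(diag(confusion)) / sum(confusion)
misclass_rate
```

Since the same observations are used for fitting and for evaluation, this rate is likely to be optimistic. One alternative is lda(..., CV = TRUE), which returns leave-one-out predictions instead.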

Prediction

  • $class: predicted group for each observation
  • $posterior: probability of falling into each group
  • $x: matrix with two columns, one for each LD score
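
The three components can be seen by calling predict() on a fitted model. This is a minimal sketch assuming the iris data as a stand-in; the object names and the values in new_obs are made up for illustration.

```r
library(MASS)

# Stand-in fit on iris (not the diabetes data from these notes)
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# With no newdata, predictions are for the training observations
pred <- predict(fit)

head(pred$class)      # factor of predicted groups
head(pred$posterior)  # one column per group; each row sums to 1
head(pred$x)          # LD scores, columns LD1 and LD2

# New observations are passed via newdata (these values are made up)
new_obs <- data.frame(Sepal.Length = c(5.0, 6.5),
                      Petal.Length = c(1.5, 5.5))
predict(fit, newdata = new_obs)$class
```

The predicted class is the group with the largest posterior probability in the corresponding row of $posterior.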

Output

  • Every possible point is classified to one of three groups
  • The divisions between groups are linear (Linear Discriminant Analysis)
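
One way to see that every point is assigned to a group is to classify a regular grid covering the predictor space. The sketch below assumes the iris data as a stand-in for the diabetes data; the grid resolution and object names are arbitrary choices for illustration.

```r
library(MASS)

# Stand-in fit on iris (not the diabetes data from these notes)
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# Regular grid covering the observed range of both predictors
grid <- expand.grid(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length),
                     length.out = 100),
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length),
                     length.out = 100)
)

# Every grid point receives exactly one predicted group
grid$class <- predict(fit, newdata = grid)$class
table(grid$class)
```

Plotting the grid points coloured by grid$class shows the classification regions, which are separated by straight lines.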

The three ellipses represent the class centres and the covariance matrix of the LDA model. Note there is only one covariance matrix, which is the same for all three classes. This results in

  • the sizes and orientations of the ellipses being the same for the three classes (only their centres differ)
  • the ellipses represent contours of equal class membership probability.

A key assumption of LDA is that the variances of, and correlations between, the variables are the same in each group (i.e., a common covariance matrix).
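
A rough check of this assumption is to compare the covariance matrices of the separate groups with the pooled one. The sketch below assumes the iris data as a stand-in; the names vars, group_covs, n_g and pooled are illustrative, and the pooled estimate shown is the usual degrees-of-freedom weighted average.

```r
library(MASS)

vars <- c("Sepal.Length", "Petal.Length")

# Stand-in fit on iris (not the diabetes data from these notes)
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# One covariance matrix per group: LDA assumes these are all similar
group_covs <- lapply(split(iris[, vars], iris$Species), cov)
group_covs

# Pooled within-group covariance: weight each group by its degrees of freedom
n_g <- as.vector(table(iris$Species))
pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, group_covs, n_g)) /
  (sum(n_g) - length(n_g))
pooled

# In the LD coordinates the pooled covariance becomes the identity matrix
round(t(fit$scaling) %*% pooled %*% fit$scaling, 10)
```

If the group covariance matrices differ markedly, the equal-sized, equally oriented ellipses are a poor description of the data.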

Recall that, by default, the prior probabilities are the initial proportions. What if we set equal prior probabilities?

The confusion matrix/misclassification rate:

There are now 8 cases that were classified as Group 3 under the default priors (weighted by group size) but are classified as Group 2 under equal priors \(\rightarrow\) the default priors bias classification towards the group with the larger initial size.
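
The effect of the prior can be sketched by fitting the same model twice and cross-tabulating the two sets of predictions. The example assumes an artificially unbalanced subset of iris as a stand-in for the diabetes data; the subset and all object names are illustrative.

```r
library(MASS)

# Unbalanced stand-in data: keep only 15 of the 50 versicolor flowers
keep <- c(which(iris$Species == "setosa"),
          which(iris$Species == "versicolor")[1:15],
          which(iris$Species == "virginica"))
dat <- iris[keep, ]

# Default priors (group proportions) versus equal priors
fit_default <- lda(Species ~ Sepal.Length + Sepal.Width, data = dat)
fit_equal   <- lda(Species ~ Sepal.Length + Sepal.Width, data = dat,
                   prior = rep(1/3, 3))  # one prior per group, summing to 1

fit_default$prior  # proportions of each group in dat
fit_equal$prior    # 1/3 for each group

# Off-diagonal cells are cases whose predicted group changes with the prior
class_default <- predict(fit_default)$class
class_equal   <- predict(fit_equal)$class
table(default_prior = class_default, equal_prior = class_equal)
```

Under equal priors the small group is no longer penalised for its size, so borderline cases can move into it from the larger groups.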