BIOSCI 738: Advanced Biological Data Analysis

Advanced Biological Data Analysis

Module 5

Key topics

  • Clustering
  • Principal Component Analysis
  • Multidimensional Scaling
  • Linear Discriminant Analysis

R packages and datasets

All examples in this module assume that the following code has been run successfully.

packages used in this module

library(tidyverse) ## general functionality
library(palmerpenguins) ## penguins data
library(GGally) ## nice looking pairs plots
library(factoextra) ## multivariate analysis
library(vegan) ## multivariate analysis
library(ape) ## dendrograms
library(MASS) ## linear discriminant analysis (masks dplyr::select(); call dplyr::select() explicitly)
library(ggfortify) ## plotting
library(pheatmap) ## plotting
library(ade4) ## plotting
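
Because every later example depends on these packages, it can help to check which ones are installed before calling library(). The following is a minimal sketch, not part of the course code: the names `pkgs`, `installed` and `missing_pkgs` are illustrative, and it assumes the packages would be installed from CRAN.

```r
## package names copied from the library() calls above
pkgs <- c("tidyverse", "palmerpenguins", "GGally", "factoextra", "vegan",
          "ape", "MASS", "ggfortify", "pheatmap", "ade4")

## requireNamespace() returns FALSE (instead of throwing an error)
## when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  ## install.packages(missing_pkgs) ## uncomment to install (assumes CRAN)
} else {
  message("All packages for this module are installed.")
}
```

The install.packages() call is left commented out so that running the check never changes your library without you deciding to.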

datasets used in this module

base_url <- "https://raw.githubusercontent.com/STATS-UOA/databunker/master/data/"
  1. palmerpenguins::penguins
data(penguins, package = "palmerpenguins")
  2. ants
ants <- read_csv(paste(base_url, "pitfalls.csv", sep = ""))
  3. north_island
north_island <- read_csv(paste(base_url, "north_islands_distances.csv", sep = "")) %>% 
  column_to_rownames(var = "...1")
  4. ekman (see footnote 1)
ekman <- read_csv(paste(base_url, "ekman.csv", sep = ""))
  5. eurodist
data("eurodist", package = "datasets")
  6. HairEyeColor (see footnote 2)
data("HairEyeColor", package = "datasets")
  7. diabetes
diabetes <- read_csv(paste(base_url, "diabetes.csv", sep = ""))
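
Two of these datasets ship with base R, so they can be inspected without a network connection. As a quick sanity check (a sketch only; the remote CSV files are not touched here, and the choice of checks is illustrative), the structure of eurodist and HairEyeColor can be confirmed as follows.

```r
data("eurodist", package = "datasets")
data("HairEyeColor", package = "datasets")

## eurodist is a distance object (class "dist"), not a data frame
class(eurodist)
attr(eurodist, "Size") ## number of cities: 21

## HairEyeColor is a three-way contingency table
dim(HairEyeColor) ## 4 4 2
names(dimnames(HairEyeColor)) ## "Hair" "Eye" "Sex"
```

Knowing that eurodist is already a "dist" object matters later, since functions that expect a distance matrix can take it directly.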

  1. Source (ekman): Dimensions of color vision

  2. Source (HairEyeColor): Graphical display of two-way contingency tables and Graphical methods for categorical data