• BIOSCI 738: Advanced Biological Data Analysis
  • Nau mai, haere mai. Welcome to BIOSCI 738
  • Key Information
    • Overview
    • Course policies
    • FAQs
  • Module 1
    • R and RStudio
      • Recap: R terminology
      • Which R syntax?
    • Error handling and debugging
    • Accuracy and Honesty
      • Reproducible research
      • Good coding practice
    • Version control with git and GitHub
      • Setting up
      • Cloning a repository from GitHub using RStudio
      • Committing and pushing changes
      • System specific hurdles
    • Respectful Handling of Data
      • Data sovereignty
      • Māori Data Sovereignty principles
    • Awareness of Consequences
      • Case study
    • Data Visualization
      • Exploratory plots (for your own purposes)
      • Explanatory plots
  • Module 2
    • Permutation and randomisation tests
      • A permutation test: Jackal mandible lengths
    • Bootstrap resampling
      • Example: constructing bootstrap confidence intervals
    • Parametric hypothesis testing
      • The vocabulary of hypothesis testing
      • Example
      • Differences between two means
    • Linear regression
      • Model formula syntax in R
      • Some mathematical notation
      • A Null model
      • Single continuous variable
      • A single factor variable
      • One factor and a continuous variable
      • Interactions
    • Model diagnostics for a linear model
      • Model selection
    • Inference for a linear model
      • Point prediction
      • Confidence intervals for parameters
      • Marginal predictions
  • Module 3
    • Introduction to the design and analysis of experiments
      • Key phrases
      • Setting up an experiment
    • The three (main) principles of experimental design
    • Some basic experimental designs
      • Completely randomised design (CRD)
      • Randomised complete block design (RCBD)
      • Factorial design
    • Modelling experimental data
      • A completely randomised design (CRD) as a linear model
      • Analysis of a CRD in R
      • A Factorial experiment (as a CRD)
      • Unequal replication (unbalanced design)
      • Sums of squares (\(SS\))
    • Multiple comparisons
      • Adjustments for multiple testing
      • Classification of multiple hypothesis tests
      • Multiple comparison procedures
    • Linear mixed-effect models (LMMs)
      • A Randomised Complete Block Design (RCBD)
      • A Split-plot design
      • A repeated measures design
  • Module 4
    • Introduction to generalised linear models (GLMs)
    • Poisson regression
      • An example: bird abundance
    • Logistic regression
      • An example: lobsters
    • Model diagnostics
      • Residuals
      • The quantile residual Q-Q plot
      • Deviance (using the chi-squared approach)
    • A summary of GLMs
      • Other distributions
    • Generalised linear mixed-effects models (GLMMs)
      • Fitting a GLMM
  • Module 5
    • Clustering
      • Divisive (partitioning) methods
      • K-means: an example using the palmerpenguins data
      • Hierarchical agglomerative clustering
      • Hierarchical clustering: an example
    • Principal Component Analysis (PCA)
      • Examples in R
    • Multidimensional Scaling (MDS)
      • Metric Scaling
      • Correspondence Analysis (CA)
      • Non-metric Multidimensional Scaling
    • Linear Discriminant Analysis (LDA)
      • Example in R
      • Prediction
  • Module 6
    • Least Squares Estimation
      • Some basic matrix algebra
      • Linear least squares
      • Matrix representation of a CRD
      • A numeric example
    • Maximum likelihood estimation
      • Differentiation rules
      • Logarithm rules
      • Maximum likelihood estimation for a Binomial distribution
      • Maximum likelihood estimation for a CRD
      • Maximising the log-likelihood
      • Maximum likelihood estimation for a Poisson distribution
      • Maximum likelihood estimation for a continuous random variable
    • Introduction to Bayesian statistics
      • Conditional probability
      • Bayes’ rule


Module 4

Key topics

  • Poisson regression
  • Logistic regression
  • Generalised linear models
  • Generalised linear mixed-effects models

“Ecology has increasingly become a data- and model-centric discipline…”
— Trends in ecology and conservation over eight decades

Figure 5: Data analysis, statistical methods, and genetics n-grams from 1930 to 2010

R packages and datasets

All examples in this module assume that the following code has been run successfully.

Packages used in this module

library(tidyverse) ## general functionality
library(lme4) ## mixed effects models
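If either library() call above fails, the usual cause is that the package has not been installed yet. The sketch below is a hypothetical helper (missing_packages() is not part of the course code) that reports which of the listed packages are unavailable. It uses base R's requireNamespace(), which loads a package's namespace without attaching it and returns FALSE instead of an error when the package is absent.

```r
## Hypothetical helper, not part of the course material:
## return the names of packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  ## requireNamespace() returns TRUE/FALSE rather than throwing an error;
  ## it loads the namespace but does not attach the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- c("tidyverse", "lme4")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Please install: ", paste(to_install, collapse = ", "))
}
```

Anything reported can then be installed once with install.packages() before re-running the library() calls.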

Datasets used in this module

base_url <- "https://raw.githubusercontent.com/STATS-UOA/databunker/master/data/"
  1. birds [1]
## counts of two bird species per olive farm, reshaped to long format
birds <- read_delim(paste0(base_url, "bird_abundance.csv")) %>%
  dplyr::select(c("OliveFarm", "Management", "Turdus_merula", "Phylloscopus_collybita")) %>%
  pivot_longer(c(-OliveFarm, -Management), names_to = "Species", values_to = "Count")
  2. lobster [2]
lobster <- read_csv(paste0(base_url, "lobster.csv"))
  3. mice [3]
mice <- read_csv(paste0(base_url, "autism.csv"))
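The pivot_longer() step in the birds code turns one column per species into a pair of Species and Count columns, so that each row holds a single count. A minimal sketch of that reshaping, assuming a small made-up table with the same column names (the farm labels, management labels and counts below are invented purely for illustration and are not the real data):

```r
library(tidyr)  ## pivot_longer() comes from tidyr, which tidyverse attaches

## Invented wide-format table: one row per farm, one column per species
toy_wide <- data.frame(
  OliveFarm = c("F1", "F2", "F3"),
  Management = c("A", "B", "A"),
  Turdus_merula = c(4, 0, 2),
  Phylloscopus_collybita = c(1, 3, 5)
)

## Stack the species columns, keeping OliveFarm and Management as identifiers
toy_long <- pivot_longer(toy_wide, c(-OliveFarm, -Management),
                         names_to = "Species", values_to = "Count")
toy_long
```

Three farms and two species give six rows, one per farm-species combination. This long layout is what a count model such as Poisson regression expects, with Species available as an explanatory variable.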

  1. Source: Partitioning beta diversity to untangle mechanisms underlying the assembly of bird communities in Mediterranean olive groves

  2. Source: Influence of predator identity on the strength of predator avoidance responses in lobsters.

  3. Source: Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice