Module 2

Statistical inference is the process of deducing properties of an underlying distribution by analysis of data. The word inference means conclusions or decisions. Statistical inference is about drawing conclusions and making decisions based on observed data. Data, or observations, typically arise from some underlying process. It is the underlying process we are interested in, not the observations themselves. Sometimes we call the underlying process the population or mechanism of interest. The data are only a sample from this population or mechanism. We cannot possibly observe every outcome of the process, so we have to make do with the sample that we have observed. The data give us imperfect insight into the population of interest. The role of statistical inference is to use this imperfect data to draw conclusions about the population of interest, while simultaneously giving an honest reflection of the uncertainty in our conclusions. Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution.

TASK Imagine that you have a fair coin and toss it 10 times. Write out all the possible outcomes alongside how likely you think each is.

R packages and datasets

It is assumed in all following examples of this module that the following code has been executed successfully.

packages used in this module

library(tidyverse) ## general functionality
library(MASS) ## geyser data
library(palmerpenguins) ## penguins data
library(gglm) ## nice lm() diagnostic plots
library(ggeffects) ## marginal predictions

datasets used in this module

base_url <- "https://raw.githubusercontent.com/STATS-UOA/databunker/master/data/"
  1. jackal
## Mandible lengths (mm) for  golden jackals (Canis aureus) of each sex from the British Museum
jackal <- data.frame(mandible_length_mm = c(120, 107, 110, 116, 114, 111, 113, 117, 114, 112,
                                            110, 111, 107, 108, 110, 105, 107, 106, 111, 111),
                     sex = rep(c("Male","Female"), each = 10))
  1. paua
paua <- read_csv(paste(base_url, "paua.csv", sep = ""))
  1. MASS::geyser
data(geyser, package = "MASS")
  1. rats
rats <- read_csv(paste(base_url, "crd_rats_data.csv", sep = ""))
rats$Surgery <- as_factor(rats$Surgery)
  1. palmerpenguins::penguins
data(penguins, package = "palmerpenguins")