R and RStudio
Throughout this course we will be programming in R. R is a free, open-source statistical programming language created by Ross Ihaka (of Ngati Kahungunu, Rangitane and Ngati Pakeha descent) and Robert Gentleman, here at UoA in the Department of Statistics! It is widely used across many scientific fields for data analysis, visualization, and statistical modeling. Proficiency in R will enable you to wrangle and explore datasets, conduct statistical analyses, and create visualizations to communicate your findings. These are essential skills in any scientific discipline.
TASK Go to the Statistics Department display in the science building foyer (building 302) to see a CD-ROM containing the first ever version of R.
RStudio is an integrated development environment (IDE) for the R programming language. It serves as a user-friendly workspace, offering additional tools for coding, data visualization, and project management. RStudio simplifies the process of learning R by providing a structured interface with many built-in tools to help organize workflow, fostering a systematic approach to data analysis and research.
TASK Research the meaning of open-source software and briefly outline the pros and cons of this in the context of statistical analysis.
Recap: R terminology
Term | Description |
---|---|
Script | A file containing a series of R commands and code that can be executed sequentially. |
Source | To execute the entire content of an R script, often done using the “Source” button in RStudio. |
Running Code | The process of executing R commands or scripts to perform specific tasks and obtain results. |
Console | The interactive interface in RStudio where R commands can be entered and executed line by line. |
Commenting | Adding comments to the code using the # symbol to provide explanations or annotations. Comments are ignored during code execution. |
Assignment Operator | The symbol <- or = used to assign values to variables in R. |
Variable | A named storage location for data in R, which can hold different types of values. |
Data Type | The classification of data into different types, such as numeric, character, logical, etc. |
Object | A data structure that holds a specific type of data. Objects are used to store and manipulate data, and they can take various forms depending on the type of information being represented. |
Logical Operator | Comparison symbols such as ==, !=, <, >, <=, and >= (R's own documentation calls these relational operators), which return TRUE or FALSE and are used in conditional statements. |
Function | A block of reusable code that performs a specific task when called (e.g., mean(c(3, 4))). |
Argument | The input values that are passed to the function when it is called. |
Error Handling | The process of anticipating, detecting, and handling errors that may occur during code execution. |
Debugging | The process of identifying and fixing errors or bugs in the code. |
Workspace | The current working environment in R, which includes loaded data, variables, and functions. |
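Several of the terms above can be seen together in a few lines. The following is a minimal sketch only; the variable names (heights, mean_height, is_tall, safe_log) and the values are made up for illustration.

```r
# Commenting: R ignores everything on a line after the # symbol.

# Assignment operator and variable: store a numeric vector in `heights`.
heights <- c(1.62, 1.75, 1.80)

# Data type: class() reports how R classifies the object.
class(heights)

# Function and argument: `heights` is the argument passed to mean().
mean_height <- mean(heights)

# Comparison: returns a logical value (TRUE or FALSE).
is_tall <- mean_height > 1.7

# Error handling: log("a") raises an error, which tryCatch() intercepts
# so that the script carries on with NA instead of stopping.
safe_log <- tryCatch(log("a"), error = function(e) NA)
```

Saving these lines in a file gives you a script; running them one at a time in the console, or all at once with the “Source” button, adds the variables to your workspace.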
Which R syntax?
This is not a comprehensive list of R syntax, but a call to develop your own coding style and use whatever you are most comfortable with. Throughout this course I will likely use a mix of syntax and functions to carry out the same operations. This is (somewhat) by design, and is intended to expose you to a strength[1] of R, which is that there are multiple ways of doing the same thing. As long as your code works and is reproducible, use whatever suits you, although of course there are some recommended practices; see Good coding practice.
Example <- vs =
Both <- and = are assignment operators. Which you use is mainly a personal choice[2]. However, they behave differently depending on where the assignment is evaluated. At the top level there is no difference; inside a function call, however, = does not act as an assignment operator (it matches a value to a named argument), whereas <- still performs an assignment.
TASK Both lines of code below give you a two-row matrix. Can you work out what the difference is between them? (HINT: look at what is created in your environment after each line.)
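A pair of calls of this kind is sketched here. The data 1:6 and the names m1 and m2 are assumptions chosen for illustration, not necessarily the exact lines used in the original task.

```r
# Uses = inside the function call.
m1 <- matrix(1:6, nrow = 2)
ls()   # list the objects in your environment after the first call

# Uses <- inside the function call.
m2 <- matrix(1:6, nrow <- 2)
ls()   # list them again and compare with the previous output

# The two matrices themselves are the same.
identical(m1, m2)
```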
Example %>% vs |>
Recall the tidyverse (specifically, the magrittr package) pipe operator %>%, which allows us to combine multiple operations in R into a single sequential chain of actions. There is also a base pipe |>, which behaves similarly, but not identically. Essentially, |> passes the left-hand side as the first argument in the right-hand side call. This is subtly different from %>%, as it cannot pass the left-hand side on to multiple arguments. The %>% operator, however, does allow for this, and you can also change the placement of the left-hand object with a . placeholder[3].
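The behaviour described above can be tried out as follows. This is a sketch under two assumptions: R version 4.2.0 or later (for the base pipe's _ placeholder) and, for the second half, that the magrittr package is installed. The object names and the use of the built-in mtcars data are illustrative choices only.

```r
# Base pipe: the left-hand side becomes the first argument of the call.
x <- c(1, 4, 9)
nested <- sum(sqrt(x))
piped  <- x |> sqrt() |> sum()
identical(nested, piped)

# Base pipe with the named placeholder _ (assumes R >= 4.2.0):
# the left-hand side goes to the data argument instead of the first one.
fit_base <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr pipe, run only if the package is available.
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)

  # The . placeholder moves the left-hand side to another argument.
  fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

  # The . placeholder can appear in more than one argument.
  repeated <- c(1, 2, 3) %>% rep(., times = length(.))
}
```

Note that the base placeholder _ must be supplied to a named argument and can be used only once per call, which is why the last example has no direct |> equivalent.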
[1] some might say
[2] Ross Ihaka has been known to express his preference for = simply because it requires less typing
[3] although since v.4.2.0 the base pipe does now allow for a named placeholder _