Accuracy and Honesty

Honesty is an expectation: we expect honesty from you, and you can expect the same from your teaching team. Honesty, like accuracy, is an expectation in any scientific discipline. These are ethical principles we should all abide by, but this course isn’t here to discuss philosophy or character development. Instead, this section of the course aims to introduce you to the tools and principles that will aid your own pursuit of ethical data practice. Learning the tools that make your analysis reproducible goes some way towards ensuring accuracy in your research, because reproducibility promotes transparency, facilitates error detection and correction, and contributes to the overall reliability and accuracy of your research findings.

Reproducible research

“Reproducibility, also known as replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or in a statistical analysis of a data set should be achieved again with a high degree of reliability when the study is replicated. … With a narrower scope, reproducibility has been introduced in computational sciences: Any results should be documented by making all data and code available in such a way that the computations can be executed again with identical results.”
— Reproducibility, Wikipedia
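In computational work, "identical results" has a practical consequence whenever an analysis uses random numbers (sampling, simulation, bootstrapping): the random number generator has to start from a known state. A minimal sketch in R, where `set.seed()` fixes that state; the seed value 2024 is arbitrary:

```r
# Fix the random number generator's starting state, then draw three values
set.seed(2024)
first_run <- rnorm(3)

# Resetting the same seed reproduces exactly the same draws
set.seed(2024)
second_run <- rnorm(3)

identical(first_run, second_run)  # TRUE
```

Recording your R version (for example with `sessionInfo()`) matters too, because the default random sampling used by `sample()` changed in R 3.6.0, so the same seed does not guarantee the same sample across every R version.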

Reproducibility is a stepping stone towards ensuring accuracy: if an analysis can be rerun from the same data and code to give the same results, mistakes in it are far easier to find and fix. Establishing good practice when dealing with data and code right from the beginning is therefore essential. Good practice 1) ensures that data is collected, processed, and stored accurately and consistently, which helps maintain the quality and integrity of the data throughout its lifecycle; and 2) creates a robust code base that can be easily understood and adapted as the project progresses, which leads to faster development.

Good coding practice

You should always start with a clean workspace. Why? So your ex (code) can’t come and mess up your life!

To ensure that RStudio does not load your previous workspace, go to Tools > Global Options and, on the General tab, uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.

The reasoning may not be immediately obvious, but it is something you will later regret if you don’t start as you mean to go on! Loading a previous workspace may seem convenient, as your previous objects are immediately on hand. However, this is exactly why it is not good practice: loading a previous workspace is NOT reproducible and does NOT create a fresh R process. Your script can quietly come to depend on objects it never creates, and that will come back to bite you.
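A quick way to check that a script does not rely on leftovers from your session is to run it in a brand-new R process. The sketch below writes a small, hypothetical analysis script to a temporary file and runs it with `Rscript --vanilla`, which starts R without restoring or saving a workspace and without reading profile or environment files. The script name and its contents are placeholders for your own analysis.

```r
# Optional one-off setup, if you have the usethis package installed:
# usethis::use_blank_slate() turns off saving/restoring .RData for your user
# account, matching the Global Options settings described above.

# A hypothetical, self-contained analysis script written to a temporary file.
# Everything the script needs is created inside the script itself.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "measurements <- c(2, 4, 6)",
  "writeLines(as.character(mean(measurements)))"
), script_path)

# --vanilla: no .RData restored or saved, no profile or environment files read,
# so nothing from your current session can leak into this run
rscript <- file.path(R.home("bin"), "Rscript")
fresh_output <- system2(rscript, c("--vanilla", shQuote(script_path)), stdout = TRUE)
fresh_output
```

If a script only works in your interactive session but fails when run this way, it is depending on something it does not create itself.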

TASK Below are two quotes from Jenny Bryan, an R wizard, which reference two snippets of R code. Find out what each snippet does and why Jenny is so against them.

If the first line of your R script is setwd("C:\Users\jenny\path\that\only\I\have") I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
— Jenny Bryan, Tidyverse blog, workflow vs script
If the first line of your R script is rm(list = ls()) I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
— Jenny Bryan, Tidyverse blog, workflow vs script
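As a starting point for the task, the sketch below is a small experiment you can run yourself; it is not a full answer. To avoid wiping your real workspace, it applies the `rm(list = ls())` pattern to a scratch environment rather than the global environment. It attaches `stats4`, a package that ships with R, purely as an example, and `data/raw/survey.csv` is a hypothetical path.

```r
# ---- rm(list = ls()) --------------------------------------------------------
# Use a scratch environment so the demo does not clear your real workspace
scratch <- new.env()
assign("model_fit", 42, envir = scratch)
assign(".cached_result", "left over from earlier", envir = scratch)  # dot-name

old_options <- options(digits = 4)  # change a global option (old value saved)
library(stats4)                     # attach a package that ships with R

# The rm(list = ls()) pattern, applied to the scratch environment
rm(list = ls(envir = scratch), envir = scratch)

# What survives the "clean-up"?
survivors <- ls(envir = scratch, all.names = TRUE)  # ls() skipped dot-names
digits_after <- getOption("digits")                 # options are untouched
stats4_attached <- "package:stats4" %in% search()   # packages stay attached
survivors        # ".cached_result"
digits_after     # 4
stats4_attached  # TRUE

# Undo this demo's side effects
options(old_options)
detach("package:stats4")

# ---- setwd("C:/Users/...") ---------------------------------------------------
# An absolute path like the one in the quote exists on only one computer.
# A path relative to the project folder works wherever the project is copied.
survey_path <- file.path("data", "raw", "survey.csv")  # hypothetical file
survey_path  # "data/raw/survey.csv"
# With the here package installed, here::here("data", "raw", "survey.csv")
# builds the same path anchored at the project root.
```

As a side note, the path in the first quote would not even run as written: R reads `\U` inside a string as the start of a Unicode escape and stops with an error.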

A project-oriented workflow in R is a structured approach to organizing and managing your code, data, and analyses. It helps improve reproducibility and the overall efficiency of your work. Within such a workflow, it is essential to write code that is easy to understand, maintain, and share. To do so, coding best practice is to follow the 5 Cs by being:

  1. Clear:
    • Code Clarity: Write code that is easy to read and understand. Use meaningful variable and function names that convey the purpose of the code. Avoid overly complex or ambiguous expressions.
    • Comments: Include comments to explain the purpose of your code, especially for complex or non-intuitive sections. Comments should add value without stating the obvious.
  2. Concise:
    • Avoid Redundancy: Write code in a way that avoids unnecessary repetition. Reuse functions and use loops or vectorized operations when appropriate to reduce the length of your code.
    • Simplify Expressions: Simplify complex expressions and equations to improve readability. Break down complex tasks into smaller, manageable steps.
  3. Consistent:
    • Coding Style: Adhere to a consistent coding style throughout your project. Consistency in indentation, spacing, and naming conventions makes the code visually coherent.
    • Naming: Keep naming conventions consistent. If you use camelCase (or snake_case) for variable and function names, use it consistently across your codebase.
  4. Correct:
    • Error Handling: Implement proper error handling to ensure that your code gracefully handles unexpected situations. Check for potential issues, and provide informative error messages.
    • Testing: Test your code to ensure it produces the correct output. Use tools like unit tests (e.g., with testthat) to verify that your functions work as intended.
  5. Conformant:
    • Follow Best Practices: Adhere to best practices and coding standards in the R community. For example, follow the tidyverse style guide or the Google R Style Guide.
    • Package Guidelines: If you’re creating an R package, conform to package development guidelines. Use the usethis package to help set up your package structure in a conformant way.
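To make the 5 Cs concrete, here is a small sketch of one function written with them in mind. The function `fahrenheit_to_celsius()` is hypothetical, chosen only for illustration: its name says what it does (Clear), it relies on vectorised arithmetic instead of a loop (Concise), it uses snake_case as the tidyverse style guide recommends (Consistent, Conformant), and it validates its input and is checked against known values (Correct). The check uses base R's `stopifnot()` so the example runs without extra packages; the testthat equivalent is shown as comments.

```r
# Convert temperatures from degrees Fahrenheit to degrees Celsius.
# temp_f: a numeric vector of temperatures in Fahrenheit.
fahrenheit_to_celsius <- function(temp_f) {
  # Correct: fail early with an informative message rather than a cryptic one
  if (!is.numeric(temp_f)) {
    stop("`temp_f` must be numeric, not ", class(temp_f)[1], call. = FALSE)
  }
  # Concise: arithmetic in R is vectorised, so no loop is needed
  (temp_f - 32) * 5 / 9
}

# Correct: a quick check against values we know (freezing and boiling points)
stopifnot(isTRUE(all.equal(fahrenheit_to_celsius(c(32, 212)), c(0, 100))))

# Passing the wrong type gives a clear error message
bad_input_message <- tryCatch(
  fahrenheit_to_celsius("hot"),
  error = conditionMessage
)
bad_input_message

# The same checks written with testthat (if installed):
# testthat::test_that("freezing and boiling points convert correctly", {
#   testthat::expect_equal(fahrenheit_to_celsius(c(32, 212)), c(0, 100))
#   testthat::expect_error(fahrenheit_to_celsius("hot"), "must be numeric")
# })
```

Without the input check, `"hot" - 32` would fail with R's generic "non-numeric argument to binary operator" message, which says nothing about which argument was wrong.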

TASK Read Good enough practices in scientific computing and briefly summarise why good coding practices are key to any scientific discipline.

There are many other good coding practices, including keeping your code modular, implementing unit testing, automating workflows, and using version control. In this course you will be using git to manage your project code and data. Use of git, or a similar tool, will very likely be an expectation in your future career; see Version control with git and GitHub for an introduction to these tools.
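One way to combine modular code with an automated workflow is a small driver that runs each step of an analysis in order. The sketch below is one possible layout, not a prescribed structure: it creates three hypothetical step scripts (`01_load.R`, `02_clean.R`, `03_summarise.R`) in a temporary folder, and a hypothetical `run_all()` function sources them in numeric order into a shared environment using base R's `sys.source()`.

```r
# Sketch of a modular, automated workflow (all file names are hypothetical).
# Create a throwaway project folder with one script per analysis step.
project_dir <- file.path(tempdir(), "demo-project")
script_dir <- file.path(project_dir, "R")
dir.create(script_dir, recursive = TRUE, showWarnings = FALSE)

steps <- c(
  "01_load.R"      = "raw_data <- data.frame(x = c(4, 8, 15, 16, 23, 42))",
  "02_clean.R"     = "clean_data <- raw_data[raw_data$x > 5, , drop = FALSE]",
  "03_summarise.R" = "mean_x <- mean(clean_data$x)"
)
for (file_name in names(steps)) {
  writeLines(steps[[file_name]], file.path(script_dir, file_name))
}

# Run every numbered script, in order, inside one new shared environment
run_all <- function(dir) {
  scripts <- sort(list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))
  pipeline_env <- new.env(parent = globalenv())
  for (script in scripts) {
    message("Running ", basename(script))
    sys.source(script, envir = pipeline_env)
  }
  pipeline_env
}

results <- run_all(script_dir)
results$mean_x
```

Because the step scripts carry zero-padded numeric prefixes, sorting their names gives the order in which they should run, and adding a step is a matter of adding a file.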

File naming policies

| Item | What | Why |
| --- | --- | --- |
| Be nice to your computer | No white spaces, as some systems can be confused by them | Ensures compatibility across different systems and prevents errors |
| | No special characters (e.g., *, ^, +, …) | Prevents interpretation issues and potential conflicts with system functions |
| | Don’t assume case is meaningful | Avoids confusion on systems that do not differentiate by case |
| | Consistency is KEY | Avoids confusion in general |
| Be nice to yourself and your collaborators | Concise and descriptive names | Just makes life easier :) |
| Make sorting and searching easy | Dates should follow YYYY-MM-DD format | Standardizes date representation and makes sorting and interpretation easy |
| | Use numbers as a prefix to order files | Enables sequential sorting of files regardless of system |
| | Left pad with 0 so numbers have the same length | Ensures proper numerical order when sorting files |
| | Use keywords | You can search these! |
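Several of these file naming policies can be checked directly in R. The sketch below compares sorting with and without zero-padded prefixes, shows that YYYY-MM-DD dates sort chronologically as plain text while DD-MM-YYYY dates do not, and builds a dated file name with a hypothetical helper, `make_safe_name()`, that lowercases a label and replaces spaces and special characters with hyphens. Sorting uses `method = "radix"`, which sorts character vectors in the C locale, so the results do not depend on your system's language settings.

```r
# Without left-padding, names are compared character by character,
# so "10_..." sorts before "1_..." and "2_..."
unpadded <- c("1_load.R", "2_clean.R", "10_report.R")
sort(unpadded, method = "radix")  # "10_report.R" "1_load.R" "2_clean.R"

# sprintf("%02d", ...) left-pads step numbers with 0 to two digits
padded <- sprintf("%02d_%s.R", c(1L, 2L, 10L), c("load", "clean", "report"))
sort(padded, method = "radix")    # "01_load.R" "02_clean.R" "10_report.R"

# YYYY-MM-DD dates sort chronologically as plain text; DD-MM-YYYY dates do not
iso_dates <- c("2023-11-30", "2024-01-15", "2023-02-01")
dmy_dates <- c("30-11-2023", "15-01-2024", "01-02-2023")
sort(iso_dates, method = "radix")  # "2023-02-01" "2023-11-30" "2024-01-15"
sort(dmy_dates, method = "radix")  # "01-02-2023" "15-01-2024" "30-11-2023"

# Hypothetical helper: lowercase a label (case should not carry meaning) and
# replace runs of spaces or special characters with a single hyphen
make_safe_name <- function(label) {
  safe <- gsub("[^a-z0-9]+", "-", tolower(label))
  gsub("^-+|-+$", "", safe)  # trim hyphens left at either end
}

survey_file <- paste0(
  format(as.Date("2024-01-15"), "%Y-%m-%d"), "_",
  make_safe_name("Survey Results (v2)!"), ".csv"
)
survey_file  # "2024-01-15_survey-results-v2.csv"
```

The helper is applied only to the descriptive part of the name, so the `.csv` extension keeps its dot.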

TASK Work through Danielle Navarro’s presentation on Project Structure and see how many pitfalls you have fallen into to date.