Written by Alice Pidd (PhD Student), Scales Lab University of the Sunshine Coast, Queensland
Published
July 2, 2025
Ahoy!
Whether you’re just staRting out, or are a veteRan, there is always something new to learn in R.
This doc shares our top 10 tips (Commandments) for developing good coding habits in R.
The 10 Commandments cover everything from project organisation to parallel processing. At the bottom, you’ll find handy shortcuts and tools for everyday coding, although there are loads more out there (Google is your friend!).
This is a living document — we will continually update it as we discover better ways of doing things. The goal is for it to be as useful as possible, to as many people as possible. So if you discover something wicked, find an error, or have updates to make (e.g., new packages, deprecated packages, faster work-arounds), please send them through to alice.pidd@research.usc.edu.au.
Happy coding! 🚀
The 10 R Commandments
1. Use R Projects
Gone are the days of setwd(), struggling to find where you put your dang data, or all those scripts that you wrote. Using R projects (.Rproj) is like being a squirrel who actually remembers where they buried all their nuts.
Within the location of your .Rproj, you keep your data, scripts, outputs, and other relevant files needed for your analyses (e.g., shapefiles). This way you can easily call on things without writing entire file pathways. You can have as many .Rprojs as you like (the limit does not exist), but we recommend using one per chapter for sanity’s sake.
Creating an R project:
Open RStudio
File > New Project…
New Directory
New Project
Name it, and choose where you want to put the .Rproj (we recommend a folder specifically for the chapter)
Double-click that .Rproj file to open RStudio, and start writing scripts! You should see all contents of the folder in the Files pane (data, scripts, other folders), including your .Rproj.
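For example, once the .Rproj is open, file paths can be relative to the project folder, so there is no setwd() and no long absolute paths (the folder and file names below are just hypothetical):
dat <- read.csv("Data/fish_counts.csv")   # reads Data/fish_counts.csv inside the project folder, on any machine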
2. Write a README, and update it as you go
READMEs are kind of like a recipe x “dear diary” of your workflow. They note the authors, when the code base was written, for what purpose, and what someone needs in order to replicate your work. READMEs come in .Rmd form (see Handy tidbits: Quarto) and are highly customisable.
They are also very useful for keeping track of your methods decision-making. If you’ve forgotten what you did, a regularly-updated README can be used to write your entire methods section when you’re up to the publishing phase. How bloody good.
What is helpful to include in a README:
Title of the project/paper
Author(s)
Initial date and last update
Summary of what the whole code base does (like an Abstract)
Stepwise script workflow
Dependencies - R versions, machine specs
Contact info for questions/suggestions
How to create a README in RStudio:
File > New File > R Markdown…
Name it
Save the new script it opens in your RStudio. This will save as a .Rmd file
Write away! (See links below)
Toggle between Source (raw markdown code) and Visual (roughly what it will look like) mode at the top left of the script window.
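A bare-bones README.Rmd skeleton might look something like this (the headings and details are only suggestions, based on the list above):
---
title: "Chapter X: a descriptive project title"
author: "Your Name"
date: "`r Sys.Date()`"
output: html_document
---

## Summary
One short paragraph on what the whole code base does (like an Abstract).

## Workflow
1. 01_Download_data.R -- get the raw data
2. 02_Clean_data.R -- tidy it up
3. 03_Crop_rasters.R -- crop rasters to the study region

## Dependencies
R version, key packages (see Helpers.R), machine specs.

## Contact
your.email@university.edu.au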
3. Source a Helpers.R script
If you’re using all the same packages throughout your scripts, or you routinely use a bunch of functions you’ve written, the easiest way to make sure you have everything loaded is to source a special kind of “base” script at the start of each step in the pipeline.
We call it a Helpers.R script, and it’s essentially like a Swiss Army knife for your code. In it, you can put all the relevant libraries, commonly-used functions, pre-programmed objects, palettes, shapefiles, naming conventions, and any base files needed throughout your pipeline. Once you have a Helpers.R script, save it to wherever your .Rproj is, and it can be sourced at the start of your scripts using:
source("Helpers.R)
Warning!
You can put whatever you like in your Helpers.R, but we recommend being mindful of cramming your environment full of large objects that you don’t need each time.
Consider creating different kinds of Helpers which you source only in the relevant scripts. Alternatively, just load any large objects individually, as needed.
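For instance, a Helpers.R might look something like this (the package choices, object names, and file paths are just examples):
# Helpers.R -- sourced at the top of every script in the pipeline
pacman::p_load(tidyverse, terra, sf, tictoc, beepr)   # packages used throughout (see Handy tidbits: pacman)

# A palette used in every figure
my_cols <- c("#1b9e77", "#d95f02", "#7570b3")

# A function used in several scripts
count_species <- function(species_list) {
  length(unique(species_list))
}

# Keep big objects out of here -- load them only in the scripts that need them
# aus_coast <- readRDS("aus_coastline.RDS")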
4. Number your scripts
Speaking of scripts, it is likely you’ll share yours at some point — with supervisors, collaborators, reviewers, etc. Numbering your scripts helps to keep your pipeline clear, concise, and makes it easier for you and everyone else to follow.
It is best to start with a number, and use the rest of the name to explain what it does — e.g., "03_Crop_rasters.R" would contain code to crop raster files. This also means they will appear in sequential order in your folder (tickles the brain, doesn’t it?).
With numbered scripts, you have the added benefit of being able to revisit your work years down the track, and knowing almost exactly what you were doing at each step. This + your README will make you unstoppable.
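For example, a chapter folder might end up looking like this (the file names below are made up):
list.files()
# [1] "01_Download_data.R"  "02_Clean_data.R"     "03_Crop_rasters.R"
# [4] "04_next_thing.R"     "Helpers.R"           "MyChapter.Rproj"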
5. Save outputs as .RDS files
Data takes up space. Not much more to add to that really. But one way to get around this battle for real estate is to save your outputs as .RDS files.
.RDS files are a compressed/squashed storage format, kind of like using a shrink ray. Objects you save end up a fraction of their original size, but when you read them back in, they’re exactly what you saved — netCDFs saved as .RDS’s will read back in as netCDFs, and so on. They load and write faster, and preserve all the original formatting and data types.
Intuitive syntax!
The code for saving and reading .RDS files is easy to follow:
Saving a file as .RDS after cropping (e.g., in your "03_Crop_rasters.R" script).
input_fol <-"path/to/input/folder"# Where your input netCDF files areoutput_fol <-"path/to/output/folder"# Where you want to save the cropped filese <-ext(xmin, xmax, ymin, ymax) # Extent for your study regionin_dat <-rast(paste0(input_fol, "/your_file.nc")) # Read in the netCDFout_dat <-st_crop(in_dat, e) # Crop the original file, on the basis of eout_nm <-paste0(output_fol, "/cropped_file.RDS") # Output folder + the new filesaveRDS(out_dat, out_nm) # Save the output file with the name we made
Reading in that .RDS file for use in the next script "04_next_thing.R".
input_fol <-"path/to/cropped/files"# Your cropped .RDS filesoutput_fol <-"path/to/next/output"# Your next output folderin_dat <-readRDS(paste0(input_fol, "/cropped_file.RDS"))# in_dat will be a netCDF
Rinse and repeat.
Efficiency!
You can readRDS() and saveRDS() within functions too (see “9. Work in parallel processing”)! This means you could set your function loose on hundreds of files, and not a single thing would enter your R environment. Very cool very nice!
6. Short scripts, with one purpose, that can stand alone
Short and punchy!
For your own sanity, now and in the future, keep your scripts short (<100 lines long) and make them do only one thing. For example, a script called "03_Crop_rasters.R" would be the 3rd script in the pipeline, and all it would do is load your raster files, crop them to a new extent, and save them for later processing. Simples.
This helps you to keep track of your progress through the pipeline, and breaks the code into more mentally-manageable chunks. AND, if you’re trying to debug issues, it can also help you to isolate where the f*ckery began.
Stand alone
Each of your scripts should also be runnable without having to run all the previous scripts. To run your "03_Crop_rasters.R" script from above, you shouldn’t need to run scripts "01_" and "02_" every time.
How to guarantee your scripts will be stand alone
Start every script the same way:
source("Helpers.R") # Load your packages and functionsinput_dat <-readRDS("previous_output.RDS") # Read the data you need
End every script the same way:
saveRDS(output, "output_for_next_script.RDS") # Save for the next script
Test the standalone-ness:
Close RStudio completely — Session > Quit Session
Reopen your .RProj
Open ONLY the script you want to test (don’t run anything else)
Run it
If it works without errors, it’s standalone! 🎉
If you get errors, check line by line for these things:
That you readRDS()’d all the data this script needs — it could still be depending on objects created in previous scripts
Your whole Helpers.R script runs without a hitch
All file paths are correct/exist
Extra credit:
If you’ve done all of this, you could theoretically create one diabolical "0_Do_it.R" script.
This is where you would source each standalone script in the pipeline in consecutive order, and run it all from start to finish. If ya nasty.
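A sketch of what that might look like (script names other than those used above are just examples):
# 0_Do_it.R -- run the whole pipeline from go to whoa
source("Helpers.R")            # packages, functions, palettes
source("01_Download_data.R")   # hypothetical first step
source("02_Clean_data.R")      # hypothetical second step
source("03_Crop_rasters.R")    # crop rasters to the study extent
source("04_next_thing.R")      # whatever comes next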
7. Annotate
Not even you can read your own mind. Writing down what each line of code does will help you remember what the hell past you was thinking. Write a short comment, or a short story, it really doesn’t matter. Leave yourself as many breadcrumbs as you need to make it back out of the forest.
To annotate, just write a # to the right of any code. This makes the subsequent text in that line non-executable, i.e., just a text string. The code you want to run will still run, and will print in the console along with your annotation.
For example:
summary(cars) # Print a summary of the data
speed dist
Min. : 4.0 Min. : 2.00
1st Qu.:12.0 1st Qu.: 26.00
Median :15.0 Median : 36.00
Mean :15.4 Mean : 42.98
3rd Qu.:19.0 3rd Qu.: 56.00
Max. :25.0 Max. :120.00
This is also very helpful for when we inevitably need help debugging errors — at least people will see what we thought we were trying to do.
8. Functions are your friend
If you ever need to run the same code on 3+ things, it’s time to write a function that can do it for you. It makes your code way cleaner!
How to write a function:
# E.g., Count how many different species we found at each site
count_species <- function(species_list) {
  unique_species <- length(unique(species_list))
  return(unique_species)
}

# Species found at two different sites
site1 <- c("kangaroo", "koala", "wombat", "kangaroo", "wombat", "platypus", "echidna")
site2 <- c("possum", "kangaroo", "bandicoot", "quoll")

count_species(site1) # Returns 5 unique species
[1] 5
count_species(site2) # Returns 4 unique species
[1] 4
Tip:
Start by writing your code normally for one case, get it working, then wrap it in a function. If you can make it work for one thing, it will likely work on several! Woo!
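For example, using the species counts from above (the map_int() step is just one way to let the function loose on everything at once):
# Step 1: get it working for a single case
n_sp_site1 <- length(unique(site1))   # works? great

# Step 2: wrap the working code in a function, and set it loose
count_species <- function(species_list) {
  length(unique(species_list))
}
purrr::map_int(list(site1, site2), count_species)   # returns 5 and 4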
9. Work in parallel processing
Cut down on processing time by calling on more of the elves in your computer to do your bidding.
Each computer has a number of elves (cores) available for doing things, some more than others. R usually uses one core at a time, called sequential processing. When we work in parallel, it essentially spreads the load over more cores so that the computation happens faster for repetitive tasks, like applying the same function to lots of files in a directory.
You may have heard of the purrr package as well — this is what furrr is built on.
Essentially:
purrr is used in sequential processing, in place of loops
furrr is just purrr functions with extra sparkle, coupled with future for use in parallel processing (what we are doing here)
More on purrr and furrr functions in the Handy tidbits: purrr and furrr.
How many cores does my computer have?
There are two ways you can check in R, both will show you numbers, but sometimes not the same numbers:
parallel::detectCores() — just counts the raw number of CPU cores your computer has
parallelly::availableCores() — respects system limits or server restrictions to tell you what is actually usable
So parallel::detectCores() might say “you have 8 cores” while parallelly::availableCores() says “you can actually only use 4 cores”
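For example (the numbers in the comments are made up; yours will differ):
parallel::detectCores()        # e.g., 8 -- everything the machine physically has
parallelly::availableCores()   # e.g., 4 -- what you're actually allowed to use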
Why do I sometimes see :: in code?
e.g., parallelly::availableCores()
The double colon :: is used to specify which package a function comes from. It’s like giving R the full address instead of just the street name.
Lots of packages have functions with the same name — e.g., both dplyr and stats have a filter() function. Specifying the package means there will be no conflicts (errors).
When to use it:
When you only need one specific function from a package
In functions/packages you’re writing for others
If you’re ever in doubt
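For example, both of these calls work, and the :: makes it obvious which filter() you mean:
dplyr::filter(mtcars, cyl == 6)    # dplyr's filter: keep rows of a data frame
stats::filter(1:10, rep(1/3, 3))   # stats' filter: a moving average over a series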
All you need to do is:
Add a line before your code, which initialises the subsequent code to be run in parallel using multisession, with workers = set to the number of cores you want to use. Here, we are using all the available cores minus 2 (so the computer keeps running):
future::plan(multisession, workers = parallelly::availableCores() - 2)
Add a line after your code to switch it back into the normal mode, sequential:
future::plan(sequential)
Example of parallel processing with a function:
# Set your folder paths
source_fol <- "path/to/source/folder"
output_fol <- "path/to/output/folder"

# Get a list of files to push through the function
files <- dir(source_fol, full.names = TRUE)

# Write your function
process_files <- function(f) {
  dat <- readRDS(f)                                   # Read in a single file
  processed_dat <- dat %>%
    dplyr::filter(some_condition) %>%                 # Filter for something
    mutate(new_column = some_calculation)             # Do something else
  # Make a new filename for the processed file, and change where it goes
  out_path <- paste0(output_fol, "/processed_", basename(f))
  saveRDS(processed_dat, out_path)                    # Save it!
}

# Set up the parallel processing
future::plan(multisession, workers = parallelly::availableCores() - 2)
tic()                                                 # Time how long it takes
future_walk(files, process_files)                     # Do the function in parallel for each file
toc()                                                 # Stop the clock

# Set back to sequential processing (the standard mode in R)
future::plan(sequential)
Note: The function can be written anywhere in your script — you only need the future_walk() call between multisession and sequential.
10. Use GitHub
From code sharing, to version control, to peeping other people’s public code, to just having a backup of your whole workflow somewhere that you can’t lose it(!!!), GitHub is a great thing to get on.
Jessie has put together this wicked how-to on gitting started. However, since the time of writing, Jessie and others now recommend using GitHub Desktop — an app on your computer that gets around issues associated with Git version control inside of RStudio. Way more user friendly, and easier to set up.
GitHub Desktop installation and tutorial coming soon!
Handy tidbits
pacman
Load multiple libraries, in fewer lines of code, without needing to install.packages()!
This package checks whether packages have already been installed before loading them, and if they haven’t, it installs and loads them. This is super useful when you’re running someone else’s code, as they may use libraries you don’t already have.
We recommend putting this in your Helpers.R script, so that it is sourced all the time. Package info here
# The fast way with less code
pacman::p_load(pacman, tidyverse, purrr, furrr, ncdf4, raster, terra, sf, tmap, beepr, tictoc, pushoverr)

# One of the old, slow ways
library(pacman)
library(tidyverse)
library(purrr)
library(furrr)
library(ncdf4)
library(raster)
library(terra)
library(sf)
library(tmap)
library(beepr)
# etc...
tictoc
Times how long it takes to run labour-intensive code. Just bookend your code with tic() and toc(). Package info here.
tic() # Starts the clock the moment the line is run
# Insert your long, labour-intensive function here
toc() # Stops the clock, and will print the run time (in seconds) to the console
beepr
Play a sound when the long, labour-intensive code is finished being run! There are 11 different sounds to choose from - just turn up your volume, walk away, make a marg, and wait for the beep. Package info here.
# Insert your code here
beep(2) # Number 2 is my favourite, but choose your own adventure!
pushoverr
Get notifications to your phone/computer each time a file is finished processing!
Really useful if you have hundreds of files with hours- or days-worth of processing ahead of you. You need to download an app, but you can customise the messages you get, how frequently, and the sound it makes. So so cool! Package info here and here.
(Screenshot: in the thick of it, 18 hours and counting.)
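A minimal sketch of how it can slot into a pipeline — it assumes you’ve already created a Pushover account and app, and the key/token strings below are placeholders:
pacman::p_load(pushoverr)
set_pushover_user(user = "your_user_key")    # placeholder -- from the Pushover app on your phone
set_pushover_app(token = "your_app_token")   # placeholder -- from pushover.net, for this project

for (f in files) {
  process_files(f)                                        # the function from Commandment 9
  pushover(message = paste0("Finished ", basename(f)))    # ping your phone after each file
}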
purrr and furrr
Loops, but better
If you’re in the market for new ways to loop over several things in a cleaner and less error-prone way, purrr functions are your pal. When you have a need for speed, upgrade to furrr + future for parallel processing!
purrr is used in sequential processing, in place of loops
furrr is just purrr functions with extra sparkle, coupled with future for use in parallel processing
What does the code look like?
# The old loop way (no shade, just not it)
files <- c("data1.csv", "data2.csv", "data3.csv")
all_data <- list()
for (i in 1:length(files)) {
  all_data[[i]] <- read_csv(files[i])
}
combined_data <- bind_rows(all_data)

# The purrr way (2 lines...)
files <- c("data1.csv", "data2.csv", "data3.csv")
combined_data <- map_dfr(files, read_csv)

# The furrr way (when you need speed)
plan(multisession, workers = availableCores() - 2)
combined_data <- future_map_dfr(files, read_csv)
plan(sequential)
purrr | furrr | What it does
walk() | future_walk() | Apply function to each item, returns nothing (e.g., saving files)
map() | future_map() | Apply function to each item, returns a list
map_dfr() | future_map_dfr() | Apply function to each item, combine results into one data frame (by rows)
map_dfc() | future_map_dfc() | Apply function to each item, combine results into one data frame (by columns)
map_chr() | future_map_chr() | Apply function to each item, return a character vector
map_dbl() | future_map_dbl() | Apply function to each item, return a numeric vector
map2() | future_map2() | Apply function using two inputs simultaneously
pmap() | future_pmap() | Apply function using multiple inputs simultaneously
R Snippets (by Dave Schoeman)
One of the best ways to save keystrokes is with text snippets, which are essentially keyboard expansions or macros for RStudio. If you find yourself typing out a certain bit of code fairly often, you can make this into a snippet (shortcut).
How to make a snippet:
Tools > Edit Code Snippets… > (You will see that several exist, already.)
Say you want to use pacman to load packages at the start of a script; you could save time by adding this one.
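Using RStudio’s snippet syntax, it could look something like this (a sketch — rename or add slots as you like, and note the second line must be indented with a tab):
snippet pman
	pacman::p_load(${1:package1}, ${2:package2})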
The first line defines the code as a snippet, and provides the shortcut code (pman) you would type in your script to invoke it.
The second line is the command, as you would code it, but with some slots reserved for package names.
The 1 in the first one identifies that this will be highlighted first; the package1 is the text that will appear in that slot, ready for you to overwrite with something like tidyverse; the same for the second slot, etc. Feel free to add more slots.
Save
Once you’re done making your snippet, go to your console and start typing pman; you will see that it fills in the line with the snippet and inserts your cursor on the text package1 so that you can simply type tidyverse to replace it. Hit Tab, and package2 will be highlighted, etc.
Now think about the number of lines of code you can automate!
Quarto
A lot of the helpful tutorials, course notes, and self-made info docs we come across are written using something called markdown.
Markdown provides a way of creating an easy to read plain text file which can incorporate text, images, code, sections, links to other documents — you name it. Quarto uses that, but makes it look schmick!
This doc was made on Quarto, here is a great tutorial on setting up Quarto, and here is a book on Quarto, all using Quarto.
Writing a reprex (reproducible example)
“Help me, help you”
A reprex is a minimal, complete, and reproducible example of your problem — basically, the smallest piece of code that shows exactly what’s going wrong.
A good reprex has:
The ability to stand alone (see commandment #6)
All required packages (see pacman)
Minimal data: Use built-in datasets (mtcars, iris) or create tiny fake data
Only the problematic code, not your entire script/pipeline
“I expected X but got Y”
Things you’ve already tried
Annotations up the wazoo (see commandment #7)
R version info, if it might be relevant
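A tiny sketch of what one might look like — the “problem” here is invented purely for illustration:
# Packages
library(dplyr)

# Tiny fake data that reproduces the issue
dat <- data.frame(site = c("A", "A", "B"),
                  count = c(1, NA, 3))

# I expected a mean count per site, but site A comes back as NA
dat %>%
  group_by(site) %>%
  summarise(mean_count = mean(count))

# Already tried: mean() works on the full column once NAs are removed
mean(dat$count, na.rm = TRUE)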
Why reprexes are the bee’s knees:
You usually solve your own problem while making one (seriously)
People can actually help you — if done correctly, anyone can run it
Forces you to isolate the problem instead of sharing 400 lines of garbage (see first point)