104 Coding style
Written by Marija Pejcinovska and last updated on 5 February, 2022.
104.1 Introduction
By now you’ve probably worked through dozens of modules and are feeling a lot more comfortable coding in R. This is a good spot to spend a moment or two thinking about your coding style. A good style will help you keep your R scripts consistent and easy to read (and, of course, much easier to navigate through when you revisit them at a later time).
In this lesson we’ll focus on the coding style used throughout the tidyverse. More specifically, we will highlight some naming conventions for objects and functions and discuss useful structures that will make your programs easier to write and read.
Prerequisite skills include:
- Some data manipulation
- The tidyverse
- Writing basic functions and conditional statements
104.2 Naming things: the tidy way
104.2.1 Naming files
In a previous lesson you learned how to set up folders and organize files in your R projects. Here we’ll talk about a few things you might want to keep in mind when naming your R files.
- Make sure your files have meaningful (though, if possible, also relatively short) names that end in .R.
- Try sticking to a specific capitalization. The tidyverse style recommends using lower case letters.
- Avoid spaces and special characters. Consider separating words in a file name by using _ or -.
# A good example of a file name
my_first_script.R
# And a few bad (and less than ideal) examples
my first script.R # there should be no spaces
myfirstscript.R # not a terrible file name but hard to read
my_first_script.r # r should be capitalized
file1.R # not descriptive enough
models&results.R # contains special characters
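The mechanical parts of these rules can be checked with a regular expression. The sketch below is only an illustration: is_tidy_file_name() is a hypothetical helper, not part of any package, and it cannot judge whether a name is descriptive or easy to read.

```r
# Hypothetical helper: TRUE when a file name uses only lower case letters,
# digits, "_" or "-", and ends in ".R".
is_tidy_file_name <- function(file_name) {
  grepl("^[a-z0-9_-]+\\.R$", file_name, perl = TRUE)
}

file_names <- c(
  "my_first_script.R",  # follows the rules
  "my first script.R",  # contains spaces
  "my_first_script.r",  # lower case extension
  "models&results.R"    # contains a special character
)

is_tidy_file_name(file_names)
```

Only the first name should pass. A name such as myfirstscript.R would also pass, which is why a check like this complements, rather than replaces, your own judgment.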
104.2.2 Naming objects
Just like with file names, you should name your R objects using descriptive and informative names. Here are some guidelines to keep in mind:
- Object and function names must start with a letter (and not a number)!
# Good
group_1
# Also good
group_one
# Bad
1st_group
- Names should include only letters, numbers, the underscore (_), or the dot (.). You should probably decide on a single separator, though; pick either _ or . to stay consistent and follow basic conventions.
- Since classes and methods in the S3 object system use dots, to avoid confusion, it might be best to use _ in your function names. In fact, the tidy style guide, in general, recommends using _ to separate words both in function names and object names. Separating lowercase words with _ is sometimes referred to as using snake_case.
- To avoid errors, keep names as short as possible while still being descriptive.
- When naming your functions consider using active verbs. For instance,
# Okay function names
permute()
count_event()
# Less okay
permutation()
event_counter()
- Typos and letter case matter (R is case-sensitive), so be careful when calling your objects and functions. Sometimes errors in your code turn out to be just silly misspellings.
- Avoid re-using names of common R functions and variables. For instance, avoid naming objects T or F, since R predefines these as shorthand for TRUE and FALSE.
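A short sketch of two of the points above, using only base R. make.names() shows how R itself would repair a name that does not start with a letter, and the second half shows why naming an object T is risky: unlike TRUE, T is an ordinary variable that can be silently masked. The variable names are made up for illustration.

```r
# make.names() returns a syntactically valid version of a name.
repaired_name <- make.names("1st_group")  # the leading digit is not allowed
valid_name <- make.names("group_1")       # already valid, returned unchanged
repaired_name
valid_name

# T is an ordinary variable, so assigning to it masks base R's T.
T <- 0
masked_is_true <- isTRUE(T)    # T no longer means TRUE here
rm(T)                          # remove the mask again
restored_is_true <- isTRUE(T)  # T refers to base R's T once more
masked_is_true
restored_is_true
```

Any code that runs while T is masked and relies on T meaning TRUE would quietly misbehave, which is a good reason to spell out TRUE and FALSE as well.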
104.3 Tidying up your “syntax”
104.3.1 Commas and spaces
Many of the syntax rules in the English language are applicable to R coding. Commas and spaces are a good example of this.
- Place spaces after commas, but not before them (just like you would when you write).
# Good
B[, 2]
# Not so good
B[,2]
B[ ,2]
B[ , 2]
- Avoid putting a space between your function name and the arguments in parentheses when calling your function.
- When using control-flow statements (if, for, while), place a space before and after the parentheses () that enclose the condition.
- Infix operators (i.e. those placed between operands), such as ==, +, -, <-, and so on, should always be surrounded by spaces.
# Good
my_var <- y + 24 + (x * 0.5)
# Not great
my_var<-y+24+(x*0.5)
- There are, however, a few exceptions to the above rule. Operators with high precedence, such as those that access objects in a namespace (::, :::), those used for extracting slots or components ($, @), those used for indexing ([], [[]]), the exponentiation operator (^), or the sequence operator (:), should not be surrounded by spaces.
# Good
x <- data$height
y <- 1:20
z <- x^2
# Bad
x <- data $ height
y <- 1 : 20
z <- x ^ 2
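Spacing is mostly for human readers: R parses the cramped and the spaced versions of an expression identically. The sketch below uses quote() to compare parsed code without running it, and it also shows one case where a stray space does change the meaning. The object names are placeholders.

```r
# quote() captures code without evaluating it, so the parsed forms can be compared.
same_parse <- identical(quote(my_var<-y+24), quote(my_var <- y + 24))
same_parse

# Splitting "<-" with a space turns an assignment into a comparison.
assignment <- quote(total <- 3)
comparison <- quote(total < - 3)
identical(assignment, comparison)

# The operator R actually parsed in each case.
as.character(assignment[[1]])
as.character(comparison[[1]])
```

Consistent spacing around <- makes slips like the second one much easier to spot.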
104.3.2 Curly braces and code hierarchies
- When writing functions and conditional statements that are not short or simple, you need to place your code in a block sectioned off by curly braces. Enclosing the “body” of a function or conditional statement inside curly braces makes the hierarchy of a piece of code easier to see. There are a few things to keep in mind here:
  - { should be the last character on a line. Once you open the left curly brace, start the actual body of the code on a new line.
  - The code inside the curly braces should be indented by two spaces. This way you’ll be able to see the hierarchy of the code block more easily.
  - When you are done specifying the code that defines your function or conditional statement, place the closing curly brace on a new line, so that it’s the first character on the line. In fact, unless it is followed by else {, the closing curly brace should be on its own line.
# Good
if (x < 3) {
  exp(x)
} else {
  x^3
}

if (i < 10) {
  if (x > 0) {
    y <- log(x)
  } else {
    y <- x
  }
} else {
  message("i is too large")
}
# Not so good
if (x < 3) {
exp(x)}
else {
x^3}

if (i < 10){
if (x > 0){
y = log(x)
} else {
y = x
}
}else {
message("i is too large")}
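The same brace and indentation rules apply to function definitions. Here is a minimal sketch; describe_number() is a hypothetical function invented for this illustration.

```r
# Hypothetical function used only to illustrate brace placement and indentation.
describe_number <- function(x) {
  if (x < 0) {
    label <- "negative"
  } else if (x == 0) {
    label <- "zero"
  } else {
    label <- "positive"
  }

  label
}

describe_number(-2)
describe_number(5)
```

Each opening brace ends its line, each level of nesting adds two spaces, and every closing brace starts its own line unless it is followed by else.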
104.3.3 Function arguments and assignments
- When calling a function, it’s good to be familiar with the function’s arguments. Most functions have an argument that supplies the data used to perform some calculation. There might, of course, be other arguments that control additional aspects of that calculation. For instance, the mean function lists the following arguments: mean(x, trim = 0, na.rm = FALSE, ...), where x is the data argument. With most functions you can get away with supplying the data without using the name of the argument (in the case of mean() this would mean without explicitly typing x = ...your data here...). However, a bit more care should be taken when supplying the other arguments. You should be aware of the order of those arguments, supply them by their full names, and avoid partial matching.
# Better
mean(x = c(NA, 1:10), na.rm = TRUE)
# Also good
mean(c(NA, 1:10), na.rm = TRUE) # notice here we omitted supplying the data as x =
# Don't do!
mean(c(NA, 1:10), TRUE) # in fact, this will throw an error; do you see why?
- Avoid making any assignments inside function calls. Where necessary try defining objects outside of your function.
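The points in this section can be sketched with a small example. rescale_values() is a hypothetical function written only for this illustration; the rest is base R.

```r
# Hypothetical function used only to illustrate argument matching.
rescale_values <- function(values, center = 0, scale_by = 1) {
  (values - center) / scale_by
}

# Good: data by position, every other argument by its full name.
rescaled <- rescale_values(c(2, 4, 6), center = 2, scale_by = 2)

# Works, but avoid: "sc" is partially matched to scale_by, and the
# unnamed 2 is matched to center by position.
rescaled_partial <- rescale_values(c(2, 4, 6), 2, sc = 2)
identical(rescaled, rescaled_partial)

# An unnamed TRUE is matched to trim by position, not to na.rm.
trim_error <- tryCatch(
  mean(c(NA, 1:10), TRUE),
  error = function(e) conditionMessage(e)
)
trim_error

# Avoid: the assignment is hidden inside the call.
# mean(heights <- c(150, 160, 170))

# Prefer: assign first, then call.
heights <- c(150, 160, 170)
mean_height <- mean(heights)
mean_height
```

The partially matched call gives the same result today, but it would break or change meaning if the function later gained another argument starting with "sc", which is one reason to write argument names out in full.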
104.3.4 A note on indentation
Consider using two spaces when indenting your code. This is preferred to using tabs. One thing you should probably avoid is mixing both! In other words, avoid indenting some lines in your code using spaces and others using tabs.
104.3.5 A few other notes on syntax
- Avoid using semicolons (;) at the end of a line of code.
- Avoid putting multiple commands on one line; hence, avoid using semicolons to separate multiple commands on one line! Multiple commands on a single line make your code cluttered and harder to read.
- You can use <- or = for assignment in R; however, the tidyverse style guide strongly advocates consistently using <- for value assignment.
# Do
my_var <- 7
# Maybe don't do
my_var = 7
- The pipe operator, %>%, should be preceded by a space and should usually be followed by a new line. Just like indentation in code blocks, the code following a pipe should be indented by two spaces. (Side note: if you are working on an R script or an R Markdown file in RStudio you may have noticed that the indentation on the next line is automatically done for you!)
# Good
data %>%
  filter(x > 10) %>%
  group_by(my_cat_var) %>%
  summarize(my_sum = sum(my_other_var)) %>%
  ungroup()
# Not so good
data %>% filter(x > 10) %>% group_by(my_cat_var) %>% summarize(my_sum = sum(my_other_var)) %>%
ungroup()
- Recall that in the tidyverse, when using the package ggplot2, layers of your plot are separated by + instead of %>%. However, the style suggestions remain the same. The + operator should be preceded by a space and the code that follows should appear on the next line. If you are adding your plot code to the end of an existing pipeline, it is recommended that you keep a single level of indentation. It is also customary to add new layers on separate lines. If a ggplot layer has too many arguments to fit comfortably on one line, it is preferable to split the long line and place each argument of your layer on a separate line.
- Inside if() clauses, use && and || instead of the vectorized logical operators, & and |.
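The last two points can be sketched as follows. This assumes the dplyr and ggplot2 packages are installed and uses R's built-in mtcars data set; the object names and axis labels are chosen for illustration only.

```r
library(dplyr)
library(ggplot2)

# A plot added to the end of a pipeline: one level of indentation,
# one layer per line, and a long layer split into one argument per line.
mpg_plot <- mtcars %>%
  filter(mpg > 20) %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel efficiency by weight"
  )

# Inside if(), && combines two single TRUE/FALSE conditions.
min_rows <- 5
if (is.data.frame(mtcars) && nrow(mtcars) >= min_rows) {
  message("mtcars has at least ", min_rows, " rows")
}
```

Because && stops as soon as the left-hand side is FALSE, the nrow() check is only evaluated when the first condition holds.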
104.4 Next Steps
This tutorial is largely based on content in the tidyverse style guide. For more detailed information check out: https://style.tidyverse.org/
104.5 Exercises
104.5.1 Question 1
Which of these expressions follows the tidy style? Select all that apply.
- x<-3
- x=3
- x <- 3
- x < - 3
104.5.2 Question 2
Select the appropriately styled line of code from the following choices:
- here::here("data/my_file.R")
- here::here ("data/my_file.R")
- here ::here("data/my_file.R")
- here :: here("data/my_file.R")
104.5.5 Question 5
Suppose you made a function called normalized_sum that takes some arguments. When calling your function you should leave a space between the function name and the arguments in parentheses.
- True
- False
104.5.6 Question 6
The following code chunk is in accordance with tidy coding practices:
if (sum > 0) {
print("It works!")
}
- True
- False
104.5.7 Question 7
Which of the following R object assignments follows the tidy style guide principles:
- T <- FALSE
- c <- 1
- sum <- mean(1:10)
- a <- 2
- All of the above
- None of the above
104.5.9 Question 9
my_function <- function(age = age, total = total, lambda = 0.8,
prob = 0.3,
sum = NULL) {
# Code for some difficult calculation
}
Which of the following best describes the reason the above code chunk is not in line with tidy principles?
- The spaces between the function arguments and the code.
- The indentation of the function arguments.
- None of the above! This function is actually tidy.
- Both a. and b. are the culprits.