28 The pipe

Written by Mariam Walaa and last updated on 7 October 2021.

28.1 Introduction

In this lesson, you will learn how to:

Load the package required to use the pipe %>%
Use the pipe %>%

Prerequisite skills include:

Loading packages

Highlights:

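The pipe %>% helps keep code clean and concise.
The pipe %>% works by passing the data on its left into the function on its right, as that function's first argument.
The pipe %>% can chain several functions together, passing the result of each step to the next.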

28.2 Overview

The pipe is helpful for keeping your code clean when you have to apply multiple transformations to your data. The pipe comes from the magrittr package, so you can start using it once magrittr is loaded. The tidyverse also makes the pipe available (core packages such as dplyr re-export %>%), so if you have already loaded the tidyverse you can use the pipe straight away.
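A minimal sketch of this, assuming the magrittr package is installed: loading magrittr on its own is enough to use %>%. The vector x below is made up for illustration and is not part of the penguins data.

```r
# Load magrittr, the package that provides %>%
# (loading the tidyverse would also make %>% available)
library(magrittr)

# A small made-up vector, used only for illustration
x <- c(1, 4, 9, 16)

# Without the pipe: pass x to sqrt() directly
without_pipe <- sqrt(x)

# With the pipe: x is passed into sqrt() as its first argument
with_pipe <- x %>% sqrt()

with_pipe
#> [1] 1 2 3 4
```

Both versions compute the same thing; the pipe only changes how the call is written.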

In this tutorial, we will use the penguins data from the palmerpenguins package to show how the pipe is used. The data set records measurements of penguins from three species, including their size, sex, and the island where they live. It has 344 rows and 8 columns.

Figure: The Palmer Penguins (credit: Allison Horst)

Let's start by loading the tidyverse, along with the palmerpenguins package, which provides the penguins data.

library(tidyverse)
library(palmerpenguins)

Here is an example that uses the base R summary() function without the pipe.

summary(penguins)
#>       species          island    bill_length_mm 
#>  Adelie   :152   Biscoe   :168   Min.   :32.10  
#>  Chinstrap: 68   Dream    :124   1st Qu.:39.23  
#>  Gentoo   :124   Torgersen: 52   Median :44.45  
#>                                  Mean   :43.92  
#>                                  3rd Qu.:48.50  
#>                                  Max.   :59.60  
#>                                  NA's   :2      
#>  bill_depth_mm   flipper_length_mm  body_mass_g  
#>  Min.   :13.10   Min.   :172.0     Min.   :2700  
#>  1st Qu.:15.60   1st Qu.:190.0     1st Qu.:3550  
#>  Median :17.30   Median :197.0     Median :4050  
#>  Mean   :17.15   Mean   :200.9     Mean   :4202  
#>  3rd Qu.:18.70   3rd Qu.:213.0     3rd Qu.:4750  
#>  Max.   :21.50   Max.   :231.0     Max.   :6300  
#>  NA's   :2       NA's   :2         NA's   :2     
#>      sex           year     
#>  female:165   Min.   :2007  
#>  male  :168   1st Qu.:2007  
#>  NA's  : 11   Median :2008  
#>               Mean   :2008  
#>               3rd Qu.:2009  
#>               Max.   :2009  
#>

Here is code that produces the same output using the pipe.

penguins %>% summary()
#>       species          island    bill_length_mm 
#>  Adelie   :152   Biscoe   :168   Min.   :32.10  
#>  Chinstrap: 68   Dream    :124   1st Qu.:39.23  
#>  Gentoo   :124   Torgersen: 52   Median :44.45  
#>                                  Mean   :43.92  
#>                                  3rd Qu.:48.50  
#>                                  Max.   :59.60  
#>                                  NA's   :2      
#>  bill_depth_mm   flipper_length_mm  body_mass_g  
#>  Min.   :13.10   Min.   :172.0     Min.   :2700  
#>  1st Qu.:15.60   1st Qu.:190.0     1st Qu.:3550  
#>  Median :17.30   Median :197.0     Median :4050  
#>  Mean   :17.15   Mean   :200.9     Mean   :4202  
#>  3rd Qu.:18.70   3rd Qu.:213.0     3rd Qu.:4750  
#>  Max.   :21.50   Max.   :231.0     Max.   :6300  
#>  NA's   :2       NA's   :2         NA's   :2     
#>      sex           year     
#>  female:165   Min.   :2007  
#>  male  :168   1st Qu.:2007  
#>  NA's  : 11   Median :2008  
#>               Mean   :2008  
#>               3rd Qu.:2009  
#>               Max.   :2009  
#>

As you can see, the pipe %>% takes the penguins data frame and passes it into summary() as its first argument, so you do not need to write penguins inside the parentheses. In general, x %>% f() is equivalent to f(x).
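More generally, the piped value becomes the function's first argument and any other arguments follow it, so x %>% f(y) is equivalent to f(x, y). A minimal sketch, assuming magrittr is installed and using a small made-up vector with a missing value (much like the NA's that summary() reports for penguins) rather than the real data:

```r
library(magrittr)

# Made-up bill lengths, used only for illustration
bill_lengths <- c(39.1, 39.5, NA, 40.3)

# x %>% f(y) is the same as f(x, y): bill_lengths becomes the
# first argument of mean(), and na.rm = TRUE is passed after it
piped  <- bill_lengths %>% mean(na.rm = TRUE)
nested <- mean(bill_lengths, na.rm = TRUE)

identical(piped, nested)
#> [1] TRUE
```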

With a single function, it is hard to see why the pipe makes code cleaner. The benefit becomes clearer when you apply several transformations in a row, because the pipe lets you write the steps in the order they happen instead of nesting them or saving each intermediate result.

Here is a similar example without the pipe, but this time we filter the data to keep only Adelie penguins before summarizing it with summary().

adelie <- filter(penguins, species == "Adelie")
summary(adelie)
#>       species          island   bill_length_mm 
#>  Adelie   :152   Biscoe   :44   Min.   :32.10  
#>  Chinstrap:  0   Dream    :56   1st Qu.:36.75  
#>  Gentoo   :  0   Torgersen:52   Median :38.80  
#>                                 Mean   :38.79  
#>                                 3rd Qu.:40.75  
#>                                 Max.   :46.00  
#>                                 NA's   :1      
#>  bill_depth_mm   flipper_length_mm  body_mass_g  
#>  Min.   :15.50   Min.   :172       Min.   :2850  
#>  1st Qu.:17.50   1st Qu.:186       1st Qu.:3350  
#>  Median :18.40   Median :190       Median :3700  
#>  Mean   :18.35   Mean   :190       Mean   :3701  
#>  3rd Qu.:19.00   3rd Qu.:195       3rd Qu.:4000  
#>  Max.   :21.50   Max.   :210       Max.   :4775  
#>  NA's   :1       NA's   :1         NA's   :1     
#>      sex          year     
#>  female:73   Min.   :2007  
#>  male  :73   1st Qu.:2007  
#>  NA's  : 6   Median :2008  
#>              Mean   :2008  
#>              3rd Qu.:2009  
#>              Max.   :2009  
#>

Here is equivalent code that produces the same output using the pipe.

penguins %>%
  filter(species == "Adelie") %>%
  summary()
#>       species          island   bill_length_mm 
#>  Adelie   :152   Biscoe   :44   Min.   :32.10  
#>  Chinstrap:  0   Dream    :56   1st Qu.:36.75  
#>  Gentoo   :  0   Torgersen:52   Median :38.80  
#>                                 Mean   :38.79  
#>                                 3rd Qu.:40.75  
#>                                 Max.   :46.00  
#>                                 NA's   :1      
#>  bill_depth_mm   flipper_length_mm  body_mass_g  
#>  Min.   :15.50   Min.   :172       Min.   :2850  
#>  1st Qu.:17.50   1st Qu.:186       1st Qu.:3350  
#>  Median :18.40   Median :190       Median :3700  
#>  Mean   :18.35   Mean   :190       Mean   :3701  
#>  3rd Qu.:19.00   3rd Qu.:195       3rd Qu.:4000  
#>  Max.   :21.50   Max.   :210       Max.   :4775  
#>  NA's   :1       NA's   :1         NA's   :1     
#>      sex          year     
#>  female:73   Min.   :2007  
#>  male  :73   1st Qu.:2007  
#>  NA's  : 6   Median :2008  
#>              Mean   :2008  
#>              3rd Qu.:2009  
#>              Max.   :2009  
#>

This code is cleaner: we did not have to split the process into two separate steps or assign the filtered data to an intermediate object.
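For comparison, a third way to write the same steps is to nest the function calls, which has to be read from the inside out. The sketch below assumes dplyr is installed and uses a tiny made-up data frame called toy in place of penguins; it checks that the intermediate-object, nested, and piped versions all give the same summary.

```r
library(dplyr)  # provides filter() and re-exports %>%

# A tiny made-up data frame shaped like a slice of penguins
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3800, 3500)
)

# 1. Intermediate object, as in the example without the pipe
adelie <- filter(toy, species == "Adelie")
step_by_step <- summary(adelie)

# 2. Nested calls, read from the inside out
nested <- summary(filter(toy, species == "Adelie"))

# 3. The pipe, read from top to bottom
piped <- toy %>%
  filter(species == "Adelie") %>%
  summary()

identical(piped, nested) && identical(piped, step_by_step)
#> [1] TRUE
```

The three versions do the same work; the piped one avoids both the extra object and the inside-out reading order.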

28.3 Exercises

To learn more about the penguin measurements and what they describe, see this illustration by Allison Horst.

Figure: The Palmer Penguins (credit: Allison Horst)

28.3.1 Exercise 1

Here is some code that filters the data first by species and then by sex, and then summarizes the result with summary(), as above.

adelie <- filter(penguins, species == "Adelie")
female_adelie <- filter(adelie, sex == "female")
summary(female_adelie)
#>       species         island   bill_length_mm 
#>  Adelie   :73   Biscoe   :22   Min.   :32.10  
#>  Chinstrap: 0   Dream    :27   1st Qu.:35.90  
#>  Gentoo   : 0   Torgersen:24   Median :37.00  
#>                                Mean   :37.26  
#>                                3rd Qu.:38.80  
#>                                Max.   :42.20  
#>  bill_depth_mm   flipper_length_mm  body_mass_g  
#>  Min.   :15.50   Min.   :172.0     Min.   :2850  
#>  1st Qu.:17.00   1st Qu.:185.0     1st Qu.:3175  
#>  Median :17.60   Median :188.0     Median :3400  
#>  Mean   :17.62   Mean   :187.8     Mean   :3369  
#>  3rd Qu.:18.30   3rd Qu.:191.0     3rd Qu.:3550  
#>  Max.   :20.70   Max.   :202.0     Max.   :3900  
#>      sex          year     
#>  female:73   Min.   :2007  
#>  male  : 0   1st Qu.:2007  
#>              Median :2008  
#>              Mean   :2008  
#>              3rd Qu.:2009  
#>              Max.   :2009

As an exercise, try to convert the code above into equivalent code using the pipe.

# Hint: you don't have to assign the result to an object

# Hint: filter() accepts several conditions at once, separated by commas
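To illustrate the hint about filter(), here is a sketch on a tiny made-up data frame (not the penguins data), assuming dplyr is installed: giving filter() two conditions keeps only the rows that satisfy both, which matches chaining two filter() calls with the pipe.

```r
library(dplyr)

# A tiny made-up data frame, not the penguins data
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Adelie"),
  sex = c("female", "male", "female", "female")
)

# Two conditions inside one filter() call: rows must satisfy both
one_call <- toy %>% filter(species == "Adelie", sex == "female")

# The same rows from two filter() calls chained with the pipe
two_calls <- toy %>%
  filter(species == "Adelie") %>%
  filter(sex == "female")

nrow(one_call)
#> [1] 2
```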

28.3.2 Exercise 2

Here is some code that filters the data, first by sex and then by year of study, and counts the number of penguins using the count function.

females <- filter(penguins, sex == "female")
females_2007 <- filter(females, year == 2007)  # year is stored as a number
count(females_2007)
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    51

As another exercise, try to convert the code above into equivalent code that uses the pipe.

# Hint: you don't have to assign the result to an object

28.3.3 Exercise 3

28.4 Common Mistakes & Errors

Below are some common mistakes and errors you may come across:

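You might type the wrong operator. The pipe operator is written %>%: a percent sign, a greater-than sign, and another percent sign.
You might try to pipe something other than the data into a function. The object on the left of %>% becomes the first argument of the function on the right, so for functions such as filter() and summary() it should be the data.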
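When a function does not expect the data as its first argument, magrittr's %>% also supports a . placeholder that marks where the piped value should go. A minimal sketch using base R's gsub(), whose text argument comes third (after the pattern and the replacement), assuming magrittr is installed; the string is made up for illustration.

```r
library(magrittr)

# A made-up string, used only for illustration
species_label <- "Adelie penguin"

# Without the placeholder, species_label becomes the pattern argument:
# this runs gsub("Adelie penguin", "penguin", "bird") and returns "bird"
wrong <- species_label %>% gsub("penguin", "bird")

# The . placeholder tells %>% to pass species_label as the text argument
relabelled <- species_label %>% gsub("penguin", "bird", .)

relabelled
#> [1] "Adelie bird"
```

Note that the first version runs without an error but returns the wrong result, which makes this mistake easy to miss.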

28.5 Next Steps

If you would like to read more about the pipe operator, here are some additional resources that you may find helpful:
