65 Tidying up datasets

Written by Mariam Walaa and last updated on 7 October 2021.

65.1 Introduction

In this lesson, you will learn how to:

  • Use recode()
  • Use coalesce()
  • Use lag() and lead()
  • Use replace_na() and drop_na()
  • Use n_distinct() and distinct()

Prerequisite skills include:

  • Familiarity with NA values
  • Familiarity with data types

Highlights:

  • Use replace_na() and drop_na() to work with NA values
  • Use n_distinct() and distinct() to look at unique rows
  • Use lag() and lead() to shift the values in a vector forward or backward by a number of positions
  • Use coalesce() to take the first non-NA value at each position across a set of vectors
  • Use recode() to change certain values to new values of the same data type

65.2 Video

65.3 Arguments

65.3.1 recode()

The recode() function takes the following as arguments:

Argument | Parameter | Details
.x       | vector    | the vector you want to modify
...      | old value | old value = new value; assign a new value to the old value you want to modify

You can read more about the arguments in the documentation for recode().
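As a minimal sketch of the old value = new value pattern, the example below recodes a made-up vector of size codes (the vector `sizes` and its values are assumptions for illustration only).

```r
library(dplyr)

# Hypothetical example data: a character vector of size codes
sizes <- c("S", "M", "L", "M")

# Each argument after the vector has the form old value = new value
sizes_long <- recode(sizes, S = "small", M = "medium", L = "large")

sizes_long
# "small" "medium" "large" "medium"
```

The replacement values are all character strings, so the result has the same data type as the original vector.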

65.3.2 replace_na()

The replace_na() function takes the following as arguments:

Argument | Parameter        | Details
data     | input data frame | data frame with columns we want to replace NAs for
replace  | list             | list of values for each column to replace their NAs with

You can read more about the arguments in the documentation for replace_na().
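The following sketch shows how the `replace` list might look for a small, made-up data frame (`scores` and its columns are hypothetical). Each list element is named after a column and holds the value that should stand in for that column's NAs.

```r
library(tidyr)

# Hypothetical example data with one NA in each column
scores <- data.frame(
  name  = c("Ann", NA, "Cy"),
  score = c(90, 85, NA),
  stringsAsFactors = FALSE
)

# One replacement value per column, given as a named list
scores_filled <- replace_na(scores, list(name = "unknown", score = 0))

scores_filled$name
# "Ann" "unknown" "Cy"
scores_filled$score
# 90 85 0
```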

65.3.3 coalesce()

The coalesce() function takes the following as arguments:

Argument | Parameter      | Details
...      | set of vectors | set of vectors to extract the first non-NA element from at each position

You can read more about the arguments in the documentation for coalesce().
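To make "first non-NA value" concrete, here is a small sketch with two made-up phone number vectors (the names `phone_home` and `phone_work` are assumptions). At each position, coalesce() keeps the value from the first vector that is not NA there.

```r
library(dplyr)

# Hypothetical example data: two vectors of the same length
phone_home <- c("555-0101", NA, NA)
phone_work <- c("555-0202", "555-0303", NA)

# Take phone_home where available, then phone_work, then a fallback value
phone_best <- coalesce(phone_home, phone_work, "none")

phone_best
# "555-0101" "555-0303" "none"
```

The final argument is a single value, which is used wherever every earlier vector is NA.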

65.3.4 n_distinct()

The n_distinct() function takes the following as arguments:

Argument | Parameter      | Details
...      | set of vectors | set of vectors to count the number of distinct elements for

You can read more about the arguments in the documentation for n_distinct().
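A short sketch with a made-up vector of city names (the vector `cities` is hypothetical) shows the count, including how an NA is treated. The `na.rm` argument is part of n_distinct() but is not listed in the table above.

```r
library(dplyr)

# Hypothetical example data: one repeated value and one NA
cities <- c("Toronto", "Ottawa", "Toronto", NA)

# By default, NA is counted as its own distinct value
n_all <- n_distinct(cities)
n_all
# 3

# na.rm = TRUE leaves NA out of the count
n_known <- n_distinct(cities, na.rm = TRUE)
n_known
# 2
```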

65.3.5 distinct()

The distinct() function takes the following as arguments:

Argument | Parameter | Details
.data    | tibble    | tibble to return distinct rows for

You can read more about the arguments in the documentation for distinct().
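The sketch below applies distinct() to a small, made-up data frame (`visits` and its columns are assumptions). It also passes a column name, which distinct() accepts in addition to `.data`, to look at unique values of a single column.

```r
library(dplyr)

# Hypothetical example data: rows 1 and 3 are exact duplicates
visits <- data.frame(
  city = c("Toronto", "Ottawa", "Toronto"),
  year = c(2020, 2021, 2020),
  stringsAsFactors = FALSE
)

# Keep one copy of each unique row
unique_visits <- distinct(visits)
nrow(unique_visits)
# 2

# Keep one row per unique value of city; only the city column is returned
unique_cities <- distinct(visits, city)
unique_cities$city
# "Toronto" "Ottawa"
```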

65.3.6 drop_na()

The drop_na() function takes the following as arguments:

Argument | Parameter        | Details
data     | input data frame | data frame to drop rows with NAs from
...      | vector           | columns to check; rows with an NA in any of these columns are dropped

You can read more about the arguments in the documentation for drop_na().
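As a rough illustration of the difference between checking every column and checking only some, the sketch below uses a made-up data frame (`scores` and its columns are hypothetical).

```r
library(tidyr)

# Hypothetical example data: row 2 is missing a name, row 3 is missing a score
scores <- data.frame(
  name  = c("Ann", NA, "Cy"),
  score = c(90, 85, NA),
  stringsAsFactors = FALSE
)

# With no columns named, a row is dropped if it has an NA anywhere
complete_rows <- drop_na(scores)
nrow(complete_rows)
# 1

# With a column named, only NAs in that column cause a row to be dropped
has_score <- drop_na(scores, score)
nrow(has_score)
# 2
```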

65.3.7 lag(), lead()

The lag() and lead() functions take the following as arguments:

Argument | Parameter | Details
x        | vector    | vector of values to work with
n        | number    | number of positions to lead or lag by
default  | number    | value to fill the empty spots with

You can read more about the arguments in the documentation for lag() and lead().
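The sketch below shifts a small, made-up numeric vector in both directions (the vector `x` is an assumption for illustration). The calls are written as dplyr::lag() and dplyr::lead() so that they cannot be confused with the unrelated lag() function in base R's stats package.

```r
library(dplyr)

# Hypothetical example data
x <- c(10, 20, 30, 40)

# lag() pushes values forward; the empty first spot is filled with NA
lagged <- dplyr::lag(x)
lagged
# NA 10 20 30

# lead() pulls values backward; the empty last spot is filled with NA
led <- dplyr::lead(x)
led
# 20 30 40 NA

# n sets how many positions to shift by; default sets the fill value
lagged_two <- dplyr::lag(x, n = 2, default = 0)
lagged_two
# 0 0 10 20
```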

65.4 Exercise

Match each function name to its description.

Function | Description
A        | This function pulls a vector backward by n positions and fills with NAs.
B        | This function provides all the distinct values in a vector.
C        | This function replaces NA values with a specified value.
D        | This function counts the number of distinct values in a vector.
E        | This function pushes a vector forward by n positions and fills with NAs.
F        | This function returns the first non-NA value at each row of a set of data.
G        | This function takes out all the rows that include NA values.
H        | This function allows you to change values of certain categories into new values of the same data type.

65.6 Exercises

65.6.1 Question 1

65.6.2 Question 2

65.6.3 Question 3

65.6.4 Question 4

65.6.5 Question 5

65.6.6 Question 6

65.6.7 Question 7

65.6.8 Question 8

65.6.9 Question 9

65.6.10 Question 10