50 apply, sapply, and lapply

Written by Ananya Jha and last updated on 5 February 2022.

50.1 Introduction

The apply family is one of the most commonly used groups of functions in R. These functions manipulate data (matrices, arrays, data frames, lists, etc.) repetitively: they run a function on each of several chunks of an object.

If you have programmed before, you can think of an apply function as a compact replacement for a ‘for loop’: it expresses the same repetition in a single call and is often easier to read.
In this lesson we will cover apply(), lapply() and sapply(). There is a wide variety of apply functions, and while they work on different kinds of data, all of them share at least two arguments: an object and a function (either user-defined or built in). We “apply” the latter to the former. We will get into the details of the arguments and how to use them later.
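To make the comparison with a for loop concrete, here is a minimal sketch. The list `scores` and the variable names are made up for illustration; both versions compute the sum of each list element.

```r
# Hypothetical example data: three numeric vectors in a list
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# for loop: pre-allocate a named result vector, then fill it element by element
totals_loop <- numeric(length(scores))
names(totals_loop) <- names(scores)
for (i in seq_along(scores)) {
  totals_loop[i] <- sum(scores[[i]])
}

# sapply: the same computation in a single call
totals_apply <- sapply(scores, sum)

# Compare the two results
all.equal(totals_loop, totals_apply)
```

The loop needs explicit bookkeeping (pre-allocation, indexing, names), while sapply handles that for us.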

In this lesson, you will learn how to:

  • Use apply(), lapply() and sapply() (functions from the apply family)

  • Use appropriate versions of the apply family of functions for the correct use cases

Prerequisites:

  • Basic understanding of lists, vectors, arrays, matrices, and data frames

50.1.1 Arguments

Let’s first take a look at what the R documentation has as the arguments for these functions:

  • apply(X, MARGIN, FUN, ..., simplify = TRUE)

  • lapply(X, FUN, ...)

  • sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

Where:

Argument | Details
X | The object to which we want to apply the function (can be an array/matrix/list/vector/data frame)
MARGIN | A vector specifying how the function is applied (1 for rows, 2 for columns)
FUN | The function (built-in or user-defined) to be applied repeatedly to the elements of X
... | Optional arguments to FUN
simplify | Logical; if TRUE, the result is simplified (to a vector or matrix) when possible
USE.NAMES | Logical; if TRUE and X is character, X is used as the names for the result
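As a rough illustration of the optional arguments, the sketch below uses sapply with `...`, USE.NAMES and simplify. The input vectors and variable names are invented for this example.

```r
# `...`: digits = 1 is passed on to round() for every element
rounded <- sapply(c(1.234, 5.678), round, digits = 1)

# USE.NAMES: with a character X, the elements become the names of the result
with_names <- sapply(c("apple", "kiwi"), nchar)
without_names <- sapply(c("apple", "kiwi"), nchar, USE.NAMES = FALSE)

# simplify = FALSE keeps the result as a list instead of a vector
as_list <- sapply(c("apple", "kiwi"), nchar, simplify = FALSE)

rounded
with_names
without_names
as_list
```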

At first look these functions might all seem the same.

Now let’s dive deeper into their differences to understand better.

50.2 Apply

Let’s start with apply, the most basic member of the apply family. It operates primarily on arrays and matrices. It can also work with data frames, with one restriction: apply first converts the data frame to a matrix (via as.matrix), so the columns should all be of a compatible type (for example, all numeric).

This is the only function (out of the ones covered in this lesson) that requires margins to be specified in the arguments. For example in the case of a matrix, 1 would mean the function would be applied along the rows, 2 would be columns, and c(1,2) would be both rows and columns.
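The effect of the three margin values can be sketched on a small matrix. The matrix `m` and the result names below are chosen purely for illustration.

```r
# A 2 x 3 matrix: row 1 is (1, 3, 5), row 2 is (2, 4, 6)
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: the function receives one row at a time
row_sums <- apply(m, 1, sum)

# MARGIN = 2: the function receives one column at a time
col_sums <- apply(m, 2, sum)

# MARGIN = c(1, 2): the function receives one cell at a time,
# and the result keeps the shape of the original matrix
times_ten <- apply(m, c(1, 2), function(x) x * 10)

row_sums
col_sums
times_ten
```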

50.3 Lapply

This is another member of the apply family. It operates on lists, data frames and vectors. It has two main differences compared to apply:

  1. It always returns a list of the same length as the input

  2. It does not require a margin, because it applies the function to each element of the input (for a data frame, each element is a column)
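A short sketch of both points, using a made-up vector and data frame (`df` and the result names are illustrative only):

```r
# On a vector: one list element per input element
squares <- lapply(1:3, function(i) i^2)

# On a data frame: the function is applied to each column
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
col_ranges <- lapply(df, range)

length(squares)    # same length as the input vector
names(col_ranges)  # one entry per column of df
col_ranges
```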

50.4 Sapply

The final apply function covered in this lesson is sapply.

Just like lapply, sapply operates on lists, data frames and vectors and does not need a margin. Just like apply, it has a simplify argument (TRUE by default) that simplifies the results.

Sapply is commonly known as “a user-friendly version of lapply.” In layman’s terms, sapply is a wrapper around lapply; the only difference is that it simplifies the output when possible, returning a vector or matrix instead of a list. When the results cannot be simplified (for example, when they have different lengths), sapply returns a list as well.

Note: sapply(x, f, simplify = FALSE, USE.NAMES = FALSE) would be the same as lapply(x, f)
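The following sketch checks the note above and shows how the shape of the sapply result depends on what the function returns. The list `x` is a made-up example.

```r
x <- list(a = 1:3, b = 4:7)

# With simplification and names switched off, sapply matches lapply
same_as_lapply <- identical(
  sapply(x, range, simplify = FALSE, USE.NAMES = FALSE),
  lapply(x, range)
)

# One value per element: simplified to a named vector
lens <- sapply(x, length)

# Two values per element: simplified to a matrix with one column per element
rng <- sapply(x, range)

# Results of different lengths cannot be simplified: a list is returned
reversed <- sapply(x, rev)

same_as_lapply
lens
rng
class(reversed)
```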

50.5 Summary of differences table

To summarize the differences between these functions, we can use this table:

Function | Operates on | Returns | Margin required?
apply | arrays and matrices | a vector, array, or list of values | Yes
lapply | lists, data frames, vectors | a list | No
sapply | lists, data frames, vectors | a vector or matrix when possible, otherwise a list | No

50.5.1 Examples

50.5.1.1 Example 1

We will first work with the popular mtcars data set to understand how these functions work. Let’s take a quick look at it:

data <- mtcars
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0
#>                   gear carb
#> Mazda RX4            4    4
#> Mazda RX4 Wag        4    4
#> Datsun 710           4    1
#> Hornet 4 Drive       3    1
#> Hornet Sportabout    3    2
#> Valiant              3    1

Let’s subset this dataset to fit our needs better and call it cars.
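The subsetting code itself is not shown here. Judging by the outputs that follow, the subset keeps only the hp and wt columns, so a plausible reconstruction (an assumption, not necessarily the original code) is:

```r
data <- mtcars

# Assumed reconstruction: keep only horsepower (hp) and weight (wt).
# Note that this name shadows R's built-in cars data set.
cars <- data[, c("hp", "wt")]
head(cars)
```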

We can now find the means of all columns:

apply(cars, 2, mean)
#>        hp        wt 
#> 146.68750   3.21725

We can also find the column quantiles using the quantile function. The probs argument is passed on to quantile:

apply(cars, 2, quantile, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
#>        hp      wt
#> 10%  66.0 1.95550
#> 25%  96.5 2.58125
#> 50% 123.0 3.32500
#> 75% 180.0 3.61000
#> 90% 243.5 4.04750

While it is not too relevant in this context, we can also compute the sum of rows like this:

apply(cars, 1, sum)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710 
#>             112.620             112.875              95.320 
#>      Hornet 4 Drive   Hornet Sportabout             Valiant 
#>             113.215             178.440             108.460 
#>          Duster 360           Merc 240D            Merc 230 
#>             248.570              65.190              98.150 
#>            Merc 280           Merc 280C          Merc 450SE 
#>             126.440             126.440             184.070 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
#>             183.730             183.780             210.250 
#> Lincoln Continental   Chrysler Imperial            Fiat 128 
#>             220.424             235.345              68.200 
#>         Honda Civic      Toyota Corolla       Toyota Corona 
#>              53.615              66.835              99.465 
#>    Dodge Challenger         AMC Javelin          Camaro Z28 
#>             153.520             153.435             248.840 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
#>             178.845              67.935              93.140 
#>        Lotus Europa      Ford Pantera L        Ferrari Dino 
#>             114.513             267.170             177.770 
#>       Maserati Bora          Volvo 142E 
#>             338.570             111.780

50.5.1.2 Example 2

As mentioned before, we can also use a user-defined function with apply. Let’s first generate a small matrix (4 rows and 5 columns, filled with the values 1 to 5 repeated).

mat <- matrix(rep(seq(5), 4), ncol = 5)

We can either define the function inline (as an anonymous function) inside the apply call. Here, we sum along each row and add 5 to the result:

apply(mat, 1, function(x) sum(x) + 5)
#> [1] 20 20 20 20

Or we can define the function first and pass it to apply by name:

func <- function(x){
  return (sum(x)+5)
}
apply(mat, 1, func)
#> [1] 20 20 20 20

50.5.1.3 Example 3

We can create a random list to test out the lapply and sapply functions.


list_i <- list(i1 = 1:10, 
             i2 = rnorm(20), 
             i3 = rnorm(20, 1), 
             i4 = rnorm(100, 5))
list_i
#> $i1
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> $i2
#>  [1] -0.2045534 -0.6324466  0.8970894 -0.9727964 -0.3983415
#>  [6]  0.5687444 -0.3852953  0.1477497  0.2811383  0.2443239
#> [11] -0.6827844  0.5756808  0.2963548  0.4984564  0.5354025
#> [16]  0.8651955  0.6675924 -0.2219791 -1.4316497 -0.7275366
#> 
#> $i3
#>  [1]  2.14482562  1.03057942  1.60846936 -0.34973467
#>  [5]  1.24786698  0.95353053 -0.29092860  1.40775111
#>  [9]  0.28953589  2.72335829  1.80398240  1.66199585
#> [13]  0.78575781  0.54529491 -0.93771947  1.67283899
#> [17] -0.41631177  1.73724696 -0.03494943  0.40580213
#> 
#> $i4
#>   [1] 4.386153 5.033071 4.626716 3.840784 3.626118 6.279478
#>   [7] 3.746071 5.282170 4.879214 5.898871 6.040309 4.650732
#>  [13] 3.473012 5.050392 5.205905 4.748566 4.187117 4.342289
#>  [19] 7.232713 5.674597 5.417315 4.519410 5.046732 4.450993
#>  [25] 5.037396 4.665372 5.209705 5.021464 4.928530 6.055118
#>  [31] 6.300963 4.634427 3.830402 5.461589 1.483365 4.805855
#>  [37] 4.800940 4.782546 4.374689 4.552665 4.693131 5.029811
#>  [43] 4.431300 5.416743 6.881346 5.739352 6.266772 5.432847
#>  [49] 5.752277 4.252155 4.686701 4.276075 5.233844 5.674320
#>  [55] 6.324034 4.239288 5.634833 4.672740 4.632464 3.082046
#>  [61] 3.683778 4.331642 6.097118 4.544176 5.871782 4.224979
#>  [67] 3.115118 5.343443 5.719994 6.342525 6.569598 4.935664
#>  [73] 5.953346 4.614961 3.521891 6.291215 4.926431 5.456534
#>  [79] 6.251372 3.465674 5.749395 4.024947 2.790972 6.220917
#>  [85] 4.926340 5.151460 4.237463 5.109241 4.899154 5.716756
#>  [91] 5.594018 5.222826 4.953629 4.856371 6.496773 4.906841
#>  [97] 6.097357 4.858970 4.561514 4.548418

lapply(list_i, mean)
#> $i1
#> [1] 5.5
#> 
#> $i2
#> [1] -0.003982732
#> 
#> $i3
#> [1] 0.8994596
#> 
#> $i4
#> [1] 4.981204
sapply(list_i, mean)
#>           i1           i2           i3           i4 
#>  5.500000000 -0.003982732  0.899459615  4.981204351

As you can see, lapply returns the means in a list (the more verbose output), while sapply returns the same values simplified into a named vector.

50.6 Exercise

a) Use the numerical columns (columns 1-4) of the iris data set to compute the minimum value of each column using apply, lapply and sapply, and store the outputs.

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
ap <- apply(iris[,-5], MARGIN = 2,  FUN = min) 
l_ap <- lapply(iris[,-5], FUN = min) 
s_ap <- sapply(iris[,-5], FUN = min)

b) Add the three outputs from above to a list called “list_output.”

list_output <- list(ap, l_ap, s_ap )

c) Use an appropriate apply function to find the type of each element of list_output (using the typeof function)

sapply(FUN = typeof, X = list_output) 
#> [1] "double" "list"   "double"

50.7 Some extra notes

  1. If you are deciding between for loops and apply functions, apply functions are usually a good choice: they require fewer lines of code and often make the code easier to read and understand. Any difference in speed is usually small.
  2. Apply functions are capable of a lot of cool things. We can also use the apply function to generate plots for the objects like this:
apply(iris[,-5], 2, boxplot) #ignoring the Species column

Here, we are generating a box plot for each numerical column of the iris data set.

  3. Using these functions can often be tricky. If you are ever struggling to decide between these functions and want the results simplified, the easiest option would be sapply. If you don’t want simplified results, use lapply. Generally, either of these two would be ideal.

50.8 Common mistakes and errors

  • In the case of one-dimensional data (like plain vectors), use lapply or sapply. The apply function expects an object with dimensions (such as a matrix or array) and will give an error for a plain vector.
  • If your data has NA values, many functions (such as mean or sum) return NA when used with apply. If the function supports it, you can ignore the NA values by passing na.rm = TRUE inside the apply call; it is handed on to the function through the ... argument.
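Both pitfalls can be reproduced in a few lines. This is a minimal sketch with made-up data; tryCatch is used only so that the error message is captured rather than stopping the script.

```r
# Pitfall 1: apply on a plain vector fails because the vector has no dimensions
v <- c(3, 1, 2)
msg <- tryCatch(apply(v, 1, sqrt), error = function(e) conditionMessage(e))
msg             # the captured error message
sapply(v, sqrt) # works on a vector

# Pitfall 2: NA values propagate unless the function is told to drop them
m <- matrix(c(1, NA, 3, 4), nrow = 2)  # columns are (1, NA) and (3, 4)
means_with_na <- apply(m, 2, mean)
means_no_na <- apply(m, 2, mean, na.rm = TRUE)  # na.rm is passed on to mean()

means_with_na
means_no_na
```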

50.9 Next steps

50.11 Exercises

50.11.1 Question 1

  1. Which arguments do all apply functions have in common (pick all that apply)?
    1. Object
    2. Margin
    3. Function
    4. Simplify

50.11.2 Question 2

  1. Which apply function requires Margin in its arguments (pick one)?
    1. lapply()
    2. sapply()
    3. apply()
    4. mapply()

50.11.3 Question 3

  1. Which function returns a list of the same length as the input list (pick one)?
    1. sapply()
    2. apply()
    3. lapply()
    4. tapply()

50.11.4 Question 4

  1. What does the sapply function return?
    1. a vector or an array
    2. an array
    3. a list
    4. a vector

50.11.5 Question 5

  1. What are the benefits of using apply functions (pick all that apply)?
    1. Is efficient
    2. Makes code readable
    3. Makes code easier to understand
    4. All of the above

50.11.6 Question 6

  1. Which function is considered a simpler version (and a wrapper) of lapply (pick one)?
    1. lapply()
    2. tapply()
    3. sapply()
    4. apply()

50.11.7 Question 7

  1. What margin is the lapply function applied to (pick all that apply)?
    1. Rows
    2. Columns
    3. Either rows or columns
    4. All of the above

50.11.8 Question 8

  1. Which function would you use if you had a matrix and wanted to get a sum of every row or every column (pick one)?
    1. sapply()
    2. apply()
    3. tapply()
    4. lapply()

50.11.9 Question 9

  1. Which function would you use if you had a list of matrices and wanted to get a sum of the second column for each matrix (pick one)?
    1. mapply()
    2. sapply()
    3. apply()
    4. lapply()

50.11.10 Question 10

  1. Which function would you use if you wanted the most simplified output?
    1. sapply()
    2. apply()
    3. tapply()
    4. lapply()