62 set.seed, runif, rnorm, and sample

Written by Haoluan Chen and last updated on 7 October 2021.

62.1 Introduction

In this lesson, you will learn how to:

  • Generate random numbers from a uniform or normal distribution
  • Sample from a collection of numbers

Prerequisite skills include:

  • Run code in R
  • Basic knowledge of the uniform and normal distributions, and of sampling

Highlights:

  • Generating random values from uniform and normal distributions
  • Generating random values from a set
  • set.seed() for reproducibility

62.2 The content

Simulation is an important topic in statistics because it helps you understand how random data might be generated. For some experiments, you may want to simulate values from probability distributions. In R, we can use the runif() and rnorm() functions to generate random numbers from a uniform or normal distribution. We can also randomly sample from a set of numbers with the sample() function.

62.2.1 runif()

The runif() function generates random numbers from a uniform distribution.

62.2.1.1 Arguments

It takes three arguments: n, min, and max. n specifies the number of random values to generate. min and max specify the lower and upper limits of the uniform distribution; they default to 0 and 1.

Argument   Description
n          (required) number of random values to generate (numeric)
min        (optional) minimum value of the uniform distribution you are sampling from; default 0 (numeric)
max        (optional) maximum value of the uniform distribution you are sampling from; default 1 (numeric)
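Because min and max default to 0 and 1, runif(n = 3) should return exactly the same numbers as runif(n = 3, min = 0, max = 1) when both calls start from the same seed. A minimal check, using set.seed() (explained later in this lesson); the seed and variable names are arbitrary:

```r
# Draw three values using the default range [0, 1]
set.seed(1)
default_draw <- runif(n = 3)

# Reset the seed and draw again, spelling out the defaults
set.seed(1)
explicit_draw <- runif(n = 3, min = 0, max = 1)

# TRUE: leaving out min and max is the same as min = 0, max = 1
identical(default_draw, explicit_draw)
```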

62.2.1.2 Example

runif(n = 3)
#> [1] 0.6034775 0.9619177 0.2129215

The code above generates three random numbers from unif(0, 1), the uniform distribution with a minimum of 0 and a maximum of 1.

What if you want to generate numbers from unif(2, 8)?

In runif(), we set min to 2 and max to 8 to generate three numbers from unif(2, 8):

runif(n = 3, min = 2, max = 8)
#> [1] 5.881770 6.486302 3.368771
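One way to convince yourself that min and max set the range is to draw many values and look at their summary. This is a sketch with an arbitrary seed and sample size: every value should fall between 2 and 8, and the average should be close to the midpoint (2 + 8) / 2 = 5.

```r
set.seed(3)
u <- runif(n = 10000, min = 2, max = 8)

range(u)  # both ends lie inside [2, 8]
mean(u)   # close to (2 + 8) / 2 = 5
```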

62.2.2 rnorm()

In R, we can use the rnorm() function to generate numbers from a normal distribution.

62.2.2.1 Arguments

It takes three arguments: n, mean, and sd. n specifies the number of random values to generate, and mean and sd specify the mean and standard deviation of the normal distribution you are sampling from. They default to 0 and 1. In this lesson, normal(a, b) means a normal distribution with mean a and standard deviation b, matching the mean and sd arguments of rnorm().

Argument   Description
n          (required) number of random values to generate (numeric)
mean       (optional) mean of the normal distribution you are sampling from; default 0 (numeric)
sd         (optional) standard deviation of the normal distribution you are sampling from; default 1 (numeric)

62.2.2.2 Example

Let’s say we want to generate five random numbers from a normal distribution with mean 0 and standard deviation 2.

rnorm(n = 5, sd = 2)
#> [1] -3.1218571  3.9995003 -0.9345657 -0.1337965 -3.9527948

Here, we set n to 5 and sd to 2 because we want five random numbers from a normal distribution with a standard deviation of 2. We do not have to specify mean, because its default is 0, which is exactly what we want.
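It can help to see what mean and sd actually do: a draw from normal(mean, sd) is a standard normal draw stretched by sd and shifted by mean. So, starting from the same seed, rnorm(n = 5, sd = 2) should match 2 times rnorm(n = 5), and adding mean = 10 should shift every value by 10. A small sketch; the seed and variable names are arbitrary, and all.equal() is used to allow for floating-point rounding:

```r
set.seed(4)
z <- rnorm(n = 5)                           # standard normal: mean 0, sd 1

set.seed(4)
scaled <- rnorm(n = 5, sd = 2)              # mean left at its default of 0

set.seed(4)
shifted <- rnorm(n = 5, mean = 10, sd = 2)  # same draws, stretched and shifted

all.equal(scaled, 2 * z)                    # TRUE
all.equal(shifted, 10 + 2 * z)              # TRUE
```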

What if we want to generate five numbers from normal(10, 2)?

rnorm(n = 5, mean = 10, sd = 2)
#> [1]  6.336855  8.474858  7.251483 10.894489  8.786920

Here, we generated five numbers from the normal(10, 2) distribution.
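Note that the 2 in normal(10, 2) is a standard deviation, because that is what the sd argument of rnorm() expects; the corresponding variance is 2^2 = 4. A quick large-sample check, with an arbitrary seed and sample size:

```r
set.seed(6)
y <- rnorm(n = 10000, mean = 10, sd = 2)

mean(y)  # close to 10
sd(y)    # close to 2
var(y)   # close to 2^2 = 4
```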

62.2.3 sample()

In R, we can use sample() to randomly sample numbers from a collection of numbers.

62.2.3.1 Arguments

Its main arguments are x, size, replace, and prob. x is the vector of elements you want to sample from. size specifies how many items to draw. replace is a logical value: TRUE to sample with replacement, FALSE (the default) to sample without replacement. prob optionally gives a probability weight for each element of x.

When replace = TRUE, every draw is made from the full set of values in x, so the same value can be drawn more than once. When replace = FALSE, each value drawn is removed from the pool for the remaining draws, so no element of x is picked twice.

Argument   Description
x          (required) vector of one or more elements that you are sampling from
size       (optional) number of items to draw; defaults to length(x) (numeric)
replace    (optional) TRUE to sample with replacement; defaults to FALSE (logical)
prob       (optional) vector of probability weights for the elements of x (numeric)
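A minimal sketch of the defaults of sample(): when size is omitted it defaults to length(x), and replace defaults to FALSE, so sample(x) returns a random reordering of x. One caveat worth knowing: if x is a single positive number such as 6, sample() treats it as the vector 1:6. The seed and variable names are arbitrary.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Omitting size and replace ...
set.seed(8)
shuffled <- sample(x)

# ... is the same as spelling out the defaults
set.seed(8)
explicit <- sample(x, size = length(x), replace = FALSE)
identical(shuffled, explicit)   # TRUE

# A single number n is treated as 1:n, so this shuffles 1 to 6
one_to_six <- sample(6)
sort(one_to_six)                # 1 2 3 4 5 6
```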

62.2.3.2 Examples

Here, we have a vector containing the numbers 1 to 6 to simulate rolling a die. Let’s roll it six times and see what we get:

x <- c(1, 2, 3, 4, 5, 6)
set.seed(2)
sample(x = x, size = 6, replace = TRUE)
#> [1] 5 6 6 1 5 1

Don’t worry about set.seed() for now; you will learn about it later in this lesson. What if we set replace to FALSE?

set.seed(1)
sample(x = x, size = 6, replace = FALSE)
#> [1] 1 4 3 6 2 5

When replace = TRUE, some values are repeated: 5, 6, and 1 each appear twice. When replace = FALSE, no value is repeated in the output.

With replace = FALSE, each number drawn is removed before the next draw. Here, the first number is 1, so the second number is sampled only from the set {2, 3, 4, 5, 6}. Because size = 6 equals the number of elements in x, every number appears exactly once.
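The removal rule has a practical consequence: without replacement you can draw at most length(x) items. The sketch below reuses the seed-1 example, checks that no value repeats, and shows that asking for seven rolls without replacement raises an error (captured with tryCatch() so the script keeps running). Variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6)

set.seed(1)
no_repeats <- sample(x = x, size = 6, replace = FALSE)
anyDuplicated(no_repeats)        # 0: no value appears twice

# Seven draws from six values is impossible without replacement
too_many <- tryCatch(
  sample(x = x, size = 7, replace = FALSE),
  error = function(e) conditionMessage(e)
)
too_many                         # the error message, as a character string

# With replacement there is no such limit
length(sample(x = x, size = 7, replace = TRUE))   # 7
```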

The prob argument assigns a probability to each element of the vector being sampled. For example, suppose an unfair die lands on 6 with probability 40% and on each of the other numbers with probability 12%. We can simulate this die by passing these probabilities to sample() through the prob argument:

set.seed(2)
sample(x = x, size = 6, replace = TRUE, prob = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.4))
#> [1] 6 5 4 6 1 1
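Six rolls are too few to see the bias, but with many rolls the share of each face should approach its weight. A sketch with an arbitrary seed and number of rolls; note that R rescales prob internally, so the weights do not have to sum to one (these happen to).

```r
x <- c(1, 2, 3, 4, 5, 6)
weights <- c(0.12, 0.12, 0.12, 0.12, 0.12, 0.4)

set.seed(9)
rolls <- sample(x = x, size = 10000, replace = TRUE, prob = weights)

mean(rolls == 6)        # close to 0.4
table(rolls) / 10000    # each share close to its weight
```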

62.2.4 set.seed()

Let’s run our dice simulation without set.seed() and see what happens. Run the following code twice:

x <- c(1, 2, 3, 4, 5, 6)
sample(x = x, size = 6, replace = TRUE)
#> [1] 6 1 6 6 1 3

We get a different result every time we run the simulation, because each call to sample() draws new random values. Sometimes, however, you do not want your result to change every time you run the code, for example so that you or someone else can reproduce it later. This is what set.seed() is for.

If you call set.seed() with the same seed before your simulation, the simulation output will be the same every time.

62.2.4.1 Arguments

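The set.seed() function takes a single number, the seed, which R interprets as an integer. Any integer works; the same seed always gives the same sequence of random numbers.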
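A minimal reproducibility check: reset the seed, rerun the same sample() call, and compare. The seed 2024 and the variable names are arbitrary; any integer seed behaves the same way.

```r
x <- c(1, 2, 3, 4, 5, 6)

set.seed(2024)
first_run <- sample(x = x, size = 6, replace = TRUE)

set.seed(2024)   # same seed again
second_run <- sample(x = x, size = 6, replace = TRUE)

identical(first_run, second_run)   # TRUE
```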

62.2.4.2 Example

Let’s use set.seed() before the dice simulation.

Please run the following code twice:

set.seed(2)
x <- c(1, 2, 3, 4, 5, 6)
sample(x = x, size = 6, replace = TRUE)
#> [1] 5 6 6 1 5 1

This also works for runif(), rnorm(), and other simulation functions: if you call set.seed() with the same seed before the simulation, it produces the same result every time.
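The seed fixes the whole sequence of random numbers, not a single call. In the sketch below (seed and variable names arbitrary), calls made after set.seed(10) continue along one stream, so a second runif(3) without a reset gives new values, while resetting the seed replays the runif() and rnorm() draws exactly.

```r
set.seed(10)
u_first <- runif(3)
z_first <- rnorm(3)
u_next <- runif(3)    # no reset: the random stream just continues

set.seed(10)          # reset to the same seed
u_again <- runif(3)
z_again <- rnorm(3)

identical(u_first, u_again)   # TRUE
identical(z_first, z_again)   # TRUE
identical(u_first, u_next)    # FALSE
```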

62.3 Exercises

62.3.1 Exercise 1

Please generate 10 random values from unif(-1, 1).

62.3.2 Exercise 2

Please generate 10 random values from normal(0, 5).

62.3.3 Exercise 3


62.4 Common Mistakes & Errors

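  • If you pass arguments without names, make sure they are in the right order: runif(n, min, max), rnorm(n, mean, sd), and sample(x, size, replace, prob). Naming the arguments, as in rnorm(n = 5, sd = 2), avoids this mistake.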
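To see what goes wrong with unnamed arguments, here is a small sketch (seed and variable names arbitrary). rnorm(5, 2) sets the mean to 2, not the standard deviation, and swapping min and max in runif() returns NaN values with a warning, which is suppressed here for display.

```r
# rnorm(n, mean, sd): the second unnamed argument is the mean, not the sd
set.seed(11)
positional <- rnorm(5, 2)                # mean = 2, sd = 1
set.seed(11)
named <- rnorm(n = 5, mean = 2, sd = 1)
identical(positional, named)             # TRUE

# What was probably intended: mean 0, sd 2
set.seed(11)
intended <- rnorm(n = 5, sd = 2)

# runif(n, min, max): putting the larger value first gives NaN
swapped <- suppressWarnings(runif(3, 8, 2))
swapped                                  # NaN NaN NaN
```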

62.5 Next Steps

Sometimes you need to do more work to make your simulated data resemble your real data. For more on this, see the book R Programming for Data Science: https://bookdown.org/rdpeng/rprogdatascience/simulation.html. It includes videos that explain simulation concepts and how to simulate a linear model.

You can also generate binomial random variables using rbinom(), and Poisson random variables using rpois(), among others!
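These follow the same pattern as runif() and rnorm(): the first argument is n, the number of values, followed by the parameters of the distribution. A brief sketch; the seed, parameter values, and variable names are arbitrary illustrations.

```r
set.seed(12)

flips  <- rbinom(n = 10, size = 1, prob = 0.5)   # 10 fair coin flips, each 0 or 1
heads  <- rbinom(n = 5, size = 20, prob = 0.5)   # heads in 20 flips, repeated 5 times
counts <- rpois(n = 5, lambda = 3)               # 5 Poisson counts with mean 3
```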

62.6 Exercises

62.6.1 Question 1

The runif() function generates random numbers from a uniform distribution.
a. True b. False

Question 2

What is the required parameter for runif()? a. n b. min c. max d. There is no required parameter

62.6.2 Question 3

Which parameters of runif() are optional? (multiple answers) a. n b. min c. max d. There is no optional parameter

62.6.3 Question 4

Which code generates five numbers from normal(0, 5)? (multiple answers) a. rnorm(5, 0, 5) b. rnorm(0, 5, 5) c. rnorm(n = 5, sd = 5) d. rnorm(5, 5)

62.6.4 Question 5

Does rnorm(n = 10, mean = 10, sd = 2) generate 10 numbers from normal(10, 2)? a. Yes b. No

62.6.5 Question 6

sample() can randomly sample numbers from a collection of numbers. a. True b. False

62.6.6 Question 7

With replace = TRUE in sample(), the same value can be drawn more than once. a. True b. False

62.6.7 Question 8

Which of the following code simulates rolling a fair die five times? a. sample(c(1, 2, 3, 4, 5, 6), 5) b. runif(5, 1, 6) c. sample(c(1, 2, 3, 4, 5, 6), 5, replace = FALSE) d. sample(c(1, 2, 3, 4, 5, 6), 5, replace = TRUE)

62.6.8 Question 9

Using set.seed() with the same seed makes your simulation output the same every time. a. True b. False

62.6.9 Question 10

set.seed() can take any integer as a parameter a. True b. False