54 row and column names

Written by Leuven Wang and last updated on 7 October 2021.

54.1 Introduction

Data frames can be very well organized and if you’re anything like me, you might sometimes even find them aesthetically pleasing in their clean simplicity. But sometimes, their rows and columns can be named with unintelligible or extremely long headings not meant for human review. This may be undesirable in cases where you need to present your data frame in tables or would like to understand them more intuitively during the analytics stage.

In this lesson, you will learn how to:

Prerequisites:

  • Basic understanding of vectors c().
  • General knowledge of basic data frame structures i.e. what records and columns are.
  • Indexing and subsets of vectors and data frames.

Let’s stick with a dataset we’re familiar with: mtcars. For simplicity, we will use only the first 5 columns of the dataset.

data <- mtcars[1:5]
head(data)
#>                    mpg cyl disp  hp drat
#> Mazda RX4         21.0   6  160 110 3.90
#> Mazda RX4 Wag     21.0   6  160 110 3.90
#> Datsun 710        22.8   4  108  93 3.85
#> Hornet 4 Drive    21.4   6  258 110 3.08
#> Hornet Sportabout 18.7   8  360 175 3.15
#> Valiant           18.1   6  225 105 2.76

The column names are the headers at the top of the column. In this case they are mpg, cyl, disp, hp, and drat. The row names are the unique identifiers that head each record. It is important to know that row names do not exist under a column variable the same way that the rest of their record does. In this case, our row names are the models of cars: Mazda RX4, Mazda RX4 Wag, etc…

54.2 Using colnames()

Now, if you work a lot with this dataset or you just happen to be a car aficionado you may know what these column headings mean. But for us mere plebes, this isn’t so intuitive! A full description of the column variables can be found here.

Using colnames and a vector of names of length matching the number of columns, we can change each column’s name.

colnames(data) <- c("Miles/Gallon", "Cylinders", "Displacement", "Horsepower", "Rear Axle Ratio")
head(data)
#>                   Miles/Gallon Cylinders Displacement
#> Mazda RX4                 21.0         6          160
#> Mazda RX4 Wag             21.0         6          160
#> Datsun 710                22.8         4          108
#> Hornet 4 Drive            21.4         6          258
#> Hornet Sportabout         18.7         8          360
#> Valiant                   18.1         6          225
#>                   Horsepower Rear Axle Ratio
#> Mazda RX4                110            3.90
#> Mazda RX4 Wag            110            3.90
#> Datsun 710                93            3.85
#> Hornet 4 Drive           110            3.08
#> Hornet Sportabout        175            3.15
#> Valiant                  105            2.76

Note that using the colnames() function on our data variable automatically changes the column names inside! If we use a vector with less elements than the the number of columns, the first few column names will change and the others will become nameless. Using a vector with a length that exceeds the number of columns results in an error.

colnames(data) <- c("Miles/Gallon", "Cylinders")
head(data)
#>                   Miles/Gallon Cylinders  NA  NA   NA
#> Mazda RX4                 21.0         6 160 110 3.90
#> Mazda RX4 Wag             21.0         6 160 110 3.90
#> Datsun 710                22.8         4 108  93 3.85
#> Hornet 4 Drive            21.4         6 258 110 3.08
#> Hornet Sportabout         18.7         8 360 175 3.15
#> Valiant                   18.1         6 225 105 2.76

If you want to change a single specific column’s name, you can reference it by index and use characters instead of vectors:

colnames(data)[3] <- "Displacement"
head(data)
#>                   Miles/Gallon Cylinders Displacement  NA
#> Mazda RX4                 21.0         6          160 110
#> Mazda RX4 Wag             21.0         6          160 110
#> Datsun 710                22.8         4          108  93
#> Hornet 4 Drive            21.4         6          258 110
#> Hornet Sportabout         18.7         8          360 175
#> Valiant                   18.1         6          225 105
#>                     NA
#> Mazda RX4         3.90
#> Mazda RX4 Wag     3.90
#> Datsun 710        3.85
#> Hornet 4 Drive    3.08
#> Hornet Sportabout 3.15
#> Valiant           2.76

And of course, if you want to change a subset of column names, you can reference it with index and use vectors:

colnames(data)[4:5] <- c("Horsepower","Rear Axle Ratio")
head(data)
#>                   Miles/Gallon Cylinders Displacement
#> Mazda RX4                 21.0         6          160
#> Mazda RX4 Wag             21.0         6          160
#> Datsun 710                22.8         4          108
#> Hornet 4 Drive            21.4         6          258
#> Hornet Sportabout         18.7         8          360
#> Valiant                   18.1         6          225
#>                   Horsepower Rear Axle Ratio
#> Mazda RX4                110            3.90
#> Mazda RX4 Wag            110            3.90
#> Datsun 710                93            3.85
#> Hornet 4 Drive           110            3.08
#> Hornet Sportabout        175            3.15
#> Valiant                  105            2.76

Conversely, if we wanted to retrieve the current column names in the table, we can also call upon colnames():

colnames(data)
#> [1] "Miles/Gallon"    "Cylinders"       "Displacement"   
#> [4] "Horsepower"      "Rear Axle Ratio"

The same can be done for singular or subsets of columns.

colnames() also applies to tibbles:

colnames(tibble(data))
#> [1] "Miles/Gallon"    "Cylinders"       "Displacement"   
#> [4] "Horsepower"      "Rear Axle Ratio"

54.3 Using row.names()

Similarly, row.names() is used to handle row names. Most of the time, row names are just unique ordered numbers. In the case of our dataset, our row names are car models. The problem with row names is that because they are not grouped under a column variable, they can’t be subjected to the same filter(), select() or handling functions that the rest of the data is. This may be problematic at times.

Let’s say we need the car model names to be a column variable. We can first use row.names() to retrieve them:

car_models <- row.names(data)
car_models
#>  [1] "Mazda RX4"           "Mazda RX4 Wag"      
#>  [3] "Datsun 710"          "Hornet 4 Drive"     
#>  [5] "Hornet Sportabout"   "Valiant"            
#>  [7] "Duster 360"          "Merc 240D"          
#>  [9] "Merc 230"            "Merc 280"           
#> [11] "Merc 280C"           "Merc 450SE"         
#> [13] "Merc 450SL"          "Merc 450SLC"        
#> [15] "Cadillac Fleetwood"  "Lincoln Continental"
#> [17] "Chrysler Imperial"   "Fiat 128"           
#> [19] "Honda Civic"         "Toyota Corolla"     
#> [21] "Toyota Corona"       "Dodge Challenger"   
#> [23] "AMC Javelin"         "Camaro Z28"         
#> [25] "Pontiac Firebird"    "Fiat X1-9"          
#> [27] "Porsche 914-2"       "Lotus Europa"       
#> [29] "Ford Pantera L"      "Ferrari Dino"       
#> [31] "Maserati Bora"       "Volvo 142E"

As car_models preserves the same order as before, we can add easily add it to the rest of the data frame as a normal column:

data <- cbind(car_models, data)
#cbind() is used to combine columns and datasets together to form a new dataset. Other methods exist but for the purposes of our demonstration we shall use it here. It's not important for you to understand it thus far but it is useful.
head(data)
#>                          car_models Miles/Gallon Cylinders
#> Mazda RX4                 Mazda RX4         21.0         6
#> Mazda RX4 Wag         Mazda RX4 Wag         21.0         6
#> Datsun 710               Datsun 710         22.8         4
#> Hornet 4 Drive       Hornet 4 Drive         21.4         6
#> Hornet Sportabout Hornet Sportabout         18.7         8
#> Valiant                     Valiant         18.1         6
#>                   Displacement Horsepower Rear Axle Ratio
#> Mazda RX4                  160        110            3.90
#> Mazda RX4 Wag              160        110            3.90
#> Datsun 710                 108         93            3.85
#> Hornet 4 Drive             258        110            3.08
#> Hornet Sportabout          360        175            3.15
#> Valiant                    225        105            2.76

The problem now is that we have the car models both within the table as a variable and outside of it as row names. We can now use row.names() to fix this redundancy and maintain a guaranteed unique identifier for rows. Using an empty vector numbers the row names in ascending order:

row.names(data) <- c()
head(data)
#>          car_models Miles/Gallon Cylinders Displacement
#> 1         Mazda RX4         21.0         6          160
#> 2     Mazda RX4 Wag         21.0         6          160
#> 3        Datsun 710         22.8         4          108
#> 4    Hornet 4 Drive         21.4         6          258
#> 5 Hornet Sportabout         18.7         8          360
#> 6           Valiant         18.1         6          225
#>   Horsepower Rear Axle Ratio
#> 1        110            3.90
#> 2        110            3.90
#> 3         93            3.85
#> 4        110            3.08
#> 5        175            3.15
#> 6        105            2.76

See how the rows are now numbered as their names? Of course, you can also customarily name your rows and use subsetting:

row.names(data)[1:2] <- c("Hot Wheels","Not so Hot Wheels")
head(data)
#>                          car_models Miles/Gallon Cylinders
#> Hot Wheels                Mazda RX4         21.0         6
#> Not so Hot Wheels     Mazda RX4 Wag         21.0         6
#> 3                        Datsun 710         22.8         4
#> 4                    Hornet 4 Drive         21.4         6
#> 5                 Hornet Sportabout         18.7         8
#> 6                           Valiant         18.1         6
#>                   Displacement Horsepower Rear Axle Ratio
#> Hot Wheels                 160        110            3.90
#> Not so Hot Wheels          160        110            3.90
#> 3                          108         93            3.85
#> 4                          258        110            3.08
#> 5                          360        175            3.15
#> 6                          225        105            2.76

54.4 Common Mistakes and Errors

  • Incorrect indexing when working with specific or subsets of columns/rows. In R, indexing begins at 1.
  • Trying to set row names with a vector of length shorter than the number of rows you’re changing.

54.5 Exercises

54.5.1 Question 1

Given the dataset iris, change the column names such that the periods are replaced with spaces."

54.5.2 Question 2

54.5.3 Question 3

Given an unmodified version of iris, fix the mistake in the code below such that it successfully changes the column of the first two columns to “Sepal Length” and “Sepal Width.”

54.5.4 Question 5

For each car in mtcars, print a string which says “I want to drive the ‘car_name’.” Your final output should look something like this:

I want to drive the Mazda RX4
I want to drive the Mazda Rx4 Wag
I want to drive the Datsun 710

Tip: use cat("I want to drive the ", car_name, "\n") to print your outputs.

54.5.5 Question 6

Change the row name of the second record of mtcars to be the exact same as the first record without changing the first record. See what happens and read the message. Then, figure out a way to swap the two row names without prompting an error.

54.7 Exercises

54.7.1 Question 1

54.7.2 Question 2

54.7.3 Question 3

54.7.4 Question 4

54.7.5 Question 5

54.7.6 Question 6

54.7.7 Question 7

54.7.8 Question 8

54.7.9 Question 9

54.7.10 Question 10