A Field Guide to Base R
The focus this week is on writing your own functions in R. While we’ve learned a lot of great tools in R, these tools have largely lived in the tidyverse set of packages. This week, we’re going to learn more about the functionality of “base R,” which will likely come in handy when we write our own functions. By the end of this coursework, you should be able to:
- explain the differences between double, integer, character, and factor data types
- explain the difference between a vector, a data frame, and a list
- use [] to extract elements from a vector
- use [] and $ to extract elements from a data frame
📖 Readings: 60-75 minutes
✅ Preview Activities: 2
1 Working with Base R Tools
In this part of the course, you are going to learn more about R as a programming language. We are going to take the foundational programming ideas you learned in your CS-1 course and see how they are implemented in R.
To get there, we are first going to go back to the very beginning. So far, we’ve worked extensively with data frames (or tibbles) in R, but we haven’t dedicated time to learning about the data structure that is the backbone of a data frame. This week, we are going to remedy that!
1.1 Objects in R
Let’s start by reading about the different types of objects in R:
📖 Required Reading: Hands-on Programming with R – Base R
✅ Check-in 7.1: Objects in R
- What data type will this atomic vector have?
z <- c(1, 2, 3, 4, 5, 6)
- double
- integer
- logical
- character
- factor
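To check your answer, you can ask R directly: typeof() reports how a vector is stored, and class() reports how R treats it. The sketch below compares z with a few comparison vectors we made up for contrast (int_vec, chr_vec, lgl_vec, and fct_vec are hypothetical names, not from the reading). Factors are the interesting case, because they are stored as integer codes with a levels attribute.

```r
# The vector from the check-in question
z <- c(1, 2, 3, 4, 5, 6)

# Hypothetical comparison vectors, one per data type
int_vec <- c(1L, 2L, 3L)                     # the L suffix requests integers
chr_vec <- c("1", "2", "3")                  # quotes make character strings
lgl_vec <- c(TRUE, FALSE, TRUE)              # logical values
fct_vec <- factor(c("low", "high", "low"))   # a factor built from strings

# typeof() reports the storage type of each vector
typeof(z)
typeof(int_vec)
typeof(chr_vec)
typeof(lgl_vec)

# A factor is stored as integer codes plus a levels attribute,
# so typeof() and class() give different answers
typeof(fct_vec)   # "integer"
class(fct_vec)    # "factor"
levels(fct_vec)   # "high" "low" (sorted alphabetically by default)
```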
- In R, some vectors have a names attribute. What does it mean for a vector to have names?
- Each element of the vector is stored as a name–value pair, like a dictionary or key–value map.
- The vector’s names attribute provides labels for its elements—metadata that does not change the underlying one-dimensional structure of the vector.
- The vector becomes equivalent to a one-row data frame, with the names serving as column headers.
- The vector can now hold elements of different types, since the names give each element its own type context.
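One way to explore names is to attach them to a small vector and inspect the result. In this sketch, temps and its values are hypothetical; the point is that names(), attributes(), length(), and typeof() let you see what the names attribute does and does not change.

```r
# A small named vector (hypothetical example values)
temps <- c(mon = 18.5, tue = 21.0, wed = 19.2)

# names() returns the labels stored in the names attribute
names(temps)

# attributes() lists every attribute attached to the vector
attributes(temps)

# The vector is still one-dimensional and still a single type
length(temps)
typeof(temps)

# unname() strips the labels; the underlying values are unchanged
unname(temps)
```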
- Using the vector x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), what code would recreate the matrix below?
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
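Before answering, it can help to see how matrix() decides where each value goes. This sketch uses a shorter hypothetical vector v so that both fill orders fit on screen: by default matrix() fills column by column, and byrow = TRUE switches to filling row by row.

```r
v <- 1:6

# Default: fill down the first column, then the second, ...
by_col <- matrix(v, nrow = 2)
by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: fill across the first row, then the second, ...
by_row <- matrix(v, nrow = 2, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```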
- Which of the following attributes does a matrix have? Select all that apply!
- names
- class
- row.names
- dim
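To explore this question, you can build a small matrix and list its attributes. The sketch below (m and w are hypothetical names) also shows that giving a plain vector a dim attribute turns it into the same matrix, which is one way to see what the matrix structure adds to a vector. In R 4.0.0 and later, class() reports both "matrix" and "array" for a matrix.

```r
m <- matrix(1:6, nrow = 2)

# attributes() lists every attribute stored on the object
attributes(m)

# dim() reads the dim attribute directly: 2 rows, 3 columns
dim(m)

# class() reports how R treats the object
class(m)

# Setting dim on a plain vector produces the same matrix
w <- 1:6
dim(w) <- c(2, 3)
identical(w, m)
```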
- What data type will the elements of x below be coerced into?
x <- c(3, 4, TRUE, "cat")
- double
- integer
- logical
- character
- factor
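An atomic vector can hold only one type, so c() converts every element to the most flexible type present. Among the types in this section, the order runs logical, then integer, then double, then character. The sketch below checks that rule on a few made-up combinations that are different from the question's vector.

```r
# Logical + double: TRUE becomes 1 and FALSE becomes 0
c(1.5, TRUE, FALSE)          # 1.5 1.0 0.0
typeof(c(1.5, TRUE, FALSE))  # "double"

# Integer + double: the integer is promoted to double
typeof(c(1L, 2.5))           # "double"

# Any string present: everything becomes character
c(2, "dog")                  # "2" "dog"
c(TRUE, "dog")               # "TRUE" "dog"
```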
- Both lists and data frames in R are types of lists under the hood. Which statement best describes how they differ?
- A list and a data frame are identical — both can contain elements of different types and lengths.
- A data frame is a list whose elements (columns) must all have the same length, while a regular list can contain elements of different lengths or structures.
- A list can only contain atomic vectors, whereas a data frame can contain other data frames or lists as elements.
- A data frame is one-dimensional, while a list is two-dimensional.
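A quick experiment shows the difference. In this sketch (lst, df_demo, and result are hypothetical names), a list happily holds elements of different lengths, a data frame reports "list" as its storage type, and data.frame() refuses columns whose lengths cannot be reconciled. Note that data.frame() will recycle a column whose length divides the number of rows evenly, which is why the example uses lengths 3 and 2.

```r
# A list can hold elements of different types AND different lengths
lst <- list(nums = 1:3, word = "hi", flags = c(TRUE, FALSE))
lengths(lst)   # nums 3, word 1, flags 2

# A data frame is also a list under the hood...
df_demo <- data.frame(nums = 1:3, word = c("a", "b", "c"))
is.list(df_demo)
typeof(df_demo)

# ...but its columns must all have the same number of rows.
# tryCatch() captures the error message instead of stopping.
result <- tryCatch(
  data.frame(nums = 1:3, flags = c(TRUE, FALSE)),
  error = function(e) conditionMessage(e)
)
result
```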
- What are the attributes of a data frame? Feel free to use the code below to explore!
df <- data.frame(num = c(1, 2, 3),
                 letter = c("A", "B", "C"),
                 logic = c(TRUE, FALSE, TRUE)
)
- names
- class
- row.names
- dim
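Here is one way to do that exploration. attributes() lists what is actually stored on the data frame, and dim() is included so you can compare its output with that list.

```r
df <- data.frame(num = c(1, 2, 3),
                 letter = c("A", "B", "C"),
                 logic = c(TRUE, FALSE, TRUE))

# List every attribute stored on the data frame
attributes(df)

# Just the names of those attributes
names(attributes(df))

# dim() still works on a data frame; compare its result
# with the attribute list above
dim(df)
```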
1.2 Extracting Elements of an Object
Alright, now that we’ve learned more about the types of objects in R, let’s learn about the “base R” tools for extracting elements from these objects.
📖 Required Reading: R4DS – Base R
If you are still a bit confused about the difference between $ and [[]], I recommend this section of Hands-on Programming with R: Dollar Signs & Double Brackets
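A small experiment makes the difference concrete. In the sketch below, pet and field are hypothetical names: single brackets return a smaller list, while [[]] and $ return the element itself. The last two lines show the practical reason to prefer [[]] inside functions: it accepts a variable that holds a name, whereas $ treats whatever follows it as a literal name.

```r
# A hypothetical list to experiment with
pet <- list(name = "Mochi", weight = 4.2)

# [ ] returns a list containing the selected element
pet["name"]
class(pet["name"])   # "list"

# [[ ]] and $ return the element itself
pet[["name"]]        # "Mochi"
pet$name             # "Mochi"

# [[ ]] accepts a variable holding the name; $ does not
field <- "weight"
pet[[field]]         # looks up the element called "weight": 4.2
pet$field            # looks for an element literally named "field": NULL
```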
✅ Check-in 7.2: Extracting Elements of Vectors & Data Frames
x <- c(10, 3, NA, 5, 8, 1, NA)
- Which output will !is.na(x) return?
- TRUE TRUE FALSE TRUE TRUE TRUE FALSE
- 1 2 4 5 6
- 10 3 5 8 1
- Which output will x[c(3, 2, 5)] return?
- FALSE TRUE TRUE FALSE TRUE FALSE FALSE
- 3 2 5
- NA 3 8
- Which output will x[-c(1, 3, 5)] return?
- FALSE TRUE FALSE TRUE FALSE TRUE TRUE
- -1 -3 -5
- 3 5 1 NA
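The three questions above exercise the three main ways to index a vector with []. The sketch below applies the same three ideas to a different, made-up vector y so you can compare your predictions with what R actually does.

```r
y <- c(4, NA, 7, 2)

# is.na() flags missing values; ! flips TRUE and FALSE
!is.na(y)            # TRUE FALSE TRUE TRUE

# A logical index keeps the elements where the index is TRUE
y[!is.na(y)]         # 4 7 2

# Positive integers pick elements by position, in the order given
y[c(3, 1)]           # 7 4

# Negative integers drop the elements at those positions
y[-c(1, 2)]          # 7 2
```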
Suppose we decided to give the elements of x names:
x <- c(a = 1, b = 2, c = 5)
- Which of the following lines of code would extract the a and c elements? Select all that apply!
- x[1, 3]
- x[c(1, 3)]
- x[-2]
- x[c("a", "c")]
- x["a", "c"]
- x[-"b"]
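Names give you a third way to index, alongside positive and negative positions. In this sketch, sizes is a hypothetical named vector; the final line shows what happens when a one-dimensional vector receives two indices separated by a comma (tryCatch() captures the error message instead of stopping).

```r
sizes <- c(s = 1, m = 2, l = 3)

# Three ways to keep the first and last elements
sizes[c(1, 3)]         # by position
sizes[-2]              # by dropping a position
sizes[c("s", "l")]     # by name; the names come along for the ride

# A vector has one dimension, so [ ] expects one index
tryCatch(sizes[1, 3], error = function(e) conditionMessage(e))
```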
Suppose we have the following data frame:
df <- tibble::tibble(
x = 1:5,
y = c("a", "e", "f", "k", "z"),
z = runif(5)
)
df
# A tibble: 5 × 3
      x y          z
  <int> <chr>  <dbl>
1     1 a     0.522
2     2 e     0.588
3     3 f     0.262
4     4 k     0.467
5     5 z     0.0230
- Suppose I wanted to filter df so that the values of x were greater than 3. Previously, we would have used df |> filter(x > 3). What base R code would we use? Select all that apply!
- df[df$x >= 4, ]
- df[ , df$x >= 4]
- df[df$x > 3, ]
- df[ , df$x > 3]
- df[df$x > 3]
- df[which(df$x > 3), ]
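The df in this question has no missing values in x, but with real data the choice between a logical index and which() matters. In this sketch, readings is a hypothetical data frame with one NA: indexing rows with a logical vector turns the NA into a row of NAs, while which() returns only the positions where the test is TRUE. That second behavior matches filter(), which drops rows where the condition is NA.

```r
# A hypothetical data frame with a missing value in val
readings <- data.frame(id = 1:4, val = c(5, NA, 8, 1))

# The test produces NA where val is missing
readings$val > 3                       # TRUE NA TRUE FALSE

# Logical row index: the NA becomes a row of all NAs (3 rows)
readings[readings$val > 3, ]

# which() keeps only the TRUE positions (2 rows)
which(readings$val > 3)                # 1 3
readings[which(readings$val > 3), ]
```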
- Suppose I wanted to select the y and z columns of df. Previously, we would have used df |> select(y, z). What base R code would we use? Select all that apply!
- df[c("y", "z")]
- df[c("y", "z") , ]
- df[ , c("y", "z")]
- df["y", "z"]
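Base R offers two indexing styles for columns: list-style (one index, no comma) and matrix-style (rows before the comma, columns after it). The sketch below uses a hypothetical base data.frame called dat. The two styles agree when you select several columns, but for a base data.frame they differ when you select exactly one: matrix-style simplifies the result to a vector (drop = TRUE by default). Tibbles do not simplify this way, which is one reason results can differ between the two.

```r
dat <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(2.5, 3.5, 4.5))

# List-style: a single index selects columns
dat[c("b", "c")]

# Matrix-style: rows before the comma (blank = all rows), columns after
dat[ , c("b", "c")]

# With ONE column, the two styles differ for a base data.frame
dat["a"]       # still a data frame
dat[ , "a"]    # simplified to an integer vector
```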
- Previously, we could remove columns we were not interested in by using a - inside of select(). The code below produces an error.
df[-c("x")]
Error in -c("x"): invalid argument to unary operator
Which of the following best explains why?
- tibble objects cannot use negative indices
- "x" is not a valid column name, so R can’t find it.
- Negative indices in R work only with numeric positions, not character names.
- You need to use double brackets (df[[-"x"]]) to remove a column by name.
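Since a minus sign cannot be applied to a character string, dropping a column by name in base R means building an index that does not need one. The sketch below shows three common approaches on a hypothetical data frame dat; dat2 is a copy, so dat keeps all three columns.

```r
dat <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c(0.1, 0.2, 0.3))

# Option 1: a logical index that is TRUE for every column except "x"
dat[names(dat) != "x"]

# Option 2: setdiff() returns the column names that should remain
dat[setdiff(names(dat), "x")]

# Option 3: assigning NULL to a column removes it (done on a copy here)
dat2 <- dat
dat2$x <- NULL
names(dat2)
```

You can also convert names to positions with which() and then negate them, as in dat[-which(names(dat) == "x")]. Be careful with that version: if no column matches, -which(...) is an empty index and the result has zero columns.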