# a
arrange(avg_bill_ratio)
# b
group_by(species)
# c
penguins
# d
summarize(
  avg_bill_ratio = mean(bill_ratio, na.rm = TRUE)
)
# e
mutate(
  bill_ratio = bill_length_mm / bill_depth_mm
)
dplyr Review
This module consists of readings reviewing material typically taught in STAT 331. You may be able to skip portions of this reading. It is your responsibility to decide which areas you need to review before diving into STAT 541.
Answer the following questions to see if you can safely skip this section.
- In essence, a data.frame is simply a special list, with a few extra restrictions on the list format.
Think about the datasets you have already worked with. Which of the following restrictions on a list do you think are needed for the list to be a data.frame? (Select all that apply)
- The elements of the list must all be vectors of the same length.
- The elements of the list must all be the same data type.
- The elements of the list must all have no missing values.
- The elements of the list must all have names.
- Tibbles are described as "opinionated" dataframes. Which of the following are true about a tibble's behavior? (Select all that apply)
- tibbles only print the first 10 rows of a dataset
- tibbles allow for non-syntactic variable names, like :)
- tibbles never convert strings to factors
- tibbles create row names
If you had a hard time answering these questions, I would recommend reviewing Section 1.1.
- Match each base R code excerpt to the associated dplyr verb.
- penguins[order(penguins$bill_length_mm) , ]
- penguins[penguins$species == "Adelie", ]
- aggregate(bill_length_mm ~ species, data = penguins, FUN = mean)
- with(penguins, mass_ratio = body_mass_g / flipper_length_mm)
- penguins$species
- mean(penguins[penguins$species == "Adelie", ], na.rm = TRUE)
If you had a hard time answering this question, I would recommend reviewing Section 1.2.
- Suppose we would like to study how the ratio of bill length to bill depth varies across the different penguin species. Arrange the steps labeled a through e into an order that accomplishes this goal (assuming the steps are connected with a |> or a %>%).
If you had a hard time answering this question, I would recommend reviewing Section 1.3.
dplyr
You should feel comfortable using:
- The five main dplyr verbs: filter(), arrange(), select(), mutate(), summarize()
- Incorporating group_by() to perform groupwise operations
- Chaining together data wrangling operations with the pipe operator (|> or %>%)
Data Structures
Choose one of these two options:
In addition, read the following section from the first edition of R for DS:
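As a quick refresher before the readings, the sketch below pokes at a data.frame with base R to show how it relates to a list. The data frame `df` and its values are invented for illustration (they are not the real penguins data), and the variable name `bad` is likewise hypothetical.

```r
# Toy data invented for illustration; not the real penguins dataset.
df <- data.frame(
  species        = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, NA, 49.5)
)

is.list(df)        # a data.frame is built on top of a list
length(df)         # list length = number of columns
sapply(df, class)  # columns can hold different data types
anyNA(df)          # missing values are allowed inside columns

# Columns whose lengths cannot be recycled to a common length are rejected.
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(bad, "try-error")
```

Note that base R will recycle a shorter column when its length divides the longer one evenly, so the failing example deliberately uses lengths 3 and 2.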
Data Wrangling with dplyr
If you had a hard time answering Question 3, I would recommend reviewing this content.
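For a compact reminder of what each verb does, here is a minimal sketch that applies the main dplyr verbs one at a time. It assumes the dplyr package is installed; the data frame `toy`, its values, and the result names are made up for illustration and are not taken from the penguins data.

```r
library(dplyr)

# Toy data invented for illustration; not the real penguins dataset.
toy <- data.frame(
  species           = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g       = c(3700, 3800, 5000, NA),
  flipper_length_mm = c(181, 186, 217, 210)
)

# filter(): keep rows that satisfy a condition (rows where the test is NA are dropped)
heavy <- filter(toy, body_mass_g > 3750)

# arrange(): reorder rows, here from longest to shortest flipper
sorted <- arrange(toy, desc(flipper_length_mm))

# select(): keep a subset of columns
cols <- select(toy, species, body_mass_g)

# mutate(): add a new column computed from existing ones
with_ratio <- mutate(toy, mass_per_mm = body_mass_g / flipper_length_mm)

# group_by() + summarize(): collapse each group down to one row
grouped    <- group_by(toy, species)
by_species <- summarize(grouped, mean_mass = mean(body_mass_g, na.rm = TRUE))
```

Each call returns a new data frame and leaves `toy` unchanged, which is what makes the verbs easy to chain.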
The Pipe Operator
If you had a hard time answering Question 4, I would recommend also reviewing this content.
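To see what the pipe buys you, the sketch below writes the same computation twice: once as nested calls and once as a pipeline. It uses only base R and assumes R 4.1 or later for the native `|>` operator; the vector `x` is invented for illustration.

```r
# Toy vector invented for illustration.
x <- c(4, 9, 16, NA)

# Nested calls must be read from the inside out.
nested <- round(mean(sqrt(x), na.rm = TRUE), 1)

# With the native pipe, the same steps read top to bottom:
# the left-hand side becomes the first argument of the next call.
piped <- x |>
  sqrt() |>
  mean(na.rm = TRUE) |>
  round(1)

identical(nested, piped)
```

The magrittr pipe `%>%` can be swapped in for `|>` in a simple chain like this one, provided magrittr (or a package that re-exports it, such as dplyr) is loaded.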