Writing functions that work with data frames and call the functions we've become used to (e.g., filter(), select(), summarise()) requires that we learn about tidy evaluation. To write these functions, you will need to know whether the function you are trying to incorporate uses data masking or tidy selection.
At a high level, data masking is used in functions like arrange(), filter(), and summarize() that compute with variables, whereas tidy selection is used in functions like select() and rename() that select variables.
Your intuition about which functions use tidy selection should be good for many of these functions. If you can input c(var1, var2, var3) into the function (e.g., select(mtcars, c(vs, am, gear))), then the function uses tidy selection! If you cannot input c(var1, var2, var3) into the function, then the function is performing computations on the data and uses data masking.
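To make the distinction above concrete, here is a minimal sketch of one wrapper around a data-masking function and one around a tidy-selection function. The helper names mean_above() and keep_columns() are hypothetical and used only for illustration. Both rely on embracing with {{ }} to pass the user's input through to dplyr.

```r
library(dplyr)

# Hypothetical helper: filter() and summarise() compute with the variable,
# so this is data masking.
mean_above <- function(df, var, min_value) {
  df |>
    filter({{ var }} > min_value) |>
    summarise(mean = mean({{ var }}))
}

# Hypothetical helper: select() only picks columns, so this is tidy selection
# and accepts c(var1, var2, var3).
keep_columns <- function(df, cols) {
  df |>
    select({{ cols }})
}

high_mpg <- mean_above(mtcars, mpg, 25)
selected <- keep_columns(mtcars, c(vs, am, gear))
```

Note that the same {{ }} syntax is used in both helpers. What differs is what the user is allowed to supply: a single variable to compute with, or a selection of columns.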
If you are interested in learning more about tidy evaluation, I would highly recommend:
I do want to note that this video is from 2019 and some things have changed since then. Namely, we used to need to use the enquo() function to inject variable names into dplyr functions, whereas we now use embracing with {{ }}.
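As a rough illustration of that change, the sketch below writes the same summary function both ways. The function names mean_old() and mean_new() are hypothetical. The older version captures the argument with enquo() and injects it with !!, while the newer version does both steps at once with {{ }}.

```r
library(dplyr)

# Older pattern: capture the argument with enquo(), then inject it with !!
mean_old <- function(df, var) {
  var <- rlang::enquo(var)
  df |>
    summarise(mean = mean(!!var))
}

# Current pattern: embrace the argument with {{ }}
mean_new <- function(df, var) {
  df |>
    summarise(mean = mean({{ var }}))
}

old_result <- mean_old(mtcars, mpg)
new_result <- mean_new(mtcars, mpg)
```

Both calls should return the same one-row data frame, so the embraced version can be read as shorthand for the enquo() and !! pair.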
Question 1: Fill in the code below to write a function that finds all flights that were cancelled or delayed by more than a user-supplied number of hours:
Question 2: Fill in the code below to write a function that converts the user-supplied variable that uses clock time (e.g., dep_time, arr_time, etc.) into a decimal time (i.e., hours + (minutes / 60)).
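standardize_time <- function(df, time_var) {
  df |>
    mutate(
      # Times are stored as clock times, e.g., 2008 for 8:08 pm.
      # Left-pad to four digits so early times like 517 become "0517";
      # otherwise the first two characters would not be the hour.
      {{ time_var }} := str_pad({{ time_var }}, width = 4, pad = "0"),
      {{ time_var }} :=
        ## Grab first two numbers for hour
        as.numeric(str_sub({{ time_var }}, start = 1, end = 2)) +
        ## Grab second two numbers for minutes
        as.numeric(str_sub({{ time_var }}, start = 3, end = 4)) / 60
    )
}

nycflights |>
  standardize_time(arr_time)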
Question 3: For each of the following functions determine if the function uses data-masking or tidy-selection:
Question 4: Fill in the code below to build a rich plotting function which:
draws a scatterplot given a dataset and x and y variables,
adds a line of best fit (i.e., a linear model with no standard errors), and
adds a title.
scatterplot <- function(df, x_var, y_var) {
  label <- rlang::englue("A scatterplot of _____ and _____, including a line of best fit.")

  df |>
    ggplot(mapping = aes(x = _____, y = _____)) +
    geom_point() +
    geom_smooth(method = "lm", _____) +
    labs(title = _____)
}