Lab 6 - Alternative

Contributing to R for Data Science

The function exercises below were pulled from the newest version of R for Data Science, specifically from Chapters 25 and 26. For this “alternative” lab you will complete the exercises from the textbook, with the option of submitting a pull request to the repository for the textbook solutions.

Vector Functions

Question 1: The rescale01() function below performs min-max scaling to standardize a numeric vector, but infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0 and Inf is mapped to 1. Hint: This seems like a great place for case_when()!

rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
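To see what "left unchanged" means in practice, the short sketch below runs the function as given on a made-up vector (the values in x are purely illustrative). Because range() is called with finite = TRUE, the infinite entries do not affect the minimum and maximum, so they pass through the arithmetic still infinite.

```r
# The function exactly as given in the question.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative input: three finite values plus both infinities.
x <- c(1, 5, 10, Inf, -Inf)

# The finite range is 1 to 10, so the finite values land in [0, 1],
# while Inf and -Inf are returned as Inf and -Inf.
scaled <- rescale01(x)
print(scaled)  # 0, 0.444..., 1, Inf, -Inf
```

Your rewritten function should return 1 and 0 in the last two positions instead.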

Question 2: Write a function that accepts a vector of birthdates and computes the age of each person in years.

Question 3: Write a function that computes the variance and skewness of a numeric vector. Feel free to look up the definitions on Wikipedia or elsewhere!
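As a starting point, one common set of definitions is written out below. This is an assumption about which convention to use: the variance shown is the usual sample variance (the one var() computes), and the skewness is the moment-based version. Other references scale skewness slightly differently, so state which definition your function implements.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
\text{skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}
                       {\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}}
```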

Question 4: Write a function called both_na() which takes two vectors of the same length and returns the number of positions that have an NA in both vectors.

Data Frame Functions

Question 5: Insert the data frame function you wrote from Lab 6 (either Exercise 1 or Exercise 2).

For Questions 6 - 10 you will write different functions which work with data similar to the nycflights13 data.

Question 6: Write a filter_severe() function that finds all flights that were cancelled (i.e. is.na(arr_time)) or delayed by more than an hour.

Question 7: Write a summarize_severe() function that counts the number of cancelled flights and the number of flights delayed by more than an hour.

Question 8: Modify your filter_severe() function to allow the user to supply the delay threshold, in hours, used to decide which delayed flights are kept alongside the cancelled flights.

Question 9: Write a summarize_weather() function that summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable.

Question 10: Write a standardize_time() function that converts a user-supplied variable that uses clock time (e.g., dep_time, arr_time, etc.) into a decimal time (i.e., hours + (minutes / 60)).
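The arithmetic behind the conversion can be sketched on a plain numeric vector. This assumes clock times are stored as integers in HHMM form, as dep_time and arr_time are in nycflights13 (so 517 means 5:17 and 1545 means 15:45). The vector clock is made up for illustration, and wrapping the logic in a data frame function is left to you.

```r
# Illustrative clock times in HHMM form (assumed storage format).
clock <- c(517, 1545, 2359)

# Integer division by 100 keeps the hours; the remainder is the minutes.
hours   <- clock %/% 100
minutes <- clock %% 100

# Decimal time: hours plus the fraction of an hour.
decimal <- hours + minutes / 60
print(decimal)
```

For example, 1545 splits into 15 hours and 45 minutes, giving 15.75.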

Plotting Functions

You might want to read over the Plot Functions section of R for Data Science.

Question 11: Build a sorted_bars() function which:

  • takes a data frame and a variable as inputs and returns a vertical bar chart
  • sorts the bars in decreasing order (largest to smallest)
  • adds a title that includes the context of the variable being plotted

Hint 1: The fct_infreq() and fct_rev() functions from the forcats package will be helpful for sorting the bars!

Hint 2: The englue() function from the rlang package will be helpful for adding a variable’s name into the plot title!
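The two hints can be tried in isolation before you build the plot. The sketch below uses a made-up factor and a hypothetical helper, label_for(), that exists only to show englue() at work; neither is part of the assignment.

```r
library(forcats)
library(rlang)

# A made-up factor: "b" appears three times, "a" twice, "c" once.
f <- factor(c("b", "a", "b", "c", "b", "a"))

# fct_infreq() orders levels from most to least frequent.
by_freq <- levels(fct_infreq(f))

# fct_rev() reverses whatever order the levels currently have.
by_freq_rev <- levels(fct_rev(fct_infreq(f)))

# Hypothetical helper: englue() turns an embraced argument into text,
# so the label carries the name of the variable the caller passed in.
label_for <- function(var) {
  englue("Distribution of {{ var }}")
}

label <- label_for(cty)
```

Here by_freq is "b", "a", "c", by_freq_rev is the reverse, and label is the string "Distribution of cty".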

Iteration

Alright, now let’s take our plotting function and iterate it!

Question 12: Make a sorted barplot for every character variable in the mpg dataset (built into ggplot2).
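The general pattern of finding the character columns and then applying a function to each can be sketched on a toy data frame. Everything here is illustrative: df is made up, and a frequency table stands in for the plotting function. Note that iterating over column names as strings may require adapting a function written with embracing, which is part of what this question asks you to work out.

```r
library(purrr)

# A made-up data frame with one numeric and two character columns.
df <- data.frame(
  id     = 1:4,
  colour = c("red", "blue", "red", "red"),
  size   = c("S", "M", "M", "L"),
  stringsAsFactors = FALSE
)

# Names of the character columns only.
chr_vars <- names(df)[map_lgl(df, is.character)]

# Apply a function once per column name; set_names() labels the
# resulting list so each element can be looked up by variable.
tables <- map(set_names(chr_vars), \(v) table(df[[v]]))
```

The result is a list with one element per character column, here colour and size.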

Contributing to the R for Data Science Community!

The functions you wrote for exercises 1-10 came from R for Data Science. You could consider making a pull request to the repository for the solutions!

https://github.com/mine-cetinkaya-rundel/r4ds-solutions

To learn more about how to make a pull request, I would suggest this article: https://usethis.r-lib.org/articles/pr-functions.html