rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
Lab 6 - Alternative
Contributing to R for Data Science
The exercises below were pulled from the newest version of R for Data Science, specifically from Chapters 25 and 26. For this “alternative” lab you will complete the exercises from the textbook, with the option of submitting a pull request to the repository for the textbook solutions.
Vector Functions
Question 1: The rescale01() function (defined at the top of this page) performs a min-max scaling to standardize a numeric vector, but infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0 and Inf is mapped to 1. Hint: This seems like a great place for case_when()!
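To see what "infinite values are left unchanged" means in practice, the short sketch below runs the function as given on a made-up vector. The input values are illustrative assumptions, not part of the lab, and the sketch only shows the starting behavior rather than the rewrite the question asks for.

```r
# rescale01() as given in the lab: min-max scaling based on the finite range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative input (not from the lab): finite values plus both infinities.
x <- c(-Inf, 0, 5, 10, Inf)

# The finite values are scaled into [0, 1] using the finite range (0 to 10);
# -Inf and Inf pass through as -Inf and Inf.
result <- rescale01(x)
result
#> [1] -Inf  0.0  0.5  1.0  Inf
```

Because range() is called with finite = TRUE, the infinities do not affect the scaling of the other values; they are simply not mapped anywhere useful yet.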
Question 2: Write a function that accepts a vector of birthdates and computes the age of each person in years.
Question 3: Write a function that computes the variance and skewness of a numeric vector. Feel free to look up the definitions on Wikipedia or elsewhere!
Question 4: Write a function called both_na() which takes two vectors of the same length and returns the number of positions that have an NA in both vectors.
Data Frame Functions
Question 5: Insert the data frame function you wrote from Lab 6 (either Exercise 1 or Exercise 2).
For Questions 6 - 10 you will write different functions which work with data similar to the nycflights13 data.
Question 6: Write a filter_severe() function that finds all flights that were cancelled (i.e. is.na(arr_time)) or delayed by more than an hour.
Question 7: Write a summarize_severe() function that counts the number of cancelled flights and the number of flights delayed by more than an hour.
Question 8: Modify your filter_severe() function to allow the user to supply the number of hours of delay that should be used when filtering for flights that were cancelled or delayed.
Question 9: Write a summarize_weather() function that summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable.
Question 10: Write a standardize_time() function that converts a user-supplied variable that uses clock time (e.g., dep_time, arr_time, etc.) into a decimal time (i.e. hours + (minutes / 60)).
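As a worked example of the hours + (minutes / 60) arithmetic, the sketch below converts a few made-up clock times on a plain numeric vector. It assumes the HHMM layout that nycflights13 uses for dep_time and arr_time; the variable names are illustrative, and wrapping this into a function that accepts a data frame and a column is left to the exercise.

```r
# Illustrative clock times in HHMM form (e.g. 1545 means 3:45 pm).
# These values are made up for the example.
clock <- c(517, 1545, 2359)

hours   <- clock %/% 100  # integer division keeps the hour part
minutes <- clock %% 100   # remainder keeps the minute part

# 517  -> 5  + 17/60
# 1545 -> 15 + 45/60 = 15.75
# 2359 -> 23 + 59/60
decimal_time <- hours + minutes / 60
decimal_time
```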
Plotting Functions
You might want to read over the Plot Functions section of R for Data Science.
Question 11: Build a sorted_bars() function which:
- takes a data frame and a variable as inputs and returns a vertical bar chart
- sorts the bars in decreasing order (largest to smallest)
- adds a title that includes the context of the variable being plotted
Hint 1: The fct_infreq() and fct_rev() functions from the forcats package will be helpful for sorting the bars!
Hint 2: The englue() function from the rlang package will be helpful for adding a variable’s name into the plot title!
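The small sketch below shows what the three hinted functions do in isolation, without building the plot. The factor values and the plot_title() helper are hypothetical, introduced only for illustration; how to combine them inside sorted_bars() is up to you.

```r
library(forcats)
library(rlang)

# Made-up categorical data: "c" appears three times, "b" twice, "a" once.
f <- factor(c("a", "b", "b", "c", "c", "c"))

# fct_infreq() reorders levels by frequency, most frequent first.
by_freq <- levels(fct_infreq(f))           # "c" "b" "a"

# fct_rev() reverses whatever level order it is given.
reversed <- levels(fct_rev(fct_infreq(f))) # "a" "b" "c"

# englue() is meant to be called inside a function: {{ var }} is replaced by
# the expression the caller supplied. plot_title() is a hypothetical helper.
plot_title <- function(var) {
  englue("Number of observations by {{ var }}")
}

title <- plot_title(drv)
title
#> [1] "Number of observations by drv"
```

Since ggplot2 draws bars in level order, choosing between fct_infreq() alone and wrapping it in fct_rev() determines whether the tallest bar comes first or last.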
Iteration
Alright, now let’s take our plotting function and iterate it!
Question 12: Make a sorted barplot for every character variable in the mpg dataset (built into ggplot2).
Contributing to the R for Data Science Community!
The functions you wrote for exercises 1-10 came from R for Data Science. You could consider making a pull request to the repository for the solutions!
https://github.com/mine-cetinkaya-rundel/r4ds-solutions
To learn more about how to make a pull request I would suggest this article: https://usethis.r-lib.org/articles/pr-functions.html