Practice Activity Week 2
Accessing the Practice Activity
Download the template Practice Activity Quarto file here: pa-2.qmd
Important
Be sure to save the file inside the Week 2 folder within your STAT 431 (or 541) folder!
Writing Functions
- Write a function called summ_stats() that takes a vector x as input and produces the following output:
  - for numeric variables: returns the mean, median, standard deviation, and IQR as a data frame
  - for categorical variables: returns the number of levels (categories) of the variable as a data frame
Hint: You can use tibble() to create the data frame. For example, tibble(a = 1:2, b = 2:3) creates a data frame with variables a and b.
- Confirm that your function works on each type of variable by running the code below.

```r
library(ggplot2)  # provides the diamonds dataset
summ_stats(diamonds$carat)  # numeric variable
summ_stats(diamonds$color)  # categorical variable (ordered factor)
```
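Here is one possible sketch of summ_stats(). It assumes the tibble and ggplot2 packages are installed. Any non-numeric vector (character, factor, logical) is treated as categorical. The use of na.rm = TRUE and the column names (mean, median, sd, iqr, n_levels) are illustrative choices that the activity does not specify.

```r
library(tibble)
library(ggplot2)  # provides the diamonds dataset

# Sketch of summ_stats(): numeric input -> four summary statistics;
# anything else (character, factor, logical) -> number of categories
summ_stats <- function(x) {
  if (is.numeric(x)) {
    tibble(
      mean   = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE),
      iqr    = IQR(x, na.rm = TRUE)
    )
  } else {
    # factor() converts a character vector and drops unused levels of an
    # existing factor, so this counts the distinct non-missing categories
    tibble(n_levels = nlevels(factor(x)))
  }
}

summ_stats(diamonds$carat)  # numeric: one row, four columns
summ_stats(diamonds$color)  # ordered factor: one row, one column
```

If you need to count declared but unused factor levels as well, call nlevels(x) directly on factors instead of going through factor(x).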
Iterating Functions
- Use map() to apply your summ_stats() function to every column in the diamonds dataset.
Hint: Look up the bind_rows() documentation from dplyr to combine summary statistics for all the variables into one data frame. The .id argument will be especially helpful in adding the variable names!
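The map() plus bind_rows(.id = ) pattern can be tried on a small example before you plug in summ_stats(). This sketch assumes purrr, dplyr and tibble are installed. toy_summary() and toy are hypothetical stand-ins for summ_stats() and diamonds. toy_summary() returns different columns for numeric and non-numeric input, which shows how bind_rows() handles rows that lack some statistics.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for summ_stats(): numeric columns get a mean,
# other columns get a count of categories
toy_summary <- function(x) {
  if (is.numeric(x)) tibble(mean = mean(x)) else tibble(n_levels = nlevels(factor(x)))
}

toy <- tibble(a = c(1, 2, 3), b = c("x", "y", "x"))

# map() over a data frame visits each column and returns a named list
per_column <- map(toy, toy_summary)

# bind_rows() stacks the list; .id stores each list name in a "variable"
# column, and any statistic a row lacks is filled with NA
combined <- bind_rows(per_column, .id = "variable")
combined
```

If you swap toy_summary for summ_stats and toy for diamonds, you should get one row per diamonds column.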
Data Transformation
- Let’s make the output more intuitive by putting the variables in the columns and the summary statistics in the rows.
Hint: You will need to do a double pivot (pivot_longer() then pivot_wider()) to achieve this result!
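You can check the double pivot on a small hand-built table shaped like the combined output, with one row per variable and one column per statistic. The values in combined below are made up for illustration, and the name statistic for the new key column is an arbitrary choice. The sketch assumes tibble and tidyr are installed.

```r
library(tibble)
library(tidyr)

# Hypothetical combined output: one row per variable, one column per statistic
combined <- tibble(
  variable = c("a", "b"),
  mean     = c(2, NA),
  n_levels = c(NA, 2)
)

# Step 1: long format, one row per (variable, statistic) pair
long <- pivot_longer(combined, cols = -variable,
                     names_to = "statistic", values_to = "value")

# Step 2: wide again, now with one column per variable
wide <- pivot_wider(long, names_from = variable, values_from = value)
wide
```

The first pivot puts every statistic into a single value column. The second pivot spreads that column back out, this time keyed by variable name.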
Using Helper Functions (STAT 541 only)
- Now that we have the output we want, let’s use our code to write a summ_df() function that takes a data frame as input and outputs a table of summary statistics for every variable in the data frame.
Hint: The body of your function should contain all the code from Question 3.
- Demonstrate that your function works using the mpg dataset.
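Here is one way to package the whole pipeline. It assumes the listed tidyverse packages are installed and uses the native pipe |>, which requires R 4.1 or later. A compact summ_stats() is repeated so the block runs on its own. Its details (na.rm = TRUE, counting non-missing categories) are assumptions, not requirements of the activity.

```r
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)
library(ggplot2)  # provides the mpg dataset

# Helper: one-row summary of a single vector (numeric vs. categorical)
summ_stats <- function(x) {
  if (is.numeric(x)) {
    tibble(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE),
           sd = sd(x, na.rm = TRUE), iqr = IQR(x, na.rm = TRUE))
  } else {
    tibble(n_levels = nlevels(factor(x)))
  }
}

# summ_df(): summarise every column, then double-pivot so that the
# statistics are rows and the variables are columns
summ_df <- function(df) {
  df |>
    map(summ_stats) |>
    bind_rows(.id = "variable") |>
    pivot_longer(-variable, names_to = "statistic", values_to = "value") |>
    pivot_wider(names_from = variable, values_from = value)
}

summ_df(mpg)
```

Cells for statistics that do not apply to a variable (for example, the mean of manufacturer) come out as NA.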