add_something <- function(x, something = 2){
  stopifnot(is.numeric(x), is.numeric(something))
  x + something
}
Today we are going to do a hands-on coding activity! We will create our own versions of the table() and prop.table() base R functions.
This will help us…
{{ }} operator
all_of() and pick()
Specifically, we learned about:
This week, we’re writing functions that take a data frame and variable names as arguments.
These functions can be incredibly powerful, but they require us to learn some interesting details about how some of the functions we’ve grown very accustomed to (e.g., select(), mutate(), group_by()) work “behind the scenes.”
We are going to use a hands-on activity to explore these concepts!
In the Week 8 Module, navigate to the Lecture Activity section.
Click on the Tidy Eval Colab Notebook link.
Make a copy of the notebook (like you do for Practice Activities)!
Recreate the table() function in R
table() Function First
Let’s start with one categorical variable.
Function Design
What do you notice about the layout of the table?
table() Function First
Okay, let’s add a second categorical variable.
            Biscoe Dream Torgersen
  Adelie        44    56        52
  Chinstrap      0    68         0
  Gentoo       124     0         0
Function Design
What do you notice about the layout of the table?
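To experiment with tables like the ones above without installing anything, one option is to rebuild a stand-in data frame from the counts shown. This is only a sketch: the name penguins_standin is hypothetical, and it assumes the counts above describe every row of the data.

```r
# Hypothetical stand-in for penguins, rebuilt from the counts shown above
cell_counts <- c(44, 56, 52, 68, 124)
penguins_standin <- data.frame(
  species = rep(c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
                times = cell_counts),
  island  = rep(c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
                times = cell_counts)
)

# One categorical variable: a single row of named counts
one_way <- table(penguins_standin$species)

# Two categorical variables: the first forms the rows, the second the columns
two_way <- table(penguins_standin$species, penguins_standin$island)

print(one_way)
print(two_way)
```

Combinations that never occur (such as Chinstrap on Biscoe) still appear in the two-way table, with a count of 0.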
tidy_table() Function
Based on this exploration, it seems like our function should have the following qualities:
Using the penguins data, write dplyr and tidyr code (not table()) which will:
count() the number of penguins for each species and island
fill NA values with 0s
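One possible way to carry out these steps is sketched below. It assumes dplyr and tidyr are installed, and it uses a hypothetical stand-in tibble (penguins_standin) rebuilt from the counts shown earlier rather than the real penguins data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for penguins, rebuilt from the counts shown earlier
cell_counts <- c(44, 56, 52, 68, 124)
penguins_standin <- tibble(
  species = rep(c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
                times = cell_counts),
  island  = rep(c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
                times = cell_counts)
)

species_by_island <- penguins_standin |>
  # One row per species / island combination that occurs in the data
  count(species, island) |>
  # Spread the islands across the columns; missing combinations become 0
  pivot_wider(names_from = island,
              values_from = n,
              values_fill = 0)

print(species_by_island)
```

Without values_fill = 0, combinations that never occur would show up as NA instead of 0.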
Now that we have a working example, let’s try and generalize our code.
Copy the tidy_table() function in your Colab notebook!
The tidyverse functions use either “tidy selection” or “data masking.” Both of these features make common tasks easier at the cost of making less common tasks harder.
count()
count() uses data masking, which blurs the line between the two different meanings of the word “variable”:
env-variables – “programming” variables that live in an environment, usually created with <-.
data-variables – “statistical” variables that live in a data frame.
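A small illustration of the two kinds of variable may help. The tibble toy and the value cutoff are made up for this sketch; the .data and .env pronouns are one way to spell out which kind of variable is meant.

```r
library(dplyr)

# `toy` and `cutoff` are env-variables (created with <-);
# `delay` is a data-variable that only exists inside `toy`
toy <- tibble(delay = c(5, 20, 45))
cutoff <- 15

# Data masking lets us mix both kinds of variable in one expression
late <- toy |>
  filter(delay > cutoff)

# The same filter with each variable's origin written out explicitly
late_explicit <- toy |>
  filter(.data$delay > .env$cutoff)

print(late)
```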
pivot_wider()
pivot_wider() uses tidy selection. In the case of our function, the names of the columns we want to use are stored in an intermediate variable (e.g., col_var = island).
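To see why the intermediate variable matters, compare a function that uses its argument directly with one that embraces it. This is a sketch with made-up names (toy, count_plain, count_embraced), not the notebook's code.

```r
library(dplyr)

toy <- tibble(island = c("Biscoe", "Dream", "Dream"))

# Without embracing, count() cannot find the column the caller meant
count_plain <- function(df, col_var){
  df |>
    count(col_var)
}

# With embracing, {{ }} forwards the column the caller named
count_embraced <- function(df, col_var){
  df |>
    count({{ col_var }})
}

# try() captures the error so the script keeps running
plain_failed <- inherits(try(count_plain(toy, island), silent = TRUE),
                         "try-error")
embraced <- count_embraced(toy, island)

print(plain_failed)
print(embraced)
```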
Update your tidy_table() function in your Colab notebook!
# A tibble: 1 × 3
  Adelie Chinstrap Gentoo
   <int>     <int>  <int>
1    152        68    124
Which argument is species being inserted into?
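One way to check how arguments are matched is a tiny helper that reports which arguments were left out. The helper which_missing() and the data frame toy are hypothetical and exist only for this sketch.

```r
# Hypothetical helper with the same argument order as tidy_table()
which_missing <- function(df, col_var, row_var){
  c(col_var = missing(col_var), row_var = missing(row_var))
}

toy <- data.frame(species = c("Adelie", "Gentoo"))

# Unnamed arguments are matched by position: df first, then col_var
matched <- which_missing(toy, species)
print(matched)
# col_var is FALSE (it was supplied), row_var is TRUE (it was not)
```

missing() only checks whether an argument was supplied; it does not evaluate species, so no error is raised here.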
row_var is missing()
tidy_table <- function(df, col_var, row_var){
  if(missing(row_var)){
    # No row variable: return a single row of counts
    df |>
      count({{ col_var }}) |>
      pivot_wider(names_from = {{ col_var }},
                  values_from = n,
                  values_fill = 0)
  }
  else {
    # Row variable supplied: cross it with the column variable
    df |>
      count({{ col_var }}, {{ row_var }}) |>
      pivot_wider(names_from = {{ col_var }},
                  values_from = n,
                  values_fill = 0)
  }
}
all_of()
Add the quote_table() function to your Colab notebook!
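The notebook's quote_table() is not reproduced here. As a rough sketch of what a version that takes column names as character strings might look like, the hypothetical quote_table_sketch() below uses all_of() and pick(). It assumes dplyr 1.1.0 or later (for pick()) and reuses a stand-in tibble rebuilt from the counts shown earlier.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for penguins, rebuilt from the counts shown earlier
cell_counts <- c(44, 56, 52, 68, 124)
penguins_standin <- tibble(
  species = rep(c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
                times = cell_counts),
  island  = rep(c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
                times = cell_counts)
)

# Hypothetical function: col_var and row_var are strings, not bare names
quote_table_sketch <- function(df, col_var, row_var){
  df |>
    # pick() brings tidy selection into a data-masking function
    count(pick(all_of(c(row_var, col_var)))) |>
    # names_from already uses tidy selection, so all_of() works directly
    pivot_wider(names_from = all_of(col_var),
                values_from = n,
                values_fill = 0)
}

quoted <- quote_table_sketch(penguins_standin, "island", "species")
print(quoted)
```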
Recreate the prop.table() function in R
prop.table() Function First
Let’s start with one categorical variable.
Function Design
What do you notice about the proportions?
prop.table() Function First
Okay, let’s add a second categorical variable.
               Biscoe     Dream Torgersen
  Adelie    0.1279070 0.1627907 0.1511628
  Chinstrap 0.0000000 0.1976744 0.0000000
  Gentoo    0.3604651 0.0000000 0.0000000
Function Design
What do you notice about the proportions?
The prop.table() function has an optional margin argument.
Function Design
What do you notice about the proportions?
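To compare the three settings of margin side by side, here is a small base R sketch. The matrix counts is typed in by hand from the two-way table of counts shown earlier, so it stands in for table(species, island).

```r
# Counts copied from the two-way table shown earlier (filled column by column)
counts <- matrix(
  c(44, 0, 124,   # Biscoe
    56, 68, 0,    # Dream
    52, 0, 0),    # Torgersen
  nrow = 3,
  dimnames = list(c("Adelie", "Chinstrap", "Gentoo"),
                  c("Biscoe", "Dream", "Torgersen"))
)

joint  <- prop.table(counts)              # whole table sums to 1
by_row <- prop.table(counts, margin = 1)  # each row sums to 1
by_col <- prop.table(counts, margin = 2)  # each column sums to 1

print(round(joint, 3))
print(round(by_row, 3))
print(round(by_col, 3))
```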
tidy_prop_table() Function
Based on this exploration, it seems like our function should have the following qualities:
Using the penguins data, write dplyr code (not table() or prop.table()) which will:
count() the number of penguins for each species and island
These give joint proportions for the entire table.
What if I wanted marginal proportions for each species? (i.e., within a species, the proportions should add to 1)
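The difference between the two kinds of proportion comes down to whether the counts are grouped before dividing. The sketch below assumes dplyr is installed and uses a hypothetical stand-in tibble rebuilt from the counts shown earlier.

```r
library(dplyr)

# Hypothetical stand-in for penguins, rebuilt from the counts shown earlier
cell_counts <- c(44, 56, 52, 68, 124)
penguins_standin <- tibble(
  species = rep(c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
                times = cell_counts),
  island  = rep(c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
                times = cell_counts)
)

# Joint proportions: sum(n) is the total for the entire table
joint <- penguins_standin |>
  count(species, island) |>
  mutate(prop = n / sum(n))

# Proportions within each species: sum(n) is computed per group
within_species <- penguins_standin |>
  count(species, island) |>
  group_by(species) |>
  mutate(prop = n / sum(n))

print(joint)
print(within_species)
```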
species
# A tibble: 5 × 4
# Groups:   species [3]
  species   island        n  prop
  <fct>     <fct>     <int> <dbl>
1 Adelie    Biscoe       44 0.289
2 Adelie    Dream        56 0.368
3 Adelie    Torgersen    52 0.342
4 Chinstrap Dream        68 1
5 Gentoo    Biscoe      124 1
Notice that there is still a grouping variable?
What should I add to my code?
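One way to check whether a data frame is still grouped is group_vars(). The sketch below uses a made-up tibble (toy) to show that the grouping sticks around after mutate() until ungroup() removes it.

```r
library(dplyr)

# Made-up counts, only for illustration
toy <- tibble(species = c("a", "a", "b"),
              n = c(1, 3, 4))

grouped <- toy |>
  group_by(species) |>
  mutate(prop = n / sum(n))

# mutate() keeps the grouping; ungroup() drops it
ungrouped <- grouped |>
  ungroup()

print(group_vars(grouped))
print(group_vars(ungrouped))
```

Dropping the grouping means later steps operate on the whole table rather than separately within each species.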
For this table, we don’t care about the counts. Let’s add some code that:
fills NA values with 0s
penguins |>
  count(species, island) |>
  group_by(species) |>
  mutate(prop = n / sum(n)) |>
  ungroup() |>
  select(-n) |>
  pivot_wider(names_from = island,
              values_from = prop,
              values_fill = 0)
# A tibble: 3 × 4
  species   Biscoe Dream Torgersen
  <fct>      <dbl> <dbl>     <dbl>
1 Adelie     0.289 0.368     0.342
2 Chinstrap  0     1         0
3 Gentoo     1     0         0
row_var?
The margin argument of prop.table() has the following behavior:
margin = 1: the proportions are conditional on the rows
margin = 2: the proportions are conditional on the columns
tidy_prop_table <- function(df, col_var, row_var, margin = NULL){
  # Default to joint proportions
  if(is.null(margin)){
    df |>
      count({{ row_var }}, {{ col_var }}) |>
      mutate(prop = n / sum(n)) |>
      ungroup() |>
      select(-n) |>
      pivot_wider(names_from = {{ col_var }},
                  values_from = prop,
                  values_fill = 0)
  }
  # margin = "row": proportions within each level of row_var
  else if(margin == "row"){
    df |>
      count({{ row_var }}, {{ col_var }}) |>
      group_by({{ row_var }}) |>
      mutate(prop = n / sum(n)) |>
      ungroup() |>
      select(-n) |>
      pivot_wider(names_from = {{ col_var }},
                  values_from = prop,
                  values_fill = 0)
  }
  # Any other margin value: proportions within each level of col_var
  else{
    df |>
      count({{ row_var }}, {{ col_var }}) |>
      group_by({{ col_var }}) |>
      mutate(prop = n / sum(n)) |>
      ungroup() |>
      select(-n) |>
      pivot_wider(names_from = {{ col_var }},
                  values_from = prop,
                  values_fill = 0)
  }
}
Joint Proportions
Marginal Proportions – Rows
Marginal Proportions – Columns
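The three branches of tidy_prop_table() differ only in how the counts are grouped, so one possible variation is to choose the grouping first and share the rest of the pipeline. The function tidy_prop_table_compact() below is a hypothetical sketch of that idea, not the notebook's solution. It keeps the same "row" convention for margin and is tried out on a stand-in tibble rebuilt from the counts shown earlier.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for penguins, rebuilt from the counts shown earlier
cell_counts <- c(44, 56, 52, 68, 124)
penguins_standin <- tibble(
  species = rep(c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
                times = cell_counts),
  island  = rep(c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
                times = cell_counts)
)

# Hypothetical compact variant of tidy_prop_table()
tidy_prop_table_compact <- function(df, col_var, row_var, margin = NULL){
  counts <- df |>
    count({{ row_var }}, {{ col_var }})

  # Group only when a margin is requested; NULL keeps joint proportions
  if(!is.null(margin)){
    if(margin == "row"){
      counts <- counts |> group_by({{ row_var }})
    } else {
      counts <- counts |> group_by({{ col_var }})
    }
  }

  counts |>
    mutate(prop = n / sum(n)) |>
    ungroup() |>
    select(-n) |>
    pivot_wider(names_from = {{ col_var }},
                values_from = prop,
                values_fill = 0)
}

joint  <- tidy_prop_table_compact(penguins_standin, island, species)
by_row <- tidy_prop_table_compact(penguins_standin, island, species,
                                  margin = "row")
by_col <- tidy_prop_table_compact(penguins_standin, island, species,
                                  margin = "col")

print(joint)
print(by_row)
print(by_col)
```

With margin = "row" each row of proportions sums to 1, and with any other non-NULL value each island column sums to 1, matching the behavior of the longer version.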