filter(.data = colleges, REGION == 5) |>
mutate(TUITION_DIFF = TUITIONFEE_OUT - TUITIONFEE_IN)
Today we will…
- dplyr
- Practice activity (40 minutes)

dplyr
We’re going to explore some key dplyr verbs using manipulatives:
- filter() – select rows based on their values
- select() – select columns
- mutate() – add new columns or change existing columns
- summarize() – perform summary operations on columns
- group_by() – facilitate group-wise operations

Use the pipe operator (|> or %>%) to chain together data wrangling operations.
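As a minimal sketch of chaining, the code below uses a hypothetical `shapes` data frame as a stand-in for the manipulatives (the column names `red` and `green` come from the activity, but the values are made up). It assumes dplyr is installed and R 4.1 or later for the native `|>` pipe.

```r
library(dplyr)

# Hypothetical stand-in for the manipulatives: each column is a color and
# each value is the number of sides of a shape (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# Read |> as "and then": take shapes, and then keep rows where red is a
# triangle, and then add a column with the total number of sides per row
result <- shapes |>
  filter(red == 3) |>
  mutate(total_sides = red + green)

# The same chain written with the magrittr pipe, which dplyr re-exports
result_magrittr <- shapes %>%
  filter(red == 3) %>%
  mutate(total_sides = red + green)

result
```

Both pipes pass the left-hand side in as the first argument of the next function, so the two chains give the same result here.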
Every function in dplyr takes the data as its first argument. You can choose whether to:
- declare your data as the first argument of the function, or
- pipe your data into the function (|> or %>%).
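A small sketch of the two styles, again using a made-up `shapes` data frame (hypothetical values; assumes dplyr and R 4.1+). Naming the argument `.data` mirrors how `filter()` is called with `colleges` at the top of these slides.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# Option 1: declare the data as the first argument (.data)
declared <- filter(.data = shapes, green > 4)

# Option 2: pipe the data in; |> fills the first argument for you
piped <- shapes |> filter(green > 4)

# Both styles keep the same rows
identical(declared, piped)
```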
How many rows?
How many columns?
Here is one observation:
Looking at the green column, how many sides does the observation have?
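If the manipulatives were typed into R, base R could answer these questions directly. This is a sketch using a hypothetical `shapes` data frame with made-up values; your physical data frame will have different counts.

```r
# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

nrow(shapes)      # number of rows (observations): 5 here
ncol(shapes)      # number of columns: 2 here
shapes[1, ]       # one observation (the first row)
shapes$green[1]   # sides in the green column for that observation: 4 here
```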
filter()
include rows based on one or more logical statements
Reading the code: filter it (only include rows where) the red column has three sides (triangles) OR (|) the green column has more than four sides (pentagons, hexagons).
“Take the data and then filter it to only include red observations with 3 sides or green observations with 4 or more sides.”
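A sketch of an OR filter on a hypothetical `shapes` data frame (made-up values). The `green >= 4` threshold follows the read-aloud sentence ("4 or more sides"); it is an assumption about the exact code on the slide.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# | keeps a row if EITHER condition is TRUE
either <- shapes |>
  filter(red == 3 | green >= 4)

either
```

Only the second row (red has 4 sides, green has 3) fails both conditions, so four of the five rows remain.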
What if we wanted observations where the red column had three sides AND the green column had four or more sides?
How would the code change?
The default in filter() is the AND condition (&).

If a comma (,) is equivalent to an &, why not just use an &?
Well, with a lot of &s your code can get hard to read…
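To see that a comma and `&` behave the same way, here is a sketch on a hypothetical `shapes` data frame (made-up values; the threshold mirrors the "four or more sides" question above).

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# Separate conditions with commas: filter() combines them with AND
with_comma <- shapes |>
  filter(red == 3, green >= 4)

# The same conditions joined explicitly with &
with_and <- shapes |>
  filter(red == 3 & green >= 4)

identical(with_comma, with_and)
```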
Reset your Data Frame!
select()
include columns based on their names (or column-selection helpers such as starts_with())
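A sketch of `select()` on a hypothetical `shapes` data frame. The `blue` column is an assumption added only so there is something to drop; the values are made up.

```r
library(dplyr)

# Hypothetical manipulatives data; the blue column is made up for illustration
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4),
  blue  = c(5, 6, 3, 4, 3)
)

# Keep columns by name
two_cols <- shapes |> select(red, green)

# Drop a column with a minus sign
no_blue <- shapes |> select(-blue)

# Keep columns whose names match a pattern (a tidyselect helper)
g_cols <- shapes |> select(starts_with("g"))

names(two_cols)
```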
Reset your Data Frame!
mutate()
create new columns or change existing columns
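A sketch showing both uses of `mutate()` on a hypothetical `shapes` data frame. The values, the `total_sides` name, and the "add one side" change are all made up for illustration.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

result <- shapes |>
  mutate(
    # Create a new column from existing ones
    total_sides = red + green,
    # Overwrite an existing column (hypothetical: add one side to each green shape);
    # total_sides above was already computed with the original green values
    green = green + 1
  )

result
```

Inside one `mutate()` call, expressions run top to bottom, so the order of the new columns matters.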
if_else(condition, true, false)
- condition is a logical test (or combination of logical tests)
- true is the value output if the logical test is found to be TRUE
- false is the value output if the logical test is found to be FALSE
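A sketch of `if_else()` inside `mutate()`, using a hypothetical `shapes` data frame with made-up values; the `red_shape` column name and labels are illustrative.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

result <- shapes |>
  mutate(
    # condition: red == 3; true: "triangle"; false: "not a triangle"
    red_shape = if_else(red == 3, "triangle", "not a triangle")
  )

result$red_shape
```

Note that dplyr's `if_else()` requires `true` and `false` to be compatible types (here both are character).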
arrange()
Organize the rows of the data in order of a particular variable.
What order does arrange() use as default?

arrange(): Descending Order
Default is ascending order…
…but you can add desc() to get descending order!
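A sketch of both sort directions on a hypothetical `shapes` data frame (made-up values).

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# Default: ascending order of green
ascending <- shapes |> arrange(green)

# Wrap the column in desc() for descending order
descending <- shapes |> arrange(desc(green))

ascending$green
descending$green
```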
arrange() + filter()
These functions implicitly arrange the data before slicing it (selecting rows):
- slice_min() – select rows with the lowest value(s) of a variable
- slice_max() – select rows with the highest value(s) of a variable
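A sketch of `slice_min()` and `slice_max()` on a hypothetical `shapes` data frame (made-up values). One detail worth knowing: by default ties are kept (`with_ties = TRUE`), so you can get back more rows than `n`.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

# Row with the fewest green sides
fewest <- shapes |> slice_min(green, n = 1)

# Row with the most green sides
most <- shapes |> slice_max(green, n = 1)

# Asking for 2 rows returns 3 here: two rows tie at green == 4
two_fewest <- shapes |> slice_min(green, n = 2)

# Set with_ties = FALSE to get exactly n rows
exactly_two <- shapes |> slice_min(green, n = 2, with_ties = FALSE)
```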
Reset your Data Frame!
summarize()
compute a table of summaries
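A sketch of `summarize()` collapsing a hypothetical `shapes` data frame (made-up values) into a one-row table of summaries; the output column names are illustrative.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

summary_table <- shapes |>
  summarize(
    mean_red  = mean(red),   # average number of red sides
    max_green = max(green),  # most green sides
    n         = n()          # number of rows summarized
  )

summary_table
```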
group_by()
put rows into groups based on values in column(s)
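A sketch pairing `group_by()` with `summarize()` on a hypothetical `shapes` data frame (made-up values): rows are grouped by the number of red sides, and each group gets its own summary row.

```r
library(dplyr)

# Hypothetical manipulatives data (values are made up)
shapes <- data.frame(
  red   = c(3, 4, 5, 3, 6),
  green = c(4, 3, 6, 5, 4)
)

by_red <- shapes |>
  group_by(red) |>
  summarize(
    mean_green = mean(green),  # average green sides within each red group
    n          = n()           # rows in each group
  )

by_red
```

With a single grouping column, `summarize()` drops that grouping, so `by_red` is an ordinary (ungrouped) tibble with one row per value of `red`.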
Reset your Data Frame!
Today you will use the dplyr package to clean some data and then use that cleaned data to figure out which college Ephelia has been accepted to.
This activity will require knowledge of:
None of us have all these abilities. Each of us has some of these abilities.
Each of you should have a dplyr cheatsheet!
On the Front
- group_by() + summarize()
- filter()ing values with logical comparisons
- select()ing and mutate()ing variables

On the Back
- summarize()
Every group should have a task card! These cards remind you of the expectations for each role and the collaborative norms we agree to.
The Computer needs to:
Once you have your copy, you need to:
Throughout the activity you will swap roles: the Computer will become the Coder, and the Coder will become the Computer.
We are alternating roles so everyone:
During the Practice Activity, you are not permitted to use Google or ChatGPT for help.
You are permitted to use:
- the dplyr cheatsheet

Submit the full name of the college Ephelia will attend to the Canvas Quiz.
The partner whose birthday is closest to January 1st starts as the Computer, making a copy of the PA Colab notebook!