Working with Categorical Variables
The second section of this week's coursework focuses on working with factor data types. By the end of this week, you should be able to:
- Use forcats to reorder and relabel factor variables in data cleaning steps and data visualizations.
▶️ Watch Videos: 9-minutes
📖 Readings: 15-minutes
💻 Activities: 0
✅ Check-ins: 1
1 Factors with forcats
We have been circling around the idea of factor data types. In this section, we will formally define factors and explain why they are needed for data visualization and analysis. We will then learn useful functions for working with factors in our data cleaning steps.
▶️ Required Video: Working with factors using forcats – 9 minutes
In short, factors are categorical variables with a fixed, known set of possible values (think of a set number of groups). One of the main features that sets factors apart from character variables is that you can order the groups non-alphabetically. In this section, we will use the forcats package (part of the tidyverse!) to create and manipulate factor variables.
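To make these ideas concrete, here is a minimal sketch in R. It assumes forcats is installed (it is also attached by `library(tidyverse)`); the survey-style `responses` vector and the names `resp_fct`, `resp_fixed`, and `resp_relevel` are hypothetical, chosen only for illustration. It shows that `factor()` sorts levels alphabetically by default, that declaring the levels fixes the set of allowed values, and that `fct_relevel()` lets the groups follow a meaningful, non-alphabetical order.

```r
# Assumes the forcats package is installed (library(tidyverse) also loads it)
library(forcats)

# Hypothetical example data: answers to a low/medium/high survey question
responses <- c("medium", "low", "high", "low", "medium", "high", "low")

# By default, factor() uses the unique values, sorted alphabetically, as levels
resp_fct <- factor(responses)
levels(resp_fct)  # "high", "low", "medium" (alphabetical, not meaningful)

# Declaring the levels fixes the allowed values and their order;
# a value outside that set ("extreme") is stored as NA
resp_fixed <- factor(c(responses, "extreme"), levels = c("low", "medium", "high"))
levels(resp_fixed)      # "low", "medium", "high"
sum(is.na(resp_fixed))  # 1 value was not one of the declared levels

# fct_relevel() moves the named levels to the front in the order given;
# any levels not named ("high") keep their relative order after them
resp_relevel <- fct_relevel(resp_fct, "low", "medium")
levels(resp_relevel)    # "low", "medium", "high"
```

Because ggplot2 draws a categorical axis in factor-level order, setting the levels this way also controls the order in which groups appear in a plot.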
📖 Required Reading: R4DS Chapter 16 (Factors)
✅ Check-in 4.3: Functions from forcats
Answer the following questions in the Canvas Quiz.
1. Which of the following tasks can fct_recode() accomplish? Select all that apply!
- changes the values of the factor levels
- reorders the levels of a factor
- removes levels of a factor you don’t want
- collapses levels of a factor into a new level
2. Which of the following tasks can fct_relevel() accomplish?
- reorders the levels of a factor
- changes the values of the factor levels
- removes levels of a factor you don’t want
- collapses levels of a factor into a new level
3. What is the main difference between fct_collapse() and fct_recode()?
- fct_recode() uses strings to create factor levels
- fct_recode() uses groups to create factor levels
- fct_recode() cannot create an “Other” group
4. What ordering do you get with fct_reorder()?
- largest to smallest based on another variable
- order of appearance
- largest to smallest based on counts
- alphabetical order
5. What ordering do you get with fct_inorder()?
- order of appearance
- alphabetical order
- largest to smallest based on counts
- largest to smallest based on another variable