Working with Categorical Variables

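The second section of this coursework focuses on working with factor data types. By the end of this week, you should be able to define factors, explain why they are needed for data visualization and analysis, and use functions from forcats to work with them in your data cleaning steps.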


▶️ Watch Videos: 9 minutes

📖 Readings: 15 minutes

💻 Activities: 0

✅ Check-ins: 1


1 Factors with forcats

We have touched on the idea of factor data types a few times already. In this section, we will formally define factors and explain why they are needed for data visualization and analysis. We will then learn useful functions for working with factors in our data cleaning steps.

[Image: the forcats hex sticker, showing an orange cat holding a string of colorful flags that spell 'forcats'.]

▶️ Required Video: Working with factors using forcats – 9 minutes

In short, factors are categorical variables with a fixed set of possible values, called levels (think a set number of groups). One of the main features that sets factors apart from plain character variables is that you can reorder the levels to be non-alphabetical. In this section we will be using the forcats package (part of the tidyverse!) to create and manipulate factor variables.
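To make the idea of a fixed set of levels with a chosen order concrete, here is a minimal sketch. It uses base R's factor() so that it runs without any extra packages; the forcats functions themselves are covered in the video and reading. The shirt-size data and the variable names are hypothetical and exist only for illustration.

```r
# Hypothetical example data: shirt sizes stored as plain character strings
sizes_chr <- c("medium", "small", "large", "small", "medium")

# By default, factor() uses the sorted unique values as levels (alphabetical)
sizes_default <- factor(sizes_chr)
levels(sizes_default)
#> [1] "large"  "medium" "small"

# Supplying levels fixes both the set of allowed values and their order
sizes_ordered <- factor(sizes_chr, levels = c("small", "medium", "large"))
levels(sizes_ordered)
#> [1] "small"  "medium" "large"

# Summaries follow the level order rather than the alphabet
table(sizes_ordered)

# A value outside the fixed set of levels becomes NA
sizes_bad <- factor(c("small", "huge"), levels = c("small", "medium", "large"))
is.na(sizes_bad)
#> [1] FALSE  TRUE
```

Plots and summary tables typically display groups in level order, which is why a sensible, non-alphabetical order (small, medium, large) is often worth setting.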

📖 Required Reading: R4DS Chapter 16 (Factors)

Check-in 4.3: Functions from forcats

Answer the following questions in the Canvas Quiz.

1. Which of the following tasks can fct_recode() accomplish? Select all that apply!

  • changes the values of the factor levels
  • reorders the levels of a factor
  • removes levels of a factor you don’t want
  • collapses levels of a factor into a new level

2. Which of the following tasks can fct_relevel() accomplish?

  • reorders the levels of a factor
  • changes the values of the factor levels
  • removes levels of a factor you don’t want
  • collapses levels of a factor into a new level

3. What is the main difference between fct_collapse() and fct_recode()?

  • fct_recode() uses strings to create factor levels
  • fct_recode() uses groups to create factor levels
  • fct_recode() cannot create an “Other” group

4. What ordering do you get with fct_reorder()?

  • largest to smallest based on another variable
  • order of appearance
  • largest to smallest based on counts
  • alphabetical order

5. What ordering do you get with fct_inorder()?

  • order of appearance
  • alphabetical order
  • largest to smallest based on counts
  • largest to smallest based on another variable