Lab 4: Childcare Costs in California

Author

Instructions

Part 1: GitHub Workflow

Step 1: Making a Copy from GitHub Classroom

You can access the Lab 4 materials through the Lab 4 assignment on GitHub Classroom. We’re going to follow the same steps as last week to make your own copy of this repository:

Use these steps to make a copy of the Lab repository: List of Steps to Copy the Lab Assignment from GitHub Classroom

Step 2: Making a Small Change

Now, find the lab-4-student.qmd file in the “Files” tab in the lower right hand corner. Click on this file to open it.

At the top of the document (in the YAML) there is an author line that says "Your name here!". Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).

Step 3: Pushing Your Lab to GitHub

Now for our last step, we need to commit the files to our repo.

  • Click the “Git” tab in upper right pane
  • Check the “Staged” box for the lab-4-student.qmd file
  • Click “Commit”
  • In the box that opens, type a message in “Commit message”, such as “Added my name”.
  • Click “Commit”.
  • Click the green “Push” button to send your local changes to GitHub.

RStudio will display something like:

>>> /usr/bin/git push origin HEAD:refs/heads/main
To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git
   3a2171f..6d58539  HEAD -> main

Step 4: Let’s get started tidying some data!

Part 2: Some Words of Advice

  • Set chunk options carefully.

  • Make sure you don’t print out more output than you need.

  • Make sure you don’t assign more objects than necessary—avoid “object junk” in your environment.

  • Make your code readable and nicely formatted.

  • Think through your desired result before writing any code.

Part 3: Exploring Childcare Costs

In this lab we’re going look at the median weekly cost of childcare in California. A detailed description of the data can be found here.

The data come to us from TidyTuesday.

0. Load the appropriate libraries and the data.

childcare_costs <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-09/childcare_costs.csv')

counties <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-09/counties.csv')

1. Briefly describe the dataset (~ 4 sentences). What information does it contain?

California Childcare Costs

Let’s start by focusing only on California.

2. Create a ca_childcare dataset of childcare costs in California, containing (1) county information and (2) all information from the childcare_costs dataset. Hint: There are 58 counties in CA and 11 years in the dataset. Therefore, your new dataset should have 53 x 11 = 638 observations.

3. Using a function from the forcats package, complete the code below to create a new variable where each county is categorized into one of the ten (10) Census regions in California. Use the Region description (from the plot), not the Region number.
Hint: This is probably a good place to use ChatGPT to reduce on tedious work. But you do need to know how to prompt ChatGPT to make it useful!

Tip

I have provided you with code that eliminates the word “County” from each of the county names in your ca_childcare dataset. You should keep this line of code and pipe into the rest of your data manipulations.

You will learn about the str_remove() function from the stringr package next week!

ca_childcare <- ca_childcare |> 
  mutate(county_name = str_remove(county_name, " County")) |>
  ...

4. Let’s consider the median household income of each region, and how that income has changed over time. Create a table with ten rows, one for each region, and two columns, one for 2008 and one for 2018. The cells should contain the median of the median household income (expressed in 2018 dollars) of the region and the study_year. Arrange the rows by 2018 values.

Tip

This will require transforming your data! Sketch out what you want the data to look like before you begin to code. You should be starting with your California dataset that contains the regions!

5. Which California region had the lowest median full-time median weekly price for center-based childcare for infants in 2018? Does this region correspond to the region with the lowest median income in 2018 that you found in Q4?

Warning

The code should give me the EXACT answer. This means having the code output the exact row(s) and variable(s) necessary for providing the solution.

6. The following plot shows, for all ten regions, the change over time of the full-time median price for center-based childcare for infants, toddlers, and preschoolers. Recreate the plot. You do not have to replicate the exact colors or theme, but your plot should have the same content, including the order of the facets and legend, reader-friendly labels, axes breaks, and a loess smoother.

Tip

This will require transforming your data! Sketch out what you want the data to look like before you begin to code. You should be starting with your California dataset that contains the regions.

You will also be required to use functions from forcats to change the labels and the ordering of your factor levels.

Remember to avoid “object junk” in your environment!

Plot to recreate

Median Household Income vs. Childcare Costs for Infants

Refresher on Linear Regression

While a second course in statistics is a pre-requisite for this class, here is a refresher on simple linear regression with a single predictor.

7. Create a scatterplot showing the relationship between median household income (expressed in 2018 dollars) and the full-time median weekly price charged for center-based childcare for an infant in California. Overlay a linear regression line (lm) to show the trend.

8. Look up the documentation for lm() and fit a linear regression model to the relationship shown in your plot above.

# complete the code provided
reg_mod1 <- lm()
summary(reg_mod1)

9. Using the output from summary(), write out the estimated regression line (recall: \(y = mx + b\)).

10. Do you have evidence to conclude there is a relationship between the median household income and the median weekly cost of center-based childcare for infants in California? Cite values from your summary() output to support your claim!

There is no Challenge this week!

Please take this time to try your best to recreate the plot I provided in Question 6!

  • Can you match my colors?
  • Can you get the legend in the same order?
  • Can you get the facet names to match and be in the same order?

Could you even make the plot better? Could you add dollar signs to the y-axis labels? I might suggest you look into the scales package.