Unit 1
In this unit, we dive headfirst into the wild world of messy data. Instead of tidy spreadsheets, you’ll wrangle real-world data pulled from APIs and scraped straight from the web. Along the way, you’ll build robust data pipelines to clean, transform, and make sense of chaos. To do that, we’ll sharpen one of your most powerful tools: functions. We’ll start with a refresher on writing functions that work with vectors, then level up by iterating those functions over many inputs using the map() family of functions from purrr.
Once you’re comfortable with the mechanics, we’ll peek under the hood. How efficient is your code, really? We’ll explore what happens behind the scenes: how many intermediate objects are created, the order in which operations run, and which functions are doing the heavy lifting. You’ll compare classic for loops with the purrr approach and learn when (and why) you might choose one over the other.
Then it’s time to put those skills to work on data that doesn’t come in spreadsheet form. You’ll start by navigating hierarchical JSON data and making repeated calls to an API. After that, you’ll venture onto the open web, learning how to scrape data from webpages and iterate across many pages to build your own datasets from scratch.
We’ll wrap up the unit by returning to one of the most fun parts of data science: visualization. You’ll experiment with non-standard geometries that add texture and flair to your plots—like geom_tile(), geom_density_ridges(), and geom_ribbon(). You’ll also learn how to make your graphics clearer and more intentional by replacing legends with annotations, choosing thoughtful color palettes, and using built-in themes to give your work a polished, personal style.
By the end of the unit, you’ll bring everything together in a professionally polished Quarto document, showcasing both visual and statistical insights from a dataset you chose.
Throughout these sections, you will encounter “check-ins” that assess your knowledge of a particular topic. These check-ins are marked by a red (important) callout box and are associated with the Canvas quizzes assigned each week.
This is where you will find the body of the question. The callout name refers to the name of the Canvas quiz the question is associated with.