Introduction to ggplot

The greatest possibilities of visual display lie in vividness and inescapability of the intended message. A visual display can stop your mental flow in its tracks and make you think. A visual display can force you to notice what you never expected to see. (“Why, that scatter diagram has a hole in the middle!”) – John Tukey, Data Based Graphics: Visual Display in the Decades to Come

▶️ Watch Videos: 50 minutes

  • I’ve broken the videos up into smaller ~5 minute increments!

📖 Readings: 45 minutes

  • I’ve added an optional video on how to take notes in a programming textbook here.

💻 Activities: 45-60 minutes

  • You’ll be working through a series of tutorials helping you practice making plots with ggplot.

✅ Check-ins: 1 (8 questions broken out by sections)

0.1 Learning Objectives

  • Describe charts using the grammar of graphics
  • Create layered graphics that highlight multiple aspects of the data
  • Evaluate existing charts and develop new versions that improve accessibility and readability

0.2 Loading in the ggplot2 Package

In this class, we’re going to use the ggplot2 package to create graphics in R. This package is installed automatically as part of the tidyverse, but it can also be installed on its own:

install.packages("ggplot2")

and/or loaded:

library("ggplot2")

# alternatively
library("tidyverse") # (my preference!)
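If you are not sure whether the package is already on your machine, a minimal sketch like the one below installs it only when it is missing. It assumes you have an internet connection and a CRAN mirror configured, so treat it as one reasonable approach rather than a required step.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for this R session
library(ggplot2)

# Check which version was loaded
packageVersion("ggplot2")
```

Installing is a one-time step per machine, while `library()` needs to be run in every new R session.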

Building a masterpiece, by Allison Horst

1 Data Visualization with ggplot2

This coursework will walk you through the different sections of Chapter Nine in R for Data Science.

📖 Required Reading: Layers.

1.1 The Grammar of Graphics

The grammar of graphics is an approach first introduced in Leland Wilkinson’s book (Wilkinson 2005). Unlike other graphics classification schemes, the grammar of graphics attempts to describe how the data set itself relates to the components of the chart.

This has a few advantages:

  1. It’s relatively easy to represent the same data set with different types of plots (and to find their strengths and weaknesses)
  2. Grammar leads to a concise description of the plot and its contents
  3. We can add layers to modify the graphics, each with their own basic grammar (just like we combine sentences and clauses to build a rich, descriptive paragraph)
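The advantages listed above can be sketched with the `mpg` data set that ships with ggplot2. The variable choices (`displ` and `hwy`) and the object names (`base`, `p_points`, `p_smooth`, `p_layered`) are only for illustration; the point is that one description of the data supports several plot types, and that layers stack on top of each other.

```r
library(ggplot2)

# Shared foundation: the data set and which variables go on x and y
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Same data and mapping, shown with two different geometric objects
p_points <- base + geom_point()
p_smooth <- base + geom_smooth()

# Layers combine: raw observations with a smoothed trend drawn on top
p_layered <- base +
  geom_point() +
  geom_smooth()

# Printing a plot object draws it
p_layered
```

Because each plot is just `base` plus one or more layers, swapping a plot type means changing a single line.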

A pyramid view of the major components of the grammar of graphics, with data as the base, aesthetics building on data, scales building on aesthetics, geometric objects, statistics, facets, and the coordinate system at the top of the pyramid. Source: (Sarkar 2018)
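To connect the pyramid to code, here is a rough sketch in which each line is annotated with the component it corresponds to. The specific choices (coloring by `drv`, the "Dark2" palette, faceting by `year`, the y-axis limits) are assumptions made for the example, not recommendations.

```r
library(ggplot2)

p_grammar <- ggplot(
  data = mpg,                                  # data
  mapping = aes(x = displ, y = hwy,            # aesthetics: variables mapped
                color = drv)                   #   to position and color
) +
  geom_point(alpha = 0.6) +                    # geometric object: points
  geom_smooth(method = "lm", se = FALSE) +     # statistic: fitted line per group
  scale_color_brewer(palette = "Dark2") +      # scale: how drv becomes colors
  facet_wrap(~ year) +                         # facets: one panel per year
  coord_cartesian(ylim = c(10, 45))            # coordinate system

p_grammar
```

Most plots do not spell out every component; ggplot2 supplies sensible defaults for the scales, statistics, and coordinate system when you leave them out.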

1.2 Making Your First ggplot

▶️ Required Video: ggplot2 First Steps (4 minutes)

💻 Required Tutorials

✅ Check-in 2.1

Question 1: What specifically does the code ggplot(data = mpg) do?

  • Creates a blank plot
  • Makes a scatterplot of the first two variables in the mpg dataset
  • Creates a ggplot with the mpg dataset

1.3 Aesthetics

▶️ Required Video: What is an aesthetic? (7 minutes)

✅ Check-in 2.1

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

Question 2: What aesthetics does this plot contain?

  • x
  • y
  • color
  • data

Question 3: Which of the following changes would set the color of the points to be blue?

## Option A
ggplot(data = mpg) +
geom_point(
  mapping = aes(x = displ, 
                y = hwy, 
                color = blue)
  )

## Option B
ggplot(data = mpg) +
geom_point(
  mapping = aes(x = displ, 
                y = hwy), 
  color = "blue"
  )

## Option C
ggplot(data = mpg, 
       mapping = aes(color = "blue")
       ) +
geom_point(
  mapping = aes(x = displ, 
                y = hwy)
  )

1.4 geoms

▶️ Required Video: What is a geometric object?

What type of chart to use?

It can be hard to know what type of chart to use for a particular type of data. I recommend figuring out what you want to show first, and then thinking about how to show that data with an appropriate plot type. Consider the following factors:

  • What type of variable is x? Categorical? Continuous? Discrete?

  • What type of variable is y?

  • How many observations do I have for each x / y variable?

  • Are there any important moderating variables?

  • Do I have data that might be best shown in small multiples? E.g. a categorical moderating variable and a lot of data, where the categorical variable might be important for showing different features of the data?

Once you’ve thought through this, take a look through catalogs like the R Graph Gallery to see what visualizations match your data and use-case.
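As a rough illustration of how the questions above translate into plot choices, the sketch below pairs a few variable-type combinations from the `mpg` data set with one geom that suits each. The pairings and object names are examples only; several other geoms would work for each case.

```r
library(ggplot2)

# One continuous variable: the distribution of highway mileage
p_density <- ggplot(mpg, aes(x = hwy)) +
  geom_density()

# One categorical variable: how many cars fall in each class
p_counts <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Continuous x and continuous y: engine size versus highway mileage
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Categorical x and continuous y: mileage compared across classes
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()

# Small multiples: the scatterplot split by a categorical moderating variable
p_facets <- p_scatter +
  facet_wrap(~ drv)

p_facets
```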

✅ Check-in 2.1

Question 4: Match each plot with the geom_XXX() function used to create it!

  • Line Chart
  • Boxplot
  • Histogram
  • Area Chart

  • geom_boxplot()
  • geom_point()
  • geom_hist()
  • geom_bar()
  • geom_smooth()
  • geom_area()
  • geom_line()
  • geom_histogram()

1.5 Getting a Bit Fancier

In this section, we dig a bit deeper into two things: how to write more efficient plotting code, and how to add more variables to a plot.

Global vs. local aesthetics

▶️ Required Video: Global vs. local aesthetics (4 minutes)

✅ Check-in 2.1

Question 5: Match the code to the type of aesthetics that are being used:


  • Global
  • Local

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)
       ) + 
  geom_point()

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

1.6 Facets

▶️ Required Video: Adding a third variable with facets

1.7 Data transformations

▶️ Required Video: What is being plotted? (6 minutes)

1.8 Changing the positions of bars

▶️ Required Video: How do I change positions?

✅ Check-in 2.1

Question 6: What arguments for geom_jitter() control the amount of jittering?

  • width
  • height
  • stat
  • position
  • na.rm
  • show.legend
  • data
  • mapping


1.9 Plot Customizations

📖 Required Reading: Communication

✅ Check-in 2.1

Question 7: What can the labs() function do? Select all that apply.

  • Customize the x-axis label
  • Customize the y-axis label
  • Customize the legend label
  • Customize the plot title
  • Add a caption to the plot

References

Sarkar, Dipanjan (DJ). 2018. “A Comprehensive Guide to the Grammar of Graphics for Effective Visualization of Multi-Dimensional….” Medium. https://towardsdatascience.com/a-comprehensive-guide-to-the-grammar-of-graphics-for-effective-visualization-of-multi-dimensional-1f92b4ed4149.
Wilkinson, Leland. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York: Springer Science.