Introduction to ggplot
The greatest possibilities of visual display lie in vividness and inescapability of the intended message. A visual display can stop your mental flow in its tracks and make you think. A visual display can force you to notice what you never expected to see. (“Why, that scatter diagram has a hole in the middle!”) – John Tukey, Data Based Graphics: Visual Display in the Decades to Come
▶️ Watch Videos: 50 minutes
- I’ve broken the videos up into smaller ~5 minute increments!
📖 Readings: 45 minutes
- I’ve added an optional video on how to take notes in a programming textbook here.
💻 Activities: 45-60 minutes
- You’ll be working through a series of tutorials helping you practice making plots with ggplot.
✅ Check-ins: 1 (8 questions broken out by sections)
0.1 Learning Objectives
- Describe charts using the grammar of graphics
- Create layered graphics that highlight multiple aspects of the data
- Evaluate existing charts and develop new versions that improve accessibility and readability
0.2 Loading in the ggplot2 Package
In this class, we’re going to use the ggplot2 package to create graphics in R. This package is already installed as part of the tidyverse, but it can also be installed on its own:
install.packages("ggplot2")
and/or loaded:
library("ggplot2")
# alternatively
library("tidyverse") # (my preference!)
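If you are not sure whether the package is already on your machine, one possible pattern is to install it only when it is missing. This is a minimal sketch rather than a required step: the `repos` value is an assumption (one commonly used CRAN mirror), and you may prefer whichever mirror your R installation is already configured to use.

```r
# Install ggplot2 only if it is not already available.
# The CRAN mirror below is an assumed choice; any mirror works.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Attach the package for this session
library(ggplot2)

# Report which version was loaded
packageVersion("ggplot2")
```

Installing happens once per machine, while `library()` needs to be run in every new R session.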
1 Data Visualization with ggplot2
This coursework will walk you through the different sections of Chapter Nine in R for Data Science.
1.1 The Grammar of Graphics
The grammar of graphics is an approach first introduced in Leland Wilkinson’s book (Wilkinson 2005). Unlike other graphics classification schemes, the grammar of graphics attempts to describe how the data set itself relates to the components of the chart.
This has a few advantages:
- It’s relatively easy to represent the same data set with different types of plots (and to find their strengths and weaknesses)
- Grammar leads to a concise description of the plot and its contents
- We can add layers to modify the graphics, each with their own basic grammar (just like we combine sentences and clauses to build a rich, descriptive paragraph)
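To make the idea of layering concrete, here is a small sketch using the `mpg` data set that ships with ggplot2. The object names (`base`, `scatter`, `layered`) are only illustrative, and the choice of a linear trend line is an assumption made for the example.

```r
library(ggplot2)

# The data and the aesthetic mappings are described once...
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# ...and each geom adds a layer on top of that description.
scatter <- base +
  geom_point()                              # one layer: raw observations

layered <- base +
  geom_point() +                            # layer 1: raw observations
  geom_smooth(method = "lm", se = FALSE)    # layer 2: a linear trend line

# Draw the two-layer version
print(layered)
```

Because the data and mappings live in `base`, swapping or stacking geoms changes how the same data is drawn without redescribing the data itself.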
1.2 Making Your First ggplot
▶️ Required Video: ggplot2 First Steps (4 minutes)
💻 Required Tutorials
✅ Check-in 2.1
Question 1 – What specifically does the code ggplot(data = mpg) do?
- Creates a blank plot
- Makes a scatterplot of the first two variables in the mpg dataset
- Creates a ggplot with the mpg dataset
1.3 Aesthetics
▶️ Required Video: What is an aesthetic? (7 minutes)
✅ Check-in 2.1
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
Question 2: What aesthetics does this plot contain?
- x
- y
- color
- data
Question 3: Which of the following changes would set the color of the points to be blue?
## Option A
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ,
                  y = hwy,
                  color = blue)
  )
## Option B
ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ,
                  y = hwy),
    color = "blue"
  )
## Option C
ggplot(data = mpg,
       mapping = aes(color = "blue")
       ) +
  geom_point(
    mapping = aes(x = displ,
                  y = hwy)
  )
1.4 geoms
▶️ Required Video: What is a geometric object?
What type of chart to use?
It can be hard to know what type of chart to use for a particular type of data. I recommend figuring out what you want to show first, and then thinking about how to show that data with an appropriate plot type. Consider the following factors:
- What type of variable is x? Categorical? Continuous? Discrete?
- What type of variable is y?
- How many observations do I have for each x/y variable?
- Are there any important moderating variables?
- Do I have data that might be best shown in small multiples? E.g. a categorical moderating variable and a lot of data, where the categorical variable might be important for showing different features of the data?
Once you’ve thought through this, take a look through catalogs like the R Graph Gallery to see what visualizations match your data and use-case.
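As a rough illustration of working through these questions in R, the sketch below inspects a few columns of the `mpg` data set before choosing a plot. The particular columns (`displ`, `hwy`, `drv`) and the decision to facet by `drv` are assumptions made for the example, not the only reasonable choices.

```r
library(ggplot2)

# What type of variable are x, y, and a possible moderating variable?
sapply(mpg[c("displ", "hwy", "drv")], class)

# How many observations are there for each level of the moderating variable?
table(mpg$drv)

# Two continuous variables plus a categorical moderator:
# one option is a scatterplot drawn as small multiples, one panel per level.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)

print(p)
```

If one level of the moderating variable had very few observations, its panel would look sparse, which is one reason to check the counts before committing to small multiples.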
✅ Check-in 2.1
Question 4: Match each plot with the geom_XXX() function used to create it!
- Line Chart
- Boxplot
- Histogram
- Area Chart
geom_boxplot()
geom_point()
geom_hist()
geom_bar()
geom_smooth()
geom_point()
geom_area()
geom_line()
geom_histogram()
1.5 Getting a Bit Fancier
In this section, we dig a bit deeper into how we can make our code more efficient and how we can add more variables to our plots.
Global vs. local aesthetics
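One way to see why the distinction matters is to build a plot with more than one layer. The sketch below is an illustration under assumptions: the object names are hypothetical, and mapping `drv` to color is just a convenient example variable from the `mpg` data set.

```r
library(ggplot2)

# Mappings supplied to ggplot() are inherited by every layer,
# so both the points and the smoother use x = displ and y = hwy.
p_shared <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# A mapping supplied inside a geom applies to that layer only:
# the points are colored by drv, but the smoother still fits
# a single curve to all of the data.
p_mixed <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth()

print(p_mixed)
```

Moving `color = drv` into the `ggplot()` call instead would pass it to both layers, and `geom_smooth()` would then fit one curve per drive type.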
✅ Check-in 2.1
Question 5: Match the code to the type of aesthetics that are being used:
Global
Local
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)
       ) +
  geom_point()

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
1.6 Facets
1.7 Data transformations
1.8 Changing the positions of bars
▶️ Required Video: How do I change positions?
✅ Check-in 2.1
Question 6: What arguments for geom_jitter() control the amount of jittering?
- width
- height
- stat
- position
- na.rm
- show.legend
- data
- mapping
1.9 Plot Customizations
📖 Required Reading: Communication
✅ Check-in 2.1
Question 7: What can the labs() function do? Select all that apply.
- Customize the x-axis label
- Customize the y-axis label
- Customize the legend label
- Customize the plot title
- Add a caption to the plot