ggplot(
  data = <DATA>,
  mapping = aes(<MAPPINGS>)
) +
  <GEOM FUNCTION>() +
  any other arguments...
Today we will…
ggplot2
We are using Posit Cloud for collaborative coding, so you and your partner can type in the same document at the same time.
Take 3 minutes to:
The Grammar of Graphics (GoG) is a principled way of specifying exactly how to create a particular graph from a given data set. It helps us to systematically design new graphs.
Think of a graph or a data visualization as a mapping…
…FROM variables in the data set (or statistics computed from the data)…
…TO visual attributes (or “aesthetics”) of marks (or “geometric elements”) on the page/screen.
data: dataframe containing variables
aes: aesthetic mappings (position, color, symbol, …)
geom: geometric element (point, line, bar, box, …)
stat: statistical variable transformation (identity, count, linear model, quantile, …)
scale: scale transformation (log scale, color mapping, axes tick breaks, …)
coord: Cartesian, polar, map projection, …
facet: divide into subplots using a categorical variable
ggplot2
Complete this template to build a basic graphic:
Notice that every + adds another layer to our graphic.
Also notice that I’m using named arguments to make my code easier to read.
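As a minimal sketch of the template filled in, the example below uses the mpg data set that ships with ggplot2. The choice of data set, variables, and axis labels is an assumption made for illustration, not part of the original template.

```r
library(ggplot2)

# <DATA> = mpg, <MAPPINGS> = x and y positions, <GEOM FUNCTION> = geom_point
p_template <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy)
) +
  geom_point() +   # first layer: one point per car
  labs(            # "any other arguments": axis labels added with another +
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon"
  )

p_template
```

Naming the data and mapping arguments is optional, but it makes each piece of the template easy to spot.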
We map variables (columns) from the data to aesthetics on the graphic using the aes() function.
What aesthetics can we set (see the ggplot2 cheat sheet for more)?
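A short sketch of several aesthetics at once, again assuming the mpg data set as an illustrative example. Variables go inside aes(); fixed values such as a constant point size go outside it.

```r
library(ggplot2)

p_aes <- ggplot(
  data = mpg,
  mapping = aes(
    x = displ,      # position on the x-axis
    y = hwy,        # position on the y-axis
    color = class,  # one color per vehicle class
    shape = drv     # one point shape per drive train
  )
) +
  # size and alpha are constants here, so they sit outside aes()
  geom_point(size = 2, alpha = 0.7)

p_aes
```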
We use a geom_XXX() function to represent data points.
one variable
geom_density()
geom_dotplot()
geom_histogram()
geom_boxplot()
two variables
geom_point()
geom_line()
geom_density_2d()
three variables
geom_contour()
geom_raster()
This is not an exhaustive list – see ggplot2 cheat sheet.
To create a specific type of graphic, we will combine aesthetics and geometric objects.
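To make the pairing of aesthetics and geoms concrete, here is a small sketch with one one-variable geom and one two-variable geom. The mpg and economics data sets (both bundled with ggplot2) and the binwidth are assumptions chosen for illustration.

```r
library(ggplot2)

# One variable: only x is mapped; the histogram computes the bar heights
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Two variables: x and y are both mapped and joined by a line
p_line <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line()

p_hist
p_line
```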
Let’s try it!
Start with the TX housing data.
Make a plot of median house price over time (including both individual data points and a smoothed trend line), distinguishing between different cities.
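One possible approach, offered as a sketch rather than the intended solution: it uses the txhousing data set from ggplot2 and, purely as an assumption to keep the plot legible, restricts it to three cities.

```r
library(ggplot2)

# Assumption: keep only three cities so the colors stay readable
tx_subset <- subset(txhousing, city %in% c("Austin", "Dallas", "Houston"))

p_tx <- ggplot(
  data = tx_subset,
  mapping = aes(x = date, y = median, color = city)
) +
  geom_point(alpha = 0.4) +  # individual monthly observations
  geom_smooth() +            # smoothed trend line, one per city
  labs(x = "Date", y = "Median sale price", color = "City")

p_tx
```

Because color is mapped in ggplot(), both the points and the trend lines inherit it, which is what separates the cities.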
Extracts subsets of data and places them in side-by-side plots.
facet_wrap(~ b): facets by one variable
  nrow controls the number of rows the facets are output into
  ncol controls the number of columns the facets are output into
facet_grid(a ~ b): facets by two variables, into both rows and columns
  a will be assigned to the rows
  b will be assigned to the columns
You can set scales to let axis limits vary across facets:
facet_grid(y ~ x, scales = ______)
"free" – both x- and y-axis limits adjust to individual facets
"free_x" – only x-axis limits adjust
"free_y" – only y-axis limits adjust
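The faceting functions and the scales argument described above can be sketched as follows. The mpg data set and the particular faceting variables are assumptions chosen for illustration.

```r
library(ggplot2)

p_base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# One faceting variable, wrapped into two rows of panels
p_wrap <- p_base + facet_wrap(~ class, nrow = 2)

# Two faceting variables: drv defines the rows, cyl defines the columns;
# "free_y" lets the y-axis limits differ from row to row
p_grid <- p_base + facet_grid(drv ~ cyl, scales = "free_y")

p_wrap
p_grid
```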
A stat transforms an existing variable into a new variable to plot.
identity leaves the data as is.
count counts the number of observations.
summary allows you to specify a desired transformation function.
Sometimes these statistical transformations happen under the hood when we use a specific geom_XXX().
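A sketch of the three stats named above, assuming the mpg data set as an example. The helper data frame class_counts is introduced here only for illustration.

```r
library(ggplot2)

# count: geom_bar() uses stat = "count" under the hood, so we only map x
p_count <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# identity: the counts are computed beforehand and plotted as they are
class_counts <- as.data.frame(table(class = mpg$class))
p_identity <- ggplot(data = class_counts, mapping = aes(x = class, y = Freq)) +
  geom_bar(stat = "identity")

# summary: we choose the transformation, here the mean of hwy within each class
p_summary <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")

p_count
p_identity
p_summary
```

The first two plots show the same bars; the difference is only whether ggplot2 or we did the counting.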
stat
Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
position = "dodge": Arrange elements side by side.
position = "fill": Stack elements on top of one another + normalize height.
position = "stack": Stack elements on top of one another.
position = "jitter": Add random noise to x & y position of each element to avoid overplotting (see geom_jitter()).
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  theme_bw() +
  theme(legend.position = "bottom")
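The bar-oriented position adjustments (stack, dodge, fill) can be compared side by side with a sketch like the one below. The mpg data set and the choice of class and drv are assumptions made for illustration.

```r
library(ggplot2)

# Shared starting point: bars per class, filled by drive train
p_bars <- ggplot(data = mpg, mapping = aes(x = class, fill = drv))

p_stack <- p_bars + geom_bar(position = "stack")  # segments stacked; the geom_bar() default
p_dodge <- p_bars + geom_bar(position = "dodge")  # segments placed side by side
p_fill  <- p_bars + geom_bar(position = "fill")   # stacked, each bar rescaled to height 1

p_stack
p_dodge
p_fill
```

With "fill" the y-axis shows proportions rather than counts, so it suits comparisons of composition rather than size.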
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_y_continuous(limits = c(0, 50),
                     breaks = seq(from = 0, to = 50, by = 5)
                     )
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_color_gradient(low = "white", high = "green4")
It is good practice to put each geom and aes on a new line.
This puzzle activity will require knowledge of:
None of us have all these abilities. Each of us has some of these abilities.
During your collaboration, you and your partner will alternate between two roles:
Developer
Coder
Group Norms
Every group should have a ggplot2 cheatsheet!
On the Front
On the Back
The partner whose family name starts first alphabetically starts as the Developer!
Both of you need to do the following:
PA2-ggplot.qmd file
When you have completed the visualization tasks, you will work as a group to answer the five questions posed at the end of the document.
Each person will input the answers to these questions in the PA2 Canvas quiz.
The person who last occupied the role of Developer will download and submit the PA-2.html file for the group.