Basics of Graphics

Tuesday, October 1

Today we will…

Accessing Practice Activities
New Material
- Tidy Data
- Load External Data
- Graphics (and ggplot2)
PA 2: Using Data Visualization to Find the Penguins
- Peer Programming

Accessing Practice Activities

We are using Posit Cloud for collaborative coding, so you and your partner can type in the same document at the same time.

Take 3-minutes to:

Open the email you were sent (from me and / or Posit Cloud)
Click on the link to join your workspace
Log-in to Posit Cloud

Grammar of Graphics

The Grammar of Graphics (GoG) is a principled way of specifying exactly how to create a particular graph from a given data set. It helps us to systematically design new graphs.

Think of a graph or a data visualization as a mapping…

…FROM variables in the data set (or statistics computed from the data)…

…TO visual attributes (or “aesthetics”) of marks (or “geometric elements”) on the page/screen.

Why GoG?

It’s more flexible than a “chart zoo” of named graphs.
The software understands the structure of your graph.
It easily automates graphing of data subsets.

Components of GoG

data: dataframe containing variables
aes : aesthetic mappings (position, color, symbol, …)
geom : geometric element (point, line, bar, box, …)
stat : statistical variable transformation (identity, count, linear model, quantile, …)
scale : scale transformation (log scale, color mapping, axes tick breaks, …)
coord : Cartesian, polar, map projection, …
facet : divide into subplots using a categorical variable

Using `ggplot2`

How to Build a Graphic

Complete this template to build a basic graphic:

ggplot(
  data = <DATA>, 
  mapping = aes(<MAPPINGS>)
  ) +
  <GEOM FUNCTION>() + 
  any other arguments...

Notice, every + adds another layer to our graphic.

Also notice that I’m using named arguments to make my code easier to read.

Add data
Add aesthetics
Add one geom per layer

This begins a plot that you can add layers to:

ggplot(data = mpg)

An image of a blank gray square representing a blank plotting canvas.

ggplot(data = mpg, 
       mapping = aes(x = class, y = hwy)
       )

An image of a gray square with white gridlines representing a plotting canvas where the variables have been assigned to the x and y aesthetics. On the x-axis there is a variable named 'class' with seven different values mapped to a particular white gridline---2seater, compact, midsize, minivan, pickup, subcompact, suv. On the y-axis there is a variable named 'hwy' with three printed values mapped to different white gridlines---20, 30, and 40. The spaces between these gridlines represent values between the printed values (e.g., 25).

ggplot(data = mpg, 
       mapping = aes(x = class, 
                     y = hwy)
       ) +
  geom_jitter()

A visualization of the different hwy milage for different types of cars. On the x-axis there is a variable named 'class' with seven different values mapped to a particular white gridline---2seater, compact, midsize, minivan, pickup, subcompact, suv. On the y-axis there is a variable named 'hwy' with three printed values mapped to different white gridlines---20, 30, and 40. The spaces between these gridlines represent values between the printed values (e.g., 25). The visualization has a variety of points (black dots), where each point represents a particular vehicle. Each point is horizontally aligned with the type of vehicle and vertically aligned with the highway mileage for the vehicle. Overall, it appears that pickups and suvs have the lowest highway mileage, with minivans having slightly better milage. compact, midsize, and subcompact vehicles have fairly similar milage, with cars ranging from 25 to 40 miles per gallon.

ggplot(data = mpg, 
       mapping = aes(x = class, 
                     y = hwy)
       ) +
  geom_jitter() +
  geom_boxplot()

The same visualization is presented, except there are not boxplots superimposed on top of the points (dots) for each vehicle type. The boxplots display the median (center line), quantiles (edges), and min / max (whiskers) of each vehicle's mileage.

Aesthetics

We map variables (columns) from the data to aesthetics on the graphic using the aes() function.

What aesthetics can we set (see ggplot2 cheat sheet for more)?

x, y
color, fill
linetype
size
shape

Aesthetics

We map variables (columns) from the data to aesthetics on the graphic using the aes() function.

What aesthetics can we set (see ggplot2 cheat sheet for more)?

x, y
color, fill
linetype
size
shape

ggplot(data = housingsub, 
       mapping = aes(
         x = date, 
         y = median)
       ) +
  geom_point()

ggplot(data = housingsub) +
  geom_point(mapping = aes(x = date,
                           y = median)
             )

Special Properties of Aesthetics

Mapping Aesthetics
Setting Aesthetics

ggplot(data = housingsub) +
  geom_point(mapping = aes(x = date, 
                           y = median,
                           color = city)
             )

ggplot(data = housingsub) +
  geom_point(mapping = aes(x = date, 
                           y = median), 
             color = "blue")

Geometric Objects

Wee use a geom_XXX() function to represent data points.

one variable

geom_density()
geom_dotplot()
geom_histogram()
geom_boxplot()

two variable

geom_point()
geom_line()
geom_density_2d()

three variable

geom_contour()
geom_raster()

This is not an exhaustive list – see ggplot2 cheat sheet.

Creating a Graphic

To create a specific type of graphic, we will combine aesthetics and geometric objects.

Let’s try it!

The Goal
Game Plan
ggplot

Start with the TX housing data.

Make a plot of median house price over time (including both individual data points and a smoothed trend line), distinguishing between different cities .

Code

ggplot(data = sm_tx,
       mapping = aes(x = date, 
                     y = median, 
                     color = city)
       ) + 
  geom_point() + 
  geom_smooth(method = "loess") + 
  labs(x = "Date",
       y = "",
       color = " ",
       title = "Changes in Median Home Price for Texas Metro Areas") +
  scale_y_continuous(labels = scales::label_dollar())

Faceting

Extracts subsets of data and places them in side-by-side plots.

Code

ggplot(data = sm_tx,
       mapping = aes(x = date, 
                     y = median)
       ) + 
  geom_point() + 
  facet_wrap(~city) +
  geom_smooth(method = "loess") + 
  labs(x = "Date",
       y = "",
       color = " ",
       title = "Changes in Median Home Price for Texas Metro Areas") +
  scale_y_continuous(labels = scales::label_dollar())

Faceting

Options
Scales

facet_wrap(~ b): facets by one variable
- nrow controls the number of rows the facets are output into
- ncol controls the number of columns the facets are output into
facet_grid(a ~ b): facet by two variables
- variable a will be assigned to the rows
- variable b will be assigned to the columns into both rows and columns

You can set scales to let axis limits vary across facets:

facet_grid(y ~ x, scales = ______)

"free" – both x- and y-axis limits adjust to individual facets
"free_x" – only x-axis limits adjust
"free_y" – only y-axis limits adjust

Statistical Transformation: `stat`

A stat transforms an existing variable into a new variable to plot.

identity leaves the data as is.
count counts the number of observations.
summary allows you to specify a desired transformation function.

Sometimes these statistical transformations happen under the hood when we use a specific geom_XXX().

Statistical Transformation: `stat`

stat_count()
stat_summary()

ggplot(data = mpg,
       mapping = aes(x = class)
       ) +
  geom_bar()

ggplot(data = mpg,
       mapping = aes(x = class)
       ) +
  stat_count(geom = "bar")

ggplot(data = mpg,
       mapping = aes(x = class,
                     y = hwy)
       ) +
  stat_summary(geom = "bar",
               fun = "mean") +
  scale_y_continuous(limits = c(0, 45))

ggplot(data = mpg,
       mapping = aes(x = class,
                     y = hwy)
       ) +
  stat_summary(geom = "bar",
               fun = "max") +
  scale_y_continuous(limits = c(0, 45))

Position Adjustements

Position adjustments determine how to arrange geom’s that would otherwise occupy the same space.

position = "dodge": Arrange elements side by side.
position = "fill": Stack elements on top of one another + normalize height.
position = "stack": Stack elements on top of one another.
position = "jitter": Add random noise to x & y position of each element to avoid overplotting (see geom_jitter()).

Position Adjustements

ggplot(data = mpg, 
       mapping = aes(fl, fill = drv)
       ) + 
  geom_bar(position = "_____")

Plot Customizations

Labels
Themes
Scales: Axes Ticks
Scales: Color

Code

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) + 
  geom_jitter() + 
  labs(x = "Engine Displacement (liters)", 
       y = " ", 
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency")

Code

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) + 
  geom_jitter() + 
  labs(x = "Engine Displacement (liters)", 
       y = " ", 
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  theme_bw() +
  theme(legend.position = "bottom")

Code

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) + 
  geom_jitter() + 
  labs(x = "Engine Displacement (liters)", 
       y = " ", 
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_y_continuous(limits = c(0, 50),
                     breaks = seq(from = 0, to = 50, by = 5)
                     )

Code

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) + 
  geom_jitter() + 
  labs(x = "Engine Displacement (liters)", 
       y = " ", 
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_color_gradient(low = "white", high = "green4")

Nicely Formatted Code

ggplot Code

It is good practice to put each geom and aes on a new line.

This makes code easier to read!
Generally: no line of code should be over 80 characters long.

Bad
Good
Amazing

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) + geom_point() + theme_bw() + labs(x = "City (mpg)", y = "Highway (mpg)")

ggplot(data = mpg, 
       mapping = aes(x = cty, y = hwy, color = class)) + 
  geom_point() + 
  theme_bw() + 
  labs(x = "City (mpg)", y = "Highway (mpg)")

ggplot(data = mpg, 
       mapping = aes(
         x = cty, 
         y = hwy, 
         color = class
         )
       ) + 
  geom_point() + 
  theme_bw() + 
  labs(x = "City (mpg)", 
       y = "Highway (mpg)")

Let’s Make Some Visualizations!

PA 2: Using Data Visualization to Find the Penguins

Artwork by Allison Horst

Using Data Visualization to Find the Penguins

This puzzle activity will require knowledge of:

installing and loading packages in R
formatting code chunks in Quarto
interpreting the context of a dataset
data types / variable types
different types of visualizations
what visualization(s) go with different data types
how to make visualizations with ggplot2
choosing between different aesthetic options

None of us have all these abilities. Each of us has some of these abilities.

Pair Programming Expectations

During your collaboration, you and your partner will alternate between two roles:

Developer

Reads prompt and ensures Coder understands what is being asked.
Types the code specified by the Coder into the Quarto document.
Runs the code provided by the Coder.
Works with Coder to debug the code.
Evaluates the output.
Works with Coder to write code comments.

Coder

Reads out instructions or prompts
Directs the Developer what to type.
Talks with Developer about their ideas.
Manages resources (e.g., cheatsheets, textbook).
Works with Developer to debug the code.
Works with Developer to write code comments.

Group Norms

Think and work together. Do not divide the work.
You are smarter together.
Be open minded.
No cross-talk with other groups.
Communicate with each other!

ggplot2 Resources

Every group should have a ggplot2 cheatsheet!

On the Front

Column 1: the “template” for making a ggplot
Column 3: creating plots for two continuous variables
Column 4: creating plots for one discrete or one continuous variable

On the Back

Column 4: adding facets and labels to your plot
Column 3: adding themes to your plot (if you have extra time)

A picture of the ggplot2 cheatsheet, which contains helpful information on assembling a variety of visualizations all using the ggplot2 package.

Opening the Practice Activity in Posit Cloud

The partner whose family name starts first alphabetically starts as the Developer!

Both of you need to do the following:

Open the email you were sent (from me and / or Posit Cloud)
Click on the link to join your workspace
Log-in to Posit Cloud
Open the PA 2: Using Data Visualization to Find the Penguins project
Open the PA2-ggplot.qmd file
Get started!

Submission

When you have completed the visualization tasks, you will work as a group to answer the five questions posed at the end of the document.
Each person will input the answers to these questions in the PA2 Canvas quiz.
The person who last occupied the role of Developer will download and submit the PA-2.html file for the group.
- Only one submission per group!

Basics of Graphics

Tuesday, October 1

Accessing Practice Activities

Grammar of Graphics

Grammar of Graphics

Why GoG?

Components of GoG

Using `ggplot2`

How to Build a Graphic

Aesthetics

Aesthetics

Special Properties of Aesthetics

Special Properties of Aesthetics

Geometric Objects

Creating a Graphic

Faceting

Faceting

Statistical Transformation: `stat`

Statistical Transformation: `stat`

Position Adjustements

Position Adjustements

Plot Customizations

Nicely Formatted Code

ggplot Code

Let’s Make Some Visualizations!

PA 2: Using Data Visualization to Find the Penguins

Using Data Visualization to Find the Penguins

Pair Programming Expectations

ggplot2 Resources

Opening the Practice Activity in Posit Cloud

Submission

Wrap-up

What skills did you need?

Basics of Graphics

Tuesday, October 1

Accessing Practice Activities

Grammar of Graphics

Grammar of Graphics

Why GoG?

Components of GoG

Using ggplot2

How to Build a Graphic

Aesthetics

Aesthetics

Special Properties of Aesthetics

Special Properties of Aesthetics

Geometric Objects

Creating a Graphic

Faceting

Faceting

Statistical Transformation: stat

Statistical Transformation: stat

Position Adjustements

Position Adjustements

Plot Customizations

Nicely Formatted Code

ggplot Code

Let’s Make Some Visualizations!

PA 2: Using Data Visualization to Find the Penguins

Using Data Visualization to Find the Penguins

Pair Programming Expectations

ggplot2 Resources

Opening the Practice Activity in Posit Cloud

Submission

Wrap-up

What skills did you need?

Using `ggplot2`

Statistical Transformation: `stat`

Statistical Transformation: `stat`