ggplot(data = <DATA>,
       mapping = aes(<VARIABLE MAPPINGS>)) +
  <GEOM_FUNCTION>()
Lab 2: Exploring Rodents with ggplot2
1 Part 1: GitHub Workflow
1.1 Step 1: Making a Copy from GitHub Classroom
You can access the Lab 2 materials through the Lab 2 assignment on GitHub Classroom. We’re going to follow the same steps as last week to make your own copy of this repository. Those steps are listed here: List of Steps to Copy the Lab Assignment from GitHub Classroom
1.2 Step 2: Making a Small Change
Now, find the `lab-2-student.qmd` file in the “Files” tab in the lower right-hand corner. Click on this file to open it.
At the top of the document (in the YAML) there is an `author` line that says `"Your name here!"`. Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).
1.3 Step 3: Pushing Your Lab to GitHub
Next, we need to commit the files to our repo and push them to GitHub.
- Click the “Git” tab in the upper right pane
- Check the “Staged” box for the `lab-2-student.qmd` file
- Click “Commit”
- In the box that opens, type a message in “Commit message”, such as “Added my name”.
- Click “Commit”.
- Click the green “Push” button to send your local changes to GitHub.
RStudio will display something like:
>>> /usr/bin/git push origin HEAD:refs/heads/main
To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git
3a2171f..6d58539 HEAD -> main
1.4 Step 4: Verifying Your Changes
Go back to your browser. I assume you’re still viewing the GitHub repo you just cloned. Refresh the page. You should see all the project files you committed there. If you click on “commits”, you should see one with the message you used, e.g. “Added my name”.
1.5 Step 5: Let’s get started making some plots!
Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.
If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.
If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the `.txt` file containing your entire conversation with ChatGPT.
Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.
2 Part 2: Data Context
The questions in this lab are noted with numbers and boldface. Each question will require you to produce code, whether it is one line or multiple lines.
This document is quite plain, meaning it does not have any special formatting. As part of your demonstration of creating professional-looking Quarto documents, I would encourage you to spice your documents up (e.g., declaring execution options, specifying how your figures should be output, formatting your code output, etc.).
2.1 Setup
In the code chunk below, load in the packages necessary for your analysis. You should only need the `tidyverse` package for this analysis.
2.2 Data Context
The Portal Project is a long-term ecological study being conducted near Portal, AZ. Since 1977, the site has been used to study the interactions among rodents, ants, and plants, as well as their respective responses to climate. To study the interactions among organisms, researchers experimentally manipulated access to 24 study plots. This study has produced over 100 scientific papers and is one of the longest running ecological studies in the U.S.
We will be investigating the animal species diversity and weights found within plots at the Portal study site. The data are stored as a comma separated value (CSV) file. Each row holds information for a single animal, and the columns represent:
| Column | Description |
|---|---|
| record_id | Unique ID for the observation |
| month | month of observation |
| day | day of observation |
| year | year of observation |
| plot_id | ID of a particular plot |
| species_id | 2-letter code |
| sex | sex of animal (“M”, “F”) |
| hindfoot_length | length of the hindfoot in mm |
| weight | weight of the animal in grams |
| genus | genus of animal |
| species | species of animal |
| taxon | e.g. Rodent, Reptile, Bird, Rabbit |
| plot_type | type of plot |
2.3 Reading the Data into R
1. Using the `read_csv()` function, write the code necessary to load in the `surveys.csv` dataset (stored in the data folder). For simplicity, name the data `surveys`.
2. What are the dimensions (# of rows and columns) of these data?
3. What are the data types of the variables in this dataset?
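If you would like a reminder of what these steps look like, here is a minimal sketch on a tiny made-up file rather than the lab data. The names `toy_path` and `toy` and the three rows of values are purely hypothetical; only `read_csv()`, `dim()`, and `class()` are real functions.

```r
library(readr)

# Hypothetical three-row CSV written to a temporary file (NOT the lab data)
toy_path <- tempfile(fileext = ".csv")
write_lines(c("record_id,weight,sex",
              "1,40,M",
              "2,48,F",
              "3,29,F"),
            toy_path)

# Read the file; show_col_types = FALSE silences the column-type message
toy <- read_csv(toy_path, show_col_types = FALSE)

# Dimensions: number of rows first, then number of columns
dim(toy)

# Data type (class) of every column
sapply(toy, class)
```

Other functions, such as `glimpse()` from the tidyverse, report the dimensions and the data types in a single call, so use whichever you find easiest to read.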
3 Exploratory Data Analysis with ggplot2
`ggplot()` graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
To build a `ggplot()`, we will use a basic template that can be used for different types of plots: pass the data and the variable mappings (inside `aes()`) to `ggplot()`, then use `+` to add a geom function.
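As a sketch of how the three pieces of the template (data, variable mappings, geom function) get filled in, here is one possible example. It uses the `mpg` dataset that ships with ggplot2 instead of the lab data, so the dataset and the variables `displ` and `hwy` are stand-ins I chose for illustration.

```r
library(ggplot2)

# <DATA>              -> mpg (fuel economy data bundled with ggplot2)
# <VARIABLE MAPPINGS> -> engine size on the x-axis, highway mpg on the y-axis
# <GEOM_FUNCTION>     -> geom_point(), which draws a scatterplot
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy)) +
  geom_point()

p
```

Swapping in a different geom function (for example `geom_boxplot()`) changes the type of plot without changing the rest of the template.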
Let’s get started!
3.1 Scatterplot
4. First, let’s create a scatterplot of the relationship between `weight` (on the \(x\)-axis) and `hindfoot_length` (on the \(y\)-axis).
We can see there are a lot of points plotted on top of each other. Let’s try and modify this plot to extract more information from it.
5. Let’s add transparency (`alpha`) to the points, to make the points more transparent and (possibly) easier to see.
Despite our best efforts there is still a substantial amount of overplotting occurring in our scatterplot. Let’s try splitting the dataset into smaller subsets and see if that allows for us to see the trends a bit better.
6. Facet your scatterplot by `species`.
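Transparency and faceting are both small additions to a scatterplot. The sketch below shows one way they can look, again on the bundled `mpg` data rather than the lab data. The value `alpha = 0.3` and the faceting variable `class` are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy)) +
  # alpha runs from 0 (invisible) to 1 (solid); it is set outside aes()
  # because it is a fixed value, not a variable in the data
  geom_point(alpha = 0.3) +
  # one panel per level of the class variable
  facet_wrap(~ class)

p
```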
7. No plot is complete without axis labels and a title. Add reader-friendly axis labels and a title to your plot.
It takes a larger cognitive load to read text that is rotated. It is common practice in many journals and media outlets to move the \(y\)-axis label to the top of the graph under the title.
8. Specify your \(y\)-axis label to be empty and move the \(y\)-axis label into the subtitle.
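One way to do this is with `labs()`. The sketch below uses the bundled `mpg` data, and all of the label text is made up for illustration. I have used `y = NULL` here, which removes the label entirely; setting `y = ""` also gives an empty label.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3) +
  labs(title = "Engine Size and Highway Fuel Economy",
       # the description of the y-axis variable lives in the subtitle
       subtitle = "Highway Fuel Economy (miles per gallon)",
       x = "Engine Displacement (litres)",
       # NULL removes the rotated y-axis label
       y = NULL)

p
```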
3.2 Boxplots
10. Create side-by-side boxplots to visualize the distribution of weight within each species.
A fundamental complaint about boxplots is that they do not plot the raw data. However, with ggplot we can add the raw points on top of the boxplots!
11. Add another layer to your previous plot that plots each observation.
Alright, this should look less than optimal. Your points should appear rather stacked on top of each other. To make them less stacked, we need to jitter them a bit, using `geom_jitter()`.
12. Remove the previous layer and include a `geom_jitter()` layer instead.
That should look a bit better! But it’s really hard to see the points when everything is black.
13. Set the `color` aesthetic in `geom_jitter()` to change the color of the points and set the `alpha` aesthetic to add transparency.
You are welcome to use whatever color you wish! Some of my favorites are “orange3” and “steelblue”.
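For reference, here is a sketch of boxplots with a colored, semi-transparent jitter layer on top, using the bundled `mpg` data in place of the lab data. The `width`, `height`, and `alpha` values are arbitrary choices of mine.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  # width spreads the points sideways within each group;
  # height = 0 keeps every point at its true y value
  geom_jitter(color = "steelblue", alpha = 0.4,
              width = 0.2, height = 0)

p
```

Because `color` and `alpha` are fixed values here, they go directly inside `geom_jitter()` and not inside `aes()`.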
Great! Now that you can see the points, you should notice something odd: there are two colors of points still being plotted. Some of the observations are being plotted twice, once from `geom_boxplot()` as outliers and again from `geom_jitter()`!
14. Inspect the help file for `geom_boxplot()` and see how you can remove the outliers from being plotted by `geom_boxplot()`. Make this change in your code!
Some small changes can make big differences to plots. One of these changes is better labels for a plot’s axes and legend.
15. Modify the \(x\)-axis and \(y\)-axis labels to describe what is being plotted. Be sure to include any necessary units! You might also be getting overlap in the species names – use `theme(axis.text.x = ____)` or `theme(axis.text.y = ____)` to turn the species axis labels 45 degrees.
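If the `theme()` syntax is unfamiliar, the sketch below shows one way to rotate axis text, again on the bundled `mpg` data with made-up labels. The `hjust = 1` setting is my own addition; it right-aligns the rotated text so each label ends at its tick mark.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  labs(x = "Vehicle Class",
       y = "Highway Fuel Economy (miles per gallon)") +
  # axis text is styled with element_text(); angle is in degrees
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```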
Some people (and journals) prefer for boxplots to be stacked with a specific orientation! Let’s practice changing the orientation of our boxplots.
16. Now copy and paste the boxplot code you have been building above. Flip the orientation of your boxplots. If you created horizontally stacked boxplots, your boxplots should now be stacked vertically. If you had vertically stacked boxplots, you should now stack your boxplots horizontally!
Notice how vertically stacked boxplots make the species labels more readable than horizontally stacked boxplots (even when the axis labels are rotated). This is good practice!
4 Connecting Visualizations with Statistical Analyses
Exploratory Data Analysis (EDA) is always a great start to investigating a dataset. Can we see a relationship between rodent weight and hindfoot length? How does rodent weight differ between species? After performing EDA, we can then conduct appropriate statistical analyses to formally investigate what we have seen.
In this section, we are going to conduct a one-way analysis of variance (ANOVA) to compare mean weight between the fourteen different species of rodents.
While a second course in statistics is a pre-requisite for this class, you may want a refresher on conducting a one-way ANOVA.
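I have outlined the null and alternative hypotheses we will be testing:
\(H_0\): The population mean weight is the same for all fourteen rodent species.
\(H_A\): At least one rodent species has a different population mean weight.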
Look up the help documentation for `aov()`.
17. Using `aov()`, complete the code below to carry out the analysis.
species_mod <- aov()

summary(species_mod)
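If you have not used `aov()` before, here is a sketch of the same pattern on a different dataset. `PlantGrowth` is built into R and records the dried weight of plants in three groups. The name `growth_mod` is hypothetical, and this is not the model you need for the lab.

```r
# The formula reads: response variable ~ grouping variable
growth_mod <- aov(weight ~ group, data = PlantGrowth)

# The summary table reports the degrees of freedom, sums of squares,
# F-statistic, and p-value for the F-test
summary(growth_mod)
```

The F-statistic and p-value in the summary table are the output you would cite when drawing a conclusion about the hypotheses.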
18. Based on the results of the ANOVA F-test, draw a conclusion in context of the hypotheses. Make sure to cite appropriate output from above.
5 Lab 2 Submission
For Lab 2 you will submit only your HTML file. Your HTML file is required to have the following specifications in the YAML options (at the top of your document):
- have the plots embedded (`embed-resources: true`)
- include your source code (`code-tools: true`)
- include all your code and output (`echo: true`)
If any of the options are not included, your Lab 2 or Challenge 2 assignment will receive an “Incomplete” and you will be required to submit a revision.
In addition, your HTML document should not contain any warnings or messages. If it does, you will receive an “Incomplete” for document formatting and you will be required to submit a revision.