Lab 2: Exploring Rodents with ggplot2

Author

Instructions

1 Part 1: GitHub Workflow

1.1 Step 1: Making a Copy from GitHub Classroom

You can access the Lab 2 materials through the Lab 2 assignment on GitHub Classroom. We’re going to follow the same steps as last week to make your own copy of this repository:

Use these steps to make a copy of the Lab 2 repository: List of Steps to Copy the Lab Assignment from GitHub Classroom

1.2 Step 2: Making a Small Change

Now, find the lab-2-student.qmd file in the “Files” tab in the lower right hand corner. Click on this file to open it.

At the top of the document (in the YAML) there is an author line that says "Your name here!". Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).

1.3 Step 3: Pushing Your Lab to GitHub

Now for our last step, we need to commit the files to our repo.

  • Click the “Git” tab in upper right pane
  • Check the “Staged” box for the lab-2-student.qmd file
  • Click “Commit”
  • In the box that opens, type a message in “Commit message”, such as “Added my name”.
  • Click “Commit”.
  • Click the green “Push” button to send your local changes to GitHub.

RStudio will display something like:

>>> /usr/bin/git push origin HEAD:refs/heads/main
To https://github.com/atheobold/introduction-to-quarto-allison-theobold.git
   3a2171f..6d58539  HEAD -> main

1.4 Step 4: Verifying Your Changes

Go back to your browser. I assume you’re still viewing the GitHub repo you just cloned. Refresh the page. You should see all the project files you committed there. If you click on “commits”, you should see one with the message you used, e.g. “Added my name”.

1.5 Step 5: Let’s get started making some plots!

Part of learning to program is learning from a variety of resources. Thus, I expect you will use resources that you find on the internet. There is, however, an important balance between copying someone else’s code and using their code to learn. Therefore, if you use external resources, I want to know about it.

  • If you used Google, you are expected to “inform” me of any resources you used by pasting the link to the resource in a code comment next to where you used that resource.

  • If you used ChatGPT, you are expected to “inform” me of the assistance you received by (1) indicating somewhere in the problem that you used ChatGPT (e.g., below the question prompt or as a code comment), and (2) downloading and attaching the .txt file containing your entire conversation with ChatGPT.

Additionally, you are permitted and encouraged to work with your peers as you complete lab assignments, but you are expected to do your own work. Copying from each other is cheating, and letting people copy from you is also cheating. Please don’t do either of those things.

2 Part Two: Data Context

The questions in this lab are noted with numbers and boldface. Each question will require you to produce code, whether it is one line or multiple lines.

This document is quite plain, meaning it does not have any special formatting. As part of your demonstration of creating professional looking Quarto documents, I would encourage you to spice your documents up (e.g., declaring execution options, specifying how your figures should be output, formatting your code output, etc.).

2.1 Setup

In the code chunk below, load in the packages necessary for your analysis. You should only need the tidyverse package for this analysis.

2.2 Data Context

The Portal Project is a long-term ecological study being conducted near Portal, AZ. Since 1977, the site has been used to study the interactions among rodents, ants, and plants, as well as their respective responses to climate. To study the interactions among organisms, researchers experimentally manipulated access to 24 study plots. This study has produced over 100 scientific papers and is one of the longest running ecological studies in the U.S.

We will be investigating the animal species diversity and weights found within plots at the Portal study site. The data are stored as a comma separated value (CSV) file. Each row holds information for a single animal, and the columns represent:

Column Description
record_id Unique ID for the observation
month month of observation
day day of observation
year year of observation
plot_id ID of a particular plot
species_id 2-letter code
sex sex of animal (“M”, “F”)
hindfoot_length length of the hindfoot in mm
weight weight of the animal in grams
genus genus of animal
species species of animal
taxon e.g. Rodent, Reptile, Bird, Rabbit
plot_type type of plot

2.3 Reading the Data into R

1. Using the read_csv() function, write the code necessary to load in the surveys.csv dataset (stored in the data folder). For simplicity, name the data surveys.

2. What are the dimensions (# of rows and columns) of these data?

3. What are the data types of the variables in this dataset?

3 Exploratory Data Analysis with ggplot2

ggplot() graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.

To build a ggplot(), we will use the following basic template that can be used for different types of plots:

ggplot(data = <DATA>,
       mapping = aes(<VARIABLE MAPPINGS>)) +
  <GEOM_FUNCTION>()

Let’s get started!

3.1 Scatterplot

4. First, let’s create a scatterplot of the relationship between weight (on the \(x\)-axis) and hindfoot_length (on the \(y\)-axis).

We can see there are a lot of points plotted on top of each other. Let’s try and modify this plot to extract more information from it.

5. Let’s add transparency (alpha) to the points, to make the points more transparent and (possibly) easier to see.

Despite our best efforts there is still a substantial amount of overplotting occurring in our scatterplot. Let’s try splitting the dataset into smaller subsets and see if that allows for us to see the trends a bit better.

6. Facet your scatterplot by species.

7. No plot is complete without axis labels and a title. Include reader friendly labels and a title to your plot.

It takes a larger cognitive load to read text that is rotated. It is common practice in many journals and media outlets to move the \(y\)-axis label to the top of the graph under the title.

8. Specify your \(y\)-axis label to be empty and move the \(y\)-axis label into the subtitle.

3.2 Boxplots

10. Create side-by-side boxplots to visualize the distribution of weight within each species.

A fundamental complaint of boxplots is that they do not plot the raw data. However, with ggplot we can add the raw points on top of the boxplots!

11. Add another layer to your previous plot that plots each observation.

Alright, this should look less than optimal. Your points should appear rather stacked on top of each other. To make them less stacked, we need to jitter them a bit, using geom_jitter().

12. Remove the previous layer and include a geom_jitter() layer instead.

That should look a bit better! But its really hard to see the points when everything is black.

13. Set the color aesthetic in geom_jitter() to change the color of the points and add set the alpha aesthetic to add transparency.

Note

You are welcome to use whatever color you wish! Some of my favorites are “orange3” and “steelblue”.

Great! Now that you can see the points, you should notice something odd: there are two colors of points still being plotted. Some of the observations are being plotted twice, once from geom_boxplot() as outliers and again from geom_jitter()!

14. Inspect the help file for geom_boxplot() and see how you can remove the outliers from being plotted by geom_boxplot(). Make this change in your code!

Some small changes can make big differences to plots. One of these changes are better labels for a plot’s axes and legend.

15. Modify the \(x\)-axis and \(y\)-axis labels to describe what is being plotted. Be sure to include any necessary units! You might also be getting overlap in the species names – use theme(axis.text.x = ____) or theme(axis.text.y = ____) to turn the species axis labels 45 degrees.

Some people (and journals) prefer for boxplots to be stacked with a specific orientation! Let’s practice changing the orientation of our boxplots.

16. Now copy-paste your boxplot code you’ve been adding to above. Flip the orientation of your boxplots. If you created horizontally stacked boxplots, your boxplots should now be stacked vertically. If you had vertically stacked boxplots, you should now stack your boxplots horizontally!

Notice how vertically stacked boxplots make the species labels more readable than horizontally stacked boxplots (even when the axis labels are rotated). This is good practice!

4 Conecting Visualizations with Statistical Analyses

Exploratory Data Analysis (EDA) is always a great start to investigating a dataset. Can we see a relationship between rodent weight and hindfoot length? How does rodent weight differ between species? After performing EDA, we can then conduct appropriate statistical analyses to formally investigate what we have seen.

In this section, we are going to conduct a one-way analysis of variance (ANOVA) to compare mean weight between the fourteen different species of rodents.

Refresher on one-way ANOVA

While a second course in statistics is a pre-requisite for this class, you may want a refresher on conducting a one-way ANOVA.

I have outlined the null and alternative hypotheses we will be testing:

Null

The population mean weight is the same between all fourteen rodent species.

Alternative

At least one rodent species has a different population mean weight.

Look up the help documentation for aov().

17. Using aov(), complete the code below to carry out the analysis.

species_mod <- aov()

summary(species_mod)

18. Based on the results of the ANOVA F-test, draw a conclusion in context of the hypotheses. Make sure to cite appropriate output from above.

5 Lab 2 Submission

For Lab 2 you will submit only your HTML file. Your HTML file is required to have the following specifications in the YAML options (at the top of your document):

  • have the plots embedded (embed-resources: true)
  • include your source code (code-tools: true)
  • include all your code and output (echo: true)

If any of the options are not included, your Lab 2 or Challenge 2 assignment will receive an “Incomplete” and you will be required to submit a revision.

In addition, your document should not have any warnings or messages output in your HTML document. If your HTML contains warnings or messages, you will receive an “Incomplete” for document formatting and you will be required to submit a revision.