Lab 3: Grading Guide

Question 1 – Size of `and_vertebrates` dataset

Success:

has glimpse() somewhere in their code
states there are 32,209 rows and 16 columns

Growing:

If no code is present
If they flip the rows and columns
If they do not provide size of data

Feedback for no code: You needed to write code to explore the size of the data and the types of variables.

Feedback for no / incorrect dimensions: Look again at the output of the glimpse() function. How many rows and columns are included in the and_vertebrates dataset?
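
For reference when checking submissions, a minimal sketch of code that would earn a Success. It assumes and_vertebrates is loaded from the lterdatasampler package; this guide does not name the package, so adjust if the lab loads the data another way.

```r
library(dplyr)
library(lterdatasampler)  # assumption: the package that provides and_vertebrates

# The first lines of the glimpse() output report the number of rows and columns
glimpse(and_vertebrates)

# dim() gives the same information: rows first, then columns
data_size <- dim(and_vertebrates)
data_size
```

Students who flip the dimensions have usually read the two header lines of the glimpse() output in the wrong order.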

Question 2 – Categorical variables in dataset

Success:

Lists all variables with a chr data type:
- sitecode
- section
- reach
- unittype
- species
- clip
- notes
Permitted to miss 1 variable

Growing:

If misses 2 or more variables
If includes date variable (sampledate)

Feedback: You need to be careful to include ALL of the variables that R believes are categorical. What data types did we say in class that categorical variables can have?
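
A quick way for a grader to confirm the list of chr variables is sketched below. It assumes the data come from the lterdatasampler package (not stated in this guide) and uses where() from dplyr to keep only the character columns.

```r
library(dplyr)
library(lterdatasampler)  # assumption: the package that provides and_vertebrates

# Keep only the columns stored as character, then pull out their names
chr_vars <- and_vertebrates %>%
  select(where(is.character)) %>%
  names()

chr_vars
```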

Question 3 – Levels of the species variable

Success:

Provides code that uses distinct() function
Inputs the species variable

Growing: Does not use distinct() function

Feedback: There is a function which finds the distinct levels of a categorical variable. What function might that be?
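
A minimal sketch of a Success answer, again assuming and_vertebrates is loaded from the lterdatasampler package (an assumption, not stated in this guide):

```r
library(dplyr)
library(lterdatasampler)  # assumption: the package that provides and_vertebrates

# One row per distinct value of species (missing values show up as NA)
species_levels <- distinct(and_vertebrates, species)
species_levels
```

The piped form, and_vertebrates %>% distinct(species), is equivalent.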

Question 4 – Filtering to include only trout

Success: Code should look similar to:

trout <- filter(and_vertebrates, 
                species == "Cutthroat trout", 
                length_1_mm > 101)

Note

They can use %>% or pass in and_vertebrates as the first argument

Growing: If filter() is not correct

using < instead of >

Feedback: Careful! What lengths of trout were you told to include in the dataset? We think really small fish are not catchable, so we decided to remove them from the dataset.

using the wrong variable / value (e.g., wrong spelling of Cutthroat trout, putting 101 in quotations)

Feedback for spelling Cutthroat trout incorrectly: Careful! R is case sensitive, so we need to be sure that we use the same spelling and capitalization the researchers used. How did these researchers spell Cutthroat trout? The output of the distinct() function should help you!

Feedback for using "101" instead of 101: Careful! We use quotations to indicate values of a categorical variable. Is length_1_mm a categorical variable?
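
Why the quotation marks matter: when a number is compared with a character string, R converts the number to text and compares the two as strings, so the filter runs without an error but keeps the wrong rows. A small base R illustration with made-up lengths (these values are hypothetical, not taken from the dataset):

```r
# Hypothetical lengths (mm), chosen only to illustrate the comparison
lengths_mm <- c(95, 150, 1000)

# Numeric comparison: keeps the fish longer than 101 mm
numeric_check <- lengths_mm > 101

# Character comparison: each number is coerced to text and compared as a string,
# so "95" counts as larger than "101" and "1000" counts as smaller
character_check <- lengths_mm > "101"

numeric_check
character_check
```

The same coercion happens inside filter(), which is why this mistake produces no error message.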

Question 5 – Distribution of trout lengths

Success: Acceptable geoms:

geom_histogram()
geom_density()

Growing:

Uses geom_dotplot()

Feedback for using a dotplot: Careful! A dotplot doesn’t automatically resize the dots based on the number of observations you have. This makes it so your dots are running off the page! You can resize the dots using the dotsize argument, with a number smaller than 1 (e.g., 0.5).

Uses geom_boxplot()

Feedback for using a boxplot: For Question 6, we need to use a geom which allows us to see the shape of the distribution. Boxplots hide distributions with multiple modes, so what type of plot would be better?

Not including axis label with units (mm)

Feedback for not including axis labels with units: Every plot we make should have descriptive axis labels which include the units the variable was measured in. What units were the lengths of the trout measured in?

The y-axis label doesn’t reflect that we are plotting counts

Feedback: Great job changing your y-axis label! The label you chose makes it a bit unclear what is being plotted on the y-axis. Remember, the original label was “count,” so our label should say something about the number of observations (trout) that are being plotted.
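
For reference, a sketch of a plot that meets all of the criteria above. It assumes the data come from the lterdatasampler package; the bin width and the exact label wording are illustrative choices, not requirements.

```r
library(dplyr)
library(ggplot2)
library(lterdatasampler)  # assumption: the package that provides and_vertebrates

# Same filtering as in Question 4
trout <- filter(and_vertebrates,
                species == "Cutthroat trout",
                length_1_mm > 101)

trout_hist <- ggplot(data = trout,
                     mapping = aes(x = length_1_mm)) +
  geom_histogram(binwidth = 10) +            # 10 mm bins, an arbitrary illustrative choice
  labs(x = "Length (mm)",                    # x-axis label includes the units
       y = "Number of Cutthroat trout")      # y-axis label reflects that counts are plotted

trout_hist
```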

Question 6 – Sources of variation

Success: Names three “reasonable” sources of variation in trout length

Growing: Names an unreasonable source of variation in trout lengths

Feedback: We are interested in variables that, if changed, would be expected to change the length of the Cutthroat trout.

Question 7 – Ridge plot

Success: Code should look like the following

ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype)) +
  geom_density_ridges() +
  labs(x = "Length (mm)", 
       y = "Channel Section")

Growing:

Doesn’t include units (mm) in x-axis label

Feedback: It is important for the axis label to contain the units of the variable. What units were the lengths measured in?

Doesn’t change y-axis label to something related to channel types

Feedback: It is important for the axis label to describe the variable being plotted. What does the “unittype” variable represent? You can look up what each variable means in the help file (?and_vertebrates).

Question 8 – Adding another categorical variable

Success: Uses either color or facets to incorporate section

## Option 1 -- Facets
ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype)) +
  geom_density_ridges() +
  facet_wrap(~section)

## Option 2 -- Colors
ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype, fill = section)) +
  geom_density_ridges()

If they used color instead of fill

Feedback: The color aesthetic only colors the outside of the ridge plot. However, if you were to use the fill aesthetic the entire ridge would be filled with color!

If they didn’t use alpha

Feedback: You can’t see the CC distributions for most of the channel types because they are hidden behind the OG distributions. Adding alpha = 0.5 inside of the geom_density_ridges() makes the distribution more transparent, so you can see BOTH distributions!
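
A sketch of the plot with transparency added. It assumes the second categorical variable is the forest section (the CC and OG values mentioned in the feedback) and that the data come from the lterdatasampler package; neither the package nor the legend title is specified in this guide.

```r
library(dplyr)
library(ggplot2)
library(ggridges)
library(lterdatasampler)  # assumption: the package that provides and_vertebrates

# Same filtering as in Question 4
trout <- filter(and_vertebrates,
                species == "Cutthroat trout",
                length_1_mm > 101)

ridge_plot <- ggplot(data = trout,
                     mapping = aes(x = length_1_mm, y = unittype, fill = section)) +
  geom_density_ridges(alpha = 0.5) +   # semi-transparent ridges so overlapping groups stay visible
  labs(x = "Length (mm)",
       y = "Channel Section",
       fill = "Forest section")        # legend title is an illustrative choice

ridge_plot
```

Because alpha is a fixed value rather than a mapping, it goes inside geom_density_ridges() and not inside aes().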

Growing:

y-axis should say something about the type of channel

Feedback: It is important for the axis label to describe the variable that is being measured. What would be a good y-axis title for what the unittype variable measured?

Not marking down on axis label

If they got a growing on #7 for an axis label, they don’t get a growing here.

If they don’t use facets or colors

Question 9 – Based on the plot, how different are the lengths between the channel types and forest sections?

Success: Must have all of the following

Comparisons between forest sections (centers, spreads, shapes)
Comparisons between channel types
State what channel types did not have both types of forest

Growing:

If they say the distributions are all fairly similar

Feedback: While I would agree that most of these distributions are fairly similar (they overlap a lot), there are a few channel types where the lengths of trout are quite different between clear cut and old growth forests. Which channel types are these? In these channel types, how do the distributions of fish lengths differ (e.g., are trout in the CC section larger?)?

Comparisons are incomplete

Feedback: Your comparison of these distributions should include similarities / differences in their centers, spreads, and shapes.

Does not state what channel types did not have both types of forest

Feedback: Careful! Were there clear cut and old growth forests for every type of channel? If not, what channel types only had one section of forest?

Question 10 – Average length of Cutthroat trout between channel types

Success: Code should look similar to:

trout %>%
  group_by(unittype) %>%
  summarize(mean_length = mean(length_1_mm, na.rm = TRUE))

If they don’t name their summary statistics

Feedback: For Question 11, the output of your summary statistics looks a lot nicer if you give them names! (like the example that was given)

Growing: If process is not correct

Question 11 – Find the average length of Cutthroat trout between channel types and forest section

Success: Code should look similar to:

trout %>%
  group_by(unittype, section) %>%
  summarize(mean_length = mean(length_1_mm))

Growing: If process is not correct

Question 12 – Differences in averages compared to plot

Success:

States that the averages are all fairly similar (comparing the centers)
Connects averages with the skew seen in the visualizations

Growing:

If their statement doesn’t connect larger means to skewed distributions

Feedback: While I would agree that most of these means are similar to the location of the peaks you found in Question 8, are there any groups that have a mean that is NOT close to where the peak appears to be (on the plot)? Why might that be the case? What might this have to do with the shape of the distributions?
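
The link between skew and the mean can be shown with a few made-up numbers. In this base R sketch the lengths are hypothetical, not taken from and_vertebrates. One unusually long fish pulls the mean well above the median, which stays near the bulk of the data.

```r
# Hypothetical trout lengths (mm) with one very long fish (right skew)
skewed_lengths <- c(110, 115, 120, 125, 130, 300)

mean_length   <- mean(skewed_lengths)    # pulled toward the long right tail
median_length <- median(skewed_lengths)  # stays near the bulk of the data

# A positive gap between the mean and the median suggests right skew
mean_length - median_length
```

A group whose mean sits noticeably to the right of its peak on the ridge plot is showing the same effect.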
