Lab 3: Grading Guide

Question 1 – Size of and_vertebrates dataset

Success:

  • has glimpse() somewhere in their code
  • states there are 32,209 rows and 16 columns

Growing:

  • If no code is present
  • If they flip the rows and columns
  • If they do not provide size of data

Feedback for no code: You needed to write code to explore the size of the data and the types of variables.

Feedback for no / incorrect dimensions: Look again at the output of the glimpse() function, how many rows and columns are included in the and_vertebrates dataset?

Question 2 – Categorical variables in dataset

Success:

  • Lists all variables with a chr data type:
    • sitecode
    • section
    • reach
    • unittype
    • species
    • clip
    • notes
  • Permitted to miss 1 variable

Growing:

  • If misses 2 or more variables
  • If includes date variable (sampledate)

Feedback: You need to be careful to include ALL of the variables that R believes are categorical. What data types did we say in class that categorical variables can have?

Question 3 – Levels of the species variable

Success

  • Provides code that uses distinct() function
  • Inputs the species variable

Growing: Does not use distinct() function

Feedback: There is a function which finds the distinct levels of a categorical variable. What function might that be?

Question 4 – Filtering to including only trout

Success: Code should look similar to:

trout <- filter(and_vertebrates, 
                species == "Cutthroat trout", 
                length_1_mm > 101)
Note

The can use %>% or pass in and_vertebrates as first argument

Growing: If filter() is not correct

  • using < instead of >

Feedback: Careful! What lengths of trout were you told to include in the dataset? We think really small fish are not catchable, so we decided to remove them from the dataset.

  • using the wrong variable / value (e.g., wrong spelling of Cutthroat trout, putting 101 in quotations)

Feedback for spelling Cutthroat trout incorrectly: Careful! R is case sensitive, so we need to be sure that we use the same spelling and capitalization the researchers used. How did these researchers spell Cutthroat trout? The output of the distinct() function should help you!

Feedback for using "101" instead of 101: Careful! We use quotations to indicate values of a categorical variable. Is length_1_mm a categorical variable?

Question 5 – Distribution of trout lengths

Success: Acceptable geoms:

  • geom_histogram()
  • geom_density()

Growing:

  • Uses geom_dotplot()

Feedback for using a dotplot: Careful! A dotplot doesn’t automatically resize the dots based on the number of observations you have. This makes it so your dots are running off the page! You can resize the dots using the dotsize argument, with a number smaller than 1 (e.g., 0.5).

  • Uses geom_boxplot()

Feedback for using a boxplot: For Question 6, we need to use a geom which allows for us to see the shape of the distribution. Boxplots hide distributions with multiple modes, so what type of plot would be better?

  • Not including axis label with units (mm)

Feedback for not including axis labels with units: Every plot we make should have descriptive axis labels which include the units the variable was measured in. What unit were the length of each trout measured in?

  • The y-axis label doesn’t reflect that we are plotting counts

Feedback: Great job changing your y-axis label! The label you chose makes it a bit unclear what is being plotted on the y-axis. Remember, the original label was “count,” so our label should say something about the number of observations (trout) that are being plotted.

Question 6 – Sources of variation

Success: Names three “reasonable” sources of variation in trout length

Growing: Names an unreasonable source of variation in trout lengths

Feedback: We are interested in variables that, if changed, we would expect the length of the Cutthroat trout to change.

Question 7 – Ridge plot

Success: Code should look like the following

ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype)) +
  geom_density_ridges() +
  labs(x = "Length (mm)", 
       y = "Channel Section")

Growing:

  • Doesn’t include units (mm) in x-axis label

Feedback: It is important for the axis label to contain the units of the variable. What units were the lengths measured in?

Question 8 – Adding another categorical variable

Success: Uses either color or facets to incorporate species

## Option 1 -- Facets
ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype)) +
  geom_density_ridges() +
  facet_wrap(~species)

## Option 2 -- Colors
ggplot(data = trout, 
       mapping = aes(x = length_1_mm, y = unittype, fill = species)) +
  geom_density_ridges() 

Growing:

  • y-axis should say something about the type of channel

Feedback: It is important for the axis label to describe the variable that is being measured. What would be a good y-axis title for what the unittype variable measured?

Not marking down on axis label

If they got a growing on #8 for an axis label, they don’t get a growing here.

  • If doesn’t use facets or colors ::: callout-note # If they used color instead of fill

Feedback: The color aesthetic only colors the outside of the ridge plot. However, if you were to use the fill aesthetic the entire ridge would be filled with color! :::

Question 9 – Based on the plot, how different are the lengths between the channel types and forest sections?

Success: Must have all of the following

  • Comparisons between forest sections (centers, spreads, shapes)
  • Comparisons between channel types
  • State what channel types did not have both types of forest

Growing:

  • If they say the distributions are all fairly similar

Feedback: While I would agree that most of these distributions are fairly similar (they overlap a lot), there are a few channel types where the lengths of trout are quite different between clear cut and old growth forests. Which channel types are these? In these channel types, how do the distributions of fish lengths differ (e.g., are trout in the CC section larger?)?

  • Comparisons are incomplete

Feedback: Your comparison of these distributions should include similarities / differences in their centers and shapes.

  • Does not state what channel types did not have both types of forest

Feedback: Careful! Were there clear cut and old growth forests for every type of channel? If not, what channel types only had one section of forest?

Question 10 – Average length of Cutthroat trout between channel types

Success: Code should look similar to:

trout %>%
  group_by(unittype) %>%
  summarize(mean_length = mean(length_1_mm, na.rm = TRUE))
If they don’t name their summary statistics

Feedback: For Question 11, the output of your summary statistics looks a lot nicer if you give them names! (like the example that was given)

Growing: If process is not correct

Question 11 – Find the average length of Cutthroat trout between channel types and forest section

Success: Code should look similar to:

trout %>%
  group_by(unittype, section) %>%
  summarize(mean_length = mean(length_1_mm))

Growing: If process is not correct

Question 12 – Differences in averages compared to plot

Success:

  • States that the averages are all fairly similar (comparing the centers)
  • Connects averages with the skew seen in the visualizations

Growing:

  • If their statement doesn’t connect the means to the shape of the distributions seen in Question 8

Feedback: Note that in Question 11 you are specifically using the mean to summarize the center of the distribution. In Question 8 you saw the shape of each channel / forest section’s distribution. Based on what you saw, how do the shapes of the distributions influence the means you are seeing in Question 11?