<- filter(and_vertebrates,
trout == "Cutthroat trout",
species > 101) length_1_mm
Lab 3: Grading Guide
Question 1 – Size of and_vertebrates
dataset
Success:
- has
glimpse()
somewhere in their code - states there are 32,209 rows and 16 columns
Growing:
- If no code is present
- If they flip the rows and columns
- If they do not provide size of data
Feedback for no code: You needed to write code to explore the size of the data and the types of variables.
Feedback for no / incorrect dimensions: Look again at the output of the glimpse()
function, how many rows and columns are included in the and_vertebrates
dataset?
Question 2 – Categorical variables in dataset
Success:
- Lists all variables with a
chr
data type:sitecode
section
reach
unittype
species
clip
notes
- Permitted to miss 1 variable
Growing:
- If misses 2 or more variables
- If includes date variable (
sampledate
)
Feedback: You need to be careful to include ALL of the variables that R believes are categorical. What data types did we say in class that categorical variables can have?
Question 3 – Levels of the species variable
Success
- Provides code that uses
distinct()
function - Inputs the
species
variable
Growing: Does not use distinct()
function
Feedback: There is a function which finds the distinct levels of a categorical variable. What function might that be?
Question 4 – Filtering to including only trout
Success: Code should look similar to:
The can use %>% or pass in and_vertebrates
as first argument
Growing: If filter()
is not correct
- using
<
instead of>
Feedback: Careful! What lengths of trout were you told to include in the dataset? We think really small fish are not catchable, so we decided to remove them from the dataset.
- using the wrong variable / value (e.g., wrong spelling of Cutthroat trout, putting 101 in quotations)
Feedback for spelling Cutthroat trout incorrectly: Careful! R is case sensitive, so we need to be sure that we use the same spelling and capitalization the researchers used. How did these researchers spell Cutthroat trout? The output of the distinct()
function should help you!
Feedback for using "101"
instead of 101
: Careful! We use quotations to indicate values of a categorical variable. Is length_1_mm
a categorical variable?
Question 5 – Distribution of trout lengths
Success: Acceptable geom
s:
geom_histogram()
geom_density()
Growing:
- Uses
geom_dotplot()
Feedback for using a dotplot: Careful! A dotplot doesn’t automatically resize the dots based on the number of observations you have. This makes it so your dots are running off the page! You can resize the dots using the dotsize argument, with a number smaller than 1 (e.g., 0.5).
- Uses
geom_boxplot()
Feedback for using a boxplot: For Question 6, we need to use a geom which allows for us to see the shape of the distribution. Boxplots hide distributions with multiple modes, so what type of plot would be better?
- Not including axis label with units (mm)
Feedback for not including axis labels with units: Every plot we make should have descriptive axis labels which include the units the variable was measured in. What unit were the length of each trout measured in?
- The y-axis label doesn’t reflect that we are plotting counts
Feedback: Great job changing your y-axis label! The label you chose makes it a bit unclear what is being plotted on the y-axis. Remember, the original label was “count,” so our label should say something about the number of observations (trout) that are being plotted.
Question 6 – Sources of variation
Success: Names three “reasonable” sources of variation in trout length
Growing: Names an unreasonable source of variation in trout lengths
Feedback: We are interested in variables that, if changed, we would expect the length of the Cutthroat trout to change.
Question 7 – Ridge plot
Success: Code should look like the following
ggplot(data = trout,
mapping = aes(x = length_1_mm, y = unittype)) +
geom_density_ridges() +
labs(x = "Length (mm)",
y = "Channel Section")
Growing:
- Doesn’t include units (mm) in x-axis label
Feedback: It is important for the axis label to contain the units of the variable. What units were the lengths measured in?
Question 8 – Adding another categorical variable
Success: Uses either color or facets to incorporate species
## Option 1 -- Facets
ggplot(data = trout,
mapping = aes(x = length_1_mm, y = unittype)) +
geom_density_ridges() +
facet_wrap(~species)
## Option 2 -- Colors
ggplot(data = trout,
mapping = aes(x = length_1_mm, y = unittype, fill = species)) +
geom_density_ridges()
Growing:
- y-axis should say something about the type of channel
Feedback: It is important for the axis label to describe the variable that is being measured. What would be a good y-axis title for what the unittype
variable measured?
If they got a growing on #8 for an axis label, they don’t get a growing here.
- If doesn’t use facets or colors ::: callout-note # If they used
color
instead offill
Feedback: The color
aesthetic only colors the outside of the ridge plot. However, if you were to use the fill
aesthetic the entire ridge would be filled with color! :::
Question 9 – Based on the plot, how different are the lengths between the channel types and forest sections?
Success: Must have all of the following
- Comparisons between forest sections (centers, spreads, shapes)
- Comparisons between channel types
- State what channel types did not have both types of forest
Growing:
- If they say the distributions are all fairly similar
Feedback: While I would agree that most of these distributions are fairly similar (they overlap a lot), there are a few channel types where the lengths of trout are quite different between clear cut and old growth forests. Which channel types are these? In these channel types, how do the distributions of fish lengths differ (e.g., are trout in the CC section larger?)?
- Comparisons are incomplete
Feedback: Your comparison of these distributions should include similarities / differences in their centers and shapes.
- Does not state what channel types did not have both types of forest
Feedback: Careful! Were there clear cut and old growth forests for every type of channel? If not, what channel types only had one section of forest?
Question 10 – Average length of Cutthroat trout between channel types
Success: Code should look similar to:
%>%
trout group_by(unittype) %>%
summarize(mean_length = mean(length_1_mm, na.rm = TRUE))
Feedback: For Question 11, the output of your summary statistics looks a lot nicer if you give them names! (like the example that was given)
Growing: If process is not correct
Question 11 – Find the average length of Cutthroat trout between channel types and forest section
Success: Code should look similar to:
%>%
trout group_by(unittype, section) %>%
summarize(mean_length = mean(length_1_mm))
Growing: If process is not correct
Question 12 – Differences in averages compared to plot
Success:
- States that the averages are all fairly similar (comparing the centers)
- Connects averages with the skew seen in the visualizations
Growing:
- If their statement doesn’t connect the means to the shape of the distributions seen in Question 8
Feedback: Note that in Question 11 you are specifically using the mean to summarize the center of the distribution. In Question 8 you saw the shape of each channel / forest section’s distribution. Based on what you saw, how do the shapes of the distributions influence the means you are seeing in Question 11?