ggplot(data = ntl_icecover,
mapping = aes(y = ice_duration, x = year)) +
geom_point() +
labs(x = "Year",
y = "Ice Duration (days)")
Lab 4: Grading Guide
Question 1 – Size of ntl_icecover
dataset
Success:
- has
glimpse()
somewhere in their code - states there are 334 rows and 5 columns
Growing:
- If no code is present
- If they provide incorrect size of data
If they say there are 431 rows, they got their answer from the help file!
Feedback: Careful! The best way to obtain information about the dataset is using the glimpse() function!
Question 2 – Scatterplot
Success: Code should look like the following
Growing:
- Doesn’t change y-axis labels
Feedback: You were asked to give your visualization nice axis labels. Also, it is important for the axis label to contain the units of the variable. What unit was ice duration measured in?
- Swaps the x- and y-variable
Feedback: Careful! Which variable is supposed to be the response?
Question 3 – Describe the regression line
Success: Addresses form, direction, strength and unusual points.
- Form: Linear
- Direction: Negative
- Strength: Moderate
- Outliers: Maybe the 2001 observation with ~20 days of ice?
Growing:
- If they don’t discuss one of the for aspects.
Feedback: You were asked to describe the form, direction, strength, and unusual points. Make sure you discuss all four aspects!
- If they say there are unusual points / outliers but don’t say where. The description should point to the exact x and y location (not just one or the other).
Feedback: If you believe there are unusual observations you need to tell the reader where these points are, which includes both their x- and y-coordinates!
Question 4 – Add a regression line
Success:
- Includes regression line with
geom_smooth()
andmethod = "lm"
option
They don’t need to turn the standard errors off (se = FALSE
), but that’s fine if they did!
Growing:
- If they do not add regression line
Feedback: For this question you needed to add a regression line to the previous scatterplot!
- If they do not use
method = "lm"
to get straight line
Feedback: You need to have a straight line, not a wiggly line! Take a look at the R resources to see how you specify a straight line!
Question 5 – Find correlation
Success: Code should look like the following
get_correlation(data = ntl_icecover,
~ year,
ice_duration na.rm = TRUE)
Growing:
- Doesn’t remove NAs
Feedback: It is important to remove the NAs before calculation the correlation! Is there another argument (input) for the get_correlation()
function that allows you to remove the missing values? Hint: it looks like the option we used to remove missing values in the mean()
function in Week 3!
Question 6 – Fit linear regression
Success: Code should look like the following
<- lm(ice_duration ~ year, data = ntl_icecover) my_model
Growing:
- Swaps response and explanatory
Feedback: Careful! With the lm()
function the response variable goes first (before the ~
) and the explanatory comes second (after the ~
). Which variable were you instructed to use as the response?
When you make this change, your coefficient table (and resulting interpretations) will change! These changes to Questions 8, 9, 10, and 11 should be included in your revisions.
Question 7 – Get regression table
Success: Code should look like the following
get_regression_table(my_model)
Growing:
- Doesn’t use the
get_regression_table()
function
Feedback: We want out model output to look tidy, which is why we use the get_regression_table()
function!
Question 8 – Write out regression equation
Success:
\[ \widehat{\text{ice duration}} = 495 - 0.203 \times (\text{year})\]
- Indicates the response (ice duration) is estimated / predicted / the mean with either a hat or with language
- Inputs values in correct location
- References
year
andice duration
(not x and y)
Growing:
- Does not indicate the response (ice duration) is estimated / predicted / the mean
Feedback: Remember, the estimated regression equation has a hat over y. What does that hat mean? How can you incorporate that language into your interpretation?
- Uses x and y instead of variable names
Feedback: When writing out the regression equation, you need to reference the context of the line that was found. What variable was the response? What variable was the explanatory?
If they use x and y in the equation but then define what variables they are associated with, that is okay!
Question 9 – Interpret slope
Success:
- Correct interpretation of slope for 1 year increase
- 1 year increase
- mean / predicted / estimated ice duration
- decreases by 0.203 days
Growing:
- Doesn’t interpret the value of -0.203
Feedback: Careful! You did not interpret what the value of -0.203 means in the context of the regression line!
- States the duration of ice increases
Feedback: Look at the sign associated with the slope coefficient! What does the sign tell you about whether ice duration should increase or decrease?
- Doesn’t indicate the response is mean / estimate / predicted ice duration
Feedback: Careful! Your interpretation sounds like it is guaranteed that every 1 year increase in time will result in exactly a 0.203 day decrease in ice duration. Will there always be exactly this decrease? How do we indicate that our slope estimate is not exact?
- Uses the word “cause”
Feedback: The word “cause” in statistics means a very specific thing. We can only use cause and effect statements in an experiment. Was this study an experiment? If not, what word would be more appropriate to use?
Question 10 – 100 Year Increase
- Correct interpretation of slope for 100 year increase
- 100 year increase
- mean / predicted / estimated / approximate ice duration
- decreases by 20.3 days
Growing:
- Uses y-intercept when increasing 100 years
Feedback: Careful! Look back at your interpretation for Question 9. Did you include the y-intercept when you interpreted the value of the slope? How can you translate what you did in Question 9 to an increase of 100 years instead of 1 year?
- Don’t provide an estimate of the change
Feedback: You need to use your estimated regression equation to state what the expected change in the ice duration would be if the number of years was increased by 100.
Question 11 – Multivariate plot
Success: Code should look like the following
ggplot(data = ntl_icecover,
mapping = aes(y = ice_duration, x = year, color = "lakeid")) +
geom_point() +
geom_smooth(method = "lm") +
labs(x = "Year",
y = "Ice Duration (days)")
If they got a Growing for Question 4 from using a wiggly line, they don’t need to get a Growing here!
If they got a Growing for Question 2 for forgetting units on the y-axis, they don’t need to get a Growing here!
Growing:
- If they use
fill
instead ofcolor
Feedback: I know it is confusing, but to change the color of the points and the lines in a scatterplot we need to use the color
aesthetic, not the fill
.