Lab 4: Grading Guide

Question 1 – Size of ntl_icecover dataset

Success:

  • has glimpse() somewhere in their code
  • states there are 334 rows and 5 columns

Growing:

  • If no code is present
  • If they provide an incorrect size for the data

Note

If they say there are 431 rows, they got their answer from the help file!

Feedback: Careful! The best way to obtain information about the dataset is to use the glimpse() function!
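For graders who want to confirm the expected size themselves, a minimal sketch follows. It assumes ntl_icecover is loaded from the lterdatasampler package and that glimpse() comes from dplyr; the lab may load the data differently.

```r
library(dplyr)            # provides glimpse()
library(lterdatasampler)  # assumption: source of ntl_icecover

# glimpse() prints the number of rows and columns at the top of its output
glimpse(ntl_icecover)

# dim() returns c(rows, columns) for a quick cross-check
data_size <- dim(ntl_icecover)
data_size
```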

Question 2 – Scatterplot

Success: Code should look like the following

ggplot(data = ntl_icecover, 
       mapping = aes(y = ice_duration, x = year)) +
  geom_point() +
  labs(x = "Year", 
       y = "Ice Duration (days)")

Growing:

  • Doesn’t change y-axis labels

Feedback: You were asked to give your visualization nice axis labels. Also, it is important for the axis label to contain the units of the variable. What unit was ice duration measured in?

  • Swaps the x- and y-variable

Feedback: Careful! Which variable is supposed to be the response?

Question 3 – Describe the regression line

Success: Addresses form, direction, strength and unusual points.

  • Form: Linear
  • Direction: Negative
  • Strength: Moderate
  • Outliers: Maybe the 2001 observation with ~20 days of ice?

Growing:

  • If they don’t discuss one of the four aspects.

Feedback: You were asked to describe the form, direction, strength, and unusual points. Make sure you discuss all four aspects!

  • If they say there are unusual points / outliers but don’t say where. The description should point to the exact x and y location (not just one or the other).

Feedback: If you believe there are unusual observations you need to tell the reader where these points are, which includes both their x- and y-coordinates!

Question 4 – Add a regression line

Success:

  • Includes regression line with geom_smooth() and method = "lm" option

Note

They don’t need to turn the standard errors off (se = FALSE), but that’s fine if they did!

Growing:

  • If they do not add regression line

Feedback: For this question you needed to add a regression line to the previous scatterplot!

  • If they do not use method = "lm" to get straight line

Feedback: You need to have a straight line, not a wiggly line! Take a look at the R resources to see how you specify a straight line!
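For reference, one way a successful answer could look is sketched below. It reuses the Question 2 scatterplot and adds geom_smooth(method = "lm"); loading ntl_icecover from lterdatasampler is an assumption, and student code may differ in details such as se = FALSE.

```r
library(ggplot2)
library(lterdatasampler)  # assumption: source of ntl_icecover

p <- ggplot(data = ntl_icecover,
            mapping = aes(y = ice_duration, x = year)) +
  geom_point() +
  # method = "lm" draws a straight least-squares line;
  # without it, geom_smooth() fits a curved smoother
  geom_smooth(method = "lm") +
  labs(x = "Year",
       y = "Ice Duration (days)")

p
```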

Question 5 – Find correlation

Success: Code should look like the following

get_correlation(data = ntl_icecover, 
                ice_duration ~ year, 
                na.rm = TRUE)

Growing:

  • Doesn’t remove NAs

Feedback: It is important to remove the NAs before calculating the correlation! Is there another argument (input) for the get_correlation() function that allows you to remove the missing values? Hint: it looks like the option we used to remove missing values in the mean() function in Week 3!
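To illustrate why the missing values matter, here is a small base-R sketch with made-up vectors (not the lab data). It shows a single NA propagating through mean() and cor(). Note that base cor() handles missing values through its use argument rather than na.rm; the na.rm = TRUE argument of get_correlation() plays the analogous role.

```r
# Toy vectors, made up for illustration only
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 4, 5)

mean_with_na    <- mean(x)                # NA, because one value is missing
mean_without_na <- mean(x, na.rm = TRUE)  # 2.5, the mean of the four observed values

cor_with_na    <- cor(x, y)                        # NA for the same reason
cor_without_na <- cor(x, y, use = "complete.obs")  # uses only the complete pairs
```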

Question 6 – Fit linear regression

Success: Code should look like the following

my_model <- lm(ice_duration ~ year, data = ntl_icecover)

Growing:

  • Swaps response and explanatory

Feedback: Careful! With the lm() function the response variable goes first (before the ~) and the explanatory comes second (after the ~). Which variable were you instructed to use as the response?

When you make this change, your coefficient table (and resulting interpretations) will change! These changes to Questions 8, 9, 10, and 11 should be included in your revisions.
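The effect of swapping the variables can be sketched with simulated data. The toy data frame below is generated from the guide's fitted equation plus arbitrary noise (the noise level and year range are assumptions), so the numbers will not match the lab output; the point is only that the two formulas give different coefficient tables.

```r
set.seed(1)

# Simulated stand-in for the lab data (assumed year range and noise level)
year <- 1900:1999
ice_duration <- 495 - 0.203 * year + rnorm(100, sd = 15)
toy <- data.frame(year, ice_duration)

# Response before the ~, explanatory after
correct_model <- lm(ice_duration ~ year, data = toy)

# Swapped: year is now treated as the response
swapped_model <- lm(year ~ ice_duration, data = toy)

coef(correct_model)  # slope is in days of ice per year
coef(swapped_model)  # slope is in years per day of ice, a different quantity
```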

Question 7 – Get regression table

Success: Code should look like the following

get_regression_table(my_model)

Growing:

  • Doesn’t use the get_regression_table() function

Feedback: We want our model output to look tidy, which is why we use the get_regression_table() function!

Question 8 – Write out regression equation

Success:

\[ \widehat{\text{ice duration}} = 495 - 0.203 \times (\text{year})\]

  • Indicates the response (ice duration) is estimated / predicted / the mean with either a hat or with language
  • Inputs values in correct location
  • References year and ice duration (not x and y)

Growing:

  • Does not indicate the response (ice duration) is estimated / predicted / the mean

Feedback: Remember, the estimated regression equation has a hat over y. What does that hat mean? How can you incorporate that language into your interpretation?

  • Uses x and y instead of variable names

Feedback: When writing out the regression equation, you need to reference the context of the line that was found. What variable was the response? What variable was the explanatory?

Note

If they use x and y in the equation but then define what variables they are associated with, that is okay!

Question 9 – Interpret slope

Success:

  • Correct interpretation of slope for 1 year increase
    • 1 year increase
    • mean / predicted / estimated ice duration
    • decreases by 0.203 days

Growing:

  • Doesn’t interpret the value of -0.203

Feedback: Careful! You did not interpret what the value of -0.203 means in the context of the regression line!

  • States the duration of ice increases

Feedback: Look at the sign associated with the slope coefficient! What does the sign tell you about whether ice duration should increase or decrease?

  • Doesn’t indicate the response is mean / estimate / predicted ice duration

Feedback: Careful! Your interpretation sounds like it is guaranteed that every 1 year increase in time will result in exactly a 0.203 day decrease in ice duration. Will there always be exactly this decrease? How do we indicate that our slope estimate is not exact?

  • Uses the word “cause”

Feedback: The word “cause” in statistics means a very specific thing. We can only use cause and effect statements in an experiment. Was this study an experiment? If not, what word would be more appropriate to use?

Question 10 – 100 Year Increase

Success:

  • Correct interpretation of slope for 100 year increase
    • 100 year increase
    • mean / predicted / estimated / approximate ice duration
    • decreases by 20.3 days

Growing:

  • Uses y-intercept when increasing 100 years

Feedback: Careful! Look back at your interpretation for Question 9. Did you include the y-intercept when you interpreted the value of the slope? How can you translate what you did in Question 9 to an increase of 100 years instead of 1 year?

  • Doesn’t provide an estimate of the change

Feedback: You need to use your estimated regression equation to state what the expected change in the ice duration would be if the number of years was increased by 100.
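The arithmetic behind the expected answer can be checked with a short sketch using the coefficients from Question 8. The two years plugged in are arbitrary choices for illustration; any pair of years 100 apart gives the same difference, which is why the y-intercept plays no role.

```r
# Coefficients from the estimated regression equation in Question 8
intercept <- 495
slope <- -0.203

# Predicted ice duration (days) for a given year
predict_ice <- function(year) intercept + slope * year

# Change for a 100 year increase: the slope scaled by 100 (about -20.3 days)
change_100 <- slope * 100

# Same result from two predictions 100 years apart; the intercept cancels
diff_100 <- predict_ice(2000) - predict_ice(1900)
```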

Question 11 – Multivariate plot

Success: Code should look like the following

ggplot(data = ntl_icecover, 
       mapping = aes(y = ice_duration, x = year, color = lakeid)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Year", 
       y = "Ice Duration (days)")

Tip

If they got a Growing for Question 4 from using a wiggly line, they don’t need to get a Growing here!

If they got a Growing for Question 2 for forgetting units on the y-axis, they don’t need to get a Growing here!

Growing:

  • If they use fill instead of color

Feedback: I know it is confusing, but to change the color of the points and the lines in a scatterplot we need to use the color aesthetic, not the fill.