Lab 8 Grading Guide

Question 1

To earn a Success: Creates a scatterplot with the following qualities:

  • life expectancy on y-axis
  • GDP per capita on the x-axis
  • axis labels for both axes, including units
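
For reference while grading, a Success-level plot might look like the sketch below. It assumes the ggplot2 and gapminder packages are installed. The object name q1_plot, the point transparency, and the exact label wording are illustrative choices, not requirements; the units are taken from the data documentation (?gapminder).

```r
library(ggplot2)
library(gapminder)

# Response (life expectancy) on the y-axis, explanatory (GDP per capita) on the x-axis
q1_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  labs(
    x = "GDP per capita (US$, inflation-adjusted)",
    y = "Life expectancy at birth (years)"
  )

q1_plot
```

Any label wording that names the variable and its unit should be accepted.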

Growing:

If they swap \(x\) and \(y\):

Careful! What variable should be the response variable (located on the y-axis)?

If their axis labels are missing or don’t include units:

Careful! Your axis labels should also include the unit each variable was measured in! If you are unsure what unit a variable was measured in, you can consult the data documentation using ?gapminder. :)

Question 2

To earn a Success:

  • takes log of GDP (using scale_x_log10())
  • renames x-axis label to indicate log GDP (or log $) is what is being plotted
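
A plot meeting both criteria might look like the following sketch, which again assumes ggplot2 and gapminder are installed; q2_plot and the label text are illustrative only. Note that scale_x_log10() uses base 10 while log() in R is the natural log. The two differ only by a constant factor, so the shape of the relationship is the same either way.

```r
library(ggplot2)
library(gapminder)

# Same scatterplot as before, with only the x-axis placed on a log scale
q2_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10() +
  labs(
    x = "GDP per capita (US$, inflation-adjusted, log scale)",
    y = "Life expectancy at birth (years)"
  )

q2_plot
```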

Growing:

If they log transform life expectancy (y):

Similar to the Principle of Parsimony, we want to transform as few variables as is necessary. Look back at the plot with just x log transformed. Is log transforming y and x notably better?

If their x-axis doesn’t indicate the log was taken:

Careful! Your axis label needs to indicate how the original variable was transformed!

Question 3

To earn a Success: Code should look like the following:

gapminder_lm <- lm(lifeExp ~ log(gdpPercap), data = gapminder)

Growing:

If they swap lifeExp and gdpPercap:

Careful! What variable comes first in the lm() function? The response or explanatory variable?

If they don’t use log(gdpPercap) when fitting their regression:

What variable did you decide to transform in #2? How should that variable appear in the regression model you are fitting here?

Question 4

To earn a Success:

  • Says it is not reasonable to assume the observations are independent

  • Describes how observations are not independent citing at least one of the following:

    • each country has repeated observations
    • observations for each country are related in time (temporal correlation)
    • countries of close geographical proximity may share information (spatial correlation)

Growing:

If their justification doesn’t talk about the context of the data (e.g., temporal relationships between observations):

Your justification needs to make direct reference to the context of the data.

If their reasoning doesn’t include any of the above justifications:

How many observations are there for each country? Are these observations related in some way? If so, how?

Question 5

To earn a Success:

  • Either says the condition is violated and justifies with the left skew of the distribution
  • Or says the condition is not violated and justifies with characteristics of the distribution
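
To check a student's description against the data, the residual distribution can be reproduced with a sketch like the one below. It uses base R's resid() rather than whatever helper the lab itself provides, and the names resid_df and resid_hist and the bin count are arbitrary choices made here.

```r
library(ggplot2)
library(gapminder)

# Model from Question 3
gapminder_lm <- lm(lifeExp ~ log(gdpPercap), data = gapminder)

# Collect the residuals in a data frame for plotting
resid_df <- data.frame(residual = resid(gapminder_lm))

# Histogram of residuals; residuals are in years, like the response
resid_hist <- ggplot(resid_df, aes(x = residual)) +
  geom_histogram(bins = 30) +
  labs(x = "Residual (years)", y = "Count")

resid_hist
```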

Growing:

If their justification is insufficient:

When evaluating conditions the choices are subjective, so it is necessary for you to justify WHY you made the decision you did. Your justification should make direct reference to characteristics of the distribution of residuals.

Question 6

To earn a Success:

  • Says the condition is violated
  • References how the residuals change / decrease for larger values of log(GDP)
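
If a numerical check is helpful alongside the residual plot, one option is to compare the spread of residuals across bins of fitted values. This is only a sketch: splitting into quartiles and using the standard deviation are choices made here, not part of the lab, and the variable names are illustrative.

```r
library(gapminder)

# Model from Question 3
gapminder_lm <- lm(lifeExp ~ log(gdpPercap), data = gapminder)
fitted_vals <- fitted(gapminder_lm)
resid_vals <- resid(gapminder_lm)

# Residuals versus fitted values, with a reference line at zero
plot(fitted_vals, resid_vals,
     xlab = "Fitted life expectancy (years)",
     ylab = "Residual (years)")
abline(h = 0, lty = 2)

# Split the fitted values into four equal-count bins (quartiles)
bins <- cut(
  fitted_vals,
  breaks = quantile(fitted_vals, probs = seq(0, 1, by = 0.25)),
  include.lowest = TRUE
)

# Standard deviation of the residuals within each bin
spread_by_bin <- tapply(resid_vals, bins, sd)
spread_by_bin
```

Roughly equal values across bins would support equal variance; values that shrink or grow from bin to bin point toward a violation.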

Growing:

If they say the condition is not violated:

Equal variance requires that the spread of residuals / the vertical width (e.g., going from -20 to +20) stays the same for ALL values of the explanatory variable (across the x-axis). Is that the case? Why or why not?

If they say the condition is / is not violated but reference the number of observations above and below the line:

Equal variance is not about having equal spread of points above and below the line; it is okay for there to be more residuals below the line compared to above the line. The key is that the spread of residuals / the vertical width (e.g., going from -20 to +20) stays the same for ALL values of the explanatory variable (across the x-axis). Is that the case? Why or why not?

If they say the condition is violated but have insufficient justification as to why:

When evaluating conditions the choices are subjective, so it is necessary for you to justify WHY you made the decision you did. Your justification should make direct reference to characteristics of the plot of the residuals versus fitted values. Specifically, we are evaluating if the spread of residuals / the vertical width (e.g., going from -20 to +20) stays the same for ALL values of the explanatory variable (across the x-axis). Is that the case? Why or why not?

Question 7

To earn a Success:

  • \(H_0\): there is no linear relationship between log GDP per capita and life expectancy
  • \(H_A\): there is a linear relationship between log GDP per capita and life expectancy

Growing:

If they say GDP instead of log GDP:

Careful! How did you transform your variable(s) in #2? What variables are you looking at the linear relationship between?

If they say there is a positive relationship in their alternative:

The standard hypothesis test for the slope uses a two-sided alternative hypothesis, unless we knew 100% going in that the relationship between \(x\) and \(y\) was positive. Did you know that the relationship was positive BEFORE you made your visualizations?

Question 8

To earn a Success: Code should look like the following:

obs_slope <- gapminder %>%
  specify(response = lifeExp, explanatory = log(gdpPercap)) %>%
  calculate(stat = "slope")

Growing:

If they swap lifeExp and gdpPercap:

Careful! What variable is your response variable?

If they use GDP instead of log(GDP) for their explanatory:

Careful! You need to be consistent with the transformation you decided in #2.

Question 9

To earn a Success: Code should look like the following:

null_dist <- gapminder %>%
  specify(response = lifeExp, explanatory = log(gdpPercap)) %>%
  hypothesise(null = "independence") %>% 
  generate(reps = 1000, type = "permute") %>% 
  calculate(stat = "slope")

Growing:

If they also don’t use the log in this step:

Update your code to use the same variable transformation that you used in #8.

Question 10

To earn a Success: Code should look like the following:

get_p_value(null_dist, 
            obs_stat = obs_slope, 
            direction = "two-sided")

The direction must be consistent with their alternative hypothesis.

If they said the relationship was positive in the alternative, then their direction should be "greater".

Growing:

If they don’t use a direction that is consistent with what they said in their alternative ("two-sided" or "greater"):

How many tails are there in the hypotheses stated in #7?

Question 11

To earn a Success: States they reject \(H_0\) because the p-value is less than 0.1

It is okay if they conclude that there is a relationship without explicitly stating a decision, but give the following feedback:

You were asked to make a decision regarding the hypotheses, which has two possible options. Which option do you choose and why?


Growing:

If they don’t reference their significance level:

Careful! Hypothesis test decisions can differ based on the significance threshold that was used. What threshold did you use?

Question 12

To earn a Success: Code should look like the following:

get_regression_table(gapminder_lm, conf.level = 0.9)

It’s okay if their conf.level isn’t 0.9

This doesn’t need to be consistent, since we aren’t making a confidence interval!

Question 13

To earn a Success: States that the p-value is essentially the same
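
To see the two p-values side by side, a sketch like the following can be used. It assumes the infer and gapminder packages are installed. Precomputing a log_gdp column is an assumption made here so the example does not depend on how specify() handles inline transformations; the seed and the names gap_log, sim_p, and theory_p are arbitrary. The theory-based p-value is read from summary() rather than get_regression_table().

```r
library(gapminder)
library(infer)

set.seed(2024)  # arbitrary seed so the permutations are reproducible

# Assumption: store the log-transformed explanatory variable as its own column
gap_log <- gapminder
gap_log$log_gdp <- log(gap_log$gdpPercap)

# Observed slope
obs_slope <- gap_log %>%
  specify(response = lifeExp, explanatory = log_gdp) %>%
  calculate(stat = "slope")

# Permutation null distribution of the slope
null_dist <- gap_log %>%
  specify(response = lifeExp, explanatory = log_gdp) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope")

# Simulation-based p-value
sim_p <- get_p_value(null_dist, obs_stat = obs_slope,
                     direction = "two-sided")$p_value

# Theory-based p-value for the slope from the fitted linear model
gapminder_lm <- lm(lifeExp ~ log_gdp, data = gap_log)
theory_p <- summary(gapminder_lm)$coefficients["log_gdp", "Pr(>|t|)"]

c(simulation = sim_p, theory = theory_p)
```

With a relationship this strong, the simulation-based p-value may be reported as 0, in which case infer warns that this is an approximation based on the number of reps.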

Question 14

To earn a Success: Answer must agree with what they said in #4-6

If they said equal variance and / or independence was violated, they must say that neither p-value is reliable.

If they said normality was violated but equal variance and independence were not violated, they must say that the simulation-based p-value is more reliable.

If they said none of the conditions were violated, they must say that both p-values are reliable.

Growing:

If they choose the wrong method:

Look back at the conditions required for each of these methods. Which conditions did you say were violated? What does that imply for the method(s) which give you the most reliable p-value?