<- obs_slope <- marsh_info %>%
obs_slope specify(response = water_temp,
explanatory = latitude) %>%
calculate(stat = "slope")
Lab 7 Grading Guide
Question 1
To earn a Success:
- includes either
site
orname
inside ofgroup_by()
If uses another variable (e.g., latitude
, water_temp
):
Your approach works, since there was only one measurement per site. But what if there wasn’t? This variable doesn’t uniquely define the location of each marsh! What variable(s) do?
Question 2
To earn a Success:
Creates scatterplot with:
latitude
on x-axiswater temperature
on y-axis- axis labels include variable units (Celsius for water temp; degrees for latitude)
They are not required to include a regression line, but if they do, great!
If they don’t have axis labels with the units:
It is important to include an axis label stating what the units of each variable are! If you are wondering what the units of water temperature are, you can look up the help file (using ?pie_crab)
If they only put temperature on their y-axis:
What specific variable is being plotted on the y-axis? Air temperature? Ground temperature? Water temperature? Be specific!
Question 3
To earn a Success:
Description includes all of the following:
- form of relationship (linear)
- direction of relationship (negative)
- strength of relationship (strong)
- location of any outliers (e.g., at a latitude of 45 degrees and 17.5 C temperature)
If they forget one of these:
Careful! You were asked to describe the form, direction, strength, and unusual points for the plot. Remember, it is important to explicitly state where you believe the outliers are, so the reader knows where to look!
If they don’t state where the outlier is:
It is important to explicitly state where you believe the outliers are, so the reader knows where to look!
Question 4
To earn a Success:
Code should look like the following:
Question 5
To earn a Success:
Code should look like the following:
<- marsh_info %>%
bootstrap_dist specify(response = water_temp,
explanatory = latitude) %>%
generate(reps = 500,
type = "bootstrap") %>%
calculate(stat = "slope")
Question 6
To earn a Success:
Code should look like the following:
visualise(bootstrap_dist) +
labs(x = "<SOME AXIS LABEL>")
Their axis label must describe what is being plotted (e.g., slope statistics, sample slope, etc.)
If they did not include an x-axis label:
It is important to include an axis label stating what is being plotted! What statistics are being plotted on this bootstrap distribution? Your axis label should tell the reader what statistics you are plotting!
If their x-axis label is not about the slope statistic:
What is being plotted in this distribution? What is it a distribution of? What statistics did you calculate in the previous step that are being plotted here?
If their y-axis label is something other than number of resamples:
Careful! This is a histogram of the bootstrap statistics that you found in #5. Think back to when we learned about histograms, what do the numbers (e.g., 25, 50, 75) represent? Hint: this is not a variable in the dataset!
Question 7
To earn a Success:
Code should look like the following:
get_confidence_interval(bootstrap_dist,
level = 0.90,
type = "percentile")
"percentile"
is the default
So, it is okay if they don’t specify it!
If they use a level
other than 0.90:
Look back at the lab instructions, what percentage confidence interval were you asked to construct?
Question 8
To earn a Success:
The interpretation must state:
- confidence: they are 90% confident
- statistic: the slope between water temperature and latitude
- population: for all marshes along the eastern US
- interval: is between [lower bound] and [upper bound]
Note that every group will get a different interval, due to the randomness of bootstrapping!
If they don’t state their confidence:
How much confidence do you have in your interval?
If they don’t state the statistic in context:
We need to be specific about the what parameter we believe is in our interval. The slope statistic is measuring the relationship between which variables?
If they don’t include a population:
What population does this interval apply to? Where were these marshes located? What population would you characterize these 13 marshes belonging to? That is the population your interval applies to!
If they have state an incorrect population:
Careful! Are you analyzing data on crabs or on marshes? Where were these marshes located? What population would you characterize these 13 marshes belonging to? That is the population your interval applies to!
Question 9
To earn a Success:
Code should look like the following:
get_confidence_interval(bootstrap_dist,
level = 0.90,
point_estimate = obs_slope,
type = "se")
If they don’t have type = "se"
:
What method do you want the function to use when calculating your confidence interval? Percentile intervals are the default!
If they type in the actual number for the estimated slope rather than using the obs_slope
object, you can give them this reminder:
It is better practice not to “hard-code” numbers in our code, but to reference values stored in objects. This makes our code more resistant to changes in the dataset.
Question 10
To earn a Success:
- states the intervals are similar
- states similarity is due to the normality of the bootstrap distribution
If they say the intervals are different:
Give these intervals another look, yes their endpoints differ by a bit, but are they vastly different intervals?
If they don’t talk about the bootstrap distribution being approximately normal:
What condition do we need to check when using the SE method to construct confidence intervals? Hint: it is a condition about the shape of the bootstrap distribution
Question 11
To earn a Success:
- states whether they do / do not believe the sample of 13 marshes is representative of all marshes along the Atlantic coast
- justifies why using something something that is true
I’m willing to be pretty flexible here, but the key aspect is that they need to justify their reasoning.
If their justification is about crabs:
Careful! Are you analyzing data on crabs or on marshes? The population of marshes you define in #8 is what you should use to assess if the assumption of bootstrapping is reasonable.
If their justification is not reasonable:
Bootstrapping assumes that the original sample is representative of the population from which the observations were sampled. Where were these 13 marshes located geographically? Do you believe that these 13 marshes are representative of the broader population they were sampled from?
If their justification is about the shape of the bootstrap distribution:
You are on the right track, we need for the sample to be representative of the population from which it was sampled. However, we cannot assess the representativeness of a sample using a bootstrap distribution. This can only be assessed by inspecting how the data were collected.