Don’t forget reflections!
If your reflections are not present by the deadline for revisions, your revisions are not eligible to be regraded. Please don’t forget your reflections!
Approximate the variability you’d expect to see in other samples!
Bootstrapping!
A Bootstrap Resample
We can use the statistics from these bootstrap resamples to approximate the true sampling distribution!
Estimating a population parameter
Capture a range of plausible values for the population parameter.
Are more likely to capture the population parameter than a point estimate.
How do I get this plausible range of values?
Bootstrapping!
This is your bootstrap resample.
These are your bootstrap statistics
Bootstrap Distribution
a distribution of the bootstrap statistics from every bootstrap resample
Displays the variability in the statistic that could have happened with repeated sampling.
Approximates the true sampling distribution!
The relationship between penguin’s bill length and body mass for all penguins in the Palmer Archipelago
\[\widehat{\text{bill length}} = 26.899 + 0.004 \times \text{body mass}\]
What slope could have happened in a different sample of penguins?
Generating bootstrap resamples and calculating bootstrap statistics
Step 1: specify()
your response and explanatory variables
Step 2: generate()
bootstrap resamples
Step 3: calculate()
the statistic of interest
Step 1: Specify your variables!
Step 2: Generate your resamples!
reps
– the number of resamples you want to generate
"bootstrap"
– the method that should be used to generate the new samples
Your turn!
Why do we resample with replacement when creating a bootstrap distribution?
When we resample with replacement from our original sample what are we assuming about the original sample?
Step 3: Calculate your statistics!
"slope"
– the stat
istic of interest
The final product
A plausible range of values for \(\beta_1\)
The 95% confidence interval is…
Lower Bound | Upper Bound |
---|---|
0.00351 | 0.00451 |
What do we hope is captured by this interval?
How do we interpret this interval?
“We are 95% confident the slope of the relationship between bill length and body mass for all Adelie, Chinstrap, and Gentoo penguins in the Palmer Archipelago is between 0.00355 and 0.00453.
“For every 1000 gram (~2.2lbs) increase in a penguin’s body mass, we are 95% confident the length of an Adelie, Chinstrap, or Gentoo penguins’s bill will increase between 3.55 and 4.53mm.
Classic interpretation mistakes
“There is a 95% probability that \(\beta_1\) is between 0.00355 and 0.00453.”
“We are 95% confident the sample statistic is in our interval.”
The original sample of 344 penguins were broken down into the following groups (plus 11 NA
values):
Sex | Sample Size |
---|---|
female | 165 |
male | 168 |
Before we resampled with replacement 344 times.
If we resample with replacement 333 times, are we guaranteed to get 165 female penguins and 168 male penguins in each sample?
Step 1: Fitting our Model
Syntax changes
When we have multiple explanatory variables, we need to use the “tilde” (~
) syntax to specify our model. We also use the fit()
function instead of the calculate()
function.
Step 2: Finding our Statistic
term | estimate |
---|---|
intercept | 25.5713814 |
body_mass_g | 0.0042787 |
sexmale | 5.5160611 |
body_mass_g:sexmale | -0.0010301 |
Which statistic measures how different the slopes are between male and female penguins?
Lower Bound | Upper Bound |
---|---|
-0.0019812 | -0.0001181 |
We are 90% confident that the for every 1,000 gram increase in a penguin’s body mass (~2.2lbs), the length of a male penguin’s bill is between 0.118 and 1.98mm shorter than a female penguin’s bill.