Week 8
Week 9
Week 10
Revision Deadlines
If you did not submit revisions by the deadline (or forgot to include your reflections), your assignment is not eligible for additional revisions.
you…
What if I want to know if the population parameter differs from a specific value?
Hypothesis test!
Hypothesis test goal:
Assess how different what we saw in our data is from what could have happened if the null hypothesis was true*
*For hypothesis tests, we live in an alternative universe where \(H_0\) is true
Permutation!
Assumes the original sample is “representative” of observations in the population
Uses the original sample to generate new samples that might have occurred if the null hypothesis was true.
We can use statistics from these resamples to approximate the true sampling distribution under the null!
Like before, we are interested in knowing how a statistic varies from sample to sample.
Knowing a statistic’s behavior helps us make better / more informed decisions!
This helps us know what statistics are more or less likely to occur if the null hypothesis is true.
Quantify how “surprising” what we saw in our data is, if the null hypothesis was true
Permuting!
From your original sample, separate the \(x\) values from the \(y\) values.
Create new ordered pairs by randomly pairing \(x\) values with \(y\) values (permuting the labels).
This is your permuted resample.
These are your permuted statistics.
definition: a distribution of the permuted statistics from every permuted resample
Displays the variability in the statistic that could have happened with repeated sampling, if the null hypothesis was true.
Approximates the true sampling distribution under the null!
How do I get my p-value?
Compare the observed statistic with the statistics produced assuming the null hypothesis was true.
A p-value summarizes the probability of obtaining a sample statistic as or more extreme than what we observed, if the null hypothesis was true.
Your turn!
What is one similarity and one difference between
a permutation distribution
a bootstrap distribution
hbr_maples
dataset!stem_length
: a number denoting the height of the seedling in millimeters
stem_dry_mass
: a number denoting the dry mass of the stem in grams
What condition do we need to be worried about?
\[\widehat{\text{stem dry mass}} = -0.043 + 0.001 \times \text{stem length}\]
What slope could have happened if there was no relationship between stem length and stem dry mass?
Step 1: specify()
your response and explanatory variables
Step 2: hypothesize()
what would happen under the null
Step 3: generate()
permuted resamples
Step 4: calculate()
the statistic of interest
"independence"
– the assumed relationship between the explanatory and response variables under the null hypothesis
Independence of variables
Note! This is different from assuming your observations are independent!
reps
– the number of resamples you want to generate
"permute"
– the method that should be used to generate the new samples
infer
will let you know if you missed something!
Error: Permuting should be done only when doing independence hypothesis test. See `hypothesize()`.
In addition: Warning message:
You have given `type = "permute"`, but `type` is expected to be `"bootstrap"`.
This workflow is untested and the results may not mean what you think they mean.
Why is the hypothesize()
function used to make a null distribution but not for a bootstrap distribution?
What does the null = "independence"
input in hypothesize()
mean? What is it assuming about the variables declared in the specify()
step?
Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
â„ą See `get_p_value()` (`?infer::get_p_value()`) for more information.
# A tibble: 1 Ă— 1
p_value
<dbl>
1 0
Why did we get a warning?
Need:
The probability of observing a slope statistic (for the relationship between stem length and stem dry mass) as or more extreme than what was observed is less than 1 in 1000, if there was no relationship between a sugar maple’s stem length and stem dry mass.
“The probability that the null hypothesis is true is about 0%.”
You assume the null is true when you calculate a p-value!
You must justify why you believe you can infer onto a population larger than your sample!
Causal statements do not come from random / representative samples!
Excellent Project | Satisfactory Project | Progressing Project | No Credit |
---|---|---|---|
At most one section is marked “Satisfactory” the remainder are marked “Excellent” | At most one section is marked “Progressing” the remainder are marked “Satisfactory” or “Excellent” | At most two sections are marked “Progressing” the remainder are marked “Satisfactory” | At most one section is marked “Satisfactory” or “Excellent” the remainder are marked “Progressing” |