gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent, gdpPercap)
Question 8 (Hypothesis Test Conclusion)
If you rejected the null hypothesis, there is evidence of a linear relationship between the size of a crab and the latitude in which it lives.
If you failed to reject the null hypothesis, there is insufficient evidence of a linear relationship between the size of a crab and the latitude in which it lives.
Notice how both interpretations are in terms of the alternative hypothesis?
If you always write your conclusion in terms of \(H_A\), then you will never accidentally “accept” the null hypothesis.
Question 11 (Interpret Confidence Interval)
We need to be specific about what parameter we believe is in our interval. The slope statistic is measuring the relationship between which variables?
We’re analyzing the linear relationship between the size of a crab and the latitude in which it lives!
What population does this interval apply to? Where were these crabs sampled from? That is the population your interval applies to!
The fiddler crabs were sampled from 13 salt marshes, but where were these salt marshes located? What population might they belong to?
Question 12 (Apply Confidence Interval)
Bergmann’s Rule suggests that organisms are larger at higher latitudes (farther from the equator).
Does your interval suggest this is true for fiddler crabs?
Step 1 - Due by tonight!
Step 2 - Due by Thursday
Step 3 - Due by Sunday
Compares the means of three or more groups to detect whether the group means differ.
In my mind, there are two different options for incorporating an ANOVA into your analysis.
Option 1:
There is a categorical variable with 3+ groups that you would like to use for your regression.
Option 2:
There is a numerical variable that you would like to use to create 3+ groups to incorporate into your regression.
Let’s check out what I mean!
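As a rough sketch of Option 2, base R's cut() can turn a numerical variable into a factor with 3+ levels. The data frame, the age values, the break points, and the labels below are all made up for illustration; they are not the course data, and they are not necessarily the cutoffs behind age_cat.

```r
# Hypothetical ages; not the course data
faculty <- data.frame(age = c(29, 35, 41, 48, 52, 57, 63, 70))

# Cut the numerical variable into four bins (break points are assumptions)
faculty$age_cat <- cut(
  faculty$age,
  breaks = c(-Inf, 40, 50, 60, Inf),
  labels = c("40 and under", "41-50", "51-60", "over 60")
)

# Count how many observations land in each group
table(faculty$age_cat)
```

By default cut() makes intervals closed on the right, so an age of exactly 40 falls in the first group.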
Independence Violations
In Lab 8 you should have found that each country has multiple observations, which are not independent. One way to get around this is to filter() the data to use only one year.
Independence Violations
Last week we noticed that there are multiple observations for each faculty member, which are not independent. One way to get around this is to collapse these multiple observations into a single number.
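One way to picture "collapsing" is to keep a single summary per professor. The sketch below keeps each professor's minimum score, on the assumption that this is what the response name min_eval refers to. The data frame and the column names prof_id and score are hypothetical, and base R's aggregate() is used only so the example needs no extra packages.

```r
# Hypothetical course-level evaluation scores; several rows per professor
evals <- data.frame(
  prof_id = c(1, 1, 1, 2, 2, 3, 3, 3),
  score   = c(4.1, 3.8, 4.5, 3.2, 3.9, 4.7, 4.4, 4.6)
)

# Collapse to one row per professor by keeping their minimum score
prof_min <- aggregate(score ~ prof_id, data = evals, FUN = min)
names(prof_min)[2] <- "min_eval"
prof_min
```

After collapsing, each professor contributes exactly one row, so repeated measurements on the same person no longer appear as separate observations.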
We want visualizations that allow us to easily compare:
What can you say about the differences between the age groups?
What can you say about the variability within the age groups?
This ignores the groups and finds one overall mean across all observations!
Step 5: Calculate the sum of squares within groups
Step 6: Calculate the F-statistic
Step 7: Find the p-value
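The steps above can be sketched by hand in base R. This assumes the earlier steps produced the overall mean, the group means, and the sum of squares between groups. The scores and group labels are invented for illustration and are not the course data.

```r
# Hypothetical scores for three groups; not the course data
scores <- c(4.1, 3.8, 4.5, 3.2, 3.9, 3.5, 4.7, 4.4, 4.6)
group  <- factor(rep(c("a", "b", "c"), each = 3))

n <- length(scores)   # total number of observations
k <- nlevels(group)   # number of groups

grand_mean  <- mean(scores)                   # one mean, ignoring groups
group_means <- tapply(scores, group, mean)    # one mean per group
group_sizes <- tapply(scores, group, length)  # observations per group

# Sum of squares between groups: group means around the overall mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Sum of squares within groups: observations around their own group mean
ss_within <- sum((scores - ave(scores, group))^2)

# F-statistic: ratio of the two mean squares
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))

# Theory-based p-value: upper-tail area of the F-distribution
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The same two numbers should appear in the first row of anova(lm(scores ~ group)), which is a quick way to check the hand calculation.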
F-distribution
An \(F\)-distribution is closely related to the \(t\)-distribution (squaring a \(t\)-statistic gives an \(F\)-statistic with 1 numerator degree of freedom), and is also defined by degrees of freedom.
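This distribution is defined by two different degrees of freedom: the numerator (between-group) degrees of freedom, \(k - 1\), and the denominator (within-group) degrees of freedom, \(n - k\), where \(k\) is the number of groups and \(n\) is the number of observations.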
Let’s play around and see how these two degrees of freedom change the shape of the F-distribution:
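A minimal way to explore this outside the slides is to overlay a few F densities with base R's df(). The pair (3, 90) matches the degrees of freedom in the ANOVA table later in this section; the other two pairs are arbitrary choices for comparison.

```r
# Grid of F values (an F-statistic cannot be negative)
x <- seq(0.01, 5, length.out = 200)

# Densities for a few (df1, df2) pairs; pairs other than (3, 90) are arbitrary
dens <- cbind(
  "df1 = 1, df2 = 10"  = df(x, df1 = 1,  df2 = 10),
  "df1 = 3, df2 = 90"  = df(x, df1 = 3,  df2 = 90),
  "df1 = 10, df2 = 90" = df(x, df1 = 10, df2 = 90)
)

# Overlay the curves to compare their shapes
matplot(x, dens, type = "l", lty = 1, col = 1:3,
        xlab = "F", ylab = "density", ylim = c(0, 1.2))
legend("topright", legend = colnames(dens), lty = 1, col = 1:3)
```

Changing the numbers passed to df1 and df2 and re-running the plot shows how each degree of freedom moves the peak and the right tail.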
Do you always use an F-distribution to get the p-value?
NO!
Observations are independent within groups and between groups
The distribution of residuals for each group is approximately normal
The spread of the distributions is similar across groups
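The last two conditions can be checked from the data; independence has to be judged from how the data were collected. The sketch below uses invented scores and groups, not the course data, and shows one possible set of checks rather than the only one.

```r
# Hypothetical scores for three groups; not the course data
scores <- c(4.1, 3.8, 4.5, 3.2, 3.9, 3.5, 4.7, 4.4, 4.6)
group  <- factor(rep(c("a", "b", "c"), each = 3))

# Similar spread: compare the standard deviation of each group
group_sds <- tapply(scores, group, sd)
group_sds
max(group_sds) / min(group_sds)   # ratio of largest to smallest SD

# Approximate normality: residuals are observations minus their group mean
resids <- scores - ave(scores, group)
qqnorm(resids)
qqline(resids)
```

With real data, side-by-side boxplots or density plots of the response by group give a visual version of the same spread comparison.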
Which condition(s) are required to use “theory-based” methods?
All three!
Which condition(s) are required to use “simulation-based” methods?
All but normality!
What do you think? Which method should we use?
Response: min_eval (numeric)
Explanatory: age_cat (factor)
# A tibble: 1 × 1
stat
<dbl>
1 1.41
How could we use cards to simulate what minimum evaluation score a professor would have gotten, if their score was independent from their age?
Why doesn’t the distribution have negative numbers?
For a p-value of 0.24, what decision would you reach regarding your hypothesis test?
What would you conclude regarding the mean minimum evaluation score for different age groups of faculty?
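The card-shuffling idea can be sketched in base R: write each score on a card, shuffle the cards, deal them back out to the group labels, and recompute the statistic each time. The data below are invented, the number of shuffles (1000) and the seed are arbitrary, and this is not necessarily the set of functions used in class.

```r
set.seed(1)  # arbitrary seed so the shuffles are reproducible

# Hypothetical scores for three groups; not the course data
scores <- c(4.1, 3.8, 4.5, 3.2, 3.9, 3.5, 4.7, 4.4, 4.6)
group  <- factor(rep(c("a", "b", "c"), each = 3))

# Helper (hypothetical name): F-statistic from a one-way ANOVA
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

# Statistic for the data as observed
obs_f <- f_stat(scores, group)

# Shuffle the "cards" (scores) across the group labels many times
null_f <- replicate(1000, f_stat(sample(scores), group))

# Proportion of shuffled statistics at least as large as the observed one
p_value <- mean(null_f >= obs_f)
p_value
```

Shuffling breaks any link between score and group, so null_f shows which F-statistics are typical when the two really are independent. Every value in null_f is a ratio of two non-negative quantities, so the simulated distribution starts at zero.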
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
age_cat | 3 | 1.24198 | 0.4139932 | 1.412556 | 0.2443174 |
Residuals | 90 | 26.37728 | 0.2930808 | NA | NA |
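The columns of this table are tied together by simple arithmetic, which can be checked directly. The sketch below copies the df and sumsq values from the table and recomputes the remaining columns; the variable names are made up for this example.

```r
# Values copied from the df and sumsq columns of the table
df_group <- 3;  ss_group <- 1.24198
df_resid <- 90; ss_resid <- 26.37728

# Mean square = sum of squares / degrees of freedom
ms_group <- ss_group / df_group
ms_resid <- ss_resid / df_resid

# F-statistic = ratio of the two mean squares
f_stat <- ms_group / ms_resid

# p-value = upper-tail area of the F-distribution with 3 and 90 df
p_value <- pf(f_stat, df1 = df_group, df2 = df_resid, lower.tail = FALSE)

round(c(meansq_group = ms_group, meansq_resid = ms_resid,
        statistic = f_stat, p.value = p_value), 4)
```

Up to rounding, the results should match the meansq, statistic, and p.value columns.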
Is this the same statistic as before?
What distribution was used to calculate the p.value?
For a p-value of 0.244, what decision would you reach regarding your hypothesis test?
What would you conclude regarding the mean minimum evaluation score for different age groups of faculty?
Did the two methods yield different results?
What does that imply about the normality condition?
Evaluate the within-group independence condition
Evaluate the between-group independence condition
Outline the two questions your research seeks to address.