Week 9: Inference for Many Means (ANOVA)
Welcome!
In this week’s coursework we are transitioning into our last topic of the quarter—comparing multiple means. These comparisons have a specific name, ANalysis of VAriance (ANOVA).
We are going to use all of the simulation-based methods we learned previously for this new context. An ANOVA relies on a new statistic, the \(F\)-statistic, to summarize how different 3 or more means are from each other.
The primary focus of an ANOVA is to detect if the means of 3 or more groups are different. Because of this we focus on hypothesis tests for a difference in the group means and do not use confidence intervals. We will generate permutation distributions of \(F\)-statistics that we could have observed if the null hypothesis was true (there is no difference in the group means).
0.1 Learning Outcomes
By the end of this coursework you should be able to:
describe what an ANOVA tests for
use a visualization to outline how the following are calculated:
- total sum of squares
- group sum of squares
- residual sum of squares
describe why the mean squares of groups is called the “between group variability”
describe why the mean square error is called the “within group variability”
outline how an F-statistic is calculated
explain what a “large” or a “small” F-statistic indicates
describe the conditions for performing an ANOVA procedure
outline when it is appropriate to use an \(F\)-distribution in an ANOVA
explain the similarities and differences between parametric (\(F\)-based) methods and non-parametric (simulation-based) methods
use R to:
- generate a permutation distribution for F-statistics
- visualize the permutation distribution
- calculate the observed F-statistic statistic
- calculate a p-value for a hypothesis test
1 Prepare
1.1 Textbook Reading
Required Reading: Inference for Comparing Many Means
Reading Guide – Due Monday by the start of class
1.2 Concept Quiz – Due Monday by the start of class
- Which of the following are true about the mean squares between groups? (Select all that apply)
- it is a standardized measure of the variability in responses between groups
- it compares the mean of each group to the overall mean across all groups
- it compares the observations within each group to the mean of that group
- it is used as the numerator in an F-statistic
- it is used as the denominator in an F-statistic
- it is found by dividing the sum of squares between groups by the number of groups minus 1 (\(k\) - 1)
- it is found by dividing the sum of squares between groups by the sample size minus the number of groups (\(n - k\))
- Which of the following are true about the mean square errors? (Select all that apply)
- it is a standardized measure of the variability in responses within each group
- it compares the mean of each group to the overall mean across all groups
- it compares the observations within each group to the mean of that group
- it is used as the numerator in an F-statistic
- it is used as the denominator in an F-statistic
- it is found by dividing the sum of square errors by the number of groups minus 1 (\(k\) - 1)
- it is found by dividing the sum of square errors by the sample size minus the number of groups (\(n - k\))
- An F-statistic uses which formula?
\(\frac{MSG}{MSE}\)
\(\frac{SSG}{SSE}\)
\(\frac{MSE}{MSG}\)
\(\frac{SSE}{SSG}\)
- Ideally, in an ANOVA we’d like to see… (select all that apply)
- large differences in the means of the groups
- small differences in the means of the groups
- large variability in the observations within each group
- small variability in the observations within each group
- If the null hypothesis that the means of four groups are all the same is rejected using ANOVA at a 5% significance level, then… (select all that apply)
we can then conclude that all the means are different from one another.
the variability between groups is higher than the variability within groups.
we can then conclude that at least one mean is different from the others.
the pairwise analysis will identify at least one pair of means that are significantly different.
an appropriate \(\alpha\) to be used in pairwise comparisons is \(\frac{0.05}{4} = 0.0125\) since there are four groups.
True or false, as the total sample size increases, the degrees of freedom for the residuals increases.
True or false, the constant variance condition can be somewhat relaxed when the sample sizes are large.
True or false, the independence assumption can be relaxed when the total sample size is large.
True or false, the normality condition is very important when the sample sizes of each group are small.