Step 1 - Due by tonight!
Step 2 - Due by Thursday
Step 3 - Due by Sunday
Before…
Now…
Goal of an ANOVA
Analysis of variance (ANOVA) compares the means of three or more groups to detect whether at least one group mean differs from the others.
How???
We want visualizations that let us easily compare the differences between groups and the variability within groups:
What can you say about the differences between the age groups?
What can you say about the variability within the age groups?
Step 1: Compare your groups
Step 2: Find the overall mean
This ignores the groups and finds a single mean across all observations!
Step 3: Find the group means
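As a rough illustration of Steps 2 and 3, the sketch below computes the overall mean and the group means for a small made-up data set. The scores and group names are hypothetical and are not the evaluation data used in these slides.

```python
from statistics import mean

# Hypothetical toy data: evaluation scores grouped by age category.
scores = {
    "under_40": [3.5, 4.0, 4.5],
    "40_to_49": [3.0, 3.5, 4.0],
    "50_plus": [4.0, 4.5, 5.0],
}

# Step 2: pool every observation, ignoring the group labels.
all_scores = [x for group in scores.values() for x in group]
overall_mean = mean(all_scores)

# Step 3: one mean per group.
group_means = {name: mean(values) for name, values in scores.items()}

print("Overall mean:", overall_mean)
print("Group means:", group_means)
```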
Step 4: Calculate the sum of squares between groups
Step 5: Calculate the sum of squares within groups
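One way to sketch Steps 4 and 5 is shown below, again on hypothetical toy data. The between-groups sum of squares weights each squared gap between a group mean and the overall mean by the group size, while the within-groups sum of squares adds up the squared gaps between each observation and its own group mean.

```python
from statistics import mean

# Hypothetical toy data: evaluation scores grouped by age category.
scores = {
    "under_40": [3.5, 4.0, 4.5],
    "40_to_49": [3.0, 3.5, 4.0],
    "50_plus": [4.0, 4.5, 5.0],
}

all_scores = [x for group in scores.values() for x in group]
overall_mean = mean(all_scores)

# Step 4: how far is each group mean from the overall mean?
ss_between = sum(
    len(group) * (mean(group) - overall_mean) ** 2
    for group in scores.values()
)

# Step 5: how far is each observation from its own group mean?
ss_within = sum(
    (x - mean(group)) ** 2
    for group in scores.values()
    for x in group
)

# Sanity check: the two pieces add up to the total sum of squares.
ss_total = sum((x - overall_mean) ** 2 for x in all_scores)

print(ss_between, ss_within, ss_total)
```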
Step 6: Calculate the F-statistic
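Written as a formula, the F-statistic is the ratio of the two mean squares. The symbols here are notation assumed for this sketch: \(k\) is the number of groups, \(n\) is the total number of observations, and \(SS_{\text{between}}\) and \(SS_{\text{within}}\) are the sums of squares from Steps 4 and 5.

\[
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (n - k)}
\]

A large value of \(F\) means the group means vary a lot relative to the variability inside the groups.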
Step 7: Find the p-value
F-distribution
An \(F\)-distribution is a variant of the \(t\)-distribution, and is also defined by degrees of freedom.
This distribution is defined by two different degrees of freedom: one for the numerator and one for the denominator.
Two degrees of freedom!
Changing the numerator degrees of freedom
Changing the denominator degrees of freedom
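For a one-way ANOVA, the two degrees of freedom come from the number of groups and the number of observations. Using \(k\) for the number of groups and \(n\) for the total number of observations (notation assumed here, not taken from the slides):

\[
df_1 = k - 1, \qquad df_2 = n - k
\]

The theory-based p-value is then the area in the upper tail of this distribution, at or beyond the observed statistic:

\[
p = P\left(F_{df_1,\, df_2} \ge F_{\text{obs}}\right)
\]

For example, the ANOVA table later in these slides reports 3 and 90 degrees of freedom, which is consistent with \(k = 4\) age groups and \(n = 94\) observations.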
Do you always use an F-distribution to get the p-value?
NO!
Observations are independent within groups and between groups
The distribution of residuals for each group is approximately normal
The spread of the distributions is similar across groups
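A minimal sketch of how the last two conditions might be checked numerically is given below, using hypothetical toy data. It computes the residuals (each observation minus its group mean) and the standard deviation of each group. In practice these would usually be inspected with plots, and no cutoff for "similar" spread is assumed here.

```python
from statistics import mean, stdev

# Hypothetical toy data: evaluation scores grouped by age category.
scores = {
    "under_40": [3.5, 4.0, 4.5],
    "40_to_49": [3.0, 3.5, 4.0],
    "50_plus": [4.0, 4.5, 5.0],
}

# Residuals: each observation minus its own group mean.
residuals = {
    name: [x - mean(values) for x in values]
    for name, values in scores.items()
}

# Spread of each group, to compare across groups.
group_sds = {name: stdev(values) for name, values in scores.items()}

# Ratio of the largest to the smallest standard deviation.
sd_ratio = max(group_sds.values()) / min(group_sds.values())

print("Residuals:", residuals)
print("Group SDs:", group_sds)
print("Largest SD / smallest SD:", sd_ratio)
```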
Which condition(s) are required to use “theory-based” methods?
All three!
Which condition(s) are required to use “simulation-based” methods?
All but normality!
What do you think? Which method should we use?
Response: min_eval (numeric)
Explanatory: age_cat (factor)
# A tibble: 1 × 1
stat
<dbl>
1 1.41
How could we use cards to simulate what minimum evaluation score a professor would have gotten, if their score was independent from their age?
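The card-shuffling idea can be sketched in code: write each score on a card, shuffle the cards, deal them back out to the age groups, and recompute the F-statistic. The data, group labels, seed, and number of shuffles below are all hypothetical choices for illustration, and `f_statistic` is a helper defined only for this sketch.

```python
import random
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for values paired with group labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    overall_mean = mean(values)
    ss_between = sum(
        len(g) * (mean(g) - overall_mean) ** 2 for g in groups.values()
    )
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy data: one score and one age group per professor.
values = [3.5, 4.0, 4.5, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

observed = f_statistic(values, labels)

# Shuffle the "cards" (scores) and deal them back to the fixed group labels.
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
simulated = []
for _ in range(1000):
    shuffled = values[:]
    rng.shuffle(shuffled)
    simulated.append(f_statistic(shuffled, labels))

# p-value: proportion of shuffled statistics at least as large as the observed one.
p_value = sum(stat >= observed for stat in simulated) / len(simulated)

print("Observed F:", observed)
print("Simulation-based p-value:", p_value)
```

Shuffling breaks any link between score and age group, so the simulated statistics show what values of F are plausible when the two are independent.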
Why doesn’t the distribution have negative numbers?
For a p-value of 0.254, what decision would you reach regarding your hypothesis test?
What would you conclude regarding the mean minimum evaluation score for different age groups of faculty?
# A tibble: 2 × 6
  term         df sumsq meansq statistic p.value
  <chr>     <dbl> <dbl>  <dbl>     <dbl>   <dbl>
1 age_cat       3  1.24  0.414      1.41   0.244
2 Residuals    90 26.4   0.293     NA     NA
How was the statistic calculated?
What distribution was used to calculate the p.value?
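As a worked check against the table, each mean square is the sum of squares divided by its degrees of freedom, and the statistic is the ratio of the two mean squares. The values match the table up to rounding of the displayed numbers:

\[
\text{meansq}_{\text{age\_cat}} = \frac{1.24}{3} \approx 0.414, \qquad
\text{meansq}_{\text{Residuals}} = \frac{26.4}{90} \approx 0.293
\]

\[
\text{statistic} = \frac{0.414}{0.293} \approx 1.41
\]

Under the theory-based approach, this statistic would be compared to an \(F\)-distribution with 3 and 90 degrees of freedom, the two df values listed in the table.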
For a p-value of 0.244, what decision would you reach regarding your hypothesis test?
What would you conclude regarding the mean minimum evaluation score for different age groups of faculty?
Did the two methods yield different results?
In 4-6 sentences, introduce / describe your data.
Outline the two questions your research seeks to address.