```r
aov(stem_dry_mass ~ watershed * year_cat,
    data = hbr_maples_small)
```
The final round of revisions on all assignments is due by Sunday, June 9.
One round of revisions
You will only have time for one round of revisions on Lab 8 and Statistical Critique 2, so make sure you feel confident about your revisions.
Two-way ANOVA
Goal:
Assess whether multiple categorical explanatory variables have a relationship with the response.
Additive Model
Each explanatory variable has a meaningful relationship with the response, conditional on the other variable(s) included in the model.
Interaction Model
The relationship between one categorical explanatory variable and the response differs based on the values of another categorical variable.
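In R's formula syntax, the two models differ only in the operator that joins the explanatory variables: `+` gives the additive model, and `*` adds the interaction on top of both main effects. The sketch below uses R's built-in `ToothGrowth` data (tooth length by supplement type and dose) as a stand-in for the maple data; the names `tooth`, `dose_cat`, `fit_add`, and `fit_int` are illustrative.

```r
# ToothGrowth ships with base R (datasets package); used here as a stand-in
tooth <- ToothGrowth
tooth$dose_cat <- factor(tooth$dose)  # treat dose as a categorical variable

# Additive model: supp and dose_cat each shift the mean response
fit_add <- aov(len ~ supp + dose_cat, data = tooth)

# Interaction model: supp * dose_cat expands to
# supp + dose_cat + supp:dose_cat
fit_int <- aov(len ~ supp * dose_cat, data = tooth)

summary(fit_add)
summary(fit_int)
```

The interaction model estimates two extra coefficients here (one per non-baseline supplement and dose combination), which is what lets the supplement effect change across doses.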
Does the relationship between mean stem dry mass and calcium treatment for sugar maples differ based on the year the treatment was applied?
Or, because the study was an experiment…
Does the effect of calcium treatment on the stem dry mass of sugar maples differ based on the year of the treatment?
Observations are independent within groups and between groups
The spread of the distributions is similar across groups
The distribution of responses for each group is approximately normal
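Independence comes from how the data were collected, so it cannot be checked with code, but the other two conditions can be inspected. A minimal sketch, again using the built-in `ToothGrowth` data as a stand-in; the "largest SD at most about twice the smallest" cutoff is one common rule of thumb, not a formal test.

```r
tooth <- ToothGrowth
tooth$dose_cat <- factor(tooth$dose)

# Spread: standard deviation of the response in each supp x dose group
group_sds <- tapply(tooth$len, list(tooth$supp, tooth$dose_cat), sd)
group_sds

# Rule of thumb: the largest group SD should be no more than
# about twice the smallest
sd_ratio <- max(group_sds) / min(group_sds)
sd_ratio

# Normality: Q-Q plot of the residuals from the interaction model
fit_int <- aov(len ~ supp * dose_cat, data = tooth)
qqnorm(residuals(fit_int))
qqline(residuals(fit_int))
```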
term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
watershed | 1 | 0.012460942 | 0.0124609422 | 56.61926 | 1.322406e-12 |
year_cat | 1 | 0.116975377 | 0.1169753767 | 531.50546 | 1.011525e-60 |
watershed:year_cat | 1 | 0.003819154 | 0.0038191541 | 17.35324 | 4.452323e-05 |
Residuals | 221 | 0.048638368 | 0.0002200831 | NA | NA |
The `watershed:year_cat` line is testing whether the relationship between the calcium treatment (`watershed`) and mean stem dry mass differs between 2003 and 2004.
The p-values in the previous table use Type I sums of squares.
Type I sums of squares are “sequential,” meaning variables are tested in the order they are listed.
So, the p-value for `watershed:year_cat` is conditional on including `watershed` and `year_cat` as explanatory variables.
Is that what we want?
If there is evidence of an interaction, we do not test whether the main effects are “significant.”
Why?
The interactions depend on these variables, so they should be included in the model!
When interaction effects are present, an interpretation of the main effects alone is incomplete or misleading.
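Because the interaction is the last term, its sequential (Type I) test is the same as comparing the additive model to the interaction model: both ask whether the interaction adds anything once both main effects are already included. A minimal sketch with the built-in `ToothGrowth` data as a stand-in for the maple data; `lm()` is used so that `anova()` can compare the two fits directly.

```r
tooth <- ToothGrowth
tooth$dose_cat <- factor(tooth$dose)

fit_add <- lm(len ~ supp + dose_cat, data = tooth)
fit_int <- lm(len ~ supp * dose_cat, data = tooth)

# Sequential (Type I) table: the interaction is listed last
seq_table <- anova(fit_int)
p_seq <- seq_table["supp:dose_cat", "Pr(>F)"]

# Nested-model F test: does adding the interaction help,
# given that both main effects are already in the model?
nested <- anova(fit_add, fit_int)
p_nested <- nested[2, "Pr(>F)"]

c(sequential = p_seq, nested = p_nested)
```

The two p-values agree (up to floating-point rounding), which is why keeping the main effects in the model is exactly the conditioning we want for the interaction test.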
term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
elevation | 1 | 0.0006503957 | 6.503957e-04 | 9.515513 | 2.446313e-03 |
watershed | 1 | 0.0031676880 | 3.167688e-03 | 46.344364 | 2.531165e-10 |
Residuals | 143 | 0.0097742065 | 6.835109e-05 | NA | NA |
Do you think it matters which variable comes first?
term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
watershed | 1 | 0.0065062507 | 6.506251e-03 | 86.658504 | 9.073535e-18 |
elevation | 1 | 0.0005821935 | 5.821935e-04 | 7.754392 | 5.791052e-03 |
Residuals | 237 | 0.0177937692 | 7.507919e-05 | NA | NA |
Did we get the same p-values as before?
Similar to before, the p-values in the ANOVA table use Type I (sequential) sums of squares.
`elevation` is conditional on `watershed` being included in the model.
`watershed` is conditional on… nothing.
If we want the p-value for each explanatory variable to be conditional on every other variable included in the model, then we need to use a different type of sums of squares!
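Order only matters when the explanatory variables are associated with each other, as in an unbalanced design. A minimal sketch using R's built-in `mtcars` data as a stand-in, where the number of cylinders and the transmission type are associated: the sequential sum of squares for transmission changes with its position in the formula, while the residual sum of squares does not.

```r
cars <- mtcars
cars$cyl_cat <- factor(cars$cyl)  # 4, 6, or 8 cylinders
cars$am_cat  <- factor(cars$am)   # 0 = automatic, 1 = manual

# Same two variables, listed in opposite orders
fit_cyl_first <- lm(mpg ~ cyl_cat + am_cat, data = cars)
fit_am_first  <- lm(mpg ~ am_cat + cyl_cat, data = cars)

anova(fit_cyl_first)
anova(fit_am_first)

# Sequential sum of squares for transmission type in each ordering
ss_am_last  <- anova(fit_cyl_first)["am_cat", "Sum Sq"]
ss_am_first <- anova(fit_am_first)["am_cat", "Sum Sq"]
c(am_first = ss_am_first, am_last = ss_am_last)
```

Both fits are the same model, so the residual line is identical; only the way the explained variation is split between the two terms changes.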
Type III sums of squares are “partial,” meaning every term in the model is tested in light of the other terms in the model.
`elevation` is conditional on `watershed` being included in the model.
`watershed` is conditional on `elevation` being included in the model.
Type III p-values only differ from Type I p-values for variables that were not listed last.
We could have used Type III sums of squares for the interaction model and would have gotten the same p-value for the interaction term, since it was listed last!
Load in the `car` package!
term | Sum Sq | Df | F value | Pr(>F) |
---|---|---|---|---|
(Intercept) | 0.0375881724 | 1 | 549.92788 | 7.417552e-51 |
watershed | 0.0031676880 | 1 | 46.34436 | 2.531165e-10 |
elevation | 0.0007501911 | 1 | 10.97555 | 1.169414e-03 |
Residuals | 0.0097742065 | 143 | NA | NA |
What do you think the `elevation` line is testing?
What would you decide?
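The slides use the `car` package for Type III tests. As a base-R cross-check that needs no extra package, `drop1()` with `test = "F"` drops each term from the full model in turn; for a model without interactions these partial F tests match the Type III tests of the main effects (it does not print the `(Intercept)` row). A minimal sketch with the built-in `mtcars` data as a stand-in:

```r
cars <- mtcars
cars$cyl_cat <- factor(cars$cyl)
cars$am_cat  <- factor(cars$am)

fit <- lm(mpg ~ cyl_cat + am_cat, data = cars)

# Partial F tests: each term is dropped from the full model in turn,
# so each test is conditional on every other term
partial <- drop1(fit, test = "F")
partial

# Sequential table for comparison: only the last term (am_cat)
# is conditional on everything else
sequential <- anova(fit)
sequential
```

The `am_cat` F statistic agrees across the two tables because it was listed last; the `cyl_cat` F statistic does not.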
Should you always remove variables with “large” p-values from an ANOVA?
No!
Even “non-significant” variables explain some amount of the variation in the response, which can make your estimate of a treatment effect more precise!
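The mechanism is that variation explained by an extra variable no longer counts as residual noise, which shrinks the standard error of the treatment effect. A minimal sketch with the built-in `ToothGrowth` data as a stand-in; dose is strongly related to tooth length, so the shrinkage here is exaggerated, and the gain is not guaranteed in general because each added term also costs residual degrees of freedom.

```r
tooth <- ToothGrowth
tooth$dose_cat <- factor(tooth$dose)

# Supplement only vs. supplement plus dose
fit_one <- lm(len ~ supp, data = tooth)
fit_two <- lm(len ~ supp + dose_cat, data = tooth)

# Standard error of the supplement effect (VC compared to OJ) in each model
se_one <- summary(fit_one)$coefficients["suppVC", "Std. Error"]
se_two <- summary(fit_two)$coefficients["suppVC", "Std. Error"]
c(supp_only = se_one, with_dose = se_two)
```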
Step 1: Fit a one-way ANOVA model for each categorical variable
Step 2: Decide if each explanatory variable has a meaningful relationship with the response variable
If there is evidence that both variables have a relationship with the response variable, then you fit an additive two-way ANOVA.
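The steps above can be sketched as follows, using the built-in `ToothGrowth` data as a stand-in for your project data. In the course workflow the last step would use `car`'s `Anova()` with Type III sums of squares; the base-R `drop1()` call here gives the same main-effect tests for an additive model, so the sketch runs without extra packages.

```r
tooth <- ToothGrowth
tooth$dose_cat <- factor(tooth$dose)

# Step 1: one one-way ANOVA per categorical variable
one_way_supp <- aov(len ~ supp, data = tooth)
one_way_dose <- aov(len ~ dose_cat, data = tooth)
summary(one_way_supp)
summary(one_way_dose)

# Step 2 (if both variables look meaningful): additive two-way ANOVA
two_way <- lm(len ~ supp + dose_cat, data = tooth)

# Each term tested given the other (partial F tests)
drop1(two_way, test = "F")
```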
Don’t forget to load in the `car` package!
For the sake of time, we are not fitting interaction models for the Final Project.