Week 10: Two-way ANOVA

Week 10

Wrapping Up Revisions

The final round of revisions on all assignments is due by Sunday, June 9.

One round of revisions

You will only have time for one round of revisions on Lab 8 and Statistical Critique 2, so make sure you feel confident about your revisions.

Final Project

  • Feedback (from me) will be provided no later than Thursday evening
  • Peer feedback on Wednesday
    • Print your report!

Two-Way ANOVA Models

Two-way ANOVA


Goal:

Assess if multiple categorical variables have a relationship with the response.

Modeling Options

Additive Model

Each explanatory variable has a meaningful relationship with the response, conditional on the other variable(s) included in the model.

Interaction Model

The relationship between one categorical explanatory variable and the response differs based on the values of another categorical variable.
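In R, the difference between the two modeling options comes down to the formula. Here is a minimal sketch on simulated data (the data frame `sim` and its columns `treatment`, `year`, and `mass` are made up for illustration and are not the maple data used in these slides):

```r
# Simulated, hypothetical data: two categorical variables and a numeric response.
set.seed(10)
sim <- expand.grid(
  treatment = c("control", "calcium"),
  year      = c("2003", "2004"),
  rep       = 1:20
)
sim$treatment <- factor(sim$treatment, levels = c("control", "calcium"))
sim$year      <- factor(sim$year, levels = c("2003", "2004"))

# Build a response where the treatment effect is larger in 2004 (an interaction).
sim$mass <- 0.020 +
  0.005 * (sim$treatment == "calcium") +
  0.030 * (sim$year == "2004") +
  0.010 * (sim$treatment == "calcium" & sim$year == "2004") +
  rnorm(nrow(sim), sd = 0.005)

# Additive model: `+` gives one term per explanatory variable.
additive_fit <- aov(mass ~ treatment + year, data = sim)

# Interaction model: `*` expands to both main effects plus their interaction.
interaction_fit <- aov(mass ~ treatment * year, data = sim)

# Compare the terms each model contains.
additive_terms    <- attr(terms(additive_fit), "term.labels")
interaction_terms <- attr(terms(interaction_fit), "term.labels")
print(additive_terms)
print(interaction_terms)
```

The only change between the two fits is `+` versus `*`; the interaction model gains a `treatment:year` term.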

What are we looking for?

Another way to think about it…

Interaction Two-way ANOVA

Research Question

Does the relationship between mean stem dry mass and calcium treatment for sugar maples differ based on the year the treatment was applied?


Or, because the study was an experiment…

Does the effect of calcium treatment on the stem dry mass of sugar maples differ based on the year of the treatment?

Conditions

  • Independence of observations

Observations are independent within groups and between groups

  • Equal variability of the groups

The spread of the distributions is similar across groups

  • Normality of the residuals / responses

The distribution of responses for each group is approximately normal
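One way to get a rough numerical look at the equal variability and normality conditions is sketched below. It assumes simulated data (the names `sim`, `treatment`, `year`, and `mass` are hypothetical), and the summaries are an informal aid rather than a formal test. Independence cannot be checked from the numbers; it depends on how the data were collected.

```r
# Simulated, hypothetical data with four groups (2 treatments x 2 years).
set.seed(3)
sim <- expand.grid(
  treatment = c("control", "calcium"),
  year      = c("2003", "2004"),
  rep       = 1:25
)
sim$mass <- 0.020 +
  0.005 * (sim$treatment == "calcium") +
  0.030 * (sim$year == "2004") +
  rnorm(nrow(sim), sd = 0.005)

# Equal variability: standard deviation of the response in each group.
group_sd <- tapply(sim$mass, list(sim$treatment, sim$year), sd)
print(group_sd)

# Ratio of the largest to smallest group SD; values near 1 suggest similar spread.
sd_ratio <- max(group_sd) / min(group_sd)
print(sd_ratio)

# Normality: compare the residual quantiles to theoretical normal quantiles.
fit <- aov(mass ~ treatment * year, data = sim)
res <- residuals(fit)
qq  <- qqnorm(res, plot.it = FALSE)  # returns the quantile pairs without plotting
qq_cor <- cor(qq$x, qq$y)            # close to 1 when residuals look normal
print(qq_cor)
```

In practice you would also plot the data (for example, side-by-side boxplots or a normal Q-Q plot of the residuals) rather than rely on these numbers alone.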

Theory-based Two-Way ANOVA

aov(stem_dry_mass ~ watershed * year_cat, 
    data = hbr_maples_small)


term                Df   Sum Sq       Mean Sq       F value    Pr(>F)
watershed           1    0.012460942  0.0124609422  56.61926   1.322406e-12
year_cat            1    0.116975377  0.1169753767  531.50546  1.011525e-60
watershed:year_cat  1    0.003819154  0.0038191541  17.35324   4.452323e-05
Residuals           221  0.048638368  0.0002200831  NA         NA

The watershed:year_cat line is testing if the relationship between the calcium treatment (watershed) and mean stem dry mass differs between 2003 and 2004.


Does it?

How are those p-values calculated?

The p-values in the previous table use Type I sums of squares.

Type I sums of squares are “sequential,” meaning each term is tested after accounting only for the terms listed before it.


So, the p-value for watershed:year_cat is conditional on including watershed and year_cat as explanatory variables.


Is that what we want????

Testing “main effects”

If there is evidence of an interaction, we do not test if the main effects are “significant.”


Why?


The interactions depend on these variables, so they should be included in the model!

Interpreting “main effects”

When interaction effects are present, interpreting a main effect on its own is incomplete or misleading.
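To see why a single "main effect" can mislead, it can help to look at the group means directly. The sketch below uses simulated data in which the treatment effect is deliberately much larger in one year (all names and numbers are hypothetical, chosen only for illustration):

```r
# Simulated, hypothetical data with a built-in interaction.
set.seed(7)
sim <- expand.grid(
  treatment = c("control", "calcium"),
  year      = c("2003", "2004"),
  rep       = 1:20
)
sim$mass <- 0.020 +
  0.002 * (sim$treatment == "calcium") +
  0.030 * (sim$year == "2004") +
  0.012 * (sim$treatment == "calcium" & sim$year == "2004") +
  rnorm(nrow(sim), sd = 0.002)

# Mean response in each treatment-by-year group.
cell_means <- tapply(sim$mass, list(sim$treatment, sim$year), mean)
print(cell_means)

# Treatment effect (calcium minus control) within each year.
effect_by_year <- cell_means["calcium", ] - cell_means["control", ]
print(effect_by_year)

# A single overall treatment effect that ignores year.
overall_effect <- mean(sim$mass[sim$treatment == "calcium"]) -
  mean(sim$mass[sim$treatment == "control"])
print(overall_effect)
```

The overall effect sits between the two year-specific effects and describes neither year well, which is the sense in which the main effect alone is incomplete.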

Additive Two-way ANOVA

What if our analysis found no evidence of an interaction?

Testing for a relationship for each variable

aov(stem_dry_mass ~ elevation + watershed, 
    data = hbr_maples_small) %>% 
  tidy()


term       Df   Sum Sq        Mean Sq       F value    Pr(>F)
elevation  1    0.0006503957  6.503957e-04  9.515513   2.446313e-03
watershed  1    0.0031676880  3.167688e-03  46.344364  2.531165e-10
Residuals  143  0.0097742065  6.835109e-05  NA         NA


Do you think it matters which variable comes first?

Let’s see…

aov(stem_dry_mass ~ watershed + elevation, 
    data = hbr_maples) %>% 
  tidy()


term       Df   Sum Sq        Mean Sq       F value    Pr(>F)
watershed  1    0.0065062507  6.506251e-03  86.658504  9.073535e-18
elevation  1    0.0005821935  5.821935e-04  7.754392   5.791052e-03
Residuals  237  0.0177937692  7.507919e-05  NA         NA


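Did we get the same p-values as before? Keep in mind that this model was fit to hbr_maples rather than hbr_maples_small (note the 237 residual degrees of freedom instead of 143), so the two tables differ in sample size as well as in the order of the variables.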

Sequential Versus Partial Sums of Squares

Similar to before, the p-values in the ANOVA table use Type I (sequential) sums of squares.

  • The p-value for each variable is conditional on the variable(s) that came before it.
  • The p-value for elevation is conditional on watershed being included in the model
  • The p-value for watershed is conditional on…nothing.

If we want the p-value for each explanatory variable to be conditional on every variable included in the model, then we need to use a different type of sums of squares!
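The order dependence of sequential sums of squares can be checked on a single dataset. This is a minimal sketch using simulated, unbalanced data (the names `sim`, `watershed`, `elevation`, and `mass` mimic the slides but the values are made up):

```r
# Simulated, hypothetical data where the two factors are unbalanced,
# so the order of the terms matters for Type I sums of squares.
set.seed(42)
n <- 120
watershed <- factor(sample(c("Reference", "W1"), n, replace = TRUE))
p_mid     <- ifelse(watershed == "W1", 0.7, 0.3)  # elevation depends on watershed
elevation <- factor(ifelse(runif(n) < p_mid, "Mid", "Low"))
mass <- 0.020 +
  0.005 * (watershed == "W1") +
  0.003 * (elevation == "Mid") +
  rnorm(n, sd = 0.008)
sim <- data.frame(mass, watershed, elevation)

# Same model, same data, two different orders of the explanatory variables.
elev_first  <- anova(lm(mass ~ elevation + watershed, data = sim))
water_first <- anova(lm(mass ~ watershed + elevation, data = sim))

print(elev_first)
print(water_first)
```

The residual row is identical in both tables because the fitted model is the same; only the way the explained variation is split between the two variables changes with the order.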

Partial Sums of Squares

Type III sums of squares are “partial,” meaning every term in the model is tested in light of the other terms in the model.

  • The p-value for elevation is conditional on watershed being included in the model
  • The p-value for watershed is conditional on elevation being included in the model

Only different for variables that were not last

We could have used Type III sums of squares for the interaction model and would have gotten the same p-value for the interaction term (watershed:year_cat), because it is the last term in the model!
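For an additive model, the partial tests can also be reproduced in base R with `drop1()`, which tests each term by removing it from the full model. This is a sketch on simulated data (all names and values are hypothetical), and `drop1()` is a stand-in used here for illustration, not the `car` approach used in these slides:

```r
# Simulated, hypothetical unbalanced data.
set.seed(42)
n <- 120
watershed <- factor(sample(c("Reference", "W1"), n, replace = TRUE))
p_mid     <- ifelse(watershed == "W1", 0.7, 0.3)
elevation <- factor(ifelse(runif(n) < p_mid, "Mid", "Low"))
mass <- 0.020 +
  0.005 * (watershed == "W1") +
  0.003 * (elevation == "Mid") +
  rnorm(n, sd = 0.008)
sim <- data.frame(mass, watershed, elevation)

fit <- lm(mass ~ watershed + elevation, data = sim)

# Partial tests: each term is tested conditional on the other term.
partial <- drop1(fit, test = "F")
print(partial)

# Sequential (Type I) tests for the two possible orders.
seq_water_first <- anova(fit)
seq_elev_first  <- anova(lm(mass ~ elevation + watershed, data = sim))

# Each partial p-value matches the sequential p-value from the
# order in which that variable is listed last.
p_elev_partial  <- partial["elevation", "Pr(>F)"]
p_elev_last     <- seq_water_first["elevation", "Pr(>F)"]
p_water_partial <- partial["watershed", "Pr(>F)"]
p_water_last    <- seq_elev_first["watershed", "Pr(>F)"]
print(c(p_elev_partial, p_elev_last))
print(c(p_water_partial, p_water_last))
```

So the partial table can be thought of as giving every variable the "listed last" treatment at once.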

Getting the Conditional Tests for Every Variable

Load in the car package!

library(car)

water_elev_lm <- lm(stem_dry_mass ~ watershed + elevation, 
    data = hbr_maples_small) 

Anova(water_elev_lm, type = "III")

Additive Model Hypothesis Tests

term         Sum Sq        Df   F value    Pr(>F)
(Intercept)  0.0375881724  1    549.92788  7.417552e-51
watershed    0.0031676880  1    46.34436   2.531165e-10
elevation    0.0007501911  1    10.97555   1.169414e-03
Residuals    0.0097742065  143  NA         NA


What do you think the elevation line is testing?

What would you decide?

Keeping “Non-significant” Variables


Should you always remove variables with “large” p-values from an ANOVA?


No!

Even “non-significant” variables explain some amount of the variation in the response, which makes your estimates of a treatment effect more precise!
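A quick way to see the "explains some variation" part is to compare the leftover (residual) variation with and without the second variable. The sketch below uses simulated data with a deliberately small `site` effect (all names and numbers are hypothetical). The residual sum of squares can never go up when a variable is added; whether the standard error of the treatment effect shrinks depends on how much variation the variable explains, so treat the comparison as illustrative.

```r
# Simulated, hypothetical data: a treatment effect plus a small site effect.
set.seed(21)
n <- 80
treatment <- factor(rep(c("control", "calcium"), each = n / 2),
                    levels = c("control", "calcium"))
site <- factor(rep(c("Low", "Mid"), times = n / 2))
mass <- 0.020 +
  0.006 * (treatment == "calcium") +
  0.002 * (site == "Mid") +
  rnorm(n, sd = 0.006)
sim <- data.frame(mass, treatment, site)

one_var <- lm(mass ~ treatment, data = sim)
two_var <- lm(mass ~ treatment + site, data = sim)

# Residual sum of squares: variation left unexplained by each model.
rss_one <- sum(residuals(one_var)^2)
rss_two <- sum(residuals(two_var)^2)
print(c(rss_one, rss_two))

# Standard error of the estimated treatment effect in each model.
se_one <- summary(one_var)$coefficients["treatmentcalcium", "Std. Error"]
se_two <- summary(two_var)$coefficients["treatmentcalcium", "Std. Error"]
print(c(se_one, se_two))
```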

Steps for Final Project

Hypothesis Test Steps

Step 1: Fit a one-way ANOVA model for each categorical variable

Step 2: Decide if each explanatory variable has a meaningful relationship with the response variable

  • If yes, then go to Step 3!
  • If no, then report which variable (if any) has the strongest relationship with the response.
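Steps 1 and 2 might look something like the sketch below. It uses simulated data, and the names `my_data`, `response`, `var_1`, and `var_2` are placeholders for your own dataset and variables. No cutoff is applied to the p-values here; deciding what counts as a meaningful relationship is up to you.

```r
# Simulated, hypothetical data standing in for your project dataset.
set.seed(5)
n <- 100
var_1 <- factor(sample(c("a", "b"), n, replace = TRUE))
var_2 <- factor(sample(c("x", "y", "z"), n, replace = TRUE))
response <- 10 + 2 * (var_1 == "b") + 1.5 * (var_2 == "z") + rnorm(n)
my_data <- data.frame(response, var_1, var_2)

# Step 1: fit a one-way ANOVA for each categorical variable.
one_way_1 <- aov(response ~ var_1, data = my_data)
one_way_2 <- aov(response ~ var_2, data = my_data)

# Step 2: pull out the p-value for each explanatory variable.
p_1 <- summary(one_way_1)[[1]][["Pr(>F)"]][1]
p_2 <- summary(one_way_2)[[1]][["Pr(>F)"]][1]
print(c(var_1 = p_1, var_2 = p_2))
```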

Step 3 – Fit an Additive Two-way ANOVA

If there is evidence that both variables have a relationship with the response variable, then you fit an additive two-way ANOVA.

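library(car)       # provides Anova()
library(broom)     # provides tidy()
library(tidyverse) # provides the %>% pipe

my_model <- lm(<NAME OF RESPONSE VARIABLE> ~ <NAME OF EXPLANATORY VARIABLE 1> + <NAME OF EXPLANATORY VARIABLE 2>,
               data = <NAME OF DATASET>) 

# Type III (partial) tests: each variable is tested conditional on the other
Anova(my_model, type = "III") %>% 
  tidy()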

Don’t forget to load in the car package!

What about interaction models?



For the sake of time, we are not fitting interaction models for the Final Project.

Do you always expect your main effects to be “significant” in a two-way ANOVA?

Work Session

Your Options

  1. Complete your revisions on Lab 8
  2. Complete your revisions on Statistical Critique 2
  3. Fit your two-way ANOVA model for your Final Project and interpret the results
  4. Finish any remaining revisions on labs