One-Way ANOVA to Two-Way ANOVA Model Selection Process
We will be using a “forward” selection process for deciding on the “best” ANOVA model. That is, we will start with a simple model and keep adding complexity until the added complexity no longer seems “worth it.”
Step 1 – Visualize each model
Are the groups different?
We are looking to see if the mean of the response variable (IMDB rating) differs between the explanatory variable groups (genre or era). Based on these visualizations, it doesn’t seem like the mean IMDB rating differs based on the era or the genre of the movie.
Step 2 – Decide what method should be used
We have two methods to choose from when performing a one-way ANOVA: theory-based or simulation-based. Which method we use depends on whether the conditions of our analysis are met. Specifically, the normality condition dictates the choice.
Based on the density ridge plots, it appears that both genres have about the same spread. For the eras, it looks like movies from the 2000s have the largest spread, but it is not much greater than the 1970s. So, I would say the equal variance condition is not violated.
Most of these density plots are unimodal and fairly symmetric. The most problematic distribution is the romance movies, with three different modes.
Since romance is one of only two genre groups, I don’t feel great using theory-based methods for this one-way ANOVA.
I do, however, think using theory-based methods for testing the eras would be okay!
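The visual check of spread described above can be backed up with a quick numerical summary. The sketch below uses made-up toy data (the data frame `toy_movies`, the two middle era labels, and all ratings are hypothetical, not the course's `movies` data) and base R rather than the tidyverse. The "ratio under about 2" cutoff is a common rule of thumb and an assumption here, not something stated in these notes.

```r
# Hypothetical toy data standing in for `movies`; every value is made up.
set.seed(1)
toy_movies <- data.frame(
  era    = rep(c("1970s", "1980s", "1990s", "2000s"), each = 14),
  rating = round(rnorm(56, mean = 6, sd = 1.5), 1)
)

# Standard deviation of rating within each era
era_sd <- tapply(toy_movies$rating, toy_movies$era, sd)
era_sd

# Ratio of the largest to the smallest group SD.
# Assumption: a ratio under about 2 is treated as "roughly equal" spread.
sd_ratio <- max(era_sd) / min(era_sd)
sd_ratio
```

With the real data, `toy_movies` would be replaced by `movies`, and the same two lines would give the spread comparison for genre by swapping `era` for `genre`.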
Step 3 – Fit the model(s)
Testing era
aov(rating ~ era, data = movies) %>%
  tidy()
# A tibble: 2 × 6
  term         df  sumsq meansq statistic p.value
  <chr>     <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
1 era           3   4.59   1.53     0.669   0.575
2 Residuals    52 119.     2.28    NA      NA
With a p-value of 0.575 (from an F-statistic of 0.669 with 3 and 52 degrees of freedom), which is larger than the significance level of 0.1, I fail to reject the null hypothesis. Thus, the data do not provide convincing evidence that the mean movie rating differs for at least one era.
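To see where the F-statistic and p-value in the ANOVA table come from, here is a minimal base R sketch using only the numbers printed in the table. The variable names are my own, and because the table values are rounded, the results will match the table only approximately.

```r
# Values copied from the ANOVA table (rounded as printed)
ms_era   <- 1.53   # mean square for era
ms_resid <- 2.28   # mean square for residuals
df_era   <- 3
df_resid <- 52

# The F-statistic is the ratio of the two mean squares.
# This gives about 0.67; the table's 0.669 used unrounded mean squares.
f_from_ms <- ms_era / ms_resid
f_from_ms

# The p-value is the upper-tail area of the F distribution
# with 3 and 52 degrees of freedom beyond the reported statistic.
p_value <- pf(0.669, df1 = df_era, df2 = df_resid, lower.tail = FALSE)
p_value
```

The computed p-value should land close to the 0.575 reported by `aov()`.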
Hypotheses
The era line of the ANOVA table is testing the following hypotheses:
\(H_0\): The mean movie rating is the same for every era
\(H_A\): The mean movie rating is different for at least one era
Testing genre
Two groups
Note that genre has only two levels—action and romance. So, when using simulation-based methods, we need to use a "diff in means" statistic instead of an "F" statistic.
obs_diff <- movies %>%
  specify(response = rating, explanatory = genre) %>%
  calculate(stat = "diff in means", order = c("Action", "Romance"))

permutation_dist <- movies %>%
  specify(response = rating, explanatory = genre) %>%
  hypothesise(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("Action", "Romance"))

visualise(permutation_dist) +
  labs(x = "Simulated Difference in Mean IMDB Rating (Action - Romance)")

get_p_value(permutation_dist, obs_stat = obs_diff, direction = "two-sided")
# A tibble: 1 × 1
  p_value
    <dbl>
1   0.184
With a p-value of 0.184 (from an observed difference in means of -0.578), which is larger than the significance level of 0.1, I fail to reject the null hypothesis. Thus, the data do not provide convincing evidence that the mean rating of action movies differs from the mean rating of romance movies.
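For intuition about what the permutation pipeline is doing, the same idea can be written out by hand in base R. This is a sketch on made-up toy data (the data frame `toy`, the group sizes, and the helper `diff_in_means()` are all hypothetical), so its p-value will not match the 0.184 above. The two-sided p-value here doubles the smaller tail proportion, which is one common convention; whether it matches the package's internal calculation exactly is an assumption.

```r
set.seed(2024)

# Hypothetical toy data standing in for `movies`; every value is made up.
toy <- data.frame(
  genre  = rep(c("Action", "Romance"), each = 28),
  rating = c(rnorm(28, mean = 5.5, sd = 1.5), rnorm(28, mean = 6.1, sd = 1.5))
)

# Difference in mean rating, Action minus Romance
diff_in_means <- function(rating, genre) {
  mean(rating[genre == "Action"]) - mean(rating[genre == "Romance"])
}

# Observed statistic
obs <- diff_in_means(toy$rating, toy$genre)

# Under the null of independence, genre labels are interchangeable,
# so shuffle the labels and recompute the statistic 1000 times.
perm <- replicate(1000, diff_in_means(toy$rating, sample(toy$genre)))

# Two-sided p-value: double the smaller tail proportion, capped at 1
p_two_sided <- min(1, 2 * min(mean(perm <= obs), mean(perm >= obs)))
p_two_sided
```

Each shuffle breaks any real link between genre and rating, so the collection of shuffled differences shows what differences look like when the null hypothesis is true.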
Hypotheses
For our hypothesis test, we are testing whether the mean movie rating is different for at least one genre, but there are only two genres (action, romance). So, we are actually testing the following hypotheses:
\(H_0\): The mean movie rating is the same for romance and action movies
\(H_A\): The mean movie rating is different for romance and action movies
Step 4: Decide How to Proceed
You will decide what model to fit next depending on the results of your one-way ANOVA models.
If you failed to reject the null hypothesis for either one-way ANOVA model…
You do not fit any additional models.
If you rejected the null hypothesis for both one-way ANOVA models…
Your next step is to fit an additive two-way ANOVA model.
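The decision rule above can be written as a small helper. This is only a sketch: the function name `next_step()` is hypothetical, and it reads "failed to reject for either model" as "failed to reject for at least one of the two models", which is my interpretation of the rule.

```r
# Hypothetical helper encoding the Step 4 decision rule.
# Assumption: we move on only when BOTH one-way tests reject the null.
next_step <- function(p_era, p_genre, alpha = 0.1) {
  if (p_era < alpha && p_genre < alpha) {
    "fit an additive two-way ANOVA"
  } else {
    "stop: no additional models"
  }
}

# p-values from the two one-way analyses in Step 3
decision <- next_step(p_era = 0.575, p_genre = 0.184)
decision  # "stop: no additional models"
```

With the p-values from Step 3 (0.575 for era and 0.184 for genre), neither test rejects at the 0.1 level, so under this reading no further models would be fit for the movies data.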
Additive two-way ANOVA
In Week 10 we will learn about additive and interaction two-way ANOVA models, so you will have the tools to fit these additional models then!