Coding a One-way ANOVA – Final Project Work Day

Upcoming Deadlines

  • Lab 7 revisions are due today (by midnight)
  • Lab 8 revisions are due next Wednesday
  • Statistical Critique 2 revisions are due next Wednesday
  • Final revisions on all assignments will be accepted until Sunday, June 9

Don’t forget reflections!

Lab 8 Recap

Common Mistakes


Questions 1 & 2 – Your axis labels should contain units and indicate if your variables were transformed!


Question 6 – Equal variance is not about equal points above and below the line!


Question 11 – Your hypothesis test decision should state what \(\alpha\) was used!


Question 15 – Both theory-based and simulation-based methods have conditions and the reliability of a p-value depends on those conditions.

Coding a One-way ANOVA

Visualizations – Density Ridges

ggplot(data = <NAME OF DATASET>, 
       mapping = aes(x = <NAME OF NUMERICAL VARIABLE>, 
                     y = <NAME OF CATEGORICAL VARIABLE>)
       ) + 
  geom_density_ridges()+
  labs(x = "<TITLE FOR THE X-AXIS>", 
       y = "<TITLE FOR THE Y-AXIS>")

Revisit the code you wrote for Question 8 of Lab 3

Choosing a Statistical Model


Everyone will fit two one-way ANOVA models — one model for each explanatory variable!

  • You will need to choose whether to use a theory-based method or a simulation-based method.
  • Your choice is based on the model conditions — specifically the normality condition.
    • The density ridge plots can be used to check the normality and equal variance conditions.
    • The study design should be used to check the independence condition.

Theory-based Method


To fit a theory-based ANOVA, you use the following code:

my_model <- aov(<NAME OF RESPONSE VARIABLE> ~ <NAME OF EXPLANATORY VARIABLE>, 
                data = <NAME OF DATASET>)


Only use if normality is not violated!

If you look at your density ridge plots and you believe the normality condition is violated, you should not use a theory-based method!

Simulation-based Method

To fit a simulation-based ANOVA, you need to carry out the following steps:

  1. find the observed statistic
  2. simulate statistics that could have happened if the null was true
  3. visualize the distribution of simulated statistics (your permutation distribution)
  4. calculate the p-value for your observed statistic

2 levels versus 3 levels

If you categorical variable has two levels, you need to use a "diff in means" statistic, not an F-statistic!

Model Validity

Are your p-values trustworthy?

Inspect the conditions of your model!


Independence of Observations

Look at how the data were collected!

Normality of Residuals (Responses)

Use the density ridge plots!

Equal Variances of Each Group

Use the density ridge plots!

Study Limitations

Same questions as last project!

Where are we headed next week?

Two-way ANOVA


Code for Plot

ggplot(data = evals_small, 
       mapping = aes(y = age_cat, x = min_eval, fill = gender)) + 
  geom_density_ridges(alpha = 0.5) + 
  scale_fill_brewer(palette = "PRGn") +
  labs(y = "",
       title = "Gaps between male and female faculty evaluation scores become more pronounced with age", 
       x = "", 
       fill = "Professor's Sex") +
  theme(legend.position = "top")

Legend Positions

There are four options for legend.position

  1. "none" (removes the legend)
  2. "left" (difficult to read with the y-axis label)
  3. "right" (the default position)
  4. "bottom" (difficult to read with the x-axis label)
  5. "top" (generally my preference)