Final Project Guidelines

0.1 Your Task

For this project, you are expected to use a two-way ANOVA to investigate the relationship between one numerical response variable and two categorical explanatory variables. You are permitted to use the same dataset as your Midterm Project, so long as there are at least two categorical variables to choose from.

Discretizing numerical variables

If you would like to analyze a discrete numerical variable (e.g., number of pets) as a categorical variable, you will need to convert that variable into a categorical variable in R as R assumes all variables with numbers should be numerical.

You will need Dr. Theobold’s help to perform this task. Dr. Theobold will work with you to convert your variable as long as you request help before Friday, May 31 at 4pm.

1 Introduction

1.1 Data Description

In 4-6 sentences describe:

  • how the data were collected
  • the context of the data (e.g., are the data from from a published study?)
  • the background of the research problem (e.g., why were the data collected?)

1.2 Questions of Interest

State the question(s) of interest you will address with your statistical analysis. The more specific you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should be phrased in terms of your alternative hypothesis, meaning it should start with “Does the mean ___ differ based on ___ …”. Your Findings section should directly address the question(s) you pose here.

Multiple research questions

This week you are starting with research questions are appropriate for the one-way ANOVA models you are fitting. However, you may find that you need to add research questions based on the models you fit in Week 10!

2 Methods

This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.

  • Describe the response and explanatory variables, how they were measured, and their associated units. For categorical variables, describe the levels of the categorical variable.
Tip

If the levels of your variable are abbreviations, you are expected to state exactly what each abbreviation means (e.g., CC represents the “clear cut” section of the forest).

  • Produce data visualizations exploring the relationship(s) you are interested in investigating. For your project everyone will have three visualizations:

    • a visualization of the relationship between your response variable and explanatory variable 1
    • a visualization of the relationship between your response variable and explanatory variable 2
    • a visualization of the relationship between your response variable and both explanatory variables
Ridge plots

In Lab 3 you learned how to make density ridge plots, the most recommended visualization for numerical and categorical variables. For your project you are required to use density ridge plots, boxplots will not be accepted. If you are unsure how to accomplish this task, look back over Lab 3 and the Week 3 R resources.

Keep in mind, every visualization should have nicely formatted axis labels (with units)!

  • Describe what you see in the visualizations, making direct references to the plots!
Not visually selecting a model

Unlike the Midterm Project, you will not be choosing what statistical model to fit based on the visualizations.

3 Findings

In this section you will write up your findings for your question of interest. I’ve created an example of how this section could look in this document.

\(\alpha\) threshold

At the beginning of your analysis, you must state what \(\alpha\) threshold was used to make decisions regarding your hypothesis tests.

3.1 One-Way ANOVA Models

Everyone starts here! In this section, you need to do the following:

  • Fit two one-way ANOVA models – one model for each categorical explanatory variable
  • Obtain the ANOVA table for the model
  • Based on the ANOVA table, state what decision was reached for the hypothesis in the one-way ANOVA model
  • Based on the decision you made, state what you can conclude regarding the relationship between your variables

3.2 Two-Way ANOVA Additive Model

When to proceed

If you rejected the null hypothesis for both one-way ANOVA models, you can proceed to fitting a two-way ANOVA model

  • Fit a two-way ANOVA additive model

  • Obtain the tidy ANOVA table for the model

  • Based on the ANOVA table, state what decision was reached for each hypothesis in the two-way ANOVA additive model.

  • Based on the decision you made, state what you can conclude regarding the relationship between your variables.

3.3 Model Validity

  • Evaluate the conditions of both one-way ANOVA models
Condition violations

If you find through the study design and / or your visualizations that certain model conditions are violated, you are expected to do your best to remedy these violations. If you need help figuring out how to do this, email Dr. Theobold before Friday, May 31 at 4pm!

  • Based on your analysis of the conditions, describe whether you believe the tests you performed are “reliable.”

4 Study Limitations

Write a 6-8 sentence statement on what can be inferred from the design of the study and the results of your statistical analysis. Specifically, answer these two questions and comment on their implications:

  • Based on the sampling method used, what larger population can you infer the results or your analysis onto?
Tip

Your statement needs to include a description of (1) how the data were collected, and (2) the population to whom the results can be applied. You must justify your reasoning for #2 using information from the design of the study.

Generic statements (e.g., “these observations were representative so we can infer onto the entire population”) are not sufficient. Your description should include a description of why the sample is / is not representative of the population and how that influences the population you can infer the results onto.

  • Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables?
Tip

Your statement needs to include a description of (1) how the study was designed, and (2) what statements can be made about the relationship between the variables. You must justify your reasoning for #2 by making direct reference to the variables included in the study.

Generic statements (e.g., “this was an observational study so we can’t make causal statements”) are not sufficient. Your description should include a description of why the study was observational and why you cannot make causal statements for observational studies.

5 Conclusions

Based on the results of your analysis what is your conclusion for the questions of interest? Connect your conclusion(s) to the relationships you saw in the visualizations you made and the results of your hypothesis tests.