Final Project Guidelines
0.1 Your Task
For this project, you are expected to use a two-way ANOVA to investigate the relationship between one numerical response variable and two categorical explanatory variables. You are permitted to use the same dataset as your Midterm Project, so long as there are at least two categorical variables to choose from.
If you would like to analyze a discrete numerical variable (e.g., number of pets) as a categorical variable, you will need to convert that variable into a categorical variable in R
as R
assumes all variables with numbers should be numerical.
You will need Dr. Theobold’s help to perform this task. Dr. Theobold will work with you to convert your variable as long as you request help before Friday, May 31 at 4pm.
1 Introduction
1.1 Data Description
In 4-6 sentences describe:
- how the data were collected
- the context of the data (e.g., are the data from from a published study?)
- the background of the research problem (e.g., why were the data collected?)
1.2 Questions of Interest
State the question(s) of interest you will address with your statistical analysis. The more specific you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should be phrased in terms of your alternative hypothesis, meaning it should start with “Does the mean ___ differ based on ___ …”. Your Findings section should directly address the question(s) you pose here.
This week you are starting with research questions are appropriate for the one-way ANOVA models you are fitting. However, you may find that you need to add research questions based on the models you fit in Week 10!
2 Methods
This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.
- Describe the response and explanatory variables, how they were measured, and their associated units. For categorical variables, describe the levels of the categorical variable.
If the levels of your variable are abbreviations, you are expected to state exactly what each abbreviation means (e.g., CC
represents the “clear cut” section of the forest).
Produce data visualizations exploring the relationship(s) you are interested in investigating. For your project everyone will have three visualizations:
- a visualization of the relationship between your response variable and explanatory variable 1
- a visualization of the relationship between your response variable and explanatory variable 2
- a visualization of the relationship between your response variable and both explanatory variables
In Lab 3 you learned how to make density ridge plots, the most recommended visualization for numerical and categorical variables. For your project you are required to use density ridge plots, boxplots will not be accepted. If you are unsure how to accomplish this task, look back over Lab 3 and the Week 3 R resources.
Keep in mind, every visualization should have nicely formatted axis labels (with units)!
- Describe what you see in the visualizations, making direct references to the plots!
Unlike the Midterm Project, you will not be choosing what statistical model to fit based on the visualizations.
3 Findings
In this section you will write up your findings for your question of interest. I’ve created an example of how this section could look in this document.
At the beginning of your analysis, you must state what \(\alpha\) threshold was used to make decisions regarding your hypothesis tests.
3.1 One-Way ANOVA Models
Everyone starts here! In this section, you need to do the following:
- Fit two one-way ANOVA models – one model for each categorical explanatory variable
- Obtain the ANOVA table for the model
- Based on the ANOVA table, state what decision was reached for the hypothesis in the one-way ANOVA model
- Based on the decision you made, state what you can conclude regarding the relationship between your variables
3.2 Two-Way ANOVA Additive Model
If you rejected the null hypothesis for both one-way ANOVA models, you can proceed to fitting a two-way ANOVA model
Fit a two-way ANOVA additive model
Obtain the tidy ANOVA table for the model
Based on the ANOVA table, state what decision was reached for each hypothesis in the two-way ANOVA additive model.
Based on the decision you made, state what you can conclude regarding the relationship between your variables.
3.3 Model Validity
- Evaluate the conditions of both one-way ANOVA models
If you find through the study design and / or your visualizations that certain model conditions are violated, you are expected to do your best to remedy these violations. If you need help figuring out how to do this, email Dr. Theobold before Friday, May 31 at 4pm!
- Based on your analysis of the conditions, describe whether you believe the tests you performed are “reliable.”
4 Study Limitations
Write a 6-8 sentence statement on what can be inferred from the design of the study and the results of your statistical analysis. Specifically, answer these two questions and comment on their implications:
- Based on the sampling method used, what larger population can you infer the results or your analysis onto?
Your statement needs to include a description of (1) how the data were collected, and (2) the population to whom the results can be applied. You must justify your reasoning for #2 using information from the design of the study.
Generic statements (e.g., “these observations were representative so we can infer onto the entire population”) are not sufficient. Your description should include a description of why the sample is / is not representative of the population and how that influences the population you can infer the results onto.
- Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables?
Your statement needs to include a description of (1) how the study was designed, and (2) what statements can be made about the relationship between the variables. You must justify your reasoning for #2 by making direct reference to the variables included in the study.
Generic statements (e.g., “this was an observational study so we can’t make causal statements”) are not sufficient. Your description should include a description of why the study was observational and why you cannot make causal statements for observational studies.
5 Conclusions
Based on the results of your analysis what is your conclusion for the questions of interest? Connect your conclusion(s) to the relationships you saw in the visualizations you made and the results of your hypothesis tests.