Final Project Guidelines
0.1 Your Task
For this project, you are expected to use two one-way ANOVA models to investigate the relationship between one numerical response variable and two categorical explanatory variables. You are permitted to use the same dataset as your Midterm Project, so long as it has at least two categorical variables to choose from.
If you would like to analyze a discrete numerical variable (e.g., year) as a categorical variable, you will need to convert that variable into a categorical variable in R, as R assumes all variables with numbers should be numerical.
You will need Dr. Theobold’s help to perform this task. Dr. Theobold will work with you to convert your variable as long as you request help before Friday, May 30 at 4pm.
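For orientation only, here is a minimal sketch of what such a conversion can look like in base R. It uses the built-in `ToothGrowth` dataset, whose `dose` column is stored as numbers, as a stand-in for your own data; the new column name `dose_cat` is hypothetical, and the approach Dr. Theobold uses with you may differ.

```r
# Stand-in data: ToothGrowth ships with R; dose is stored as a number
tooth <- ToothGrowth
is.numeric(tooth$dose)   # R currently treats dose as numerical

# factor() turns each distinct number into a category (a "level")
tooth$dose_cat <- factor(tooth$dose)

# Inspect the resulting categories and how many observations each has
levels(tooth$dose_cat)
table(tooth$dose_cat)
```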
1 Introduction
- In no fewer than six (6) sentences, describe the context of the data and how the data were collected. Your description should address the following questions:
- Are these data from a study or a publication?
- What question do these data address?
- Why were the data collected?
- How were the data collected?
You may need to look up the publication(s) the data are associated with to obtain information on how they were collected.
- State the question(s) of interest you will address with your statistical analysis. The more specifically you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should start with, “Are there differences in…” and should be as specific as possible. Your Conclusion section should directly address the question(s) you pose here.
You will be fitting two one-way ANOVA models, so you have a research question for each model!
2 Methods
This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.
- Describe the response and explanatory variables, how they were measured and their associated units. For categorical variables, describe the levels of the categorical variable.
You may need to look up the publication(s) the data are associated with to obtain information on how each variable was measured by the researchers.
If the levels of your variable are abbreviations, you are expected to state exactly what each abbreviation means (e.g., CC represents the “clear cut” section of the forest).
Produce data visualizations exploring the relationship(s) you are interested in investigating. For your project everyone will have two visualizations:
- a visualization of the relationship between your response variable and explanatory variable 1
- a visualization of the relationship between your response variable and explanatory variable 2
In Lab 3 you learned how to make density ridge plots, the recommended visualization for a numerical variable and a categorical variable. For your project you are required to use density ridge plots; boxplots will not be accepted. If you are unsure how to accomplish this task, look back over Lab 3 and the Week 3 R resources.
Keep in mind, every visualization should have nicely formatted axis labels (with units)!
Describe what you see in the visualizations, making direct references to the plots!
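As a reminder of the general shape of a density ridge plot, here is a hedged sketch using the `ggplot2` and `ggridges` packages. It assumes both packages are installed and uses the built-in `warpbreaks` dataset as a stand-in for your own data; the code from Lab 3 remains the reference for this course.

```r
library(ggplot2)
library(ggridges)

# Stand-in data: warpbreaks ships with R
#   breaks  = number of breaks (numerical response)
#   tension = level of tension: L, M, H (categorical explanatory)
ridge_plot <- ggplot(warpbreaks, aes(x = breaks, y = tension)) +
  geom_density_ridges() +
  labs(
    x = "Number of warp breaks (per loom)",
    y = "Tension level (L = low, M = medium, H = high)"
  )

ridge_plot
```

A second plot for your other explanatory variable would follow the same pattern with a different `y` variable.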
Outline the appropriate statistical method (simulation-based / theory-based) you will use for your analysis. Your justification should make direct reference to the visualizations.
If you’re not sure which model to choose, I would recommend you review the chapter on ANOVA from the Introduction to Modern Statistics textbook.
You are allowed to use different methods to address each research question. Meaning, if you choose to use a simulation-based method for one question, that does not mean you must use a simulation-based method for the second question. You could choose to use a theory-based method for the second question.
3 Results
In this section you will fit your statistical models and display the results.
- Fit two one-way ANOVA models – one model for each categorical explanatory variable.
- These models should use the methods stated in the Methods section.
- Obtain the p-value for each hypothesis test.
Simulation-based Method
This section needs to include:
- a plot of the null distribution (with a nice x-axis label!)
- the computed p-value
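The following is a minimal base-R sketch of one way to carry out a simulation-based (permutation) one-way ANOVA, again using the built-in `warpbreaks` data as a stand-in. The helper `f_stat()`, the seed, and the choice of 1000 shuffles are all assumptions made for illustration; the tools taught in your labs may look different.

```r
set.seed(218)  # arbitrary seed so the shuffles are reproducible

# Hypothetical helper: F-statistic from a one-way ANOVA of y on groups g
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

# Observed F-statistic for breaks across wool types
obs_f <- f_stat(warpbreaks$breaks, warpbreaks$wool)

# Null distribution: shuffle the response, which breaks any link to wool
null_f <- replicate(1000, f_stat(sample(warpbreaks$breaks), warpbreaks$wool))

# p-value: proportion of shuffled F-statistics at least as large as observed
p_value <- mean(null_f >= obs_f)
p_value

# Plot of the null distribution with the observed statistic marked
hist(null_f,
     main = "Null distribution",
     xlab = "Simulated F-statistic (breaks shuffled across wool types)")
abline(v = obs_f, col = "red", lwd = 2)
```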
Theory-based Method
This section needs to include:
- the ANOVA table
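For the theory-based route, a minimal sketch of fitting a one-way ANOVA and pulling out its table is shown here, using base R's `aov()` and the built-in `warpbreaks` data as a stand-in for your own. The object names are hypothetical.

```r
# Fit a one-way ANOVA: breaks (response) explained by tension (3 groups)
tension_fit <- aov(breaks ~ tension, data = warpbreaks)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(tension_fit)[[1]]
anova_table

# The p-value sits in the "Pr(>F)" column, in the row for the
# explanatory variable (the first row)
p_value <- anova_table[["Pr(>F)"]][1]
p_value
```

You would repeat this with your second explanatory variable to obtain the second table.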
4 Discussion
In this section you will write up your findings for your question of interest.
At the beginning of your analysis, you must state what \(\alpha\) threshold was used to make decisions regarding your hypothesis tests.
- Based on the p-value from each ANOVA, state what decision was reached for the hypotheses in that one-way ANOVA model.
- Based on the decision you made, state what you can conclude regarding the relationship between your variables.
- Propose a possible explanation for why these results were observed.
You might need to do a bit of independent research on why you obtained the results you did.
- Describe whether you believe the tests you performed are “reliable.”
- The trustworthiness of these models depends on the model conditions! You need to check each of the model conditions.
If you find through the study design and / or your visualizations that certain model conditions are violated, you are expected to do your best to remedy these violations. If you need help figuring out how to do this, email Dr. Theobold before Friday, May 30 at 4pm!
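Some of the condition checks can be supported with quick numerical summaries alongside your visualizations. The sketch below, using the built-in `warpbreaks` data as a stand-in, compares group sizes and group standard deviations and draws a normal quantile plot of the residuals. It does not set any cutoff for "too different"; use whatever guidance your course materials give, and note that independence has to be judged from the study design rather than from code.

```r
# Group sizes and group standard deviations for breaks by tension
group_n  <- table(warpbreaks$tension)
group_sd <- tapply(warpbreaks$breaks, warpbreaks$tension, sd)

group_n
group_sd

# Ratio of largest to smallest standard deviation (equal-variance check)
sd_ratio <- max(group_sd) / min(group_sd)
sd_ratio

# Normal quantile plot of the residuals from the one-way ANOVA
tension_fit <- aov(breaks ~ tension, data = warpbreaks)
qqnorm(residuals(tension_fit))
qqline(residuals(tension_fit))
```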
5 Conclusions
In this section, you will provide a short summary of the key findings of your analysis, and their connection to the broader scientific community.
Based on your visualizations and the ANOVA model, what is your conclusion for each of your research questions?
What are the implications of this research for science in general?
Based on the sampling methods used, what larger population can you infer the results of your analysis onto?
Your statement needs to include a description of (1) how the data were collected, and (2) the population to whom the results can be applied. You must justify your reasoning for #2 using information from the design of the study.
Generic statements (e.g., “these observations were representative so we can infer onto the entire population”) are not sufficient. Your statement should explain why the sample is / is not representative of the population and how that influences the population you can infer the results onto.
Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables? Specifically, can cause-and-effect statements be made?
Your statement needs to include a description of (1) how the study was designed, and (2) what statements can be made about the relationship between the variables. You must justify your reasoning for #2 by making direct reference to the variables included in the study.
Generic statements (e.g., “this was an observational study so we can’t make causal statements”) are not sufficient. Your statement should explain why the study was observational and why you cannot make causal statements for observational studies.
6 Submission
You are expected to submit your rendered HTML file by the deadline. The entirety of the content for your project is expected to be included in your HTML file.
If you do not submit a first draft by the stated deadline, you will not receive feedback from Dr. Theobold. It is your responsibility to find a time to meet with Dr. Theobold to discuss your project before you submit your final draft.
If you do not submit a final version of your project by the stated deadline, your grade will be based on your first draft. If you did not submit a first draft, your grade on the project will be an F.
If you do not submit an HTML file for the final version of your project because you were unable to render your project, you will receive a one-letter-grade deduction. Meaning, if you submit a Google Doc or a Word file with your project, you will receive a letter grade deduction in your project grade.