Midterm Project Guidelines
Your Task
For this project, you are expected to use a multiple linear regression to investigate the relationship between the variables you outlined in your Midterm Project Proposal. Your project is required to have at least two explanatory variables, one of which must be numerical. You are expected to justify why you chose each explanatory variable in the Introduction section of your report.
Introduction
- In 4-6 sentences describe how the data were collected, the context of the data (e.g., Are they from a study or a publication?), and the background of the research problem (e.g., What question do these data address? Why were the data collected?)
You may need to look up the publication(s) the data are associated with to obtain information on how they were collected.
- State the question(s) of interest you will address with your statistical analysis. The more specific you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should start with, “What is the relationship between…” and should be as specific as possible. Your Findings section should directly address the question(s) you pose here.
Methods
This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.
Describe the response and explanatory variables, how they were measured and their associated units. For categorical variables, describe the levels of the categorical variable.
Produce data visualizations exploring the relationship(s) you are interested in investigating, contrasting the need for a second explanatory variable.
- For a multiple linear regression with one numerical variable and one categorical variable, you should produce two visualizations: one with different slopes and one with parallel slopes.
- For a multiple linear regression with two numerical variables, you should produce three visualizations: one with both explanatory variables included, and two with each explanatory variable on its own.
Describe what you see in the visualizations, making direct references to the plots!
Outline the appropriate statistical model you will use to answer the question(s) of interest that you stated previously. Be specific about why the method being used are appropriate for the investigation at hand making direct reference to the visualizations.
The statistical model you fit in the next section depends on what you see in your visualization. Click here for a flowchart of how you should select the statistical model that is best for your situation.
Findings
In this section you will write up your findings for each research question of interest.
- Fit the statistical model stated at the end of the Methods section
- Obtain the coefficients for the model
Use the get_regression_table()
function from the moderndive package to provide nicely formatted output from your regression model.
Write out the estimated regression equation for your statistical model.
- If your regression contains categorical variables, you can either write your equation out with indicator variables or write out a different equation for each group.
Interpret in the context of the data the coefficients from the regression equation.
Study Limitations
Write a 3-4 sentence statement on what can be inferred from the design of the study and the results of your statistical analysis. Specifically, answer these two questions and comment on their implications:
Based on the sampling method used, what larger population can you infer the results or your analysis onto?
Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables. Specifically, can cause-and-effect statements be made?
Conclusions
Based on your visualizations and the regression model, what is your conclusion for the questions of interest?
There should be no mention of p-values in your conclusion!