Midterm Project Work Day

Reminders About Deadlines

Revision Deadlines

  • Lab 3 revisions are due next Wednesday (May 7).

  • Statistical Critique revisions are due next Wednesday (May 7).

  • The first draft of your Midterm Project is due on Sunday at midnight.

Deadline Extension

A deadline extension is permitted for the first draft. Deadline extensions are not permitted for the final version (due next week).

Comments from Project Proposals

Introduction Versus Methods

  • The description of your data goes in your Introduction.

  • The description of your variables goes at the beginning of your Methods, in the Variables subsection!

Data Description – A Word of Caution

Be careful how you use the resources I provided: do not copy their descriptions.

Inserting a verbatim copy of the descriptions from the data resources is plagiarism.

In-Text Citations

If you wish to borrow elements of these descriptions, you need to quote them and provide a reference to the resource. For example: “This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children” (United States Department of Health and Human Services, 2014).

Variable Selection

Be sure to review the feedback on your Midterm Project Proposal before starting your first draft.

Specifically, make sure I did not have any objections to the variables you chose for your analysis!

Specific Dataset Advice

  • For the and_vertebrates data, you should include species as an explanatory variable. If you don’t, you are assuming the same relationship applies to trout AND salamanders.

  • For the hbr_maples data, you cannot use year as a numerical variable. There are only two years of data!

    • If you want to use year as a categorical variable, let Dr. T know, and they will help you write code to change the data type of this variable!
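For reference, here is a minimal sketch of checking these two variables and converting year to a categorical variable, assuming the lterdatasampler package is installed. It uses base R's factor(); this is one common approach and not necessarily the code Dr. T will suggest (dplyr's mutate() works too).

```r
# Sketch: inspect the two lterdatasampler variables discussed above
library(lterdatasampler)

# How many observations of each species are in and_vertebrates?
table(and_vertebrates$species)

# hbr_maples contains only two distinct years
table(hbr_maples$year)

# One way to treat year as categorical: convert it to a factor
hbr_maples$year <- factor(hbr_maples$year)
levels(hbr_maples$year)
```

After the conversion, lm() treats year as a grouping variable (like species) instead of fitting a slope across just two values.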

Coding a Multiple Linear Regression

Step 0 – Read in Your Data

  • Find which package your data live in (listed in the directions for the Midterm Project Proposal).

  • Load in the package you need!

  • Get started!

library(openintro)

OR

library(lterdatasampler)

moderndive Package

We will be using the moderndive package to get our regression tables, so do not remove this package from your project!
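Putting Step 0 and this slide together, a setup chunk might look like the minimal sketch below. It assumes ggplot2 is loaded on its own (loading the tidyverse also works) and uses births14 from openintro only as an example; swap in lterdatasampler and your own data set as needed.

```r
# Sketch of a setup chunk for the midterm project (adjust to your data)
library(ggplot2)     # ggplot(), geom_point(), geom_smooth()
library(moderndive)  # geom_parallel_slopes(), get_regression_table()
library(openintro)   # OR library(lterdatasampler), depending on your data

# Take a quick look at the variables before modeling (births14 is just an example)
str(births14)
head(births14)
```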

Step 1 – Visualizations

You will make two visualizations in total:

  1. A “different slopes” multiple linear regression using geom_smooth(method = "lm")
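library(ggplot2)     # ggplot(), geom_point(), geom_smooth()
library(moderndive)  # MA_schools data
ggplot(data = MA_schools,
       mapping = aes(y = average_sat_math,
                     x = perc_disadvan,
                     color = size)) +
  geom_point() +
  geom_smooth(method = "lm") +  # a separate least-squares line for each size group
  labs(x = "Percent Economically Disadvantaged",
       y = "Average SAT Math",
       color = "Size of School")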

  2. A “parallel slopes” multiple linear regression using geom_parallel_slopes()
library(ggplot2)
library(moderndive)  # geom_parallel_slopes() and MA_schools
ggplot(data = MA_schools,
       mapping = aes(y = average_sat_math,
                     x = perc_disadvan,
                     color = size)) +
  geom_point() +
  geom_parallel_slopes() +  # one common slope, a separate intercept for each group
  labs(x = "Percent Economically Disadvantaged",
       y = "Average SAT Math",
       color = "Size of School")

Step 2 – Decide the “Best” Model

Next, you will decide which of these two models is the better choice for your data.

  • Look at the plot where the lines are allowed to be different! Does it look like they are?

  • If the lines look different, you should use the different slopes (interaction) model!

  • If the lines look similar, you should use the parallel slopes (additive) model!

No p-values

Your model decision must rely exclusively on the visualizations; you cannot use p-values to make your decision.

Step 3 – Fit the regression model you chose with lm()


Are the slopes different? You need to fit a different slopes model! Use a * to separate the variables!

my_model <- lm(tail_l ~ age * pop, data = possum)


Are the slopes similar? You need to fit a parallel slopes model! Use a + to separate the variables!

my_model <- lm(weight ~ weeks + habit, data = births14)
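To see what * does, the sketch below fits both kinds of model to the MA_schools data from the Step 1 plots (an illustrative choice; use your own data and variables). It assumes moderndive is installed, since MA_schools ships with it. In R's formula syntax, a * b is shorthand for a + b + a:b, where a:b is the interaction term that lets the slopes differ between groups.

```r
library(moderndive)  # provides the MA_schools data used in the Step 1 plots

# Different slopes (interaction) model: * fits both variables AND their interaction
interaction_model <- lm(average_sat_math ~ perc_disadvan * size, data = MA_schools)

# The same model written out term by term; perc_disadvan:size is the interaction
expanded_model <- lm(average_sat_math ~ perc_disadvan + size + perc_disadvan:size,
                     data = MA_schools)

# Parallel slopes (additive) model: + fits each variable, with no interaction
parallel_model <- lm(average_sat_math ~ perc_disadvan + size, data = MA_schools)

# Compare how many coefficients each model estimates
length(coef(interaction_model))
length(coef(parallel_model))
```

The interaction model estimates extra coefficients: one slope adjustment for each non-baseline group. The * and written-out versions give identical estimates.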

Step 4 – Get the coefficients with get_regression_table()


Regardless of the model you fit, you need to get your estimated coefficients using the get_regression_table() function!


get_regression_table(my_model)
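When interpreting, it can help to turn the coefficients into one fitted line per group. The sketch below assumes the MA_schools models from the Step 1 plots; group_lines() is a hypothetical helper written for illustration, not a moderndive function. It uses predict() at perc_disadvan = 0 and 1, so it does not depend on how R names the coefficients.

```r
library(moderndive)  # MA_schools data and get_regression_table()

parallel_model <- lm(average_sat_math ~ perc_disadvan + size, data = MA_schools)
interaction_model <- lm(average_sat_math ~ perc_disadvan * size, data = MA_schools)

# Regression table: one row per term (intercept, slope, group offsets)
get_regression_table(parallel_model)

# Hypothetical helper: intercept and slope of the fitted line for each size group.
# Intercept = prediction at perc_disadvan = 0; slope = change in the prediction
# when perc_disadvan goes from 0 to 1 (each group's line is linear in perc_disadvan).
group_lines <- function(model, data = MA_schools) {
  groups <- levels(factor(data$size))
  at_0 <- predict(model, newdata = data.frame(perc_disadvan = 0, size = groups))
  at_1 <- predict(model, newdata = data.frame(perc_disadvan = 1, size = groups))
  data.frame(size = groups, intercept = unname(at_0), slope = unname(at_1 - at_0))
}

group_lines(parallel_model)     # same slope for every group, different intercepts
group_lines(interaction_model)  # each group gets its own slope and intercept
```

In the parallel slopes output the slope column repeats, while in the interaction output it changes across groups, which mirrors the difference between the two Step 1 plots.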

Now interpret!