Midterm Project Work Day

Reminders About Deadlines

Revision Deadlines

  • Lab 3 Revisions are due today (May 1)

  • The second round of revisions for Lab 2 is due today (May 1)

  • Statistical Critique revisions are due next Wednesday (May 8)

  • Lab 4 Revisions are due next Wednesday (May 8)

Making a copy of your group’s Lab 4

If you were the recorder (typer) for your group, you need to make your project public. If you were not the recorder, you need to make a copy of your group’s project.

Midterm Project

The first draft of your Midterm Project is due on Sunday at midnight.

Deadline Extension

A deadline extension is permitted for the first draft. Deadline extensions are not permitted for the final version (due next week).

Comments from Project Proposals

Introduction Versus Methods

  • The description of your data goes in your Introduction.

  • The description of your variables goes at the beginning of your Methods, in the Variables subsection!

Data Description – A Word of Caution

Be cautious in how you are using the resources I provided—do not copy these descriptions.

Inserting a verbatim copy of the descriptions seen in the data resources is plagiarism.

In-Text Citation

If you wish to borrow elements of these descriptions, you need to quote them and provide a reference to the resource. For example: “This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children” (United States Department of Health and Human Services, 2014).

Specific Dataset Advice

  • For the and_vertebrates data, you should include species as an explanatory variable. If you don’t, you are assuming the same relationship applies to trout AND salamanders.

  • For the hbr_maples data, you cannot use year as a numerical variable. There are only two years of data!

  • For the pie_crab data:

    • site and latitude measure the same thing
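
One way to act on the hbr_maples advice is to treat year as a categorical variable. This is a minimal sketch, assuming the lterdatasampler package is installed. The names maples and year_cat are made up for this illustration.

```r
library(lterdatasampler)

# Work on a copy so the original dataset is left untouched.
# `maples` and `year_cat` are hypothetical names chosen for this example.
maples <- hbr_maples

# Convert the numeric year into a factor so lm() treats it as categories
maples$year_cat <- factor(maples$year)

# Check how many distinct years there are
levels(maples$year_cat)
```

You would then use year_cat (not year) as the categorical explanatory variable in your plots and in lm().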

Coding a Multiple Linear Regression

Step 0 – Read in Your Data

  • Locate what package your data live in (found in the directions for the midterm project proposal)

  • Load in the package you need!

  • Get started!

library(openintro)

OR

library(lterdatasampler)

moderndive Package

We will be using the moderndive package to get our regression tables, so do not remove this package from your project!
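
After loading your packages, it can help to confirm that the data are available before moving on. A small sketch, assuming your data are the possum data from openintro; swap in your own package and dataset name.

```r
library(openintro)    # or library(lterdatasampler), depending on your dataset
library(moderndive)   # needed later for the regression table

# Take a first look at the data: size, variable names, first few rows
dim(possum)
names(possum)
head(possum)
```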

Step 1 – Visualizations

Two Numerical Variables

Three total visualizations

  1. Visualize the model with both variables, using a color gradient
  2. Visualize a simple linear regression with the one explanatory variable
  3. Visualize a simple linear regression with the other explanatory variable
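
The three plots above could be sketched as follows. This assumes the pie_crab data with size as the response and latitude and water_temp as the explanatory variables; the plot names are made up for this example, and your variables will differ.

```r
library(ggplot2)
library(lterdatasampler)

# 1. Both explanatory variables: latitude on the x-axis,
#    water_temp shown as a color gradient
plot_both <- ggplot(pie_crab, aes(x = latitude, y = size, color = water_temp)) +
  geom_point()

# 2. Simple linear regression with latitude only
plot_latitude <- ggplot(pie_crab, aes(x = latitude, y = size)) +
  geom_point() +
  geom_smooth(method = "lm")

# 3. Simple linear regression with water_temp only
plot_water_temp <- ggplot(pie_crab, aes(x = water_temp, y = size)) +
  geom_point() +
  geom_smooth(method = "lm")
```

Type the name of a plot (for example, plot_both) on its own line to display it.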

One Categorical & One Numerical Variable

Two total visualizations

  1. Visualize the “different slopes” multiple linear regression using geom_smooth(method = "lm")
  2. Visualize the “parallel slopes” multiple linear regression using geom_parallel_slopes()
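
The two plots above could be sketched as follows. This assumes the possum data with tail_l as the response, age as the numerical variable, and pop as the categorical variable; the plot names are made up for this example.

```r
library(ggplot2)
library(moderndive)   # provides geom_parallel_slopes()
library(openintro)

# 1. Different slopes: mapping color to the categorical variable makes
#    geom_smooth() fit a separate line for each group
plot_different <- ggplot(possum, aes(x = age, y = tail_l, color = pop)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# 2. Parallel slopes: one line per group, all forced to share the same slope
plot_parallel <- ggplot(possum, aes(x = age, y = tail_l, color = pop)) +
  geom_point() +
  geom_parallel_slopes(se = FALSE)
```

Setting se = FALSE hides the shaded bands, which makes it easier to compare the slopes of the lines by eye.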

Step 2 – Decide the “Best” Model

Two Numerical Variables

  • If there appears to be a relationship with the colors – include both variables!

  • If the colors are equally dispersed throughout the plot – choose the one variable that has the stronger relationship (larger slope)!

One Categorical & One Numerical Variable

  • Look at the plot where the lines are allowed to be different! Does it look like they are?

  • If the lines look different – you should use the different slopes (interaction) model!

  • If the lines look similar – you should use the parallel slopes (additive) model!

Step 3 – Fit the regression model you chose with lm()

Two Numerical Variables

  • Are both variables included? Use a + to separate them!

my_model <- lm(size ~ latitude + water_temp, 
               data = pie_crab)

One Categorical & One Numerical Variable

  • Are the slopes different? You need to fit a different slopes model! Use a * to separate the variables!

my_model <- lm(tail_l ~ age * pop, 
               data = possum)

  • Are the slopes similar? You need to fit a parallel slopes model! Use a + to separate the variables!

my_model <- lm(weight ~ weeks + habit, 
               data = births14)
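
To see what the * and + actually change, you can fit both versions to the same data and compare the coefficient names. This sketch uses the possum data for both models purely for illustration; the model names are made up.

```r
library(openintro)

# Different slopes (interaction) model:
# age * pop is shorthand for age + pop + age:pop
different_slopes <- lm(tail_l ~ age * pop, data = possum)

# Parallel slopes (additive) model: no age:pop term
parallel_slopes <- lm(tail_l ~ age + pop, data = possum)

# The interaction model has one extra coefficient (the one containing ":"),
# which is the adjustment to the slope for the second group
names(coef(different_slopes))
names(coef(parallel_slopes))
```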

Step 4 – Get the coefficients with get_regression_table()

Now interpret!
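
A minimal sketch of this step, assuming you chose the parallel slopes model for the births14 data from Step 3. The name regression_table is made up for this example.

```r
library(openintro)
library(moderndive)

# Fit the model chosen in Step 3
my_model <- lm(weight ~ weeks + habit, 
               data = births14)

# Get the table of coefficients: one row per term in the model
regression_table <- get_regression_table(my_model)
regression_table
```

In a parallel slopes model like this one, the intercept row is the predicted weight for the baseline habit group when weeks is 0, the weeks row is the change in predicted weight for each additional week (the same for both groups), and the habit row is how much the line shifts up or down for the other group.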