smoke_lm <- lm(weight ~ weeks * habit, data = ncbirths)
get_regression_table(smoke_lm)
Review different types of multiple linear regression models
Complete an activity on sample selection
Start Midterm Project write-up
No lab – focus on getting all the coding accomplished for the Midterm Project
So that everyone gets timely feedback on their drafts, first drafts are due by Sunday.
Deadline Extension
A deadline extension is permitted for the first draft. Deadline extensions are not permitted for the final version (due next week).
Before…
Now…
How?
Offsets!
# A tibble: 4 × 3
term estimate std_error
<chr> <dbl> <dbl>
1 intercept -5.94 0.484
2 weeks 0.341 0.013
3 habit: smoker -1.86 1.63
4 weeks:habitsmoker 0.039 0.042
Interaction Model
The * means the variables are interacting!
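As a small illustration of what the `*` does, the sketch below uses base R's `terms()` to list the terms each formula expands to. The variable names `interaction_terms` and `additive_terms` are made up for this example; no data is needed because only the formulas are inspected.

```r
# terms() expands a formula into the individual terms that lm() would fit.
# weeks * habit is shorthand for weeks + habit + weeks:habit.
interaction_terms <- attr(terms(weight ~ weeks * habit), "term.labels")

# With + instead of *, there is no weeks:habit term, so the slopes are parallel.
additive_terms <- attr(terms(weight ~ weeks + habit), "term.labels")

print(interaction_terms)  # "weeks" "habit" "weeks:habit"
print(additive_terms)     # "weeks" "habit"
```

The extra `weeks:habit` term is what shows up as the `weeks:habitsmoker` row in the regression table.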
What is the regression equation for non-smoker mothers?
What is the regression equation for smoker mothers?
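One way to work these out from the interaction table, assuming non-smoker is the baseline level (the table only lists a smoker term) and using the rounded estimates as printed: the baseline group keeps the intercept and the weeks slope, while the smoker group adds its offset to both.

\[\widehat{weight}_{nonsmoker} = -5.94 + 0.341 \times \text{weeks}\]
\[\widehat{weight}_{smoker} = (-5.94 - 1.86) + (0.341 + 0.039) \times \text{weeks}\]
\[\widehat{weight}_{smoker} = -7.80 + 0.380 \times \text{weeks}\]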
What if we have a second numerical explanatory variable?
Multiple slopes
# A tibble: 3 × 3
term estimate std_error
<chr> <dbl> <dbl>
1 intercept -6.68 0.492
2 weeks 0.346 0.012
3 mage 0.02 0.006
How do you interpret the value of 0.346?
How do you interpret the value of 0.02?
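Writing the table out as an equation may help with both interpretations. This is a sketch using the rounded estimates above:

\[\widehat{weight} = -6.68 + 0.346 \times \text{weeks} + 0.02 \times \text{mage}\]

Each slope is read while holding the other variable fixed. For mothers of the same age, one additional week of pregnancy is associated with a 0.346 increase in predicted birth weight. For pregnancies of the same length, one additional year of mother's age is associated with a 0.02 increase in predicted birth weight. The units are assumed here to be pounds, as in the openintro version of ncbirths; the slides do not state them.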
But how do we decide whether the interaction model is “best” without a p-value?
When investigating if a relationship differs…
Always start with the “interaction” / different slopes model.
If the slopes look different, you’re done!
If the slopes look similar, then fit the “additive” / parallel slopes model.
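The two steps above might look like the following in code. This is a minimal sketch: `fit_both_models()` is a hypothetical helper written for this example, and it assumes a data frame with columns `weight`, `weeks`, and `habit`, like the one used in the `smoke_lm` code.

```r
# Fit both candidate models on a data frame with columns weight, weeks, habit.
# fit_both_models() is a hypothetical helper, not part of any package.
fit_both_models <- function(data) {
  list(
    # Step 1: interaction / different slopes model
    interaction = lm(weight ~ weeks * habit, data = data),
    # Step 2 (only if the slopes look similar): additive / parallel slopes model
    additive = lm(weight ~ weeks + habit, data = data)
  )
}

# Usage, assuming ncbirths comes from the openintro package. This is an
# assumption: the slides do not say where the data is loaded from.
if (requireNamespace("openintro", quietly = TRUE)) {
  models <- fit_both_models(openintro::ncbirths)
  print(coef(models$interaction))
  print(coef(models$additive))
}
```

The decision itself is still made by looking at the plotted lines; the code only produces the two fits to compare.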
Different Enough?
What if they’re not very different?
Parallel Slopes
# A tibble: 4 × 3
term estimate std_error
<chr> <dbl> <dbl>
1 intercept 588. 7.61
2 perc_disadvan -2.78 0.106
3 size: medium -11.9 7.54
4 size: large -6.36 6.92
Group equations – Baseline
\[\widehat{SAT}_{small} = 588 - 2.78 \times \text{percent disadvantaged}\]
Group equations – Offsets
\[\widehat{SAT}_{medium} = (588 - 11.9) - 2.78 \times \text{percent disadvantaged}\]
\[\widehat{SAT}_{medium} = 576.1 - 2.78 \times \text{percent disadvantaged}\]
\[\widehat{SAT}_{large} = (588 - 6.36) - 2.78 \times \text{percent disadvantaged}\]
\[\widehat{SAT}_{large} = 581.64 - 2.78 \times \text{percent disadvantaged}\]
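The offset arithmetic in the group equations can be checked with a few lines of base R. This sketch hard-codes the rounded estimates from the parallel slopes table; the names `coefs`, `group_intercepts`, and `predict_sat()` are made up for this example.

```r
# Rounded estimates copied from the parallel slopes table.
coefs <- c(
  intercept = 588,
  perc_disadvan = -2.78,
  size_medium = -11.9,
  size_large = -6.36
)

# The baseline group (small) keeps the intercept as is;
# the other groups add their offset to it.
group_intercepts <- c(
  small = coefs[["intercept"]],
  medium = coefs[["intercept"]] + coefs[["size_medium"]],
  large = coefs[["intercept"]] + coefs[["size_large"]]
)
print(group_intercepts)

# Every group shares the same slope, which is why the lines are parallel.
predict_sat <- function(size, perc_disadvan) {
  group_intercepts[[size]] + coefs[["perc_disadvan"]] * perc_disadvan
}
```

For example, `predict_sat("medium", 10)` evaluates the medium-school equation at 10 percent disadvantaged.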
Once you have found other students working on the same dataset, complete the sample selection activity.
What are the observations / rows in this dataset?
From what population was the sample drawn?
For an observation to be included in the dataset, what inclusion criteria needed to be met?
How were the observations that satisfied the inclusion criteria sampled from the population?
Based on the inclusion criteria and sampling methods, to what population can the findings of the study be generalized?
Insert the description of your dataset and variables (from the Midterm Proposal) into the “Introduction” of your project
Pose a research question about your selected variables, which can be addressed with multiple linear regression
Insert the code to create the required two (or three) visualizations
Write a description of what you see in the visualizations
Decide which model you believe is “best”