STAT 313: Midterm Project Proposal
Due February 5, 2023 by 5pm
This week you will get started on your midterm project by selecting what dataset you wish to analyze and writing an introduction about the dataset you chose.
1 Pick a Dataset
I’ve compiled a list of datasets from a variety of contexts, all of which are relatively tidy and ready for analysis. Each of these datasets has enough numerical and categorical variables for you to use it for both your Midterm Project and your Final Project!
Each of these datasets is included in an R package, so there is no need for you to download the dataset! All you will need to do is load the necessary package (e.g., library(lterdatasampler)) at the beginning of your analysis.
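As a minimal sketch of what loading a dataset from a package looks like, the snippet below loads lterdatasampler and takes a first look at and_vertebrates. It assumes the package is already installed on your machine; the same pattern should apply to the openintro and moderndive datasets once their packages are loaded.

```r
# Assumes lterdatasampler is already installed; if it is not,
# install it once before running this script.
library(lterdatasampler)

# Once the package is loaded, the dataset is available by name
print(dim(and_vertebrates))    # number of rows and columns
print(names(and_vertebrates))  # variable names
print(head(and_vertebrates))   # first few observations
```

Looking at the variable names early makes it easier to decide which variables you might want to describe in your introduction.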
1.1 From the lterdatasampler package:
and_vertebrates: Size data for Cutthroat trout and salamanders in different sections of forest (from Lab 3).
- Additional information about the data: https://lter.github.io/lterdatasampler/articles/and_vertebrates_vignette.html
- Publication resulting from data collected
hbr_maples: Data on the growth of Sugar Maple (Acer saccharum) seedlings in response to calcium addition.
- Additional information about the data: https://lter.github.io/lterdatasampler/articles/hbr_maples_vignette.html
- Publication resulting from data collected
nwt_pikas: Data on the stress of pikas.
- Additional information about the data: https://lter.github.io/lterdatasampler/articles/nwt_pikas_vignette.html
- Publication resulting from data collected
1.2 From the openintro package:
births14: Data from US births from 2014 (similar to the ncbirths dataset from Week 4).
possum: Data representing possums in Australia and New Guinea
1.3 From the moderndive package:
evals: Data from end-of-semester student evaluations from the University of Texas at Austin (discussed in the ModernDive textbook)
1.4 Deliverable
For the Midterm Project Proposal assignment on Canvas, you are required to state the name of the dataset you have chosen at the beginning of the document.
Every dataset is included in an R package, so there is no need to download the raw data file!
2 Write an Introduction
Step 1: Describe the context of your dataset in your own words! How were the data collected? Was there a study these data came from? Were these data included in any publications?
There are at least two resource files / websites linked for each dataset. Please read through these resources when writing your Introduction.
Step 2: Choose your variables
We will be using a linear regression to analyze the data you chose. Thus, there are some stipulations for the variables you can choose. You must choose
- one numeric variable for the response variable
- one numeric variable for the explanatory variable
- one categorical variable for the explanatory variable
Describe each variable you chose for your analysis. How was the variable measured? What unit was the variable measured in? What types of values does the variable take on (e.g., what are the different levels / values of the categorical variable)?
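A quick check in R can help confirm that your three variables have the right types and show what values they take on. The sketch below is only an illustration: it uses iris, which ships with base R and is not one of the project datasets, and the object names (project_data, response_var, numeric_var, categorical_var) are placeholders you would replace with your own dataset and columns.

```r
# `iris` is a stand-in that ships with base R; it is NOT a project dataset.
# Swap in your chosen dataset and column names.
project_data    <- iris
response_var    <- "Sepal.Length"  # numeric response
numeric_var     <- "Petal.Length"  # numeric explanatory
categorical_var <- "Species"       # categorical explanatory

# Check that each variable has the type the regression needs
is_numeric_response    <- is.numeric(project_data[[response_var]])
is_numeric_explanatory <- is.numeric(project_data[[numeric_var]])
is_categorical <- is.factor(project_data[[categorical_var]]) ||
  is.character(project_data[[categorical_var]])

# Range and typical values of the two numeric variables
print(summary(project_data[[response_var]]))
print(summary(project_data[[numeric_var]]))

# Levels of the categorical variable and how many rows fall in each
category_counts <- table(project_data[[categorical_var]])
print(category_counts)
```

Units and measurement methods are usually not stored in the data itself, so the dataset's help page (for example, ?and_vertebrates after loading lterdatasampler) and the linked resources are the places to look for those details.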
3 Submitting on Canvas
For the Midterm Project Proposal assignment on Canvas, your proposal must include both components (chosen dataset and introduction).
You are allowed to use any text editing software to make your proposal (e.g., Word, Pages, Google Docs), but your submission must be a PDF. If you are unsure how to save your file as a PDF, I recommend using Google!