STAT 313: Midterm Project Proposal
Due February 5, 2023 by 5pm
This week you will get started on your midterm project by selecting what dataset you wish to analyze and writing an introduction about the dataset you chose.
1 Step 1: Picking a Dataset
1.1 Option 1 – Use your own data
If you have a dataset from another course or your own research which you believe can be appropriately modeled with a linear regression, you can propose to use these data.
Deliverable
For the Midterm Project Proposal assignment on Canvas, you are required to state at the beginning of the document, that you have chosen to use your own dataset. You are also required to submit your dataset as an Excel file (stored as either a .csv or .xlsx).
1.2 Option 2 – Use a public dataset
I’ve compiled a list of datasets from a variety of contexts, all which are relatively tidy and ready for analysis. Each of these datasets are included in an R package, so there is no need for you to download the dataset! All you will need to do is load in the necessary package (e.g., library(lterdatasampler)
) at the beginning of your analysis.
From the lterdatasampler package:
and_vertebrates
: Size data for Cutthroat trout and salamanders in different sections of forest (from Lab 3).ntl_icecover
: Data on duration of ice cover of lakes in the Madison, WI area (from Lab 4).hbr_maples
: Data on the growth of Sugar Maple (Acer saccharum) seedlings in response to calcium addition.pie_crab
: Data on Fiddler crab body size in salt marshes from Florida to Massachusetts.
From the openintro package:
births14
: Data from US births from 2014 (similar toncbirths
dataset from Week 4).possum
: Data representing possums in Australia and New Guinea – link to scientific article referenced inpossum
information
From the moderndive package:
evals
: Data from end of semester student evaluations from University of Texas at Austin (discussed in ModernDive textbook) – link to article analyzing theevals
dataset
From the gapminder package:
gapminder
: Data on life expectancy, GDP per capita, and population by country (discussed in ModernDive textbook).
Deliverable
For the Midterm Project Proposal assignment on Canvas, you are required to state at the beginning of the document, the name of the dataset you have chosen to use.
2 Step 2: Writing an Introduction
2.1 If you chose to use your own data
Step 1: Describe the context of your dataset in your own words! How were the data collected? Was there a study these data came from?
I don’t know anything about your data, so I expect for this description to be extensive. If the data you are using are from a class, you are responsible for obtaining the context of the data–you cannot simply say “These were data used in BIO 253.”
Step 2: Choose your variables
We will using a simple linear regression to analyze the data you chose. Thus, there are some stipulations for the variables you can choose. You must choose
- one numeric variable for the response variable
- one numeric variable for the explanatory variable
Once you have each of these variables decided, you then need to choose one additional explanatory variable. This additional explanatory variable can be either numerical or categorical.
Describe each variable you chose for your analysis—how was the variable measured? What unit was the variable measured in? What types of values does the variable take on (this is especially important if you chose a categorical variable)?
2.2 If you chose a dataset from the list above
Step 1: Describe the context of your dataset in your own words! How were the data collected? Was there a study these data came from? Were these data included in any publications?
To obtain information on the dataset, click on the link provided in its name!
Step 2: Choose your variables
We will using a simple linear regression to analyze the data you chose. Thus, there are some stipulations for the variables you can choose. You must choose
- one numeric variable for the response variable
- one numeric variable for the explanatory variable
Once you have each of these variables decided, you then need to choose one additional explanatory variable. This additional explanatory variable can be either numerical or categorical.
Describe each variable you chose for your analysis—how was the variable measured? What unit was the variable measured in? What types of values does the variable take on (this is especially important if you chose a categorical variable)?
3 Step 3: Submitting on Canvas
For the Midterm Project Proposal assignment on Canvas, your proposal is required to consist of both components (dataset and description).
You are allowed to use any text editing software to make your proposal (e.g., Word, Pages, Google Docs), but your submission must be a PDF. If you are unsure how to save your file as a PDF, I recommend using Google!
Remember, if you chose to use your own dataset, you are required to submit your data as an Excel file (.csv or .xlsx).