Welcome to Stat 313!

Warm-up

(90 seconds)

Why do you believe your discipline requires / expects you to take this course?

Who am I?

About Me…

Three images of things related to Dr. Theobold's life. The first image is of a field of purple flowers from Dr. Theobold's spring break trip to Amsterdam.

The third image is of Dr. Theobold's two cats snuggled in a cat bed together.

The second image is of Dr. Theobold running across a bridge in a trail race.

What can I expect from this class?

What is Statistics?

Scientists seek to answer questions using rigorous methods and careful observations. These observations – collected from the likes of field notes, surveys, and experiments – form the backbone of a statistical investigation and are called data.

Statistics is the study of how best to collect, analyze, and draw conclusions from data.

Introduction to Modern Statistics

What Statistics Is To Me



A figure of the 'data science cycle', with six different stages. The process starts with 'Import' (importing a dataset), then moves to 'Tidy' (tidying the dataset you imported). From there, the diagram has a blue box labeled the 'Explore' cycle, which includes the 'Transform', 'Visualize,' and 'Model' stages. These stages are connected in a circle with arrows between each stage, implying that these stages are a cycle which one can repeat. After the 'Explore' cycle, there in one additional stage labeled 'Communicate,' representing how you communicate the results of your analysis.

The data science cycle – Wickham & Grolemund

What you can expect in STAT 313

This course will teach you the fundamentals of linear models—simple linear regression, multiple linear regression, and analysis of variance—and experimental design. You will extend the concepts covered in your Stat I course, to:

  • work with data in a reproducible way (using R)
  • visualize and summarize a variety of datasets (in R)
  • critically evaluate the use of Statistics
  • perform statistical analyses to answer research questions (using R)

Coding 🙀

Coding is a huge part of how doing statistics in the wild looks.

  • Everyone is coming from a different background
  • Different aspects of the course will be difficult to different people
  • You will be given coding resources each week
  • Use your peers to support your learning

Course Components

Before Class

  • Reading Guides
  • Concept Quizzes
  • R Tutorials

During Class

  • Group Discussion
  • Hands-on Activities
  • Lab Assignments

Outside of Class

  • Statistical Critiques

  • Midterm Project

  • Final Project

Specifications Based Grading

An image of a building with four pillars, labeled clearly defined standards, helpful feedback, marks that indicate progress, and reattempts without penalty. In the roof (at the top of the pillars), the saying 'in feedback loops we trust' is written. These pillars represent the four pillars / principals of specifications-based grading, the grading system used in this course.

Let’s talk about data…

Tidy Data

Expected layout of “tidy” datasets

Gender Stereotypes

subject sex age trait target stereotype high_achieve_caution
140 male 5 nice adults 1.00 0.25
68 female 5 nice children 0.50 1.00
65 male 7 smart adults 0.00 0.75
65 male 7 nice children 0.00 0.75
9 male 5 nice adults 0.00 0.50
8 female 6 smart children 0.25 0.75


What are the observations in these data? i.e., What separates one row from the other rows?

Body Girth and Skeletal Diameter

age wgt hgt sex sho_gi wai_gi nav_gi hip_gi
45 58.6 166.2 0 102.5 69.5 75.5 95.3
27 85.5 198.1 1 117.9 82.5 90.8 94.9
19 47.0 160.0 0 89.2 61.0 73.0 82.5
26 82.7 177.8 1 121.8 85.0 90.8 97.9
29 53.9 167.4 1 102.9 75.4 78.0 88.6
36 55.2 165.1 0 96.4 68.3 88.0 93.8


What are the observations in these data?

NBA Player of the Week

Age Date Draft Year Height Player Position
33 Mar 16, 1997 1985 6-9 Karl Malone PF
32 Nov 21, 2004 1994 6-8 Grant Hill SF
25 Mar 23, 2003 1996 6-10 Peja Stojakovic F
27 Dec 19, 2016 2009 201cm DeMar DeRozan GF
29 Apr 7, 2014 2004 6-10 Al Jefferson FC
28 Dec 26, 2016 2011 175cm Isaiah Thomas PG


What are the observations in these data?

Your Turn

Every year, the US releases to the public a large data set containing information on births recorded in the country.

A total of 12 variables were collected on every birth, including information about:

  • the birth (baby weight, sex of baby, premie status)
  • the pregnancy (hospital visits, length of gestation, weight gained)
  • the birth parent’s attributes (age, high-risk status, smoking status, marital status, race)
  • the partner’s age

If you were to open this dataset in Excel, how would you expect it to look?

Military Spending

Country 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
Africa NA NA NA NA NA NA NA NA NA NA
USA 4.922642 4.840174 4.477401 4.046679 3.695891 3.477846 3.418941 3.313385 3.316244 3.413107
Australia 1.856791 1.757078 1.670963 1.639861 1.772211 1.950601 2.081512 1.997974 1.894180 1.879802
Norway 1.515698 1.451436 1.402134 1.413997 1.472039 1.507280 1.626681 1.622270 1.628300 1.684120
Sweden 1.188288 1.104288 1.133304 1.116715 1.129776 1.069579 1.052833 1.024075 1.031100 1.120599


Do these data satisfy the “tidy” principles?

Vehicle Efficiency

mpg cyl disp hp drat wt qsec vs am gear carb
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4


Do these data satisfy the “tidy” principles?

An image of lots of different datasets. The datasets on the top all look alike, with tidy rows and columns (they are happy). The datasets on the bottom all look different, all having different characteristics that make data untidy. The quote displayed reads 'The standard structure of tidy data means that tidy datasets are all alike, but every messy dataset is messy in its own way.'

Artwork by @allison_horst

Types of Variables


A tree diagram of the variables we will analyze in this class. The top node says 'all variables'. This top node has two branches, one that leads to 'numerical' and one that leads to 'categorical', representing the two classes of variables we will analyze. From the numerical variable node, there are two additional branches, one that says 'continuous' and one that says 'dicrete', representing the two types of numerical varaibles we will consider. Lastly, from the 'categorical' variable node, there are two branches, one saying 'regular categorical' and one saying 'ordinal', each representing the two classes of categorical variables we will consider.

Diagram of types of variables we will analyze!

An image of a baby chick and a baby octopus. The baby chick is under the 'continuous' banner, and says the chick is 3.1 inches tall and weighs 34.16 grams, representing two variables that are measured continuously. The octopus is under the 'discrete' banner, and says the octopus has 8 arms and 4 spots, representing two variables that are measured discretely.

An image of the three different types of categorical variables, nominal, ordinal, and binary. The 'nominal' category has a turtle, snail, and butterfly and represents variables with unordered descriptions. The 'ordinal' category has three bumblebees, one saying 'I am happy,' one saying 'I am okay,' and one saying 'I am awesome!!!'. The bumblebees' emotions represent an ordinal variable. Lastly, the 'binary' category references variables that have two mutually exclusive outcomes. The image displayed under 'binary' is a t-rex saying 'I am extinct' and a shark saying 'Ha' (because they are not extinct).

Your Turn (90-seconds)


Write down one example of:

  • a continuous, numerical variable

  • a discrete, numerical variable

  • an ordinal, categorical variable

  • a regular, categorical variable

Share with your neighbor!

Lab Warm-up

Data Types in R

glimpse(births_small)
Rows: 1,000
Columns: 10
$ fage           <int> 34, 36, 37, NA, 32, 32, 37, 29, 30, 29, 30, 34, 28, 28,…
$ mage           <dbl> 34, 31, 36, 16, 31, 26, 36, 24, 32, 26, 34, 27, 22, 31,…
$ weeks          <dbl> 37, 41, 37, 38, 36, 39, 36, 40, 39, 39, 42, 40, 40, 39,…
$ premie         <chr> "full term", "full term", "full term", "full term", "pr…
$ gained         <dbl> 28, 41, 28, 29, 48, 45, 20, 65, 25, 22, 40, 30, 31, NA,…
$ weight         <dbl> 6.96, 8.86, 7.51, 6.19, 6.75, 6.69, 6.13, 6.74, 8.94, 9…
$ lowbirthweight <chr> "not low", "not low", "not low", "not low", "not low", …
$ sex            <fct> male, female, female, male, female, female, female, mal…
$ habit          <chr> "nonsmoker", "nonsmoker", "nonsmoker", "nonsmoker", "no…
$ whitemom       <chr> "white", "white", "not white", "white", "white", "white…

What do you think dbl means?

How is that different from int?

What does chr mean?

How might it differ from fct?

Lab 1

Joining the STAT 313 Workspace on Posit Cloud

  1. Access Posit Cloud from link posted on Canvas
  2. Create a Cloud Student account (using your Cal Poly email)

$5 / month subscription

The Cloud Student account costs $5 / month. You will only need to pay for three (3) months of access, for a total of $15 for the quarter.

  1. Join the STAT 313 workspace

Accessing Lab 1

  1. Access Lab 1 either through:
  • the link posted on Canvas
  • the Content tab in the STAT 313 workspace
  1. Click on Lab 1 to open the Project

Opening the Lab 1 Document

  1. Click on the lab-1.qmd file in the lower right hand corner to open the lab assignment
A screenshot of the lower right hand panel of Posit Cloud. The image displays the documents listed under the 'Files' tab, with lab-1.qmd (a Quarto file) higlighted in a purple box.
  1. Start working!