Welcome to Stat 313!

Warm-up

(90 seconds)

Why do you believe your discipline requires / expects you to take this course?

Who am I?

About Me…

Three images of things related to Dr. Theobold's life. The first image is of Dr. Theobold running across a bridge in a trail race. The second image is of Dr. Theobold's mountain bike at the top of a local hiking trail. The third image is of Dr. Theobold's two cats snuggled in a cat bed together.


What can I expect from this class?

What is Statistics?

Scientists seek to answer questions using rigorous methods and careful observations. These observations – collected from the likes of field notes, surveys, and experiments – form the backbone of a statistical investigation and are called data.

Statistics is the study of how best to collect, analyze, and draw conclusions from data.

Introduction to Modern Statistics

What Statistics Is To Me



A figure of the 'data science cycle', with six different stages. The process starts with 'Import' (importing a dataset), then moves to 'Tidy' (tidying the dataset you imported). From there, the diagram has a blue box labeled the 'Explore' cycle, which includes the 'Transform', 'Visualize', and 'Model' stages. These stages are connected in a circle with arrows between each stage, implying that these stages are a cycle which one can repeat. After the 'Explore' cycle, there is one additional stage labeled 'Communicate', representing how you communicate the results of your analysis.

The data science cycle – Wickham & Grolemund

What you can expect in STAT 313

This course will teach you the fundamentals of linear models (simple linear regression, multiple linear regression, and analysis of variance) and experimental design. You will extend the concepts covered in your Stat I course to:

  • work with data in a reproducible way (using R)
  • visualize and summarize a variety of datasets (in R)
  • critically evaluate the use of Statistics
  • perform statistical analyses to answer research questions (using R)

Coding 🙀

Coding is a huge part of what doing statistics in the wild looks like.

  • Everyone is coming from a different background
  • Different aspects of the course will be difficult for different people
  • You will be given coding resources each week
  • Use your peers to support your learning

Course Components

Before Class

  • Reading Guides
  • Concept Quizzes
  • R Tutorials

During Class

  • Group Discussion
  • Hands-on Activities
  • Lab Assignments

Outside of Class

  • Statistical Critiques

  • Midterm Project

  • Final Project

Specifications Based Grading

An image of a building with four pillars, labeled clearly defined standards, helpful feedback, marks that indicate progress, and reattempts without penalty. In the roof (at the top of the pillars), the saying 'in feedback loops we trust' is written. These pillars represent the four pillars / principles of specifications-based grading, the grading system used in this course.

Let’s talk about data…

Tidy Data

Expected layout of “tidy” datasets

Gender stereotypes in 5-7 year old children


subject  sex     age  trait  target    stereotype  high_achieve_caution
49       male    7    smart  children  1.00        0.25
96       female  6    nice   children  1.00        1.00
37       female  7    nice   children  1.00        1.00
37       female  7    smart  children  0.75        1.00
139      male    7    smart  children  0.75        0.75
135      male    7    smart  adults    0.75        0.50

Body girth and skeletal diameter measurements

age  wgt   hgt    sex  sho_gi  wai_gi  nav_gi  hip_gi
19   70.6  178.0  0    103.0   68.0    84.5    99.5
21   81.6  184.0  1    119.5   77.5    81.5    99.8
36   54.5  167.6  0    96.4    73.6    86.9    94.7
40   56.8  168.9  0    98.2    62.9    76.8    95.5
32   58.4  173.2  0    97.7    68.6    83.0    94.2
49   89.1  182.9  1    115.6   103.6   100.7   100.6

NBA player of the week


Age  Date          Draft Year  Height  Player           Position
23   Dec 15, 1985  1984        6-3     Alvin Robertson  SG
20   Jan 9, 2005   2003        6-11    Chris Bosh       PF
27   Nov 21, 2016  2011        201cm   Jimmy Butler     GF
29   Dec 2, 1984   1977        6-11    Jack Sikma       C
24   Jan 4, 2004   1999        6-9     Elton Brand      PF
25   Mar 1, 1992   1988        6-10    Danny Manning    F

Your Turn

Every year, the US releases to the public a large data set containing information on births recorded in the country.

A total of 13 variables were collected on every birth, including information about:

  • the birth (baby weight, sex of baby, premie status)
  • the pregnancy (hospital visits, length of gestation)
  • the birth parent’s attributes (age, smoking status, marital status, race)
  • the partner’s age

How would you expect this dataframe to look?

Military Spending

Country    2010      2011      2012      2013      2014      2015      2016      2017      2018      2019
Africa     NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
USA        4.922642  4.840174  4.477401  4.046679  3.695891  3.477846  3.418941  3.313385  3.316244  3.413107
Australia  1.856791  1.757078  1.670963  1.639861  1.772211  1.950601  2.081512  1.997974  1.894180  1.879802
Norway     1.515698  1.451436  1.402134  1.413997  1.472039  1.507280  1.626681  1.622270  1.628300  1.684120
Sweden     1.188288  1.104288  1.133304  1.116715  1.129776  1.069579  1.052833  1.024075  1.031100  1.120599


Do these data satisfy the “tidy” principles?
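One way to think about the question above: the column names 2010 through 2019 are values of a year variable, not variables themselves. The sketch below shows how such a table could be reshaped so that each row is one country in one year. It assumes the tidyr package is installed, uses only two countries and three years copied from the table, and the names `military_wide`, `military_long`, `year`, and `spending` are illustrative choices (the slide does not name the measurement).

```r
library(tidyr)

# A small piece of the table above: two countries, three years.
# check.names = FALSE keeps the year column names exactly as "2010", "2011", "2012".
military_wide <- data.frame(
  Country = c("Norway", "Sweden"),
  `2010` = c(1.515698, 1.188288),
  `2011` = c(1.451436, 1.104288),
  `2012` = c(1.402134, 1.133304),
  check.names = FALSE
)

# Stack the year columns: one row per country-year combination.
# "year" and "spending" are hypothetical column names chosen for this sketch.
military_long <- pivot_longer(
  military_wide,
  cols = -Country,
  names_to = "year",
  values_to = "spending"
)

# The old column names arrive as text, so convert them to integers.
military_long$year <- as.integer(military_long$year)

military_long
```

In the reshaped version, every column is a variable (Country, year, spending) and every row is a single observation.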

Vehicle Efficiency

                mpg   cyl  disp   hp   drat  wt     qsec  vs  am  gear  carb
Toyota Corolla  33.9  4    71.1   65   4.22  1.835  19.9  1   1   4     1
Porsche 914-2   26.0  4    120.3  91   4.43  2.140  16.7  0   1   5     2
Volvo 142E      21.4  4    121.0  109  4.11  2.780  18.6  1   1   4     2
Ford Pantera L  15.8  8    351.0  264  4.22  3.170  14.5  0   1   5     4


Do these data satisfy the “tidy” principles?
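A hint for the question above: the car names have no column header. The values shown match R's built-in `mtcars` dataset, which is assumed here to be the source; in that dataset the car names are stored as row names rather than as a variable. A minimal sketch of moving them into an ordinary column, where `cars` and `model` are names chosen only for illustration:

```r
# In mtcars, the car names live in the row names, not in a column
head(rownames(mtcars), 3)

# Build a new data frame with the names as a regular column.
# row.names = NULL replaces the old row names with plain row numbers.
cars <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# The car name can now be used like any other variable
cars[cars$model == "Toyota Corolla", c("model", "mpg", "cyl")]
```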

An image of lots of different datasets. The datasets on the top all look alike, with tidy rows and columns (they are happy). The datasets on the bottom all look different, all having different characteristics that make data untidy. The quote displayed reads 'The standard structure of tidy data means that tidy datasets are all alike, but every messy dataset is messy in its own way.'

Artwork by @allison_horst

Types of Variables


A tree diagram of the variables we will analyze in this class. The top node says 'all variables'. This top node has two branches, one that leads to 'numerical' and one that leads to 'categorical', representing the two classes of variables we will analyze. From the numerical variable node, there are two additional branches, one that says 'continuous' and one that says 'discrete', representing the two types of numerical variables we will consider. Lastly, from the 'categorical' variable node, there are two branches, one saying 'regular categorical' and one saying 'ordinal', each representing the two classes of categorical variables we will consider.

Diagram of types of variables we will analyze!

An image of a baby chick and a baby octopus. The baby chick is under the 'continuous' banner, and says the chick is 3.1 inches tall and weighs 34.16 grams, representing two variables that are measured continuously. The octopus is under the 'discrete' banner, and says the octopus has 8 arms and 4 spots, representing two variables that are measured discretely.

An image of the three different types of categorical variables: nominal, ordinal, and binary. The 'nominal' category has a turtle, snail, and butterfly and represents variables with unordered descriptions. The 'ordinal' category has three bumblebees, one saying 'I am happy', one saying 'I am okay', and one saying 'I am awesome!!!'. The bumblebees' emotions represent an ordinal variable. Lastly, the 'binary' category references variables that have two mutually exclusive outcomes. The image displayed under 'binary' is a t-rex saying 'I am extinct' and a shark saying 'Ha' (because they are not extinct).

Your Turn (90 seconds)


Write down one example of:

  • a continuous, numerical variable

  • a discrete, numerical variable

  • an ordinal, categorical variable

  • a regular, categorical variable

Share out!

Lab Warm-up

Data Types in R

glimpse(births_small)
Rows: 1,000
Columns: 10
$ fage           <int> 34, 36, 37, NA, 32, 32, 37, 29, 30, 29, 30, 34, 28, 28,…
$ mage           <dbl> 34, 31, 36, 16, 31, 26, 36, 24, 32, 26, 34, 27, 22, 31,…
$ weeks          <dbl> 37, 41, 37, 38, 36, 39, 36, 40, 39, 39, 42, 40, 40, 39,…
$ premie         <chr> "full term", "full term", "full term", "full term", "pr…
$ gained         <dbl> 28, 41, 28, 29, 48, 45, 20, 65, 25, 22, 40, 30, 31, NA,…
$ weight         <dbl> 6.96, 8.86, 7.51, 6.19, 6.75, 6.69, 6.13, 6.74, 8.94, 9…
$ lowbirthweight <chr> "not low", "not low", "not low", "not low", "not low", …
$ sex            <fct> male, female, female, male, female, female, female, mal…
$ habit          <chr> "nonsmoker", "nonsmoker", "nonsmoker", "nonsmoker", "no…
$ whitemom       <chr> "white", "white", "not white", "white", "white", "white…

What do you think dbl means?

How is that different from int?

What does chr mean?

How might it differ from fct?
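To explore these questions yourself, you can build tiny vectors of each type and ask R what they are. This is a minimal sketch using made-up values; the object names (`fage_int`, `mage_dbl`, `habit_chr`, `sex_fct`) are hypothetical and only echo the column names in the output above.

```r
# The L suffix tells R to store whole numbers as integers
fage_int <- c(34L, 36L, 37L)

# Numbers typed without L are stored as doubles (decimal-capable numbers)
mage_dbl <- c(34, 31, 36)

# Text in quotes is stored as character strings
habit_chr <- c("nonsmoker", "smoker", "nonsmoker")

# A factor is categorical data with a fixed set of allowed levels
sex_fct <- factor(c("male", "female", "female"))

typeof(fage_int)   # "integer"
typeof(mage_dbl)   # "double"
class(habit_chr)   # "character"
class(sex_fct)     # "factor"

# Unlike a character vector, a factor knows its categories
levels(sex_fct)    # "female" "male"
```

The abbreviations in the `glimpse()` output correspond to these types: int for integer, dbl for double, chr for character, and fct for factor.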

Other Foundational Concepts

Types of Studies

Experiment

  • randomization
  • replication
  • controlling
  • blocking
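As a small illustration of randomization combined with blocking, the sketch below assigns treatments at random separately within each block. Everything in it is hypothetical: 12 made-up subjects, three blocks labeled A, B, and C, and two groups called "treatment" and "control".

```r
# Fix the random seed so the assignment can be reproduced
set.seed(313)

# Twelve hypothetical subjects, four in each of three blocks
subjects <- data.frame(
  id = 1:12,
  block = rep(c("A", "B", "C"), each = 4)
)

# Randomize within each block: half to treatment, half to control
subjects$group <- NA_character_
for (b in unique(subjects$block)) {
  rows <- which(subjects$block == b)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  subjects$group[rows] <- sample(labels)  # shuffle the labels at random
}

# Each block contributes two subjects to each group
table(subjects$block, subjects$group)
```

Having several subjects in each group is the replication part of the design; the block variable is what is being controlled for.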

Observational Study

  • collect data in a way that does not directly interfere with how the data arise

Relationships Between Variables


explanatory variable \(\rightarrow\) might affect \(\rightarrow\) response variable

  • If two variables are not associated, then they are said to be independent.

  • If two variables are associated, then they are said to be dependent.
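One informal way to check for an association between two categorical variables is to compare the proportions of the response within each level of the explanatory variable. The sketch below uses made-up counts for a hypothetical explanatory variable `group` and a hypothetical response `outcome`; the numbers are for illustration only.

```r
# Made-up counts: rows are the explanatory variable, columns the response
counts <- matrix(
  c(45,  5,
    30, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), outcome = c("no", "yes"))
)

# Proportion of each outcome within each group (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
row_props

# If the rows looked alike, the sample would show no association.
# Here the "yes" proportion differs between groups (0.1 versus 0.4).
row_props[, "yes"]
```

A difference in sample proportions only suggests an association; deciding whether it reflects more than chance variation requires the inference tools covered later.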

Causal Inference


association \(\neq\) causation


What do you need to say that the explanatory variable causes a change in the response variable?

Lab 1

Joining the STAT 313 Workspace on Posit Cloud

  1. Access Posit Cloud from the link posted on Canvas
  2. Create a Cloud Student account (using your Cal Poly email)

$5 / month subscription

The Cloud Student account costs $5 / month. You will only need to pay for three (3) months of access, for a total of $15 for the quarter.

  3. Join the STAT 313 workspace

Accessing Lab 1

  1. Access Lab 1 either through:
  • the link posted on Canvas
  • the Content tab in the STAT 313 workspace
  2. Click on Lab 1 to open the Project

Opening the Lab 1 Document

  1. Click on the lab-1.qmd file in the lower right hand corner to open the lab assignment

A screenshot of the lower right hand panel of Posit Cloud. The image displays the documents listed under the 'Files' tab, with lab-1.qmd (a Quarto file) highlighted in a purple box.

  2. Start working!