Sampling Variability – The Heart of Statistical Inference

Salaries of football coaches

Individual Samples

Each group has a different sample of 25 UC & CSU coach salaries.

First

  • each person samples 10 salaries
  • calculate the median
  • write your median on a post-it note
  • place your median on the class plot

Then

everyone works together to calculate the median of all 25 salaries

Would you feel comfortable inferring that the median salary of your sample is close to the median salary of all UC & CSU coaches?


Why or why not?

Precision & Accuracy

  • Random sampling keeps our point estimates accurate: on average, they land near the true population value.


  • Larger sample sizes make our point estimates more precise: they vary less from one sample to the next.

Statistical Inference

There were 252 “Head Coaches” at the University of California and California State University campuses in 2021 (that satisfied my search criteria).


Median salary for all 252 coaches

$137,619

Was the median of your sample of 25 coaches a good estimate of the median salary for all 252 coaches?

Sampling Framework

population – collection of observations / individuals we are interested in

population parameter – a numerical summary of the population that is unknown but that you wish you knew


sample – a collection of observations from the population

sample statistic – a summary statistic computed from a sample that estimates the unknown population parameter.
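The four terms above can be mapped onto a few lines of R. This is a minimal sketch, not the course's actual data: `population` is a simulated, hypothetical stand-in for the 252 salaries (generated around the reported median of $137,619), and all object names are illustrative.

```r
set.seed(2021)

# Population: all 252 salaries (SIMULATED stand-in, not the real coach data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)

# Population parameter: a summary of the whole population (usually unknown)
parameter <- median(population)

# Sample: 25 observations drawn at random, without replacement
sample_25 <- sample(population, size = 25, replace = FALSE)

# Sample statistic: the same summary, computed from the sample only
statistic <- median(sample_25)

c(parameter = parameter, statistic = statistic)
```

In practice only `sample_25` and `statistic` would be available; the simulation lets us peek at `parameter` to see how close the estimate landed.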

Statistical Inference Reasoning

  • If the sampling is done at random,
  • then the sample should be representative of the population,
  • so any result based on the sample can generalize to the population,
  • and the point estimate is a “good guess” of the unknown population parameter.

Every group had a random sample of 25 coach salaries, so why were some of the medians really far off?

Repeated Samples

Why sample more than once?


What do we get when we take multiple samples?

A distribution of statistics!


Why do we want a distribution of statistics?

Understanding the variability of a statistic is the heart of statistical inference!

Virtual Sampling

library(infer)

# Draw one random sample of 25 coaches, without replacement
rep_sample_n(coaches, 
             size = 25, 
             reps = 1, 
             replace = FALSE)


Employee Name          Job Title                    Total Pay & Benefits
Ryan Jorden            Intercol Ath Head Coach Ex   181065.0
Mike Blasquez          Asc Head Coach Crd 4         197727.0
Jorge Salcedo          Intercol Ath Head Coach Ex   70761.0
Jamie J Christian      HEAD COACH - 12 MONTH        191865.5
John Margaritis        Head Coach 5                 271706.0
Ronald Lowell Dubois   Head Coach 5                 52930.0

\(\vdots\)

Distribution of 1000 medians from samples of 25 coaches
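One way such a distribution could be produced is to repeat the "draw 25, take the median" step 1000 times. The sketch below uses base R and a simulated, hypothetical `population` vector in place of the `coaches` data, so it runs on its own; the helper `one_median()` is an illustrative name, not part of any package.

```r
set.seed(2021)

# SIMULATED stand-in for the 252 coach salaries (not the real data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)

# Draw one sample of size n without replacement and return its median
one_median <- function(pop, n) {
  median(sample(pop, size = n, replace = FALSE))
}

# Repeat 1000 times: each element is the median of a different sample of 25
medians_25 <- replicate(1000, one_median(population, 25))

# Numerical summary of the 1000 simulated medians
summary(medians_25)
```

Calling `hist(medians_25)` would draw the histogram of these medians. With the infer package, a similar result could likely be reached by setting `reps = 1000` in `rep_sample_n()` and then computing the median within each `replicate` group.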

Sampling Distributions

  • Visualize the effect of sampling variation on the distribution of any point estimate
    • In this case, the sample median!
  • We can use sampling distributions to make statements about what values we can typically expect.
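As a rough illustration of "what values we can typically expect", the quantiles of the simulated medians give a range that most sample medians fall into. This is a sketch on a simulated, hypothetical population (not the real salaries); the 95% range and the $20,000 window are arbitrary choices made for the example.

```r
set.seed(2021)

# SIMULATED stand-in for the 252 coach salaries (not the real data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)

# 1000 medians, each from a random sample of 25 drawn without replacement
medians_25 <- replicate(1000, median(sample(population, size = 25)))

# Middle 95% of the simulated medians: a range of "typical" sample medians
typical_range <- quantile(medians_25, probs = c(0.025, 0.975))
typical_range

# Share of sample medians within $20,000 of the population median
share_close <- mean(abs(medians_25 - median(population)) <= 20000)
share_close
```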

Sampling distribution vs. Sample distribution

Be careful! A sampling distribution is different from a sample’s distribution!

Distributions of 1000 medians from different sample sizes

What differences do you see?

Variability for Different Sample Sizes

Sample Size   Standard Error of Median
25            19408.079
50            12459.358
100           8279.311

  • Standard errors quantify the variability of point estimates

  • As a general rule, as sample size increases, the standard error decreases.
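A standard error like those in the table can be estimated as the standard deviation of many simulated sample medians. The sketch below assumes a simulated, hypothetical population, so its numbers will not match the table; `se_median()` is an illustrative helper name.

```r
set.seed(2021)

# SIMULATED stand-in for the 252 coach salaries (not the real data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)

# Standard error of the median: the standard deviation of many sample medians
se_median <- function(pop, n, reps = 1000) {
  sd(replicate(reps, median(sample(pop, size = n, replace = FALSE))))
}

sizes <- c(25, 50, 100)
se_by_size <- sapply(sizes, function(n) se_median(population, n))

data.frame(sample_size = sizes, se_of_median = se_by_size)
```

The estimated standard error should shrink as the sample size grows, mirroring the pattern in the table.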

Standard error vs. Standard deviation

Careful! There are important differences between standard errors and standard deviations.
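The difference can be seen by computing both quantities side by side. In this sketch (again on a simulated, hypothetical population, with illustrative object names), the standard deviation describes individual salaries within one sample, while the standard error describes medians across many samples.

```r
set.seed(2021)

# SIMULATED stand-in for the 252 coach salaries (not the real data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)

# Standard deviation: spread of individual salaries within ONE sample of 25
one_sample <- sample(population, size = 25, replace = FALSE)
sd_of_salaries <- sd(one_sample)

# Standard error: spread of the sample median across MANY samples of 25
medians_25 <- replicate(1000, median(sample(population, size = 25)))
se_of_median <- sd(medians_25)

c(sd_of_salaries = sd_of_salaries, se_of_median = se_of_median)
```

Both are computed with `sd()`; what differs is the input. One is a vector of 25 salaries, the other a vector of 1000 medians.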

Does sample size change how accurate the estimate is?

Does sample size change how precise the estimate is?
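One way to explore both questions is to track the center and the spread of the simulated medians at each sample size. This is a sketch under the same assumption of a simulated, hypothetical population; `summarise_size()` and the column names are illustrative.

```r
set.seed(2021)

# SIMULATED stand-in for the 252 coach salaries (not the real data)
population <- rlnorm(252, meanlog = log(137619), sdlog = 0.5)
true_median <- median(population)

# For one sample size: center and spread of 1000 simulated sample medians
summarise_size <- function(n, reps = 1000) {
  medians <- replicate(reps, median(sample(population, size = n, replace = FALSE)))
  c(sample_size = n,
    offset_from_truth = mean(medians) - true_median,  # accuracy: near 0 is good
    spread = sd(medians))                             # precision: smaller is better
}

# One row per sample size
results <- as.data.frame(t(sapply(c(25, 50, 100), summarise_size)))
results
```

In this sketch the offset should stay small relative to the spread at every sample size, while the spread should shrink as the sample size grows.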

Sampling Activity!