The Flaws of Averages

Lab 1

Revisions

“Complete” = Satisfactory

  • Your images were included in the document
  • You provided responses to every question

“Incomplete” = Growing

  • Your images were not included in the document
  • You did not provide responses to every question

Key - Code Chunk Options

A code chunk option is declared after a #|. Here are some options we may want to use:

  • #| label: packages – creates a label for the code chunk (describing its contents)
  • #| echo: false – tells Quarto not to output the code in the rendered HTML (only the output)
  • #| include: false – tells Quarto to run the code but not to include the code or its output in the rendered HTML
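
As a minimal sketch, here is what a code chunk using two of these options might look like. The choice of ggplot2 as the package being loaded is an assumption for illustration. In a Quarto file, the chunk would open with ```{r} rather than a plain fence.

```r
#| label: packages
#| echo: false

# The two lines above are chunk options: they label this chunk "packages"
# and hide its code (but not its output) in the rendered HTML.

# Load the package used for plotting (assumed here for illustration)
library(ggplot2)
```

Chunk options must come first in the chunk, before any R code.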

Key - Previewing Your Data

The glimpse() function is a great tool to preview the dataset you are working with! It gives you:

  • the dimensions of the data (rows and columns)
  • the names of the columns
  • the data type of each column (e.g., chr, dbl)
  • a preview of the first few values in each column (as many as fit across the screen)
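
A short sketch of previewing a dataset with glimpse(). It assumes the mpg dataset that ships with ggplot2, which contains the manufacturer and hwy columns used in the plotting examples. Your lab may use a different dataset.

```r
library(dplyr)    # provides glimpse()
library(ggplot2)  # provides the mpg dataset (assumed here for illustration)

# Print the dimensions, column names, column types, and a preview of values
glimpse(mpg)
```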

Key - Plotting Your Data

Now that we’ve practiced making some plots, we know…

mapping = aes(y = manufacturer, x = hwy) declares what variables are plotted on the x- and y-axis.

Tip

The variable names you put inside aes() must be identical to the names of the variables in the dataset!


labs(x = "Highway Miles Per Gallon", y = "Car Manufacturer") declares new x- and y-axis labels for the plot.

Tip

Including nice axis labels (with their units) is a critical part of every visualization we make!
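
Putting the two pieces together, a complete plot might look like the sketch below. The mpg dataset and the choice of geom_boxplot() are assumptions for illustration. The mapping and labs() lines are the ones described above.

```r
library(ggplot2)

# Build the plot in layers: data and mapping, then a geom, then labels
p <- ggplot(data = mpg,
            mapping = aes(y = manufacturer, x = hwy)) +
  geom_boxplot() +  # assumed geom; swap in whichever geom your lab uses
  labs(x = "Highway Miles Per Gallon",
       y = "Car Manufacturer")

p  # display the plot
```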

Completing Revisions

Lab 1 revisions are due by Wednesday, April 17 (at midnight).

  1. Read comments on Canvas
  2. Go back into your Lab 1 on Posit Cloud and complete your revisions
  3. Render your revised Lab 1
  4. Download your revised HTML

Reflections

Revisions must be accompanied by reflections on what you learned while completing them. These can be written in your Lab 1 Quarto file (next to the problems you revised), in a Word document, or in the comment box on Canvas.

15 minutes

  • Review Lab 1 comments
  • Ask questions
  • Start revisions

Suppose…

“Overall this instructor was educationally effective.”

year  quarter  average
2021  Fall     4.53
2021  Fall     4.36
2022  Winter   4.18
2022  Winter   4.24
2022  Spring   4.83
2022  Spring   4.41
2022  Spring   4.00

How were those averages calculated?

What do these averages mean?
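
One way to see why these questions matter is the small sketch below. The ratings are invented for illustration and are not the data behind the table above. Two sets of ratings on a 1 to 5 scale share the same average while describing very different classes.

```r
# Hypothetical ratings on a 1-5 scale (invented for illustration)
class_a <- c(4, 4, 4, 4, 4, 4)  # every student gives the same rating
class_b <- c(5, 5, 5, 5, 3, 1)  # most students are very satisfied, two are not

# The two averages are identical
mean(class_a)
mean(class_b)

# The counts of each rating tell two different stories
table(class_a)
table(class_b)
```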

The Problem


It’s incredibly rare for scientists, including statisticians, to explicitly think about the conditions underlying their models.

“I’ve had many conversations in very different contexts with scientists about what the average calculated from the data (or mean in a model) could reasonably represent and whether that was really what the scientist was after.” Dr. Megan Higgs

Why so much resistance?


Departments hold specific expectations of statistics courses


These expectations are conditional on the assumption that means represent the magic quantity of interest.


I’m then expected to educate you to “play the game” in the scientific culture of averages

Averagarianism

“The primary research method of averagarianism is aggregate, then analyze: First, combine many people together and look for patterns in the group. Then, use these group patterns (such as averages and other statistics) to analyze and model individuals. The science of the individual instead instructs scientists to analyze, then aggregate: First, look for pattern within each individual. Then, look for ways to combine these individual patterns into collective insight.”

The End of Average by Todd Rose

“We’ve always done it this way”

Methods based on averages are available, easy, convenient, and take little creativity — and they are expected in our scientific culture.


Justification for using averages is simply not demanded — though justification for use of anything but averages is incredibly difficult to sell.

Some Rules to Play By


  • Look at and understand your raw data before aggregating
  • Boxplots don’t count as visualizing the raw data
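
As a sketch of what looking at the raw data could mean in practice, the plot below draws one point per observation instead of a summary shape. The mpg dataset and the use of geom_jitter() are assumptions for illustration.

```r
library(ggplot2)

# One point per car: nothing is aggregated before plotting
raw_plot <- ggplot(data = mpg,
                   mapping = aes(x = hwy, y = manufacturer)) +
  geom_jitter(width = 0, height = 0.2, alpha = 0.5) +  # spread overlapping points vertically
  labs(x = "Highway Miles Per Gallon",
       y = "Car Manufacturer")

raw_plot
```

Setting width = 0 keeps each hwy value exactly where it was measured. Only the vertical position is nudged so that overlapping points become visible.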

Lab 2

Departure Delays

  • Inspect the nycflights dataset
  • Visualize departure delays
  • Play with histogram binwidth
  • filter() data to include only certain flights
  • calculate() summary statistics
  • Make decisions based on summary statistics
  • Compare summary statistics to a visualization
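
The steps above can be sketched roughly as follows. This assumes the nycflights data comes from the openintro package and has dep_delay and dest columns. The binwidth of 15, the LAX filter, and the use of summarize() from dplyr for the summary statistics are illustrative choices, not the lab's required answers.

```r
library(dplyr)
library(ggplot2)
library(openintro)  # assumed source of the nycflights dataset

# Inspect the dataset
glimpse(nycflights)

# Visualize departure delays; try different binwidth values
ggplot(data = nycflights, mapping = aes(x = dep_delay)) +
  geom_histogram(binwidth = 15) +
  labs(x = "Departure Delay (minutes)", y = "Number of Flights")

# Keep only flights to one destination (LAX chosen for illustration)
lax_flights <- nycflights |>
  filter(dest == "LAX")

# Compute summary statistics for the filtered flights
lax_summary <- lax_flights |>
  summarize(mean_delay = mean(dep_delay),
            median_delay = median(dep_delay),
            n_flights = n())

lax_summary
```

Comparing the mean and median to the histogram is a quick check on whether either number is a reasonable description of a typical delay.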

Working in Groups