Statistical Critique 2: Exploring p-values

A black-and-white cartoon depicts several scientists, each saying a variation of the word 'significant' in different fonts and styles. Some speech bubbles say 'significant?', 'Significant.', 'significance.', 'significance', 'Significant!', and 'Significant.' in bold or stylized text. The image humorously illustrates the overuse and varied emphasis of the word 'significant' in scientific communication.

Assignment Details

In your second statistical critique, you will focus on another key aspect of any statistical argument: statistical significance. No doubt you have seen \(p\)-values in a previous statistics course or a course in your discipline, and this week you’re adding to that knowledge. For this critique, you will compare the model you selected in your Midterm Project with the model you would have chosen based on a statistical test.

This critique involves coding! You can find a template for this critique in the STAT 313 main workspace on Posit Cloud.

Part Zero: p-values in Multiple Linear Regression

For the first step of this critique, you are required to read about how p-values can be used in the context of multiple linear regression: Extending to Multiple Linear Regression

1 Part One: Revisiting the Midterm Project

For the first part of this critique, you are going to revisit the model you selected for your Midterm Project. You need to copy-and-paste the code you wrote in your Midterm Project to create your two visualizations. After these visualizations, you should include the 2-3 sentences from your Statistical Methods subsection where you described why you chose the model you did in your Midterm Project. Feel free to copy and paste from your Midterm Project!

2 Part Two: Using p-values Instead

For this second part, you will determine which regression model you would have chosen if you had used p-values to make your decision. Regardless of the model you chose for your Midterm Project, you will fit the most complex regression model. In a multiple linear regression with one numerical and one categorical explanatory variable, the most complex model is the different slopes (interaction) model.

Step 1: Fit a different slopes multiple linear regression

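library(palmerpenguins)  # provides the penguins data used in this example
# The * fits both main effects plus their interaction (a separate slope per species)
my_model <- lm(bill_length_mm ~ flipper_length_mm * species,
               data = penguins)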
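
Written out, the different slopes model fit above has the following form. This is a sketch under assumptions that are not in the original: the coefficient symbols \(b_0, \dots, b_5\) and the indicator notation \(1_{\text{species}}\) are illustrative, and Adelie is taken as the baseline species because R's default treatment coding uses the first factor level as the reference.

\[
\widehat{\text{bill length}} = b_0 + b_1 \cdot \text{flipper length} + b_2 \cdot 1_{\text{Chinstrap}} + b_3 \cdot 1_{\text{Gentoo}} + b_4 \cdot \left(\text{flipper length} \times 1_{\text{Chinstrap}}\right) + b_5 \cdot \left(\text{flipper length} \times 1_{\text{Gentoo}}\right)
\]

If \(b_4 = b_5 = 0\), every species shares the slope \(b_1\), which is the parallel slopes model. The interaction terms are what allow each species its own slope.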

Step 2: Run an ANOVA to test if the groups have different slopes

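library(broom)  # provides tidy()
# Sequential ANOVA table; the interaction row tests whether the slopes differ
anova(my_model) |>
  tidy()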

Based on the p-value(s) you obtained from the ANOVA table, which model is the “best” model?
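
One way to see what the interaction p-value is testing is to compare the two nested models directly. The sketch below is only an illustration using the penguins example. The names `penguins_complete`, `parallel_model`, `interaction_model`, and `alpha`, as well as the 0.05 cutoff, are assumptions and are not part of the assignment template.

```r
library(palmerpenguins)  # assumed source of the penguins data

# Keep only complete rows so both models are fit to exactly the same data
penguins_complete <- na.omit(
  penguins[, c("bill_length_mm", "flipper_length_mm", "species")]
)

# Simpler model: one shared slope, a separate intercept per species
parallel_model <- lm(bill_length_mm ~ flipper_length_mm + species,
                     data = penguins_complete)

# Most complex model: a separate slope and intercept per species
interaction_model <- lm(bill_length_mm ~ flipper_length_mm * species,
                        data = penguins_complete)

# Nested-model F-test: does adding the interaction reduce the residual
# sum of squares by more than chance alone would explain?
comparison <- anova(parallel_model, interaction_model)
interaction_p <- comparison[["Pr(>F)"]][2]

alpha <- 0.05  # hypothetical cutoff, chosen only for illustration
chosen <- if (interaction_p < alpha) "different slopes" else "parallel slopes"
chosen
```

The p-value from this comparison matches the interaction row of `anova(interaction_model)`, because the interaction is the last term entered in the sequential table.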

3 Part Three: Learning More about Misuses of \(p\)-values

“The p-value was never intended to be a substitute for scientific reasoning.” Ron Wasserstein, Executive Director of the American Statistical Association

Issues with the use of \(p\)-values had become so widespread that the American Statistical Association (ASA)[1] put out a statement in 2016 titled “The ASA Statement on Statistical Significance and \(p\)-Values”. This statement includes six principles that address misconceptions about and misuse of the \(p\)-value.

In March of 2019, Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories published an article in Nature calling for an end to “statistical significance”. The article details how, on top of the many common misunderstandings about hypothesis testing and \(p\)-values, there is an incentive for researchers to “cherry pick” only the results that are “statistically significant” while dismissing those that aren’t. There are two problems with this system:

  1. It incentivizes researchers to do whatever it takes to obtain “significant” p-values, even through dishonest means.

  2. It dismisses the importance of results where no “significant” effects are found.
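
The first problem can be illustrated with a small simulation. This is a minimal sketch and is not taken from either reading: the seed, the number of simulated studies (`n_tests`), and the sample size (`n_obs`) are all arbitrary assumptions. Every simulated "study" regresses pure noise on pure noise, so any "significant" slope is a false positive.

```r
set.seed(313)  # arbitrary seed so the sketch is reproducible

n_tests <- 1000  # hypothetical number of independent studies
n_obs <- 30      # hypothetical sample size per study

# In every study the true slope is zero: x and y are unrelated noise
p_values <- replicate(n_tests, {
  x <- rnorm(n_obs)
  y <- rnorm(n_obs)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of studies that would be labelled "significant" at the 0.05 cutoff
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

Because p-values are uniformly distributed when there is no true effect, about 5% of these studies are expected to cross the 0.05 cutoff. Reporting only those studies would suggest an effect that does not exist.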

Read the American Statistical Association’s statement on \(p\)-values and statistical significance and the Nature article on “statistical significance”.

4 Part Four: Reflection

Now that you have compared the model you would have selected using p-values with the model you selected using visualizations, and have read about the misuses of p-values, I would like you to reflect on the benefits and drawbacks of using p-values.

  1. What are the benefits of using p-values to assess the strength of evidence in a study?

  2. What are some limitations or drawbacks of relying solely on p-values to determine whether a result is meaningful or important?

  3. What is something you’ve learned in this activity that you will take with you in your future courses / research?

Footnotes

  1. This is my professional organization.