Final Project

Your final task for the quarter is to develop a “data context protocol” that individuals can use to assess the context of a dataset.

Data Context

Typically, when someone refers to the “context” of a dataset, they are thinking of six questions:

Who?
- Who collected the data? (i.e., what person, people, agency)
What?
- What data were collected? (i.e., observations and variables)
Where?
- Where were the data collected? (i.e., what setting)
When?
- When were the data collected? (i.e., timing)
Why?
- Why were the data collected? (i.e., for what purpose)
How?
- How were the data collected? (i.e., sampling procedures, variable measurement)

These questions specifically target what Catherine D’Ignazio and Lauren Klein call the “material conditions” of the data. However, they do not situate the data in the broader context in which they were produced. One of the central tenets of feminist thinking is the belief that all knowledge is situated. So, our task is to create a protocol (i.e., a series of questions) that others can use to situate any dataset in its broader context.

Data Context Protocol

This activity is part of a broader push for transparency with respect to data published on the internet (e.g., data dictionaries). Similar to these documents, your protocol should help individuals situate their data in its social, cultural, historical, and institutional contexts. These contexts are critical for understanding the power and privilege that contributed to making the data and for identifying how they may obscure the truth.

Developing a Protocol

For each of these four contexts (social, cultural, historical, institutional), you are tasked with:

- Creating a working definition (i.e., what is the “social context”?)
- Curating a series of questions that help researchers identify pertinent details related to this context

Protocol Application

Once you have finished your protocol, you will demonstrate it by applying it to a specific dataset. I have selected three datasets that are widely used in statistics and data science education:

Palmer Penguins

Data Documentation
- Palmer Station, Antarctica LTER
- Long Term Ecological Research Network
R package with data
Articles written about the data:

Course Evaluations

Data Documentation
R package with data
Articles written about the data:
- Looking Good on Course Evaluations
- Modern Dive

Global Economies

Data Documentation
R package with data
Articles written about the data:
- Dollar Street