Lab 5: Murder in SQL City

join + filter + stringr + ludridate

For this lab, you will be joining and filtering related datasets to solve a murder mystery!

1 Part 1: GitHub Workflow

1.1 Step 1: Making a Copy from GitHub Classroom

You can access the Lab 5 materials through the Lab 5 GitHub Repository.

This week, to get a copy of the repository, we’re going to follow a slightly different process. Use these steps to fork your own copy of the Lab 5 repository, then clone that repository into RStudio: List of Steps to Fork GitHub Repository

1.2 Step 2: Making a Small Change

Now, find the lab-5-student.qmd file in the “Files” tab in the lower right hand corner. Click on this file to open it.

At the top of the document (in the YAML) there is an author line that says "Your name here!". Change this to be your name and save your file either by clicking on the blue floppy disk or with a shortcut (command / control + s).

1.3 Step 3: Pushing Your Lab to GitHub

Now for our last step, we need to commit the files to our repo.

1.4 Step 4: Let’s get started!

2 Part 2: Some Words of Advice

  • Set chunk options carefully.

  • Make sure you don’t print out more output than you need.

  • Make sure you don’t assign more objects than necessary—avoid “object junk” in your environment.

  • Make your code readable and nicely formatted.

  • Think through your desired result before writing any code.

3 Part 3: Finding the Killer

Northwestern University’s Knight Lab wanted to help sharpen users’ database skills, so they created a murder mystery. Can you solve this crime in SQL City??

The relational data you will be working with contains tables with different pieces of information pertinent to the crime - people, social media check-ins, driver’s licenses, crime scene reports, police interviews, and more!

This is an entity-relationship diagram that showcases multiple interconnected tables. The central table is labeled 'person' with attributes including id, name, license_id, address_number, address_street_name, and ssn. It connects to various other tables. One connection leads to 'drivers_license,' which contains id, age, height, eye_color, hair_color, gender, plate_number, car_make, and car_model. Another connection leads to the 'income' table, which includes ssn and annual_income. Additionally, the 'person' table is linked to 'get_fit_now_member,' which has id, person_id, name, membership_start_date, and membership_status attributes. The 'get_fit_now_member' table connects to 'get_fit_now_check_in,' which lists membership_id, check_in_date, check_in_time, and check_out_time. Another linked table is 'facebook_event_checkin,' containing person_id, event_id, event_name, and date. The diagram also includes an 'interview' table with person_id and transcript, a 'crime_scene_report' table with date, type, description, and city, and a small table labeled 'solution' with user and value attributes. The diagram visualizes relationships and interactions across these entities.

Database schema

Solve the murder mystery, showing all of your work in this document. Your document and code must be well organized, easy to follow, and reproducible.

  • Use headers and written descriptions to indicate what you are doing.
  • Use dplyr verbs and join functions rather than just looking through the tables manually.
  • Use good code formatting practices.
  • Comment your code.
  • Cite any external sources you use to solve the mystery.
Tip

You may find the pull() function handy for displaying each clue as you go along.

3.1 Access the Data

This code chunk will read in all of the tables of data for you. Don’t modify or remove this!

library(tidyverse)

# If purrr is not detected, install the package
if (!"purrr" %in% installed.packages()) install.packages("purrr")

source("https://raw.githubusercontent.com/atheobold/groupworthy-data-science/main/labs/instructions/lab-5-setup.R")

3.2 Solve the Crime

3.2.1 Crime Scene Report

Detective Wickham reaches out to you…

A crime has taken place and I need your help! There was a murder in SQL City sometime on January 15, 2018. Could you retrieve the crime scene report from the police department’s database and follow the clues to find the person responsible for the murder?!

# Code for looking at the relevant crime scene report.

Follow the evidence to the person responsible for the murder, building a report as you go.

Caution

Make sure you check for interviews with any suspects!

And the final suspect is…

put the name of the person responsible for the murder here.