Project Checkpoint 1 - Example
This is a minimal example representing “passing” or ‘C’-level work, to give you a baseline for how to proceed. ‘B’- or ‘A’-level work will require a more thoughtful and thorough description of the data and cleaning, deeper or more complex research questions, and/or more polished visualization sketches. Some notes about areas for improvement are included in the callouts below.
Introduction
My dataset is the Olympics data from TidyTuesday. This dataset shows all the Olympic athletes ever and gives information about their sport, the country they are from, whether they won a medal, etc.
The data comes from a Kaggle dataset created by user RGriffin, who scraped the data from www.sports-reference.com in May 2018.
Here are a few rows of the dataset:

| id | name | sex | age | height | weight | team | noc | games | year | season | city | sport | event | medal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3229 | Gyula Alvics | M | 28 | 186 | 91 | Hungary | HUN | 1988 Summer | 1988 | Summer | Seoul | Boxing | Boxing Men’s Heavyweight | NA |
| 31516 | Alfred Frederik “Fred” Eefting | M | 21 | 183 | 94 | Netherlands | NED | 1980 Summer | 1980 | Summer | Moskva | Swimming | Swimming Men’s 200 metres Backstroke | NA |
| 35269 | Solenne Nadge Figus (-Saint Marie) | F | 25 | 178 | 59 | France | FRA | 2004 Summer | 2004 | Summer | Athina | Swimming | Swimming Women’s 200 metres Freestyle | Bronze |
| 36759 | Robert Frank | M | 34 | NA | NA | Switzerland | SUI | 1936 Summer | 1936 | Summer | Berlin | Art Competitions | Art Competitions Mixed Sculpturing, Reliefs | NA |
| 80591 | Aiko Miyamura | F | 24 | 173 | 60 | Japan-1 | JPN | 1996 Summer | 1996 | Summer | Atlanta | Badminton | Badminton Women’s Doubles | NA |
| 20202 | Chen Hsiu-Hsiung | M | 32 | 169 | 70 | Chinese Taipei | TPE | 1968 Summer | 1968 | Summer | Mexico City | Sailing | Sailing Mixed Three Person Keelboat | NA |

Good:
- This description explains who created the data, from where, and when. It provides links to the sources so the reader can find them.
- It partially explains what information is in the dataset, i.e., information about athletes rather than scheduling information or other details.
- Showing a small random snippet of the dataset is typically useful to get a feel for what it looks like.
Bad:
- This description lacks the “why” and “how” of the data creation. It is important to know what motivated the creation and sharing of a dataset, as well as the specific process the creator used to assemble it.
- This description is unclear about the observational units, i.e., what each row represents. (It is not true that each row is a unique athlete!)
- This description could give much more detail on the observed variables, such as how they were measured and what a typical value looks like.
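One quick way to pin down the observational unit is to count how many rows share each value of a candidate key. The Python sketch below does this on a few hypothetical toy rows (not taken from the dataset); the helper names `rows_per_key` and `is_unique_key` are made up for illustration. On the full data you would try candidate keys such as `id` alone versus `id` together with `games` and `event`.

```python
from collections import Counter


def rows_per_key(rows, key_cols):
    """Count how many rows share each combination of values in key_cols."""
    return Counter(tuple(row[c] for c in key_cols) for row in rows)


def is_unique_key(rows, key_cols):
    """True if every combination of key_cols values appears in exactly one row."""
    return all(n == 1 for n in rows_per_key(rows, key_cols).values())


# Hypothetical toy rows (not from the dataset): athlete 1 entered two events.
toy_rows = [
    {"id": 1, "event": "Event X"},
    {"id": 1, "event": "Event Y"},
    {"id": 2, "event": "Event X"},
]

print(is_unique_key(toy_rows, ["id"]))           # False: id alone repeats
print(is_unique_key(toy_rows, ["id", "event"]))  # True
```

If `id` alone is not unique, a row is not an athlete, and the description should say what combination of columns a row does correspond to.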
Data Cleaning
This data was cleaned using the janitor package to reformat the column names.
RGriffin, the user who scraped the data, also checked for misentered data in the Name, Sex, Height, and Weight columns.
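janitor's `clean_names()` converts column names to a consistent snake_case style. To illustrate what that does, here is a simplified Python approximation. It is not janitor's actual implementation, and the example header names are assumptions chosen for illustration.

```python
import re


def clean_name(name: str) -> str:
    """Approximate janitor-style snake_case for a single column name."""
    # Split camelCase boundaries: "heightCm" -> "height_Cm"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()


def clean_names(columns):
    """Apply clean_name to every column name in a list."""
    return [clean_name(c) for c in columns]


print(clean_names(["ID", "Name", "NOC", "Height (cm)"]))
# ['id', 'name', 'noc', 'height_cm']
```

janitor itself handles more cases, such as duplicated names and non-ASCII characters, so treat this only as a picture of the basic idea.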
Good:
- We did not just look at the TidyTuesday cleaning and call it a day! We followed the path of the data creation to find out what other cleaning and wrangling took place.
Bad:
- We were not specific enough about the cleaning done by RGriffin. What did they alter in those columns and why? What other anomalies did they look for?
Explorations
RQ 1: Olympic success by country
Which countries win the most Gold, Silver, and Bronze medals?
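There is one subtlety when answering this question: in athlete-level data, a team medal shows up once for every team member, so simply counting rows with a medal overcounts team events. Below is a minimal Python sketch. It assumes rows are dicts keyed by the snake_case column names in the dataset snippet (`games`, `event`, `noc`, `medal`) and that deduplicating on those four fields is an acceptable way to count each team medal once.

```python
from collections import Counter, defaultdict

MEDALS = ("Gold", "Silver", "Bronze")


def medal_table(rows):
    """Count medals per NOC, counting each team medal only once.

    Assumption: a team medal appears once per team member, so rows are
    deduplicated on (games, event, noc, medal) before counting.
    """
    awarded = {
        (r["games"], r["event"], r["noc"], r["medal"])
        for r in rows
        if r["medal"] in MEDALS  # drops "NA" (no medal)
    }
    table = defaultdict(Counter)
    for _games, _event, noc, medal in awarded:
        table[noc][medal] += 1
    return dict(table)


# Hypothetical rows: three members of one team that won a single gold medal.
team_rows = [
    {"games": "2000 Summer", "event": "Event Z", "noc": "AAA", "medal": "Gold"}
    for _ in range(3)
]
print(medal_table(team_rows))  # {'AAA': Counter({'Gold': 1})}
```

The deduplication is a judgment call. It would undercount the rare case where one NOC wins the same medal twice in a single event (for example, a tie). Sorting the result by total medals then answers the question directly.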
RQ 2: Sex over time
Is there a higher percentage of female sports over time?
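The question leaves “female sports” open to interpretation. One possible operationalization, assumed here rather than taken from the project, is the share of distinct events at each Games whose name contains “Women” (event names in the dataset snippet look like “Swimming Women’s 200 metres Freestyle”). Under this reading, mixed events count as not female. A Python sketch:

```python
from collections import defaultdict


def womens_event_share(rows):
    """Share of distinct events at each Games whose name contains "Women".

    Assumption: an event counts as a female sport if its name contains
    "Women"; men's and mixed events count as not female.
    """
    events = defaultdict(set)
    for r in rows:
        events[r["games"]].add(r["event"])
    return {
        games: sum("Women" in e for e in evs) / len(evs)
        for games, evs in sorted(events.items())
    }


# Two rows from the dataset snippet (event names use a curly apostrophe).
sample = [
    {"games": "1980 Summer", "event": "Swimming Men’s 200 metres Backstroke"},
    {"games": "2004 Summer", "event": "Swimming Women’s 200 metres Freestyle"},
]
print(womens_event_share(sample))  # {'1980 Summer': 0.0, '2004 Summer': 1.0}
```

The two sample rows only exercise the code. One row per Games says nothing about the real trend, which needs every row for each Games.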
Good:
- These two research questions are well-defined and answerable with the dataset.
Bad:
- These are not particularly deep. “Which countries win the most?” is not going to provide a new insight beyond what most people already know, and I think we can expect from the outset that female sports have increased over time.
RQ 3: Olympic success corrected by population
Which countries win disproportionately more medals compared to their population size?
Additional Data: Populations of each country
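Once population data is in hand, the core computation is a join followed by a division. Here is a minimal sketch. It assumes both inputs are dicts keyed by NOC code and uses made-up numbers; the codes `AAA` and `BBB` and the helper name `medals_per_million` are hypothetical.

```python
def medals_per_million(medal_counts, populations):
    """Medals per million residents for each NOC present in both inputs.

    medal_counts: dict mapping NOC -> total medals
    populations:  dict mapping NOC -> population (from an external source)
    """
    return {
        noc: medal_counts[noc] / (populations[noc] / 1_000_000)
        for noc in medal_counts
        if populations.get(noc)  # skip missing or zero populations
    }


# Hypothetical numbers for illustration only, not real data.
medals = {"AAA": 30, "BBB": 30}
pops = {"AAA": 3_000_000, "BBB": 300_000_000}
print(medals_per_million(medals, pops))  # {'AAA': 10.0, 'BBB': 0.1}
```

In practice the join is the hard part. Population sources are usually keyed by country name or ISO code, and NOC codes do not always match ISO codes: the snippet uses NED for the Netherlands, whose ISO 3166 alpha-3 code is NLD. Populations also change over time, so it may be worth matching each Games to a population figure from the same year.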
RQ 4: Correlation between female wins and education
Do countries with better support for education for women also tend to see more success in female sports categories?
Additional Data: Education trends in each country
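Answering this question ultimately comes down to relating two country-level measures. One common choice, assumed here, is the Pearson correlation coefficient. The sketch below computes it from scratch in Python with hypothetical values (`education_index` and `female_medal_share` are placeholder names, not variables in either dataset).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-country values, for illustration only.
education_index = [0.2, 0.4, 0.6, 0.8]
female_medal_share = [0.1, 0.2, 0.3, 0.4]
print(round(pearson(education_index, female_medal_share), 6))  # 1.0
```

Python 3.10 and later also provide `statistics.correlation`. A strong correlation on its own would not show that education support causes Olympic success.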
Good:
- Question 4 digs deeper into the relationship between a country’s culture and government and its outcomes at the Olympics. This is a good question.
Bad:
- Question 3 is a bit less interesting: you might uncover one or two countries that win disproportionately, but you aren’t telling a rich story.
Visualizations


Good:
- The plots are appropriate to the data types and address the RQs.
Bad:
- Little thought has gone into good plot design, annotation, storytelling, etc.