Introduction to Version Control

Dr. Theobold

Thursday, September 26

Today we will…

  • Review Key Feedback from Lab 4 & Challenge 4
    • Discuss New Structure for Revisions
  • New Material
    • Version Control
  • Setting Up the Portfolio
    • Forking the Final Portfolio Repository
    • Cloning into RStudio
    • Making a Small Change, Committing & Pushing

Lab 4 Common Mishaps

  • Q1: Who collected these data? When? Why? How?

  • Q4: Column titles of 2008 and 2018 are not descriptive!

    • Create column names that describe the values stored in those columns!
    • The names_prefix = argument to pivot_wider() can help you make better column names!
    • DVS-6: I can create tables which make my summaries clear to the reader
  • Q4: Unless you specify .groups = "drop" within summarize() your table is still grouped!

    • group_by() + summarize() only drops the last grouping variable.
    • If you have two variables inside group_by(), then the result will still be grouped by the first variable!
  • Q7: The data description contains important information!

    • mc_toddler – Aggregated weekly, full-time median price charged for Center-based Care for toddlers.
    • mhi_2018 – Median household income expressed in 2018 dollars.
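
A minimal sketch of the two Q4 points, assuming the dplyr and tidyr packages are installed. The `toy_childcare` table, its regions, counties, and income values are made up for illustration and are not the lab data. By default summarize() removes only the last grouping variable, so the first pipeline is still grouped by region.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the lab data: two regions, two counties each, two years.
toy_childcare <- tibble(
  region     = c("North", "North", "North", "North", "South", "South", "South", "South"),
  county     = c("A", "A", "B", "B", "C", "C", "D", "D"),
  study_year = c(2008, 2018, 2008, 2018, 2008, 2018, 2008, 2018),
  mhi_2018   = c(50, 60, 70, 80, 40, 45, 55, 65)
)

# Two grouping variables: summarize() removes only the last one (study_year),
# so this result is still grouped by region.
still_grouped <- toy_childcare |>
  group_by(region, study_year) |>
  summarize(median_income = median(mhi_2018))

group_vars(still_grouped)

# .groups = "drop" returns an ungrouped table.
ungrouped <- toy_childcare |>
  group_by(region, study_year) |>
  summarize(median_income = median(mhi_2018), .groups = "drop")

group_vars(ungrouped)

# names_prefix = turns the bare year columns (2008, 2018) into descriptive names.
income_table <- ungrouped |>
  pivot_wider(names_from = study_year,
              values_from = median_income,
              names_prefix = "median_income_")

names(income_table)
```

Calling group_vars() on each result is a quick way to check whether a table is still carrying groups before you pass it along.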

Superseded Functions

PE-4: I can use modern tools when carrying out my analysis.

  • recode() from the dplyr package has been superseded by the case_match() function.
    • case_match() is the SQL cousin of case_when()
  • Neither of these functions lives in the forcats package, which includes the tools we learned!
    • Neither does factor()!
    • Maybe try something from forcats!
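
A short sketch comparing the three options, assuming dplyr (version 1.1.0 or later, where case_match() is available) and forcats are installed. The `age_group` vector is a made-up example, not the lab data.

```r
library(dplyr)
library(forcats)

# Hypothetical vector of raw variable names.
age_group <- c("mc_infant", "mc_toddler", "mc_preschool", "mc_toddler")

# Superseded: recode() takes old = new pairs.
old_way <- recode(age_group,
                  mc_infant    = "Infant",
                  mc_toddler   = "Toddler",
                  mc_preschool = "Preschool")

# Its replacement: case_match() takes old ~ new formulas, like case_when().
new_way <- case_match(age_group,
                      "mc_infant"    ~ "Infant",
                      "mc_toddler"   ~ "Toddler",
                      "mc_preschool" ~ "Preschool")

# The forcats tool: fct_recode() takes new = old pairs and returns a factor.
fct_way <- fct_recode(age_group,
                      "Infant"    = "mc_infant",
                      "Toddler"   = "mc_toddler",
                      "Preschool" = "mc_preschool")

new_way
fct_way
```

Note the direction of the pairs: recode() and case_match() read old then new, while fct_recode() reads new then old.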

Other “Big Picture” Code Feedback

I strongly recommend against nesting functions, because nested calls make it difficult for people to understand what your code is doing. Writing two lines instead is no less efficient and is more readable.


mutate(age_group = fct_relevel(fct_recode(age_group,
                                          "Infant" = "mc_infant",
                                          "Toddler" = "mc_toddler",
                                          "Preschool" = "mc_preschool"),
                                "Infant",
                                "Toddler",
                                "Preschool"))

Non-Nested Functions

The same recoding and releveling, written as two separate steps inside one mutate():


mutate(age_group = fct_recode(age_group,
                              "Infant" = "mc_infant",
                              "Toddler" = "mc_toddler",
                              "Preschool" = "mc_preschool"),
       age_group = fct_relevel(age_group,
                               "Infant",
                               "Toddler",
                               "Preschool")
       )

Saving Objects That Aren’t Worth Saving

We should only save objects that we need to use later!

lowest_child_care_price_2018 <- ca_childcare |>
  filter(study_year == 2018) |>
  group_by(region) |>   
  summarise(median_infant_price = median(mc_infant)) |> 
  slice_min(order_by = median_infant_price)

lowest_child_care_price_2018
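
One way to act on this, sketched on a made-up miniature of the data (`ca_childcare_toy`, its regions, and its prices are hypothetical): if a result only needs to be displayed once, end the pipeline without assigning it to a name.

```r
library(dplyr)

# Hypothetical miniature of ca_childcare; regions and prices are invented.
ca_childcare_toy <- tibble(
  region     = c("North", "North", "South", "South", "South"),
  study_year = c(2018, 2018, 2018, 2018, 2008),
  mc_infant  = c(300, 320, 250, 270, 200)
)

# No assignment: the result prints once and no extra object
# is left behind in the environment.
ca_childcare_toy |>
  filter(study_year == 2018) |>
  group_by(region) |>
  summarize(median_infant_price = median(mc_infant)) |>
  slice_min(order_by = median_infant_price)
```

Assigning with `<-` is still the right choice when a later step reuses the result.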

Challenge 3

DVS-2: I use plot modifications to make my visualizations clearer to the reader

  • Facets ordered based on developmental stage not alphabetically.
  • Ordering colors in the legend so they appear in the same order as the lines in the plot.
  • Adding $ signs to axis labels.
  • Not making people tilt their head to read your plot.
  • Using meaningful labels (e.g., “Infant” not “mfcc_infant”).
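
A sketch of how these modifications might look together, assuming dplyr, forcats, ggplot2, and scales are installed. The `toy_prices` data are invented for illustration, and this is only one of many reasonable ways to build such a plot.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical prices; regions, years, and dollar amounts are made up.
toy_prices <- tibble(
  age_group  = rep(c("mc_toddler", "mc_infant", "mc_preschool"), each = 4),
  region     = rep(c("North", "North", "South", "South"), times = 3),
  study_year = rep(c(2008, 2018), times = 6),
  price      = c(200, 260, 150, 190,
                 250, 320, 180, 230,
                 170, 210, 130, 160)
) |>
  mutate(
    # Meaningful labels instead of raw variable names.
    age_group = fct_recode(age_group,
                           "Infant"    = "mc_infant",
                           "Toddler"   = "mc_toddler",
                           "Preschool" = "mc_preschool"),
    # Facets follow developmental stage rather than alphabetical order.
    age_group = fct_relevel(age_group, "Infant", "Toddler", "Preschool"),
    # Legend entries are ordered by each line's value at the latest year.
    region = fct_reorder2(region, .x = study_year, .y = price)
  )

price_plot <- ggplot(toy_prices,
                     aes(x = study_year, y = price, color = region)) +
  geom_line() +
  facet_wrap(~ age_group) +
  # Dollar signs on the axis labels.
  scale_y_continuous(labels = scales::label_dollar()) +
  # Dropping the rotated y-axis title and describing the axis in the
  # subtitle means no one has to tilt their head.
  labs(x = "Study year", y = NULL, color = "Region",
       subtitle = "Median weekly price of center-based childcare")

price_plot
```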

Challenge 3

DVS-3: I show creativity in my visualizations

  • Exploring different plot themes
    • Personally, I like theme_bw(), but you might like others!
    • Remember to keep major gridlines! They are important for readability!

New Structure for Lab Revisions

Due to the time it is taking me to give feedback for the lab and challenge assignments, I need to find a new plan for revisions.

New Plan:

  • Spend 45 minutes of class on Thursdays working on revisions with people around you, and discussing your revisions with me.
  • Revisions will be graded as “Success” or “Growing” with no additional feedback provided.
  • Additional feedback can be given during student hours!

Version Control

A process of tracking changes to a file or set of files over time so that you can recall specific versions later.

Git vs GitHub

Git [Image: the git logo, a red diamond containing one large branch with a smaller branch stemming from it.]

  • A system for version control that manages a collection of files in a structured way.
  • Uses the command line or a GUI.
  • Git is local.

GitHub [Image: the GitHub logo, a black circle containing the white outline of a cat.]

  • A cloud-based service that lets you use git across many computers.
  • Basic services are free, advanced services are paid (like RStudio!).
  • GitHub is remote.

Why learn version control?

  1. GitHub provides a structured way for tracking changes to files over the course of a project.
    • Think Google Docs or Dropbox history, but more structured and powerful!
  2. GitHub makes it easy to have multiple people working on the same files at the same time.
  3. You can host fun things (like the class text, these slides, the course website, etc.) at a public URL with GitHub Pages.

Preparatory Work

You were asked to complete the following steps before coming to class today:

  1. Create a GitHub account
  2. Introduce yourself to git (in RStudio)
  3. Generate a Personal Access Token (PAT)
  4. Store your PAT in RStudio
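
One common way to carry out steps 2 through 4 from the R console uses the usethis and gitcreds packages. This is a sketch and may differ from the instructions you followed; the name and email below are placeholders, and the calls are wrapped in `if (interactive())` because they change your git settings, open a browser, or prompt for input.

```r
# Placeholder identity: replace with your own name and the email
# attached to your GitHub account.
git_identity <- list(
  user.name  = "Jane Doe",
  user.email = "jane.doe@example.com"
)

if (interactive()) {
  # Step 2: introduce yourself to git.
  usethis::use_git_config(user.name  = git_identity$user.name,
                          user.email = git_identity$user.email)

  # Step 3: opens GitHub in your browser so you can generate a PAT.
  usethis::create_github_token()

  # Step 4: prompts you to paste the PAT so it is stored for later use.
  gitcreds::gitcreds_set()
}
```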

Using git and GitHub in RStudio

I’m going to guide you through how to interact with git and GitHub through RStudio. This is not the only way to do this! If you are comfortable with version control, feel free to use the tool that you like best.

Git Repositories

Git is based on repositories.

  • Think of a repository (repo) as a directory (folder) for a single project.
    • This directory will likely contain code, documentation, data, to do lists, etc. associated with the project.

[Image: a red file folder bearing the git logo.]

Actions in Git

Forking a Repo

Make your own copy of a project.

  • The original repository is like the main version of a project.
  • When you fork it, you make your own copy of that project under your own account.
  • You can freely make changes, test ideas, or add new features in your fork.
    • The original project stays untouched.

[Image: the icon GitHub uses to represent a fork of a repository.]

Your Turn

Navigate to the Final Portfolio repository linked on Canvas.

Create your own fork of that repository.

Cloning a Repo

Download the project onto your computer.

  • A repository (repo) lives online.
  • When you clone it, you make a local copy on your computer.
    • All the project files, history, and branches come with it.

[Diagram: cloning a repository. An arrow points down from a cloud holding a pink box labeled 'remote' (the GitHub repository) to a laptop holding a green box labeled 'local' (your copy of that repository).]

Your Turn

Navigate to your fork of the Final Portfolio repository – the one you own!

Follow the instructions posted on Canvas to clone that repository into RStudio.

Committing Changes

Record any changes you’ve made.

  • Whenever you make changes you can “commit” those changes to record them permanently in your project’s history.
  • You also provide a commit message – describing what changes you made.
    • The log of these changes is called your commit history.
    • You can always go back to old copies!

[Diagram: committing changes. A document with one changed line (shown in red) is connected by an arrow labeled 'git commit' to a green box labeled 'local' on a laptop, the local repository.]

Commit Tips

  • Commit small blocks of changes.
    • Commit every time you accomplish a small task (e.g., one learning target).
    • With frequent commits, it's easier to find the issue if / when you mess up!
  • Use short, but informative commit messages.
  • You’ll end up with a set of bite-sized changes (with description) to serve as a record of what you’ve done.

Your Turn

Dr. Theobold will live code this process!

Pushing Changes

Send your commits to the remote repository so others can see and access them.

  • Send the changes you’ve saved in your commit to GitHub.
    • No one else can see the changes you’ve made locally unless you push them!

[Diagram: pushing changes. An arrow labeled 'git push' points up from a laptop holding a green box labeled 'local' to a cloud holding a pink box labeled 'remote', sending committed local changes to the GitHub repository.]

Your Turn

Dr. Theobold will live code this process!

Pulling Changes

Download the latest changes from the remote repository into your local copy.

  • Your updates are saved online when you commit and push.
  • To see those new edits in your offline copy, you have to pull them down.

Only relevant if you are working on two computers!

If this is you, keeping both local versions up to date will be important!

[Diagram: pulling changes. An arrow labeled 'git pull' points down from a cloud holding a pink box labeled 'remote' to a laptop holding a green box labeled 'local', bringing changes on the GitHub repository (possibly made from a different computer) into the local repository.]

Workflow

  1. Pull the repo to make sure you have the most up-to-date version (especially if you are working on different computers).
  2. Make some changes locally.
  3. Commit the changes.
  4. Push your changes to GitHub.
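
The four steps can be tried safely in a throwaway sandbox. The sketch below assumes the git command-line tool is installed and calls it from R with system2(). The `git()` helper, the folder names, the file name, and the identity are all hypothetical, and a local bare repository stands in for GitHub. In RStudio, the Pull, Commit, and Push buttons in the Git pane play the role of these commands.

```r
# Hypothetical helper: run one git command inside `dir`, stop on failure.
# The -c options supply a placeholder identity for this sandbox only.
git <- function(dir, ...) {
  args <- c("-C", dir,
            "-c", "user.name=Demo Student",
            "-c", "user.email=student@example.com",
            "-c", "commit.gpgsign=false",
            ...)
  status <- system2("git", shQuote(args), stdout = FALSE, stderr = FALSE)
  if (status != 0) stop("git ", paste(c(...), collapse = " "), " failed")
  invisible(status)
}

# Throwaway folder holding a stand-in "remote" and two local clones.
sandbox <- tempfile("git-demo-")
dir.create(sandbox)
remote  <- file.path(sandbox, "remote.git")
laptop  <- file.path(sandbox, "laptop")
desktop <- file.path(sandbox, "desktop")

git(sandbox, "init", "--bare", remote)   # stand-in for the GitHub repository
git(sandbox, "clone", remote, laptop)

# First change on the laptop: edit, commit, push.
laptop_notes <- file.path(laptop, "portfolio.txt")
writeLines("Draft from laptop", laptop_notes)
git(laptop, "add", "portfolio.txt")
git(laptop, "commit", "-m", "Add portfolio draft")
git(laptop, "push", "origin", "HEAD")

# A second computer clones the project, then the laptop pushes again.
git(sandbox, "clone", remote, desktop)
write("Second line from laptop", laptop_notes, append = TRUE)
git(laptop, "commit", "-am", "Add second line")
git(laptop, "push", "origin", "HEAD")

# The workflow, carried out on the desktop:
desktop_notes <- file.path(desktop, "portfolio.txt")
git(desktop, "pull")                                       # 1. pull
write("Line from desktop", desktop_notes, append = TRUE)   # 2. make a change
git(desktop, "commit", "-am", "Add line from desktop")     # 3. commit
git(desktop, "push")                                       # 4. push

readLines(desktop_notes)
```

Without the pull in step 1, the desktop copy would be missing the laptop's second line, and the push in step 4 would be rejected until the desktop caught up.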

Final Portfolio

Resources Available to You

To do…

  • Lab 4 revisions due Friday at 11:59pm
  • Final Portfolio due Sunday at 11:59pm