Week 1, Part 3: Version Control with Git and GitHub

📖 Reading: 60-75 minutes

📽️ Videos: 0 minute(s)

✅ Check-ins: 4

0.1 Objectives

Most of this section is either heavily inspired by Happy Git and GitHub for the useR (Bryan, Hester, and The Stat 545 TAs 2021) or links directly to that book.

  • Recognize the benefits of using version control to improve your coding practices and workflow.
  • Identify git / GitHub as a version control platform (and helper).
  • Install git onto your computer and register for a GitHub account.
  • Start applying version control practices to your workflow.

1 What is Version Control?

Version control is a system that allows you to (1) store your files in the cloud, (2) track changes in those files over time, and (3) share your files with others.

📖 Required Reading: Big Picture

2 Git

Git is a version control system: a structured way to track changes to files over the course of a project, which also makes it easier for multiple people to work on the same files at the same time. Think of it as “track changes” in Microsoft Word or version history in Google Docs, but much more powerful.

If you are working alone, you will benefit from adopting version control because it removes the need to add _final.qmd or _final_finalforreal.qmd to the end of your file names. However, most of us work in collaboration with other people (or will have to work with others eventually), so one of the goals of this program is to teach you how to use git because it is a useful tool that will make you a better collaborator.

In data science, we use git for a slightly broader purpose: to keep track of changes not only to code files, but also to data files, figures, reports, and other essential bits of information.

Git Basics

Git tracks changes to each file that it is told to monitor. As the files change, you save snapshots of them (called “commits”), each with a short message describing what the changes were and why you made them. The log of these commits (along with the file history) is called your commit history.

When writing papers, this means you can cut material out freely, so long as the paper is being tracked by git—you can always go back and get that paragraph you cut out if you need to. You also don’t have to rename files—you can confidently save over your old files, so long as you remember to commit frequently.
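To make commits and the commit history concrete, here is a minimal sketch that makes two commits in a throwaway folder and then prints the log. It assumes the gert package is installed (`install.packages("gert")`); the folder, the file name `paper.txt`, and the author "Jane Doe" are made up for illustration and are not part of the course materials.

```r
library(gert)

# Work in a temporary folder so nothing else on your computer is touched
repo <- tempfile("git-demo")
dir.create(repo)
git_init(repo)

# A made-up author, used only for this demonstration
me <- git_signature("Jane Doe", "jane@example.org")

# First version of the file: tell git to track it, then commit
writeLines("First draft of my paper.", file.path(repo, "paper.txt"))
git_add("paper.txt", repo = repo)
git_commit("Write first draft", author = me, repo = repo)

# Save over the same file (no renaming needed), then commit again
writeLines("Second draft, with the intro cut.", file.path(repo, "paper.txt"))
git_add("paper.txt", repo = repo)
git_commit("Cut the intro", author = me, repo = repo)

# The commit history: one row per commit, newest first
history <- git_log(repo = repo)
history[, c("time", "message")]
```

Each row of `history` is one commit, so the first draft is still stored in the repository even though the file on disk was overwritten.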

📖 Required Reading: Install Git

Person 1: 'This is GIT. It tracks collaborative work on projects through a beautiful distributed graph theory tree model'. Person 2: 'Cool, How do we use it?' Person 1: 'No Idea. Just memorize these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy.'

✅ Check-in 1.6: Install Git

We will be working with Git/GitHub every week for the next 10 weeks, starting this week! To be prepared for class, follow the instructions in the above reading to install Git onto your computer.

Once you have installed Git, tell me “yes” in the Canvas Quiz.
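If you are not sure whether the installation worked, one quick check from the R console is sketched below. It uses only base R and assumes that git, if installed, is on your system's search path; the wording of the message is my own.

```r
# Ask the operating system where git lives.
# An empty string means R could not find it.
git_path <- Sys.which("git")
git_path

if (nzchar(git_path)) {
  # Print the installed version, e.g. "git version 2.x.y"
  system2("git", "--version")
} else {
  message("Git was not found. Revisit the installation instructions, ",
          "then restart RStudio and try again.")
}
```

If a path and a version number appear, git is installed and R can see it.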

3 GitHub

Git by itself is nice enough, but where git really becomes amazing is when you combine it with GitHub—an online service that makes it easy to use git across many computers, share information with collaborators, publish to the web, and more. Git is great, but GitHub is … essential.

The relationship between git and GitHub is similar to the one between R and RStudio. Git is a program that runs on your machine and includes a language for monitoring changes to specific files (similar to a programming language like R). GitHub is a website that hosts people’s git repositories (similar to an IDE like RStudio). You can use git without GitHub (like using R without RStudio), but you can’t use GitHub without git.

If you want, you can hook git up to GitHub and make a copy of your local git repository that lives in the cloud. Then, if you configure things correctly, your local repository will talk to GitHub without too much trouble. Using GitHub with git allows you to easily make a cloud backup of your important code, so that even if your computer suddenly catches on fire, all your important code files exist somewhere else. Any data you don’t have in three different places is data you don’t care about.[1]

📖 Required Reading: Register for a GitHub Account

✅ Check-in 1.7: Register for a GitHub Account

Follow the instructions in Register for a GitHub Account to create a free GitHub account.

Copy and paste the link to your GitHub profile into the Canvas quiz.

(Optional) Register for the Student Developer Pack

I would highly recommend checking out GitHub Education and signing up for the GitHub Student Developer Pack. Signing up gets you unlimited private repositories among other perks.

Save your login information!

Make sure you remember your username and password so you don’t have to try to hack into your own account during class this week.

Write your information down somewhere safe.

4 Introducing Yourself to Git

Now that you have git installed and have a GitHub account, it is time to introduce yourself to git!

📖 Required Reading: Introduce Yourself to Git

Rather than using the terminal on your computer (like they do in the chapter above), let’s get familiar with the usethis package in R.

  1. Open RStudio.
  2. Run the following code in your console (bottom left), substituting your name for "Jane Doe" and the email associated with your GitHub account for "jane@example.org".
install.packages("usethis")

library(usethis)

# Change this to use your name and your GitHub email address! 
use_git_config(user.name = "Jane Doe", 
               user.email = "jane@example.org")
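To confirm that git stored your name and email, you could look them up in your global git configuration. The sketch below assumes the gert package is available (it is normally installed together with usethis); the variable name `cfg` is my own.

```r
# Read the user-level (global) git configuration as a data frame
cfg <- gert::git_config_global()

# Keep only the two settings that use_git_config() changed
cfg[cfg$name %in% c("user.name", "user.email"), c("name", "value")]
```

You should see one row for `user.name` and one for `user.email`, showing the values you entered. If no rows appear, rerun `use_git_config()`.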

✅ Check-in 1.8: Introduce Yourself to git

Follow the instructions for introducing yourself to git by running the code in your console. Once you’ve run the code, take a screenshot of the output in your console.

5 Connecting Git, GitHub, and RStudio

In order to interact with a remote Git server (e.g., GitHub), we need to include our credentials. Your credentials prove to GitHub who you are and that you are allowed to do what you are trying to do. There are a few ways to set up your credentials, but we will specifically be using personal access tokens (PATs).

No support for username & password credentials

Let it be known that the password that you use to log in to GitHub’s website is NOT an acceptable credential when talking to GitHub as a Git server.

📖 Required Reading: Personal Access Tokens for HTTPS

Skip Section 9.2

We’re not using SSH, so feel free to skip that section!

✅ Check-in 1.9: PAT

Follow the instructions for setting up your own personal access token. When selecting an expiration, choose an option that will allow your PAT to last the entire quarter (90 days, No expiration, or Custom with a date after the end of the quarter).

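# Opens GitHub in your browser so you can generate a new token
usethis::create_github_token()
# Prompts you to paste the token so it can be stored on your computer
gitcreds::gitcreds_set()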

Once you’ve completed this process, run the following code in your console and take a screenshot of the output you get. I’ve included the output I get when I run this, so you have an idea of how your output should look.

usethis::git_sitrep()
── Git global (user) 
• Name: 'Allison Theobold'
• Email: 'atheobol@calpoly.edu'
• Global (user-level) gitignore file: '~/.gitignore'
• Vaccinated: FALSE
ℹ See `?git_vaccinate` to learn more
• Default Git protocol: 'https'
• Default initial branch name: 'main'

── GitHub user 
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': '<discovered>'
• GitHub user: 'atheobold'
• Token scopes: 'admin:org, admin:public_key, delete:packages, delete_repo, gist, notifications, repo, user, workflow, write:packages'
• Email(s): 'atheobol@calpoly.edu (primary)', 'theobold.allison970@gmail.com', '12439090+atheobold@users.noreply.github.com'
ℹ No active usethis project

6 Getting Started with GitHub

Now you are set up and ready to get started working with GitHub and RStudio for this week’s lab!

7 Learn More

Extra Resources

References

Bryan, Jenny, Jim Hester, and The Stat 545 TAs. 2021. Happy Git and GitHub for the useR. https://happygitwithr.com/.
Heidi, Erika. 2020. “Stage. Commit. Push. A Git Story (Comic).” DEV Community. https://dev.to/erikaheidi/stage-commit-push-a-git-story-comic-a37.
The Coding Train. 2016. “Introduction: Git and GitHub for Poets.” https://www.youtube.com/watch?v=BCQHnlnPusY.
Traversy Media. 2017. Git & GitHub Crash Course For Beginners. https://www.youtube.com/watch?v=SWYqp7iY_Tc.
Wei, Jerry. 2019. “A Quick Guide to Using Command Line (Terminal).” Towards Data Science. https://towardsdatascience.com/a-quick-guide-to-using-command-line-terminal-96815b97b955.

Footnotes

  1. Yes, I’m aware that this sounds paranoid. I have very rarely needed to restore something from another backup, but you don’t want to take chances.