Scripts, Notebooks & Version Control

Thursday, September 26

Today we will…

  • Changes to Student Hours
  • Answer Clarifying Questions:
    • Syllabus?
    • Ungrading?
    • Participating in Dr. Theobold’s research on group collaborations?
  • New Material
    • Scripts + Notebooks
    • Version Control
  • Lab 1: Introduction to Quarto
  • Challenge 1: Modifying your Quarto Document

Changes to Student Hours

Changes to Student Hours


New Student Hours are:

Wednesdays from 4pm - 5pm
Thursdays from 2pm - 3pm

Questions about the Syllabus

What does “ungrading” mean?

“Ungrading is a pedagogical practice which entirely removes grades as a focus of the course. Instead, ungrading exclusively focuses on providing students with feedback aimed at helping them build proficiency to accomplish the course’s learning goals.”

How does this system work?

  • Each problem on an assignment is marked for proficiency (e.g., success, growing).
    • These marks are not attached to a grade outcome (e.g., A, B, C, D).
  • Grades are defined from a set of criteria, co-developed between you and me.
    • At the end of the course, you propose the grade you believe you’ve earned based on a portfolio of your work and the criteria we established.

A reminder about my research…

Groupworthy Data Science

The purpose of the study is to understand how an instructor’s pedagogy impacts the equity of group collaborations, specifically as they relate to pair programming.

My Collaborator

Judith Canner, Professor of Statistics

  • Joined faculty at CSUMB in 2010
  • Is CSUMB’s Statistics and Data Science Program Coordinator.
  • Favorite classes to teach are Data Visualization and Statistical Computing.
  • Learned how to program in R in 2005 when my PhD adviser handed me some code and said “modify this.”
  • Goal is to see statistics and data science education that is accessible to everyone!

Scripts + Notebooks

Scripts

  • Scripts (File > New File > R Script) are files of code that are meant to be run on their own.
  • Scripts can be run in RStudio by clicking the Run button at the top of the editor window when the script is open.

  • You can also run code interactively in a script by:

    • highlighting lines of code and hitting run.

    • placing your cursor on a line of code and hitting run.

    • placing your cursor on a line of code and hitting ctrl + enter or command + enter.

Notebooks

Notebooks are an implementation of literate programming.

  • They allow you to integrate code, output, text, images, etc. into a single document.

  • E.g.,

    • Quarto notebook
    • R Markdown notebook
    • Jupyter notebook

We love notebooks because they help us produce a reproducible analysis!

What is Markdown?

Markdown is a markup language.

It uses special symbols and formatting to make pretty documents.

  • *italics* – makes italics
  • **bold** – makes bold text
  • # – makes headers
  • ! – includes images or HTML links
  • < > – embeds URLs

Markdown files have the .md extension.

What is Quarto?

Quarto unifies and extends the R Markdown ecosystem.

The image displays a collection of hexagonal logos representing different R packages that are part of the R Markdown ecosystem. The logos from top to bottom, left to right, are: Xaringan, Distill, Blogdown, RMarkdown, Bookdown, Flexdashboard, Knitr, Rticles, RSConnect. These logos are arranged in a hexagonal shape, visually emphasizing the interconnected nature of these tools in the R Markdown ecosystem.

Quarto files have the .qmd extension.

Highlights of Quarto

  • Consistent implementation of attractive and handy features across outputs:

    • E.g., tabsets, code-folding, syntax highlighting, etc.
  • More accessible defaults and better support for accessibility.

  • Guardrails that are helpful when learning:

    • E.g., YAML completion, informative syntax errors, etc.
  • Support for other languages like Python, Julia, Observable, and more.

Quarto Formats

Quarto makes moving between outputs straightforward.

  • All that needs to change between these formats is a few lines in the front matter (YAML)!

Document

title: "Lesson 1"
format: html

Presentation

title: "Lesson 1"
format: revealjs

Website

project:
  type: website

website: 
  navbar: 
    left:
      - lesson-1.qmd

Quarto Components

The image shows a split-screen view of a Quarto document editor and the rendered output in a browser-like preview. On the left side, the code editor is displayed, and on the right side, the rendered HTML output of the document. On the left, the top section contains the front matter, which includes metadata about the document such as the title ('Hello, Quarto'), the format (HTML), and the editor (set to visual mode). Below this, there is an R code chunk where R packages are loaded (specifically, 'tidyverse' and 'palmerpenguins'). This section is marked with `{r`} at the beginning of the code chunk, and the chunk is labeled 'load-packages.' Some options, like 'include: false', are present to prevent this code from appearing in the final rendered output. Following the code is Markdown content, which includes text descriptions and headings such as 'Meet Quarto' and 'Meet the Penguins'. There are also inline code elements and hyperlinks, like a link to the Palmer Penguins dataset. On the right side of the image, the rendered output is displayed, showing formatted text and visuals. The title 'Hello, Quarto' is shown as a heading, followed by a description of Quarto and a section discussing penguins. A link to the Palmer Penguins dataset is included, along with a colorful illustration of three penguins representing different species (Chinstrap, Gentoo, and Adélie). Below this text, a plot is displayed, showing the relationship between flipper length and bill length for these penguin species. This image highlights how Quarto documents combine code, Markdown, and front matter to create a dynamic and executable report.

How does Quarto know that a section of text should be interpreted as R code?

R Code Options in Quarto

R code chunk options are included at the top of each code chunk, prefaced with a #| (hashpipe).

  • These options control how the following code is run and reported in the final Quarto document.
  • Some R code options can also be included in the front matter (YAML) which would be applied globally to the entire document.

R Code Options in Quarto

The image displays a table with two columns: 'Option' and 'Description.' It lists various options that can be used in code chunks within Quarto documents, along with their descriptions.The first option is 'eval,' which evaluates the code chunk. If 'false,' it only echoes the code into the output without executing it. The second option is 'echo,' which includes the source code in the output. Next is 'output,' which includes the results of executing the code in the output. The possible values for this option are 'true,' 'false,' or 'asis.' If set to 'asis,' the output will be in raw markdown and won't have any of Quarto's standard enclosing markdown. The 'warning' option includes warnings in the output. Lastly, the 'error' option includes errors in the output. It notes that enabling this option means errors while executing the code won't halt the document's processing.

Chunk Option Completion in Quarto

The image demonstrates three different aspects of YAML code completion and diagnostics in Quarto. At the top, under the heading 'YAML completion for fields,' there is a code editor where the user is typing the beginning of an R chunk ('{r'). As they begin typing '#| e', a code completion menu appears suggesting YAML fields such as 'eval', 'echo', 'external', and 'error', among others. The highlighted field is `eval`, and a description in a yellow box appears, explaining that this option evaluates the code chunk, or, if set to 'false', it only echoes the code into the output. In the middle, under the heading 'YAML completion for options,' the code editor shows the 'eval' field typed out as '#| eval:'. As the user begins entering a value, a dropdown suggests the options 'true' and 'false' for the field, representing valid boolean options. At the bottom, under the heading 'YAML diagnostics for errors,' there is an example of an incorrect entry. The user has written '#| eval: FALSE' (with uppercase 'FALSE'), which leads to a red error indicator in the editor. A tooltip appears, pointing out the mistake and suggesting that the value should be lowercase 'false' instead. The image showcases how Quarto assists users with code completion, option suggestions, and real-time error diagnostics in the YAML section of code chunks.

Rendering your Quarto Document

To take your .qmd file and make it look pretty, you have to render it.

The image shows the toolbar from an RStudio or Quarto editor interface. At the top, it displays the filename 'hello.qmd' (indicating a Quarto Markdown file). There are several options visible in the toolbar, and a specific button labeled 'Render' is highlighted with a pink outline. This 'Render' button is used to generate the final output (such as HTML, PDF, or other formats) based on the Quarto document. Additionally, there is an option for 'Render on Save,' which allows automatic rendering whenever the document is saved. The toolbar also includes options for adjusting the view between 'Source' and 'Visual' modes, formatting tools, inserting content, and running code chunks. The image emphasizes the 'Render' function, which is essential for compiling and viewing the Quarto document's output.

The image shows a toolbar from an RStudio or Quarto editor interface, similar to the previous one. At the top, the filename 'hello.qmd' is displayed, indicating the file being worked on is a Quarto Markdown document. In this image, the 'Render on Save' option is highlighted with a pink outline, indicating that it is selected or being emphasized. This feature automatically triggers rendering of the document every time it is saved, which is useful for seeing updates to the output in real-time without manually pressing the 'Render' button. In addition to the 'Render on Save' option, the toolbar also includes the 'Render' button, formatting tools, and options for switching between 'Source' and 'Visual' editing modes. Other options for running code chunks, adjusting formatting, and inserting elements are also visible, but the main focus here is on the 'Render on Save' functionality.

Rendering your Quarto Document

Quarto CLI (command line interface) orchestrates each step of rendering:

  1. Process the executable code chunks with either knitr or jupyter.
  2. Convert the resulting Markdown file to the desired output.

A diagram of the proecess of rendering a Quarto document. First the qmd is passed to knitr or jupyter, then it is passed to markdown, finally it's passed to pandoc, resulting in a PDF, Word, or HTML output file.

Rendering your Quarto Document

When you click Render:

  1. Your file is saved.
  2. The R code written in your .qmd file gets run in order.
  • It starts from scratch, even if you previously ran some of the code in RStudio.
  1. A new file is created.
  • If your Quarto file is called “Lab1.qmd”, then a file called “Lab1.html” will be created.
  • This will be saved in the same folder as “Lab1.qmd”.

Version Control

Version Control

A process of tracking changes to a file or set of files over time so that you can recall specific versions later.

Git vs GitHub

The image is the official logo for Git, a version control system. It consists of a diamond-shaped orange symbol with white lines inside that resemble a branching structure, symbolizing version control and branching in Git. Next to the symbol is the word 'Git' written in bold, black lowercase letters. The logo visually represents Git's core function of managing changes in code through branching and merging.

  • A system for version control that manages a collection of files in a structured way.
  • Uses the command line or a GUI.
  • Git is local.

Git vs GitHub

git's logo, a red diamond, with two 'branches', one large branch and one smaller branch stemming from the main branch.

  • A system for version control that manages a collection of files in a structured way.
  • Uses the command line or a GUI.
  • Git is local.

GitHub's logo, a black circle, with the outline of a cat in white. The cat seems to have a snake-like tail.

  • A cloud-based service that lets you use git across many computers.
  • Basic services are free, advanced services are paid (like RStudio!).
  • GitHub is remote.

Why Learn GitHub?

  1. GitHub provides a structured way for tracking changes to files over the course of a project.
  • Think Google Docs or Dropbox history, but more structured and powerful!
  1. GitHub makes it easy to have multiple people working on the same files at the same time.

  2. You can host a URL of fun things (like the class text, these slides, the course website, etc.) with GitHub pages.

Git Repositories

Git is based on repositories.

  • Think of a repository (repo) as a directory (folder) for a single project.
    • This directory will likely contain code, documentation, data, to do lists, etc. associated with the project.
    • You can link a local repo with a remote copy.

A red file folder, with the git logo on it (one large branch with one smaller branch stemming off of it).

Actions in Git

Cloning a Repo


Create an exact copy of a remote repo on your local machine.

A diagram of the process of cloning a repository. At the top of the picture, there is a cloud (representing the internet), with a pink box labeled 'remote' symbolizing the remote GitHub repository. There is a down arrow connecting the cloud to a laptop, mimicking the process of cloning a remote repository onto a local computer. The laptop has a greeen box labeled 'local' symbolizing the local copy of the remote GitHub repository.

Committing Changes

Tell git you have made changes you want to add to the repo.

  • Also provide a commit message – a short label describing what the changes are and why they exist.

The red line is a change we commit (add) to the repo.

A diagram of the process of committing changes that were made to a document. On the left is a document with four lines of text. The third line is colored red, to symbolize where a change was made, while the other lines are colored black. There is a right arrow connecting the document to a laptop, with the phrase 'git commit' printed above the arrow. The arrow terminates at a green box labeled 'local' on the laptop, symbolizing committing changes made to the document to the local repository.

The log of these changes is called your commit history.

  • You can always go back to old copies!

Commit Tips

  • Use short, but informative commit messages.
  • Commit small blocks of changes – commit every time you accomplish a small task (e.g., one problem in the lab).
    • You’ll have a set of bite-sized changes (with description) to serve as a record of what you’ve done.
    • With frequent commits, its easier to find the issue if / when you mess up!

Pushing Changes


Update the copy of your repo on GitHub so it has the most recent changes you’ve made on your machine.

A diagram of the process of pushing local changes to the remote repository. There is a laptop with a green box labeled 'local' symbolizing the local copy of the GitHub repository. Above the laptop is cloud with a pink box labeled 'remote' symbolizing the remote GitHub repository (that lives on the internet). There is an arrow pointing from the laptop to the cloud with the phrase 'git push' next to the arrow, symbolizing the action of pushing the local changes (that have been committed) up to the remote repository.

Pulling Changes


Update the local copy of your repo (the copy on your computer) with the version on GitHub.

A diagram of the process of pulling from the remote repository to update the local repository. There is a laptop with a green box labeled 'local' symbolizing the local copy of the GitHub repository. Above the laptop is cloud with a pink box labeled 'remote' symbolizing the remote GitHub repository (that lives on the internet). There is an arrow pointing from the cloud to the laptop with the phrase 'git pull' next to the arrow, symbolizing the action of pull the changes that exist on the remote repository (possibly from a different computer) to update the local repository.

Workflow

When you have an existing local repo:

  1. Pull the repo to make sure you have the most up to date version (especially if you are working on different computers).
  2. Make some changes locally.
  3. Commit the changes to git.
  4. Push your changes to GitHub.

Connect GitHub to RStudio

Previous Steps

You were asked to complete the following steps before coming to class today:

  1. Create a GitHub account
  2. Introduce yourself to git (in RStudio)
  3. Generate a Personal Access Token (PAT)
  4. Store your PAT in RStudio

Verifying Your Connection

Open RStudio and run the following code in your console (lower left pane):


usethis::git_sitrep()

You should see something like:

── GitHub user 
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': '<discovered>'
• GitHub user: 'atheobold'
• Token scopes: 'admin:org, admin:public_key, delete:packages, delete_repo, gist, notifications, repo, user, workflow, write:packages'
• Email(s): 'atheobol@calpoly.edu (primary)', 'theobold.allison970@gmail.com', '12439090+atheobold@users.noreply.github.com'
ℹ No active usethis project

If that is not the case, Dr. Theobold will help you troubleshoot in 5-minutes!

Accessing Lab 1

Accessing Lab 1

Here are step by step directions: Copying the Lab Assignment with GitHub Classroom in 11 Steps


Step 1: Open the Lab 1 assignment on GitHub Classroom

Step 2: Open your Lab 1 repository

Step 3: Clone the repository to your computer

Once You’ve Cloned the Repo

Step 4: Open the lab-1.qmd file

Step 5: Change your name

Step 6: Commit your change (with a nice message!)

Step 7: Push your change

Lab 1 & Challenge 1 Instructions

*I would highly recommend having these pulled up alongside RStudio while you work!

To do…

  • Lab 1: Introduction to Quarto
    • Due Sunday (9/29) at 11:59pm
  • Challenge 1: Modifying Your Quarto Document
    • Due Sunday (9/29) at 11:59pm
  • Complete the Week 2 Coursework
    • Check-ins 2.1, 2.2, 2.3 due Tuesday (10/1) by the start of class