Lab 7: Functions + Fish


Instructions

Accessing the Lab

Download the template Lab 7 Quarto file here: lab-7-student.qmd

Download the data for the lab here: BlackfootFish.csv

Important

Be sure to save these in the Lab 7 folder, inside your Week 7 folder, inside your STAT 331 folder!

library(tidyverse)                       # loads dplyr, tidyr, ggplot2, and friends
fish <- read.csv("BlackfootFish.csv")    # path is relative to the working directory

The Data

This lab uses mark-recapture data on four species of trout from the Blackfoot River outside of Helena, Montana: rainbow trout (RBT), westslope cutthroat trout (WCT), bull trout, and brown trout.

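Mark-recapture is a common method used by ecologists to estimate a population’s size when it is impossible to conduct a census (count every animal). This method works by capturing animals, tagging (marking) them, and releasing them. On later trips, the proportion of tagged animals among those recaptured lets scientists estimate the total population size, and the tags let them track individual animals’ movement and presence.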
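To make the estimation step concrete, here is a minimal sketch of the classic Lincoln-Petersen estimator, one of the simplest mark-recapture estimators. It is shown only for illustration and is not necessarily the method used for the Blackfoot survey; the function name `lincoln_petersen()` and the counts are hypothetical.

```r
# Lincoln-Petersen estimate of population size (illustrative only).
# Assumes a closed population (no births, deaths, or migration between
# samples) and that tags are not lost.
lincoln_petersen <- function(marked_first, caught_second, recaptured) {
  # N_hat = (animals marked in the first sample * animals caught in the
  #          second sample) / (marked animals recaptured in the second sample)
  marked_first * caught_second / recaptured
}

# Hypothetical counts, not taken from the Blackfoot data:
lincoln_petersen(marked_first = 100, caught_second = 80, recaptured = 20)
#> [1] 400
```

The intuition: if 20 of the 80 fish in the second sample carry tags, roughly a quarter of the population is tagged, so the 100 tagged fish suggest about 400 fish in total.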

Data Exploration

The measurements of each captured fish were taken by a biologist on a raft in the river. The lack of a laboratory setting opens the door to the possibility of measurement errors.

1. Let’s look for missing values in the dataset. Output ONE table that answers BOTH of the following questions:

  • How many observations have missing values?
  • What variable(s) have missing values present?
Tip

Hint: use across().
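As a hedged illustration of how `across()` can summarise every column at once, here is a sketch on a small made-up data frame (`toy` and its values are hypothetical, not the fish data). It counts the `NA` values in each column:

```r
library(dplyr)

# A small hypothetical data frame standing in for the fish data
toy <- tibble(
  year   = c(1989, 1989, 1990, 1990),
  length = c(250, NA, 310, 280),
  weight = c(NA, NA, 300, 150)
)

# across() applies the same summary function to every selected column:
# here, the number of missing values in each column.
na_counts <- toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

The result is a single wide row with one column per variable, which is exactly the kind of table Question 2 asks you to pivot.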

2. Use a pivot to transform your summary table from Question 1 into a table that is easier to read.

Important

Make sure your table has intuitive column names that describe their contents!
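A minimal sketch of the pivot, assuming a one-row summary with one column per variable (the `na_counts` values below are hypothetical). `pivot_longer()` turns it into one row per variable, with descriptive column names set through `names_to` and `values_to`:

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row summary: one column per variable
na_counts <- tibble(year = 0L, length = 1L, weight = 2L)

# One row per variable, with intuitive column names
na_table <- na_counts |>
  pivot_longer(cols = everything(),
               names_to = "variable",
               values_to = "n_missing")
na_table
```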

3. Create ONE thoughtful visualization that explores the frequency of missing values across the different years, sections, and trips.

Tip

Hint: The only data you should plot are the observations with missing values!
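One possible sketch of the filtering step, using a made-up data frame whose column names (`year`, `section`, `trip`, `weight`) are assumptions about the fish data. `filter(if_any(everything(), is.na))` keeps only rows with at least one missing value; the bar chart is just one of many reasonable designs:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the fish data
toy <- tibble(
  year    = c(1989, 1989, 1990, 1990, 1990),
  section = c("A", "B", "A", "B", "B"),
  trip    = c(1, 2, 1, 1, 2),
  weight  = c(NA, 120, NA, NA, 150)
)

# Keep only the rows that contain at least one missing value
missing_rows <- toy |>
  filter(if_any(everything(), is.na))

# Count those rows by year, with section as fill and one panel per trip
missing_rows |>
  ggplot(aes(x = factor(year), fill = section)) +
  geom_bar() +
  facet_wrap(~ trip, labeller = label_both) +
  labs(x = "Year", y = "Number of observations with missing values")
```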

Rescaling the Data

If I wanted to rescale every quantitative variable in my dataset so that its values fall between 0 and 1, I could use this formula:


\[y_{i,\,\text{scaled}} = \frac{y_i - \min\{y_1, y_2, \dots, y_n\}}{\max\{y_1, y_2, \dots, y_n\} - \min\{y_1, y_2, \dots, y_n\}}\]

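As a small worked example with made-up values, take the vector (2, 5, 10). Its minimum is 2 and its maximum is 10, so the middle value rescales to

\[\frac{5 - 2}{10 - 2} = \frac{3}{8} = 0.375,\]

while the minimum maps to (2 - 2)/8 = 0 and the maximum maps to (10 - 2)/8 = 1.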

I might write the following R code to carry out the rescaling procedure for the length and weight columns of the BlackfootFish data:

fish <- fish |> 
  mutate(length = (length - min(length, na.rm = TRUE)) / 
           (max(length, na.rm = TRUE) - min(length, na.rm = TRUE)), 
         weight = (weight - min(weight, na.rm = TRUE)) / 
           (max(weight, na.rm = TRUE) - min(length, na.rm = TRUE)))

Duplicating an action multiple times makes the intent of the code harder to understand and makes mistakes very difficult to spot. (Did you notice that the weight calculation above subtracts min(length, na.rm = TRUE) in its denominator instead of min(weight, na.rm = TRUE)?) When you find yourself copy-pasting lines of code, it’s time to write a function instead!

4. Transform the repeated process above into a rescale_01() function. Your function should…

  • … take a single vector as input.
  • … return the rescaled vector.
Tip

Think about the efficiency of your function. Are you calling the same function multiple times?
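For reference, here is one possible sketch of `rescale_01()`, not the only correct answer. It calls `range()` once to get both the minimum and the maximum instead of calling `min()` twice and `max()` once:

```r
# One possible rescale_01(): compute the range once and reuse it
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)  # rng[1] is the minimum, rng[2] the maximum
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_01(c(2, 5, 10))
#> [1] 0.000 0.375 1.000
```

Note that if every value in `x` is identical, the denominator is zero and the result is `NaN`; the input checks in Question 5 do not cover that case.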

5. Let’s incorporate some input validation into your function. Modify your previous code so that the function stops if …

  • … the input vector is not numeric.
  • … the length of the input vector is not greater than 1.
Important

Do not create a new code chunk here – simply add these stops to your function above!
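A minimal sketch of what the validated function might look like (the error messages are illustrative wording, not required text):

```r
rescale_01 <- function(x) {
  # Stop if the input is not numeric
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  # Stop if the input does not have more than one element
  if (length(x) <= 1) {
    stop("`x` must contain more than one element.")
  }
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each of these calls stops with an error:
# rescale_01(c("a", "b"))
# rescale_01(5)
```

`stopifnot(is.numeric(x), length(x) > 1)` is a more compact alternative, at the cost of less descriptive error messages.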

Test Your Function

6. Run the code below to test your function. Verify that the maximum of your rescaled vector is 1 and the minimum is 0!

x <- c(1:25, NA)

rescaled <- rescale_01(x)
min(rescaled, na.rm = TRUE)
max(rescaled, na.rm = TRUE)

Next, let’s test the function on the length column of the BlackfootFish data.

7. The code below makes a histogram of the original values of length. Add a plot of the rescaled values of length. Output your plots side by side so the reader can confirm that the only aspect that has changed is the scale.

Warning

This will require you to call your rescale_01() function within a mutate() statement to create a length_scaled variable.

Tip
  1. Set the y-axis limits for both plots to go from 0 to 4000 to allow for direct comparison across plots.

  2. Pay attention to binwidth! Adjust it so that the plots are comparable (they may not look exactly the same).

  3. Look for a Quarto code-chunk option to put the plots side-by-side.

fish |>  
  ggplot(mapping = aes(x = length)) + 
  geom_histogram(binwidth = 45) +
  labs(x = "Original Values of Fish Length (mm)") +
  scale_y_continuous(limits = c(0, 4000))

# Code for Q7 plot.
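A hedged sketch of the rescaled histogram, run on simulated lengths (`toy_fish` is a hypothetical stand-in for the real data) with a minimal helper standing in for your own `rescale_01()`. The key idea is that a 45 mm bin on the original scale corresponds to a bin of width 45 / (max - min) on the 0-1 scale. To place the two plots side by side, one option is the Quarto chunk option `#| layout-ncol: 2` at the top of a chunk that produces both plots; with the real data, also add `scale_y_continuous(limits = c(0, 4000))` as the tip suggests.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the fish data: simulated lengths in mm
set.seed(331)
toy_fish <- tibble(length = round(rnorm(500, mean = 300, sd = 80)))

# Minimal rescaling helper standing in for your rescale_01() (no input checks)
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

toy_fish <- toy_fish |>
  mutate(length_scaled = rescale_01(length))

# Convert the 45 mm binwidth to the 0-1 scale
len_range <- range(toy_fish$length, na.rm = TRUE)
scaled_binwidth <- 45 / diff(len_range)

toy_fish |>
  ggplot(aes(x = length_scaled)) +
  geom_histogram(binwidth = scaled_binwidth) +
  labs(x = "Rescaled Values of Fish Length")
```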