Writing Functions in R

Go on a walk and listen to a talk!

I’ve included quite a few talks in this coursework because (1) they do a great job discussing topics related to function design, and (2) they were given by my coding heroes (cough, Jenny Bryan 🤩).

It’s not critical for you to sit in front of a computer when listening to these talks, so I might recommend going on a walk and listening to one! The weather is too nice to sit in front of a computer all day!

Basics of Functions

If you do not recall the basics of writing functions, or if you want a quick refresher, watch the video below.

Good Function Design

Most likely, you have so far only written functions for your own convenience. (Or for assignments, of course!) We are now going to design functions for other people to use, and possibly even edit. This means we need to put some thought into the design of each function.

Debugging Functions

Suppose you’ve done it: You’ve written the most glorious, beautiful, well-designed function of all time. It’s many lines long, and it relies on several sub-functions.

You run it and - it doesn’t work.

How can you track down exactly where in your complicated function something went wrong?
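One low-tech starting point, sketched here in base R: wrap the call in tryCatch() and ask the captured error object which call raised it. The functions clean_input() and summarise_input() are hypothetical stand-ins for your own function and its sub-functions, not anything from this coursework.

```r
# Hypothetical sub-function: fails on non-numeric input
clean_input <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  x[!is.na(x)]
}

# Hypothetical outer function that relies on the sub-function
summarise_input <- function(x) {
  cleaned <- clean_input(x)
  mean(cleaned)
}

# Capture the error object instead of letting it halt the script
err <- tryCatch(summarise_input("orange"), error = function(e) e)

# The error text, and the call that actually raised the error
error_text <- conditionMessage(err)
failing_call <- deparse(conditionCall(err))

error_text
failing_call
```

When working interactively, base R's traceback() (run right after an error) and browser() (placed inside a function body) are likely to give a fuller picture than this sketch does.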

Advanced Details

As this is an Advanced course, let’s take a moment to talk about two quirky details of how R handles functions.

Objects of Type Closure

In R, functions are objects. That is, creating a function is not fundamentally different from creating a vector or a data frame.

Here we store the vector 1, 2, 3 in the object named a:

a <- 1:3

a
[1] 1 2 3

Here we store the procedure “add one plus one” in the object named a:

a <- function(){
  1 + 1
}

a
function () 
{
    1 + 1
}

For some strange reason, R has a specific term for “an object that is a function”: closure. Have you ever gotten this error?

a[1]
Error in a[1]: object of type 'closure' is not subsettable

I bet you have! What happened here is that we tried to take a subset of a, as if it were a vector. But a is a function, not a vector, so this doesn’t work! If you’ve encountered this error in the wild, it’s probably because you tried to reference an object that doesn’t exist, using a name that happens to belong to an existing function.
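Here is a small sketch of both situations. The first part checks the type of a function directly. The second part assumes a fresh session in which no object named df has been created, so R finds the stats function df() instead.

```r
# A function is an object whose type is "closure"
a <- function() {
  1 + 1
}
type_of_a <- typeof(a)

# Subsetting a function raises the error; capture its message
msg <- tryCatch(a[1], error = function(e) conditionMessage(e))

# The "in the wild" version: we never created a data frame called df,
# so R falls back on stats::df(), which is a function
wild_msg <- tryCatch(df[1], error = function(e) conditionMessage(e))

type_of_a
msg
wild_msg
```

Checking typeof() on the name you are trying to subset is a quick way to confirm whether you are holding the data you expected or a function that shares its name.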

Check-in: Object of Type “closure”

Question 7: What is the most likely cause of the error message Error in x[1] : object of type 'closure' is not subsettable?

  1. Trying to access an element of a list using parentheses () instead of square brackets []
  2. Attempting to subset a numeric vector using the wrong index type
  3. Trying to extract an element from a function using square bracket notation
  4. Passing a missing argument to a function

Lazy Evaluation

Like most people, R tries to avoid doing any unnecessary work. When you “give” a value to an argument of a function, R does a quick check to make sure you haven’t done anything too crazy, like forgotten a parenthesis. Then it says, “Yep, looks like R code to me!” and moves on with its life. Only when that argument is actually used does R try to run the code.

Consider the following obvious problem:

mean('orange')
Warning in mean.default("orange"): argument is not numeric or logical:
returning NA
[1] NA

Now consider the following function:

silly_function <- function(x) {
  
  cat("I am silly!")
  
}

What do you think will happen when we run:

silly_function(
  x = mean("orange")
  )

Seems like it should trigger that warning, right? But wait! Try it out for yourself.

The function silly_function() doesn’t use the x argument. Thus, R was “lazy”, and never even bothered to try to run mean("orange"), so the warning never appears. 🙀
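To make the contrast concrete, here is a sketch with two hypothetical functions, ignores_x() and uses_x(), that receive the same broken argument. It uses stop() rather than mean("orange") so that evaluating the argument would produce an unmistakable error.

```r
# Hypothetical function that never touches its argument
ignores_x <- function(x) {
  "finished without evaluating x"
}

# Hypothetical function that does use its argument
uses_x <- function(x) {
  force(x)  # evaluates the argument right here
  "finished after evaluating x"
}

# stop() is never run, because x is never evaluated
lazy_result <- ignores_x(x = stop("x was evaluated!"))

# Here the argument is evaluated, so the error surfaces
eager_result <- tryCatch(
  uses_x(x = stop("x was evaluated!")),
  error = function(e) conditionMessage(e)
)

lazy_result
eager_result
```

The call to force() is only there to make the moment of evaluation explicit; any ordinary use of x inside the body would have the same effect.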

Check-in: Lazy Evaluation

Question 8: In R, when exactly does the evaluation of a function argument occur?

  1. Immediately when the function is called
  2. Only if and when the argument’s value is actually used within the function body
  3. When the function is compiled
  4. Only after all other arguments have been evaluated

Non-Standard Evaluation and Tunnelling

Suppose you want to write a function that takes a dataset, a categorical variable, and a quantitative variable, and returns the means by group.

You might think to yourself, “Easy!” and write something like this:

means_by_group <- function(dataset, cat_var, quant_var) {
  
  dataset %>%
    group_by(cat_var) %>%
    summarize(means = mean(quant_var, 
                           na.rm = TRUE)
              )
}

Okay, let’s run it!

means_by_group(penguins, 
               cat_var = species, 
               quant_var = bill_length_mm)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `cat_var` is not found.

Dagnabbit! The function tried to group the data by a variable named cat_var - but the dataset penguins doesn’t have any variables named cat_var!

What happened here is that the function group_by() uses non-standard evaluation. This means it accepts a special kind of input: unquoted variable names. Notice that we write group_by(species), not group_by("species"). There are no quotation marks, because species is a variable name, not a string. Inside means_by_group(), R sees the unquoted name cat_var and hands it to group_by() exactly as written, so group_by() looks for a column literally called cat_var. It does not realize that we actually meant to pass along the variable name species.
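The same behaviour can be reproduced in base R without dplyr. The sketch below uses substitute() to capture an unquoted name, which is only a rough analogy for what group_by() does internally, not its actual implementation. The functions show_name() and wrapper() are hypothetical.

```r
# Hypothetical function using non-standard evaluation:
# it looks at the code typed for `var`, not at its value
show_name <- function(var) {
  deparse(substitute(var))
}

# Called directly, it sees the name we typed
direct <- show_name(species)

# Hypothetical wrapper, playing the role of means_by_group()
wrapper <- function(cat_var) {
  show_name(cat_var)
}

# Called through the wrapper, it sees the wrapper's argument name
# instead of the name the user typed
wrapped <- wrapper(species)

direct
wrapped
```

Note that no object named species exists anywhere in this sketch. Because the argument's code is captured rather than evaluated, nothing ever looks the name up, which ties back to lazy evaluation.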

Footnotes

  1. There are ways to cheat your way around this, but we will avoid them!