Special Data Types

This week is all about special data types in R. Similar to the tools you learned last week for working with factors, this week you are going to learn about tools for working with strings and dates. By the end of this week you should be able to:


▶️ Watch Videos: 20 minutes

📖 Readings: 60-75 minutes

✅ Preview Activities: 2


1 Part 1: Strings

Nearly always, when multiple variables are stored in a single column, they are stored as character variables. There are many different “levels” of working with strings in programming, from simple find-and-replace of fixed (constant) strings to regular expressions, which are extremely powerful (and extremely complicated).

📖 Required Reading: R4DS – Strings

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. - Jamie Zawinski

Alternately, the xkcd version of the above quote

stringr

Download the stringr cheatsheet.

Table of string functions in the R stringr package. x is the string or vector of strings, pattern is a pattern to be found within the string, a and b are indexes, and encoding is a string encoding, such as UTF8 or ASCII.
Task                                                        stringr
Replace pattern with replacement                            str_replace(x, pattern, replacement), str_replace_all(x, pattern, replacement)
Convert case                                                str_to_lower(x), str_to_upper(x), str_to_title(x)
Strip whitespace from start/end                             str_trim(x), str_squish(x)
Pad strings to a specific length                            str_pad(x, …)
Test if the string contains a pattern                       str_detect(x, pattern)
Count how many times a pattern appears in the string        str_count(x, pattern)
Find the first appearance of the pattern within the string  str_locate(x, pattern)
Find all appearances of the pattern within the string       str_locate_all(x, pattern)
Detect a match at the start/end of the string               str_starts(x, pattern), str_ends(x, pattern)
Subset a string from index a to b                           str_sub(x, a, b)
Convert string encoding                                     str_conv(x, encoding)
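
To make the table a little more concrete, here is a minimal sketch that runs a few of these functions on a small vector. The vector `x` and its contents are made up for illustration; they do not come from the course data.

```r
library(stringr)

# Hypothetical messy input, invented for this example
x <- c("  Apple pie ", "banana   split", "cherry tart")

# str_squish() trims both ends and collapses repeated internal spaces
cleaned <- str_squish(x)
cleaned
# [1] "Apple pie"    "banana split" "cherry tart"

str_to_title(cleaned)          # capitalise the first letter of each word
str_detect(cleaned, "an")      # FALSE  TRUE FALSE
str_count(cleaned, "a")        # 0 3 1 (matching is case sensitive)
str_starts(cleaned, "b")       # FALSE  TRUE FALSE
str_sub(cleaned, 1, 5)         # "Apple" "banan" "cherr"

# Pad on the left (the default side) with zeros to a width of 3
str_pad("7", width = 3, pad = "0")   # "007"
```

Every function takes the string (or vector of strings) as its first argument and works element by element, which is why they fit neatly inside `mutate()`.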

1.1 Regular Expressions

Matching exact strings is easy - it’s just like using find and replace.

library(stringr)

human_talk <- "blah, blah, blah. Do you want to go for a walk?"
dog_hears <- str_extract(human_talk, "walk")
dog_hears
[1] "walk"

But, if you can master even a small amount of regular expression notation, you’ll have exponentially more power to do good (or evil) when working with strings. You can get by without regular expressions if you’re creative, but often they’re much simpler.
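
As a rough illustration of that extra power, the sketch below reuses `human_talk` and compares a fixed-string match with a few small regular expressions. The patterns are examples chosen for this sketch, not ones taken from the reading.

```r
library(stringr)

human_talk <- "blah, blah, blah. Do you want to go for a walk?"

# Fixed string: count exact occurrences of "blah"
str_count(human_talk, "blah")
# [1] 3

# Regex: one or more word characters followed by a literal question mark
last_word <- str_extract(human_talk, "\\w+\\?")
last_word
# [1] "walk?"

# Regex alternation: does the sentence mention any word the dog cares about?
str_detect(human_talk, "walk|treat|park")
# [1] TRUE

# Regex with word boundaries: every word that is exactly four letters long
four_letters <- str_extract_all(human_talk, "\\b\\w{4}\\b")[[1]]
four_letters
# [1] "blah" "blah" "blah" "want" "walk"
```

Note the doubled backslashes: the pattern is written as an R string first, so `\\w` in the code becomes `\w` by the time the regular expression engine sees it.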

Check-in 5.1: Functions from stringr

1 Which of the following are differences between length() and str_length()?

  • length() gives the number of elements in a vector
  • str_length() gives the number of characters in a string
  • str_length() gives the number of strings in a vector
  • length() gives the dimensions of a dataframe

2 Which of the following is true about str_replace()?

  • str_replace() replaces the first instance of the pattern
  • str_replace() replaces the last instance of the pattern
  • str_replace() replaces every instance of the pattern

3 str_trim() allows you to remove whitespace on which sides?

  • left
  • right
  • both

4 Which of the following does str_sub() use to create a substring?

  • starting position
  • ending position
  • pattern to search for

5 Which of the following does str_subset() use to select strings from a vector?

  • starting position
  • ending position
  • pattern to search for

6 What does the collapse argument do in str_c()?

  • specifies a string to be used when combining inputs into a single string
  • specifies whether the string should be collapsed

2 Part 2: Dates

In order to fill in an important part of our toolbox, we need to learn how to work with date variables. These variables feel like they should be simple and intuitive, given that we all work with schedules and calendars every day. However, there are little nuances that we will learn to make working with dates and times easier.

📖 Required Reading: R4DS – Dates and Times
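
As a small taste of those nuances, here is a sketch using lubridate. The dates are made-up values chosen to land on a leap day and a month end; they are only for illustration.

```r
library(lubridate)

# The same calendar day written three different ways
d1 <- ymd("2024-02-29")
d2 <- mdy("February 29, 2024")
d3 <- dmy("29/02/2024")
d1 == d2 && d2 == d3
# [1] TRUE

# Nuance 1: leap years change how many days separate two dates
gap <- as.numeric(ymd("2024-03-01") - ymd("2024-02-28"))
gap
# [1] 2

# Nuance 2: "one month after January 31" is not a real date,
# so plain addition gives NA ...
jan31 <- ymd("2024-01-31")
jan31 + months(1)
# [1] NA

# ... while %m+% rolls back to the last valid day of the month
rolled <- jan31 %m+% months(1)
rolled
# [1] "2024-02-29"
```

The parsing helpers are named after the order of the components in the input (`ymd()`, `mdy()`, `dmy()`), so picking the right one is mostly a matter of reading the raw data carefully.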

Check-in 5.2: Functions from lubridate

Q1 Which of the following is true about the year() function?

  • year() creates a duration object to be added to a datetime
  • year() extracts the year of a datetime object

Q2 What tz would you use for San Luis Obispo? Use the exact input you would use in R!

Q3 Which of the following is true about the %within% operator?

  • it checks if a date is included in an interval
  • it returns a logical value
  • it creates an interval with a start and end time

Q4 Which of the following is true about the %--% operator?

  • it creates an interval with a start and end time
  • it returns a logical value
  • it checks if a date is included in an interval

Q5 What day does the make_date() function use as default if no day argument is provided?