Using stringr to Work with Strings

Monday, April 29

Today we will…

  • New layout this week
  • What you can expect in Week 6
  • New material
    • String variables
    • Functions for working with strings
    • Regular expressions
  • PA 5.1: Scrambled Message

Week 5 Layout

  • Today: Strings with stringr
    • Practice Activity: Decoding a Message
  • Thursday: Dates with lubridate
    • Practice Activity: Jewel Heist
  • Lab Assignment: Solving a Murder Mystery
    • Using dplyr + stringr + lubridate

Week 6 Layout

  • Tuesday: Writing Basic Functions
    • Practice Activity
  • Thursday: Midterm Portfolio Work Session
    • Midterm Portfolios Due Sunday, November 3

String Variables

What is a string?

A string is a sequence of characters stored as a single value.

There is a difference between…

…a string (many characters, one object)…

and

…a character vector (vector of strings).

my_string <- "Hi, my name is Bond!"
my_string
[1] "Hi, my name is Bond!"
my_vector <- c("Hi", "my", "name", "is", "Bond")
my_vector
[1] "Hi"   "my"   "name" "is"   "Bond"
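
A minimal sketch of the distinction, reusing the two objects above and assuming stringr is installed: length() counts the elements of a vector, while str_length() counts the characters inside each element.

```r
library(stringr)

my_string <- "Hi, my name is Bond!"
my_vector <- c("Hi", "my", "name", "is", "Bond")

# length() counts elements: one string vs. five strings
n_string <- length(my_string)
n_vector <- length(my_vector)

# str_length() counts the characters inside each string
chars_string <- str_length(my_string)
chars_vector <- str_length(my_vector)
```

Here n_string is 1 and n_vector is 5, even though the single string holds 20 characters.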

stringr

Common tasks

  • Identify strings containing a particular pattern.
  • Remove or replace a pattern.
  • Edit a string (e.g., make it lowercase).

Note

  • The stringr package loads with tidyverse.
  • All functions are of the form str_xxx().

pattern =

The pattern argument appears in many stringr functions.

  • The pattern must be supplied inside quotes.
my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_detect(my_vector, pattern = "Bond")
str_locate(my_vector, pattern = "James Bond")
str_match(my_vector, pattern = "[bB]ond")
str_extract(my_vector, pattern = "[jJ]ames [bB]ond")


Let’s explore these functions!

str_detect()

Returns a logical vector indicating whether the pattern was found in each element of the supplied vector.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")
str_detect(my_vector, pattern = "Bond")
[1] FALSE FALSE  TRUE  TRUE
  • Pairs well with filter().
  • Works with summarise() + sum() (to get the total number of matches) or mean() (to get the proportion of matches).

Related Function

str_which() returns the indexes of the strings that contain a match.
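
As a rough illustration of those pairings, the sketch below puts the same four strings in a small data frame. The data frame name `script` and its column `text` are made up for this example, and the code assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical one-column data frame holding the strings from my_vector
script <- tibble(text = c("Hello,", "my name is", "Bond", "James Bond"))

# filter() keeps the rows where str_detect() returns TRUE
bond_rows <- script |>
  filter(str_detect(text, pattern = "Bond"))

# sum() counts the TRUEs, mean() gives the proportion of TRUEs
bond_summary <- script |>
  summarise(n_bond    = sum(str_detect(text, pattern = "Bond")),
            prop_bond = mean(str_detect(text, pattern = "Bond")))

# str_which() reports the positions of the matching strings
bond_index <- str_which(script$text, pattern = "Bond")
```

With this input, two rows survive the filter, the proportion is 0.5, and str_which() returns positions 3 and 4.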

str_match()

Returns a character matrix containing either NA or the matched text, depending on whether the pattern was found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_match(my_vector, pattern = "Bond")
     [,1]  
[1,] NA    
[2,] NA    
[3,] "Bond"
[4,] "Bond"
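
The matrix shape becomes useful once the pattern contains parentheses (covered later as groups). A small sketch, assuming stringr is loaded: the first column holds the full match and each ( ) group gets a column of its own.

```r
library(stringr)

my_vector <- c("Hello,",
               "my name is",
               "Bond",
               "James Bond")

# Two groups in the pattern give a matrix with three columns:
# full match, first group, second group
name_parts <- str_match(my_vector, pattern = "(James) (Bond)")

full_match <- name_parts[, 1]
first_name <- name_parts[, 2]
last_name  <- name_parts[, 3]
```

Only the fourth string contains "James Bond", so every other row is NA in all three columns.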

str_extract()

Returns a character vector with either NA or the matched text, depending on whether the pattern was found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_extract(my_vector, pattern = "Bond")
[1] NA     NA     "Bond" "Bond"

Warning

str_extract() only returns the first pattern match.

Use str_extract_all() to return every pattern match.

What do you mean by the first match?

Suppose we had a slightly different vector…

alt_vector <- c("Hello,", 
               "my name is", 
               "Bond, James Bond")

If we were to extract every instance of "Bond" from the vector…

str_extract(alt_vector, 
            pattern = "Bond")
[1] NA     NA     "Bond"
str_extract_all(alt_vector, 
                pattern = "Bond")
[[1]]
character(0)

[[2]]
character(0)

[[3]]
[1] "Bond" "Bond"
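
Because str_extract_all() returns a list, a common next step is to count or flatten the matches. This is one possible sketch, assuming stringr is loaded; lengths() and unlist() are base R, and str_count() is the stringr shortcut for counting.

```r
library(stringr)

alt_vector <- c("Hello,",
                "my name is",
                "Bond, James Bond")

all_matches <- str_extract_all(alt_vector, pattern = "Bond")

# How many matches did each string produce?
matches_per_string <- lengths(all_matches)

# str_count() gives the same counts without building the list
bond_counts <- str_count(alt_vector, pattern = "Bond")

# Collapse the list into one character vector of every match
flat_matches <- unlist(all_matches)
```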

str_locate()

Returns an integer matrix with two columns – the starting and ending location of the first match of the pattern in each string. The values are NA if the pattern is not found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_locate(my_vector, pattern = "Bond")
     start end
[1,]    NA  NA
[2,]    NA  NA
[3,]     1   4
[4,]     7  10

Related Function

str_sub() extracts values based on a starting and ending location.
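
To show how the two functions fit together, here is a minimal sketch (assuming stringr is loaded) that passes the positions found by str_locate() to str_sub().

```r
library(stringr)

my_vector <- c("Hello,",
               "my name is",
               "Bond",
               "James Bond")

# str_locate() gives a matrix with "start" and "end" columns
loc <- str_locate(my_vector, pattern = "Bond")

# Use those positions to pull out the matched text;
# strings without a match give NA
found <- str_sub(my_vector, start = loc[, "start"], end = loc[, "end"])

# str_sub() also works with positions you choose yourself
first_three <- str_sub(my_vector, start = 1, end = 3)
```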

str_subset()

Returns a character vector containing a subset of the original character vector consisting of the elements where the pattern was found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_subset(my_vector, pattern = "Bond")
[1] "Bond"       "James Bond"

Try it out!

my_vector <- c("I scream,", 
               "you scream", 
               "we all",
               "scream",
               "for",
               "ice cream")

str_detect(my_vector, pattern = "cream")
str_locate(my_vector, pattern = "cream")
str_match(my_vector, pattern = "cream")
str_extract(my_vector, pattern = "cream")
str_subset(my_vector, pattern = "cream")

Note

For each of these functions, write down:

  • the object structure of the output.
  • the data type of the output.
  • a brief explanation of what they do.

Replace / Remove Patterns

Replace the first matched pattern in each string.

  • Pairs well with mutate().
str_replace(my_vector, 
            pattern = "Bond", 
            replacement = "Franco")
[1] "Hello,"       "my name is"   "Franco"       "James Franco"


Related Function

str_replace_all() replaces all matched patterns in each string.
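
The difference only shows up when a string contains the pattern more than once. A short sketch with a made-up string, assuming stringr is loaded:

```r
library(stringr)

# Hypothetical string with two occurrences of the pattern
alt_string <- "Bond, James Bond"

# Only the first "Bond" is replaced
first_only <- str_replace(alt_string,
                          pattern = "Bond",
                          replacement = "Franco")

# Every "Bond" is replaced
every_match <- str_replace_all(alt_string,
                               pattern = "Bond",
                               replacement = "Franco")
```

first_only is "Franco, James Bond", while every_match is "Franco, James Franco".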

Remove the first matched pattern in each string.

str_remove(my_vector, 
           pattern = "Bond")
[1] "Hello,"     "my name is" ""           "James "    


Related Functions

str_remove(x, pattern) is shorthand for str_replace(x, pattern, replacement = "").

str_remove_all() removes all matched patterns in each string.
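
A quick way to convince yourself of both statements, sketched with a made-up string that contains the pattern twice (assuming stringr is loaded):

```r
library(stringr)

my_vector  <- c("Hello,", "my name is", "Bond", "James Bond")
alt_string <- "Bond, James Bond"   # hypothetical example string

# Removing is the same as replacing with an empty string
same_result <- identical(
  str_remove(my_vector, pattern = "Bond"),
  str_replace(my_vector, pattern = "Bond", replacement = "")
)

# First match only vs. every match
removed_first <- str_remove(alt_string, pattern = "Bond")
removed_all   <- str_remove_all(alt_string, pattern = "Bond")
```

Note that removing a pattern can leave stray spaces or punctuation behind, as in removed_all, which is ", James ".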

Edit Strings

Convert letters in a string to a specific capitalization format.

str_to_lower() converts all letters in a string to lowercase.


str_to_lower(my_vector)
[1] "hello,"     "my name is" "bond"       "james bond"

str_to_upper() converts all letters in a string to uppercase.


str_to_upper(my_vector)
[1] "HELLO,"     "MY NAME IS" "BOND"       "JAMES BOND"

str_to_title() converts the first letter of each word to uppercase.


str_to_title(my_vector)
[1] "Hello,"     "My Name Is" "Bond"       "James Bond"

This is handy for axis labels!

Combine Strings

Join multiple strings into a single character vector.

prompt <- "Hello, my name is"
first  <- "James"
last   <- "Bond"
str_c(prompt, last, ",", first, last, sep = " ")
[1] "Hello, my name is Bond , James Bond"

Note

Similar to paste() and paste0().
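
Two details are worth seeing side by side, sketched here under the assumption that stringr is loaded: the default separator in str_c() is the empty string, like paste0(), which lets you place spaces yourself, and str_c() treats missing values differently from paste0().

```r
library(stringr)

prompt <- "Hello, my name is"
first  <- "James"
last   <- "Bond"

# With the default sep = "", you control every space,
# which avoids the stray space before the comma
tidy_intro <- str_c(prompt, " ", last, ", ", first, " ", last)

# Same result with base R
base_intro <- paste0(prompt, " ", last, ", ", first, " ", last)

# Missing values: str_c() returns NA, paste0() pastes the text "NA"
stringr_na <- str_c("Agent ", NA)
base_na    <- paste0("Agent ", NA)
```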

Combine a vector of strings into a single string.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_flatten(my_vector, collapse = " ")
[1] "Hello, my name is Bond James Bond"

Use variables in the environment to create a string based on {expressions}.

first <- "James"
last <- "Bond"
str_glue("My name is {last}, {first} {last}")
My name is Bond, James Bond

Tip

For more details, I would recommend looking up the glue R package!
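
The braces can hold any R expression, not just a variable name. A small sketch, assuming stringr is loaded; as.character() is used only to turn the glue output into a plain string for comparison.

```r
library(stringr)

first <- "James"
last  <- "Bond"

# Function calls inside { } are evaluated before the string is built
shouted <- str_glue("{str_to_upper(last)}, {first} {last}")

# Expressions can compute values too
letters_msg <- str_glue("{last} has {str_length(last)} letters")

shouted_chr     <- as.character(shouted)
letters_msg_chr <- as.character(letters_msg)
```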

Tips for String Success

  • Refer to the stringr cheatsheet

  • Remember that str_xxx functions need the first argument to be a vector of strings, not a dataset!

    • You will use these functions inside dplyr verbs like filter() or mutate().
cereal |> 
  mutate(is_bran = str_detect(name, "Bran"), 
         .after = name)
name is_bran manuf type calories protein fat sodium fiber carbo sugars potass vitamins shelf weight cups rating
100% Bran TRUE N cold 70 4 1 130 10.0 5.0 6 280 25 3 1.00 0.33 68.40297
100% Natural Bran TRUE Q cold 120 3 5 15 2.0 8.0 8 135 0 3 1.00 1.00 33.98368
All-Bran TRUE K cold 70 4 1 260 9.0 7.0 5 320 25 3 1.00 0.33 59.42551
All-Bran with Extra Fiber TRUE K cold 50 4 0 140 14.0 8.0 0 330 25 3 1.00 0.50 93.70491
Almond Delight FALSE R cold 110 2 2 200 1.0 14.0 8 -1 25 3 1.00 0.75 34.38484
Apple Cinnamon Cheerios FALSE G cold 110 2 2 180 1.5 10.5 10 70 25 1 1.00 0.75 29.50954
Apple Jacks FALSE K cold 110 2 0 125 1.0 11.0 14 30 25 2 1.00 1.00 33.17409
Basic 4 FALSE G cold 130 3 2 210 2.0 18.0 8 100 25 3 1.33 0.75 37.03856
Bran Chex TRUE R cold 90 2 1 200 4.0 15.0 6 125 25 1 1.00 0.67 49.12025
Bran Flakes TRUE P cold 90 3 0 210 5.0 13.0 5 190 25 3 1.00 0.67 53.31381
Cap'n'Crunch FALSE Q cold 120 1 2 220 0.0 12.0 12 35 25 2 1.00 0.75 18.04285
Cheerios FALSE G cold 110 6 2 290 2.0 17.0 1 105 25 1 1.00 1.25 50.76500
Cinnamon Toast Crunch FALSE G cold 120 1 3 210 0.0 13.0 9 45 25 2 1.00 0.75 19.82357
Clusters FALSE G cold 110 3 2 140 2.0 13.0 7 105 25 3 1.00 0.50 40.40021
Cocoa Puffs FALSE G cold 110 1 1 180 0.0 12.0 13 55 25 2 1.00 1.00 22.73645
Corn Chex FALSE R cold 110 2 0 280 0.0 22.0 3 25 25 1 1.00 1.00 41.44502
Corn Flakes FALSE K cold 100 2 0 290 1.0 21.0 2 35 25 1 1.00 1.00 45.86332
Corn Pops FALSE K cold 110 1 0 90 1.0 13.0 12 20 25 2 1.00 1.00 35.78279
Count Chocula FALSE G cold 110 1 1 180 0.0 12.0 13 65 25 2 1.00 1.00 22.39651
Cracklin' Oat Bran TRUE K cold 110 3 3 140 4.0 10.0 7 160 25 3 1.00 0.50 40.44877
Cream of Wheat (Quick) FALSE N hot 100 3 0 80 1.0 21.0 0 -1 0 2 1.00 1.00 64.53382
Crispix FALSE K cold 110 2 0 220 1.0 21.0 3 30 25 3 1.00 1.00 46.89564
Crispy Wheat & Raisins FALSE G cold 100 2 1 140 2.0 11.0 10 120 25 3 1.00 0.75 36.17620
Double Chex FALSE R cold 100 2 0 190 1.0 18.0 5 80 25 3 1.00 0.75 44.33086
Froot Loops FALSE K cold 110 2 1 125 1.0 11.0 13 30 25 2 1.00 1.00 32.20758
Frosted Flakes FALSE K cold 110 1 0 200 1.0 14.0 11 25 25 1 1.00 0.75 31.43597
Frosted Mini-Wheats FALSE K cold 100 3 0 0 3.0 14.0 7 100 25 2 1.00 0.80 58.34514
Fruit & Fibre Dates; Walnuts; and Oats FALSE P cold 120 3 2 160 5.0 12.0 10 200 25 3 1.25 0.67 40.91705
Fruitful Bran TRUE K cold 120 3 0 240 5.0 14.0 12 190 25 3 1.33 0.67 41.01549
Fruity Pebbles FALSE P cold 110 1 1 135 0.0 13.0 12 25 25 2 1.00 0.75 28.02576
Golden Crisp FALSE P cold 100 2 0 45 0.0 11.0 15 40 25 1 1.00 0.88 35.25244
Golden Grahams FALSE G cold 110 1 1 280 0.0 15.0 9 45 25 2 1.00 0.75 23.80404
Grape Nuts Flakes FALSE P cold 100 3 1 140 3.0 15.0 5 85 25 3 1.00 0.88 52.07690
Grape-Nuts FALSE P cold 110 3 0 170 3.0 17.0 3 90 25 3 1.00 0.25 53.37101
Great Grains Pecan FALSE P cold 120 3 3 75 3.0 13.0 4 100 25 3 1.00 0.33 45.81172
Honey Graham Ohs FALSE Q cold 120 1 2 220 1.0 12.0 11 45 25 2 1.00 1.00 21.87129
Honey Nut Cheerios FALSE G cold 110 3 1 250 1.5 11.5 10 90 25 1 1.00 0.75 31.07222
Honey-comb FALSE P cold 110 1 0 180 0.0 14.0 11 35 25 1 1.00 1.33 28.74241
Just Right Crunchy Nuggets FALSE K cold 110 2 1 170 1.0 17.0 6 60 100 3 1.00 1.00 36.52368
Just Right Fruit & Nut FALSE K cold 140 3 1 170 2.0 20.0 9 95 100 3 1.30 0.75 36.47151
Kix FALSE G cold 110 2 1 260 0.0 21.0 3 40 25 2 1.00 1.50 39.24111
Life FALSE Q cold 100 4 2 150 2.0 12.0 6 95 25 2 1.00 0.67 45.32807
Lucky Charms FALSE G cold 110 2 1 180 0.0 12.0 12 55 25 2 1.00 1.00 26.73451
Maypo FALSE A hot 100 4 1 0 0.0 16.0 3 95 25 2 1.00 1.00 54.85092
Muesli Raisins; Dates; & Almonds FALSE R cold 150 4 3 95 3.0 16.0 11 170 25 3 1.00 1.00 37.13686
Muesli Raisins; Peaches; & Pecans FALSE R cold 150 4 3 150 3.0 16.0 11 170 25 3 1.00 1.00 34.13976
Mueslix Crispy Blend FALSE K cold 160 3 2 150 3.0 17.0 13 160 25 3 1.50 0.67 30.31335
Multi-Grain Cheerios FALSE G cold 100 2 1 220 2.0 15.0 6 90 25 1 1.00 1.00 40.10596
Nut&Honey Crunch FALSE K cold 120 2 1 190 0.0 15.0 9 40 25 2 1.00 0.67 29.92429
Nutri-Grain Almond-Raisin FALSE K cold 140 3 2 220 3.0 21.0 7 130 25 3 1.33 0.67 40.69232
Nutri-grain Wheat FALSE K cold 90 3 0 170 3.0 18.0 2 90 25 3 1.00 1.00 59.64284
Oatmeal Raisin Crisp FALSE G cold 130 3 2 170 1.5 13.5 10 120 25 3 1.25 0.50 30.45084
Post Nat. Raisin Bran TRUE P cold 120 3 1 200 6.0 11.0 14 260 25 3 1.33 0.67 37.84059
Product 19 FALSE K cold 100 3 0 320 1.0 20.0 3 45 100 3 1.00 1.00 41.50354
Puffed Rice FALSE Q cold 50 1 0 0 0.0 13.0 0 15 0 3 0.50 1.00 60.75611
Puffed Wheat FALSE Q cold 50 2 0 0 1.0 10.0 0 50 0 3 0.50 1.00 63.00565
Quaker Oat Squares FALSE Q cold 100 4 1 135 2.0 14.0 6 110 25 3 1.00 0.50 49.51187
Quaker Oatmeal FALSE Q hot 100 5 2 0 2.7 -1.0 -1 110 0 1 1.00 0.67 50.82839
Raisin Bran TRUE K cold 120 3 1 210 5.0 14.0 12 240 25 2 1.33 0.75 39.25920
Raisin Nut Bran TRUE G cold 100 3 2 140 2.5 10.5 8 140 25 3 1.00 0.50 39.70340
Raisin Squares FALSE K cold 90 2 0 0 2.0 15.0 6 110 25 3 1.00 0.50 55.33314
Rice Chex FALSE R cold 110 1 0 240 0.0 23.0 2 30 25 1 1.00 1.13 41.99893
Rice Krispies FALSE K cold 110 2 0 290 0.0 22.0 3 35 25 1 1.00 1.00 40.56016
Shredded Wheat FALSE N cold 80 2 0 0 3.0 16.0 0 95 0 1 0.83 1.00 68.23588
Shredded Wheat 'n'Bran TRUE N cold 90 3 0 0 4.0 19.0 0 140 0 1 1.00 0.67 74.47295
Shredded Wheat spoon size FALSE N cold 90 3 0 0 3.0 20.0 0 120 0 1 1.00 0.67 72.80179
Smacks FALSE K cold 110 2 1 70 1.0 9.0 15 40 25 2 1.00 0.75 31.23005
Special K FALSE K cold 110 6 0 230 1.0 16.0 3 55 25 1 1.00 1.00 53.13132
Strawberry Fruit Wheats FALSE N cold 90 2 0 15 3.0 15.0 5 90 25 2 1.00 1.00 59.36399
Total Corn Flakes FALSE G cold 110 2 1 200 0.0 21.0 3 35 100 3 1.00 1.00 38.83975
Total Raisin Bran TRUE G cold 140 3 1 190 4.0 15.0 14 230 100 3 1.50 1.00 28.59278
Total Whole Grain FALSE G cold 100 3 1 200 3.0 16.0 3 110 100 3 1.00 1.00 46.65884
Triples FALSE G cold 110 2 1 250 0.0 21.0 3 60 25 3 1.00 0.75 39.10617
Trix FALSE G cold 110 1 1 140 0.0 13.0 12 25 25 2 1.00 1.00 27.75330
Wheat Chex FALSE R cold 100 3 1 230 3.0 17.0 3 115 25 1 1.00 0.67 49.78744
Wheaties FALSE G cold 100 3 1 200 3.0 17.0 3 110 25 1 1.00 1.00 51.59219
Wheaties Honey Gold FALSE G cold 110 2 1 200 1.0 16.0 8 60 25 1 1.00 0.75 36.18756

Tips for String Success

The real power of these str_xxx functions comes when you specify the pattern using regular expressions!

The image is a comic strip from xkcd titled 'Regular Expressions.' It humorously portrays a programmer's overconfidence in using regular expressions to solve complex text processing tasks. In the first panel, a stick figure declares, 'EVERYBODY STAND BACK,' and in the second panel, they assert, 'I KNOW REGULAR EXPRESSIONS,' suggesting that their expertise is both a warning and a badge of honor. This reflects the sentiment that while regular expressions are powerful tools in programming, they can also lead to intricate and hard-to-maintain code if not used judiciously.

regex

Regular Expressions

“Regexps are a very terse language that allow you to describe patterns in strings.”

R for Data Science

Use str_xxx functions + regular expressions!

str_detect(string  = my_string_vector,
           pattern = "p[ei]ck[a-z]")

Tip

You might encounter gsub(), grep(), etc. from base R, but I highly recommend using functions from the stringr package instead.
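
If you do run into the base R functions, this sketch shows a few rough equivalents (assuming stringr is loaded). The main practical difference is argument order: base R puts the pattern first, while stringr puts the string first, which works better with pipes.

```r
library(stringr)

my_vector <- c("Hello,", "my name is", "Bond", "James Bond")

# grepl(pattern, x) ~ str_detect(string, pattern)
same_detect <- identical(grepl("Bond", my_vector),
                         str_detect(my_vector, "Bond"))

# grep(pattern, x, value = TRUE) ~ str_subset(string, pattern)
same_subset <- identical(grep("Bond", my_vector, value = TRUE),
                         str_subset(my_vector, "Bond"))

# gsub(pattern, replacement, x) ~ str_replace_all(string, pattern, replacement)
same_replace <- identical(gsub("Bond", "Franco", my_vector),
                          str_replace_all(my_vector, "Bond", "Franco"))
```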

Regular Expressions

…are tricky!

  • There are lots of new symbols to keep straight.
  • There are a lot of cases to think through.


This web app for testing R regular expressions might be handy!

Special Characters

There is a set of characters that have a specific meaning when using regex.

  • The stringr package does not read these as normal characters.
  • These characters are:

. ^ $ \ | * + ? { } [ ] ( )

Wild Card Character: .

This character matches any single character (except a newline, by default).

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = ".ells")
[1] "sells"     "seashells"


This matches strings that contain any character followed by “ells”.
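
One consequence that is easy to miss: the wild card must match something, so a string with nothing before "ells" does not qualify. A short sketch with made-up strings, assuming stringr is loaded:

```r
library(stringr)

# Hypothetical strings: the first has no character before "ells"
y <- c("ells", "sells", "bells")

# "ells" on its own is dropped because . needs one character to match
needs_a_character <- str_subset(y, pattern = ".ells")

# str_extract() shows which character the . picked up
picked_up <- str_extract("seashells", pattern = ".ells")
```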

Anchor Characters: ^ $

^ – looks at the beginning of a string.

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = "^s")
[1] "sells"     "seashells" "seashore!"

This matches strings that start with “s”.

$ – looks at the end of a string.

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = "s$")
[1] "sells"     "seashells"

This matches strings that end with “s”.
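
Anchors can also be combined to require that the whole string matches. The sketch below, assuming stringr is loaded, compares an unanchored pattern with anchored versions of it.

```r
library(stringr)

x <- c("She",
       "sells",
       "seashells",
       "by",
       "the",
       "seashore!")

# No anchors: "he" may appear anywhere in the string
anywhere <- str_subset(x, pattern = "he")

# $ only: the string must end with "he"
at_end <- str_subset(x, pattern = "he$")

# ^ and $ together: the string must be exactly "the"
exact <- str_subset(x, pattern = "^the$")
```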

Quantifier Characters: ? + *

? – matches when the preceding character occurs 0 or 1 times in a row.

x <- c("shes", 
       "shels", 
       "shells", 
       "shellls", 
       "shelllls")

str_subset(x, pattern = "shel?s")
[1] "shes"  "shels"

+ – occurs 1 or more times in a row.

str_subset(x, pattern = "shel+s")
[1] "shels"    "shells"   "shellls"  "shelllls"

* – occurs 0 or more times in a row.

str_subset(x, pattern = "shel*s")
[1] "shes"     "shels"    "shells"   "shellls"  "shelllls"

Quantifier Characters: {}

{n} – matches when the preceding character occurs exactly n times in a row.

x <- c("shes", 
       "shels", 
       "shells", 
       "shellls", 
       "shelllls")

str_subset(x, pattern = "shel{2}s")
[1] "shells"

{n,} – occurs at least n times in a row.

str_subset(x, pattern = "shel{2,}s")
[1] "shells"   "shellls"  "shelllls"

{n,m} – occurs between n and m times in a row.

str_subset(x, pattern = "shel{1,3}s")
[1] "shels"   "shells"  "shellls"
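
Two further behaviours of quantifiers, sketched under the assumption that stringr is loaded: they grab as many characters as they are allowed to (they are "greedy"), and they can apply to a whole group in parentheses rather than a single character. The "ha" strings are made up for the example.

```r
library(stringr)

# Greedy matching: {2,3} takes three l's when three are available
greedy_range <- str_extract("shelllls", pattern = "l{2,3}")

# + takes every consecutive l
greedy_plus <- str_extract("shelllls", pattern = "l+")

# A quantifier after ( ) repeats the whole group
laughs <- c("ha", "haha", "hahaha")   # hypothetical strings
exactly_two <- str_subset(laughs, pattern = "^(ha){2}$")
```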

Character Groups: ()

Groups are created with ( ).

  • We can specify “either” / “or” within a group using |.
x <- c("Peter", 
       "Piper", 
       "picked", 
       "a", 
       "peck",
       "of", 
       "pickled",
       "peppers!")

str_subset(x, pattern = "p(e|i)ck")
[1] "picked"  "peck"    "pickled"


This matches strings that contain either “peck” or “pick”.

Character Classes: []

Character classes let you specify multiple possible characters to match on.

x <- c("Peter", 
       "Piper", 
       "picked", 
       "a",
       "peck",
       "of",
       "pickled",
       "peppers!")

str_subset(x, pattern = "p[ei]ck")
[1] "picked"  "peck"    "pickled"

Why use [] instead of ()?

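() is better for grouping sequences of characters; wrap the alternatives in ( ) when you use | so the “or” applies only to that part of the pattern.

[] is better for listing single characters to choose from; ^ means “not” only when it is the first character inside [ ].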
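
The difference matters as soon as the alternatives are longer than one character. A sketch using the same vector, assuming stringr is loaded:

```r
library(stringr)

x <- c("Peter",
       "Piper",
       "picked",
       "a",
       "peck",
       "of",
       "pickled",
       "peppers!")

# Group: "p" followed by the sequence "ick" OR the sequence "epp"
group_match <- str_subset(x, pattern = "p(ick|epp)")

# Class: "p" followed by ONE character out of i, c, k, e, p
class_match <- str_subset(x, pattern = "p[ickep]")
```

The class version is much looser: it also keeps "Piper" and "peck", because each contains a lowercase "p" followed by an "e".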

Matches you don’t want

[^ ] – specifies characters not to match on (think except)

str_subset(x, pattern = "p[^i]ck")
[1] "peck"


str_subset(x, pattern = "^p")
[1] "picked"   "peck"     "pickled"  "peppers!"


str_subset(x, pattern = "^[^p]")
[1] "Peter" "Piper" "a"     "of"   

Character Classes: []

[ - ] – specifies a range of characters.

x <- c("Peter", 
       "Piper", 
       "picked", 
       "a",
       "peck",
       "of",
       "pickled",
       "peppers!")

str_subset(x, pattern = "p[ei]ck[a-z]")
[1] "picked"  "pickled"
  • [A-Z] matches any capital letter.
  • [a-z] matches any lowercase letter.
  • [A-Za-z] or [:alpha:] matches any letter ([A-z] also matches the symbols that sit between Z and a, such as _ and ^)
  • [0-9] or [:digit:] matches any digit
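
Ranges follow the order of the underlying character codes, so a single range that spans both cases, such as [A-z], also picks up the symbols that fall between "Z" and "a". A small check with made-up test characters, assuming stringr is loaded:

```r
library(stringr)

# Hypothetical test characters: two letters, two symbols, one digit
chars <- c("A", "z", "_", "^", "5")

# One range across both cases also matches "_" and "^"
loose <- str_detect(chars, pattern = "[A-z]")

# Two separate ranges match letters only
strict <- str_detect(chars, pattern = "[A-Za-z]")

# The POSIX-style class, written inside a character class
posix <- str_detect(chars, pattern = "[[:alpha:]]")
```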

Shortcuts

  • \\w – matches any “word” character (\\W matches any non-“word” character)

    • A “word” character is a letter, a digit, or an underscore.
  • \\d – matches any digit (\\D matches not digit)

  • \\s – matches any whitespace (\\S matches not whitespace)

    • Whitespace includes spaces, tabs, newlines, etc.


x <- "phone number: 1234567899"

str_extract(x, pattern = "\\d+")
[1] "1234567899"
str_extract_all(x, pattern = "\\S+")
[[1]]
[1] "phone"      "number:"    "1234567899"
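
The uppercase versions are handy for cleaning. The sketch below uses a made-up phone number with dashes and assumes stringr is loaded.

```r
library(stringr)

# Hypothetical input with punctuation mixed into the number
phone <- "phone number: 123-456-7899"

# \\D is "not a digit": removing it keeps only the digits
digits_only <- str_remove_all(phone, pattern = "\\D")

# \\w+ pulls out runs of word characters, dropping ":" and "-"
word_runs <- str_extract_all(phone, pattern = "\\w+")[[1]]

# \\s matches each whitespace character
no_spaces <- str_replace_all(phone, pattern = "\\s", replacement = "_")
```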

Try it out!

What regular expressions would match words that…

  • end with a vowel?
  • start with x, y, or z?
  • do not contain x, y, or z?
  • contain British spelling?
x <- c("zebra", 
       "xray", 
       "apple", 
       "yellow",
       "color", 
       "colour",
       "summarize",
       "summarise")

Some Possible Solutions…

  • end with a vowel?
str_subset(x, "[aeiouy]$")
  • start with x, y, or z?
str_subset(x, "^[xyz]")
  • do not contain x, y, or z?
str_subset(x, "^[^xyz]+$")
  • contain British spelling?
str_subset(x, "(our)|(ise)")
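
"Does not contain" questions are easy to get subtly wrong, so it helps to compare candidate patterns on the example vector. In this sketch (assuming stringr 1.4.0 or later for the negate argument), a negated class on its own only asks for at least one character that is not x, y, or z, which every word here has.

```r
library(stringr)

x <- c("zebra", "xray", "apple", "yellow",
       "color", "colour", "summarize", "summarise")

# At least one character that is not x, y, or z: keeps every word
too_loose <- str_subset(x, pattern = "[^xyz]")

# Every character, from start to end, is not x, y, or z
anchored <- str_subset(x, pattern = "^[^xyz]+$")

# Alternative: find the words that DO contain x, y, or z, then flip
negated <- str_subset(x, pattern = "[xyz]", negate = TRUE)
```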

Escape: \\

To match a special character, you need to escape it.

x <- c("How",
       "much", 
       "wood",
       "could",
       "a",
       "woodchuck",
       "chuck",
       "if",
       "a",
       "woodchuck",
       "could",
       "chuck",
       "wood?")

str_subset(x, pattern = "?")
Error in stri_subset_regex(string, pattern, omit_na = TRUE, negate = negate, : Syntax error in regex pattern. (U_REGEX_RULE_SYNTAX, context=`?`)

Escape: \\

Use \\ to escape the ? – it is now read as a normal character.

str_subset(x, pattern = "\\?")
[1] "wood?"


Note

Alternatively, you could use []:

str_subset(x, pattern = "[?]")
[1] "wood?"
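
Two related points, sketched with a few made-up strings and assuming stringr is loaded. First, the doubled backslash is an R string rule: "\\?" is the two-character regex \?. Second, wrapping a pattern in fixed() tells stringr to treat it as plain text, so nothing needs escaping.

```r
library(stringr)

# Hypothetical strings containing special characters
z <- c("could", "wood?", "a.b", "axb")

# "\\?" in R source is two characters long: a backslash and a ?
n_chars <- str_length("\\?")

# fixed() matches the pattern literally, with no regex rules
literal_question <- str_subset(z, pattern = fixed("?"))

# Unescaped . is the wild card; escaped \\. is a literal dot
any_character <- str_subset(z, pattern = "a.b")
literal_dot   <- str_subset(z, pattern = "a\\.b")
```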

When in Doubt


Use the web app to test R regular expressions.

Tips for working with regex

  • Read the regular expressions out loud like a request.
  • Test out your expressions on small examples first.

str_view()

str_view(c("shes", "shels", "shells", "shellls", "shelllls"), "l+")
[2] │ she<l>s
[3] │ she<ll>s
[4] │ she<lll>s
[5] │ she<llll>s
  • Be kind to yourself!

PA 5.1: Scrambled Message

In this activity, you will use functions from the stringr package and regex to decode a message.

A pile of tiles from the game of Scrabble.

This activity will require knowledge of:

  • indexing vectors
  • stringr functions for previewing string contents
  • regular expressions for locating patterns
  • stringr functions for removing whitespace
  • stringr functions for truncating strings
  • stringr functions for replacing patterns
  • stringr functions for combining multiple strings

None of us have all these abilities. Each of us has some of these abilities.

A Refresher on Indexing Vectors with []

x <- c("She",
       "sells",
       "seashells",
       "by", 
       "the",
       "seashore!")
  • Grab elements out of a vector with indices.
x[c(1, 4, 5)]
[1] "She" "by"  "the"
  • Grab elements out of a vector with logicals.
x[c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)]
[1] "She" "by"  "the"
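
In practice the logical vector is rarely typed by hand; it usually comes from a condition. A short sketch, assuming stringr is loaded, that reaches the same three strings in two other ways:

```r
library(stringr)

x <- c("She",
       "sells",
       "seashells",
       "by",
       "the",
       "seashore!")

# A condition produces the TRUE / FALSE values for you
short_words <- x[str_length(x) <= 3]

# which() converts the logicals into positions
short_positions <- which(str_length(x) <= 3)

# Negative indices drop elements instead of keeping them
without_long <- x[-c(2, 3, 6)]
```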

Translating into stringr

x <- c("She",
       "sells",
       "seashells",
       "by", 
       "the",
       "seashore!")

Detect what strings have a certain pattern:

x[
  str_detect(x, 
             pattern = "ll")
  ]
[1] "sells"     "seashells"

Replace that pattern with a different pattern:

x <- str_replace_all(x, 
                     pattern = "ll", 
                     replacement = "zz")
x
[1] "She"       "sezzs"     "seashezzs" "by"        "the"       "seashore!"

stringr Resources

Every group should have a stringr cheatsheet!

On the Front:

  • Detecting matches (e.g., Does a string have a specific pattern?)
  • Subsetting strings (e.g., Extract strings with specific patterns!)
  • Managing lengths (e.g., How long are the strings? Removing whitespace!)
  • Mutating strings (e.g., Replace specific patterns!)
  • Join & Flatten (e.g., Collapsing multiple strings into a single string!)
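
The "managing lengths" functions have not appeared in the earlier examples, so here is a brief sketch of three of them on made-up strings, assuming stringr is loaded.

```r
library(stringr)

# Hypothetical strings with extra whitespace
messy <- c("  She ", "sells   seashells", "by the seashore!  ")

# str_trim() removes whitespace from the start and end only
trimmed <- str_trim(messy)

# str_squish() also collapses repeated spaces inside the string
squished <- str_squish(messy)

# str_trunc() shortens a string to a total width; by default the
# last three characters of that width are "..."
truncated       <- str_trunc("seashells", width = 6)
truncated_plain <- str_trunc("seashells", width = 6, ellipsis = "")
```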

Task Card

Every group should have a task card!

On the Front

  • the expectations of each role
  • the norms of collaborating

On the Back

  • stringr functions for different tasks you may encounter
  • Regular expressions for different tasks you may encounter
    • Matching patterns (e.g., [:punct:], \\w)
    • Special characters (e.g., ^, $)
    • Creating groups of characters (e.g., [Kk])
    • Repeated patterns (e.g., ?, +, {2})

Pair Programming Expectations

Developer

  • Reads prompt and ensures Coder understands what is being asked.
  • Types the code specified by the Coder into the Quarto document.
  • Runs the code provided by the Coder.
  • Works with Coder to debug the code.
  • Evaluates the output.
  • Works with Coder to write code comments.

Coder

  • Reads out instructions or prompts
  • Tells the Developer what to type.
  • Talks with Developer about their ideas.
  • Manages resources (e.g., cheatsheets, textbook, slides).
  • Works with Developer to debug the code.
  • Works with Developer to write code comments.

Getting Started

First, both of you will do the following:

  • Join your Practice Activity workspace in Posit Cloud
  • Log in to Posit Cloud
  • Open the PA 5: Decode Secret Message project
  • Open the PA-5-stringr.qmd file

Then, the partner who has the most pets starts as the Developer (typing and listening to instructions from the Coder)!

  • The Coder does not type.
    • The collaborative editing feature should allow you to track what is being typed.
  • The Developer only types what they are told to type.

External Resources

During the Practice Activity, you are not permitted to use Google, ChatGPT, or websites for regular expressions for help.


You are permitted to use:

  • the stringr cheatsheet,
  • the task card,
  • the course textbook, and
  • the course slides.

Submission

Submit the name of the movie the quote is from.

  • Each person will input the full name of the movie the scrambled message is from into the PA5 quiz.
  • The person who last occupied the role of Developer will download and submit the PA-5.html file for the group.
    • Only one submission per group!

Exit Ticket

To do…

  • PA 5.1: Scrambled Message
    • Due Thursday, October 24 at 12:10pm
  • Check-in 5.2: Functions from lubridate
    • Due Thursday, October 24 at 12:10pm