Using `stringr` to Work with Strings

Monday, April 29

Today we will…

New layout this week
What you can expect in Week 6
New material
- String variables
- Functions for working with strings
- Regular expressions
PA 5.1: Scrambled Message

Week 5 Layout

Today: Strings with stringr
- Practice Activity: Decoding a Message

Thursday: Dates with lubridate
- Practice Activity: Jewel Heist

Lab Assignment: Solving a Murder Mystery
- Using dplyr + stringr + lubridate

Week 6 Layout

Tuesday: Writing Basic Functions
- Practice Activity

Thursday: Midterm Portfolio Work Session
- Midterm Portfolios Due Sunday, November 3

String Variables

What is a string?

A string is a sequence of characters.

There is a difference between…

…a string (many characters, one object)…

and

…a character vector (vector of strings).

my_string <- "Hi, my name is Bond!"
my_string

[1] "Hi, my name is Bond!"

my_vector <- c("Hi", "my", "name", "is", "Bond")
my_vector

[1] "Hi"   "my"   "name" "is"   "Bond"

`stringr`

Common tasks

Identify strings containing a particular pattern.
Remove or replace a pattern.
Edit a string (e.g., make it lowercase).

Note

The stringr package loads with tidyverse.
All functions are of the form str_xxx().

`pattern =`

The pattern argument appears in many stringr functions.

The pattern must be supplied inside quotes.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_detect(my_vector, pattern = "Bond")
str_locate(my_vector, pattern = "James Bond")
str_match(my_vector, pattern = "[bB]ond")
str_extract(my_vector, pattern = "[jJ]ames [bB]ond")

Let’s explore these functions!

`str_detect()`

Returns a logical vector indicating whether the pattern was found in each element of the supplied vector.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")
str_detect(my_vector, pattern = "Bond")

[1] FALSE FALSE  TRUE  TRUE

Pairs well with filter().
Works with summarise() + sum() (to count matches) or mean() (to get the proportion of matches).
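A minimal sketch of both pairings, assuming dplyr is loaded alongside stringr. The tiny `agents` tibble is hypothetical example data made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical example data: a tiny tibble with one string column
agents <- tibble(name = c("Hello,", "my name is", "Bond", "James Bond"))

# filter() keeps the rows where str_detect() returns TRUE
bond_rows <- agents |>
  filter(str_detect(name, pattern = "Bond"))

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
# and mean() gives the proportion of matches
bond_summary <- agents |>
  summarise(n_bond    = sum(str_detect(name, pattern = "Bond")),
            prop_bond = mean(str_detect(name, pattern = "Bond")))
```

Here bond_rows keeps "Bond" and "James Bond", and bond_summary reports 2 matches, a proportion of 0.5.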

Related Function

str_which() returns the indexes of the strings that contain a match.
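A quick sketch comparing str_which() with the base R version built from str_detect():

```r
library(stringr)

my_vector <- c("Hello,", "my name is", "Bond", "James Bond")

# Indexes of the elements that contain "Bond"
bond_index <- str_which(my_vector, pattern = "Bond")
bond_index
#> [1] 3 4

# Same result by wrapping str_detect() in base R's which()
base_index <- which(str_detect(my_vector, pattern = "Bond"))
```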

`str_match()`

Returns a character matrix. The first column contains the full match (or NA if the pattern was not found); any additional columns contain the pieces captured by ( ) groups in the pattern.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_match(my_vector, pattern = "Bond")

     [,1]  
[1,] NA    
[2,] NA    
[3,] "Bond"
[4,] "Bond"
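Where str_match() really differs from str_extract() is capture groups: each pair of parentheses adds a column to the matrix. A small sketch (the name-splitting pattern is just an illustration):

```r
library(stringr)

agents <- c("James Bond", "Hello,")

# Column 1: full match; column 2: first ( ) group; column 3: second ( ) group
name_parts <- str_match(agents, pattern = "(\\w+) (\\w+)")

name_parts[1, ]  # "James Bond" "James" "Bond"
name_parts[2, ]  # NA NA NA, since "Hello," has no space to match
```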

`str_extract()`

Returns a character vector with either NA or the matched text, depending on whether the pattern was found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_extract(my_vector, pattern = "Bond")

[1] NA     NA     "Bond" "Bond"

Warning

str_extract() only returns the first pattern match.

Use str_extract_all() to return every pattern match.

What do you mean by the first match?

Suppose we had a slightly different vector…

alt_vector <- c("Hello,", 
               "my name is", 
               "Bond, James Bond")

If we were to extract every instance of "Bond" from the vector…

str_extract(alt_vector, 
            pattern = "Bond")

[1] NA     NA     "Bond"

str_extract_all(alt_vector, 
                pattern = "Bond")

[[1]]
character(0)

[[2]]
character(0)

[[3]]
[1] "Bond" "Bond"

`str_locate()`

Returns an integer matrix with two columns, start and end, giving the starting and ending location of the pattern. The values are NA if the pattern is not found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_locate(my_vector, pattern = "Bond")

     start end
[1,]    NA  NA
[2,]    NA  NA
[3,]     1   4
[4,]     7  10

Related Function

str_sub() extracts values based on a starting and ending location.
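A short sketch of str_sub() on its own and fed by str_locate():

```r
library(stringr)

spy <- "James Bond"

# Characters 7 through 10
by_position <- str_sub(spy, start = 7, end = 10)

# Negative positions count back from the end of the string
from_end <- str_sub(spy, start = -4)

# str_locate() finds the positions, str_sub() pulls them out
loc <- str_locate(spy, pattern = "Bond")
by_locate <- str_sub(spy, start = loc[, "start"], end = loc[, "end"])
```

All three return "Bond".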

`str_subset()`

Returns a character vector containing a subset of the original character vector consisting of the elements where the pattern was found.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_subset(my_vector, pattern = "Bond")

[1] "Bond"       "James Bond"

Try it out!

my_vector <- c("I scream,", 
               "you scream", 
               "we all",
               "scream",
               "for",
               "ice cream")

str_detect(my_vector, pattern = "cream")
str_locate(my_vector, pattern = "cream")
str_match(my_vector, pattern = "cream")
str_extract(my_vector, pattern = "cream")
str_subset(my_vector, pattern = "cream")

Note

For each of these functions, write down:

the object structure of the output.
the data type of the output.
a brief explanation of what they do.

Replace / Remove Patterns

str_replace() replaces the first matched pattern in each string.

Pairs well with mutate().

my_vector <- c("Hello,", "my name is", "Bond", "James Bond")
str_replace(my_vector, 
            pattern = "Bond", 
            replacement = "Franco")

[1] "Hello,"       "my name is"   "Franco"       "James Franco"

Related Function

str_replace_all() replaces all matched patterns in each string.
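The difference only shows up when a string contains the pattern more than once, as in the "Bond, James Bond" example from earlier:

```r
library(stringr)

alt <- "Bond, James Bond"

# Only the first "Bond" is replaced
first_only <- str_replace(alt, pattern = "Bond", replacement = "Franco")

# Every "Bond" is replaced
every_match <- str_replace_all(alt, pattern = "Bond", replacement = "Franco")

first_only
#> [1] "Franco, James Bond"
every_match
#> [1] "Franco, James Franco"
```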

str_remove() removes the first matched pattern in each string.

str_remove(my_vector, 
           pattern = "Bond")

[1] "Hello,"     "my name is" ""           "James "

Related Functions

This is a special case of str_replace(x, pattern, replacement = "").

str_remove_all() removes all matched patterns in each string.
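The same first-versus-all distinction, using the "Bond, James Bond" string again:

```r
library(stringr)

alt <- "Bond, James Bond"

# Removes only the first "Bond"
removed_first <- str_remove(alt, pattern = "Bond")

# Removes every "Bond"; the leftover spaces and comma stay behind
removed_all <- str_remove_all(alt, pattern = "Bond")

removed_first
#> [1] ", James Bond"
removed_all
#> [1] ", James "
```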

Edit Strings

Convert letters in a string to a specific capitalization format.

lower
UPPER
Title

str_to_lower() converts all letters in a string to lowercase.

str_to_lower(my_vector)

[1] "hello,"     "my name is" "bond"       "james bond"

str_to_upper() converts all letters in a string to uppercase.

str_to_upper(my_vector)

[1] "HELLO,"     "MY NAME IS" "BOND"       "JAMES BOND"

str_to_title() converts the first letter of each word to uppercase.

str_to_title(my_vector)

[1] "Hello,"     "My Name Is" "Bond"       "James Bond"

This is handy for axis labels!
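For example, a sketch assuming you have snake_case column names (the names below are hypothetical) that you want to show as axis labels:

```r
library(stringr)

# Hypothetical snake_case column names
col_names <- c("total_sugar", "fiber", "calories_per_cup")

# Swap underscores for spaces, then capitalize the first letter of each word
axis_labels <- str_to_title(str_replace_all(col_names, pattern = "_", replacement = " "))

# axis_labels is "Total Sugar", "Fiber", "Calories Per Cup"
```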

Combine Strings

str_c()
str_flatten()
str_glue()

str_c() joins multiple strings into a single character vector.

prompt <- "Hello, my name is"
first  <- "James"
last   <- "Bond"
str_c(prompt, last, ",", first, last, sep = " ")

[1] "Hello, my name is Bond , James Bond"

Note

Similar to paste() and paste0().
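Two details worth knowing: str_c() is vectorized, and unlike paste() it keeps a missing value missing. A small sketch with a hypothetical vector of agent codes:

```r
library(stringr)

# Hypothetical agent codes, one of them missing
codes <- c("007", "006", NA)

# Each element of codes is joined with "agent"; the NA stays NA
with_str_c <- str_c("agent", codes, sep = "_")
# "agent_007" "agent_006" NA

# paste() turns the missing value into the text "NA" instead
with_paste <- paste("agent", codes, sep = "_")
# "agent_007" "agent_006" "agent_NA"
```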

str_flatten() combines a vector of strings into a single string.

my_vector <- c("Hello,", 
               "my name is", 
               "Bond", 
               "James Bond")

str_flatten(my_vector, collapse = " ")

[1] "Hello, my name is Bond James Bond"

str_glue() uses variables in the environment to build a string, filling in each {expression} inside the braces.

first <- "James"
last <- "Bond"
str_glue("My name is {last}, {first} {last}")

My name is Bond, James Bond

Tip

For more details, I would recommend looking up the glue R package!
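The braces accept any R expression, not just a variable name. A quick sketch:

```r
library(stringr)

first <- "James"
last  <- "Bond"

# nchar(last) is evaluated and its result is pasted into the string
intro <- str_glue("{first} {last} has {nchar(last)} letters in his last name")
intro
#> James Bond has 4 letters in his last name
```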

Tips for String Success

Refer to the stringr cheatsheet
Remember that str_xxx functions need the first argument to be a vector of strings, not a dataset!
- You will use these functions inside dplyr verbs like filter() or mutate().

cereal |> 
  mutate(is_bran = str_detect(name, "Bran"), 
         .after = name)

name	is_bran	manuf	type	calories	protein	fat	sodium	fiber	carbo	sugars	potass	vitamins	shelf	weight	cups	rating
100% Bran	TRUE	N	cold	70	4	1	130	10.0	5.0	6	280	25	3	1.00	0.33	68.40297
100% Natural Bran	TRUE	Q	cold	120	3	5	15	2.0	8.0	8	135	0	3	1.00	1.00	33.98368
All-Bran	TRUE	K	cold	70	4	1	260	9.0	7.0	5	320	25	3	1.00	0.33	59.42551
All-Bran with Extra Fiber	TRUE	K	cold	50	4	0	140	14.0	8.0	0	330	25	3	1.00	0.50	93.70491
Almond Delight	FALSE	R	cold	110	2	2	200	1.0	14.0	8	-1	25	3	1.00	0.75	34.38484
Apple Cinnamon Cheerios	FALSE	G	cold	110	2	2	180	1.5	10.5	10	70	25	1	1.00	0.75	29.50954
Apple Jacks	FALSE	K	cold	110	2	0	125	1.0	11.0	14	30	25	2	1.00	1.00	33.17409
Basic 4	FALSE	G	cold	130	3	2	210	2.0	18.0	8	100	25	3	1.33	0.75	37.03856
Bran Chex	TRUE	R	cold	90	2	1	200	4.0	15.0	6	125	25	1	1.00	0.67	49.12025
Bran Flakes	TRUE	P	cold	90	3	0	210	5.0	13.0	5	190	25	3	1.00	0.67	53.31381
Cap'n'Crunch	FALSE	Q	cold	120	1	2	220	0.0	12.0	12	35	25	2	1.00	0.75	18.04285
Cheerios	FALSE	G	cold	110	6	2	290	2.0	17.0	1	105	25	1	1.00	1.25	50.76500
Cinnamon Toast Crunch	FALSE	G	cold	120	1	3	210	0.0	13.0	9	45	25	2	1.00	0.75	19.82357
Clusters	FALSE	G	cold	110	3	2	140	2.0	13.0	7	105	25	3	1.00	0.50	40.40021
Cocoa Puffs	FALSE	G	cold	110	1	1	180	0.0	12.0	13	55	25	2	1.00	1.00	22.73645
Corn Chex	FALSE	R	cold	110	2	0	280	0.0	22.0	3	25	25	1	1.00	1.00	41.44502
Corn Flakes	FALSE	K	cold	100	2	0	290	1.0	21.0	2	35	25	1	1.00	1.00	45.86332
Corn Pops	FALSE	K	cold	110	1	0	90	1.0	13.0	12	20	25	2	1.00	1.00	35.78279
Count Chocula	FALSE	G	cold	110	1	1	180	0.0	12.0	13	65	25	2	1.00	1.00	22.39651
Cracklin' Oat Bran	TRUE	K	cold	110	3	3	140	4.0	10.0	7	160	25	3	1.00	0.50	40.44877
Cream of Wheat (Quick)	FALSE	N	hot	100	3	0	80	1.0	21.0	0	-1	0	2	1.00	1.00	64.53382
Crispix	FALSE	K	cold	110	2	0	220	1.0	21.0	3	30	25	3	1.00	1.00	46.89564
Crispy Wheat & Raisins	FALSE	G	cold	100	2	1	140	2.0	11.0	10	120	25	3	1.00	0.75	36.17620
Double Chex	FALSE	R	cold	100	2	0	190	1.0	18.0	5	80	25	3	1.00	0.75	44.33086
Froot Loops	FALSE	K	cold	110	2	1	125	1.0	11.0	13	30	25	2	1.00	1.00	32.20758
Frosted Flakes	FALSE	K	cold	110	1	0	200	1.0	14.0	11	25	25	1	1.00	0.75	31.43597
Frosted Mini-Wheats	FALSE	K	cold	100	3	0	0	3.0	14.0	7	100	25	2	1.00	0.80	58.34514
Fruit & Fibre Dates; Walnuts; and Oats	FALSE	P	cold	120	3	2	160	5.0	12.0	10	200	25	3	1.25	0.67	40.91705
Fruitful Bran	TRUE	K	cold	120	3	0	240	5.0	14.0	12	190	25	3	1.33	0.67	41.01549
Fruity Pebbles	FALSE	P	cold	110	1	1	135	0.0	13.0	12	25	25	2	1.00	0.75	28.02576
Golden Crisp	FALSE	P	cold	100	2	0	45	0.0	11.0	15	40	25	1	1.00	0.88	35.25244
Golden Grahams	FALSE	G	cold	110	1	1	280	0.0	15.0	9	45	25	2	1.00	0.75	23.80404
Grape Nuts Flakes	FALSE	P	cold	100	3	1	140	3.0	15.0	5	85	25	3	1.00	0.88	52.07690
Grape-Nuts	FALSE	P	cold	110	3	0	170	3.0	17.0	3	90	25	3	1.00	0.25	53.37101
Great Grains Pecan	FALSE	P	cold	120	3	3	75	3.0	13.0	4	100	25	3	1.00	0.33	45.81172
Honey Graham Ohs	FALSE	Q	cold	120	1	2	220	1.0	12.0	11	45	25	2	1.00	1.00	21.87129
Honey Nut Cheerios	FALSE	G	cold	110	3	1	250	1.5	11.5	10	90	25	1	1.00	0.75	31.07222
Honey-comb	FALSE	P	cold	110	1	0	180	0.0	14.0	11	35	25	1	1.00	1.33	28.74241
Just Right Crunchy Nuggets	FALSE	K	cold	110	2	1	170	1.0	17.0	6	60	100	3	1.00	1.00	36.52368
Just Right Fruit & Nut	FALSE	K	cold	140	3	1	170	2.0	20.0	9	95	100	3	1.30	0.75	36.47151
Kix	FALSE	G	cold	110	2	1	260	0.0	21.0	3	40	25	2	1.00	1.50	39.24111
Life	FALSE	Q	cold	100	4	2	150	2.0	12.0	6	95	25	2	1.00	0.67	45.32807
Lucky Charms	FALSE	G	cold	110	2	1	180	0.0	12.0	12	55	25	2	1.00	1.00	26.73451
Maypo	FALSE	A	hot	100	4	1	0	0.0	16.0	3	95	25	2	1.00	1.00	54.85092
Muesli Raisins; Dates; & Almonds	FALSE	R	cold	150	4	3	95	3.0	16.0	11	170	25	3	1.00	1.00	37.13686
Muesli Raisins; Peaches; & Pecans	FALSE	R	cold	150	4	3	150	3.0	16.0	11	170	25	3	1.00	1.00	34.13976
Mueslix Crispy Blend	FALSE	K	cold	160	3	2	150	3.0	17.0	13	160	25	3	1.50	0.67	30.31335
Multi-Grain Cheerios	FALSE	G	cold	100	2	1	220	2.0	15.0	6	90	25	1	1.00	1.00	40.10596
Nut&Honey Crunch	FALSE	K	cold	120	2	1	190	0.0	15.0	9	40	25	2	1.00	0.67	29.92429
Nutri-Grain Almond-Raisin	FALSE	K	cold	140	3	2	220	3.0	21.0	7	130	25	3	1.33	0.67	40.69232
Nutri-grain Wheat	FALSE	K	cold	90	3	0	170	3.0	18.0	2	90	25	3	1.00	1.00	59.64284
Oatmeal Raisin Crisp	FALSE	G	cold	130	3	2	170	1.5	13.5	10	120	25	3	1.25	0.50	30.45084
Post Nat. Raisin Bran	TRUE	P	cold	120	3	1	200	6.0	11.0	14	260	25	3	1.33	0.67	37.84059
Product 19	FALSE	K	cold	100	3	0	320	1.0	20.0	3	45	100	3	1.00	1.00	41.50354
Puffed Rice	FALSE	Q	cold	50	1	0	0	0.0	13.0	0	15	0	3	0.50	1.00	60.75611
Puffed Wheat	FALSE	Q	cold	50	2	0	0	1.0	10.0	0	50	0	3	0.50	1.00	63.00565
Quaker Oat Squares	FALSE	Q	cold	100	4	1	135	2.0	14.0	6	110	25	3	1.00	0.50	49.51187
Quaker Oatmeal	FALSE	Q	hot	100	5	2	0	2.7	-1.0	-1	110	0	1	1.00	0.67	50.82839
Raisin Bran	TRUE	K	cold	120	3	1	210	5.0	14.0	12	240	25	2	1.33	0.75	39.25920
Raisin Nut Bran	TRUE	G	cold	100	3	2	140	2.5	10.5	8	140	25	3	1.00	0.50	39.70340
Raisin Squares	FALSE	K	cold	90	2	0	0	2.0	15.0	6	110	25	3	1.00	0.50	55.33314
Rice Chex	FALSE	R	cold	110	1	0	240	0.0	23.0	2	30	25	1	1.00	1.13	41.99893
Rice Krispies	FALSE	K	cold	110	2	0	290	0.0	22.0	3	35	25	1	1.00	1.00	40.56016
Shredded Wheat	FALSE	N	cold	80	2	0	0	3.0	16.0	0	95	0	1	0.83	1.00	68.23588
Shredded Wheat 'n'Bran	TRUE	N	cold	90	3	0	0	4.0	19.0	0	140	0	1	1.00	0.67	74.47295
Shredded Wheat spoon size	FALSE	N	cold	90	3	0	0	3.0	20.0	0	120	0	1	1.00	0.67	72.80179
Smacks	FALSE	K	cold	110	2	1	70	1.0	9.0	15	40	25	2	1.00	0.75	31.23005
Special K	FALSE	K	cold	110	6	0	230	1.0	16.0	3	55	25	1	1.00	1.00	53.13132
Strawberry Fruit Wheats	FALSE	N	cold	90	2	0	15	3.0	15.0	5	90	25	2	1.00	1.00	59.36399
Total Corn Flakes	FALSE	G	cold	110	2	1	200	0.0	21.0	3	35	100	3	1.00	1.00	38.83975
Total Raisin Bran	TRUE	G	cold	140	3	1	190	4.0	15.0	14	230	100	3	1.50	1.00	28.59278
Total Whole Grain	FALSE	G	cold	100	3	1	200	3.0	16.0	3	110	100	3	1.00	1.00	46.65884
Triples	FALSE	G	cold	110	2	1	250	0.0	21.0	3	60	25	3	1.00	0.75	39.10617
Trix	FALSE	G	cold	110	1	1	140	0.0	13.0	12	25	25	2	1.00	1.00	27.75330
Wheat Chex	FALSE	R	cold	100	3	1	230	3.0	17.0	3	115	25	1	1.00	0.67	49.78744
Wheaties	FALSE	G	cold	100	3	1	200	3.0	17.0	3	110	25	1	1.00	1.00	51.59219
Wheaties Honey Gold	FALSE	G	cold	110	2	1	200	1.0	16.0	8	60	25	1	1.00	0.75	36.18756

Tips for String Success

The real power of these str_xxx functions comes when you specify the pattern using regular expressions!

regex

Regular Expressions

“Regexps are a very terse language that allow you to describe patterns in strings.”

R for Data Science

Use str_xxx functions + regular expressions!

str_detect(string  = my_string_vector,
           pattern = "p[ei]ck[a-z]")

Tip

You might encounter gsub(), grep(), etc. from Base R, but I would highly recommend using functions from the stringr package instead.

Regular Expressions

…are tricky!

There are lots of new symbols to keep straight.
There are a lot of cases to think through.

This web app for testing R regular expressions might be handy!

Special Characters

There is a set of characters that have a specific meaning when using regex.

The stringr package does not read these as normal characters.
These characters are:

. ^ $ \ | * + ? { } [ ] ( )

Wild Card Character: `.`

This character matches any single character (except a newline).

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = ".ells")

[1] "sells"     "seashells"

This matches strings that contain any character followed by “ells”.
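Note that . has to match exactly one character, so "ells" on its own does not match:

```r
library(stringr)

# "ells" has nothing in front of it for . to match
wild <- str_detect(c("ells", "sells", "shells"), pattern = ".ells")
wild
#> [1] FALSE  TRUE  TRUE
```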

Anchor Characters: `^ $`

^ – looks at the beginning of a string.

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = "^s")

[1] "sells"     "seashells" "seashore!"

This matches strings that start with “s”.

$ – looks at the end of a string.

x <- c("She", 
       "sells", 
       "seashells", 
       "by", 
       "the", 
       "seashore!")

str_subset(x, pattern = "s$")

[1] "sells"     "seashells"

This matches strings that end with “s”.
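The two anchors can be combined. A sketch: using both ^ and $ around a word forces an exact match of the whole string, and two str_detect() calls joined with & check both ends at once.

```r
library(stringr)

x <- c("She", "sells", "seashells", "by", "the", "seashore!")

# Starts with "s" AND ends with "s" (case-sensitive, so "She" is out)
starts_and_ends <- x[str_detect(x, pattern = "^s") & str_detect(x, pattern = "s$")]

# ^ and $ together: the whole string must be exactly "the"
exact <- str_subset(x, pattern = "^the$")
```

starts_and_ends contains "sells" and "seashells"; exact contains only "the".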

Quantifier Characters: `? + *`

? – matches when the preceding character occurs 0 or 1 times in a row.

x <- c("shes", 
       "shels", 
       "shells", 
       "shellls", 
       "shelllls")

str_subset(x, pattern = "shel?s")

[1] "shes"  "shels"

+ – occurs 1 or more times in a row.

str_subset(x, pattern = "shel+s")

[1] "shels"    "shells"   "shellls"  "shelllls"

* – occurs 0 or more times in a row.

str_subset(x, pattern = "shel*s")

[1] "shes"     "shels"    "shells"   "shellls"  "shelllls"

Quantifier Characters: `{}`

{n} – matches when the preceding character occurs exactly n times in a row.

x <- c("shes", 
       "shels", 
       "shells", 
       "shellls", 
       "shelllls")

str_subset(x, pattern = "shel{2}s")

[1] "shells"

{n,} – occurs at least n times in a row.

str_subset(x, pattern = "shel{2,}s")

[1] "shells"   "shellls"  "shelllls"

{n,m} – occurs between n and m times in a row.

str_subset(x, pattern = "shel{1,3}s")

[1] "shels"   "shells"  "shellls"

Character Groups: `()`

Groups are created with ( ).

We can specify “either” / “or” within a group using |.

x <- c("Peter", 
       "Piper", 
       "picked", 
       "a", 
       "peck",
       "of", 
       "pickled",
       "peppers!")

str_subset(x, pattern = "p(e|i)ck")

[1] "picked"  "peck"    "pickled"

This matches strings that contain either “peck” or “pick”.
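Groups also remember what they matched. In a stringr replacement, \\1 refers to the first group, \\2 to the second, and so on, which lets you rearrange text:

```r
library(stringr)

# Group 1 captures the last name, group 2 the first name; swap them
swapped <- str_replace("Bond, James",
                       pattern = "(\\w+), (\\w+)",
                       replacement = "\\2 \\1")
swapped
#> [1] "James Bond"
```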

Character Classes: `[]`

Character classes let you specify multiple possible characters to match on.

x <- c("Peter", 
       "Piper", 
       "picked", 
       "a",
       "peck",
       "of",
       "pickled",
       "peppers!")

str_subset(x, pattern = "p[ei]ck")

[1] "picked"  "peck"    "pickled"

Why use [] instead of ()?

() is better for grouping a sequence of characters (e.g., a whole word), and it is where you list alternatives separated by |.

[] is better for matching any one of several single characters, plus a ^ placed first inside [] means "not these characters"…

Matches you don’t want

[^ ] – specifies characters not to match on (think except)

str_subset(x, pattern = "p[^i]ck")

[1] "peck"

str_subset(x, pattern = "^p")

[1] "picked"   "peck"     "pickled"  "peppers!"

str_subset(x, pattern = "^[^p]")

[1] "Peter" "Piper" "a"     "of"

Character Classes: `[]`

[ - ] – specifies a range of characters.

x <- c("Peter", 
       "Piper", 
       "picked", 
       "a",
       "peck",
       "of",
       "pickled",
       "peppers!")

str_subset(x, pattern = "p[ei]ck[a-z]")

[1] "picked"  "pickled"

[A-Z] matches any capital letter.
[a-z] matches any lowercase letter.
[A-Za-z] or [:alpha:] matches any letter ([A-z] also catches a few punctuation characters that sit between Z and a)
[0-9] or [:digit:] matches any digit
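Ranges follow character codes, and in ASCII the characters [ \ ] ^ _ and ` sit between "Z" and "a". A quick check of why [A-Za-z] is the safer way to say "any letter":

```r
library(stringr)

# "_" falls inside the A-to-z range, so [A-z] matches it
loose  <- str_detect("_", pattern = "[A-z]")

# Two separate ranges cover only letters
strict <- str_detect("_", pattern = "[A-Za-z]")

loose   # TRUE
strict  # FALSE
```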

Shortcuts

\\w – matches any “word” character (\\W matches any non-word character)
- Word characters are letters, digits, and the underscore.
\\d – matches any digit (\\D matches any non-digit)
\\s – matches any whitespace (\\S matches any non-whitespace)
- Whitespace includes spaces, tabs, newlines, etc.

x <- "phone number: 1234567899"

str_extract(x, pattern = "\\d+")

[1] "1234567899"

str_extract_all(x, pattern = "\\S+")

[[1]]
[1] "phone"      "number:"    "1234567899"
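Compare \\S+ with \\w+ on the same string: the colon is not whitespace, but it is also not a word character.

```r
library(stringr)

x <- "phone number: 1234567899"

# \\w+ stops at the ":" so it is dropped from the result
word_runs <- str_extract_all(x, pattern = "\\w+")[[1]]
word_runs
#> [1] "phone"      "number"     "1234567899"
```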

Try it out!

What regular expressions would match words that…

end with a vowel?
start with x, y, or z?
do not contain x, y, or z?
contain British spelling?

x <- c("zebra", 
       "xray", 
       "apple", 
       "yellow",
       "color", 
       "colour",
       "summarize",
       "summarise")

Some Possible Solutions…

end with a vowel?

str_subset(x, "[aeiouy]$")

start with x, y, or z?

str_subset(x, "^[xyz]")

do not contain x, y, or z?

str_subset(x, "^[^xyz]+$")

contain British spelling?

str_subset(x, "(our)|(ise)")
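A quick way to check answers like these. Note that a bare [^xyz] is satisfied by any single character other than x, y, or z, so nearly every word matches it; anchoring the class with ^ and +$ makes every character pass the check. Another option is the negate argument of str_subset(), which keeps the elements that do not match.

```r
library(stringr)

x <- c("zebra", "xray", "apple", "yellow",
       "color", "colour", "summarize", "summarise")

# Every character from start to end must be something other than x, y, z
no_xyz <- str_subset(x, pattern = "^[^xyz]+$")

# Same idea: find words that DO contain x, y, or z, then flip the result
no_xyz_negate <- str_subset(x, pattern = "[xyz]", negate = TRUE)

# British spellings: "our" or "ise"
british <- str_subset(x, pattern = "(our)|(ise)")
```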

Escape: `\\`

To match a special character, you need to escape it.

x <- c("How",
       "much", 
       "wood",
       "could",
       "a",
       "woodchuck",
       "chuck",
       "if",
       "a",
       "woodchuck",
       "could",
       "chuck",
       "wood?")

str_subset(x, pattern = "?")

Error in stri_subset_regex(string, pattern, omit_na = TRUE, negate = negate, : Syntax error in regex pattern. (U_REGEX_RULE_SYNTAX, context=`?`)

Escape: `\\`

Use \\ to escape the ? – it is now read as a normal character. (You type two backslashes because the R string "\\?" is stored as the regex \?.)

str_subset(x, pattern = "\\?")

[1] "wood?"

Note

Alternatively, you could use []:

str_subset(x, pattern = "[?]")

[1] "wood?"
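One more option is stringr's fixed(), which treats the whole pattern as literal text, so nothing needs escaping. The same escaping rule applies to the other special characters, such as a period:

```r
library(stringr)

x <- c("wood?", "wood", "w.od")

# fixed() matches the text exactly, special characters included
literal_q <- str_subset(x, pattern = fixed("?"))

# Unescaped . is a wildcard, so all three strings match
wildcard_dot <- str_subset(x, pattern = "w.od")

# Escaped \\. only matches an actual period
escaped_dot <- str_subset(x, pattern = "w\\.od")
```

literal_q is "wood?", wildcard_dot keeps all three strings, and escaped_dot keeps only "w.od".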

When in Doubt

Use the web app to test R regular expressions.

Tips for working with regex

Read the regular expressions out loud like a request.

Test out your expressions on small examples first.

str_view()

str_view(c("shes", "shels", "shells", "shellls", "shelllls"), "l+")

[2] │ she<l>s
[3] │ she<ll>s
[4] │ she<lll>s
[5] │ she<llll>s

Use the stringr cheatsheet.

Be kind to yourself!

PA 5.1: Scrambled Message

In this activity, you will use functions from the stringr package and regex to decode a message.

A pile of tiles from the game of Scrabble.

This activity will require knowledge of:

indexing vectors
stringr functions for previewing string contents
regular expressions for locating patterns
stringr functions for removing whitespace
stringr functions for truncating strings
stringr functions for replacing patterns
stringr functions for combining multiple strings

None of us have all these abilities. Each of us has some of these abilities.

A Refresher on Indexing Vectors with `[]`

x <- c("She",
       "sells",
       "seashells",
       "by", 
       "the",
       "seashore!")

Grab elements out of a vector with indices.

x[c(1, 4, 5)]

[1] "She" "by"  "the"

Grab elements out of a vector with logicals.

x[c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)]

[1] "She" "by"  "the"

Translating into `stringr`

x <- c("She",
       "sells",
       "seashells",
       "by", 
       "the",
       "seashore!")

Detect what strings have a certain pattern:

x[
  str_detect(x, 
             pattern = "ll")
  ]

[1] "sells"     "seashells"

Replace that pattern with a different pattern:

x <- str_replace_all(x, 
                     pattern = "ll", 
                     replacement = "zz")
x

[1] "She"       "sezzs"     "seashezzs" "by"        "the"       "seashore!"

stringr Resources

Every group should have a stringr cheatsheet!

On the Front:

Detecting matches (e.g., Does a string have a specific pattern?)
Subsetting strings (e.g., Extract strings with specific patterns!)
Managing lengths (e.g., How long are the strings? Removing whitespace!)
Mutating strings (e.g., Replace specific patterns!)
Join & Flatten (e.g., Collapsing multiple strings into a single string!)
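A small sketch of the "Managing lengths" functions from the cheatsheet, run on a hypothetical messy string:

```r
library(stringr)

# Hypothetical string with extra spaces inside and at both ends
messy <- "  Bond,   James   Bond  "

n_chars  <- str_length("Bond")   # number of characters: 4
trimmed  <- str_trim(messy)      # removes spaces at the start and end only
squished <- str_squish(messy)    # also collapses inner runs of spaces to one

# Cuts the string to 10 characters total, including the "..." it adds
truncated <- str_trunc("Hello, my name is Bond", width = 10)
```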

Task Card

Every group should have a task card!

On the Front

the expectations of each role
the norms of collaborating

On the Back

stringr functions for different tasks you may encounter
Regular expressions for different tasks you may encounter
- Matching patterns (e.g., [:punct:], \\w)
- Special characters (e.g., ^, $)
- Creating groups of characters (e.g., [Kk])
- Repeated patterns (e.g., ?, +, {2})

Pair Programming Expectations

Developer

Reads prompt and ensures Coder understands what is being asked.
Types the code specified by the Coder into the Quarto document.
Runs the code provided by the Coder.
Works with Coder to debug the code.
Evaluates the output.
Works with Coder to write code comments.

Coder

Reads out instructions or prompts.
Tells the Developer what to type.
Talks with Developer about their ideas.
Manages resources (e.g., cheatsheets, textbook, slides).
Works with Developer to debug the code.
Works with Developer to write code comments.

Getting Started

First, both of you will do the following:

Join your Practice Activity workspace in Posit Cloud
Log in to Posit Cloud
Open the PA 5: Decode Secret Message project
Open the PA-5-stringr.qmd file

Then, the partner who has the most pets starts as the Developer (typing and listening to instructions from the Coder)!

The Coder does not type.
- The collaborative editing feature should allow you to track what is being typed.
The Developer only types what they are told to type.

External Resources

During the Practice Activity, you are not permitted to use Google, ChatGPT, or regular expression websites for help.

You are permitted to use:

the stringr cheatsheet,
the task card,
the course textbook, and
the course slides.

Submission

Submit the name of the movie the quote is from.

Each person will input the full name of the movie the scrambled message is from into the PA5 quiz.
The person who last occupied the role of Developer will download and submit the PA-5.html file for the group.
- Only one submission per group!

Exit Ticket

To do…

PA 5.1: Scrambled Message
- Due Thursday, October 24 at 12:10pm
Check-in 5.2: Functions from lubridate
- Due Thursday, October 24 at 12:10pm
