my_string <- "Hi, my name is Bond!"
my_string
[1] "Hi, my name is Bond!"
stringr to Work with Strings
Today we will…
stringr
lubridate
dplyr
+ stringr
+ lubridate
A string is a bunch of characters.
There is a difference between…
…a string (many characters, one object)…
and
…a character vector (vector of strings).
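The distinction can be checked directly in R. This is a minimal sketch using only base R; the object name `my_vector` is made up for illustration.

```r
# A single string: one object holding many characters
my_string <- "Hi, my name is Bond!"
n_strings <- length(my_string)   # 1 (one object)
n_chars   <- nchar(my_string)    # 20 (characters in that object)

# A character vector: several strings stored together
my_vector <- c("Hi,", "my", "name", "is", "Bond!")
n_elements <- length(my_vector)  # 5 (five strings)
chars_each <- nchar(my_vector)   # 3 2 4 2 5 (characters per string)
```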
stringr
Common tasks
Note
The stringr package loads with tidyverse.
str_xxx()
pattern =
The pattern argument appears in many stringr functions.
Let’s explore these functions!
str_detect()
Returns a logical vector indicating whether the pattern was found in each element of the supplied vector.
Use with filter(), or with summarise() + sum (to get total matches) or mean (to get proportion of matches).
Related Function
str_which() returns the indexes of the strings that contain a match.
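A small sketch of str_detect() and str_which() on a made-up vector (the vector `my_vector` is an assumption, not data from the slides):

```r
library(stringr)

my_vector <- c("Hello,", "my", "name", "is", "Bond", "James Bond")

# One TRUE/FALSE per element: does it contain "Bond"?
found <- str_detect(my_vector, pattern = "Bond")
# FALSE FALSE FALSE FALSE TRUE TRUE

total_matches <- sum(found)    # 2 elements contain the pattern
prop_matches  <- mean(found)   # 2 / 6 of the elements contain it

# Positions of the matching elements rather than TRUE/FALSE
where <- str_which(my_vector, pattern = "Bond")   # 5 6
```

Because str_detect() returns a logical vector, summing it counts matches and averaging it gives the proportion of matches.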
str_match()
Returns a character matrix containing either NA or the matched text, depending on whether the pattern was found.
str_extract()
Returns a character vector with either NA or the matched text, depending on whether the pattern was found.
Warning
str_extract() only returns the first pattern match.
Use str_extract_all() to return every pattern match.
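The difference between these three functions is easiest to see side by side. A minimal sketch on an invented vector `x`:

```r
library(stringr)

x <- c("James Bond", "Bond, James Bond", "Moneypenny")

# Character matrix: one row per string, NA where nothing matched
m <- str_match(x, pattern = "Bond")

# Character vector: only the FIRST match in each string
first_match <- str_extract(x, pattern = "Bond")
# "Bond" "Bond" NA

# List: EVERY match in each string
all_matches <- str_extract_all(x, pattern = "Bond")
n_matches <- lengths(all_matches)   # 1 2 0
```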
Suppose we had a slightly different vector…
str_locate()
Returns an integer matrix with two columns – the starting and ending location of the pattern. The values are NA if the pattern is not found.
Related Function
str_sub() extracts values based on a starting and ending location.
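A short sketch of how str_locate() and str_sub() fit together, again on an invented vector `x`:

```r
library(stringr)

x <- c("James Bond", "Moneypenny")

# Start and end position of the first match in each string
loc <- str_locate(x, pattern = "Bond")
# row 1: start = 7, end = 10; row 2: NA, NA (no match)

# Pull out characters by position
first_five <- str_sub(x, start = 1, end = 5)   # "James" "Money"

# Feed the located positions back into str_sub()
bond <- str_sub(x[1], start = loc[1, "start"], end = loc[1, "end"])   # "Bond"
```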
str_subset()
Returns a character vector containing a subset of the original character vector consisting of the elements where the pattern was found.
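As a rough illustration (the vector `x` is invented), str_subset() keeps the matching elements themselves, which gives the same result as indexing with str_detect():

```r
library(stringr)

x <- c("Hello,", "my", "name", "is", "Bond", "James Bond")

# Keep only the elements that contain the pattern
kept <- str_subset(x, pattern = "Bond")   # "Bond" "James Bond"

# Same result, built from str_detect()
kept_manual <- x[str_detect(x, pattern = "Bond")]
```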
Note
For each of these functions, write down:
str_replace()
Replace the first matched pattern in each string.
Use with mutate().
Related Function
str_replace_all() replaces all matched patterns in each string.
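A minimal sketch contrasting the two replacement functions; the vector `x` and the replacement "Smith" are made up for illustration:

```r
library(stringr)

x <- c("Bond, James Bond", "Moneypenny")

# Only the first "Bond" in each string is replaced
first_only <- str_replace(x, pattern = "Bond", replacement = "Smith")
# "Smith, James Bond" "Moneypenny"

# Every "Bond" in each string is replaced
every_one <- str_replace_all(x, pattern = "Bond", replacement = "Smith")
# "Smith, James Smith" "Moneypenny"
```

Strings without a match are returned unchanged.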
Convert letters in a string to a specific capitalization format.
str_to_lower() converts all letters in a string to lowercase.
str_to_upper() converts all letters in a string to uppercase.
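For example, applied to the string from the start of the slides:

```r
library(stringr)

my_string <- "Hi, my name is Bond!"

lower <- str_to_lower(my_string)   # "hi, my name is bond!"
upper <- str_to_upper(my_string)   # "HI, MY NAME IS BOND!"
```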
Join multiple strings into a single character vector.
prompt <- "Hello, my name is"
first <- "James"
last <- "Bond"
str_c(prompt, last, ",", first, last, sep = " ")
[1] "Hello, my name is Bond , James Bond"
Note
Similar to paste() and paste0().
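A brief comparison sketch. The missing-value comparison at the end is an addition not covered in the slides; it shows one way str_c() differs from paste0().

```r
library(stringr)

first <- "James"
last  <- "Bond"

# With a separator, str_c() behaves like paste()
with_sep  <- str_c(first, last, sep = " ")   # "James Bond"
base_sep  <- paste(first, last, sep = " ")   # "James Bond"

# With no separator, str_c() behaves like paste0()
no_sep    <- str_c(first, last)              # "JamesBond"
base_none <- paste0(first, last)             # "JamesBond"

# Difference: a missing value stays missing in str_c()
na_stringr <- str_c("Agent", NA)    # NA
na_base    <- paste0("Agent", NA)   # "AgentNA"
```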
Combine a vector of strings into a single string.
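This description matches stringr's str_flatten(); assuming that is the function meant here, a minimal sketch looks like this:

```r
library(stringr)

name_parts <- c("James", "Bond")   # a character vector of length 2

# Collapse the vector into ONE string, placing `collapse` between elements
full_name <- str_flatten(name_parts, collapse = " ")   # "James Bond"
n_out <- length(full_name)                              # 1
```

Unlike str_c(), which joins several arguments element by element, this takes one vector and returns a single string.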
Use variables in the environment to create a string based on {expressions}.
My name is Bond, James Bond
Tip
For more details, I would recommend looking up the glue R package!
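The output shown above can be reproduced with stringr's str_glue(), assuming that is the function this slide describes. A minimal sketch:

```r
library(stringr)

first <- "James"
last  <- "Bond"

# Anything inside {} is evaluated as R code and pasted into the string
intro <- str_glue("My name is {last}, {first} {last}")
intro
# My name is Bond, James Bond
```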
Refer to the stringr cheatsheet
Remember that str_xxx functions need the first argument to be a vector of strings, not a dataset!
Use them inside dplyr verbs like filter() or mutate().
name | is_bran | manuf | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100% Bran | TRUE | N | cold | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.00 | 0.33 | 68.40297 |
100% Natural Bran | TRUE | Q | cold | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.00 | 1.00 | 33.98368 |
All-Bran | TRUE | K | cold | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.00 | 0.33 | 59.42551 |
All-Bran with Extra Fiber | TRUE | K | cold | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.00 | 0.50 | 93.70491 |
Almond Delight | FALSE | R | cold | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.00 | 0.75 | 34.38484 |
Apple Cinnamon Cheerios | FALSE | G | cold | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.00 | 0.75 | 29.50954 |
Apple Jacks | FALSE | K | cold | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.00 | 1.00 | 33.17409 |
Basic 4 | FALSE | G | cold | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.03856 |
Bran Chex | TRUE | R | cold | 90 | 2 | 1 | 200 | 4.0 | 15.0 | 6 | 125 | 25 | 1 | 1.00 | 0.67 | 49.12025 |
Bran Flakes | TRUE | P | cold | 90 | 3 | 0 | 210 | 5.0 | 13.0 | 5 | 190 | 25 | 3 | 1.00 | 0.67 | 53.31381 |
Cap'n'Crunch | FALSE | Q | cold | 120 | 1 | 2 | 220 | 0.0 | 12.0 | 12 | 35 | 25 | 2 | 1.00 | 0.75 | 18.04285 |
Cheerios | FALSE | G | cold | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 1 | 1.00 | 1.25 | 50.76500 |
Cinnamon Toast Crunch | FALSE | G | cold | 120 | 1 | 3 | 210 | 0.0 | 13.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 19.82357 |
Clusters | FALSE | G | cold | 110 | 3 | 2 | 140 | 2.0 | 13.0 | 7 | 105 | 25 | 3 | 1.00 | 0.50 | 40.40021 |
Cocoa Puffs | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 55 | 25 | 2 | 1.00 | 1.00 | 22.73645 |
Corn Chex | FALSE | R | cold | 110 | 2 | 0 | 280 | 0.0 | 22.0 | 3 | 25 | 25 | 1 | 1.00 | 1.00 | 41.44502 |
Corn Flakes | FALSE | K | cold | 100 | 2 | 0 | 290 | 1.0 | 21.0 | 2 | 35 | 25 | 1 | 1.00 | 1.00 | 45.86332 |
Corn Pops | FALSE | K | cold | 110 | 1 | 0 | 90 | 1.0 | 13.0 | 12 | 20 | 25 | 2 | 1.00 | 1.00 | 35.78279 |
Count Chocula | FALSE | G | cold | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 65 | 25 | 2 | 1.00 | 1.00 | 22.39651 |
Cracklin' Oat Bran | TRUE | K | cold | 110 | 3 | 3 | 140 | 4.0 | 10.0 | 7 | 160 | 25 | 3 | 1.00 | 0.50 | 40.44877 |
Cream of Wheat (Quick) | FALSE | N | hot | 100 | 3 | 0 | 80 | 1.0 | 21.0 | 0 | -1 | 0 | 2 | 1.00 | 1.00 | 64.53382 |
Crispix | FALSE | K | cold | 110 | 2 | 0 | 220 | 1.0 | 21.0 | 3 | 30 | 25 | 3 | 1.00 | 1.00 | 46.89564 |
Crispy Wheat & Raisins | FALSE | G | cold | 100 | 2 | 1 | 140 | 2.0 | 11.0 | 10 | 120 | 25 | 3 | 1.00 | 0.75 | 36.17620 |
Double Chex | FALSE | R | cold | 100 | 2 | 0 | 190 | 1.0 | 18.0 | 5 | 80 | 25 | 3 | 1.00 | 0.75 | 44.33086 |
Froot Loops | FALSE | K | cold | 110 | 2 | 1 | 125 | 1.0 | 11.0 | 13 | 30 | 25 | 2 | 1.00 | 1.00 | 32.20758 |
Frosted Flakes | FALSE | K | cold | 110 | 1 | 0 | 200 | 1.0 | 14.0 | 11 | 25 | 25 | 1 | 1.00 | 0.75 | 31.43597 |
Frosted Mini-Wheats | FALSE | K | cold | 100 | 3 | 0 | 0 | 3.0 | 14.0 | 7 | 100 | 25 | 2 | 1.00 | 0.80 | 58.34514 |
Fruit & Fibre Dates; Walnuts; and Oats | FALSE | P | cold | 120 | 3 | 2 | 160 | 5.0 | 12.0 | 10 | 200 | 25 | 3 | 1.25 | 0.67 | 40.91705 |
Fruitful Bran | TRUE | K | cold | 120 | 3 | 0 | 240 | 5.0 | 14.0 | 12 | 190 | 25 | 3 | 1.33 | 0.67 | 41.01549 |
Fruity Pebbles | FALSE | P | cold | 110 | 1 | 1 | 135 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 0.75 | 28.02576 |
Golden Crisp | FALSE | P | cold | 100 | 2 | 0 | 45 | 0.0 | 11.0 | 15 | 40 | 25 | 1 | 1.00 | 0.88 | 35.25244 |
Golden Grahams | FALSE | G | cold | 110 | 1 | 1 | 280 | 0.0 | 15.0 | 9 | 45 | 25 | 2 | 1.00 | 0.75 | 23.80404 |
Grape Nuts Flakes | FALSE | P | cold | 100 | 3 | 1 | 140 | 3.0 | 15.0 | 5 | 85 | 25 | 3 | 1.00 | 0.88 | 52.07690 |
Grape-Nuts | FALSE | P | cold | 110 | 3 | 0 | 170 | 3.0 | 17.0 | 3 | 90 | 25 | 3 | 1.00 | 0.25 | 53.37101 |
Great Grains Pecan | FALSE | P | cold | 120 | 3 | 3 | 75 | 3.0 | 13.0 | 4 | 100 | 25 | 3 | 1.00 | 0.33 | 45.81172 |
Honey Graham Ohs | FALSE | Q | cold | 120 | 1 | 2 | 220 | 1.0 | 12.0 | 11 | 45 | 25 | 2 | 1.00 | 1.00 | 21.87129 |
Honey Nut Cheerios | FALSE | G | cold | 110 | 3 | 1 | 250 | 1.5 | 11.5 | 10 | 90 | 25 | 1 | 1.00 | 0.75 | 31.07222 |
Honey-comb | FALSE | P | cold | 110 | 1 | 0 | 180 | 0.0 | 14.0 | 11 | 35 | 25 | 1 | 1.00 | 1.33 | 28.74241 |
Just Right Crunchy Nuggets | FALSE | K | cold | 110 | 2 | 1 | 170 | 1.0 | 17.0 | 6 | 60 | 100 | 3 | 1.00 | 1.00 | 36.52368 |
Just Right Fruit & Nut | FALSE | K | cold | 140 | 3 | 1 | 170 | 2.0 | 20.0 | 9 | 95 | 100 | 3 | 1.30 | 0.75 | 36.47151 |
Kix | FALSE | G | cold | 110 | 2 | 1 | 260 | 0.0 | 21.0 | 3 | 40 | 25 | 2 | 1.00 | 1.50 | 39.24111 |
Life | FALSE | Q | cold | 100 | 4 | 2 | 150 | 2.0 | 12.0 | 6 | 95 | 25 | 2 | 1.00 | 0.67 | 45.32807 |
Lucky Charms | FALSE | G | cold | 110 | 2 | 1 | 180 | 0.0 | 12.0 | 12 | 55 | 25 | 2 | 1.00 | 1.00 | 26.73451 |
Maypo | FALSE | A | hot | 100 | 4 | 1 | 0 | 0.0 | 16.0 | 3 | 95 | 25 | 2 | 1.00 | 1.00 | 54.85092 |
Muesli Raisins; Dates; & Almonds | FALSE | R | cold | 150 | 4 | 3 | 95 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 37.13686 |
Muesli Raisins; Peaches; & Pecans | FALSE | R | cold | 150 | 4 | 3 | 150 | 3.0 | 16.0 | 11 | 170 | 25 | 3 | 1.00 | 1.00 | 34.13976 |
Mueslix Crispy Blend | FALSE | K | cold | 160 | 3 | 2 | 150 | 3.0 | 17.0 | 13 | 160 | 25 | 3 | 1.50 | 0.67 | 30.31335 |
Multi-Grain Cheerios | FALSE | G | cold | 100 | 2 | 1 | 220 | 2.0 | 15.0 | 6 | 90 | 25 | 1 | 1.00 | 1.00 | 40.10596 |
Nut&Honey Crunch | FALSE | K | cold | 120 | 2 | 1 | 190 | 0.0 | 15.0 | 9 | 40 | 25 | 2 | 1.00 | 0.67 | 29.92429 |
Nutri-Grain Almond-Raisin | FALSE | K | cold | 140 | 3 | 2 | 220 | 3.0 | 21.0 | 7 | 130 | 25 | 3 | 1.33 | 0.67 | 40.69232 |
Nutri-grain Wheat | FALSE | K | cold | 90 | 3 | 0 | 170 | 3.0 | 18.0 | 2 | 90 | 25 | 3 | 1.00 | 1.00 | 59.64284 |
Oatmeal Raisin Crisp | FALSE | G | cold | 130 | 3 | 2 | 170 | 1.5 | 13.5 | 10 | 120 | 25 | 3 | 1.25 | 0.50 | 30.45084 |
Post Nat. Raisin Bran | TRUE | P | cold | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3 | 1.33 | 0.67 | 37.84059 |
Product 19 | FALSE | K | cold | 100 | 3 | 0 | 320 | 1.0 | 20.0 | 3 | 45 | 100 | 3 | 1.00 | 1.00 | 41.50354 |
Puffed Rice | FALSE | Q | cold | 50 | 1 | 0 | 0 | 0.0 | 13.0 | 0 | 15 | 0 | 3 | 0.50 | 1.00 | 60.75611 |
Puffed Wheat | FALSE | Q | cold | 50 | 2 | 0 | 0 | 1.0 | 10.0 | 0 | 50 | 0 | 3 | 0.50 | 1.00 | 63.00565 |
Quaker Oat Squares | FALSE | Q | cold | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 49.51187 |
Quaker Oatmeal | FALSE | Q | hot | 100 | 5 | 2 | 0 | 2.7 | -1.0 | -1 | 110 | 0 | 1 | 1.00 | 0.67 | 50.82839 |
Raisin Bran | TRUE | K | cold | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 2 | 1.33 | 0.75 | 39.25920 |
Raisin Nut Bran | TRUE | G | cold | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3 | 1.00 | 0.50 | 39.70340 |
Raisin Squares | FALSE | K | cold | 90 | 2 | 0 | 0 | 2.0 | 15.0 | 6 | 110 | 25 | 3 | 1.00 | 0.50 | 55.33314 |
Rice Chex | FALSE | R | cold | 110 | 1 | 0 | 240 | 0.0 | 23.0 | 2 | 30 | 25 | 1 | 1.00 | 1.13 | 41.99893 |
Rice Krispies | FALSE | K | cold | 110 | 2 | 0 | 290 | 0.0 | 22.0 | 3 | 35 | 25 | 1 | 1.00 | 1.00 | 40.56016 |
Shredded Wheat | FALSE | N | cold | 80 | 2 | 0 | 0 | 3.0 | 16.0 | 0 | 95 | 0 | 1 | 0.83 | 1.00 | 68.23588 |
Shredded Wheat 'n'Bran | TRUE | N | cold | 90 | 3 | 0 | 0 | 4.0 | 19.0 | 0 | 140 | 0 | 1 | 1.00 | 0.67 | 74.47295 |
Shredded Wheat spoon size | FALSE | N | cold | 90 | 3 | 0 | 0 | 3.0 | 20.0 | 0 | 120 | 0 | 1 | 1.00 | 0.67 | 72.80179 |
Smacks | FALSE | K | cold | 110 | 2 | 1 | 70 | 1.0 | 9.0 | 15 | 40 | 25 | 2 | 1.00 | 0.75 | 31.23005 |
Special K | FALSE | K | cold | 110 | 6 | 0 | 230 | 1.0 | 16.0 | 3 | 55 | 25 | 1 | 1.00 | 1.00 | 53.13132 |
Strawberry Fruit Wheats | FALSE | N | cold | 90 | 2 | 0 | 15 | 3.0 | 15.0 | 5 | 90 | 25 | 2 | 1.00 | 1.00 | 59.36399 |
Total Corn Flakes | FALSE | G | cold | 110 | 2 | 1 | 200 | 0.0 | 21.0 | 3 | 35 | 100 | 3 | 1.00 | 1.00 | 38.83975 |
Total Raisin Bran | TRUE | G | cold | 140 | 3 | 1 | 190 | 4.0 | 15.0 | 14 | 230 | 100 | 3 | 1.50 | 1.00 | 28.59278 |
Total Whole Grain | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 16.0 | 3 | 110 | 100 | 3 | 1.00 | 1.00 | 46.65884 |
Triples | FALSE | G | cold | 110 | 2 | 1 | 250 | 0.0 | 21.0 | 3 | 60 | 25 | 3 | 1.00 | 0.75 | 39.10617 |
Trix | FALSE | G | cold | 110 | 1 | 1 | 140 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.00 | 1.00 | 27.75330 |
Wheat Chex | FALSE | R | cold | 100 | 3 | 1 | 230 | 3.0 | 17.0 | 3 | 115 | 25 | 1 | 1.00 | 0.67 | 49.78744 |
Wheaties | FALSE | G | cold | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.00 | 1.00 | 51.59219 |
Wheaties Honey Gold | FALSE | G | cold | 110 | 2 | 1 | 200 | 1.0 | 16.0 | 8 | 60 | 25 | 1 | 1.00 | 0.75 | 36.18756 |
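The is_bran column in the table above is consistent with a call to str_detect() inside mutate(). The sketch below assumes that is how it was created and uses a tiny made-up data frame, `cereals_demo`, rather than the full cereal data.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row stand-in for the cereal data
cereals_demo <- tibble(name = c("100% Bran", "Almond Delight", "Bran Chex"))

# str_detect() receives the name COLUMN (a vector of strings), not the dataset
flagged <- cereals_demo |>
  mutate(is_bran = str_detect(name, pattern = "Bran"))

# The same call works as a row filter
bran_only <- cereals_demo |>
  filter(str_detect(name, pattern = "Bran"))
```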
The real power of these str_xxx functions comes when you specify the pattern using regular expressions!
“Regexps are a very terse language that allow you to describe patterns in strings.”
R for Data Science
Use str_xxx functions + regular expressions!
Tip
You might encounter gsub(), grep(), etc. from Base R, but I would highly recommend using functions from the stringr package instead.
…are tricky!
This web app for testing R regular expressions might be handy!
There is a set of characters that have a specific meaning when using regex.
The stringr package does not read these as normal characters.
^ $ \ | * + ? { } [ ] ( )
.
This character can match any character.
[1] "sells" "seashells"
This matches strings that contain any character followed by “ells”.
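The output above can be reproduced with a sketch like the following. The input vector `x` is an assumption, since the original code was not preserved.

```r
library(stringr)

x <- c("She", "sells", "seashells", "by", "the", "seashore!")

# "." stands for any single character, so ".ells" needs
# at least one character in front of "ells"
matched <- str_subset(x, pattern = ".ells")
matched
# [1] "sells"     "seashells"
```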
^ $
? + *
? – matches when the preceding character occurs 0 or 1 times in a row.
{}
{n} – matches when the preceding character occurs exactly n times in a row.
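A small sketch of these two quantifiers on made-up words. The second pattern is wrapped in the anchors ^ and $ so that the whole string must fit; without them, "l{2}" would also be found inside "lll".

```r
library(stringr)

# "u?" : the u may appear 0 or 1 times
spellings <- c("color", "colour", "colouur")
optional_u <- str_detect(spellings, pattern = "colou?r")
# TRUE TRUE FALSE

# "l{2}" : exactly two l's in a row (anchored to the whole string)
words <- c("shel", "shell", "shelll")
two_ls <- str_detect(words, pattern = "^shel{2}$")
# FALSE TRUE FALSE
```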
()
Groups are created with ( ).
|
This matches strings that contain either “peck” or “pick”.
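One pattern that behaves as described is "p(e|i)ck". The sketch below assumes that pattern and an invented input vector, since the original code was not preserved.

```r
library(stringr)

x <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")

# The group (e|i) allows either "e" or "i" between "p" and "ck"
matched <- str_subset(x, pattern = "p(e|i)ck")
# "picked" "peck" "pickled"
```

"Peter" and "Piper" are not returned: matching is case sensitive, and neither contains a lowercase "peck" or "pick".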
[]
Character classes let you specify multiple possible characters to match on.
Why use [] instead of ()?
() is better for making groups of characters, plus | acts as “or” inside () but is read as a literal character inside [].
[] is better for referencing multiple characters, plus ^ acts as “not” only at the start of []…
[^ ] – specifies characters not to match on (think except)
[]
[ - ] – specifies a range of characters.
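A minimal sketch of the three forms of character class, using invented vectors. The leading ^ outside the brackets is the start-of-string anchor; the ^ inside the brackets means "not".

```r
library(stringr)

x <- c("Peter", "Piper", "picked", "peck")

# [Pp] : either an upper- or lowercase p, followed by "i"
p_then_i <- str_subset(x, pattern = "^[Pp]i")   # "Piper" "picked"

# [^p] : any character EXCEPT a lowercase p
not_lower_p <- str_subset(x, pattern = "^[^p]") # "Peter" "Piper"

# [0-9] : a range, here any single digit
codes <- c("a1", "b7", "zz")
has_digit <- str_detect(codes, pattern = "[0-9]")   # TRUE TRUE FALSE
```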
\\w – matches any “word” character (\\W matches any non-“word” character)
\\d – matches any digit (\\D matches any non-digit)
\\s – matches any whitespace (\\S matches any non-whitespace)
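A short sketch of the shortcuts in use; the example strings are invented. Each shortcut matches a single character, so + is added where a run of them is wanted.

```r
library(stringr)

# \\d : digits
digits <- str_extract_all("Agent 007 arrives at 9 pm", pattern = "\\d+")[[1]]
# "007" "9"

# \\s : whitespace (spaces, tabs, ...)
no_space <- str_replace_all("a b\tc", pattern = "\\s", replacement = "_")
# "a_b_c"

# \\w : "word" characters; \\W : everything else
first_word <- str_extract("Bond, James", pattern = "\\w+")   # "Bond"
separator  <- str_extract("Bond, James", pattern = "\\W+")   # ", "
```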
What regular expressions would match words that…
\\
To match a special character, you need to escape it.
\\
Use \\ to escape the ? – it is now read as a normal character.
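A minimal sketch of escaping on two invented strings. The same approach works for other special characters such as the period.

```r
library(stringr)

x <- c("Who are you?", "Bond.")

# Escaped: look for a literal question mark
has_question <- str_detect(x, pattern = "\\?")   # TRUE FALSE

# Escaped: look for a literal period
has_period <- str_detect(x, pattern = "\\.")     # FALSE TRUE

# Unescaped: "." means "any character", so every non-empty string matches
any_char <- str_detect(x, pattern = ".")         # TRUE TRUE
```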
Use the web app to test R regular expressions.
stringr cheatsheet.
In this activity, you will use functions from the stringr package and regex to decode a message.
stringr functions for previewing string contents
stringr functions for removing whitespace
stringr functions for truncating strings
stringr functions for replacing patterns
stringr functions for combining multiple strings
None of us have all these abilities. Each of us has some of these abilities.
[]
stringr
Every group should have a stringr cheatsheet!
On the Front:
Every group should have a task card!
On the Front
On the Back
stringr functions for different tasks you may encounter
([:punct:], \\w)
(^, $)
([Kk])
(?, +, {2})
Developer
Coder
First, both of you will do the following:
PA-5-stringr.qmd file
Then, the partner who has the most pets starts as the Developer (typing and listening to instructions from the Coder)!
During the Practice Activity, you are not permitted to use Google, ChatGPT, or websites for regular expressions for help.
You are permitted to use:
stringr cheatsheet,
Submit the name of the movie the quote is from.
PA-5.html file for the group.
lubridate