library(stringr)
shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")
str_extract(shopping_list, "[A-Z].*[1-9]")
# this extracts, from each element of the input vector, the partial
# string that starts at the first upper-case letter and ends at the
# last digit from 1 to 9.
# "." matches any single character, "*" means the preceding item will
# be matched zero or more times, so ".*" is the regex for a string
# comprised of any characters, repeated arbitrarily often (greedily).
# output:
# [1] "Apples §$%&/()=?4" NA NA NA
str_extract(shopping_list, "[a-z]{1,4}")
# this extracts, from each element of the input vector, the first
# run of 1 to 4 lower-case letters.
# output:
# [1] "brea" "flou" "suga" "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
# this extracts, from each element of the input vector, the first
# whole word made up of 1 to 4 lower-case letters
# ("\\b" marks a word boundary).
# output:
# [1] NA NA NA "milk"
str <- c("&George W. Bush", "Lyndon B. Johnson?")
gsub("[^[:alnum:][:space:].]", "", str)
# keep alphanumeric characters, whitespace AND full stops, remove
# anything else, that is, all other punctuation. the caret at the
# start of the bracket expression negates it, so the pattern matches
# every character that is NOT listed.
# output:
# [1] "George W. Bush" "Lyndon B. Johnson"
3 Nov 2011
Some Simple but Probably Useful Regex Examples with R-Package stringr...
I found that examples for the use of regex in R are rather rare. Thus, I will provide some examples from my own learning materials - mostly stolen from the help pages, with small but maybe illustrative adaptations.
ps: I will extend this list of examples HERE occasionally..

You might want to show how the function str_extract_all compares on the above examples
..."str_extract" extracts the first piece of a string (held within a vector element) that matches a pattern and returns a character vector of these matches, with NA for elements without a match.
"str_extract_all" extracts all pieces of a string (held within a vector element) that match a pattern and returns a list of character vectors, one per element of the input vector, each holding all matches found in that element.
Hope I got this right,
Kay
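
To make the difference concrete, here is a minimal sketch that reuses the `shopping_list` vector and the `"[a-z]{1,4}"` pattern from the post. The variable names `first_match` and `all_matches` are only illustrative and do not appear in the original examples.

```r
library(stringr)

shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")

# str_extract(): only the first match per element, as a character vector
first_match <- str_extract(shopping_list, "[a-z]{1,4}")
first_match
# [1] "brea" "flou" "suga" "milk"

# str_extract_all(): every match per element, as a list of character vectors
all_matches <- str_extract_all(shopping_list, "[a-z]{1,4}")
all_matches[[1]]
# [1] "brea" "d"    "pple" "s"
all_matches[[4]]
# [1] "milk" "x"
```

The list returned by `str_extract_all` has one entry per element of the input vector, so it can be indexed in the same way as `shopping_list`.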
What is the advantage of using the stringr package versus the grep() function in the base library?
try it yourself and you'll see...
str_extract(shopping_list, "[A-Z](.*)[1-9]")
grep("[A-Z](.*)[1-9]", shopping_list)
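
For readers who would rather see the difference than run it, the following is a rough sketch of what the two calls return, with a couple of related base functions added for comparison. The names `pattern`, `idx`, `hits`, `extracted` and `base_extracted` are illustrative only. Outputs are shown just for the stringr call, since the meaning of a range such as `[A-Z]` in base R regular expressions can depend on the locale.

```r
library(stringr)

shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")
pattern <- "[A-Z](.*)[1-9]"

# grep(): integer indices of the elements that contain a match
idx <- grep(pattern, shopping_list)

# grepl(): one TRUE/FALSE per element
hits <- grepl(pattern, shopping_list)

# str_extract(): the matched substring itself, NA where nothing matches
extracted <- str_extract(shopping_list, pattern)
extracted
# [1] "Apples §$%&/()=?4" NA NA NA

# A base R way to get matched substrings: regexpr() locates the first
# match in each element and regmatches() pulls it out. Elements without
# a match are dropped rather than returned as NA.
base_extracted <- regmatches(shopping_list, regexpr(pattern, shopping_list))
```

So `grep` tells you where a pattern occurs, while `str_extract` hands back the matched text in a vector that stays aligned with the input.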