library(stringr)

shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")

str_extract(shopping_list, "[A-Z].*[1-9]")
# for each element of the input vector, this extracts the substring that
# starts with an upper-case letter and ends with a digit from 1 to 9.
# "." matches any single character, "*" means the preceding item is
# matched zero or more times, so ".*" matches any run of characters
# of arbitrary length (as long a run as possible).
# output: [1] "Apples §$%&/()=?4" NA NA NA

str_extract(shopping_list, "[a-z]{1,4}")
# for each element of the input vector, this extracts the first run of
# one to four lower-case letters.
# output: [1] "brea" "flou" "suga" "milk"

str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
# for each element of the input vector, this extracts the first whole word
# made up of one to four lower-case letters ("\\b" marks a word boundary).
# output: [1] NA NA NA "milk"

str <- c("&George W. Bush", "Lyndon B. Johnson?")
gsub("[^[:alnum:][:space:].]", "", str)
# keep alphanumeric characters, whitespace AND the full stop, remove
# anything else, that is, all other punctuation. the caret at the start
# of the bracket expression designates what should not be matched.
# output: [1] "George W. Bush" "Lyndon B. Johnson"
3 Nov 2011
Some Simple but Probably Useful Regex Examples with R-Package stringr...
I found that examples for the use of regex in R are rather rare. So here are some examples from my own learning materials, mostly stolen from the help pages, with small but maybe illustrative adaptations.
PS: I will extend this list of examples HERE occasionally.
You might want to show how the function str_extract_all compares on the above examples
..."str_extract" extracts the first piece of each string (that is, of each vector element) that matches a pattern and returns a character vector of the same length as the input, with NA where nothing matches.
"str_extract_all" extracts all pieces of each string that match a pattern and returns a list of character vectors, one per element of the input vector, each holding all matches found in that element.
Hope I got this right,
Kay
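To make the difference concrete, here is a minimal sketch that reuses the shopping_list vector and the "[a-z]{1,4}" pattern from the post. The variable names first_match and all_matches are only illustrative and do not appear in the original examples.

```r
library(stringr)

shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")

# first match per element: a character vector as long as the input
first_match <- str_extract(shopping_list, "[a-z]{1,4}")
print(first_match)
# [1] "brea" "flou" "suga" "milk"

# all matches per element: a list with one character vector per element
all_matches <- str_extract_all(shopping_list, "[a-z]{1,4}")
print(all_matches[[1]])
# [1] "brea" "d"    "pple" "s"
print(all_matches[[4]])
# [1] "milk" "x"
```

Because the result of str_extract_all is a list, individual elements are accessed with [[ ]], and the number of matches can differ from one element to the next.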
What is the advantage of using the stringr package versus the grep() function in the base library?
try yourself and you'll see...
str_extract(shopping_list, "[A-Z](.*)[1-9]")
grep("[A-Z](.*)[1-9]", shopping_list)
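For anyone who does not want to run it, the two calls can be compared side by side roughly like this. This is only a sketch of one difference (which part of the data comes back), not a full comparison of stringr and base R; the variable names are made up for illustration.

```r
library(stringr)

shopping_list <- c("bread & Apples §$%&/()=?4", "flouR", "sugar", "milk x2")
pattern <- "[A-Z](.*)[1-9]"

# grep() reports WHICH elements contain a match (their indices) ...
matching_index <- grep(pattern, shopping_list)
# ... or, with value = TRUE, returns those elements in full
matching_elements <- grep(pattern, shopping_list, value = TRUE)

# str_extract() returns only the matched substring, NA where there is none
matched_part <- str_extract(shopping_list, pattern)

print(matching_index)     # index of the only matching element
print(matching_elements)  # the whole first element, "bread & " included
print(matched_part)       # first element starts at "Apples", the rest are NA
```

So grep() answers "which elements match?", while str_extract() answers "what exactly matched inside each element?".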