UPDATE: Thanks to Max Ghenis for updating the R script I wrote a while back. The script below can once again be used to pull the number of hits from Google Search.
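GoogleHits <- function(input)
{
  require(XML)
  require(RCurl)
  # Build the search URL; the escaped quotes request an exact-phrase match
  url <- paste("https://www.google.com/search?q=\"",
               input, "\"", sep = "")
  # CA certificate bundle from the RCurl package directory, used for HTTPS
  CAINFO = paste(system.file(package="RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
  script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)
  doc <- htmlParse(script)
  # Text of the result-statistics node
  res <- xpathSApply(doc, '//*/div[@id="resultStats"]', xmlValue)
  cat(paste("\nYour Search URL:\n", url, "\n", sep = ""))
  cat("\nNo. of Hits:\n")
  # as.numeric rather than as.integer: counts above 2^31 - 1 would become NA
  return(as.numeric(gsub("[^0-9]", "", res)))
}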
# Example:
GoogleHits("R%Statistical%Software")
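The last step of the function keeps every digit found in the node text. As a rough sketch, assuming the text looks something like "About 1,234 results (0.45 seconds)" (the exact wording is an assumption and has changed over time), a small helper can take only the first number, so that digits from a timing note are not glued onto the count. `ParseHits` is a hypothetical name and not part of the script above. It returns a numeric because counts above roughly 2.1 billion do not fit into R's integer type.

```r
# Hypothetical helper: extract the first number from a result-stats string.
# ASSUMPTION: the hit count is the first number that appears in the text.
ParseHits <- function(stats_text) {
  # Node not found (e.g. the page layout changed): nothing to parse
  if (length(stats_text) == 0) return(NA_real_)
  # First run of digits, allowing "," or "." as thousands separators
  m <- regmatches(stats_text[1], regexpr("[0-9][0-9.,]*", stats_text[1]))
  if (length(m) == 0) return(NA_real_)
  # Drop the separators and convert; numeric avoids integer overflow
  as.numeric(gsub("[^0-9]", "", m))
}

ParseHits("About 1,234 results (0.45 seconds)")
```

Inside GoogleHits, this could stand in for the final gsub() call, e.g. return(ParseHits(res)).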
P.S.: If you try to run this in an automated fashion, like:
lapply(list_of_search_terms, GoogleHits)
Google will block you after about the 300th request!
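One way to make a batch run less abrupt is to space the requests out and to keep going when a single request fails. The sketch below is only an illustration: `ThrottledApply` and its `min_wait`/`max_wait` arguments are hypothetical names, the waiting times are arbitrary, and there is no claim that any particular delay avoids the block.

```r
# Hypothetical wrapper: call `fun` on each term with a random pause in between.
# ASSUMPTION: waiting times (in seconds) are arbitrary illustrative defaults.
ThrottledApply <- function(terms, fun, min_wait = 5, max_wait = 15) {
  results <- vector("list", length(terms))
  names(results) <- as.character(terms)
  for (i in seq_along(terms)) {
    # Record NA instead of aborting the whole run if one call fails
    results[[i]] <- tryCatch(fun(terms[[i]]), error = function(e) NA)
    # Pause before the next call (no pause after the last one)
    if (i < length(terms)) Sys.sleep(runif(1, min_wait, max_wait))
  }
  results
}

# Usage with the function from this post (performs real web requests):
# hits <- ThrottledApply(list_of_search_terms, GoogleHits)
```

Because failed calls come back as NA, the terms that were blocked can be picked out afterwards and retried later.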
Great script! Is there any way to get around Google's block after the 300th request?
Not to my knowledge.
I guess there have been some changes to Google's output.
To make it work I had to remove [[2]] from the following command:
res <- xpathSApply(doc, "//div[@id='subform_ctrl']/*", xmlValue)[[2]]
Thanks bro!
Google's output changed again; the path needs to change to '//*/div[@id="resultStats"]'.
Also, I don't believe stringr is needed.
Of course (I just forgot to remove the line).
Hi, I'm new to R. Is there a way to do the same but filter the Google search by specific date intervals?
Yes - just go to the advanced Google Search and set a time interval for your query, then take the URL Google creates and use it with the R-function!
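As a sketch of what such a URL might look like when built in R: the helper below assumes that Google encodes a custom date range in a `tbs` parameter of the form `cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY`. That format is an assumption taken from URLs the search page has produced, not a documented API, so it is worth comparing against a URL copied from the browser. `DateRangeURL` is a hypothetical name.

```r
# Hypothetical helper: build a search URL restricted to a date interval.
# ASSUMPTION: Google accepts tbs=cdr:1,cd_min:...,cd_max:... for custom ranges.
DateRangeURL <- function(input, from, to) {
  # Format dates as MM/DD/YYYY
  fmt <- function(d) format(as.Date(d), "%m/%d/%Y")
  tbs <- paste0("cdr:1,cd_min:", fmt(from), ",cd_max:", fmt(to))
  # Percent-encode the quoted phrase and the tbs value
  paste0("https://www.google.com/search?q=",
         URLencode(paste0("\"", input, "\""), reserved = TRUE),
         "&tbs=", URLencode(tbs, reserved = TRUE))
}

DateRangeURL("R statistical software", "2015-01-01", "2015-12-31")
```

GoogleHits builds its own URL from `input`, so using a URL like this means replacing the `url <- paste(...)` line inside the function with it.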