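GoogleHits <- function(input)
{
  # XML provides htmlParse()/xpathSApply(); RCurl provides getURL()
  library(XML)
  library(RCurl)
  # Build the search URL, wrapping the term in quotes for an exact-phrase search
  url <- paste("https://www.google.com/search?q=\"",
               input, "\"", sep = "")
  # Use the CA bundle shipped with RCurl to verify the HTTPS certificate
  CAINFO <- paste(system.file(package = "RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
  script <- getURL(url, followlocation = TRUE, cainfo = CAINFO)
  doc <- htmlParse(script)
  # The second child of the 'subform_ctrl' div is expected to hold the result count text
  res <- xpathSApply(doc, "//div[@id='subform_ctrl']/*", xmlValue)[[2]]
  cat(paste("\nYour Search URL:\n", url, "\n", sep = ""))
  cat("\nNo. of Hits:\n")
  # Strip everything but digits and convert to an integer
  return(as.integer(gsub("[^0-9]", "", res)))
}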
# Example:
GoogleHits("R%Statistical%Software")
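The function pastes the search term into the URL as-is, so spaces or other special characters have to be escaped by the caller. A minimal sketch of building the same URL with percent-encoding is shown here; `build_search_url` is a hypothetical helper, not part of the original script, and it relies only on `utils::URLencode` from base R.

```r
# Hypothetical helper (not in the original script): builds the quoted
# search URL and percent-encodes the term, including the quotes.
build_search_url <- function(term) {
  quoted <- paste0("\"", term, "\"")
  paste0("https://www.google.com/search?q=",
         utils::URLencode(quoted, reserved = TRUE))
}

build_search_url("R Statistical Software")
# "https://www.google.com/search?q=%22R%20Statistical%20Software%22"
```

With a helper like this, the search term can be written with ordinary spaces rather than a hand-made separator.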
P.S.: If you try to do this in an automated fashion, like:
lapply(list_of_search_terms, GoogleHits)
Google will block you after about the 300th request!
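One way to be gentler on the server is to pause between requests and to keep going when a single request fails. The sketch below assumes a hypothetical wrapper `throttled_lapply` and an arbitrary default delay of 2 seconds; neither comes from the original script, and it is untested whether any particular delay avoids the block.

```r
# Hypothetical wrapper (not in the original script): applies `fun` to each
# search term, sleeping `delay` seconds between calls. A failed call
# yields NA instead of aborting the whole run.
throttled_lapply <- function(terms, fun, delay = 2) {
  results <- vector("list", length(terms))
  for (i in seq_along(terms)) {
    results[[i]] <- tryCatch(fun(terms[[i]]),
                             error = function(e) NA_integer_)
    # No need to wait after the last term
    if (i < length(terms)) Sys.sleep(delay)
  }
  names(results) <- as.character(unlist(terms))
  results
}

# Usage, assuming GoogleHits and list_of_search_terms are defined:
# hits <- throttled_lapply(list_of_search_terms, GoogleHits, delay = 5)
```

This only slows the loop down; it does not work around the block itself.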
Great script! Is there any way to get around Google's block after the 300th request?
Not to my knowledge.
I guess Google's output has changed.
To make it work, I had to remove [[2]] from the following command:
res <- xpathSApply(doc, "//div[@id='subform_ctrl']/*", xmlValue)[[2]]
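Since the position of the count within the matched nodes seems to change, a less position-dependent approach is to take the first number found in whatever text comes back. The sketch below uses a hypothetical helper `parse_hits` (not part of the original script) and assumes the count is the first number in the text. It returns a double, because counts above roughly 2.1 billion do not fit in R's integer type.

```r
# Hypothetical helper (not in the original script): returns the first
# number found in the node text(s), ignoring thousands separators.
parse_hits <- function(texts) {
  texts <- as.character(unlist(texts))
  # First run of digits, allowing "," or "." as separators
  m <- regmatches(texts, regexpr("[0-9][0-9.,]*", texts))
  if (length(m) == 0) return(NA_real_)
  as.numeric(gsub("[^0-9]", "", m[1]))
}

parse_hits("About 1,230,000 results (0.45 seconds)")
# 1230000
```

Inside the function, this would be applied to the result of xpathSApply() with or without the [[2]] index.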
Thanks bro!