I have updated the Google Scholar web-scraper function GScholarScraper_2 to GScholarScraper_3 (and GScholarScraper_3.1), as it was outdated due to changes in the Google Scholar HTML code. The new script is slimmer and faster. It returns a data frame or, optionally, a CSV file with the titles, authors, publications and links. Feel free to report bugs, etc.
Got an error message:
Erro em htmlParse(url) :
error in creating parser for http://scholar.google.com/scholar?q=allintitle:pantanal&num=1&as_sdt=1&as_vis=1
I could not solve the problem.
Anyway, it's an interesting function :)
Ah, I use Tinn-R, Windows 7 and R 2.15.1, in case that helps you figure out the problem ^^.
Sorry, I can't reproduce the error. As you only search for one word in the titles, you could use "intitle:pantanal"; however, it also works for me with "allintitle:pantanal".
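To make the difference between the two operators concrete, here is a minimal sketch that assembles a search URL of the same shape as the one in the error message above. The helper `build_scholar_url` is hypothetical and not part of GScholarScraper; the fixed parameters `as_sdt=1` and `as_vis=1` are simply copied from that URL.

```r
# Hypothetical helper, not part of GScholarScraper: build a Google Scholar
# search URL that restricts the term to titles.
build_scholar_url <- function(term, operator = c("intitle", "allintitle"), num = 1) {
  operator <- match.arg(operator)
  # Encode the term so that spaces and special characters survive in the URL
  query <- paste0(operator, ":", utils::URLencode(term, reserved = TRUE))
  paste0("http://scholar.google.com/scholar?q=", query,
         "&num=", num, "&as_sdt=1&as_vis=1")
}

url_single <- build_scholar_url("pantanal")
url_all    <- build_scholar_url("pantanal", operator = "allintitle")
print(url_single)
print(url_all)
```

For a single search word both forms should ask for the same thing; "allintitle:" only matters once the term contains several words.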
Well, I was trying to do something like this, to produce a figure showing how a theory, for example, gained citations over the years.
# One query string per year; note the "&" before as_yhi
anos <- 1980:2012
input <- paste("metapopulation&as_ylo=", anos, "&as_yhi=", anos, sep = "")
resultados <- rep(NA, length(anos))
for (i in seq_along(anos)) {
  resultados[i] <- length(GScholar_Scraper(input[i], write = FALSE)$PUBLICATION)
}
It makes many searches, one per year. It works sometimes, then stops working and starts giving the error I mentioned before.
Please see the follow-up posting http://thebiobucket.blogspot.co.at/2012/08/toy-example-with-gscholarscraper31.html - maybe this will help!
However, there is an issue with Google blocking automated searches, which arises for search strings giving more than 1000 results. And, occasionally, your IP seems to be blocked altogether. I'm afraid there is no quick solution for this (changing your IP, resetting the modem, etc. fixes the problem, though not very elegantly).
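This does not lift the block, but a loop over many searches can at least be made to pause between attempts and fail gracefully instead of stopping at the first error. The following is only a sketch under stated assumptions: `with_retry` is a hypothetical helper, not part of GScholarScraper, and the waiting times are arbitrary.

```r
# Hypothetical helper: call `fetch` (any function that performs one search),
# wait a little longer after each failure, and return NA if every attempt fails.
with_retry <- function(fetch, ..., max_tries = 3, wait = 5) {
  for (attempt in seq_len(max_tries)) {
    # Capture the error as a value instead of aborting the whole loop
    result <- tryCatch(fetch(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait * attempt)
  }
  NA
}

# Possible use in a year-by-year loop (assumes `input` and GScholar_Scraper exist):
# res <- with_retry(GScholar_Scraper, input[1], write = FALSE)
# n   <- if (is.data.frame(res)) nrow(res) else NA
```

Returning NA keeps the failed years visible in the result vector, so they can be re-run later.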
So cool!
Thanks very much!
Thank you for this. I previously spent many hours working out how to scrape data from Google Scholar. Sadly, once I got a working program, I found Google Scholar locked me out after I had retrieved around 100 records. Correspondence with them got me nowhere: they basically accuse you of unethical behavior if you try to automate searches. I can't understand their logic and they don't explain it.
It's very disappointing for those of us who want to do serious research using bibliometrics.
I haven't tried your program but assume it would hit the same snag?
With my function, which uses htmlParse(url) from the XML package, it works for search strings that give fewer than 1000 hits. Beyond that, it seems to be blocked.
Really cool application. Could you please provide a brief example of how to produce a wordcloud with the dataframe returned by GScholar_Scraper_3.1?
I tried following the example shown in GScholar_Scraper_2 but keep getting wordclouds of publication years, and removing the numbers leaves an empty dataframe. I'm missing something simple in corpus <- Corpus(DataframeSource(df[, 1:2])) but can't see what.
Thanks again
Check the follow-up (http://thebiobucket.blogspot.com/2012/08/follow-up-making-word-cloud-for-search.html)..
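Independently of the follow-up post, here is a minimal sketch of the idea using base R for the word counts. One possible cause of a cloud full of years is that the selected columns hold years rather than titles, so the sketch takes the words from the title column only. The toy data frame and its column names `TITLES` and `YEAR` are assumptions; check `names(df)` on your own result.

```r
# Toy data frame standing in for the scraper output; the column names
# TITLES and YEAR are assumptions, not confirmed names from the function.
df <- data.frame(
  TITLES = c("Metapopulation dynamics in fragmented landscapes",
             "Metapopulation persistence and landscapes, 1998"),
  YEAR = c(1998, 2004),
  stringsAsFactors = FALSE
)

# Split the lower-cased titles on anything that is not a letter, so digits
# and punctuation never enter the word counts.
words <- unlist(strsplit(tolower(df$TITLES), "[^a-z]+"))
words <- words[nchar(words) > 2]                            # drop empty and very short strings
words <- words[!words %in% c("and", "the", "for", "with")]  # tiny illustrative stop list
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.vector(freq), min.freq = 1)
}
```

Splitting on `[^a-z]+` also discards accented letters, so for non-English titles a different pattern would be needed.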
Hi, is this still live? I read elsewhere on your site that Google had changed their code. Thanks so much
Please try version 3.1 and report if there are any issues!