theBioBucket*: Web-Scraper for Google Scholar Updated!

22 Aug 2012

Web-Scraper for Google Scholar Updated!

I have updated the Google Scholar Web-Scraper Function GScholarScaper_2 to GScholarScraper_3 (and GScholarScaper_3.1) as it was outdated due to changes in the Google Scholar html-code. The new script is more slender and faster. It returns a dataframe or optionally a CSV-file with the titles, authors, publications & links. Feel free to report bugs, etc.

Update 11-07-2013: bug fixes due to google scholar code changes - https://github.com/gimoya/theBioBucket-Archives/blob/master/R/Functions/GScholarScraper_3.2.R. Note that since lately your IP will be blocked by Google at about the 1000th search result (cumulated) - so there's not much fun when you want to do some extensive bibliometrics..

12 comments :

Anonymous23 August 2012 at 21:13
Got a error message:
Erro em htmlParse(url) :
error in creating parser for http://scholar.google.com/scholar?q=allintitle:pantanal&num=1&as_sdt=1&as_vis=1

I could not solve the problem.

Anyway, its an interesting function :)

Ah, i use Tinn-R, windows 7 and R 2.15.1 if you could figure out the problem ^^.
ReplyDelete
Replies
Erin24 August 2012 at 00:13
So cool!

Thanks very much!
ReplyDelete
Replies
deevybee24 August 2012 at 08:11
Thank you for this. I previously spent many hours working out how to scrape data from Google Scholar. Sadly, once I got a working program, I found Google Scholar locked me out after I had retrieved around 100 records. Correspondence with them got me nowhere: they basically accuse you of unethical behavior if you try to automate searches. I can't understand their logic and they don't explain it.
It's very disappointing for those of us who want to do serious research using bibliometrics.
I haven't tried your program but assume it would hit the same snag?
ReplyDelete
Replies
Anonymous30 August 2012 at 18:37
Really cool application. Could you please provide a brief example of how to produce a wordcloud with the dataframe returned by GScholar_Scraper_3.1

I attempted following the example shown in GScholar_Scraper_2 but keep getting wordclouds of publication years and removing numeric's leaves an empty dataframe. I'm missing something simple in corpus <- Corpus(DataframeSource(df[, 1:2])) but cant see what

Thanks again
ReplyDelete
Replies
Anonymous22 April 2013 at 09:26
Hi, is this still live? I read elsewhere on your site that Google had changed their code. Thanks so much
ReplyDelete
Replies
saeed26 September 2015 at 04:19
I really appreciate your efforts here. However, by 100 hits you mean 1000 returned results? If that is the case, this code has a very very limited usage. I searched for "authenticity" as my keyword and 280,000+ results returned. Obviously, the code didn't work. At least, you can add an argument enabling the user to limit the fetched results to the first 1000 results.
ReplyDelete
Replies

Add comment

TABS

22 Aug 2012

Web-Scraper for Google Scholar Updated!

12 comments :