17 Mar 2014

Download all Documents from Google Drive with R

A commenter on my blog recently asked whether it is possible to retrieve direct links to all of your Google Documents. It is, and it can be done quite easily with R:

# RGoogleDocs is required (it depends on RCurl); install it from Omegahat
install.packages("RGoogleDocs", repos = "http://www.omegahat.org/R", type = "source")
library(RGoogleDocs)



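# log in with your Google account (replace with your own credentials)
gpasswd <- "mysecretpassword"
auth <- getGoogleAuth("kay.cichini@gmail.com", gpasswd)
con <- getGoogleDocsConnection(auth)

# point RCurl at the CA certificate bundle that ships with the package
CAINFO <- paste(system.file(package = "RCurl"), "/CurlSSL/ca-bundle.crt", sep = "")
docs <- getDocs(con, cainfo = CAINFO)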

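# get file references; each href ends in ".../full/<type>%3A<key>"
hrefs <- lapply(docs, function(x) return(x@access["href"]))
# document key: the part after "%3A"
keys <- sub(".*/full/.*%3A(.*)", "\\1", hrefs)
# document type: the part between "/full/" and "%3A"
types <- sub(".*/full/(.*)%3A.*", "\\1", hrefs)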

# build the download URLs (for the URL scheme see: http://techathlon.com/download-shared-files-google-drive/)
# change the format parameter (here "txt") to get other output formats
pdf_urls <- paste0("https://docs.google.com/uc?export=download&id=", keys)
doc_urls <- paste0("https://docs.google.com/document/d/", keys, "/export?format=", "txt")

# download the documents with your browser (shell.exec() is Windows-only)
gdoc_ids <- grep("document", types)
invisible(lapply(gdoc_ids, function(x) shell.exec(doc_urls[x])))

pdf_ids <- grep("pdf", types, ignore.case = TRUE)
invisible(lapply(pdf_ids, function(x) shell.exec(pdf_urls[x])))
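
shell.exec() exists only in R for Windows, so the last step above will not run on macOS or Linux. A minimal sketch of a portable variant, assuming that browseURL() from the utils package (part of every standard R installation) is an acceptable substitute; the helper name open_urls and its opener argument are made up for illustration and are not part of RGoogleDocs:

```r
# Hypothetical helper: hand each URL to an "opener" function.
# The default, utils::browseURL(), opens the URL in the default browser
# on Windows, macOS and Linux.
open_urls <- function(urls, opener = utils::browseURL) {
  for (u in urls) opener(u)
  invisible(urls)  # return the input invisibly, nothing is printed
}

# Usage with the vectors built above (uncomment once they exist):
# open_urls(doc_urls[gdoc_ids])
# open_urls(pdf_urls[pdf_ids])
```

Passing the opener as an argument keeps the loop itself independent of the platform, so shell.exec can still be supplied on Windows if preferred.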

6 comments:

  1. Hi,
    Nice post! On a Mac I cannot find an appropriate replacement for shell.exec. Could you please advise how I can do this?
    Many thanks.

    Replies
    1. Sorry, I have no experience with Mac OS.

  2. Side note: I learned the hard way that you cannot store git repo folders on Google Drive. If you are using GitHub from within RStudio and saving the directory on Google Drive, the repo will get corrupted (icon files).

  3. This is the beauty of R!

  4. Nice. Can I use this to download data from a public folder on the internet?

    Replies
    1. Sorry for the late answer,
      that's exactly what the post was intended for, so please try it and report if there are any issues!
