28 Sept 2012

Reading and Text Mining a PDF-File in R

I just added an R script to my GitHub repo that reads a PDF file into R and does some text mining with it.
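As a rough illustration of the idea (this is a minimal sketch, not the script from the repo), the following assumes the external `pdftotext` command-line tool is installed and on the PATH. The function names `find_pdftotext`, `pdf_to_lines` and `word_freq`, and the file name `example.pdf`, are made up for this example.

```r
# Sketch only: convert a PDF to text with the external pdftotext tool,
# then count word frequencies in base R.
# Assumption: pdftotext is installed and reachable via the PATH.

# Locate the pdftotext executable; returns "" if it is not on the PATH.
find_pdftotext <- function() {
  unname(Sys.which("pdftotext"))
}

# Convert a PDF to a text file next to it and return the text as lines.
pdf_to_lines <- function(pdf, exe = find_pdftotext()) {
  if (!nzchar(exe)) stop("pdftotext not found on PATH")
  if (!file.exists(pdf)) stop("no such file: ", pdf)
  # An absolute path avoids surprises with the working directory.
  pdf <- normalizePath(pdf)
  txt <- paste0(tools::file_path_sans_ext(pdf), ".txt")
  # pdftotext takes the input PDF and, optionally, the output text file.
  status <- system2(exe, args = c(shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("pdftotext failed with status ", status)
  readLines(txt, warn = FALSE)
}

# Count word frequencies in a character vector, most frequent first.
word_freq <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z]+"))
  words <- words[nzchar(words)]
  sort(table(words), decreasing = TRUE)
}

# Example usage (needs pdftotext and a real PDF file):
# lines <- pdf_to_lines("example.pdf")
# head(word_freq(lines), 10)
```

Checking `find_pdftotext()` first is a quick way to see whether R can find the tool at all; if it returns an empty string, the full path to the executable can be passed as `exe` instead.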


9 comments:

  1. Thanks for a helpful post and code. Does anyone have experience using pdftotext on the Mac side? This is my first time and I'm having trouble adapting this code to the Mac. I downloaded the .dmg from http://www.bluem.net/en/mac/packages/ and installed it, but I don't know how to call it within R.

    1. I have the same problem: I downloaded pdftotext but can't locate it anywhere on my computer.
      Help, anyone?

    2. Kay, thanks for that nice script! I adapted it for Mac usage: http://www.nicebread.de/parse-pdf-files-with-r-on-a-mac/trackback/

      Aaron, Sebastian, does that help you?

  2. Thanks for this script. One question: the command-prompt call that executes PDFtoText works, except that it does not open the named PDF file for conversion. PDFtoText opens blank (no files added for conversion). Is there something I'm missing? I'm using PDFtoText 1.20 on a Windows 7 machine. Thanks!

    1. I had similar problems until I changed the working directory to C:/
      > setwd("C:/")
      and removed tempfile from line 7:
      7 dest <- "romney.pdf"
      Michael

  3. Take a look: http://cran.r-project.org/web/packages/grImport/vignettes/import.pdf
