Text extraction from MS Office and PDF

Discussion in 'Ruby' started by Vitali, Jul 24, 2010.

  1. Vitali

    Vitali Guest

    Hello,

    I'm looking for libraries to do text extraction from MS Office and PDF
    file formats. I'm also looking for libraries to do HTML rendering of
    documents in the same formats. I know of a couple of commercial
    libraries from Oracle and Autonomy, but they only have C and/or Java
    APIs. I also found this project: http://poi.apache.org/poi-ruby.html.
    Are there other open source alternatives, and/or alternatives with Ruby
    bindings?

    Thanks,
    Vitali
     
    Vitali, Jul 24, 2010
    #1

  2. BruceL

    BruceL Guest

    I am using the 'spreadsheet' library (a Ruby gem) to read Excel
    2003 .xls files with Ruby.
    It's not pretty, but it works.
     
    BruceL, Jul 28, 2010
    #2
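
For readers who want to try the approach BruceL describes, here is a minimal sketch of pulling plain text out of an .xls file with the 'spreadsheet' gem. It assumes the gem is installed (gem install spreadsheet). The helper names rows_to_text and xls_text are made up for illustration and are not part of the gem. Cell values are simply converted with to_s, so dates and formulas may not come out the way you want.

```ruby
# Turn an enumerable of rows (arrays of cell values) into plain text:
# one line per row, cells separated by tabs, fully empty rows skipped.
# rows_to_text is a hypothetical helper name, not part of any library.
def rows_to_text(rows)
  lines = []
  rows.each do |row|
    cells = row.to_a.map { |cell| cell.to_s.strip }
    next if cells.all?(&:empty?)
    lines << cells.join("\t")
  end
  lines.join("\n")
end

# Read every worksheet of an .xls workbook and return its text.
# xls_text is also a hypothetical name. The gem is required lazily so
# that rows_to_text can be used (and tested) without it installed.
def xls_text(path)
  require 'spreadsheet' # third-party gem, assumed to be installed
  book = Spreadsheet.open(path)
  book.worksheets.map { |sheet| rows_to_text(sheet) }.join("\n")
end
```

Usage would look something like puts xls_text('report.xls'), where report.xls is a placeholder file name. This only covers the old binary .xls format; it does not address Word, PowerPoint, or PDF files from the original question.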
