Text extraction from MS Office and PDF

Discussion in 'Ruby' started by Vitali, Jul 24, 2010.

  1. Vitali

    Vitali Guest

    Hello,

    I'm looking for libraries to do text extraction from MS Office and PDF
    file formats. Also looking for libraries to do HTML rendering of
    documents in the same formats. I know of a couple of commercial
    libraries from Oracle and Autonomy, but they only have C and/or Java
    APIs. I also found this project http://poi.apache.org/poi-ruby.html.
    Are there other open source alternatives, and/or alternatives with Ruby
    bindings?

    Thanks,
    Vitali
    Vitali, Jul 24, 2010
    #1

  2. BruceL

    BruceL Guest

    I am using the standard 'spreadsheet' library to load data from Excel
    2003 .xls files with Ruby.
    It's not pretty, but it works.
    BruceL, Jul 28, 2010
    #2
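
For readers who want to try the approach BruceL describes, here is a minimal sketch of pulling plain text out of an .xls file. It assumes the third-party 'spreadsheet' gem is installed (gem install spreadsheet). The helper names rows_to_text and extract_xls_text are hypothetical and chosen only for this illustration; they are not part of the gem. The sketch covers only the legacy .xls format, not Word, PowerPoint, or PDF files.

```ruby
# Hypothetical helper: flatten rows (arrays of cell values) into tab-separated text.
# Nil and empty cells are skipped, and rows with no content are dropped.
def rows_to_text(rows)
  lines = rows.map do |row|
    row.to_a.compact.map(&:to_s).reject(&:empty?).join("\t")
  end
  lines.reject(&:empty?).join("\n")
end

# Hypothetical helper: read every worksheet of an .xls file and return its text.
# Assumes the 'spreadsheet' gem; it is required lazily so that rows_to_text
# can be used and tested without the gem installed.
def extract_xls_text(path)
  require 'spreadsheet'
  book = Spreadsheet.open(path)
  book.worksheets.map do |sheet|
    rows = []
    sheet.each { |row| rows << row } # each row behaves like an Array of cells
    rows_to_text(rows)
  end.join("\n\n")
end
```

Calling extract_xls_text with the path to a workbook should return one block of text per worksheet, separated by blank lines. Formula cells and dates are converted with to_s here, which may not give the display value Excel shows.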