Text extraction from MS Office and PDF

Vitali

Hello,

I'm looking for libraries to do text extraction from MS Office and PDF
file formats. I'm also looking for libraries to do HTML rendering of
documents in the same formats. I know of a couple of commercial
libraries from Oracle and Autonomy, but they only have C and/or Java
APIs. I also found this project: http://poi.apache.org/poi-ruby.html.
Are there other open source alternatives, and/or alternatives with Ruby
bindings?

Thanks,
Vitali
 
BruceL

I am using the standard 'spreadsheet' library to load Excel
2003 .xls files with Ruby.
It's not pretty, but it works.
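
For reference, a rough sketch of what that can look like, assuming the 'spreadsheet' gem is installed (gem install spreadsheet). The helper names rows_to_text and extract_xls_text are made up for illustration and are not part of the gem. It only covers .xls files, not Word or PDF.

```ruby
# Turn rows of cells into tab-separated lines of text.
# Works on any list of arrays, so it can be tried without the gem.
def rows_to_text(rows)
  rows.map { |row| row.map { |cell| cell.to_s.strip }.join("\t").rstrip }
      .reject(&:empty?)
      .join("\n")
end

# Hypothetical helper: read every worksheet of an .xls file as plain text.
# Assumes the 'spreadsheet' gem; it is loaded here so that rows_to_text
# stays usable on its own.
def extract_xls_text(path)
  require 'spreadsheet'
  book = Spreadsheet.open(path)
  book.worksheets.map do |sheet|
    rows = []
    sheet.each { |row| rows << row.to_a } # each row behaves like an Array
    rows_to_text(rows)
  end.join("\n\n")
end
```

Calling extract_xls_text('report.xls') (the file name is only an example) should give one block of text per worksheet, with cells separated by tabs and blank rows dropped.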