Extract PDF content?

EdUarDo

Hi all,

Is there any gem or library that can extract text from a PDF file? Are there any for Word or OpenOffice files?
 
akbarhome

Reading PDFs with pure Ruby? No. For now there are only libraries for creating PDFs.

Reading Word? I don't know.
 
Dave Burt

EdUarDo said:
Is there any gem or library that can extract text from a PDF file?
Are there any for Word or OpenOffice files?

You can use Windows Automation, the WIN32OLE library, and Microsoft Word
to open a Word document and use "Save As" to produce a plain text file
or expose the contents programmatically.
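
A minimal sketch of that approach, assuming a Windows machine with Microsoft Word installed. The method names `text_path_for` and `word_to_text` are made up for illustration, and the value 2 passed to `SaveAs` is Word's `wdFormatText` constant. Treat it as a starting point rather than a tested solution.

```ruby
# Hypothetical helper: derive the output path by swapping the extension for .txt.
def text_path_for(doc_path)
  doc_path.chomp(File.extname(doc_path)) + ".txt"
end

# Opens a Word document through OLE automation, saves a plain-text copy
# next to it, and returns the document text.
# Assumption: Windows with Microsoft Word installed.
def word_to_text(doc_path)
  require "win32ole" # loaded lazily so the file still loads on other platforms

  # Word expects absolute, backslash-separated paths.
  source = File.expand_path(doc_path).tr("/", "\\")
  target = text_path_for(source)

  word = WIN32OLE.new("Word.Application")
  word.Visible = false
  begin
    doc = word.Documents.Open(source)
    text = doc.Content.Text # contents exposed programmatically
    doc.SaveAs(target, 2)   # 2 = wdFormatText, i.e. "Save As" plain text
    doc.Close(false)        # close without prompting to save changes
    text
  ensure
    word.Quit
  end
end
```

Calling `word_to_text("report.doc")` should leave a `report.txt` beside the original and return the same text as a string.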

Cheers,
Dave
 
Martin DeMello

Jon Wood said:
I don't know about PDFs, but there are several programs available that
can convert a Word file into HTML - you'll probably lose formatting,
but you should then be able to process the file like any other XML to
extract the text content from it.
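
As a rough sketch of the second half of that idea, the snippet below pulls the text out of converter output with REXML. It assumes the converter produced well-formed XHTML; plain HTML may need tidying before an XML parser accepts it. The method name `text_from_xhtml` and the sample string are made up for illustration.

```ruby
require "rexml/document"

# Collects every text node of a well-formed XHTML string and returns the
# text with runs of whitespace collapsed to single spaces.
def text_from_xhtml(xhtml)
  doc = REXML::Document.new(xhtml)
  pieces = REXML::XPath.match(doc, "//text()").map(&:value)
  pieces.join(" ").gsub(/\s+/, " ").strip
end

# Illustrative input, standing in for a converted Word document.
sample = "<html><body><h1>Report</h1><p>Hello &amp; welcome</p></body></html>"
puts text_from_xhtml(sample)
```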

There are some command-line switches available for OpenOffice too:
http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html

You should be able to script it to open the file and save as text.
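
One way to script it from Ruby might look like the sketch below. It assumes a `soffice` binary on the PATH that understands the `--headless`, `--convert-to` and `--outdir` switches (as LibreOffice does), which is not necessarily the approach the linked article takes. The method names are made up for illustration.

```ruby
# Builds the argument list for a headless conversion to plain text.
# Assumption: a `soffice` binary that supports --headless and --convert-to.
def soffice_text_command(input_path, out_dir)
  ["soffice", "--headless", "--convert-to", "txt:Text",
   "--outdir", out_dir, input_path]
end

# Runs the conversion and returns the extracted text, or nil when the
# command is missing, fails, or produces no output file.
def office_file_to_text(input_path, out_dir = Dir.pwd)
  return nil unless system(*soffice_text_command(input_path, out_dir))

  txt_path = File.join(out_dir, File.basename(input_path, ".*") + ".txt")
  File.exist?(txt_path) ? File.read(txt_path) : nil
end
```

Passing the arguments to `system` as separate strings avoids shell quoting problems with file names that contain spaces.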

martin
 
