Ruby PDF text extractor

Kevin Olbrich

I notice that Ruby has lots of tools for creating PDF files. Are there any
that let you extract text from a PDF file?

_Kevin
 
Austin Ziegler

Kevin Olbrich said:
I notice that Ruby has lots of tools for creating PDF files. Are there any
that let you extract text from a PDF file?

Not yet. PDF::Writer will be refactored a little for version 2.0
(coming out later this year) so that it becomes three separate
components: PDF::Core (the core objects that represent a PDF object in
memory, plus rendering), PDF::Writer (the writer/layout code),
and PDF::Reader (reads a PDF object into an in-memory representation).
Much of the code for PDF::Core is already in place (it's currently
called PDF::Writer::Object or PDF::Writer::Objects), but nothing yet
explicitly represents it as a separate component.
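The three-way split described above can be illustrated with a small Ruby sketch of the layering: a core object model that knows how to render itself, a writer built on it, and a reader that parses rendered text back into the same core objects. Everything here is hypothetical: `PDFSketch` and its `Core`, `Writer`, and `Reader` modules are invented names that only mirror the roles in the plan, not PDF::Writer's actual API, and the reader handles only integer and literal-string indirect objects.

```ruby
# Hypothetical sketch: names mirror the planned Core/Writer/Reader roles,
# but none of this is the real PDF::Writer API.
module PDFSketch
  # Core: in-memory PDF objects plus rendering.
  module Core
    # An indirect object, rendered as "<id> <gen> obj ... endobj".
    IndirectObject = Struct.new(:id, :gen, :value) do
      def render
        "#{id} #{gen} obj\n#{Core.serialize(value)}\nendobj\n"
      end
    end

    # Serialize the two value types this sketch supports.
    def self.serialize(value)
      case value
      when Integer then value.to_s
      # Literal strings go in parentheses; escape (, ) and backslash.
      when String then "(#{value.gsub(/([()\\])/) { "\\#{$1}" }})"
      else raise ArgumentError, "unsupported type: #{value.class}"
      end
    end
  end

  # Writer: builds core objects and renders them in order.
  module Writer
    def self.write(values)
      values.each_with_index.map do |v, i|
        Core::IndirectObject.new(i + 1, 0, v).render
      end.join
    end
  end

  # Reader: parses rendered text back into core objects.
  module Reader
    OBJ_RE = /(\d+)\s+(\d+)\s+obj\s*(.*?)\s*endobj/m

    def self.read(source)
      source.scan(OBJ_RE).map do |id, gen, body|
        Core::IndirectObject.new(id.to_i, gen.to_i, parse(body))
      end
    end

    def self.parse(body)
      if body =~ /\A-?\d+\z/
        body.to_i
      elsif body.start_with?("(") && body.end_with?(")")
        # Strip the parentheses and undo the escaping done by serialize.
        body[1..-2].gsub(/\\([()\\])/, '\1')
      else
        raise ArgumentError, "unsupported token: #{body}"
      end
    end
  end
end
```

Round-tripping through both halves, `PDFSketch::Reader.read(PDFSketch::Writer.write([42, "Hello (world)"])).map(&:value)` returns `[42, "Hello (world)"]`. Keeping the object model in its own module is what lets the writer and reader share it; a real reader would also need the cross-reference table, streams, and compression filters, which this sketch ignores.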

PDF::Reader will probably be released in early 2006, depending on how
long it takes to refactor the code that already exists, properly
extend it, and get the necessary PDF::Writer code finished.

-austin
-- 
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 
Kevin Olbrich

Thanks, I'll keep my eyes open for it.

_Kevin

-----Original Message-----
From: Austin Ziegler
Sent: Saturday, August 13, 2005 01:45 PM
To: ruby-talk ML
Subject: Re: Ruby PDF text extractor

Andreas Schrafl

I once wrote a Ruby PDF text extractor while working at ywesee.

I thought they released it on RubyForge, but I can't find it anymore.
Perhaps if you contact them, they can help you.
www.ywesee.com

Greetings
Andy
 
Martin DeMello

Austin Ziegler said:
PDF::Reader will probably be released in early 2006, depending on how
long it takes to refactor the code that already exists, properly
extend it, and get the necessary PDF::Writer code finished.

I'd be interested in helping with this.

martin