Accessing PDF Metadata and Page Thumbnails


Ben Gribaudo

Hello,

I am putting together a PDF archive of our corporate newsletters. I'd
like to iterate through a directory of PDFs, read their metadata (title,
description, etc.) and use this info to dynamically generate an RHTML
index page. There are several Ruby PDF libraries out there but they seem
inclined towards creating PDFs instead of reading them. Any
recommendations on a library to read PDF metadata?

It would be neat to not only read metadata but also to pull the PDF's
first page's thumbnail out as an image. This would allow dynamic
creation of an index page that looks like this:
http://www.reviveourhearts.com/difference/newsletter/newsletter_archive.php

Any thoughts?

Thanks,
Ben
 

Eugen Minciu

Excerpts from Ben Gribaudo's message of Thu Jul 26 19:33:32 +0300 2007:
> There are several Ruby PDF libraries out there but they seem
> inclined towards creating PDFs instead of reading them. Any
> recommendations on a library to read PDF metadata?

Have a look at http://extractor.rubyforge.org . You need libextractor
and its headers to compile it though. Would that work for you?
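
If compiling a native extension is not an option, the directory walk and index generation can be roughed out in plain Ruby. This is only a sketch under stated assumptions: `read_title` and `build_index` are hypothetical names, and `read_title` is not a real PDF parser. It just scans the raw bytes for an uncompressed `/Title (...)` literal string and falls back to the file name, so it will miss titles stored as hex or UTF-16 strings, in compressed object streams, or in XMP metadata. The template uses ERB, which is what RHTML files are.

```ruby
require "erb"

# Naive title lookup (an assumption, not a real PDF parser): scan the raw
# bytes for an uncompressed "/Title (...)" literal string. Falls back to the
# file name when nothing matches.
def read_title(path)
  data = File.binread(path)
  match = data.match(%r{/Title\s*\(((?:\\.|[^\\)])*)\)}n)
  return File.basename(path, ".pdf") unless match

  # Undo only the simplest escapes: \( \) and \\ ; other escapes are left as-is.
  title = match[1].gsub(/\\([()\\])/n) { $1 }
  # Treat the bytes as UTF-8 and replace any invalid sequences with "?".
  title.force_encoding(Encoding::UTF_8).scrub("?")
end

# ERB template for the index: one list item per PDF.
INDEX_TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
  <ul>
  <%- entries.each do |entry| -%>
    <li><a href="<%= ERB::Util.url_encode(entry[:file]) %>"><%= ERB::Util.html_escape(entry[:title]) %></a></li>
  <%- end -%>
  </ul>
HTML

# Collect every *.pdf in dir (sorted by name) and render the index HTML.
def build_index(dir)
  entries = Dir.glob(File.join(dir, "*.pdf")).sort.map do |path|
    { file: File.basename(path), title: read_title(path) }
  end
  INDEX_TEMPLATE.result_with_hash(entries: entries)
end
```

Calling `puts build_index("newsletters")` on a directory of PDFs (the directory name here is made up) would print the list markup. Swapping `read_title` for a call into a proper metadata library should not require touching the template.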
 
