Johan Holst Nielsen said:
David Boddie wrote:
Yep... I tried it... but there is no reason to do exactly the same thing if
other people have already done it. And time is an issue too.
Time is always an issue. How much of it do you have? ;-)
Well, let me know
Maybe I could get a demo or something? That would
be nice
You may be disappointed, but here it is:
http://www.boddie.org.uk/david/Projects/Python/pdftools/
The core of the library was written in a hurry over two years ago; later refinements
make it only slightly more robust. It was never really intended for anything other
than exploring the structure of PDF files.
Basic use:
import pdftools

file = "MyFile.pdf"
doc = pdftools.PDFdocument(file)
print "Document uses PDF format version", doc.document_version()

pages = doc.count_pages()
print "Document contains %i pages." % pages

# Only read page 123 if the document is long enough.
if pages > 123:
    page123 = doc.read_page(123)
    contents123 = page123.read_contents()
    print "The objects found in this page:"
    print
    print contents123.contents
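
The same steps can be wrapped in a small helper. This is only a sketch: page_objects is a hypothetical name, not part of pdftools, and it uses nothing beyond the calls shown above (count_pages, read_page, read_contents and the contents attribute). It keeps the same "pages > number" guard as the snippet, on the assumption that this is the right bounds check for read_page.

```python
def page_objects(doc, number):
    """Return the objects found on page `number`, or None if out of range.

    `doc` is assumed to behave like the pdftools.PDFdocument used above.
    """
    # Same guard as the snippet above (pages > 123 before read_page(123)).
    if doc.count_pages() > number:
        page = doc.read_page(number)
        return page.read_contents().contents
    return None
```

With a real file this would be called as page_objects(pdftools.PDFdocument("MyFile.pdf"), 123).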
I've not really dealt with the coordinate system very well. Ideally, it would be
trivial to extract all the device-independent positioning information but,
whenever I start to look at this, I get distracted.
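
For reference, the positioning model in PDF itself is compact. Each "cm" operator in a content stream supplies six numbers a b c d e f, which are premultiplied onto the current transformation matrix, and a point (x, y) then maps to (a*x + c*y + e, b*x + d*y + f). A minimal sketch of that arithmetic follows. The function names are made up for illustration, and none of this reflects how pdftools stores or exposes page contents.

```python
# Matrices are 6-tuples (a, b, c, d, e, f), standing for the 3x3 matrix
#   [a b 0]
#   [c d 0]
#   [e f 1]
# which acts on row vectors [x y 1], as in the PDF coordinate model.

IDENTITY = (1, 0, 0, 1, 0, 0)

def concat(m, ctm):
    """Return m x ctm, i.e. the new CTM after an 'a b c d e f cm' operator."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (a1 * a2 + b1 * c2,
            a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2,
            c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2,
            e1 * b2 + f1 * d2 + f2)

def apply(ctm, x, y):
    """Map a point from user space through the given matrix."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)

# Example: "1 0 0 1 10 20 cm" (translate) followed by "2 0 0 2 0 0 cm" (scale).
ctm = concat((1, 0, 0, 1, 10, 20), IDENTITY)
ctm = concat((2, 0, 0, 2, 0, 0), ctm)
point = apply(ctm, 1, 1)
```

Because each new matrix is applied before the existing one, the point (1, 1) is scaled first and then translated.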
Have fun, and don't expect too much,
David