read pdf header in c

Rudra Banerjee

Can anyone kindly show me the steps required to read pdf headers in human readable format?
 
Paul

Rudra said:
Can anyone kindly show me the steps required to read pdf headers in human readable format?

Adobe offers documentation.

http://www.adobe.com/devnet/pdf/pdf_reference_archive.html

There is this 1310 page book.

http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf

For comparison, you can also get yourself a copy of the
PostScript Language Reference Manual (PLRM.pdf).

http://www.adobe.com/products/postscript/pdfs/PLRM.pdf

In the beginning, there was PostScript. It's documented in
PLRM.pdf and it's a language of its own. PDF builds on those
concepts.

Many tools that produce PDF emit output in a binary format,
which is hard for humans to read. But there is also an option
to output in a text format (less compressed, perhaps). A tool
such as Ghostscript can help with such a transformation, and
the Ghostscript source code will teach you a lot about PDF
and PostScript in general.

http://stackoverflow.com/questions/...o-ascii-ansi-so-i-can-look-at-it-in-a-text-ed

gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

It's unclear to me what you mean by "headers" in this context.
PDF defines setup and subroutines ahead of the definition
of the actual pages, but that's not particularly useful.
You could also be referring to tagging information, and
that may be OS specific for all I know.
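
One possible reading of "header" is the file header line described in the PDF reference: a comment of the form %PDF-M.N (for example %PDF-1.7) at the start of the file. Under that reading, a minimal sketch in C could look like the following. The function names parse_pdf_version and read_pdf_version are hypothetical, and the code assumes the marker sits at byte 0 with a single-digit major and minor version; real files may not always satisfy that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse a "%PDF-M.N" marker at the start of buf.
   Assumes single-digit major and minor numbers.
   Returns 1 and stores the digits on success, 0 otherwise. */
int parse_pdf_version(const char *buf, size_t len, int *major, int *minor)
{
    static const char magic[] = "%PDF-";
    const size_t magic_len = sizeof magic - 1;

    assert(buf != NULL && major != NULL && minor != NULL);

    /* Need "%PDF-" followed by digit, '.', digit. */
    if (len < magic_len + 3 || memcmp(buf, magic, magic_len) != 0)
        return 0;

    const char *v = buf + magic_len;
    if (v[0] < '0' || v[0] > '9' || v[1] != '.' || v[2] < '0' || v[2] > '9')
        return 0;

    *major = v[0] - '0';
    *minor = v[2] - '0';
    return 1;
}

/* Hypothetical wrapper: read the first bytes of a file in binary mode
   and hand them to parse_pdf_version(). Returns 0 if the file cannot
   be opened or does not start with the marker. */
int read_pdf_version(const char *path, int *major, int *minor)
{
    char buf[16];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return parse_pdf_version(buf, n, major, minor);
}
```

Calling read_pdf_version("your-input.pdf", &major, &minor) and printing the two integers on success would give the version in human-readable form. This only covers the first line of the file; anything beyond that (objects, cross-reference table, trailer) needs the reference manual.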

In any case, have fun.

Paul
 
Malcolm McLean

On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
Can anyone kindly show me the steps required to read pdf headers in human
readable format?
PDF is a binary format. To read any binary format, you need to have a copy
of the format specification. That tells you how the bits are to be interpreted.

With PDF, the gross file structure is quite straightforward. Whilst I forget
the details, basically you have a tag which tells you what type of data the
section is (text, image, font, copyright notice, etc), then you have the
length of the data, then you have the data itself.
However, the data itself is usually compressed, using zlib. Whilst it is
possible to write your own decompressor, this is a major undertaking.
Usually the only realistic option is to use a library.
What this means is that whilst you can get an idea of what a PDF file
contains, you can't easily read the actual data, certainly not with your own
little scratch program.
 
James Kuyper

On 09/18/2012 03:42 PM, Malcolm McLean wrote:
....
However, the data itself is usually compressed, using zlib. Whilst it is
possible to write your own decompressor, this is a major undertaking.
Usually the only realistic option is to use a library.
What this means is that whilst you can get an idea of what a PDF file
contains, you can't easily read the actual data, certainly not with your own
little scratch program.

I'd expect zlib to include decompression algorithms, and a quick look at
the zlib documentation seems to confirm this expectation, so it should
be relatively easy to write a scratch program linked to zlib for reading
the actual data. I've never actually tried it - am I missing something?
 
Rudra Banerjee

Thanks to all of you.
But first of all, it seems I need a profound knowledge of PDF file structure, as Paul suggested.
:(
 
Jorgen Grahn

Malcolm McLean wrote:
PDF is a binary format. To read any binary format, you need to have a copy
of the format specification.

And that's true for text-based formats as well. It's just more
tempting to rely on guesswork in that case: "all C programs start with
a few #include lines, because all of those I looked at did".

/Jorgen
 
