read pdf header in c

Discussion in 'C Programming' started by Rudra Banerjee, Sep 18, 2012.

  1. Can anyone kindly show me the steps required to read pdf headers in human readable format?
    Rudra Banerjee, Sep 18, 2012
    #1

  2. Rudra Banerjee

    Paul Guest

    Rudra Banerjee wrote:
    > Can anyone kindly show me the steps required to read pdf headers in human readable format?


    Adobe offers documentation.

    http://www.adobe.com/devnet/pdf/pdf_reference_archive.html

    There is this 1310 page book.

    http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf

    For comparison, you can also get yourself a copy of the
    PostScript Language Reference Manual (PLRM.pdf).

    http://www.adobe.com/products/postscript/pdfs/PLRM.pdf

    In the beginning, there was PostScript. It's documented in
    PLRM.pdf and it's a language of its own. PDF builds on those
    concepts.

    Many tools that produce PDF emit compressed binary output, which
    is hard for humans to read. There is also the option of writing
    the file in a text form (less compressed, perhaps). A tool such as
    Ghostscript can help with that transformation, and the Ghostscript
    source code will teach you a lot about PDF and PostScript in general.

    http://stackoverflow.com/questions/...o-ascii-ansi-so-i-can-look-at-it-in-a-text-ed

    gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

    It's unclear to me what you mean by "headers" in this context.
    PDF defines setup and subroutines ahead of the definition
    of the actual pages, but that's not particularly useful.
    You could also be referring to tagging information, and
    that may be OS specific for all I know.
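
If "header" is taken in the narrowest sense, the first line of a PDF file is a comment of the form `%PDF-1.7` that carries the version number. The sketch below reads only that line; it is an illustration under that assumption, not a general PDF parser, and the function names `parse_pdf_header` and `read_pdf_version` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse a PDF header line such as "%PDF-1.7".
 * Returns 1 and stores the version on success, 0 otherwise. */
int parse_pdf_header(const char *line, int *major, int *minor)
{
    if (strncmp(line, "%PDF-", 5) != 0)
        return 0;                       /* not a PDF header */
    return sscanf(line + 5, "%d.%d", major, minor) == 2;
}

/* Read the first few bytes of the file at `path` and parse them
 * as a PDF header. Returns 1 on success, 0 on any failure. */
int read_pdf_version(const char *path, int *major, int *minor)
{
    char line[32];
    FILE *fp = fopen(path, "rb");       /* binary mode: PDF is not plain text */
    if (fp == NULL)
        return 0;
    size_t n = fread(line, 1, sizeof line - 1, fp);
    fclose(fp);
    line[n] = '\0';                     /* make the buffer a valid C string */
    return parse_pdf_header(line, major, minor);
}
```

A caller would pass a file name and two `int` pointers, for example `read_pdf_version("your-input.pdf", &major, &minor)`, and print the two numbers when the function returns 1.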

    In any case, have fun.

    Paul
    Paul, Sep 18, 2012
    #2

  3. On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
    > Can anyone kindly show me the steps required to read pdf headers in human
    > readable format?
    >

    PDF is a binary format. To read any binary format, you need to have a copy
    of the format specification. That tells you how the bits are to be interpreted.

    With PDF, the gross file structure is quite straightforward. Whilst I forget
    the details, basically you have a tag which tells you what type of data the
    section is (text, image, font, copyright notice, etc.), then you have the
    length of the data, then you have the data itself.
    However, the data itself is usually compressed using zlib. Whilst it is
    possible to write your own decompressor, this is a major undertaking.
    Usually the only realistic option is to use a library.
    What this means is that whilst you can get an idea of what a PDF file
    contains, you can't easily read the actual data, certainly not with your own
    little scratch program.

    --
    http://www.malcolmmclean.site11.com/www
    Malcolm McLean, Sep 18, 2012
    #3
  4. Rudra Banerjee

    James Kuyper Guest

    On 09/18/2012 03:42 PM, Malcolm McLean wrote:
    ....
    > However, the data itself is usually compressed using zlib. Whilst it is
    > possible to write your own decompressor, this is a major undertaking.
    > Usually the only realistic option is to use a library.
    > What this means is that whilst you can get an idea of what a PDF file
    > contains, you can't easily read the actual data, certainly not with your own
    > little scratch program.


    I'd expect zlib to include decompression algorithms, and a quick look at
    the zlib documentation seems to confirm this expectation, so it should
    be relatively easy to write a scratch program linked to zlib for reading
    the actual data. I've never actually tried it - am I missing something?
    James Kuyper, Sep 18, 2012
    #4
  5. Thanks to all of you.
    But first of all, it seems I need a thorough knowledge of the PDF file structure, as Paul suggested.
    :(
    Rudra Banerjee, Sep 18, 2012
    #5
  6. Rudra Banerjee

    Jorgen Grahn Guest

    On Tue, 2012-09-18, Malcolm McLean wrote:
    > On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
    >> Can anyone kindly show me the steps required to read pdf headers in human
    >> readable format?
    >>

    > PDF is a binary format. To read any binary format, you need to have a copy
    > of the format specification.


    And that's true for text-based formats as well. It's just more
    tempting to rely on guesswork in that case: "all C programs start with
    a few #include lines, because all of those I looked at did".

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Sep 20, 2012
    #6