XML and PDF...

Discussion in 'XML' started by Peter Flynn, Mar 25, 2005.

  1. Peter Flynn

    Peter Flynn Guest

    Verner Jensen, Ålborg wrote:

    > Hi'
    >
    > Is it possible to store a PDF doc, as part of an XML?


    No, not directly.

    > Should the PDF-part
    > be encoded/wrapped or something,


    Yes, that's possible. You just have to ensure that the encoder never
    outputs non-XML characters, nor "<" or "&", unless you put its output in
    a CDATA section.

    > cause I can't figure out how the XML text
    > format is able to hold binary data?


    It can't. XML is a text file format.

    > The assignment is to extract the PDF from the XML - put it in an Oracle
    > BLOB - and store it in an Ora-DB.
    >
    > The part which extract the PDF from XML - should this contain some kind of
    > conversion (text => binary) ?


    The code which extracts the encoded data would trigger a decoder which
    would recreate the PDF document.

    I realise it's a college assignment, but I have difficulty imagining any
    circumstances in which I would want to do this. I'd be interested to know
    what the person who set the assignment envisages.

    ///Peter, java groups removed from posting
    --
    sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
    &;top"
     
    Peter Flynn, Mar 25, 2005
    #1

  2. In article <TQT0e.107429$>,
    Verner Jensen, Ålborg <> wrote:

    % Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
    % encoded/wrapped or something, cause I can't figure out how the XML text
    % format is able to hold binary data?

    It's typical to use MIME base-64 encoding to encode binary data in XML
    files.
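
    A minimal sketch of why base64 suits this job, assuming Python and its
    standard base64 module (the helper name is_xml_safe_base64 is made up for
    illustration): the encoded output only ever uses letters, digits, "+",
    "/" and "=", so it can never contain "<" or "&".

```python
import base64
import string

# The 64 symbols of the standard base64 alphabet, plus "=" padding.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def is_xml_safe_base64(data: bytes) -> bool:
    """Encode data and confirm the result uses only the base64 alphabet."""
    encoded = base64.b64encode(data).decode("ascii")
    return set(encoded) <= BASE64_CHARS


# Every possible byte value, including ones XML cannot carry as-is.
all_bytes = bytes(range(256))
encoded_all = base64.b64encode(all_bytes).decode("ascii")

safe = is_xml_safe_base64(all_bytes)
has_markup = "<" in encoded_all or "&" in encoded_all
```

    Because of this, the encoded text can sit in ordinary element content
    without a CDATA section.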

    --

    Patrick TJ McPhee
    North York Canada
     
    Patrick TJ McPhee, Mar 25, 2005
    #2

  3. Hi'

    Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
    encoded/wrapped or something, cause I can't figure out how the XML text
    format is able to hold binary data?

    The assignment is to extract the PDF from the XML - put it in an Oracle
    BLOB - and store it in an Ora-DB.

    The part which extract the PDF from XML - should this contain some kind of
    conversion (text => binary) ?

    Any help, samples, eg. would be appreciated...
    Rgds, Henrik
     
    Verner Jensen, Ålborg, Mar 25, 2005
    #3
  4. Romin Irani

    Romin Irani Guest

    (Patrick TJ McPhee) wrote in message news:<>...
    > In article <TQT0e.107429$>,
    > Verner Jensen, Ålborg <> wrote:
    >
    > % Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
    > % encoded/wrapped or something, cause I can't figure out how the XML text
    > % format is able to hold binary data?
    >
    > It's typical to use MIME base-64 encoding to encode binary data in XML
    > files.


    Since PDF is a binary format, you have to encode it in a fashion that
    is compatible with text before inserting it into the XML instance. As
    correctly mentioned here, you should use base64 encoding for this.

    The process would roughly be the following:

    a) To encode the PDF
       1) Take the PDF content as bytes.
       2) Run it through a program / method which goes something like:
              PDFInBase64Bytes = convertToBase64(PDFBytes)
       3) Insert it into an XML instance after converting it to a string:
              <MyXMLDoc>
                <!-- other elements -->
                <PDFSegment>Base64 representation of PDF</PDFSegment>
              </MyXMLDoc>
    b) To decode the PDF
       1) Extract the value of the XML element <PDFSegment>.
       2) Do the reverse, i.e.
              PDFBytes = decodeFromBase64(<PDFSegment> value...)
       3) Provide the PDFBytes to a PDF-aware application, e.g. Adobe PDF
          Reader.
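
    The steps above can be sketched roughly as follows, assuming Python with
    its standard base64 and xml.etree.ElementTree modules. The function names
    embed_pdf and extract_pdf and the sample bytes are hypothetical stand-ins;
    the element names come from the outline above.

```python
import base64
import xml.etree.ElementTree as ET


def embed_pdf(pdf_bytes: bytes) -> str:
    """Steps a1-a3: base64-encode the bytes and place them in <PDFSegment>."""
    root = ET.Element("MyXMLDoc")
    segment = ET.SubElement(root, "PDFSegment")
    segment.text = base64.b64encode(pdf_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract_pdf(xml_text: str) -> bytes:
    """Steps b1-b2: read the <PDFSegment> text and decode it back to bytes."""
    root = ET.fromstring(xml_text)
    segment = root.find("PDFSegment")
    if segment is None or segment.text is None:
        raise ValueError("no <PDFSegment> content found")
    return base64.b64decode(segment.text)


# Stand-in bytes for illustration only; this is not a valid PDF file.
sample_pdf = b"%PDF-1.4\n\x00\xff binary stand-in, not a real PDF\n%%EOF\n"

xml_doc = embed_pdf(sample_pdf)
recovered = extract_pdf(xml_doc)
```

    The recovered bytes are what would then be written to a file or bound to
    the BLOB column; that part depends on the database driver in use and is
    not shown here.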

    There are several free base64 encoding/decoding libraries available on
    the net in a variety of languages. Pick one and try it out.

    We have used the above process as mentioned and it works fine.
     
    Romin Irani, Mar 26, 2005
    #4
  5. Thx a lot - fine description ;-)

    Rgds, Henrik

     
    Verner Jensen, Ålborg, Mar 26, 2005
    #5
  6. dc

    dc Guest

    here's an example of an XML doc that contains a PNG image, base64-encoded.
    http://dinoch.dyndns.org:7070/WordML/source/WordML/10555.xml

    here's the JSP that generates it:
    http://dinoch.dyndns.org:7070/WordML/srcview.jsp?dir=WordML&file=GetOrderConfXsl.jsp

    you can actually run the JSP and load that XML into MS Word and see the
    result of the image.
    http://dinoch.dyndns.org:7070/WordML/GetOrderConfXsl.jsp
    (need MS-Word installed to do this)

    -D

     
    dc, Mar 30, 2005
    #6
