SAX - is there an equivalent to the DOM .nodeTypedValue for reading the whole node data at once?

Discussion in 'XML' started by jimmyfishbean@yahoo.co.uk, Sep 9, 2005.

  1. Guest

    Hi,

    I am using VB6 with SAX (implementing IVBSAXContentHandler).

    I need to extract base64-encoded binary data (images) from large XML
    files, decode it, and write the resulting files to disk. My XML files
    have the following structure:

    <?xml version="1.0" encoding="utf-8" ?>
    <imagepla xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <attachment>
    <primary_id>28899</primary_id>
    <filename>userguide3.pdf</filename>
    <file
    dt:dt="bin.base64">JVBERi0xLjMNJeLjz9MNCjU5NTAgMCBvYmoNPDwgDS9MaW5lYXJpemVkIDEgDS9PIDU5NTMgDS9I
    IFsgMTM4OSAzODY0IF0gDS9MIDUwNTEyOTggDS9FIDEwMTQ3NCANL04gMTUzIA0vVCA0OTMyMTc4
    .........
    ...................
    </file>
    </attachment>
    <attachment>
    ......
    ......
    </attachment>
    </imagepla>

    The encoded data (in the <file> element) needs to be extracted and then
    decoded. I am trying to use SAX, but I cannot read the whole of the
    <file> element data at once (with DOM I would use the node's
    nodeTypedValue property). I understand that the DOM loads the whole
    document into memory, which is why nodeTypedValue can be used there.

    I am using the following extract of code:

    Dim strTmp As String
    Dim btArr() As Byte

    Private Sub IVBSAXContentHandler_characters(text As String)
    ...
    strTmp = strTmp & text
    ...
    btArr = strTmp
    Open strAttFile For Binary As #1
    Put #1, 1, btArr
    Close #1
    ...
    End Sub

    The problem is that only one line of the <file> node data is passed to
    this sub at a time. I therefore have to rebuild the whole of the binary
    data for the image in a temporary variable (strTmp) until I detect the
    end of the file, and only then write it to disk.

    This takes a vast amount of time (e.g. 20 minutes to try and decode a
    4MB image). The XML file will contain hundreds of images, so the
    current way of processing is no good at all.


    Is there a way to read the whole of the data from the <file> node in
    one go?
    Also, I will be extracting the binary data and then use DOM to rewrite
    the XML file without the binary data (so the user has a copy of the
    original XML file - but a much smaller one since no binary in it).
    Should I use DOM or SAXReader/SAXWriter?

    Greatly appreciated. Thanks.

    Jimmy
    , Sep 9, 2005
    #1

  2. wrote:
    : Is there a way to read the whole of the data from the <file> node in
    : one go?

    In SAX in general you can never be sure of receiving all of an
    element's character data in one call, though there is a slim chance
    that the SAX module you have available in VB has an option to do that
    (I have no idea, and I wouldn't count on it).

    But why do you need to read the whole thing into memory? Base64 can be
    decoded on the fly. Each sequence of four characters gives you three
    bytes of data. Read a chunk, decode as many complete groups of four
    characters as it contains, and write the result out. You will have to
    hold over the last few characters from one read to the next so that
    each decode gets a multiple of four, and skip any line breaks in the
    data.
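
    A minimal VB6 sketch of that hold-over idea, under some assumptions:
    the project references Microsoft XML (MSXML 3.0 or later), and an
    MSXML element with dataType "bin.base64" is borrowed purely as a
    base64 decoder. ResetDecoder, DecodeChunk, m_carry and the "b64"
    element name are made up for illustration.

    ```vb
    ' Module or class fragment; all names are illustrative, not from the post.
    Option Explicit

    Private m_doc As MSXML2.DOMDocument
    Private m_node As MSXML2.IXMLDOMElement  ' used only as a base64 decoder
    Private m_carry As String                ' 0 to 3 characters held over

    ' Call once before each <file> element.
    Public Sub ResetDecoder()
        Set m_doc = New MSXML2.DOMDocument
        Set m_node = m_doc.createElement("b64")
        m_node.dataType = "bin.base64"
        m_carry = ""
    End Sub

    ' Decodes every complete 4-character group in (carry + chunk).
    ' Returns False when there is not yet a complete group to decode.
    Public Function DecodeChunk(ByVal chunk As String, bytes() As Byte) As Boolean
        Dim s As String
        Dim usable As Long

        ' Remove whitespace so groups of four line up across line breaks.
        s = Replace(chunk, vbCr, "")
        s = Replace(s, vbLf, "")
        s = Replace(s, vbTab, "")
        s = Replace(s, " ", "")
        s = m_carry & s

        usable = Len(s) - (Len(s) Mod 4)
        m_carry = Mid$(s, usable + 1)        ' leftover for the next call
        If usable = 0 Then Exit Function

        m_node.Text = Left$(s, usable)
        bytes = m_node.nodeTypedValue        ' Variant holding a Byte array
        DecodeChunk = True
    End Function
    ```

    The characters event would then call DecodeChunk with each piece of
    text and Put the returned bytes to a file that is already open.
    Anything still in m_carry when the element ends would suggest
    truncated or malformed data.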

    And where is the slowdown? I suspect that the string concatenation is
    to blame. VB may be allocating a longer string each time and then
    copying all the existing data plus the appended data into it. If you
    keep doing that for an eventually large string it could get very slow.
    Can you preallocate a much larger string and use the Mid statement to
    push the data into that single large string? (VB has no substr; the
    form is Mid(the_line, offset, len) = data_to_insert.)
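
    A rough sketch of that preallocation in VB6, using the Mid$ statement
    to copy each chunk into place. The names m_buf, m_len, AppendText and
    BufferText are illustrative, and the 64K starting size is an
    arbitrary assumption.

    ```vb
    Option Explicit

    Private m_buf As String   ' preallocated buffer, padded with spaces
    Private m_len As Long     ' number of characters actually in use

    ' Appends chunk to the buffer, doubling the buffer when it is full.
    Public Sub AppendText(ByVal chunk As String)
        Dim needed As Long
        Dim newSize As Long

        If Len(chunk) = 0 Then Exit Sub
        needed = m_len + Len(chunk)

        If needed > Len(m_buf) Then
            ' Grow geometrically so reallocations stay rare.
            newSize = Len(m_buf) * 2
            If newSize < needed Then newSize = needed
            If newSize < 65536 Then newSize = 65536
            m_buf = m_buf & Space$(newSize - Len(m_buf))
        End If

        ' Mid$ as a statement overwrites characters in place.
        Mid$(m_buf, m_len + 1, Len(chunk)) = chunk
        m_len = needed
    End Sub

    ' Returns the accumulated text without the unused tail.
    Public Function BufferText() As String
        BufferText = Left$(m_buf, m_len)
    End Function
    ```

    Setting m_len back to 0 between elements would reuse the same buffer
    for the next image instead of allocating again.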


    : Also, I will be extracting the binary data and then use DOM to rewrite
    : the XML file without the binary data (so the user has a copy of the
    : original XML file - but a much smaller one since no binary in it).
    : Should I use DOM or SAXReader/SAXWriter?

    If you are not changing anything else in the XML except removing the
    file data (and possibly replacing that one element), then I would
    think it easiest to use a SAX approach. As you read the data you also
    spool it back out, except for that one element. I suppose a SAXWriter
    would help do that.
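
    One way this might look in VB6 is a small pass-through class that
    forwards every SAX event to another content handler and swallows
    everything inside <file>. This is a sketch only: it assumes a
    reference to Microsoft XML (MSXML 3.0 or later) and that the target
    is an MXXMLWriter. The class layout, Target and m_skip are
    illustrative names.

    ```vb
    ' Class module (name is up to you); forwards events, dropping <file>.
    Option Explicit
    Implements MSXML2.IVBSAXContentHandler

    Public Target As MSXML2.IVBSAXContentHandler  ' e.g. an MXXMLWriter
    Private m_skip As Long                         ' > 0 while inside <file>

    Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
        Set Target.documentLocator = RHS
    End Property

    Private Sub IVBSAXContentHandler_startDocument()
        Target.startDocument
    End Sub

    Private Sub IVBSAXContentHandler_endDocument()
        Target.endDocument
    End Sub

    Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
        Target.startPrefixMapping strPrefix, strURI
    End Sub

    Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
        Target.endPrefixMapping strPrefix
    End Sub

    Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, _
            strLocalName As String, strQName As String, _
            ByVal oAttributes As MSXML2.IVBSAXAttributes)
        ' Start skipping at <file>, and keep counting nested elements.
        If m_skip > 0 Or strLocalName = "file" Then
            m_skip = m_skip + 1
            Exit Sub
        End If
        Target.startElement strNamespaceURI, strLocalName, strQName, oAttributes
    End Sub

    Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, _
            strLocalName As String, strQName As String)
        If m_skip > 0 Then
            m_skip = m_skip - 1
            Exit Sub
        End If
        Target.endElement strNamespaceURI, strLocalName, strQName
    End Sub

    Private Sub IVBSAXContentHandler_characters(strChars As String)
        If m_skip = 0 Then Target.characters strChars
    End Sub

    Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
        If m_skip = 0 Then Target.ignorableWhitespace strChars
    End Sub

    Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
        If m_skip = 0 Then Target.processingInstruction strTarget, strData
    End Sub

    Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
        If m_skip = 0 Then Target.skippedEntity strName
    End Sub
    ```

    A driver would create the writer, assign it to Target with Set, and
    install this object as the reader's contentHandler before parsing.
    The same class could also do the image extraction in its skipped
    branch, so the file is only read once.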


    $0.10

    --

    This programmer available for rent.
    Malcolm Dew-Jones, Sep 9, 2005
    #2

  3. kryptomoon Guest

    Try NOT to open/close the file on each "characters" event.
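
    Something along these lines, as a sketch: open the file when <file>
    starts, write in every "characters" event, and close when the element
    ends. It is a fragment of a class that Implements
    MSXML2.IVBSAXContentHandler. Only three members are shown, and VB6
    needs the other interface members to exist too, even as empty stubs.
    The "C:\out\" folder and the m_ variable names are assumptions.

    ```vb
    Private m_current As String   ' local name of the element being read
    Private m_fileName As String  ' text collected from <filename>
    Private m_fnum As Integer     ' 0 while no output file is open

    Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, _
            strLocalName As String, strQName As String, _
            ByVal oAttributes As MSXML2.IVBSAXAttributes)
        m_current = strLocalName
        If strLocalName = "filename" Then m_fileName = ""
        If strLocalName = "file" Then
            ' Open once per attachment. Binary mode does not truncate an
            ' existing file, so remove any old copy first.
            If Dir$("C:\out\" & m_fileName) <> "" Then Kill "C:\out\" & m_fileName
            m_fnum = FreeFile
            Open "C:\out\" & m_fileName For Binary Access Write As #m_fnum
        End If
    End Sub

    Private Sub IVBSAXContentHandler_characters(strChars As String)
        Select Case m_current
        Case "filename"
            m_fileName = m_fileName & strChars   ' short, so & is fine here
        Case "file"
            ' Writes the text as received (still base64) at the current
            ' file position. A real handler would decode the chunk first
            ' and Put the resulting bytes instead.
            If m_fnum <> 0 Then Put #m_fnum, , strChars
        End Select
    End Sub

    Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, _
            strLocalName As String, strQName As String)
        If strLocalName = "file" And m_fnum <> 0 Then
            Close #m_fnum
            m_fnum = 0
        End If
        m_current = ""
    End Sub
    ```

    Leaving out the record number in Put makes each write continue where
    the last one ended, rather than overwriting from position 1 each time.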
    kryptomoon, Sep 13, 2005
    #3
