converting org.w3c.dom.Element to String *without* losing whitespace

Discussion in 'XML' started by Adam Funk, Jan 18, 2010.

  1. Adam Funk


    I have a web-service client that gets an org.w3c.dom.Element out of
    the service's response. (The service contains an application that
    produces an XML document and returns its root element as the "answer"
    in the response.) I need to turn that Element into a String to
    display in the client GUI and to save as a file on the client machine.

    This Element has the xml:space="preserve" attribute (added by the
    application in the service).

    So far I have tried two things:

    // Approach 1: JAXP identity transform.
    // (transformerFactory was not shown in the post; assumed to be created like this.)
    javax.xml.transform.TransformerFactory transformerFactory = TransformerFactory.newInstance();
    javax.xml.transform.Source domSource = new DOMSource(xmlElement);
    StringWriter sw = new StringWriter();
    javax.xml.transform.stream.StreamResult streamResult = new StreamResult(sw);
    javax.xml.transform.Transformer identityTransform = transformerFactory.newTransformer();
    identityTransform.transform(domSource, streamResult);
    return sw.toString();

    // Approach 2: JDOM, using the raw output format.
    org.jdom.output.XMLOutputter outputter = new XMLOutputter(Format.getRawFormat());
    org.jdom.input.DOMBuilder builder = new DOMBuilder();
    return outputter.outputString(builder.build(xmlElement));

    Both approaches delete a lot of whitespace inside the root element,
    but I don't want this to happen. (The root element still has the
    xml:space="preserve" attribute.)

    I've had a look around on the web and found only pages telling you how
    to get rid of whitespace, but not how to force it to stay. This one
    [1] says that preserving whitespace is the default anyway.

    I'd appreciate any debugging suggestions or alternative approaches.

    Thanks,
    Adam


    [1]
    http://www.xml.com/pub/a/2001/11/07/whitespace.html
     
    Adam Funk, Jan 18, 2010
    #1

  2. I haven't used the jdom stuff -- I've always considered the arguments in
    its favor to be pretty much without merit. But the identity transform
    *should* be preserving whitespace everywhere except, possibly, in the
    area _preceding_ the root element; I don't see any immediately obvious
    problems with that code.

    If you're using the XSLT processor that ships with the Sun JVM, that may
    be a fairly ancient version of Xalan, with some known bugs. So the first
    thing I'd try would be to upgrade to a current copy of Xalan-j, from
    Apache, and see if the problem persists.
     
    Joe Kesselman, Jan 18, 2010
    #2

  3. Adam Funk


    Since my OP, I've also tried this, with the same result:

    // Approach 3: the Xerces serializer (x is presumably the same org.w3c.dom.Element).
    org.apache.xml.serialize.OutputFormat format = new OutputFormat();
    StringWriter sw = new StringWriter();
    org.apache.xml.serialize.XMLSerializer serial = new XMLSerializer(sw, format);
    serial.serialize(x);
    return sw.toString();
     
    Adam Funk, Jan 18, 2010
    #3
  4. Adam Funk


    Just to be sure, I've tested this using the TCPMonitor proxy in axis
    1.4. The SOAP response from the server definitely includes the
    desired whitespace. The problem in particular is with spaces between
    elements inside an element that is supposed to consist of PCDATA and
    elements, for example:

    #v+
    <TextWithNodes><Node id="0"/> <Node id="1"/>Internationalisation<Node id="21"/> <Node id="22"/>vertical<Node id="30"/> <Node id="31"/>stream<Node id="37"/>:<Node id="38"/> <Node id="39"/>INT<Node id="42"/> <Node id="43"/>VS<Node id="45"/> <Node id="46"/>UPDATE<Node id="52"/>
    #v-

    But my WS client is displaying and saving the XML as follows:

    #v+
    <TextWithNodes><Node id="0"/><Node id="1"/>Internationalisation<Node id="21"/><Node id="22"/>vertical<Node id="30"/><Node id="31"/>stream<Node id="37"/>:<Node id="38"/><Node id="39"/>INT<Node id="42"/><Node id="43"/>VS<Node id="45"/><Node id="46"/>UPDATE<Node id="52"/>
    #v-
     
    Adam Funk, Jan 19, 2010
    #4
  5. Adam Funk


    Do you know of any other ways to turn an org.w3c.dom.Element into a
    String and a File? (I'm only aware of the three I've tried so far.)

    Thanks, that makes sense. But now I've thrown in xalan-2.7.1.jar,
    serializer-2.7.1.jar, and the most up-to-date xercesImpl.jar (the one
    I was using was not the latest), but I'm still getting the lost space.
    Do you have any other ideas?
     
    Adam Funk, Jan 19, 2010
    #5
  6. Are you sure the whitespace shown above is present as text nodes in the
    DOM you have? I don't think there is anything wrong with the code you use
    to serialize; it is more likely that the DOM you are serializing does not
    contain that whitespace as text nodes in the first place.
     
    Martin Honnen, Jan 19, 2010
    #6
  7. Adam Funk


    How would I verify that? I know that the XML coming back over HTTP
    has the correct whitespace, but I don't know how to check the
    org.w3c.dom.Element inside the client --- except by serializing it.

    Hmm. The application inside the service creates the XML and
    serializes it to String on the fly using StAX --- is there a good way
    to reverse the process?
     
    Adam Funk, Jan 19, 2010
    #7
  8. Look at the child nodes of your DOM Element, e.g.

    NodeList children = element.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        // Prints the numeric node type of each direct child.
        System.out.println(children.item(i).getNodeType());
    }

    Element nodes have node type 1 (Node.ELEMENT_NODE) and text nodes have
    node type 3 (Node.TEXT_NODE).
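
    A slightly fuller sketch of the same check, in case it is useful: it walks the whole subtree rather than only the direct children and counts the whitespace-only text nodes. The class name, the method names and the locally parsed sample are made up for illustration; in the real client you would pass in the Element taken from the service response instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodeCheck {

    /** Counts the whitespace-only text nodes anywhere below the given node. */
    public static int countWhitespaceTextNodes(Node node) {
        int count = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                // Non-empty text that trims down to nothing is pure whitespace.
                if (text != null && !text.isEmpty() && text.trim().isEmpty()) {
                    count++;
                }
            } else {
                count += countWhitespaceTextNodes(child);
            }
        }
        return count;
    }

    /** Parses a string with the default JAXP parser, only to get a DOM to test with. */
    public static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the TextWithNodes sample from earlier in the thread.
        Element sample = parse("<TextWithNodes><Node id=\"0\"/> <Node id=\"1\"/>"
                + "Internationalisation<Node id=\"21\"/></TextWithNodes>");
        System.out.println("whitespace-only text nodes: "
                + countWhitespaceTextNodes(sample));
    }
}
```

    If this reports zero for the Element you actually received, the whitespace was lost before serialization and no serializer can bring it back.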
     
    Martin Honnen, Jan 19, 2010
    #8
  9. Adam Funk


    Bingo! I did something like that (well, a bit more complicated) to
    iterate through the gubbins of the org.w3c.dom.Element and print bits
    out: the text nodes with whitespace are missing in there. Thanks very
    much for that good piece of debugging advice.

    The problem must be in the JAXB stuff that reads the SOAP response
    (which has the whitespace) and produces the Element. I'm not sure how
    to fix that, but at least this narrows the problem down quite a lot.
     
    Adam Funk, Jan 19, 2010
    #9
  10. [Quoting Adam Funk: "Thanks, that makes sense. But now I've thrown in xalan-2.7.1.jar, ..."]

    I still find this surprising.

    If you have a Level 3 DOM implementation, the DOM itself may support the
    optional load and save operations. I believe Xerces does, though they
    actually run through Xalan's serializer these days so whatever's hitting
    you might not be cured. Worth a try...

    http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/
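
    For reference, a minimal sketch of what the Load and Save route could look like in Java, assuming the DOM implementation behind the Element actually exposes the optional "LS" feature (the parser bundled with the JDK does; a DOM handed over by a web-service stack might not). The class and method names are invented for the example, and the sample element is parsed locally only so that the sketch runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsSerializeSketch {

    /** Serializes an element through DOM Level 3 Load and Save, if available. */
    public static String toXmlString(Element element) {
        Object feature = element.getOwnerDocument()
                .getImplementation()
                .getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM implementation has no LS support");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        // Leave out the XML declaration so only the element markup is returned.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(element);
    }

    /** Parses a string with the default JAXP parser, only to get a DOM to test with. */
    public static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element sample = parse("<TextWithNodes><Node id=\"0\"/> <Node id=\"1\"/>"
                + "Internationalisation</TextWithNodes>");
        System.out.println(toXmlString(sample));
    }
}
```

    Like the other serializers, this can only write out text nodes that are present in the tree, so it is a cross-check rather than a cure.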

    If that too fails for you, then I'd start to suspect that the problem
    isn't where you think it is.

    I haven't had time to run a sanity-check on my own machine, to see if
    there's something obvious you might be missing. I'll try to do so this week.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jan 19, 2010
    #10
  11. Much more believable than that multiple serializers were all wrong.
    <smile/> It's always worth either checking the DOM tree, or building
    your own DOM (local parser, or by using the DOM API) and checking
    whether that goes through cleanly.
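
    A rough sketch of that kind of sanity check: build a DOM with a local parser from a string that is known to contain the whitespace, push it through the same kind of identity transform as in the original post, and see whether the space between the elements survives. The class and method names are hypothetical, and the result depends on whichever JAXP implementation is on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IdentityRoundTrip {

    /** Serializes a node with an identity transform, without the XML declaration. */
    public static String serialize(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    /** Builds a DOM locally so the input is known to contain the whitespace. */
    public static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TextWithNodes><Node id=\"0\"/> <Node id=\"1\"/>"
                + "Internationalisation</TextWithNodes>";
        String out = serialize(parse(xml));
        System.out.println(out);
        // True if the single space between the first two Node elements survived.
        System.out.println("space kept: " + out.contains("> <"));
    }
}
```

    If the locally built DOM round-trips cleanly but the Element from the service does not, the serializer is off the hook and the Element itself is the thing to inspect.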

     
    Joe Kesselman, Jan 19, 2010
    #11
  12. Adam Funk


    When the third one sank into the swamp, I did start to wonder...
    Now I just have to figure out how to make the JAXB library behave...
    watch this space.
     
    Adam Funk, Jan 21, 2010
    #12
