Re: Space between ending and starting tag not ignorable in an XML document? ...

Discussion in 'XML' started by Joe Kesselman, Jul 11, 2011.

  1. The question isn't just where the space occurs, but whether the language
    designer has said the space might be meaningful (and whether the
    document has been validated against that definition).

    The term "ignorable whitespace" is, unfortunately, misleading. What they
    really should have called it is "whitespace in element-only content".
    That is, even if the DTD says an Element's content can only be other
    elements, whitespace-only text nodes will be permitted since people
    wanted to be able to use them to make documents more human-readable.
    Those text nodes, because they aren't part of the official description
    of the language (as per the DTD) should not be considered semantically
    meaningful, and thus should be "ignored" by most applications which
    process the document. However, some applications -- such as those which
    want to retain the formatting of the document -- would NOT want to
    ignore them. So the parser passes them along, but flags them as a
    special case so you can decide what you want to do with them.

    A text node which contains *any* non-whitespace text would violate the
    promise of element-only content, and thus is not Ignorable.

    Similarly, a text node which appears in a place where text or mixed
    content is expected may be semantically meaningful even if it contains
    only whitespace, and is not Ignorable at the XML level. Of course the
    application may decide that whitespace can be discarded in some of these
    cases, but that's a decision the parser can't make for it.
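
    A rough sketch of an application making that call for itself, in
    Python with the standard xml.dom.minidom module. The helper name
    strip_whitespace_nodes and the set of element names are hypothetical,
    not part of any library; the set stands in for whatever the
    application knows about its own vocabulary. The sample document has
    no DTD, so every whitespace-only run arrives as an ordinary text node.

```python
from xml.dom import minidom

# The four characters XML 1.0 treats as whitespace.
XML_WHITESPACE = " \t\r\n"


def strip_whitespace_nodes(element, element_only_names):
    """Remove whitespace-only text children of elements the application
    has named as element-only (hypothetical helper, application policy)."""
    # Copy the child list because nodes are removed while iterating.
    for child in list(element.childNodes):
        if child.nodeType == child.TEXT_NODE:
            is_blank = child.data.strip(XML_WHITESPACE) == ""
            if element.tagName in element_only_names and is_blank:
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child, element_only_names)


doc = minidom.parseString(
    "<list>\n  <item> a </item>\n  <item>  </item>\n</list>"
)
root = doc.documentElement

before = len(root.childNodes)         # text, item, text, item, text
strip_whitespace_nodes(root, {"list"})
after = len(root.childNodes)          # only the two item elements remain
```

    Only the whitespace directly inside <list> is dropped. The
    whitespace-only text inside the second <item> is left alone, because
    the application did not list "item" as element-only.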

    And if the document has not been validated against the DTD, the parser
    doesn't know which case applies, and thus has to assume that all text
    content, whitespace or not, might be meaningful... so without a DTD, no
    whitespace will ever be considered Ignorable.
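
    The cases above boil down to one small rule, sketched here in Python.
    The ContentModel labels and the is_ignorable function are made up for
    illustration and are not any parser's API; passing None stands for
    "the parser has no declaration for the parent element".

```python
from enum import Enum
from typing import Optional


class ContentModel(Enum):
    """Hypothetical labels for how a DTD declares an element's content."""
    ELEMENT_ONLY = "element-only"   # e.g. <!ELEMENT list (item+)>
    MIXED = "mixed"                 # e.g. <!ELEMENT p (#PCDATA | em)*>
    TEXT = "text"                   # e.g. <!ELEMENT item (#PCDATA)>


# The four characters XML 1.0 treats as whitespace.
XML_WHITESPACE = " \t\r\n"


def is_ignorable(text: str, parent_model: Optional[ContentModel]) -> bool:
    """Return True if a text node counts as whitespace in element content."""
    if parent_model is None:
        # No declaration known: everything might be meaningful.
        return False
    if parent_model is not ContentModel.ELEMENT_ONLY:
        # Text or mixed content: whitespace may carry meaning.
        return False
    # Element-only content: ignorable only if every character is whitespace.
    return text != "" and all(ch in XML_WHITESPACE for ch in text)
```

    For example, is_ignorable("\n  ", ContentModel.ELEMENT_ONLY) is true,
    while the same string with ContentModel.MIXED or with None is not.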

    If you're using schemas rather than, or in addition to, DTDs... The same
    concepts apply for schemas, but the whitespace-in-element-content
    information appears in the Post-Schema-Validation Infoset (PSVI) rather
    than in the basic SAX/DOM ignorable bit. I'm less than delighted about
    the fact that some pieces of similar information are presented in two
    different places, but that's what we have to work with.


    I hope that clarifies things a bit. "Whitespace in Element Content." If
    you remember that phrase and what it means, the behavior makes a lot
    more sense.


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jul 11, 2011
    #1

  2. Mon, 11 Jul 2011 16:53:20 -0400, /Joe Kesselman/:

    > If you're using schemas rather than, or in addition to, DTDs... The
    > same concepts apply for schemas, but the
    > whitespace-in-element-content information appears in the
    > Post-Schema-Validation Infoset (PSVI) rather than in the basic
    > SAX/DOM ignorable bit. I'm less than delighted about the fact that
    > some pieces of similar information are presented in two different
    > places, but that's what we have to work with.
    >
    > I hope that clarifies things a bit. "Whitespace in Element Content."
    > If you remember that phrase and what it means, the behavior makes a
    > lot more sense.


    Even people who understand "whitespace in element content" might
    be confused by the fact that it isn't reported the same way when a
    DTD is applied during native XML parsing as when an XML Schema is
    applied, so I think this is the more important thing to try to
    understand.

    --
    Stanimir
     
    Stanimir Stamenkov, Jul 11, 2011
    #2

  3. BTW, I've been corrected: If the document specifies a DTD, and the DTD
    is loaded, the parser should try to recognize
    whitespace-in-element-content even when validation isn't being performed.

    (Most of what I do either doesn't have DTDs available or needs to
    preserve the whitespace, so I'd forgotten that detail.)


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

     
    Joe Kesselman, Jul 12, 2011
    #3
