Whitespace in Canonicalized XML

Discussion in 'XML' started by Celedor, Dec 25, 2003.

  1. Celedor

    Celedor Guest

    If I understand correctly, canonicalized XML is a simplified, or
    rather, "standardized" form of XML. It is a form such that
    two documents that are written in different ways, but contain the same
    information, will normalize towards one form. This standard form can
    then be used as the basis for encryption or digital verification (such
    as XML Digital Signature).

    If this is the case, then why is whitespace outside of any tags still
    preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

    Isn't that whitespace only useful for formatting purposes (i.e. so that
    it will look pretty in your text viewer)? Or am I missing something
    important?

    Thank you for your reply...
     
    Celedor, Dec 25, 2003
    #1

  2. "Celedor" <> wrote...
    > If this is the case, then why is whitespace outside of any tags still
    > preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
    > Isn't that whitespace only useful for formatting purposes (i.e. so that
    > it will look pretty in your text viewer)? Or am I missing something
    > important?


    Anything that affects how the image will appear is obviously part of
    the information.
     
    Douglas A. Gwyn, Dec 25, 2003
    #2

  3. "Celedor" <> wrote in message
    news:...
    > If I understand correctly, canonicalized XML is a simplified, or
    > rather, "standardized" form of XML. It is a form such that
    > two documents that are written in different ways, but contain the same
    > information, will normalize towards one form. This standard form can
    > then be used as the basis for encryption or digital verification (such
    > as XML Digital Signature).
    >
    > If this is the case, then why is whitespace outside of any tags still
    > preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
    >

    Hi,

    The characteristics and properties of a "presentation" depend very much
    on who or what the intended recipient is. In the case of XML, by design,
    humans are not the only possible recipients. XML is also intended to convey
    data to machines, and these machines should be capable of processing XML
    without any ambiguity getting in the way. To accomplish this, XML defines
    a very simple rule: anything in "tags" is XML markup, and
    everything else is data.

    If you look at the data model behind XML (the XPath data model, which
    Canonical XML builds on), you can see that there are different node types
    defined. One of them is the text node. Consider the example below:

    <a>This is a text node
    <ThisIsAnElementNode x="this is an attribute node">This is also a text
    node</ThisIsAnElementNode></a>

    This is perfectly well-formed XML. There are no assumptions that you can
    make in general about the content of the text nodes. They may consist
    entirely of whitespace, or not, and only the receiving application / entity
    can tell you if the whitespace is significant. A spec obviously has to
    cater to the general case, and hence pure whitespace text
    nodes cannot be "normalized" away.
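
    As a rough illustration of the point above, here is a small sketch using
    Python's standard xml.dom.minidom. The two documents (compact and pretty)
    and the helper child_summary are made up for this illustration; they are
    not from the Canonical XML Recommendation. The sketch only shows that
    pretty-printing adds whitespace-only text nodes that a parser has no
    general reason to throw away.

```python
from xml.dom import minidom

# Hypothetical documents: the same elements, written compactly and pretty-printed.
compact = "<a><b>text</b></a>"
pretty = "<a>\n  <b>text</b>\n</a>"


def child_summary(xml_text):
    """Return (kind, value) pairs for the direct children of the root element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary


compact_children = child_summary(compact)
pretty_children = child_summary(pretty)

# The pretty-printed form has extra whitespace-only text nodes around <b>.
print(compact_children)
print(pretty_children)
```

    Whether those extra text nodes matter is exactly the thing only the
    receiving application can decide.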

    That being said, the "xml:space" attribute exists to signal how
    whitespace, including pure whitespace nodes, should be treated. When the
    XML processor or a higher-level application (for example, an XSL processor)
    encounters xml:space, it may or may not normalize; it depends on the
    application.
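
    As one example of an application making that choice, here is a sketch
    using xml.etree.ElementTree.canonicalize from the Python standard library
    (Python 3.8+). Note the assumptions: that function implements C14N 2.0,
    not the 1.0 Recommendation discussed in this thread, and its strip_text
    option is an opt-in convenience of that implementation. As far as I can
    tell it leaves whitespace alone inside elements marked
    xml:space="preserve". The sample documents are made up.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical documents: identical except for the xml:space attribute.
plain = "<a>\n  <b>text</b>\n</a>"
marked = '<a xml:space="preserve">\n  <b>text</b>\n</a>'

# strip_text=True asks this particular implementation to trim whitespace
# around text content; it is off by default.
plain_stripped = canonicalize(plain, strip_text=True)
marked_stripped = canonicalize(marked, strip_text=True)

print(repr(plain_stripped))
print(repr(marked_stripped))
```

    Another processor is free to behave differently, which is the point:
    xml:space is a hint, and the application decides what to do with it.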

    Regards,
    Kenneth
     
    Kenneth Stephen, Dec 29, 2003
    #3
  4. Peter Flynn

    Peter Flynn Guest

    Celedor wrote:
    > If I understand correctly, canonicalized XML is a simplified, or
    > rather, "standardized" form of XML. It is a form such that
    > two documents that are written in different ways, but contain the same
    > information, will normalize towards one form. This standard form can
    > then be used as the basis for encryption or digital verification (such
    > as XML Digital Signature).
    >
    > If this is the case, then why is whitespace outside of any tags still
    > preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
    >
    > Isn't that whitespace only useful for formatting purposes (i.e. so that
    > it will look pretty in your text viewer)? Or am I missing something
    > important?


    Only if you have a DTD or Schema that tells you where PCDATA is allowed.

    Without one, you have to assume character data can occur anywhere, which
    makes *all* white-space significant.
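
    To see this in practice, here is a minimal sketch using
    xml.etree.ElementTree.canonicalize from the Python standard library
    (Python 3.8+). It assumes that implementation, which follows C14N 2.0
    rather than the 1.0 Recommendation cited above, and the two sample
    documents are made up. Attribute order and quoting are normalized, but
    whitespace between tags survives unless the caller explicitly opts in to
    stripping it.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical documents carrying the same elements and attributes,
# written in two different ways.
compact = '<a y="2" x="1"><b>text</b></a>'
pretty = "<a  x='1'   y='2'>\n  <b>text</b>\n</a>"

c_compact = canonicalize(compact)
c_pretty = canonicalize(pretty)

# Attribute order and quote style are normalized in both outputs, but the
# whitespace-only text between the tags is kept, so the results differ.
print(repr(c_compact))
print(repr(c_pretty))
print(c_compact == c_pretty)

# Only when the caller declares the whitespace insignificant
# (strip_text=True, an option of this implementation) do the forms converge.
c_pretty_stripped = canonicalize(pretty, strip_text=True)
print(c_pretty_stripped == c_compact)
```

    The canonicalizer has no DTD or Schema knowledge here, so dropping the
    whitespace has to be the caller's decision, not the default.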

    ///Peter
     
    Peter Flynn, Jan 24, 2004
    #4