Whitespace in Canonicalized XML

Celedor · Dec 25, 2003

If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).
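As an illustration of that idea, here is a minimal sketch. It assumes Python 3.8+, where `xml.etree.ElementTree.canonicalize` is available. Note that this function implements W3C Canonical XML 2.0, which is a later revision than the Recommendation cited in this thread. For the features shown here (attribute order, quote style, and empty-element syntax), both versions behave the same. The two input strings are made-up examples.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations carrying the same information, but differing in
# attribute order, quote style, spacing inside the start tag, and
# empty-element syntax (<e/> vs <e></e>).
doc_a = '<doc b="2" a="1"><e/></doc>'
doc_b = "<doc a='1'   b='2'><e></e></doc>"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(canon_a)             # attributes sorted, double-quoted, <e/> expanded
print(canon_a == canon_b)  # both inputs normalize to the same text
```

A signature computed over `canon_a` would therefore also verify against `doc_b`.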

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (i.e., so that
it will look pretty on your text viewer)? Or am I missing something
important?

Thank you for your reply...

Douglas A. Gwyn · Dec 25, 2003

Celedor said:
If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)
Isn't that whitespace only useful for formatting purposes (i.e., so that
it will look pretty on your text viewer)? Or am I missing something
important?

Anything that affects how the image will appear is obviously part of
the information.

Kenneth Stephen · Dec 29, 2003

Celedor said:
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Hi,

The characteristics and properties of a "presentation" depend very much
on who / what the intended recipient is. In the case of XML, by design,
humans are not the only possible recipients. XML is also intended to convey
data to machines, and these machines should be capable of processing XML
without any ambiguity messing up the works. To accomplish this, XML has
defined a very simple rule: anything in "tags" is XML markup, and
everything else is data.

If you look at the XML specs, you can see that several XML node types
are defined. One of them is the text node. Consider the example below:

<a>This is a text node
<ThisIsAnElementNode x="this is an attribute node">This is also a text
node</ThisIsAnElementNode></a>
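A parser sees the same split between markup and data. The sketch below uses Python's standard `xml.etree.ElementTree`. One assumption is worth flagging: ElementTree does not create separate text-node objects. Instead, it stores character data in an element's `.text` attribute (content before the first child) and in `.tail` (content after an element's end tag).

```python
import xml.etree.ElementTree as ET

# Kenneth's example, verbatim (including the line breaks inside the text).
doc = """<a>This is a text node
<ThisIsAnElementNode x="this is an attribute node">This is also a text
node</ThisIsAnElementNode></a>"""

root = ET.fromstring(doc)
child = root[0]

print(repr(root.text))   # data before the child element, newline included
print(child.tag)         # the element node
print(child.get("x"))    # the attribute node's value
print(repr(child.text))  # data inside the child element
```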

This is perfectly valid XML. There are no assumptions that you can make
in general about the content of the text nodes. They may be completely
whitespace, or not, and only the receiving application / entity can tell you
if the whitespace is significant. When writing a spec, obviously, the
general case is what needs to be catered to, and hence, pure whitespace text
nodes cannot be "normalized" away.
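The practical consequence for signatures can be shown with a small sketch. It assumes Python 3.8+ (`xml.etree.ElementTree.canonicalize`, default options) and uses SHA-256 as a stand-in for whatever digest a real signature would use. Reindenting a document leaves its whitespace-only text nodes in the canonical form, so the canonical bytes, and any digest computed over them, change.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

compact = "<a><b>x</b></a>"
indented = "<a>\n  <b>x</b>\n</a>"  # same elements, extra whitespace text

canon_compact = canonicalize(compact)
canon_indented = canonicalize(indented)

# The whitespace-only text nodes survive canonicalization...
print(repr(canon_indented))

# ...so a digest over the canonical form differs between the two.
digest_compact = hashlib.sha256(canon_compact.encode("utf-8")).hexdigest()
digest_indented = hashlib.sha256(canon_indented.encode("utf-8")).hexdigest()
print(digest_compact == digest_indented)
```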

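That being said, the "xml:space" attribute exists to help applications decide
what to do with pure whitespace nodes: xml:space="preserve" asks that the
whitespace in an element be kept, while xml:space="default" leaves it to the
application's default handling. When the XML processor or a higher-level
application (for example, an XSL processor) encounters xml:space, it may or
may not normalize; it depends on the application.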
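The following sketch shows what one application-side policy might look like. The function `strip_ignorable_whitespace` is hypothetical and is not part of any library. It removes whitespace-only text unless the nearest enclosing `xml:space` says `preserve`. The code relies on one fact about ElementTree: it expands the reserved `xml:` prefix to the namespace URI `http://www.w3.org/XML/1998/namespace`.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_ignorable_whitespace(elem, preserve=False):
    """Hypothetical policy: drop whitespace-only text unless the nearest
    enclosing xml:space attribute says "preserve"."""
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False
    # Text before the first child belongs to this element.
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_ignorable_whitespace(child, preserve)
        # A child's tail is content of *this* element, so this element's mode applies.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None


doc = '<a>\n  <b xml:space="preserve">   </b>\n  <c> </c>\n</a>'
root = ET.fromstring(doc)
strip_ignorable_whitespace(root)
print(ET.tostring(root, encoding="unicode"))
```

Another application could just as reasonably keep all whitespace and ignore `xml:space`. That is why Canonical XML cannot assume either behavior.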

Regards,
Kenneth

Peter Flynn · Jan 24, 2004

Celedor said:
If I understand correctly, canonicalized XML is a simplified, or
rather, "standardized" form of XML. It is in a form such that
two documents that are written in different ways, but contain the same
information, will normalize towards one form. This standard form can
then be used as the basis for encryption or digital verification (such
as XML Digital Signature).

If this is the case, then why is whitespace outside of any tags still
preserved? (See Example 3.2 of the W3C Canonical XML Recommendation)

Isn't that whitespace only useful for formatting purposes (i.e., so that
it will look pretty on your text viewer)? Or am I missing something
important?

Only if you have a DTD or Schema that tells you where PCDATA is allowed.

Without one, you have to assume character data can occur anywhere, which
makes *all* white-space significant.
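The next sketch shows the no-DTD case from a DOM's point of view, using Python's standard `xml.dom.minidom`. No DTD is supplied, so the parser has no way to know that `<list>` should contain only elements. As a result, the indentation is reported as ordinary text nodes alongside the element.

```python
from xml.dom import minidom

# No DTD: nothing says <list> has element-only content.
doc = minidom.parseString("<list>\n  <item/>\n</list>")
children = doc.documentElement.childNodes

for node in children:
    if node.nodeType == node.TEXT_NODE:
        print("text   ", repr(node.data))
    else:
        print("element", node.tagName)
```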

///Peter
