It seems that XPath does not distinguish between a nonexistent path and a null string?

Ramon F Herrera · Jun 12, 2012

Some experimentation indicates that if I have an XML field that ends
like these:

<someword></someword>

or

<someword />

An XPath query returns the same thing for those as for some completely
arbitrary, nonexistent path.

Is that the case? How am I supposed to save an empty string??

-Ramon

Alain Ketterlin · Jun 12, 2012

Ramon F Herrera said:
Some experimentation indicates that if I have an XML field that ends
like these:

<someword></someword>

or

<someword />

These two are exactly equivalent.

An XPath query returns the same thing for those as for some completely
arbitrary, nonexistent path.

Why should it? There is no node inside the someword element, the same as
there is no node inside an inexistent path.

Is that the case? How am I supposed to save an empty string??

As you do above. If you want to test whether an element is empty, use
/path/to/the/element[count(node()) = 0] (absolutely no node, whatever
type -- if you want "no child element" use count(*), "no child and no
attribute" use count(*|@*), "no comment or pi" use
count(comment()|processing-instruction()), etc.)
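
For anyone who wants to try the "absolutely no node" test outside an XPath engine, here is a minimal sketch using Python's standard xml.dom.minidom. The choice of Python and minidom is an assumption made for illustration, not something from this thread, and the helper name is_empty and the sample document are hypothetical. It approximates [count(node()) = 0] by asking whether the element has any DOM child node at all. Attributes are not children, so an element carrying only attributes still counts as empty, while one holding only a comment does not.

```python
from xml.dom import minidom


def is_empty(element):
    # DOM analogue of the XPath predicate [count(node()) = 0]:
    # true when the element has no child nodes of any type.
    return not element.hasChildNodes()


# Hypothetical sample: two spellings of an empty element, one with text,
# one with only a comment, and one with only an attribute.
doc = minidom.parseString(
    '<root><a></a><b/><c>text</c><d><!-- note --></d><e x="1"/></root>'
)

results = {
    child.tagName: is_empty(child)
    for child in doc.documentElement.childNodes
    if child.nodeType == child.ELEMENT_NODE
}
print(results)
```

Both <a></a> and <b/> come out as empty here, which matches the point that the two spellings are equivalent.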

-- Alain.

Bjoern Hoehrmann · Jun 12, 2012

* Ramon F Herrera wrote in comp.text.xml:

Some experimentation indicates that if I have an XML field that ends
like these:

<someword></someword>

or

<someword />

An XPath query returns the same thing for those as for some completely
arbitrary, nonexistent path.

Is that the case? How am I supposed to save an empty string??

The XPath data model does not capture the difference between the two
expressions, much like other software might not capture the difference
between "1.000" and "1.0000". You could, in theory, use the difference
between the two instances to indicate an "empty string", but many XML
tools would not be able to tell them apart, so you would either have to
constrain yourself to a specific set of tools (that might break with
the next upgrade) or decide on some alternative representation, say, to
omit the element if the value is an empty string.

Joe Kesselman · Jun 12, 2012

<someword></someword>
or
<someword />

Per the definition of the XML Data Model, there is no difference between
the two.

Is that the case? How am I supposed to save an empty string??

If you need to distinguish between null and empty, XML Schema uses an
attribute to flag the former. (See the description of nillable values.)
I'd recommend adopting that approach.

Or simply not having <someword/> present at all...?
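
A rough sketch of how an application might read the three cases (missing, nil, empty string), assuming Python's standard xml.etree.ElementTree. The function name read_value and the sample element names are hypothetical. The only piece taken from XML Schema is the xsi:nil attribute in the http://www.w3.org/2001/XMLSchema-instance namespace; no schema validation happens here, the code simply looks at the attribute.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xsi:nil attribute in ElementTree's {uri}local notation.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def read_value(parent, tag):
    """Return None for a missing or nil element, otherwise its text ('' if empty)."""
    element = parent.find(tag)
    if element is None:
        return None  # the element is not present at all
    if element.get(XSI_NIL) in ("true", "1"):
        return None  # explicitly flagged as null
    return element.text or ""  # empty element reads as an empty string


# Hypothetical sample record.
doc = ET.fromstring(
    '<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<empty></empty>"
    '<null xsi:nil="true"/>'
    "<full>abc</full>"
    "</record>"
)

for name in ("empty", "null", "full", "absent"):
    print(name, repr(read_value(doc, name)))
```

Whether a missing element and a nil element should both map to None is a design choice for the application; they are kept as separate branches above so they can be split easily.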

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."

mayeul.marguet · Jun 13, 2012

Some experimentation indicates that if I have an XML field that ends
like these:

<someword></someword>

or

<someword />

An XPath query returns the same thing for those as for some completely
arbitrary, nonexistent path.

If you convert to text, XPath does not distinguish between empty text
and a nonexistent path. Indeed.

Is that the case? How am I supposed to save an empty string??

Don't convert to text, or don't convert to text yet. Get a list of
elements first.
If the list is empty, the path does not exist. If the list contains
something, the path exists. Get your text from there.
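
The "get a list of elements first" advice could look roughly like this in Python with the standard xml.etree.ElementTree (an illustration only; the helper name lookup and the sample document are hypothetical). findall returns a list, so an empty list means the path matched nothing, and only after that check is the text read.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><someword></someword></root>")


def lookup(root, path):
    """Return (exists, text): check the node list first, read the text second."""
    matches = root.findall(path)
    if not matches:
        return (False, None)  # the path does not exist
    # The path exists; an element with no text reads as an empty string.
    return (True, matches[0].text or "")


print(lookup(doc, "someword"))
print(lookup(doc, "no/such/path"))
```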

japisoft · Jun 14, 2012

Well,

Once your document is parsed, the final result shows no difference. But from
the schema/DTD point of view there is a difference, because

<!ELEMENT someword EMPTY>

and

<!ELEMENT someword (#PCDATA)>

are not equivalent. So saying that

<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA), or

<xs:element name="someword" type="xs:string"/> in a W3C Schema.

Best wishes,

A.Brillant
http://www.editix.com - XML Editor


Alain Ketterlin · Jun 14, 2012

japisoft said:
Once your document is parsed, the final result shows no difference. But
from the schema/DTD point of view there is a difference, because

<!ELEMENT someword EMPTY>

and

<!ELEMENT someword (#PCDATA)>

are not equivalent. So saying that

<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA), or

Do you have a source for this claim? I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

-- Alain.

Richard Tobin · Jun 14, 2012

japisoft said:
<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA) or

<xs:element name="someword" type="xs:string"/> in a W3C Schema.

No, there is no significant difference - they are just two ways of
writing the same XML document. They both represent an element called
"someword" with no children. <x/> is just shorthand for <x></x>.
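
This is easy to check with a parser. A small sketch, assuming Python's standard xml.etree.ElementTree (the helper name describe is hypothetical): both spellings are parsed, and the resulting trees and their re-serialized output are compared.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<someword></someword>")
short_form = ET.fromstring("<someword />")


def describe(element):
    # Everything the parsed tree knows about the element:
    # tag, text, number of child elements, attributes.
    return (element.tag, element.text, len(element), dict(element.attrib))


# Compare the parsed trees and what the serializer writes back out.
same_tree = describe(long_form) == describe(short_form)
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_tree, same_output)
```

After parsing, nothing in the tree records which spelling the source used, so the serializer picks one form for both.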

-- Richard

japisoft · Jun 14, 2012

Well, this is just a supposition, but consider what happens when you
serialize your XML:

If the output is <myNode/> rather than <myNode></myNode>, that is quite
wrong if this is EMPTY. So I figure the parser should keep track of which
of the two versions it read, for the output.

<myNode></myNode> => <myNode/>

BUT

<myNode/> != <myNode></myNode>

because <myNode></myNode> means the element is empty but could have text
content. I read it as:

<myNode></myNode> => with empty text

<myNode/> => with nothing


Manuel Collado · Jun 14, 2012

On 14/06/2012 12:04, Alain Ketterlin wrote:

Do you have a source for this claim? I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

Yes, at the end of Section 1.2. And it means "... chances that XML
documents can be processed by the existing installed base of SGML
processors ...".

I.e., there is no preference at all (<someword></someword> vs.
<someword/>) if the XML document is intended to be processed by an XML
processor.

Joe Kesselman · Jun 14, 2012

Manuel Collado said:
I.e., there is no preference at all (<someword></someword> vs.
<someword/>) if the XML document is intended to be processed by an XML
processor.

Speaking as someone who has been working with XML for over a decade now,
and who was involved in the design of the DOM: Exactly. XML considers
this *ONLY* a syntactic difference, with no semantic meaning.
<someword/> is just shorthand for <someword></someword>.

If you need to distinguish between empty and null, see the other thread:
Either use an attribute to distinguish the two cases, or leave the
element out entirely.


Peter Flynn · Jun 27, 2012

Do you have a source for this claim?

It's been a long time, but I think you'll find the discussion in the
archives of the XML SIG at the time. I seem to remember we did it to
death and the consensus was that XML should not distinguish between the
two forms.

I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

Officially there is no difference. In practice, if your parser has
access to a DTD or Schema, it would be possible for it to detect if
content was allowed or not.

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

It is still regarded as good practice in the publishing field, where the
semantics of potential mixed content can be important. It also serves as
a reminder to those who examine the markup that the element type cannot
have any content, rather than the current element just being empty by
chance.

///Peter

Joe Kesselman · Jun 28, 2012

Officially there is no difference

Correct. If you doubt this, look at the W3C's XML Infoset document (the
official statement of what information an XML document represents),
which has absolutely no way to represent this distinction. Whether
<foo></foo> or <foo/>, the infoset represents it as an element with no
child nodes.

There are stylistic conventions which may cause one or the other to be
preferred by specific applications -- "for interoperability" with SGML
tools being one of those, though realistically there is now enough XML
tooling out there that very few people are still trying to put XML
through SGML tools. But that's strictly style, not substance.

If you want to distinguish empty vs. null, XML lets you do so by adding
an attribute, or a child element, that your application recognizes as
signifying that the content is null. XML Schema suggests an attribute
for that purpose. Adopt it.
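
On the writing side, a minimal sketch of saving an empty string versus a null, assuming Python's standard xml.etree.ElementTree. The helper name write_value and the element names are hypothetical; the xsi:nil attribute in the http://www.w3.org/2001/XMLSchema-instance namespace is the one XML Schema defines. None is written as an element flagged xsi:nil="true", and an empty string is written as a plain empty element.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialize the attribute with the usual xsi: prefix


def write_value(parent, tag, value):
    """Append <tag> to parent: None is flagged xsi:nil="true", "" stays an empty element."""
    element = ET.SubElement(parent, tag)
    if value is None:
        element.set(f"{{{XSI}}}nil", "true")
    else:
        element.text = value
    return element


# Hypothetical record with one empty string and one null.
record = ET.Element("record")
write_value(record, "comment", "")
write_value(record, "nickname", None)

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The attribute only means something to code that looks for it (or to a schema validator when the element is declared nillable), so reader and writer have to agree on the convention.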


Manuel Collado · Jun 28, 2012

On 28/06/2012 6:23, Joe Kesselman wrote:

Correct. If you doubt this, look at the W3C's XML Infoset document (the
official statement of what information an XML document represents),
which has absolutely no way to represent this distinction. Whether
<foo></foo> or <foo/>, the infoset represents it as an element with no
child nodes.

There are stylistic conventions which may cause one or the other to be
preferred by specific applications -- "for interoperability" with SGML
tools being one of those, though realistically there is now enough XML
tooling out there that very few people are still trying to put XML
through SGML tools. But that's strictly style, not substance.

Just a minor hint: IIRC, if interoperability with SGML tools matters,
