It seems that XPath does not distinguish between an inexistent path and a null string?

Discussion in 'XML' started by Ramon F Herrera, Jun 12, 2012.

  1. Some experimentation indicates that if I have an XML field that ends
    like these:

    <someword></someword>

    or

    <someword />

    An XPath query will not return anything different from those and
    completely weird inexistent path.

    Is that the case? How am I supposed to save an empty string??

    -Ramon
     
    Ramon F Herrera, Jun 12, 2012
    #1

  2. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    Ramon F Herrera writes:

    > Some experimentation indicates that if I have an XML field that ends
    > like these:
    >
    > <someword></someword>
    >
    > or
    >
    > <someword />


    These two are exactly equivalent.

    > An XPath query will not return anything different from those and
    > completely weird inexistent path.


    Why should it? There is no node inside the someword element, the same as
    there is no node inside an inexistent path.

    > Is that the case? How am I supposed to save an empty string??


    As you do above. If you want to test whether an element is empty, use
    /path/to/the/element[count(node()) = 0] (absolutely no node, whatever
    type -- if you want "no child element" use count(*), "no child and no
    attribute" use count(*|@*), "no comment or pi" use
    count(comment()|processing-instruction()), etc.)
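
    A rough illustration of the emptiness test, assuming Python's standard xml.etree.ElementTree module. Its XPath subset has no count() function, so this sketch approximates count(node()) = 0 in plain Python; the helper name is_empty is made up for the example. Note that the default ElementTree parser discards comments and processing instructions, so they are not taken into account here.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<a></a>"
    "<b/>"
    "<c>text</c>"
    "<d><child/></d>"
    "</root>"
)


def is_empty(elem):
    """Approximate count(node()) = 0: no child elements and no text."""
    return len(elem) == 0 and not elem.text


# Collect the names of the children that have no content at all.
empty_names = [child.tag for child in doc if is_empty(child)]
print(empty_names)
```

    Both spellings of the empty element end up in the list; the elements with text or with a child element do not.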

    -- Alain.
     
    Alain Ketterlin, Jun 12, 2012
    #2

  3. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    * Ramon F Herrera wrote in comp.text.xml:
    >Some experimentation indicates that if I have an XML field that ends
    >like these:
    >
    ><someword></someword>
    >
    >or
    >
    ><someword />
    >
    >An XPath query will not return anything different from those and
    >completely weird inexistent path.
    >
    >Is that the case? How am I supposed to save an empty string??


    The XPath data model does not capture the difference between the two
    expressions, much like other software might not capture the difference
    between "1.000" and "1.0000". You could, in theory, use the difference
    between the two instances to indicate an "empty string", but many XML
    tools would not be able to tell them apart, so you would either have to
    constrain yourself to a specific set of tools (that might break with
    the next upgrade) or decide on some alternative representation, say, to
    omit the element if the value is an empty string.
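
    To see that a typical tool does not keep the two spellings apart, here is a small sketch using Python's standard xml.etree.ElementTree (one parser among many, chosen only for illustration; the summary helper is made up for the example). It parses both forms and compares what the parser retained.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<someword></someword>")
short_form = ET.fromstring("<someword />")


def summary(elem):
    """Everything ElementTree keeps about an element: tag, attributes, text, children."""
    return (elem.tag, elem.attrib, elem.text, list(elem))


# The parsed trees carry the same information...
same_tree = summary(long_form) == summary(short_form)
# ...and therefore serialize to the same bytes.
same_bytes = ET.tostring(long_form) == ET.tostring(short_form)
print(same_tree, same_bytes)
```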
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
    25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
     
    Bjoern Hoehrmann, Jun 13, 2012
    #3
  4. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 6/12/2012 11:41 AM, Ramon F Herrera wrote:
    > <someword></someword>
    > or
    > <someword />


    Per the definition of the XML Data Model, there is no difference between
    the two.

    > Is that the case? How am I supposed to save an empty string??


    If you need to distinguish between null and empty, XML Schema uses an
    attribute to flag the former. (See the description of nillable values.)
    I'd recommend adopting that approach.

    Or simply not having <someword/> present at all...?
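
    A minimal sketch of reading such a flag, assuming Python's standard xml.etree.ElementTree and the xsi:nil attribute from the XML Schema instance namespace. The element names, the ABSENT sentinel and the read_value helper are invented for the example. ElementTree does no schema validation, so the attribute is only a convention the application checks itself.

```python
import xml.etree.ElementTree as ET

# Attribute name in ElementTree's {namespace}local notation.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"
ABSENT = object()  # sentinel for "no such element"

doc = ET.fromstring(
    '<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<empty></empty>"
    '<null xsi:nil="true"/>'
    "</record>"
)


def read_value(parent, name):
    """Return ABSENT, None (nil), or the element's text ('' when empty)."""
    elem = parent.find(name)
    if elem is None:
        return ABSENT
    if elem.get(XSI_NIL) in ("true", "1"):
        return None
    return elem.text or ""


print(repr(read_value(doc, "empty")))             # empty string
print(read_value(doc, "null"))                    # None
print(read_value(doc, "missing") is ABSENT)       # True
```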


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jun 13, 2012
    #4
  5. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 12/06/2012 17:41, Ramon F Herrera wrote:
    >
    > Some experimentation indicates that if I have an XML field that ends
    > like these:
    >
    > <someword></someword>
    >
    > or
    >
    > <someword />
    >
    > An XPath query will not return anything different from those and
    > completely weird inexistent path.


    If you convert to text, XPath does not distinguish between empty text
    and inexistent path. Indeed.

    > Is that the case? How am I supposed to save an empty string??


    Don't convert to text, or don't convert to text yet. Get a list of
    elements first.
    If the list is empty, the path does not exist. If the list contains
    something, the path exists. Get your text from there.
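
    The "list first, text second" approach could look like this in Python's standard xml.etree.ElementTree (an assumption; the original poster did not say which library is in use, and text_at is a made-up helper name).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><someword></someword></root>")


def text_at(root, path):
    """Return None if the path matches nothing, else the first match's text."""
    matches = root.findall(path)  # list of elements, possibly empty
    if not matches:
        return None               # the path does not exist
    return matches[0].text or ""  # the path exists; empty content becomes ''


existing = text_at(doc, "someword")
missing = text_at(doc, "nosuchpath")
print(repr(existing), repr(missing))
```

    The empty element yields an empty string, while the nonexistent path yields None, so the two cases stay distinguishable as long as the element list is inspected before the text is extracted.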

    --
    Mayeul
     
    mayeul.marguet, Jun 13, 2012
    #5
  6. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    Well,

    Once your document is parsed, there is no difference in the final result.
    But from the schema/DTD point of view there is a difference, because

    <!ELEMENT someword EMPTY>

    and

    <!ELEMENT someword (#PCDATA)>

    are not equal. So saying that

    <someword></someword>

    is equal to

    <someword/>

    is only true if the content type is (#PCDATA), or

    <xs:element name="someword" type="xs:string"/> in a W3C Schema.

    Best wishes,

    A.Brillant
    http://www.editix.com - XML Editor


     
    japisoft, Jun 14, 2012
    #6
  7. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    "japisoft" writes:

    > When your document is parsed, the final result has no difference. But
    > from the schema/DTD view, there's a
    > difference because
    >
    > <!ELEMENT somework EMPTY>
    >
    > and
    >
    > <!ELEMENT somework (#PCDATA)>
    >
    > are not equal, so write :
    >
    > <someword></someword>
    >
    > is equal to
    >
    > <someword/>
    >
    > is only true if the content type is (#PCDATA) or


    Do you have a source for this claim? I think it's wrong. (Not the fact
    that #PCDATA and EMPTY are different, the fact that there is a
    difference between the two forms of empty elements.)

    The XML Recommendation says: "If an element is empty, it must be
    represented either by a start-tag immediately followed by an end-tag or
    by an empty-element tag." (Section 3.1)

    The only difference I know of is mentioned in the next paragraph: "For
    interoperability, the empty-element tag must be used, and can only be
    used, for elements which are declared EMPTY." And "For interoperability"
    is defined earlier as non-binding.

    -- Alain.
     
    Alain Ketterlin, Jun 14, 2012
    #7
  8. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    In article <4fd9aa33$0$1706$>,
    japisoft wrote:

    ><someword></someword>
    >
    >is equal to
    >
    ><someword/>
    >
    >is only true if the content type is (#PCDATA) or
    >
    ><xs:element name="somework" type="xs:string"/> from a W3C Schema.


    No, there is no significant difference - they are just two ways of
    writing the same XML document. They both represent an element called
    "someword" with no children. <x/> is just shorthand for <x></x>.

    -- Richard
     
    Richard Tobin, Jun 14, 2012
    #8
  9. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    > Well, this is just supposition, because if you want to serialize your XML:
    >
    > If the output is <myNode/> rather than <myNode></myNode>, this is quite
    > wrong if this is EMPTY. So I figure the parser should keep track of the
    > difference between the two versions for the output.
    >
    > <myNode></myNode> => <myNode/>
    >
    > BUT
    >
    > <myNode> != <myNode></myNode>
    >
    > because <myNode></myNode> means the element is empty but could have text
    > content. I read it as:
    >
    >
    > <myNode></myNode> => with empty text
    >
    > <myNode/> => with nothing


     
    japisoft, Jun 14, 2012
    #9
  10. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 14/06/2012 12:04, Alain Ketterlin wrote:
    > "japisoft" writes:
    >
    >> When your document is parsed, the final result has no difference. But
    >> from the schema/DTD view, there's a
    >> difference because
    >>
    >> <!ELEMENT somework EMPTY>
    >>
    >> and
    >>
    >> <!ELEMENT somework (#PCDATA)>
    >>
    >> are not equal, so write :
    >>
    >> <someword></someword>
    >>
    >> is equal to
    >>
    >> <someword/>
    >>
    >> is only true if the content type is (#PCDATA) or

    >
    > Do you have a source for this claim? I think it's wrong. (Not the fact
    > that #PCDATA and EMPTY are different, the fact that there is a
    > difference between the two forms of empty elements.)
    >
    > The XML Recommendation says: "If an element is empty, it must be
    > represented either by a start-tag immediately followed by an end-tag or
    > by an empty-element tag." (Section 3.1)
    >
    > The only difference I know of is mentioned in the next paragraph: "For
    > interoperability, the empty-element tag must be used, and can only be
    > used, for elements which are declared EMPTY." And "For interoperability"
    > is defined earlier as non-binding.


    Yes, at the end of Section 1.2. And it means "... chances that XML
    documents can be processed by the existing installed base of SGML
    processors ...".

    I.e., there is no preference at all (<someword></someword> vs.
    <someword/>) if the XML document is intended to be processed by an XML
    processor.

    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
     
    Manuel Collado, Jun 14, 2012
    #10
  11. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    > I.e., there is no preference at all (<someword></someword> vs.
    > <someword/>) if the XML document is intended to be processed by an XML
    > processor.


    Speaking as someone who has been working with XML for over a decade now,
    and who was involved in the design of the DOM: Exactly. XML considers
    this *ONLY* a syntactic difference, with no semantic meaning.
    <someword/> is just shorthand for <someword></someword>.

    If you need to distinguish between empty and null, see the other thread:
    Either use an attribute to distinguish the two cases, or leave the
    element out entirely.


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jun 14, 2012
    #11
  12. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 14/06/12 11:04, Alain Ketterlin wrote:
    > "japisoft" writes:
    >
    >> When your document is parsed, the final result has no difference. But
    >> from the schema/DTD view, there's a
    >> difference because
    >>
    >> <!ELEMENT somework EMPTY>
    >>
    >> and
    >>
    >> <!ELEMENT somework (#PCDATA)>
    >>
    >> are not equal, so write :
    >>
    >> <someword></someword>
    >>
    >> is equal to
    >>
    >> <someword/>
    >>
    >> is only true if the content type is (#PCDATA) or

    >
    > Do you have a source for this claim?


    It's been a long time, but I think you'll find the discussion in the
    archives of the XML SIG at the time. I seem to remember we did it to
    death and the consensus was that XML should not distinguish between the
    two forms.

    > I think it's wrong. (Not the fact
    > that #PCDATA and EMPTY are different, the fact that there is a
    > difference between the two forms of empty elements.)


    Officially there is no difference. In practice, if your parser has
    access to a DTD or Schema, it would be possible for it to detect if
    content was allowed or not.

    > The XML Recommendation says: "If an element is empty, it must be
    > represented either by a start-tag immediately followed by an end-tag or
    > by an empty-element tag." (Section 3.1)
    >
    > The only difference I know of is mentioned in the next paragraph: "For
    > interoperability, the empty-element tag must be used, and can only be
    > used, for elements which are declared EMPTY." And "For interoperability"
    > is defined earlier as non-binding.


    It is still regarded as good practice in the publishing field, where the
    semantics of potential mixed content can be important. It also serves as
    a reminder to those who examine the markup that the element type cannot
    have any content, rather than the current element just being empty by
    chance.

    ///Peter
     
    Peter Flynn, Jun 27, 2012
    #12
  13. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 6/27/2012 3:54 PM, Peter Flynn wrote:
    > Officially there is no difference


    Correct. If you doubt this, look at the W3C's XML Infoset document (the
    official statement of what information an XML document represents),
    which has absolutely no way to represent this distinction. Whether
    <foo></foo> or <foo/>, the infoset represents it as an element with no
    child nodes.
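
    The same point can be checked at the DOM level. This is a small sketch assuming Python's standard xml.dom.minidom as the DOM implementation (any conforming DOM should behave alike): both spellings produce an element with no child nodes.

```python
from xml.dom.minidom import parseString

results = []
for source in ("<foo></foo>", "<foo/>"):
    root = parseString(source).documentElement
    # Record whether the element has children and how many.
    results.append((root.hasChildNodes(), root.childNodes.length))

print(results)
```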

    There are stylistic conventions which may cause one or the other to be
    preferred by specific applications -- "for interoperability" with SGML
    tools being one of those, though realistically there is now enough XML
    tooling out there that very few people are still trying to put XML
    through SGML tools. But that's strictly style, not substance.

    If you want to distinguish empty vs. null, XML lets you do so by adding
    an attribute, or a child element, that your application recognizes as
    signifying that the content is null. XML Schema suggests an attribute
    for that purpose. Adopt it.
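
    On the writing side, a sketch of that convention with Python's standard xml.etree.ElementTree might look as follows. The field names and the add_field helper are hypothetical. Strictly speaking, xsi:nil only carries its schema meaning when a schema declares the element nillable; here it is simply an attribute the application agrees to recognize.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI_NS)  # serialize the namespace with the usual prefix


def add_field(parent, name, value):
    """Append a child element; None is flagged with xsi:nil, '' stays plain empty."""
    elem = ET.SubElement(parent, name)
    if value is None:
        elem.set(f"{{{XSI_NS}}}nil", "true")
    else:
        elem.text = value
    return elem


record = ET.Element("record")
add_field(record, "middlename", "")    # empty string
add_field(record, "nickname", None)    # null
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

    A reader of this document can then tell the null field from the empty one by the presence of the attribute, regardless of how the empty elements happen to be spelled.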

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
     
    Joe Kesselman, Jun 28, 2012
    #13
  14. Re: It seems that XPath does not distinguish between an inexistent path and a null string?

    On 28/06/2012 6:23, Joe Kesselman wrote:
    > On 6/27/2012 3:54 PM, Peter Flynn wrote:
    >> Officially there is no difference

    >
    > Correct. If you doubt this, look at the W3C's XML Infoset document (the
    > official statement of what information an XML document represents),
    > which has absolutely no way to represent this distinction. Whether
    > <foo></foo> or <foo/>, the infoset represents it as an element with no
    > child nodes.
    >
    > There are stylistic conventions which may cause one or the other to be
    > preferred by specific applications -- "for interoperability" with SGML
    > tools being one of those, though realistically there is now enough XML
    > tooling out there that very few people are still trying to put XML
    > through SGML tools. But that's strictly style, not substance.


    Just a minor hint: IIRC, if interoperability with SGML tools matters,
    the preferred notation for an empty element is <foo />. Note the space
    before the slash.
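
    Which spelling comes out is purely a serializer setting. As an illustration, assuming Python's standard xml.etree.ElementTree (Python 3.4 or later for the short_empty_elements parameter), the same parsed element can be written either way:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<foo></foo>")

# Default: empty elements are written with the short tag, space included.
short_form = ET.tostring(elem, encoding="unicode")
# Opt out to get the start-tag/end-tag pair instead.
long_form = ET.tostring(elem, encoding="unicode", short_empty_elements=False)

print(short_form)
print(long_form)
```

    The input spelling plays no role in the output; only the serializer option does.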

    >
    > If you want to distinguish empty vs. null, XML lets you do so by adding
    > an attribute, or a child element, that your application recognizes as
    > signifying that the content is null. XML Schema suggests an attribute
    > for that purpose. Adopt it.
    >

    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
     
    Manuel Collado, Jun 28, 2012
    #14