When will empty tags pass schema validation?

Discussion in 'XML' started by wolf_y, May 4, 2006.

  1. wolf_y

    wolf_y Guest

    My question is simply: under what conditions will empty tags of the
    form <MOM></MOM> pass schema validation? Of course, the mirror
    question is: under what conditions will empty tags fail validation?
    The former seems to be an easier question to answer.

    XML files will arrive from around the world and must be schema
    validated before further processing and loading into a database, so I'm
    trying to foresee the various layouts that might be submitted. I can
    anticipate suppliers starting with a template, filling in needed
    elements, and sending the file with empty tags in conditional segments
    with mandatory and conditional elements. I understand the role of
    restrictions, but there are about a dozen record types, dozens of
    segments, and hundreds of elements (some of which are sometimes
    mandatory, sometimes conditional, and sometimes
    conditionally-mandatory). One schema is 230 pages.

    I already created a test file where a conditional segment had empty
    tags and validation failed.

    Thanks
    wolf_y, May 4, 2006
    #1

  2. wolf_y wrote:
    > My question is simply: under what conditions will empty tags of the
    > form <MOM></MOM> pass schema validation?


    In XML, <MOM></MOM> is semantically identical to <MOM/>. Both forms
    therefore pass under the same conditions: when the schema accepts that
    element and does not require it to have any content.



    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, May 4, 2006
    #2

  3. wolf_y

    wolf_y Guest

    Thanks for answering, but maybe I should have led with my disclaimer:
    I'm a newbie to XML, primarily program in SAS, and consulted online
    documentation.

    Some of my confusion stems from the way terms such as empty, missing,
    null, and blank are used/handled in different languages. I don't mind
    reading docs, but I can't find an answer I understand at
    http://www.w3.org/ or url links I've found.

    I don't want to create an empty element, but need to know under what
    circumstances an empty element will pass schema checks, so that the
    backend processing in SAS can react correctly when it's time to load
    the data. There are 5 SAS programmers sharing responsibility for
    writing the load routines and I was chosen to explain what to expect
    after validation. There might be circumstances where an empty element
    is allowed and others where we want to reject the file, both based on
    the same element, depending upon the XML file provider or segment.

    There are 4 levels of schema involved. Here's an example of an element
    in the Level 3 schema:

    <xs:element name="MOM">
    <xs:annotation>
    <xs:documentation>Mother</xs:documentation>
    </xs:annotation>
    <xs:simpleType>
    <xs:restriction base="xs:string">
    <xs:minLength value="1"/>
    <xs:maxLength value="25"/>
    </xs:restriction>
    </xs:simpleType>
    </xs:element>

    I understand that because of minLength this element must have at least
    one character. In a simple test, whitespace <MOM> </MOM> passes (is
    this a blank in XML?) whereas <MOM></MOM> doesn't (null or empty?). An
    element defined with type=xs:integer fails in both circumstances.

    Is there any type (or attribute?) where both <MOM></MOM> and <MOM>
    </MOM> pass validation? Or must an element be explicitly defined as
    permitting empty (nil?) values? Or must I test each unique element?

    I hope this makes sense.
    wolf_y, May 4, 2006
    #3
  4. wolf_y wrote:
    > Is there any type (or attribute?) where both <MOM></MOM> and <MOM>
    > </MOM> passes validation?


    Sure. If minimum length had been zero (or had not been explicitly set)
    for the xs:string example, both would pass.

    It's really a matter of what that specific schema has said the datatype
    is (which controls whether empty is syntactically acceptable) and what
    additional constraints it imposes (which control whether empty is
    semantically acceptable for validation purposes).
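
    The interplay of datatype and facets can be sketched in a few lines of
    Python. This is only a hand-written model of the rules discussed in this
    thread, not a schema validator: string_ok and integer_ok are hypothetical
    helpers. It assumes that xs:string preserves white space (so one space
    has length 1) and that xs:integer collapses white space before its
    lexical form is checked.

```python
import re

# Lexical form of xs:integer: optional sign followed by one or more digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def string_ok(text, min_length=0, max_length=None):
    """Model an xs:string restriction with minLength/maxLength facets.
    White space is preserved, so a single space counts as one character."""
    if len(text) < min_length:
        return False
    return max_length is None or len(text) <= max_length

def integer_ok(text):
    """Model xs:integer: collapse white space, then check the lexical form.
    Both "" and " " collapse to the empty string and are rejected."""
    collapsed = re.sub(r"[ \t\n\r]+", " ", text).strip(" ")
    return INTEGER_LEXICAL.fullmatch(collapsed) is not None

# Columns: content, MOM as posted (minLength 1), minLength unset, xs:integer
for content in ("", " ", "Ann"):
    print(repr(content),
          string_ok(content, min_length=1, max_length=25),
          string_ok(content, max_length=25),
          integer_ok(content))
```

    Under these assumptions the sketch reproduces the test results reported
    earlier in the thread: with minLength 1 only the single space passes,
    with no minimum length both forms pass, and neither is a valid integer.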

    Nillable is a different concept: it expresses "explicitly has no
    meaningful value" rather than either "value is empty" or "element was
    not present". It may make more sense to folks who've worked with
    databases that support NULL.
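
    For load routines that need to tell these cases apart after validation,
    a minimal sketch along the following lines may help. It assumes leaf
    elements only, and the RECORD wrapper, the sample documents, and the
    classify helper are all hypothetical. The xsi:nil attribute and its
    namespace come from XML Schema; the attribute is only valid on elements
    the schema declares with nillable="true".

```python
import xml.etree.ElementTree as ET

# Namespace-qualified name of the xsi:nil attribute defined by XML Schema.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

def classify(parent, name):
    """Classify child `name` of `parent` as absent, nil, empty, blank, or value."""
    child = parent.find(name)
    if child is None:
        return "absent"              # element was not present
    if child.get(XSI_NIL) in ("true", "1"):
        return "nil"                 # explicitly has no meaningful value
    text = child.text or ""
    if text == "":
        return "empty"               # <MOM></MOM> or <MOM/>
    if text.strip() == "":
        return "blank"               # white space only
    return "value"

# Hypothetical sample documents, one per case.
DOCS = [
    '<RECORD/>',
    '<RECORD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<MOM xsi:nil="true"/></RECORD>',
    '<RECORD><MOM></MOM></RECORD>',
    '<RECORD><MOM> </MOM></RECORD>',
    '<RECORD><MOM>Ann</MOM></RECORD>',
]
results = [classify(ET.fromstring(doc), "MOM") for doc in DOCS]
print(results)
```

    Whether "nil", "empty", and "blank" should all map to a missing value
    on the database side is a decision for the load routines, not something
    the schema settles.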

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, May 4, 2006
    #4
  5. wolf_y

    wolf_y Guest

    > It's really a matter of what that specific schema has said the datatype
    > is (which controls whether empty is syntactically acceptable) and what
    > additional constraints (which controls whether empty is semantically
    > acceptable for validation purposes).


    You've helped by confirming my take on what I've read, and I'll
    continue to reread W3C docs. Since element properties are derived and
    there are so many elements, it looks like my safest strategy is to
    generate test files under both scenarios and see what happens.
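
    One way to produce such test files is sketched below in Python. It is
    an assumption-laden illustration rather than a finished tool: the
    TEMPLATE document and the make_variants helper are hypothetical, labels
    collide if the same leaf tag occurs twice, and the actual schema check
    is left to whichever validator is in use.

```python
import copy
import xml.etree.ElementTree as ET

def make_variants(template_xml):
    """Yield (label, xml_text) pairs. For each leaf element of the template,
    produce one copy with that element emptied and one with it set to a
    single space; all other elements keep their template values."""
    root = ET.fromstring(template_xml)
    leaf_count = sum(1 for e in root.iter() if len(e) == 0)
    for index in range(leaf_count):
        for label, text in (("empty", ""), ("blank", " ")):
            variant = copy.deepcopy(root)
            leaves = [e for e in variant.iter() if len(e) == 0]
            target = leaves[index]
            target.text = text
            # short_empty_elements=False writes <MOM></MOM> rather than <MOM />
            yield (f"{target.tag}-{label}",
                   ET.tostring(variant, encoding="unicode",
                               short_empty_elements=False))

# Hypothetical two-element template standing in for a real record.
TEMPLATE = "<RECORD><MOM>Ann</MOM><DAD>Bob</DAD></RECORD>"
variants = dict(make_variants(TEMPLATE))
for label, xml_text in variants.items():
    print(label, xml_text)
```

    Each variant changes exactly one element, so a validation failure
    points directly at the element whose definition rejects empty or blank
    content.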
    wolf_y, May 5, 2006
    #5
  6. Peter Flynn

    Peter Flynn Guest

    wolf_y wrote:
    > Thanks for answering, but maybe I should have led with my disclaimer:
    > I'm a newbie to XML, primarily program in SAS, and consulted online
    > documentation.
    >
    > Some of my confusion stems from the way terms such as empty, missing,
    > null, and blank are used/handled in different languages. I don't mind
    > reading docs, but I can't find an answer I understand at
    > http://www.w3.org/ or url links I've found.
    >
    > I don't want to create an empty element, but need to know under what
    > circumstances an empty element will pass schema checks,


    I think the confusion arises from the two different meanings of the word.

    a) EMPTY (in caps) is a keyword used in DTD element declarations to
    state that a certain element type can *never* have any content
    (neither character data content nor other elements)

    b) empty (in lowercase) is just an adjective meaning "with no content";
    it doesn't specify whether content is permitted or not, it simply
    says that there isn't any content at the moment.

    An element type declared as EMPTY can be represented as <foo/> or as
    <foo></foo>. The first is often recommended because it is unambiguous
    and there is no possibility of anyone ever manually inserting any
    content and thereby breaking the document model.

    An element type declared *with* content *may* be empty on some
    occasions (like this <name></name>) but that does not necessarily mean
    that it was declared EMPTY: you'd have to consult the Schema or DTD
    to find that out.

    So an empty element like <name></name> will pass a validation check
    either

    a) if it was declared EMPTY, or
    b) it was declared with optional content and just doesn't happen to
    have any right now.

    An element written as <foo/> is treated exactly like <foo></foo>, so it
    passes a validation check under the same two conditions.

    (In both cases I am assuming there are no compulsory attributes.)

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
    Peter Flynn, May 5, 2006
    #6
