when will empty tags pass schema validation?

wolf_y

My question is simply: under what conditions will empty tags of the
form <MOM></MOM> pass schema validation? Of course, the mirror
question is: under what conditions will empty tags fail validation?
The former seems to be an easier question to answer.

XML files will arrive from around the world and must be schema-validated
before further processing and loading into a database, so I'm trying to
foresee the various layouts that might be submitted. I can anticipate
suppliers starting from a template, filling in the elements they need,
and sending the file with empty tags left in conditional segments that
contain both mandatory and conditional elements. I understand the role
of restrictions, but there are about a dozen record types, dozens of
segments, and hundreds of elements (some of which are sometimes
mandatory, sometimes conditional, and sometimes conditionally
mandatory). One schema is 230 pages.

I already created a test file where a conditional segment had empty
tags and validation failed.

Thanks
 
J

Joe Kesselman

wolf_y said:
My question is simply: under what conditions will empty tags of the
form <MOM></MOM> pass schema validation?

In XML, that is semantically identical to <MOM/>. Therefore it will pass
under the same conditions where <MOM/> would pass: when the schema
allows that element at that point and does not require it to have any
content.
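To make that concrete, here is a minimal sketch. The parent element FAMILY is hypothetical (it is not from the schemas discussed in this thread); MOM is declared optional and as a plain xs:string with no length facets, so the schema neither requires the element nor requires it to have content:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical parent element, for illustration only -->
  <xs:element name="FAMILY">
    <xs:complexType>
      <xs:sequence>
        <!-- MOM may be omitted; xs:string also accepts zero characters -->
        <xs:element name="MOM" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Against this schema, <FAMILY><MOM/></FAMILY>, <FAMILY><MOM></MOM></FAMILY> and <FAMILY></FAMILY> are all valid, and a parser delivers exactly the same tree for the first two.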
 
W

wolf_y

Thanks for answering, but maybe I should have led with my disclaimer:
I'm new to XML, program primarily in SAS, and have been relying on
online documentation.

Some of my confusion stems from the way terms such as empty, missing,
null, and blank are used and handled in different languages. I don't
mind reading docs, but I can't find an answer I understand at
http://www.w3.org/ or at other links I've found.

I don't want to create an empty element, but I need to know under what
circumstances an empty element will pass schema checks, so that the
back-end processing in SAS can react correctly when it's time to load
the data. Five SAS programmers share responsibility for writing the
load routines, and I was chosen to explain what to expect after
validation. There might be circumstances where an empty element is
allowed and others where we want to reject the file, both for the same
element, depending on the XML file provider or segment.

There are 4 levels of schema involved. Here's an example of an element
in the Level 3 schema:

<xs:element name="MOM">
  <xs:annotation>
    <xs:documentation>Mother</xs:documentation>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="25"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

I understand that because of minLength this element must have at least
one character. In a simple test, the whitespace-only <MOM> </MOM> passes
(is this a blank in XML?), whereas <MOM></MOM> doesn't (null or empty?).
An element defined with type="xs:integer" fails in both cases.
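The difference comes from the whiteSpace rule built into each type: xs:string preserves whitespace, so <MOM> </MOM> has a value of length 1, while xs:integer collapses whitespace, leaving an empty string that is not a valid integer in either case. As an illustrative variant (not the declaration from the Level 3 schema), a schema author who wanted whitespace-only content rejected too could derive the type from xs:token, which collapses whitespace before the length facets are checked:

```xml
<xs:element name="MOM">
  <xs:simpleType>
    <!-- xs:token collapses whitespace, so " " becomes "" before minLength is applied -->
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
      <xs:maxLength value="25"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

With this variant both <MOM></MOM> and <MOM> </MOM> fail, while <MOM> Jane </MOM> passes with the 4-character value "Jane".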

Is there any type (or attribute?) for which both <MOM></MOM> and
<MOM> </MOM> pass validation? Or must an element be explicitly defined
as permitting empty (nil?) values? Or must I test each unique element?

I hope this makes sense.
 
J

Joseph Kesselman

wolf_y said:
Is there any type (or attribute?) where both <MOM></MOM> and
<MOM> </MOM> passes validation?

Sure. If minimum length had been zero (or had not been explicitly set)
for the xs:string example, both would pass.
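For instance, dropping the minLength facet from the MOM declaration quoted above (leaving the default minimum of zero) gives a type that accepts both forms:

```xml
<xs:element name="MOM">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- no minLength facet: zero characters are allowed -->
      <xs:maxLength value="25"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Both <MOM></MOM> and <MOM> </MOM> validate against this declaration; only content longer than 25 characters is rejected.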

It's really a matter of what datatype that specific schema has assigned
(which controls whether empty is syntactically acceptable) and what
additional constraints it applies (which control whether empty is
semantically acceptable for validation purposes).
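As an illustration of the datatype deciding the outcome: an integer field only accepts empty content if its type explicitly allows it. One common pattern is a union of xs:integer with a zero-length string. The names IntegerOrEmpty and AGE are hypothetical, and the unprefixed type reference assumes a schema with no targetNamespace:

```xml
<xs:simpleType name="IntegerOrEmpty">
  <!-- A value is valid if it is valid for any member type -->
  <xs:union memberTypes="xs:integer">
    <!-- Anonymous member type: a string of exactly zero characters -->
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:length value="0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

<xs:element name="AGE" type="IntegerOrEmpty"/>
```

Here <AGE>42</AGE> and <AGE></AGE> both pass while <AGE>abc</AGE> fails, so a load routine reading such a field has to handle the empty case itself.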

Nillable is a different idea: it expresses "explicitly has no
meaningful value" rather than either "value is empty" or "element was
not present". It may make more sense to folks who've worked with
databases that support NULL.
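A sketch of how that looks on both sides, reusing the MOM declaration from this thread with nillable="true" added (an assumption for illustration; the real Level 3 schema may not declare it):

```xml
<!-- Schema side: nillable="true" permits an explicit "no value" marker -->
<xs:element name="MOM" nillable="true">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="25"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Instance side: the xsi prefix must be bound to the schema-instance namespace -->
<MOM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
```

The nilled <MOM xsi:nil="true"/> is valid even though minLength is 1, because a nilled element must be empty and is not checked against its type. A plain <MOM></MOM> still fails: nillable only has an effect when xsi:nil="true" is present.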
 
W

wolf_y

Joseph Kesselman said:
It's really a matter of what datatype that specific schema has assigned
(which controls whether empty is syntactically acceptable) and what
additional constraints it applies (which control whether empty is
semantically acceptable for validation purposes).

You've helped by confirming my take on what I've read, and I'll
continue to reread W3C docs. Since element properties are derived and
there are so many elements, it looks like my safest strategy is to
generate test files under both scenarios and see what happens.
 
P

Peter Flynn

wolf_y said:
Thanks for answering, but maybe I should have led with my disclaimer:
I'm a newbie to XML, primarily program in SAS, and consulted online
documentation.

Some of my confusion stems from the way terms such as empty, missing,
null, and blank are used/handled in different languages. I don't mind
reading docs, but I can't find an answer I understand at
http://www.w3.org/ or url links I've found.

I don't want to create an empty element, but need to know under what
circumstances an empty element will pass schema checks,

I think the confusion arises from the two different meanings of the
word "empty".

a) EMPTY (in caps) is a keyword in XML DTDs, used to declare that a
certain element type can *never* have any content (neither character
data content nor other elements)

b) empty (in lowercase) is just an adjective meaning "with no content";
it doesn't specify whether content is permitted or not, it simply
says that there isn't any content at the moment.

An element type declared as EMPTY can be represented as <foo/> or as
<foo></foo>. The first is often recommended because it is unambiguous
and there is no possibility of anyone ever manually inserting any
content and thereby breaking the document model.

An element type declared *with* content *may* be empty on some
occasions (like this <name></name>) but that does not necessarily mean
that it was declared EMPTY: you'd have to consult the Schema or DTD
to find that out.
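For comparison, here is roughly how the two situations look in a DTD and in W3C XML Schema, using the element names foo and name from the examples above. The DTD and schema declarations are shown together only for contrast; they would live in separate files:

```xml
<!-- DTD: foo is declared EMPTY; name merely allows empty content -->
<!ELEMENT foo EMPTY>
<!ELEMENT name (#PCDATA)>

<!-- W3C XML Schema: a complex type with no content model is the EMPTY equivalent -->
<xs:element name="foo">
  <xs:complexType/>
</xs:element>

<!-- xs:string allows zero characters, so <name></name> is valid, but name is not EMPTY -->
<xs:element name="name" type="xs:string"/>
```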

So an empty element like <name></name> will pass a validation check
either

a) if it was declared EMPTY, or
b) if it was declared with optional content and just doesn't happen to
have any right now.

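An element written as <foo/> is, to any XML parser, identical to
<foo></foo>, so it passes in exactly the same cases. The XML spec
merely recommends, for interoperability, that the <foo/> form be used
for elements declared EMPTY.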

(In both cases I am assuming there are no compulsory attributes.)
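To see why that assumption matters, here is a sketch of an element with no permitted content but a compulsory attribute; the attribute name ref is hypothetical:

```xml
<xs:element name="foo">
  <xs:complexType>
    <!-- No content allowed, but the attribute must be present -->
    <xs:attribute name="ref" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
```

Against this declaration <foo/> fails because ref is missing, while <foo ref="A1"/> and <foo ref="A1"></foo> both pass.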

///Peter