XML Schema: inheritance with variable order of children

Sven

Hi,

I want to define an XML schema for something similar to the following
XML data:

<TextItem>
<Name>Temperature</Name>
<Content>27°C</Content>
</TextItem>
<TextItem>
<Content>cloudy</Content>
<Name>Sky</Name>
</TextItem>

With a variable order for Name and Content, I can define this
as:

<xs:complexType name="TextItemType">
<xs:sequence maxOccurs="unbounded" minOccurs="1">
<xs:choice>
<xs:element name="Name" type="xs:string" />
<xs:element name="Content" type="xs:string" />
</xs:choice>
</xs:sequence>
</xs:complexType>

As I want to define different item types, I define a base type ItemType and
derive the specialized content types from it:

<xs:complexType name="ItemType">
<xs:sequence>
<xs:element maxOccurs="1" minOccurs="1" name="Name"
type="xs:string" />
</xs:sequence>
</xs:complexType>

<xs:complexType name="TextItemType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:sequence>
<xs:element maxOccurs="1" minOccurs="1" name="Content"
type="xs:string" />
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
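As a sketch of why the order becomes fixed: under the XSD 1.0 rules for xs:extension, the derived type's content model is the base type's particle followed by the extension's particle, wrapped in an xs:sequence. The hypothetically named type below should therefore accept exactly the same instances as the derived TextItemType above:

```xml
<!-- Hypothetical name; spells out what the xs:extension above amounts to -->
<xs:complexType name="TextItemTypeExpanded">
  <xs:sequence>
    <!-- the particle inherited from ItemType always comes first... -->
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="Name" type="xs:string" />
    </xs:sequence>
    <!-- ...followed by the particle added by the extension -->
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="Content" type="xs:string" />
    </xs:sequence>
  </xs:sequence>
</xs:complexType>
```

So the Temperature item (Name first) validates, while the Sky item (Content first) is rejected.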

But with this schema, the order of "Name" and "Content" matters.
Is there a way to define it so that the order is variable?

Thanks,
Sven Bauhan
 
Pavel Lepin

Sven said:
> <TextItem>
> <Name>Temperature</Name>
> <Content>27°C</Content>
> </TextItem>
> <TextItem>
> <Content>cloudy</Content>
> <Name>Sky</Name>
> </TextItem>
>
> With a variable order for Name and Content, I can
> define this as:
>
> <xs:complexType name="TextItemType">
> <xs:sequence maxOccurs="unbounded" minOccurs="1">
> <xs:choice>
> <xs:element name="Name" type="xs:string" />
> <xs:element name="Content" type="xs:string" />
> </xs:choice>
> </xs:sequence>
> </xs:complexType>

Which is too lax.
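To make "too lax" concrete: assuming the unbounded xs:choice definition quoted above, each of the following made-up TextItem elements should validate, even though none of them is a sensible item:

```xml
<!-- Two Name elements and no Content: accepted by the choice model -->
<TextItem>
  <Name>Temperature</Name>
  <Name>Sky</Name>
</TextItem>
<!-- Content without any Name: also accepted -->
<TextItem>
  <Content>27°C</Content>
</TextItem>
<!-- Repeated Content elements: also accepted -->
<TextItem>
  <Name>Sky</Name>
  <Content>cloudy</Content>
  <Content>sunny</Content>
</TextItem>
```

Only an empty TextItem is rejected, because the outer xs:sequence has minOccurs="1".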
> As I want to define different item types, I define a base type
> ItemType and derive the specialized content types from it:
>
> <xs:complexType name="ItemType">
> <xs:sequence>
> <xs:element maxOccurs="1" minOccurs="1" name="Name"
> type="xs:string" />
> </xs:sequence>
> </xs:complexType>
>
> <xs:complexType name="TextItemType">
> <xs:complexContent>
> <xs:extension base="ItemType">
> <xs:sequence>
> <xs:element maxOccurs="1" minOccurs="1"
> name="Content"
> type="xs:string" />
> </xs:sequence>
> </xs:extension>
> </xs:complexContent>
> </xs:complexType>
>
> But with this schema, the order of "Name" and "Content"
> matters. Is there a way to define it so that the
> order is variable?

I believe the answer is no, but you'd have to hire a
language lawyer to quote chapter and verse on it. Anyway,
that's beside the point, because you cannot differentiate
element types based on element content. xsi:type is the
only way around it and it's, well, clunky. See the newsgroup
archives; this is one of the most commonly asked questions
about XML Schemata.

The standard recommendations are:

1. Stop wanting that.
2. Use a more powerful schema definition language.
3. Validate on application side, not on parser side.
4. Design a well-structured document:

<temperature scale="celsius">27</temperature>
<sky>cloudy</sky>
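A possible XML Schema for the restructured elements in point 4, as a sketch: the weather container element and the fahrenheit and kelvin values are assumptions added for illustration. Because the unit moves into an attribute, the element content can be typed as a number, and xs:all lets the two children appear in either order:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical container element grouping the readings -->
  <xs:element name="weather">
    <xs:complexType>
      <!-- each child exactly once, in any order -->
      <xs:all>
        <xs:element name="temperature" type="TemperatureType"/>
        <xs:element name="sky" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- Numeric value as element content, unit in the scale attribute -->
  <xs:complexType name="TemperatureType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="scale" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="celsius"/>
              <xs:enumeration value="fahrenheit"/>
              <xs:enumeration value="kelvin"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
```

With this design, a value like "27°C" in the temperature element is rejected by the validator itself, because it is not an xs:decimal.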

Another interesting way of dealing with modestly crippled
XML documents is transforming them into something sane
using XSLT. Make the transformation scream and swear if it
runs into something that shouldn't be there, and you're
golden.
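A minimal XSLT 1.0 sketch of that idea. It assumes the TextItem elements are wrapped in a hypothetical Items root element and that only the Temperature and Sky items from the example exist. Anything the stylesheet does not recognize stops the transformation through xsl:message with terminate="yes":

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical root element wrapping the TextItem elements -->
  <xsl:template match="/Items">
    <weather>
      <xsl:apply-templates select="*"/>
    </weather>
  </xsl:template>

  <!-- "27°C" becomes <temperature scale="celsius">27</temperature> -->
  <xsl:template match="TextItem[Name = 'Temperature' and contains(Content, '°C')]">
    <temperature scale="celsius">
      <xsl:value-of select="normalize-space(substring-before(Content, '°C'))"/>
    </temperature>
  </xsl:template>

  <xsl:template match="TextItem[Name = 'Sky']">
    <sky><xsl:value-of select="normalize-space(Content)"/></sky>
  </xsl:template>

  <!-- Lowest default priority: any element not handled above aborts the run -->
  <xsl:template match="*">
    <xsl:message terminate="yes">
      <xsl:text>Unexpected element: </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for example, xsltproc normalize.xsl items.xml (both file names are hypothetical) either prints the restructured document or stops with the message, e.g. for a Temperature item whose Content lacks "°C".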
 
Sven

> The standard recommendations are:
>
> 1. Stop wanting that.

Ok, then I have to deal with more complex code for the xml-export.
And the human editors of the xml files are more restricted.

> 2. Use a more powerful schema definition language.

What do you mean by this?

> 3. Validate on application side, not on parser side.

The application already parses the files, but for offline usage I
would like to have the possibility to check against a schema, perhaps
with a simple xmllint.

> 4. Design a well-structured document:
>
> <temperature scale="celsius">27</temperature>
> <sky>cloudy</sky>

The sample provided above is not the original document. It is just a
simplified example to describe the problem.
I think I described a quite well-structured schema already :)

> Another interesting way of dealing with modestly crippled
> XML documents is transforming them into something sane
> using XSLT. Make the transformation scream and swear if it
> runs into something that shouldn't be there, and you're
> golden.

The documents will be generated by my application, but can be modified
by a human user. The schema should be a guideline as to which
modifications are allowed.

Thanks,
Sven
 
Joe Kesselman

Sven said:
> But with this schema, the order of "Name" and "Content" matters.
> Is there a way to define it so that the order is variable?


Just a thought, which I haven't tested: How about starting with an empty
ItemType, then deriving the specific versions from that, each with their
own independent definition of the content? Something like:

<xs:complexType name="ItemType">
</xs:complexType>

<xs:complexType name="TextItemType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:sequence maxOccurs="unbounded" minOccurs="1">
<xs:choice>
<xs:element name="Name" type="xs:string" />
<xs:element name="Content" type="xs:string" />
</xs:choice>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>

<xs:complexType name="OtherItemType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:sequence maxOccurs="unbounded" minOccurs="1">
<xs:choice>
<xs:element name="Name" type="xs:string" />
<xs:element name="Number" type="xs:integer" />
</xs:choice>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>

Of course this does have the problem that each kind of ItemType can have
multiple instances of its fields; if that's an issue, see past posts
here and in the XSLT FAQ for (somewhat painful) ways to overcome that.
And I think you'd have to use xsi:type to tell the validator which type
you intended this particular element to conform to.
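For just two fields, one commonly suggested workaround for the repetition problem is to spell out both permutations as a choice of sequences. Here is a sketch built on the empty-base approach above; it has not been tested against any particular validator. Each first child selects exactly one branch, so the model should stay deterministic. Note that the number of permutations grows factorially, so this only stays practical for two or three elements:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Empty base type, as in the suggestion above -->
  <xs:complexType name="ItemType"/>

  <xs:complexType name="TextItemType">
    <xs:complexContent>
      <xs:extension base="ItemType">
        <!-- Either order, but Name and Content each exactly once -->
        <xs:choice>
          <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Content" type="xs:string"/>
          </xs:sequence>
          <xs:sequence>
            <xs:element name="Content" type="xs:string"/>
            <xs:element name="Name" type="xs:string"/>
          </xs:sequence>
        </xs:choice>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```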

BTW, the _good_ fixes are either to make folks provide the values in a
stereotyped order, unless the variability is actually necessary for your
application (any decent coder, and any human, ought to be able to follow
that simple set of instructions)... or to recognize that XML Schema
really isn't intended to capture every possible constraint, and to
impose some of them in documentation and in application-level tests.
 
usenet

> ...
> <xs:complexType name="TextItemType">
> <xs:sequence maxOccurs="unbounded" minOccurs="1">
> <xs:choice>
> <xs:element name="Name" type="xs:string" />
> <xs:element name="Content" type="xs:string" />
> </xs:choice>
> </xs:sequence>
> </xs:complexType>

In this simplistic case you could use xs:all; e.g.:

<xs:complexType name="TextItemType">
<xs:all>
<xs:element name="Name" type="xs:string" />
<xs:element name="Content" type="xs:string" />
</xs:all>
</xs:complexType>

> As I want to define different item types, I define a base type ItemType and
> derive the specialized content types from it:

Sadly, xs:all doesn't currently allow you to use derivation. If the
extra precision that xs:all offers appears valuable you might have to
decide between using xs:all and defining each type separately, or
using the derivation tree schema you present.

BTW - I think the xs:sequence wrapper in your original schema snippet is
redundant. You should be able to use the equivalent schema:

<xs:complexType name="TextItemType">
<xs:choice maxOccurs="unbounded" minOccurs="1">
<xs:element name="Name" type="xs:string" />
<xs:element name="Content" type="xs:string" />
</xs:choice>
</xs:complexType>

HTH,

Pete Cordell
Codalogic Ltd
for XML Schema to C++ data binding visit
http://www.codalogic.com/lmx/
 
Pavel Lepin

Sven said:
> Ok, then I have to deal with more complex code for the
> xml-export. And the human editors of the xml files are
> more restricted.

If your editors are techies, it's perfectly possible to
explain to them that there are certain rules they should
observe. If they are not, they have no business editing raw
XML documents, and you should provide them with an
application-specific editor instead.

If your editors are not techies, and you let them edit raw
XML documents, and try to design a 'loose' schema for their
convenience, they'll break your system six ways till
Thursday. Of course, it's your blood pressure, so feel free
to jump off that particular cliff.

> What do you mean by this?

I mean using a more powerful schema definition language.
Google "XML schema definition languages". There are four
commonly used ones (for some values of 'commonly'): DTDs,
W3C XML Schemata, RELAX NG and Schematron.

DTDs are a holdover from SGML, and the original preferred
method for XML document validation. They're somewhat
similar in expressive power to XML Schemata, but unlike
those use a separate syntax, are untyped and not
namespace-aware, which limits their usefulness.
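For comparison, a DTD can express the same two-element, any-order rule by listing both permutations. Here is a self-contained sketch with an internal DTD subset, which xmllint --noout --valid should be able to check:

```xml
<?xml version="1.0"?>
<!DOCTYPE TextItem [
  <!-- Name and Content exactly once each, in either order -->
  <!ELEMENT TextItem ((Name, Content) | (Content, Name))>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Content (#PCDATA)>
]>
<TextItem>
  <Content>cloudy</Content>
  <Name>Sky</Name>
</TextItem>
```

The content stays untyped (#PCDATA), which is the limitation mentioned above.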

RELAX NG is an oft used alternative to W3C's schemata. It is
more powerful in some areas, but reportedly lacking in
others. It's an ISO standard if memory serves, but not
recommended by W3C.
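An illustration of RELAX NG's extra expressive power: its interleave pattern allows child elements in any order, and a named pattern can be reused much like a base type. In the sketch below, NumberItem, Number and the shared ItemFields pattern are hypothetical names. xmllint can check a document against such a grammar with xmllint --noout --relaxng items.rng items.xml (file names hypothetical):

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <choice>
      <element name="TextItem">
        <!-- interleave: Name and Content once each, any order -->
        <interleave>
          <ref name="ItemFields"/>
          <element name="Content"><text/></element>
        </interleave>
      </element>
      <element name="NumberItem">
        <interleave>
          <ref name="ItemFields"/>
          <element name="Number"><data type="integer"/></element>
        </interleave>
      </element>
    </choice>
  </start>

  <!-- Hypothetical "base" pattern shared by all item kinds -->
  <define name="ItemFields">
    <element name="Name"><text/></element>
  </define>
</grammar>
```

Unlike xs:extension, the shared pattern here does not force the common fields to come first.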

Schematron is a powerful rule-based constraint checking
language that addresses some of the XML Schema/RELAX NG
shortcomings. It is commonly used in combination with
either of them. It's not a W3C recommendation, but an ISO
standard as well.
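Schematron can also express the kind of rule that grammar-based languages struggle with, such as making what Content may hold depend on Name. Here is a sketch in ISO Schematron syntax (XPath 1.0 default binding) that assumes the element names from the TextItem example. The exactly-once checks also close the gaps left by the unbounded xs:choice model. Running it requires an ISO Schematron processor:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- Structural checks the loose xs:choice model cannot make -->
    <sch:rule context="TextItem">
      <sch:assert test="count(Name) = 1">A TextItem needs exactly one Name.</sch:assert>
      <sch:assert test="count(Content) = 1">A TextItem needs exactly one Content.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:pattern>
    <!-- Co-occurrence check: Content must fit the Name it belongs to -->
    <sch:rule context="TextItem[Name = 'Temperature']">
      <sch:assert test="string(number(substring-before(Content, '°C'))) != 'NaN'">
        A Temperature item needs a number followed by °C.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The two rules sit in separate patterns on purpose: within one pattern, only the first matching rule fires for a given node.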

Note that *all* of the schema definition languages are aimed
at providing validation for well-structured documents. The
looser your schema is, the more likely you'll need a
general-purpose language to check whether all the gizmos
are in place and all the thingamajigs bound together.

> The application already parses the files, but for offline
> usage I would like to have the possibility to check
> against a schema, perhaps with a simple xmllint.

So decide what's more important to you: readily available
validation, or DWYM processing.

> The sample provided above is not the original document. It
> is just a simplified example to describe the problem.
> I think I described a quite well-structured schema already
> :)

You didn't, and my example demonstrates why.

Firstly, in your schema, you're using TextItem elements for
heterogeneous data; therefore, you cannot determine the
expected content model of a TextItem element... without
*looking* at its content. That's bad, and that alone rules
out XML Schemata as your schema definition language.

Secondly, it's impossible to determine the expected content
of a Content element without looking at *surrounding*
content. For validation purposes, that's beyond bad; it's
godawful.

And thirdly, just to round up things nicely, in case of
temperature your Content element contains both the value
and the units used for that value. One might argue it's
workable. I would scream bloody murder the moment I saw
that in our documents.

> The documents will be generated by my application, but can
> be modified by a human user. The schema should be a
> guideline as to which modifications are allowed.

Prepare for the world of fun. If you're worrying about your
users being unable to comprehend that you always want Name
element before Content element... you should be worrying
about well-formedness constraints inherent in any kind of
XML processing first. How do you expect them to understand
that XML is case-sensitive, that 'tags' must be properly
closed and nested, that certain characters are off-limits
and others should always be escaped as entities or
character references?
 
Sven

> If your editors are techies, it's perfectly possible to
> explain to them that there are certain rules they should
> observe. If they are not, they have no business editing raw
> XML documents, and you should provide them with an
> application-specific editor instead.

Unfortunately they are not techies, just database administrators.
But they are used to handling complex data files like CSV.

The idea for a database-specific editor already came to me too.
But first I just want to provide an XML editor which can validate
against a schema.
Perhaps someone can give me a hint for a good one, preferably platform
independent.
I normally use Eclipse, but a more lightweight one would be better.

> If your editors are not techies, and you let them edit raw
> XML documents, and try to design a 'loose' schema for their
> convenience, they'll break your system six ways till
> Thursday. Of course, it's your blood pressure, so feel free
> to jump off that particular cliff.

I'll keep that simple. If the XML file cannot be parsed by my
application, it will be refused.

Ok, that sounds interesting, but there is not much time to
investigate these languages in detail.

> You didn't, and my example demonstrates why.

Ok ok, I think I explained it wrong. The example has nothing to do
with my data. I just wanted to have a short description of what I
mean. Here is a short part of my schema. I did not want to post it
because I use self-defined complex types in it:

<xs:complexType name="ItemType" >
<xs:sequence>
<xs:element maxOccurs="1" type="ItemOperFlags" minOccurs="0"
name="ItemFlags" />
</xs:sequence>
<xs:attribute use="optional" type="xs:dateTime" name="validFrom" />
<xs:attribute use="optional" type="xs:dateTime" name="validUntil" />
</xs:complexType>

<xs:complexType name="NormalItemType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:sequence>
<xs:element type="xs:string" name="ItemContent" />
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required" />
</xs:extension>
</xs:complexContent>
</xs:complexType>

I know that the type xs:string for ItemContent is not ideal, but
that is dictated by the database structure.

> Prepare for the world of fun. If you're worrying about your
> users being unable to comprehend that you always want Name
> element before Content element... you should be worrying
> about well-formedness constraints inherent in any kind of
> XML processing first. How do you expect them to understand
> that XML is case-sensitive, that 'tags' must be properly
> closed and nested, that certain characters are off-limits
> and others should always be escaped as entities or
> character references?

The XML restrictions on the files should be handled by using an XML
editor. But as most XML editors are not very good at handling XML
schema validation, the schema should be kept quite simple.

Sven
 
Sven

> In this simplistic case you could use xs:all; e.g.:
> <xs:complexType name="TextItemType">
> <xs:all>
> <xs:element name="Name" type="xs:string" />
> <xs:element name="Content" type="xs:string" />
> </xs:all>
> </xs:complexType>

I have already wondered about xs:all, but I did not really understand
it yet.
What is the intention of xs:all? Is it just the same as xs:sequence
without a strict order? But in xs:all, all elements have to be
present exactly once, right? There is no possibility to define
optional or multiple elements?!

By the way, why is sequence the main collection type? I can see no
reason for a strict order of the child elements anyway.
 
Pavel Lepin

Please don't remove the attributions, and leave enough
quoted material to provide context for your words. Google
Groups inserts attributions by default unless I'm much
mistaken. Most of us are reading ctx using newsreaders,
and it's far more convenient to have all the context in one
place, instead of shuffling through the whole thread to see
who said what and what you're referring to in your
follow-ups.

Sven said:
> Unfortunately they are not techies, just database
> administrators. But they are used to handling complex
> data files like CSV.
>
> The idea for a database-specific editor already came to me
> too. But first I just want to provide an XML editor which
> can validate against a schema. Perhaps someone can give me
> a hint for a good one, preferably platform independent.

Can't help you with this one; try asking the group regulars
for input. Note that you'll have to make do with a
restrictive schema in this case.

> I'll keep that simple. If the XML file cannot be parsed by
> my application, it will be refused.

I know I sound like a broken gramophone, but - stick to a
restrictive schema if you can. There are simply too many
problems with loose schemata: needless complications in
your parser/application code, needless complications in
the schema itself and escalation (users wanting more and more
freedom, eventually getting to the point where you start
considering implementing ExtraSensoryParser).

> Ok ok, I think I explained it wrong. The example has
> nothing to do with my data. I just wanted to have a short
> description of what I mean. Here is a short part of my
> schema. I did not want to post it because I use
> self-defined complex types in it:

[snip type definition]

Correct me if I'm wrong, but you want to derive numerous
sub-types from ItemType, with different content models
depending solely on content of ItemFlags child element? If
so, the same warning still applies. You can only do that by
defining a generic element of ItemType in your schema and
specifying a sub-type using xsi:type attribute on those
elements in your document.
 
Sven

>>> If your editors are not techies, and you let them edit
> I know I sound like a broken gramophone, but - stick to a
> restrictive schema if you can. There are simply too many
> problems with loose schemata: needless complications in
> your parser/application code, needless complications in
> the schema itself and escalation (users wanting more and more
> freedom, eventually getting to the point where you start
> considering implementing ExtraSensoryParser).

Yes, I agree with you that I should define a strict schema that
matches the data structure in memory.
But I still cannot see a good reason why there has to be a fixed
order of the elements in a sequence at all. When I have a class with
attributes or a database entry, I don't care about their ordering either.
Even the attributes of an XML element are not ordered.

>> Ok ok, I think I explained it wrong. The example has
>> nothing to do with my data. I just wanted to have a short
>> description of what I mean. Here is a short part of my
>> schema. I did not want to post it because I use
>> self-defined complex types in it:
>
> [snip type definition]
>
> Correct me if I'm wrong, but you want to derive numerous
> sub-types from ItemType, with different content models
> depending solely on content of ItemFlags child element? If
> so, the same warning still applies. You can only do that by
> defining a generic element of ItemType in your schema and
> specifying a sub-type using xsi:type attribute on those
> elements in your document.

Perhaps my English is too bad, but I do not really understand what you
want to say.
Especially what you mean by "different content models
depending solely on content of ItemFlags child element".
ItemFlags is common to all derived types, so I define it in the base
type ItemType to reduce redundancy.

Thanks, Sven
 
usenet

> What is the intention of xs:all? Is it just the same as xs:sequence
> without a strict order? But in xs:all, all elements have to be
> present exactly once, right? There is no possibility to define
> optional or multiple elements?!

xs:all is similar to xs:sequence but, as you say, does not impose a
strict order. Elements from an xs:all in an XML instance can occur 0
or 1 times, but _not_ 2 or more. Unlike xs:sequence, xs:all cannot
contain nested compositors (only xs:element), can't itself be nested
inside another compositor, and can't extend from a base or be extended.
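Here is a sketch for the optional-element part of the question, assuming XSD 1.0: inside xs:all, minOccurs="0" makes an element optional, while maxOccurs may only be 1:

```xml
<xs:complexType name="TextItemType">
  <xs:all>
    <!-- required, but may come before or after Content -->
    <xs:element name="Name" type="xs:string" />
    <!-- optional: may be absent; a second Content is still invalid -->
    <xs:element name="Content" type="xs:string" minOccurs="0" />
  </xs:all>
</xs:complexType>
```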
> By the way, why is sequence the main collection type? I can see no
> reason for a strict order of the child elements anyway.

The restrictions on xs:all were imposed because at the time it was
thought that a finite state machine implementing validation would
suffer from combinatorial explosion. I imagine that xs:sequence is
ordered for similar reasons. (Note that it's now considered that there
are better ways to validate xs:all than implementing a finite state
machine, and future versions of schema are likely to allow more
relaxed forms of xs:all.)
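As far as I understand the XSD 1.1 specification, XSD 1.1 lets a type whose content is an xs:all group be extended with another xs:all group. It also lifts the 0-or-1 limit on elements inside xs:all. The sketch below recasts the original ItemType/TextItemType pair under those rules. It needs an XSD 1.1 processor, such as Saxon-EE or the XSD 1.1 build of Xerces-J. xmllint implements XSD 1.0, so it will most likely reject this schema:

```xml
<!-- XSD 1.1 only: extending an xs:all group with another xs:all group -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ItemType">
    <xs:all>
      <xs:element name="Name" type="xs:string"/>
    </xs:all>
  </xs:complexType>

  <!-- Effective content: all{Name, Content}, so either order validates -->
  <xs:complexType name="TextItemType">
    <xs:complexContent>
      <xs:extension base="ItemType">
        <xs:all>
          <xs:element name="Content" type="xs:string"/>
        </xs:all>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="TextItem" type="TextItemType"/>
</xs:schema>
```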

HTH,

Pete Cordell
Codalogic
Visit http://www.codalogic.com/lmx/
for XML Schema to C++ data binding
 
Pavel Lepin

Sven said:
> Yes, I agree with you that I should define a strict schema
> that matches the data structure in memory.
> But I still cannot see a good reason why there has to be
> a fixed order of the elements in a sequence at all. When I
> have a class with attributes or a database entry, I don't
> care about their ordering either.

But your *tools* probably do - where internal representation
is concerned.

pavel@debian:~/dev/c/st$ a st.c
#include <stdio.h>
struct A { int a ; float b ; } t = {1729 , 1729.0} ;
typedef struct B { float b ; int a ; } B ;
int main () { printf ("%d\n" , (* (B *) &t).a) ; }
pavel@debian:~/dev/c/st$ gcc -ansi -Wall -Wextra -pedantic -O2 st.c
st.c: In function ‘main’:
st.c:4: warning: dereferencing type-punned pointer will break strict-aliasing rules
st.c:4: warning: control reaches end of non-void function
pavel@debian:~/dev/c/st$ ./a.out
1155014656
pavel@debian:~/dev/c/st$

> Even the attributes of an XML element are not ordered.

Well, let me exaggerate a bit for dramatic effect.
Imagine a 92GB XML document containing information about a
few hundred thousand entities, each having some fifteen
thousand properties:

<data>
<entity>
<prop1>1.356</prop1>
<prop2>Wonka wonka.</prop2>
<prop3>red</prop3>
<!-- 14,997 more properties here -->
</entity>
<!-- another 91.999... GB of data -->
</data>

Now, a DOM parser - or any similar approach - would be
horridly out of place if we wanted to invoke some doFoobar
behaviour on all of our entities. A streaming parser
(expat, or something using SAX API or somesuch) would be a
much better solution. Assuming we had a clearly defined
order of elements representing properties, the processing
would be simple, fast and straightforward. We *know* what
we expect next at any point in parsing. If the order
doesn't matter, then in the middle of processing of a given
entity we may expect any of the 15,000 of them... save some
random 6,000 we've already seen for this one. Clearly, this
isn't something unsolvable, or even all that hard to solve,
but it does come with an overhead all its own.

Now consider what free order of elements gives us.
User-friendliness? Not only is that disputable, but also,
despite all appearances, XML is not really meant to be
processed by wetware. *shrug* YMMV.

>> [snip type definition]
>>
>> Correct me if I'm wrong, but you want to derive numerous
>> sub-types from ItemType, with different content models
>> depending solely on content of ItemFlags child element?
>> If so, the same warning still applies. You can only do
>> that by defining a generic element of ItemType in your
>> schema and specifying a sub-type using xsi:type attribute
>> on those elements in your document.

> Perhaps my English is too bad, but I do not really
> understand what you want to say.

Mine ain't any good either, so, duh.
> Especially what you mean by "different content models
> depending solely on content of ItemFlags child element".
> ItemFlags is common to all derived types, so I define it
> in the base type ItemType to reduce redundancy.

I'll try to explain using your original example. Basically,
assuming you want to make sure your Content elements
contain something sensible depending on the Name element,
you cannot design a strict schema that would validate
something like:

<TextItem>
  <Name>Temperature</Name>
  <Content>27°C</Content>
</TextItem>
<TextItem>
  <Content>cloudy</Content>
  <Name>Sky</Name>
</TextItem>

...without validating:

<TextItem>
  <Name>Temperature</Name>
  <Content>cloudy</Content>
</TextItem>
<TextItem>
  <Name>Sky</Name>
  <Content>27°C</Content>
</TextItem>

as well. You *could* do the following:

<TextItem xsi:type="temperatureType">
<Name>Temperature</Name>
<Content>27°C</Content>
</TextItem>
<TextItem xsi:type="skyType">
<Content>cloudy</Content>
<Name>Sky</Name>
</TextItem>

But as I already said, that's quite clunky, and probably not
worth the trouble.
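For completeness, here is a sketch of what the xsi:type route needs on the schema side. The Items root element is a hypothetical wrapper, and the fixed values on Name are an assumption added for illustration. Declaring ItemType abstract forces every TextItem to name a concrete derived type. The instance document also has to declare the XMLSchema-instance namespace, which the fragment above leaves out:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Abstract, empty base: a TextItem must pick a subtype via xsi:type -->
  <xs:complexType name="ItemType" abstract="true"/>

  <xs:complexType name="temperatureType">
    <xs:complexContent>
      <xs:extension base="ItemType">
        <xs:sequence>
          <xs:element name="Name" type="xs:string" fixed="Temperature"/>
          <xs:element name="Content" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Each subtype fixes its own order; skyType puts Content first -->
  <xs:complexType name="skyType">
    <xs:complexContent>
      <xs:extension base="ItemType">
        <xs:sequence>
          <xs:element name="Content" type="xs:string"/>
          <xs:element name="Name" type="xs:string" fixed="Sky"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Hypothetical wrapper element -->
  <xs:element name="Items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="TextItem" type="ItemType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Instance document (separate file) -->
<Items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <TextItem xsi:type="temperatureType">
    <Name>Temperature</Name>
    <Content>27°C</Content>
  </TextItem>
  <TextItem xsi:type="skyType">
    <Content>cloudy</Content>
    <Name>Sky</Name>
  </TextItem>
</Items>
```

The clunkiness shows here: the type has to be repeated in every instance, and each subtype still fixes its own element order.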
 
