Parsing XML schema - variable attributes

Mike · Sep 17, 2008

I'm trying to parse an XML file using the standard Java XML parsing
libraries. However, I'm running into issues with variable attributes.
How do I define an XML schema with variable attributes? Or is the
issue with the parsing? Let me give you an example:

File #1
----------
<catalog>
<item color="red" color="blue"/ >
</catalog>

File #2
----------
<catalog>
<item color="red" color="white" "blue" / >
</catalog>

File #3
---------
<catalog>
<item color="purple" / >
</catalog>

What schema would define this? Or is this not possible? I'm getting a
validation error when I validate these files. Is there a way to
accomplish what I'm trying to do without defining different
attributes: color1, color2, ..., colorN?

Wojtek · Sep 18, 2008

Mike wrote:

I'm trying to parse an XML file using the standard Java XML parsing
libraries. However, I'm running into issues with variable attributes.
How do I define an XML schema with variable attributes? Or is the
issue with the parsing? Let me give you an example:

File #1

<catalog>
<item>
<color>red</color>
<color>blue</color>
</item>
</catalog>

and so on
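
A schema for this element-based layout could be sketched roughly as follows. This is only an illustration: the schema text, the class name CatalogSchemaDemo, and the helper isValid are assumptions made for this example, not something taken from the original files. It uses the javax.xml.validation API from the standard Java library and lets each item carry one or more color children.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class CatalogSchemaDemo {
    // Hypothetical schema: a catalog holds items, each item holds 1..n <color> children.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='item' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='color' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the XML text is well-formed and valid against XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first error aborts validation with a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<catalog><item><color>red</color><color>blue</color></item></catalog>"));
        System.out.println(isValid(
            "<catalog><item><color>purple</color></item></catalog>"));
        // An item without any color is rejected, because minOccurs defaults to 1.
        System.out.println(isValid("<catalog><item/></catalog>"));
    }
}
```

Adding minOccurs='0' to the color element would allow items without any color, if that is wanted.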

Maverick · Sep 18, 2008

<catalog>
<item color="red" color="blue" />
</catalog>
This is not well-formed XML, because an element can't have more than one attribute with the same name.
Here are a few pointers regarding XML attributes:
- attributes cannot contain multiple values (elements can)
- attributes cannot contain tree structures (elements can)
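
The well-formedness point can be checked with the standard Java parser itself. The following is a minimal sketch; the class name DuplicateAttributeDemo and the helper isWellFormed are made up for this example. It only tests whether the text parses, without any schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DuplicateAttributeDemo {
    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations are reported as fatal errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same attribute name twice on one element: not well-formed.
        System.out.println(isWellFormed("<catalog><item color='red' color='blue'/></catalog>"));
        // A single color attribute parses fine.
        System.out.println(isWellFormed("<catalog><item color='purple'/></catalog>"));
    }
}
```

Because the failure happens in the parser, before any validation step, no schema can make the two-attribute form acceptable.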

Mike · Sep 19, 2008

Mike wrote :

<catalog>
<item>
<color>red</color>
<color>blue</color>
</item>
</catalog>

and so on

So, there's absolutely no way to parse attributes with same name?

Mike Schilling · Sep 19, 2008

Mike said:
So, there's absolutely no way to parse attributes with same name?

It is illegal for an element to have two attributes with the same name.

Stefan Ram · Sep 19, 2008

Mike said:
So, there's absolutely no way to parse attributes with same name?

Not in XML.

But you can have multiple IDREFs per attribute value in XML.

http://www.w3.org/TR/2000/REC-xml-20001006.html#idref

(see there for »IDREFS«)
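
As a rough sketch of the IDREFS idea with the standard Java parser: the DTD below, the element and attribute names (color, id, colors), and the class IdrefsDemo are assumptions made for this illustration. Each color is declared once with an ID, and an item refers to several of them in one IDREFS attribute. A validating parser then rejects references to colors that were never declared.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class IdrefsDemo {
    // Hypothetical internal DTD subset: colors carry an ID, items refer to them via IDREFS.
    static final String DTD =
          "<!DOCTYPE catalog [\n"
        + "<!ELEMENT catalog (color*, item*)>\n"
        + "<!ELEMENT color EMPTY>\n"
        + "<!ATTLIST color id ID #REQUIRED>\n"
        + "<!ELEMENT item EMPTY>\n"
        + "<!ATTLIST item colors IDREFS #REQUIRED>\n"
        + "]>\n";

    /** Validates the body against DTD and returns the color tokens of the first item. */
    static List<String> itemColors(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                // Validity errors are recoverable by default; make them fail the parse.
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(DTD + body)));
            Element item = (Element) doc.getElementsByTagName("item").item(0);
            // The attribute value is one string; its tokens are separated by whitespace.
            return Arrays.asList(item.getAttribute("colors").trim().split("\\s+"));
        } catch (SAXException e) {
            throw new IllegalArgumentException("document rejected: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemColors(
            "<catalog><color id='red'/><color id='blue'/><item colors='red blue'/></catalog>"));
    }
}
```

The application still has to split the attribute value itself; the parser only checks that every token refers to an existing ID.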

I have specified and implemented a data language »Unotal« that
directly handles multiple values with the same attribute name
indeed. For example:

import java.lang.String;
import java.lang.System;
import de.dclj.ram.notation.unotal.Room;
import static de.dclj.ram.notation.unotal.RoomFromModule.room;

public final class Main
{
    public static void main( final String argv[] )
    {
        // get() with a single value for "a"
        System.out.println( room( "< a=b >" ).get( "a" ));
        System.out.println( room( "< a=b >" ).get( "a" ).getClass() );

        // get() with two different values for "a"
        System.out.println( room( "< a=b a=c >" ).get( "a" ));
        System.out.println( room( "< a=b a=c >" ).get( "a" ).getClass() );

        // getValues() with no value for "a"
        System.out.println( room( "< >" ).getValues( "a" ));
        System.out.println( room( "< >" ).getValues( "a" ).getClass() );

        // getValues() with a single value for "a"
        System.out.println( room( "< a=b >" ).getValues( "a" ));
        System.out.println( room( "< a=b >" ).getValues( "a" ).getClass() );

        // getValues() with the same value given twice
        System.out.println( room( "< a=b a=b >" ).getValues( "a" ));
        System.out.println( room( "< a=b a=b >" ).getValues( "a" ).getClass() );

        // getValues() with two different values for "a"
        System.out.println( room( "< a=b a=c >" ).getValues( "a" ));
        System.out.println( room( "< a=b a=c >" ).getValues( "a" ).getClass() );
    }
}

System.out

b
class de.dclj.ram.notation.unotal.StringValue

[b, c]
class de.dclj.ram.notation.unotal.SprayValue

[]
class de.dclj.ram.notation.unotal.SprayValue

class java.util.HashSet

class java.util.HashSet

[b, c]
class java.util.HashSet

For more about this:

http://www.purl.org/stefan_ram/pub/junotal_tutorial

I have written an XML criticism, but this has not yet incorporated
the possibility to use an IDREFS attribute in XML. Still the rest
of it is valid: (The rest of this post is my XML criticism.)

When a new document type is to be defined, when should one
choose child elements and when attributes?

The criterion that makes sense regarding the meaning can not
be used in XML due to syntactic restrictions.

An element is describing something. A description is an
assertion. An assertion might contain unary predicates or
binary relations.

Comparing this structure of assertions with the structure
of XML, it seems to be natural to represent unary predicates
with types and binary relations with attributes.

Say, "x" is a rose and belongs to Jack. This assertion can
be written in a more formal way to show the relations used:

rose( x ) ^ owner( x, Jack )

This is written in XML as:

<rose owner="Jack" />

Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.

Unfortunately, in XML, this is not always possible, because
in XML:

- there might be at most one type per element,

- there might be at most one attribute value per attribute
name, and

- attribute values are not allowed to be structured in
XML.

Therefore, the designers of XML document types are forced to
abuse element /types/ in order to describe the /relation/
of an element to its parent element.

This /is/ an abuse, because the designation "element type"
obviously is supposed to give the /type of an element/,
i.e., a property which is intrinsic to the element alone
and has nothing to do with its relation to other elements.

The document type designers, however, are being forced to
commit this abuse, to reinvent poorly the missing structured
attribute values using the means of XML. If a rose has two
owners, the following element is not allowed in XML:

<rose owner="Jack" owner="Jill" />

One is made to use representations such as the following:

<rose>
<owner>Jack</owner>
<owner>Jill</owner></rose>

Here the notion "element type" suggests that it is marked
that Jack is "an owner", in the sense that "owner" is
supposed to be the type (the kind) of Jack. Not an
"owner of ..." (which would make sense), but just "an owner".

The intention of the author, however, is that "owner" is
supposed to give the /relation/ to the containing element
"rose". This is the natural field of application for
attributes, as the meaning of the word "attribute" outside
of XML clearly indicates, but it is not possible to
always use attributes for this purpose in XML.

An alternative solution might be the following notation.

<rose owner="Jack Jill" />

Here a /new/ mini language (not XML anymore) is used within
an attribute value, which, of course, can not be checked
anymore by XML validators. This is actually done, for
example, in XHTML, where classes are written this way.

So in its most prominent XML application XHTML, the W3C
has to abandon XML even to write class attributes. This
is not such a good accomplishment given that the W3C
was able to use the experience made with SGML and HTML
when designing XML.
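
With the space-separated notation above, the XML parser delivers the attribute as one string, and the application has to take it apart. A minimal sketch in Java, where the class name SpaceSeparatedAttributeDemo and the helper owners are hypothetical and whitespace is assumed to be the separator:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SpaceSeparatedAttributeDemo {
    /** Parses the XML text and splits the root element's "owner" attribute on whitespace. */
    static List<String> owners(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // To the parser this is a single attribute value; the list structure is ours.
            String raw = doc.getDocumentElement().getAttribute("owner").trim();
            if (raw.isEmpty()) {
                return Collections.emptyList();
            }
            return Arrays.asList(raw.split("\\s+"));
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(owners("<rose owner='Jack Jill' />"));
        System.out.println(owners("<rose />"));
    }
}
```

A value that itself contains a space, such as a full name, cannot be represented this way without inventing yet another escaping rule.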

The needless restrictions of XML inhibit the meaningful
use of syntax. This makes many document type designers
wonder when attributes and when elements should be used,
which is actually evidence of a failure in the design of
XML: XML does not have many more notations than these two,
attributes and elements. And yet the W3C failed to give
even these two notations a clear and meaningful purpose!

Without the restrictions described, XML alone would have
nearly the expressive power of RDF/XML, which has to repair
painfully some of the errors made in the XML-design.

Now, some "experts" recommend /always/ using subelements,
because one can never know whether an attribute value
that seems to be unstructured today might need to become
structured tomorrow. Other "experts" recommend using
attributes only when one is quite confident that they
will never need to be structured. This recommendation
does not even try to make sense of attributes,
but just explains how to circumvent the obstacles
the W3C has built into XML.

Others recommend using attributes for something they
call "metadata". They ignore that this limits "metadata"
to unstructured values.

Others use an XML editor that happens to make the input of
attributes more comfortable than the input of elements and
seriously suggest, therefore, to use as many attributes as
possible.

Still others have studied how to use CSS to format XML
documents and are using this to give recommendations about
when to use attributes and when to use subelements. (So
that the resulting document can be formatted most easily
with CSS.)

Of course: Mixing all these criteria (structured vs.
unstructured, data vs. "metadata", by CSS, by the ease of
editing, ...) often will give conflicting recommendations.

Certain other notations than XML have solved the problem
by either omitting attributes altogether or by allowing
structured attributes.
