Tags vs. Attributes

gaijinco

I'm fairly new to using XML and I tend to be quite verbose when
writing files.

Is there any disadvantage to writing:

<person>
  <name first="Carlos" second="" />
  <lastname first="Obregon" second="Jimenez" />
  <id type="CC" number="79879389" />
  <birthday day="17" month="02" year="1979" />
  <member day="01" month="06" year="2007" />
  <adress id="Evegreen Cr. 1234" />
  <telephone number="555-123456" />
  <email user="me" domain="home.com" />
</person>

Instead of:

<person>
  <name>
    <first>Carlos</first>
    <second></second>
  </name>
  <lastname>
    <first>Obregon</first>
    <second>Jimenez</second>
  </lastname>
  <id>
    <type>CC</type>
    <number>79879389</number>
  </id>
  <birthday>
    <day>17</day>
    <month>02</month>
    <year>1979</year>
  </birthday>
  <member>
    <day>01</day>
    <month>06</month>
    <year>2008</year>
  </member>
  <adress>Evegreen Cr. 1234</adress>
  <telephone>555-123456</telephone>
  <email>
    <user>me</user>
    <domain>home.com</domain>
  </email>
</person>

Thanks.
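
For comparison, here is a minimal sketch of how each layout is read back by a program. It assumes Python's standard xml.etree.ElementTree module; the two documents are trimmed-down versions of the ones in the question, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed versions of the two layouts from the question.
ATTRIBUTE_STYLE = """
<person>
  <name first="Carlos" second="" />
  <birthday day="17" month="02" year="1979" />
</person>
"""

ELEMENT_STYLE = """
<person>
  <name>
    <first>Carlos</first>
    <second></second>
  </name>
  <birthday>
    <day>17</day>
    <month>02</month>
    <year>1979</year>
  </birthday>
</person>
"""

attr_doc = ET.fromstring(ATTRIBUTE_STYLE)
elem_doc = ET.fromstring(ELEMENT_STYLE)

# Attribute style: each value is a plain string looked up by name.
first_from_attr = attr_doc.find("name").get("first")
year_from_attr = int(attr_doc.find("birthday").get("year"))

# Element style: each value is the text content of a child element.
first_from_elem = elem_doc.findtext("name/first")
year_from_elem = int(elem_doc.findtext("birthday/year"))

# An empty attribute comes back as "", while an empty element has no text
# node, so ElementTree's .text is None there (findtext returns "" instead).
empty_attr = attr_doc.find("name").get("second")
empty_elem_text = elem_doc.find("name/second").text
```

Either layout yields the same data; the difference is mostly in how the values are addressed and in small edge cases such as the empty second name.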
 
Stefan Ram

gaijinco said:
<name first="Carlos" second="" />
<name>
<first>Carlos</first>
<second></second>
</name>

When a new document type is to be defined, when should one
choose child elements and when attributes?

The criterion that makes sense regarding the meaning can not
be used in XML due to syntactic restrictions.

An element is describing something. A description is an
assertion. An assertion might contain unary predicates or
binary relations.

Comparing this structure of assertions with the structure
of XML, it seems to be natural to represent unary predicates
with types and binary relations with attributes.

Say, "x" is a rose and belongs to Jack. This assertion can
be written in a more formal way to show the relations used:

rose( x ) ^ owner( x, Jack )

This is written in XML as:

<rose owner="Jack" />

Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.

Unfortunately, in XML, this is not always possible, because
in XML:

- there might be at most one type per element,

- there might be at most one attribute value per attribute
name, and

- attribute values are not allowed to be structured in
XML.

Therefore, the designers of XML document types are forced to
abuse element /types/ in order to describe the /relation/
of an element to its parent element.

This /is/ an abuse, because the designation "element type"
obviously is supposed to give the /type of an element/,
i.e., a property which is intrinsic to the element alone
and has nothing to do with its relation to other elements.

The document type designers, however, are being forced to
commit this abuse, to reinvent poorly the missing structured
attribute values using the means of XML. If a rose has two
owners, the following element is not allowed in XML:

<rose owner="Jack" owner="Jill" />

One is made to use representations such as the following:

<rose>
  <owner>Jack</owner>
  <owner>Jill</owner>
</rose>
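
The difference between the two forms can be checked with any XML parser. The following is a small sketch using Python's standard xml.etree.ElementTree; the helper name parses is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two attributes with the same name violate a well-formedness constraint,
# so the parser rejects the document outright.
duplicate_attribute_ok = parses('<rose owner="Jack" owner="Jill" />')

# Repeating a child element is allowed, and document order is preserved.
doc = ET.fromstring("<rose><owner>Jack</owner><owner>Jill</owner></rose>")
owners = [owner.text for owner in doc.findall("owner")]
```

Note that the duplicate attribute is not merely invalid against some DTD: the document is not well-formed, so no conforming parser will deliver it at all.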

Here the notion "element type" suggests that it is marked
that Jack is "an owner", in the sense that "owner" is
supposed to be the type (the kind) of Jack. Not an
"owner of ..." (which would make sense), but just "an owner".

The intention of the author, however, is that "owner" is
supposed to give the /relation/ to the containing element
"rose". This is the natural field of application for
attributes, as the meaning of the word "attribute" outside
of XML clearly indicates, but it is not possible to
always use attributes for this purpose in XML.

An alternative solution might be the following notation.

<rose owner="Alexander Marie" />

Here a /new/ mini language (not XML anymore) is used within
an attribute value, which, of course, cannot be checked
by XML validators. This is actually done in practice, for
example in XHTML, where classes are written this way.
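
A short sketch of what that means for the consumer of such a document, again assuming Python's standard xml.etree.ElementTree: the parser returns the attribute as a single string, and splitting it into items is a convention the application applies on its own.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<rose owner="Alexander Marie" />')

# The parser hands back one opaque string ...
raw_value = doc.get("owner")

# ... and the whitespace-separated convention is applied by the application.
owners = raw_value.split()
```

Nothing in the document itself says whether "Alexander Marie" names two owners or one owner with a two-part name; that knowledge lives outside the markup.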

So in its most prominent XML application XHTML, the W3C
has to abandon XML even to write class attributes. This
is not such a good accomplishment given that the W3C
was able to use the experience made with SGML and HTML
when designing XML.

The needless restrictions of XML inhibit the meaningful
use of syntax. This makes many document type designers
wonder when attributes and when elements
should be used, which is actually evidence of
a shortcoming in the design of XML: XML does not have many
more notations than these two, attributes and elements,
and yet the W3C failed to give even these two
notations a clear and meaningful purpose!

Without the restrictions described, XML alone would have
nearly the expressive power of RDF/XML, which has to repair
painfully some of the errors made in the XML-design.

Now, some "experts" recommend /always/ using subelements,
because one can never know whether an attribute value
that seems to be unstructured today might need to become
structured tomorrow. Other "experts" recommend using
attributes only when one is quite confident that they
will never need to be structured. This recommendation
does not even try to make sense of attributes,
but just explains how to circumvent the obstacles
the W3C has built into XML.

Others recommend to use attributes for something they
call "metadata". They ignore that this limits "metadata"
to unstructured values.

Others use an XML editor that happens to make the input of
attributes more comfortable than the input of elements and
seriously suggest, therefore, to use as many attributes as
possible.

Still others have studied how to use CSS to format XML
documents and are using this to give recommendations about
when to use attributes and when to use subelements. (So
that the resulting document can be formatted most easily
with CSS.)

Of course: Mixing all these criteria (structured vs.
unstructured, data vs. "metadata", by CSS, by the ease of
editing, ...) often will give conflicting recommendations.

Certain other notations than XML have solved the problem
by either omitting attributes altogether or by allowing
structured attributes.
 
Peter Flynn

Stefan said:
When a new document type is to be defined, when should one
choose child elements and when attributes?

The criterion that makes sense regarding the meaning can not
be used in XML due to syntactic restrictions.

That is too broad. Often it can.

An element is describing something. A description is an
assertion. An assertion might contain unary predicates or
binary relations.

Comparing this structure of assertions with the structure
of XML, it seems to be natural to represent unary predicates
with types and binary relations with attributes.

Say, "x" is a rose and belongs to Jack. This assertion can
be written in a more formal way to show the relations used:

rose( x ) ^ owner( x, Jack )

This is written in XML as:

<rose owner="Jack" />

This is not true. It demonstrates very well a misunderstanding of text
markup that is unfortunately far too prevalent. Naming element types
after concrete objects is rare and almost always wrong. Possibly a DTD
for a horticulturalist might do this, but in normal text applications
you would write something like

<plant type="rose" owner="Jack">x</plant>

That is, "x" is an instance of a type of plant called a rose and this
one belongs to Jack.

Thus, my answer would be: use element types for unary
predicates and attributes for binary relations.

Unfortunately, in XML, this is not always possible, because
in XML:

- there might be at most one type per element,

- there might be at most one attribute value per attribute
name, and

- attribute values are not allowed to be structured in
XML.

Therefore, the designers of XML document types are forced to
abuse element /types/ in order to describe the /relation/
of an element to its parent element.

This /is/ an abuse, because the designation "element type"
obviously is supposed to give the /type of an element/,
i.e., a property which is intrinsic to the element alone
and has nothing to do with its relation to other elements.

Nearly. But you are trying to force XML into a very narrow,
computer-science style mould of logic, which it was never intended for.

The document type designers, however, are being forced to
commit this abuse, to reinvent poorly the missing structured
attribute values using the means of XML. If a rose has two
owners, the following element is not allowed in XML:

<rose owner="Jack" owner="Jill" />

One is made to use representations such as the following:

<rose>
  <owner>Jack</owner>
  <owner>Jill</owner>
</rose>

This would be suboptimal for this case, where the owners are presumed to
be uniquely occurring individuals. But it would be possible.

Here the notion "element type" suggests that it is marked
that Jack is "an owner", in the sense that "owner" is
supposed to be the type (the kind) of Jack. Not an
"owner of ..." (which would make sense), but just "an owner".

The normal solution would be something like

....
<owners>
<owner id="Jack">Jack the Lad</owner>
<owner id="Jill">Jill the Lass</owner>
...
</owners>
....
<plant type="rose" owners="Jack Jill">x</plant>

(with id as ID and owners as IDREFS). Certainly you could choose to
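
As a rough illustration of the ID/IDREFS layout above, the sketch below resolves the references by hand with Python's standard xml.etree.ElementTree, which parses the internal DTD subset but does not validate against it. The garden wrapper element, the ATTLIST declarations, and the resolve_owners helper are assumptions added for this example; a validating parser would perform the dangling-reference check itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names follow the example above, while the
# <garden> wrapper and the ATTLIST declarations are assumptions added here.
GARDEN = """<!DOCTYPE garden [
  <!ATTLIST owner id ID #REQUIRED>
  <!ATTLIST plant owners IDREFS #IMPLIED>
]>
<garden>
  <owners>
    <owner id="Jack">Jack the Lad</owner>
    <owner id="Jill">Jill the Lass</owner>
  </owners>
  <plant type="rose" owners="Jack Jill">x</plant>
</garden>
"""


def resolve_owners(root, plant):
    """Map each token in plant's owners attribute to the matching <owner>."""
    by_id = {owner.get("id"): owner for owner in root.iter("owner")}
    resolved = []
    for token in plant.get("owners", "").split():
        if token not in by_id:
            # A validating parser would report this as a validity error.
            raise ValueError(f"dangling reference: {token}")
        resolved.append(by_id[token].text)
    return resolved


root = ET.fromstring(GARDEN)
owner_names = resolve_owners(root, root.find("plant"))
```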
The intention of the author, however, is that "owner" is
supposed to give the /relation/ to the containing element
"rose". This is the natural field of application for
attributes, as the meaning of the word "attribute" outside
of XML clearly indicates, but it is not possible to
always use attributes for this purpose in XML.

An alternative solution might be the following notation.

<rose owner="Alexander Marie" />

Here a /new/ mini language (not XML anymore) is used within
an attribute value, which, of course, cannot be checked
by XML validators. This is actually done in practice, for
example in XHTML, where classes are written this way.

I suggest you re-read the XML Spec for IDREFS and ENTITIES.

So in its most prominent XML application XHTML, the W3C
has to abandon XML even to write class attributes. This
is not such a good accomplishment given that the W3C
was able to use the experience made with SGML and HTML
when designing XML.

That was done for exogenous political reasons, as I understand it, not
for technical ones.

The needless restrictions of XML inhibit the meaningful
use of syntax. This makes many document type designers
wonder when attributes and when elements
should be used, which is actually evidence of
a shortcoming in the design of XML: XML does not have many
more notations than these two, attributes and elements,
and yet the W3C failed to give even these two
notations a clear and meaningful purpose!

No-one is pretending that XML is perfect, but you must understand that
it was designed for text documents, not for database engineering.

Without the restrictions described, XML alone would have
nearly the expressive power of RDF/XML, which has to repair
painfully some of the errors made in the XML-design.

Now, some "experts" recommend /always/ using subelements,
because one can never know whether an attribute value
that seems to be unstructured today might need to become
structured tomorrow. Other "experts" recommend using
attributes only when one is quite confident that they
will never need to be structured. This recommendation
does not even try to make sense of attributes,
but just explains how to circumvent the obstacles
the W3C has built into XML.

Please re-read the FAQ warning on this subject.

[snip]

///Peter
 
Stefan Ram

Peter Flynn said:
Again, not true. <rose owner="Jack Jill Stefan"/> is the normal solution
to multiple parallel values, where owner is declared as IDREFS or ENTITIES.

Thank you, I was not aware of IDREFS or ENTITIES yet.
So, there is limited support for parallel values in XML.

One might say for »parallel /references/« (to ids or entities).
It seems as if it can not be used when the values are literals
(not references) such as numerals (numbers), for example.
 
Peter Flynn

Stefan said:
Thank you, I was not aware of IDREFS or ENTITIES yet.
So, there is limited support for parallel values in XML.

One might say for »parallel /references/« (to ids or entities).
It seems as if it can not be used when the values are literals
(not references) such as numerals (numbers), for example.

That's correct. XML is based on SGML DTDs, and was aimed at the document
publishing field. There are many things that users of rectangular data
would like to see allowed, but for that you need another syntax.

///Peter
 
David Carlisle

Stefan said:
Thank you, I was not aware of IDREFS or ENTITIES yet.
So, there is limited support for parallel values in XML.

One might say for »parallel /references/« (to ids or entities).
It seems as if it can not be used when the values are literals
(not references) such as numerals (numbers), for example.

That, though, is a restriction of DTDs rather than of XML itself. Other XML
validation languages such as XSD, RELAX NG, or Schematron, for
example, can all easily be used to constrain an attribute to be (for
example) a white-space-separated list of integer values.
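
Python's standard library has no XSD or RELAX NG validator, so the sketch below is only a hand-written stand-in for the constraint such a schema would express: it reads an attribute as a white-space-separated list of integers and fails on anything else. The int_list helper and the petals attribute are hypothetical names chosen for the illustration.

```python
import xml.etree.ElementTree as ET


def int_list(element, attribute):
    """Read attribute as a whitespace-separated list of integers.

    Raises ValueError if any token is not an integer literal.
    """
    return [int(token) for token in element.get(attribute, "").split()]


# Hypothetical element and attribute names, used only for illustration.
doc = ET.fromstring('<rose petals="5 8 13" />')
petal_counts = int_list(doc, "petals")

# A non-numeric token is rejected by the check, not by the XML parser.
try:
    int_list(ET.fromstring('<rose petals="5 eight" />'), "petals")
    rejected = False
except ValueError:
    rejected = True
```

Python's int() is slightly more lenient than a schema integer type (it accepts underscores between digits, for instance), so a real schema remains the stricter and more portable check.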

David
 
Stefan Ram

David Carlisle said:
That, though, is a restriction of DTDs rather than of XML itself. Other XML
validation languages such as XSD, RELAX NG, or Schematron, for
example, can all easily be used to constrain an attribute to be (for
example) a white-space-separated list of integer values.

You can also create a validation language that can be used to
constrain an attribute to be a valid Java program.

However, the structure of such an attribute is not being
described by the XML language anymore. The XML TR does not
describe the Java syntax. So it is not provided by the XML TR.

The XML TR describes a document made of elements and possibly
attributes. It provides rules and names for these structural parts.

XML provides rules and names for a list of IDREFs within an
attribute, so this still »is« XML.

But the XML TR does not provide rules and syntactical names
(nonterminal symbols) for a list of integer numerals (integer
literals) within an attribute.

This is another language. It might be called »Relax-XML« or so.

Such a valid Relax-XML document can also be a valid XML document.
To that extent it »is« XML. But XML does not describe a special
syntax for integer numerals within an attribute value. To XML,
this is just an opaque attribute value. Interpreting this as a
list of integers is not backed by the XML TR anymore; this needs
the additional Relax specification.
 
David Carlisle

Stefan said:
You can also create a validation language that can be used to
constrain an attribute to be a valid Java program.

However, the structure of such an attribute is not being
described by the XML language anymore. The XML TR does not
describe the Java syntax. So it is not provided by the XML TR.

The XML TR describes a document made of elements and possibly
attributes. It provides rules and names for these structural parts.

XML provides rules and names for a list of IDREFs within an
attribute, so this still »is« XML.

But the XML TR does not provide rules and syntactical names
(nonterminal symbols) for a list of integer numerals (integer
literals) within an attribute.

This is another language. It might be called »Relax-XML« or so.

Such a valid Relax-XML document can also be a valid XML document.
To that extent it »is« XML. But XML does not describe a special
syntax for integer numerals within an attribute value. To XML,
this is just an opaque attribute value. Interpreting this as a
list of integers is not backed by the XML TR anymore; this needs
the additional Relax specification.

There are lots of things the XML spec doesn't specify, but to say that XML
encoding lists of numbers (an SVG path attribute, for example) isn't XML
is a rather strange conclusion to draw. For a start, a lot (quite
possibly a majority) of XML is "just" well-formed and not validated at
all, so the relative expressive strengths of validation languages are
irrelevant. For documents that are to be validated, the internal
organisation and timing of the various working groups, which mean that
XML is split across a range of specifications (XML itself, XML Names,
SAX, DOM, XSD, etc.), are essentially irrelevant to the end user. If you
just go by what's in the XML Rec without relying on anything else, you
cannot even use any standard parsing model, so use of XML would be rather
hard; or you may decide it's legal to use names like <a:b:c/> but then find
that the vast majority of current XML tools follow the additional
constraints in the Namespaces spec and would reject such an element.
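
The <a:b:c/> case can be tried out directly. The sketch below assumes CPython's standard library, where xml.parsers.expat can be created without namespace processing and xml.etree.ElementTree enables it; the two helper function names are made up for this illustration.

```python
import xml.parsers.expat
import xml.etree.ElementTree as ET


def accepted_without_namespaces(xml_text):
    """Parse with expat's namespace processing switched off."""
    parser = xml.parsers.expat.ParserCreate()  # no namespace_separator given
    try:
        parser.Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


def accepted_with_namespaces(xml_text):
    """Parse with ElementTree, which enables namespace processing."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Colons are ordinary name characters as far as XML 1.0 alone is concerned.
plain_result = accepted_without_namespaces("<a:b:c/>")

# A namespace-aware parser treats "a" as an undeclared prefix and gives up.
namespace_result = accepted_with_namespaces("<a:b:c/>")
```

So the same bytes are acceptable or not depending on which layer of specifications the tool implements, which is the point being made about reading the XML Rec in isolation.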

Actually, even by your definition XML can do more than you imply: IDREFS
isn't the only list type. NMTOKENS, for example, would allow you to specify
that an attribute is a white-space-separated list of tokens, even if you
can't, in a DTD, restrict the tokens further to be just digits. But to say
XML with lists of integers in an attribute isn't really XML because a DTD
can't validate it is just like saying that an XML document containing
English text isn't really XML because a DTD can't enforce spell checking.
By that definition, what can XML be used for?

David
 
