XML element structure Q.

Jules

I was just reading the w3cschools pages on XML (it's been years since I've
actively used XML for anything) and it claims that elements can contain a
mixture of other elements and simple text.

Fair enough - but from a parser point of view, what's the correct
behaviour if multiple blocks of text are found amongst multiple "sub
elements"? Should the parser somehow concatenate the blocks and present a
single text string at the API layer? Or should it present multiple blocks
as individual strings?

cheers

Jules
 
Martin Honnen

Jules said:
I was just reading the w3cschools pages on XML (it's been years since I've
actively used XML for anything) and it claims that elements can contain a
mixture of other elements and simple text.

Fair enough - but from a parser point of view, what's the correct
behaviour if multiple blocks of text are found amongst multiple "sub
elements"? Should the parser somehow concatenate the blocks and present a
single text string at the API layer? Or should it present multiple blocks
as individual strings?

It depends on the API. The W3C DOM Level 3 Core has a property named
textContent that concatenates the text (it includes not only child text
but also descendant text). On the other hand, in the DOM tree you will
have separate child nodes, e.g. with

<foo>Text 1<bar/>Text 2<bar/>Text 3</foo>

the foo element has five child nodes: a text node with nodeValue
"Text 1", a 'bar' element, a text node with nodeValue "Text 2", a 'bar'
element and a text node with nodeValue "Text 3". The textContent
property would be "Text 1Text 2Text 3".
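
For anyone who wants to see the five child nodes for themselves, here is a minimal sketch using Python's xml.dom.minidom. As far as I know minidom does not provide textContent, so text_content() below is a hypothetical helper that imitates it by walking the descendants; describe() is likewise just an illustrative name.

```python
from xml.dom import minidom

doc = minidom.parseString("<foo>Text 1<bar/>Text 2<bar/>Text 3</foo>")
foo = doc.documentElement


def describe(node):
    """Return a (kind, value) pair for a text or element node."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.nodeValue)
    return ("element", node.nodeName)


def text_content(node):
    """Hypothetical stand-in for DOM textContent: join all descendant text."""
    if node.nodeType == node.TEXT_NODE:
        return node.nodeValue
    return "".join(text_content(child) for child in node.childNodes)


# The DOM tree keeps the text blocks as separate child nodes.
children = [describe(child) for child in foo.childNodes]
for kind, value in children:
    print(kind, repr(value))

# Concatenation only happens when the caller asks for it.
print(repr(text_content(foo)))
```

The tree itself stays unconcatenated; joining the text is a separate, explicit step.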
 
Peter Flynn

Jules said:
I was just reading the w3cschools pages on XML (it's been years since I've
actively used XML for anything) and it claims that elements can contain a
mixture of other elements and simple text.

Yes, this is normal, just as in HTML. I'm slightly puzzled by the
"claim" aspect, though. While you're right to be suspicious of the
w3cschools documents, does something like

<para>The <productname>Splosh</productname> parser is definitely
the <emphasis>best</emphasis> product.</para>

describe what you mean?

Jules said:
Fair enough - but from a parser point of view, what's the correct
behaviour if multiple blocks of text are found amongst multiple "sub
elements"?

A parser would normally report the components separately. If you run
onsgmls -wxml on the example above, you get:

(PARA
-The
(PRODUCTNAME
-Splosh
)PRODUCTNAME
- parser is definitely\n the
(EMPHASIS
-best
)EMPHASIS
- product.
)PARA

"(" means start-tag, ")" means end-tag, and "-" means character data
(text). \n is the embedded linebreak.
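
The same kind of event stream can be reproduced without onsgmls. This is only a sketch using Python's xml.parsers.expat, and the "(", ")" and "-" prefixes are added by hand to mimic the output above. Setting buffer_text makes expat deliver each run of character data as one string; without it a single text block may arrive in several pieces. Element names come out in their original case here.

```python
import xml.parsers.expat

XML = (
    "<para>The <productname>Splosh</productname> parser is definitely\n"
    "the <emphasis>best</emphasis> product.</para>"
)

events = []

parser = xml.parsers.expat.ParserCreate()
# Coalesce adjacent character data into a single callback per text block.
parser.buffer_text = True
parser.StartElementHandler = lambda name, attrs: events.append(("(", name))
parser.EndElementHandler = lambda name: events.append((")", name))
parser.CharacterDataHandler = lambda data: events.append(("-", data))
parser.Parse(XML, True)

# Print one event per line, showing the embedded linebreak as \n.
for kind, value in events:
    print(kind + value.replace("\n", "\\n"))
```

Each text block is reported as its own event, in document order, between the element events.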

Jules said:
Should the parser somehow concatenate the blocks and present a single
text string at the API layer?

Only if requested, as Martin has described.

///Peter
 
Richard Tobin

Jules said:
Fair enough - but from a parser point of view, what's the correct
behaviour if multiple blocks of text are found amongst multiple "sub
elements"? Should the parser somehow concatenate the blocks and present a
single text string at the API layer? Or should it present multiple blocks
as individual strings?

Concatenating the immediate text children would almost never be the
right thing. Mixed content is rarely used in data-oriented XML, but
is very common in traditional text markup (such as XHTML). Typically
the child elements will contain further text, for example

This is <em>important</em>.

so concatenating the child text would give you "This is ." which
is not likely to be useful.

Concatenating all the text descendants makes more sense - in this case
it would give you "This is important." - but in effect you are
stripping out the markup, so it's unlikely to be the main use of
the document.
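
To make the difference concrete, here is a small sketch with Python's xml.etree.ElementTree. The wrapping <p> element is my own assumption, added so the fragment is a well-formed document. ElementTree stores text that follows a child element in that child's tail attribute, so the immediate text of <p> is its own text plus the tails of its children.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is <em>important</em>.</p>")

# Immediate text children only: text before the first child element,
# plus the text that follows each child element (its "tail").
child_text = (p.text or "") + "".join(child.tail or "" for child in p)

# All text descendants, in document order, with the markup stripped.
descendant_text = "".join(p.itertext())

print(repr(child_text))
print(repr(descendant_text))
```

The first string loses the emphasised word entirely; the second keeps every word but no longer records which one was emphasised.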

-- Richard
 
Jules

I was just reading the w3cschools pages on XML (it's been years since I've
actively used XML for anything) and it claims that elements can contain a
mixture of other elements and simple text.

Following up on my own post to say thanks to everyone who responded.
All the replies were useful, and it seems obvious now that it's best to
present data at the API layer as multiple items (calling routines can
then decide whether to concatenate the text portions or not).

thanks!

Jules
 
