XML problem

gk

XML
======

<?xml version="1.0" encoding='UTF-8'?>
<sample>

data before element
<element attr1="value1" attr2="value2">
<data1 a="v">some data &amp; one attribute</data1>
<data2>CDATA follows <![CDATA[more data]]></data2>
</element>
data after element
<?proc data for processing?>
</sample>




Output
========
java FirstSample first.xml
startDocument
startElement:
characters: data before element
startElement:
attribute: ="value1"
attribute: ="value2"
startElement:
attribute: ="v"
characters: some data
characters: &
characters: one attribute
endElement
startElement:
characters: CDATA follows
characters: more data
endElement
endElement
characters: data after element
processingInstruction: proc
data: data for processing
endElement
endDocument


This is a SAX parser.

What I don't understand is why the character data of <data1> is reported as

characters: some data
characters: &
characters: one attribute

and not as a single line:

characters: some data & one attribute

Why are there 3 lines for it?

How do I know at which character the text will be split, producing 3 lines as above?
 
Thomas Fritsch

gk said:
XML [...]
<data1 a="v">some data &amp; one attribute</data1> [...]
[...]
startElement:
attribute: ="v"
characters: some data
characters: &
characters: one attribute
endElement [...]

What I don't understand is why the characters are

characters: some data
characters: &
characters: one attribute

and not "characters: some data & one attribute".

Why are there 3 lines for it?

Because it is easier for the parser.

gk said:
How do I know at which character the text will be split, producing 3 lines as above?

You can't know. :-(
The parsers are free to do it as they like. That means the parser may do
it in the way that is easiest for *it*, not the way that is easiest for *you*.
The justification is in the API docs of ContentHandler.characters() at
<http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[], int, int)>
<QUOTE>
SAX parsers may return all contiguous character data in a single chunk,
or they may split it into several chunks;
</QUOTE>
So you have to cope with possibly multiple chunks of characters (i.e.
reassemble them yourself).
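
One common way to do the reassembling is to append every chunk to a buffer and only look at the buffer once the element ends. The following is a minimal sketch, not the FirstSample code from the question (which was not posted); the class name TextCollector and its collect() helper are made up for illustration, and it assumes elements that contain only text, not mixed content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of each text-only element.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // a new element starts a fresh text run
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called once or many times per text run; just accumulate.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(qName + ": " + text);
        }
        buffer.setLength(0);
    }

    // Parses an XML string and returns one "name: text" entry per text-only element.
    static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        String xml = "<element><data1 a=\"v\">some data &amp; one attribute</data1>"
                   + "<data2>CDATA follows <![CDATA[more data]]></data2></element>";
        for (String line : collect(xml)) {
            System.out.println(line);
        }
    }
}
```

With this handler it no longer matters whether the parser delivers "some data & one attribute" in one call or in three: the text of data1 comes out as a single string either way.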
 
gk

Thomas said:
You can't know. :-(
The parsers are free to do it as they like. [...]
So you have to cope with possibly multiple chunks of characters (i.e.
reassemble them somehow).

Nicely put.

But the parser is running some algorithm, right? Or is it really breaking
the characters up at random? Surely the parser follows some rule or
algorithm for this task, doesn't it?

Maybe the parser works like this:

if the parser finds "amp", then break the chars
if the parser finds ";", then break the chars

Something like that.

Or is it a whimsical parser?
 
Ian Wilson

gk said:
But the parser is running some algorithm, right? Or is it really
breaking the characters up at random? Surely the parser follows some
rule or algorithm for this task, doesn't it?

Maybe the parser works like this:

if the parser finds "amp", then break the chars
if the parser finds ";", then break the chars

Something like that.

Or is it a whimsical parser?

Code as if it is whimsical and capricious and all will be well.
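
In practice, coding for a capricious parser can mean treating characters() as "more text may follow" and only acting on the text when some other event arrives. Here is a rough sketch along those lines for mixed content such as the <sample> document in the question; the class name TextRunHandler, its parse() helper, and the "text:"/"start:" labels are invented for this example and are not part of SAX.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reports each contiguous text run exactly once,
// however many characters() calls the parser used to deliver it.
class TextRunHandler extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder();
    final List<String> events = new ArrayList<>();

    // Emit the accumulated text (if any) before recording another event.
    private void flush() {
        String text = pending.toString().trim();
        if (!text.isEmpty()) {
            events.add("text: " + text);
        }
        pending.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length); // never act on a single chunk
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        events.add("start: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        events.add("end: " + qName);
    }

    @Override
    public void processingInstruction(String target, String data) {
        flush();
        events.add("pi: " + target);
    }

    // Parses an XML string and returns the recorded event list.
    static List<String> parse(String xml) {
        TextRunHandler handler = new TextRunHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<sample>data before element"
                   + "<data1 a=\"v\">some data &amp; one attribute</data1>"
                   + "data after element<?proc data for processing?></sample>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

Because the buffer is flushed on every start tag, end tag and processing instruction, the handler makes no assumption about where, or whether, the parser splits the text.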
 
