XML problem

gk · Nov 20, 2006

XML
======

<?xml version="1.0" encoding='UTF-8'?>
<sample>

data before element
<element attr1="value1" attr2="value2">
<data1 a="v">some data &amp; one attribute</data1>
<data2>CDATA follows <![CDATA[more data]]></data2>
</element>
data after element
<?proc data for processing?>
</sample>

Output
========

java FirstSample first.xml

startDocument
startElement:
characters: data before element
startElement:
attribute: ="value1"
attribute: ="value2"
startElement:
attribute: ="v"
characters: some data
characters: &
characters: one attribute
endElement
startElement:
characters: CDATA follows
characters: more data
endElement
endElement
characters: data after element
processingInstruction: proc
data: data for processing
endElement
endDocument

This is a SAX parser.

What I don't understand is why the character data in the output above is reported as

characters: some data
characters: &
characters: one attribute

Why not: characters: some data & one attribute

Why are there 3 lines for it?

How do I know at which character the text will be broken, producing
3 lines as above?

Thomas Fritsch · Nov 20, 2006

gk said:
XML [...]
<data1 a="v">some data &amp; one attribute</data1> [...]

Output
[...]
startElement:
attribute: ="v"
characters: some data
characters: &
characters: one attribute
endElement [...]

What I don't understand is why the character data in the output above is reported as

characters: some data
characters: &
characters: one attribute

Why not: characters: some data & one attribute

Why are there 3 lines for it?

Because it is easier for the parser.

How do I know at which character the text will be broken, producing
3 lines as above?

You can't know. :-(
Parsers are free to do it as they like. That means the parser may split
the text in the way that is easiest for *it*, not the way that is easiest for *you*.
The justification is in the API docs of ContentHandler.characters() at
<http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[], int, int)>
<QUOTE>
SAX parsers may return all contiguous character data in a single chunk,
or they may split it into several chunks;
</QUOTE>
So you have to cope with possibly multiple chunks of characters, i.e.
reassemble them yourself, for example by appending each chunk to a
buffer and reading the buffer out in endElement().
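
A minimal sketch of that reassembly, in case it helps. The class name TextCollectingHandler and its parse() helper are made up for this post; only the org.xml.sax and javax.xml.parsers calls are standard. It assumes you want the text between consecutive tags as one trimmed string, and it ignores processing instructions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects each run of text as ONE string,
// no matter how many characters() calls the parser used to deliver it.
public class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush(); // text seen before this start tag is complete
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only append here; a chunk may be any fragment of the text.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // text seen before this end tag is complete
    }

    // Emit the accumulated text (if not blank) and reset the buffer.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(text);
        }
        buffer.setLength(0);
    }

    // Convenience helper (an assumption of this sketch): parse a string.
    public static List<String> parse(String xml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data1 a=\"v\">some data &amp; one attribute</data1>";
        System.out.println(parse(xml)); // prints [some data & one attribute]
    }
}
```

Whether the parser delivers the text in one chunk or in three, the list ends up with the single string "some data & one attribute", because nothing is emitted until a tag boundary is reached.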

gk · Nov 21, 2006

Nicely put.

But the parser is running some algorithm, right? Or is it really breaking
the characters at random?
The parser must be following some rule or algorithm to do this task,
mustn't it?

Maybe the parser has an algorithm like this:

If the parser finds "amp", then break the chars
If the parser finds ";", then break the chars

Something like that...

Or is it a whimsical parser?

Ian Wilson · Nov 21, 2006

Code as if it is whimsical and capricious and all will be well.
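
One way to check that a handler really is safe against a capricious parser is to play the parser yourself and call the callbacks with arbitrary split points. The sketch below does that; WhimsicalChunkDemo, Accumulator and deliver() are names invented for this illustration, not part of SAX, and no real parser is involved.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class WhimsicalChunkDemo {

    // Handler under test: accumulates the text of one element.
    static class Accumulator extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        String result;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buffer.setLength(0); // start collecting afresh
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            result = buffer.toString(); // text is complete only now
        }
    }

    // Stand-in for a parser: delivers `text` split at the given offsets
    // (offsets must be increasing and within the text).
    public static String deliver(String text, int... splitPoints) {
        Accumulator handler = new Accumulator();
        handler.startElement("", "", "data1", new AttributesImpl());
        char[] all = text.toCharArray();
        int from = 0;
        for (int to : splitPoints) {
            handler.characters(all, from, to - from);
            from = to;
        }
        handler.characters(all, from, all.length - from);
        handler.endElement("", "", "data1");
        return handler.result;
    }

    public static void main(String[] args) {
        String text = "some data & one attribute";
        System.out.println(deliver(text));              // one chunk
        System.out.println(deliver(text, 10, 11));      // three chunks, as in the thread
        System.out.println(deliver(text, 1, 2, 3, 20)); // arbitrary chunks
    }
}
```

All three calls return the same string, which is the property you want: the result depends on the document, not on where the parser chose to cut.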
