SAX callback method question

steve_marjoribanks

If I have an XML document with some elements like this:

<line>
<point>0 2</point>
<point>1 4</point>
<point>3 5</point>
etc.

</line>

i.e. a collection of points whose coordinates I want to extract
from the XML file and draw using Java.
I was thinking I can obviously use the startElement method with a
test to see if it's a <point> element, and then use the characters
method to extract the coordinates as strings, convert them to integers
and store them in an array or similar. This might sound like a silly
question, but will the parser always traverse the XML document
in order, parsing as it goes? I.e., if using the method just described,
will the coordinates of the points be stored in the correct order in
the array?

Also, if my XML document was like:

<line>
<point>5 1</point>
<point>4 8</point>
etc
</line>
<line>
<point>3 4</point>
<point>4 1</point>
etc
</line>
etc

how would I go about making sure that the point coordinates for each
line remain separate from each other and do not get mixed up?
I'm starting to think that DOM might have been a better idea than SAX!!
:-(

Steve
 
Robert Klemme

If I have an XML document with some elements like this:

<line>
<point>0 2</point>
<point>1 4</point>
<point>3 5</point>
etc.

</line>

i.e. a collection of points whose coordinates I want to extract
from the XML file and draw using Java.
I was thinking I can obviously use the startElement method with a
test to see if it's a <point> element, and then use the characters
method to extract the coordinates as strings, convert them to
integers and store them in an array or similar. This might sound like a
silly question, but will the parser always traverse the XML
document in order, parsing as it goes? I.e., if using the method just
described, will the coordinates of the points be stored in the
correct order in the array?

Yes. SAX callbacks arrive in document order, and AFAIK element order is
significant under the XML standard.
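
To make the ordering point concrete, here is a minimal sketch of a handler that collects every <point> in the order the parser reports it. The class name PointCollector and the helper parse are made up for illustration; only the SAX callbacks themselves come from the standard API. Note that characters may be called more than once for a single text node, so the text is buffered and only parsed in endElement.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the coordinates of every <point> in document order. */
class PointCollector extends DefaultHandler {
    private final List<int[]> points = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inPoint;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("point".equals(qName)) {
            inPoint = true;
            text.setLength(0); // start a fresh buffer for this point
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so append rather than assign.
        if (inPoint) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("point".equals(qName)) {
            String[] xy = text.toString().trim().split("\\s+");
            points.add(new int[] { Integer.parseInt(xy[0]), Integer.parseInt(xy[1]) });
            inPoint = false;
        }
    }

    /** Convenience wrapper (an assumption of this sketch) that parses XML held in a string. */
    static List<int[]> parse(String xml) {
        try {
            PointCollector handler = new PointCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.points;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

The list ends up in the same order as the <point> elements appear in the file, because each point is added when its closing tag is reported.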
Also, if my XML document was like:

<line>
<point>5 1</point>
<point>4 8</point>
etc
</line>
<line>
<point>3 4</point>
<point>4 1</point>
etc
</line>
etc

how would I go about making sure that the point coordinates for each
line remain separate from each other and do not get mixed up?
I'm starting to think that DOM might have been a better idea than
SAX!! :-(

I prefer SAX as it's less resource intensive and you can easily skip
things you want to ignore without wasting mem or CPU cycles.

The way I usually do it is this: create a proxy that implements the
callback interface(s) I need. Internally, when it sees an opening tag,
it creates a delegate instance based on the name of the element, passes
it a reference to the current element, and pushes it onto a stack.
The proxy then forwards each method call to the topmost delegate on the
stack. Delegates store state as they see fit, and model instances are
updated when the closing tag is detected.

Hope that was clear enough.
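
In case code helps, the proxy-and-stack idea might be sketched roughly as below. All class names (DelegatingHandler, ElementDelegate, PointDelegate, LineDelegate) are invented for this example, and so is the <drawing> root element used when trying it out. One deliberate simplification: instead of handing each delegate a reference to its parent, the proxy notifies the parent delegate when a child closes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical proxy: the only handler the parser sees; it forwards to the top delegate. */
class DelegatingHandler extends DefaultHandler {

    /** Base delegate: ignores everything, which is how uninteresting elements get skipped. */
    static class ElementDelegate {
        void text(String chunk) { }
        void childClosed(ElementDelegate child) { }
    }

    static class PointDelegate extends ElementDelegate {
        private final StringBuilder buffer = new StringBuilder();
        @Override void text(String chunk) { buffer.append(chunk); }
        int[] toPoint() {
            String[] xy = buffer.toString().trim().split("\\s+");
            return new int[] { Integer.parseInt(xy[0]), Integer.parseInt(xy[1]) };
        }
    }

    static class LineDelegate extends ElementDelegate {
        final List<int[]> points = new ArrayList<>();
        @Override void childClosed(ElementDelegate child) {
            if (child instanceof PointDelegate) {
                points.add(((PointDelegate) child).toPoint());
            }
        }
    }

    final List<List<int[]>> lines = new ArrayList<>();
    private final Deque<ElementDelegate> stack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Pick a delegate by element name and make it the current one.
        switch (qName) {
            case "line":  stack.push(new LineDelegate());  break;
            case "point": stack.push(new PointDelegate()); break;
            default:      stack.push(new ElementDelegate());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        ElementDelegate top = stack.peek();
        if (top != null) {
            top.text(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The model is updated only once the closing tag has been seen.
        ElementDelegate closed = stack.pop();
        if (closed instanceof LineDelegate) {
            lines.add(((LineDelegate) closed).points);
        }
        ElementDelegate parent = stack.peek();
        if (parent != null) {
            parent.childClosed(closed);
        }
    }

    /** Convenience wrapper (an assumption of this sketch) that parses XML held in a string. */
    static List<List<int[]>> parse(String xml) {
        try {
            DelegatingHandler handler = new DelegatingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Because every <line> gets its own LineDelegate with its own list, points from different lines cannot get mixed up.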

Btw, your points are really structured elements. I'd rather do something
like:

<line>
<point>
<x>0</x>
<y>2</y>
</point>
</line>

(With better names probably.)

Kind regards

robert
 
steve_marjoribanks

Thanks for the reply. With regard to the naming, I just made up an
example; a sample of the real XML I am using is shown below. The
problem is that my schema is an extension of other schemas and as such
contains elements and complex types whose naming is out of my control.


<geotechml:layers>
<geotechml:Layer materialID="1">
<geotechml:layerTop>
<geotechml:Curve>
<gml:LineString>
<gml:pos>0 10</gml:pos>
<gml:pos>30 10</gml:pos>
<gml:pos>60 40</gml:pos>
</gml:LineString>
</geotechml:Curve>
</geotechml:layerTop>
</geotechml:Layer>
<geotechml:Layer materialID="2">
<geotechml:layerTop>
<geotechml:Curve>
<gml:LineString>
<gml:pos>0 30</gml:pos>
<gml:pos>20 40</gml:pos>
<gml:pos>60 40</gml:pos>
</gml:LineString>
</geotechml:Curve>
</geotechml:layerTop>
</geotechml:Layer>
<geotechml:Layer materialID="3">
<geotechml:layerTop>
<geotechml:Curve>
<gml:LineString>
<gml:pos>0 60</gml:pos>
<gml:pos>20 65</gml:pos>
<gml:pos>50 70</gml:pos>
<gml:pos>70 70</gml:pos>
<gml:pos>100 80</gml:pos>
</gml:LineString>
</geotechml:Curve>
</geotechml:layerTop>
</geotechml:Layer>
</geotechml:layers>


In the example above I need to extract the values of the coordinate
points given for each LineString and then draw them in my Java
application.
Sorry, but being a newbie I have no idea what you mean by a
solution using a proxy. Any chance you could explain
further please? (sorry!).
Do you think in this instance it would be easier to use DOM? I say this
because although I don't need to extract data from every element (as
shown above) there are a number of elements which I need to get the
data from and they're not all named the same as in the example above
either.

Steve
 
Robert Klemme

Thanks for the reply. With regard to the naming, I just made up an
example; a sample of the real XML I am using is shown below. The
problem is that my schema is an extension of other schemas and as such
contains elements and complex types whose naming is out of my control.

Well, bad. :-}

In the example above I need to extract the values of the coordinate
points given for each LineString and then draw them in my Java
application.
Sorry, but being a newbie I have no idea what you mean by a
solution using a proxy. Any chance you could
explain further please? (sorry!).

You create an object that does just part of the job (finding the one that
should do the real work) and then delegates the method invocation to that
object.

Do you think in this instance it would be easier to use DOM? I say
this because although I don't need to extract data from every element
(as shown above) there are a number of elements which I need to get
the data from and they're not all named the same as in the example
above either.

Can't really tell as I don't see the whole picture. If you use DOM,
you'll have to do the traversal or work with an XSLT processor. If those
documents can be large I'd favour the other approach but YMMV (especially
if you need a lot of the data from the tree).
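
For comparison, a DOM version of the simple <line>/<point> example could look something like this. It is only a sketch: DomLineReader and readLines are names invented here, and the whole document is loaded into memory before anything is extracted, which is the trade-off mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical DOM-based reader: builds the full tree, then walks it. */
class DomLineReader {

    /** Returns one list of {x, y} pairs per <line> element, in document order. */
    static List<List<int[]>> readLines(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<List<int[]>> lines = new ArrayList<>();
            NodeList lineNodes = doc.getElementsByTagName("line");
            for (int i = 0; i < lineNodes.getLength(); i++) {
                Element line = (Element) lineNodes.item(i);
                // Searching from the <line> element keeps each line's points separate.
                NodeList pointNodes = line.getElementsByTagName("point");
                List<int[]> points = new ArrayList<>();
                for (int j = 0; j < pointNodes.getLength(); j++) {
                    String[] xy = pointNodes.item(j).getTextContent().trim().split("\\s+");
                    points.add(new int[] { Integer.parseInt(xy[0]), Integer.parseInt(xy[1]) });
                }
                lines.add(points);
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

The traversal code is shorter than the SAX equivalent, but every element, including the ones you never look at, ends up as a node in memory.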

Kind regards

robert
 
steve_marjoribanks

You create an object that does just part of the job (finding the one that
should do the real work) and then delegates the method invocation to that
object.

Do you mean kind of 'nesting' callback methods? So would I have one
handler that finds a certain node and then delegates the handling of the
children of that node to another handler, and so on until I get the data I
need? Sorry for all the questions!

Can't really tell as I don't see the whole picture. If you use DOM,
you'll have to do the traversal or work with an XSLT processor. If those
documents can be large I'd favour the other approach but YMMV (especially
if you need a lot of the data from the tree).

Hmm, it's a tricky one. I originally chose SAX because the documents I'm
working with have the potential to become fairly large: not
massive, but not particularly small either. Also, I have no need to
write or change the XML, so I thought I'd use SAX. Having thought about
it now, though, I do need to extract a fair amount of data from the tree,
and as shown above the information I need sits at the bottom of a
reasonably 'deep' tree structure, so I'll have to traverse down
through a lot of levels to reach it.
 
Robert Klemme

Do you mean kind of 'nesting' callback methods? So would I have one
handler that finds a certain node and then delegates the handling of
the children of that node to another handler, and so on until I get the
data I need? Sorry for all the questions!

Yes. I think you've got the hang of it.

Hmm, it's a tricky one. I originally chose SAX because the documents
I'm working with have the potential to become fairly large: not
massive, but not particularly small either. Also, I have no need to
write or change the XML, so I thought I'd use SAX. Having thought about
it now, though, I do need to extract a fair amount of data from the
tree, and as shown above the information I need sits at the bottom
of a reasonably 'deep' tree structure, so I'll have to traverse down
through a lot of levels to reach it.

But if you just need info from some top level nodes and leaf nodes and
there's a lot of stuff in between that you want to ignore, then it
sounds as if you'd only be extracting maybe 20% of the data. In that case
I'd go for SAX.
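
Applied to the geotechml sample, a rough sketch could look like this. It keeps only the Layer's materialID and the gml:pos values and ignores layerTop, Curve and LineString entirely. LayerTopHandler is an invented name. The sketch matches on local names only because the thread never shows the namespace URIs, so any URIs you use to try it out are placeholders. It parses coordinates as doubles on the assumption that real data may not be whole numbers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: groups gml:pos coordinates by the enclosing Layer's materialID. */
class LayerTopHandler extends DefaultHandler {
    final Map<String, List<double[]>> pointsByMaterial = new LinkedHashMap<>();
    private List<double[]> current; // points of the Layer being read, or null outside a Layer
    private StringBuilder pos;      // text of the pos being read, or null outside a pos

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Layer".equals(localName)) {
            current = new ArrayList<>();
            // Unprefixed attributes belong to no namespace, hence the empty URI.
            pointsByMaterial.put(atts.getValue("", "materialID"), current);
        } else if ("pos".equals(localName) && current != null) {
            pos = new StringBuilder();
        }
        // Everything else (layerTop, Curve, LineString, ...) is simply skipped.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (pos != null) {
            pos.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("pos".equals(localName) && pos != null) {
            String[] xy = pos.toString().trim().split("\\s+");
            current.add(new double[] { Double.parseDouble(xy[0]), Double.parseDouble(xy[1]) });
            pos = null;
        } else if ("Layer".equals(localName)) {
            current = null;
        }
    }

    /** Convenience wrapper (an assumption of this sketch) that parses XML held in a string. */
    static Map<String, List<double[]>> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for localName to be populated
            LayerTopHandler handler = new LayerTopHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.pointsByMaterial;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

In real code you would probably also compare the uri argument against the actual GML and geotechml namespace URIs rather than trusting local names alone.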

Kind regards

robert
 
steve_marjoribanks

Having had a think about it, I'm struggling to get my head around how
this would actually be implemented. I've read up on the DefaultHandler
and as far as I can work out you can only assign one per reader. How
can I use multiple handlers on just one input?
 
Robert Klemme

Having had a think about it, I'm struggling to get my head around how
this would actually be implemented. I've read up on the DefaultHandler
and as far as I can work out you can only assign one per reader. How
can I use multiple handlers on just one input?

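You still register only one handler with the reader; that handler
forwards each callback to whichever other object should deal with it.
This is a basic pattern called "delegation" (closely related to the
"strategy pattern" and "state pattern"). Information about this abounds
on the web, but you might be better off first reading a book about OO
design and/or software design in general.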

Kind regards

robert
 
