SAX Parser problem

Mize-ze

I am using SAX to parse an XML file.
I want to get the "characters" of a specific tag (element)

Right now I extend DefaultHandler and override the public void
characters(char[] ch, int start, int length) method, but this event is
raised whenever there is content in any tag.
How can I get the characters of one specific element using SAX?

I don't have access to the qName from this event.
Any ideas?

Thanks.
 
Arne Vajhøj

Override:

public void startElement(
        String namespaceURI,
        String localName,
        String rawName,
        Attributes atts)
        throws SAXException {
    // remember localName or rawName (the qName) here for use in characters()
}

Arne
 
Mize-ze

Arne said:
Override:

public void startElement(
String namespaceURI,
String localName,
String rawName,
Attributes atts)
throws SAXException {

Arne


But where will I have access to the "characters"? (not to the atts)

<ELEMENT>characters: this is what I want!!</ELEMENT>


thanks
 
Ian Wilson

Mize-ze said:
But where will I have access to the "characters"? (not to the atts)

<ELEMENT>characters: this is what I want!!</ELEMENT>

Here's a simple approach which I've used*:

In startElement(), store the localName (or qName). For example you could
store it in an instance variable (i.e. a field) such as String
currentElementName.

In characters() retrieve the stored localName (or qName). You then have
both the tag name ("ELEMENT") and the content ("characters: this is what I
want!!") together in one place.

If necessary, you could nullify the stored localName (or qName) in
endElement().

* Actually I store a structure that represents all the elements leading
to a particular leaf in the XML tree,

e.g. for
                        currentElement
<foo>                   foo
  <bar>                 foo.bar
    <baz>XXX</baz>      foo.bar.baz
  </bar>
</foo>
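
The footnoted idea of tracking every element leading to the current leaf could be sketched roughly as below. This is only an illustration under assumptions: the class names ElementPathDemo and PathHandler, the "seen" list and the "path=text" output format are all hypothetical, and a Deque is just one possible choice for the stored structure. The default SAXParserFactory is not namespace-aware, so the sketch uses qName rather than localName.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class ElementPathDemo {

    // Keeps the chain of currently open elements, e.g. foo, bar, baz.
    static class PathHandler extends DefaultHandler {
        private final Deque<String> openElements = new ArrayDeque<>();
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            openElements.addLast(qName); // entering an element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            openElements.removeLast(); // leaving the innermost element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                // Records one entry per call; a parser may deliver the text
                // of a single element in more than one call.
                seen.add(String.join(".", openElements) + "=" + text);
            }
        }
    }

    // Parses the given XML string and returns the recorded "path=text" entries.
    static List<String> parse(String xml) {
        try {
            PathHandler handler = new PathHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<foo><bar><baz>XXX</baz></bar></foo>"));
    }
}
```

Inside characters() the joined path plays the role of currentElement in the table above, so a handler can compare it against the one path it cares about.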
 
Donald Roby

Ian said:
Here's a simple approach which I've used*:

In startElement(), store the localName (or qName). For example you could
store it in an instance variable (i.e. a field) such as String
currentElementName.

In startElement(), also initialize a StringBuffer to collect the
characters into.

Ian said:
In characters() retrieve the stored localName (or qName). You then have
both the tag name ("ELEMENT") and the content ("characters: this is what I
want!!") together in one place.

You don't necessarily get them all at once. Collect them into the
above-mentioned StringBuffer in the characters() method for use elsewhere.

Ian said:
If necessary, you could nullify the stored localName (or qName) in
endElement().

In endElement(), convert the StringBuffer to a String and at this point,
you do have both the tag and the entire character contents.

At this point, I create whatever internal structure it is I'm building,
usually by a call to a separate builder that had been passed in via the
handler's constructor, using the tag and the extracted contents, and
then clear them out to be ready for the next one parsed.
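
A minimal sketch of the steps described above, under some assumptions: the names AccumulatingDemo, TextHandler and sink are hypothetical, a BiConsumer stands in for the "separate builder" passed to the constructor, and StringBuilder is swapped in for StringBuffer since the handler is used from a single thread. It only handles leaf elements with plain text; mixed content would need more logic.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class AccumulatingDemo {

    static class TextHandler extends DefaultHandler {
        // Stand-in for the separate builder handed in via the constructor.
        private final BiConsumer<String, String> sink;
        private final StringBuilder text = new StringBuilder();

        TextHandler(BiConsumer<String, String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start a fresh buffer for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Only here is the complete text of the element available.
            String content = text.toString().trim();
            if (!content.isEmpty()) {
                sink.accept(qName, content);
            }
            text.setLength(0); // ready for the next element
        }
    }

    // Parses the XML string and collects tag -> text pairs for leaf elements.
    static Map<String, String> parse(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new TextHandler(result::put));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("<animal><name>Fred</name><species>Hippo</species></animal>"));
    }
}
```

Because the text is only handed over in endElement(), it does not matter how many times the parser calls characters() for one element.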
 
Ian Wilson

Donald said:
In startElement(), also initialize a StringBuffer to collect the
characters into.

You don't get them all at once necessarily. Collect them into the
above-mentioned StringBuffer in the characters() method for use elsewhere.

Thanks for pointing that out!

On rereading the javadocs for DefaultHandler I now see that it refers
to "each chunk of character data", which is a clue I overlooked.

I'm not sure whether my testing has been lucky or my XML is sufficiently
simple that the first "chunk" always contains the whole character
data for that element.

Do you know of a simple XML example that illustrates characters()
providing several chunks? Or is it some relatively unpredictable
buffering-related phenomenon?

Donald said:
In endElement(), convert the StringBuffer to a String and at this point,
you do have both the tag and the entire character contents.

Noted :)
 
Ian Wilson

Ian said:
Do you know of a simple XML example that illustrates characters()
providing several chunks? Or is it some relatively unpredictable
buffering-related phenomenon?

It seems to happen if the character data contains newlines.

<inventory>
  <animal type="mammal">
    <name>Fred</name>
    <species>Hippo</species>
    <weight units="Kg">1552</weight>
  </animal>
  <animal type="reptile">
    <name>
      Gert
      AKA Gertrude
      the galloping reptile
    </name>
    <species>Croc</species>
  </animal>
</inventory>

I find characters() is called separately for "Gert", "AKA Gertrude" and
"the galloping reptile".

My XML data has no newlines within character data, so I didn't have a
problem. Nevertheless I have made the necessary changes just in case :)
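
To see the chunks for yourself, something like the following rough sketch could be used. The names ChunkDemo, ChunkHandler and chunksOf are hypothetical. How the text is split is up to the parser, so the sketch prints whatever it receives and makes no assumption about the number of chunks; the only thing that should hold is that the chunks, joined in order, give the full text of the element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class ChunkDemo {

    // Records every characters() call made while inside the target element.
    static class ChunkHandler extends DefaultHandler {
        private final String target;
        private boolean inTarget;
        final List<String> chunks = new ArrayList<>();

        ChunkHandler(String target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(target)) {
                inTarget = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals(target)) {
                inTarget = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inTarget) {
                chunks.add(new String(ch, start, length)); // one entry per call
            }
        }
    }

    // Returns the raw chunks delivered for the element named target.
    static List<String> chunksOf(String xml, String target) {
        try {
            ChunkHandler handler = new ChunkHandler(target);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name>\nGert\nAKA Gertrude\nthe galloping reptile\n</name>";
        List<String> chunks = chunksOf(xml, "name");
        System.out.println(chunks.size() + " call(s) to characters()");
        for (String chunk : chunks) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```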
 
Arne Vajhøj

Mize-ze said:
But where will I have access to the "characters"? (not to the atts)

You find the tag with startElement and the text inside with characters.

Arne
 
vahan

In the handler class:

String localName = null;

public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {
    // remember which element we are inside
    this.localName = localName;
}

public void endElement(String uri, String localName,
        String qName) throws SAXException {
    this.localName = null;
}

public void characters(char ch[], int start, int length)
        throws SAXException {
    if ("YourTagName".equalsIgnoreCase(localName)) {
        String desiredContext = new String(ch, start, length);
    }
}










 
Ian Wilson

vahan wrote:

<top-posted example code snipped>

You're making the same mistake I did, see earlier in thread.

For one element, characters() may be called several times, delivering
the character data in several chunks.
 
Arne Vajhøj

Ian said:
You're making the same mistake I did, see earlier in thread.

For one element, characters() may be called several times, delivering
the character data in several chunks.

Having characters() accumulate into a StringBuffer, combined with some
logic in startElement and endElement, is rather standard.

Arne
 
Ian Wilson

Arne said:
Having characters() accumulate into a StringBuffer, combined with some
logic in startElement and endElement, is rather standard.

I guess you mean standard as in "a customary programming idiom amongst
experienced users of SAX" rather than standard as in "explicitly written
down somewhere authoritative where people might be expected to easily
find it"?

I didn't find this idiom in the javadoc for DefaultHandler or in the
Java books I have (which admittedly only cover SAX briefly).

http://www.saxproject.org/quickstart.html doesn't describe this
programming idiom either.

When I Googled for "Java SAX example", the first three examples didn't
show this idiom, however the fourth did
(http://www.cafeconleche.org/slides/sd2002west/introxml/265.html)

This wasn't intended to be a whinge, I'm just pointing out that the
"standard" idiom may not be immediately obvious to people new to SAX.
 
Chris Uppal

Ian said:
This wasn't intended to be a whinge, I'm just pointing out that the
"standard" idiom may not be immediately obvious to people new to SAX.

This is an example of an API design which might almost have been designed to be
misunderstood.

Other examples are from java.io.InputStream (and friends) where the uselessness
of available() and the not-totally-obvious semantics of read() seem to evade a
good many programmers.

-- chris
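
For readers unfamiliar with the read() pitfall mentioned above: read(byte[]) may return fewer bytes than the buffer holds, much as characters() may deliver only part of an element's text, so the usual idiom is to loop until it returns -1. A small sketch, with the hypothetical names ReadLoopDemo and readAll and a deliberately tiny buffer so that several reads are needed:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the thread.
class ReadLoopDemo {

    // Reads the stream to the end, however many bytes each read() call returns.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4]; // tiny on purpose, to force several reads
            int n;
            while ((n = in.read(buf)) != -1) { // -1 signals end of stream
                out.write(buf, 0, n);          // only the first n bytes are valid
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(new String(copy, StandardCharsets.UTF_8));
    }
}
```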
 
