SAX Parser problem

Discussion in 'Java' started by Mize-ze, Nov 13, 2006.

  1. Mize-ze

    Mize-ze Guest

    I am using SAX to parse an XML file.
    I want to get the "characters" of a specific tag (element)

    Right now I extend DefaultHandler and override the public void
    characters(char[] ch, int start, int length) method, but this event is
    raised whenever there is content in any tag.
    How can I get the characters of one specific element using SAX?

    I don't have access to the qName from this event.
    Any ideas?

    Thanks.
     
    Mize-ze, Nov 13, 2006
    #1

  2. Arne Vajhøj

    Mize-ze wrote:
    > I am using SAX to parse an XML file.
    > I want to get the "characters" of a specific tag (element)
    >
    > Right now I extend the DefaultHandler and override public void
    > characters(char[] ch, int start, int length) method. but this event is
    > raised whenever there is content in a tag.
    > How can I get a specific charcters from an element using SAX
    >
    > I don't have access to the qName from this event.


    Override:

    public void startElement(
            String namespaceURI,
            String localName,
            String rawName,
            Attributes atts)
            throws SAXException {
        // the element name is available here (localName or rawName)
    }

    Arne
     
    Arne Vajhøj, Nov 14, 2006
    #2

  3. Mize-ze

    Mize-ze Guest

    Arne Vajhøj wrote:
    > Mize-ze wrote:
    > > I am using SAX to parse an XML file.
    > > I want to get the "characters" of a specific tag (element)
    > >
    > > Right now I extend the DefaultHandler and override public void
    > > characters(char[] ch, int start, int length) method. but this event is
    > > raised whenever there is content in a tag.
    > > How can I get a specific charcters from an element using SAX
    > >
    > > I don't have access to the qName from this event.

    >
    > Override:
    >
    > public void startElement(
    > String namespaceURI,
    > String localName,
    > String rawName,
    > Attributes atts)
    > throws SAXException {
    >
    > Arne



    But where do I get access to the "characters" (as opposed to the atts)?

    <ELEMENT>characters: this is what I want!!</ELEMENT>


    thanks
     
    Mize-ze, Nov 16, 2006
    #3
  4. Ian Wilson

    Ian Wilson Guest

    Mize-ze wrote:
    > Arne Vajhøj wrote:
    >
    >>Mize-ze wrote:
    >>
    >>>I am using SAX to parse an XML file.
    >>>I want to get the "characters" of a specific tag (element)
    >>>
    >>>Right now I extend the DefaultHandler and override public void
    >>>characters(char[] ch, int start, int length) method. but this event is
    >>>raised whenever there is content in a tag.
    >>>How can I get a specific charcters from an element using SAX
    >>>
    >>>I don't have access to the qName from this event.

    >>
    >>Override:
    >>
    >> public void startElement(
    >> String namespaceURI,
    >> String localName,
    >> String rawName,
    >> Attributes atts)
    >> throws SAXException {
    >>
    >>Arne

    >
    >
    >
    > But where will I have access to the "characters"? (not to the atts)
    >
    > <ELEMENT>charaters: this is what I want!!</ELEMENT>
    >


    Here's a simple approach which I've used*:

    In startElement(), store the localName (or qName). For example you could
    store it in an instance variable (i.e. a field) such as String
    currentElementName.

    In characters() retrieve the stored localName (or qName). You then have
    both tagname ("ELEMENT") and content ("characters: this is what I
    want!!") together in one place.

    If necessary, you could nullify the stored localName (or qName) in
    endElement().

    * Actually I store a structure that represents all the elements leading
    to a particular leaf in the XML tree, e.g. for

    XML                   currentElement
    <foo>                 foo
      <bar>               foo.bar
        <baz>XXX</baz>    foo.bar.baz
      </bar>
    </foo>
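
The path-tracking idea described above could be sketched roughly as follows. This is only an illustration, not Ian's actual code: the class name PathTrackingHandler, the helper pathsOf() and the dot-separated path format are assumptions made for the example. It uses qName because the default SAXParserFactory is not namespace-aware, in which case localName may be empty.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps a stack of the currently open element names.
public class PathTrackingHandler extends DefaultHandler {
    // Open elements, innermost first (push/pop work on the head of the deque).
    private final Deque<String> openElements = new ArrayDeque<>();
    // Every path seen, in document order (kept only for this demo).
    private final List<String> seenPaths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.push(qName);
        seenPaths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    // Joins the open element names from outermost to innermost, e.g. "foo.bar.baz".
    public String currentPath() {
        StringBuilder path = new StringBuilder();
        Iterator<String> it = openElements.descendingIterator();
        while (it.hasNext()) {
            if (path.length() > 0) {
                path.append('.');
            }
            path.append(it.next());
        }
        return path.toString();
    }

    // Parses the given XML text and returns the path recorded at each start tag.
    public static List<String> pathsOf(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seenPaths;
    }

    public static void main(String[] args) throws Exception {
        // prints [foo, foo.bar, foo.bar.baz]
        System.out.println(pathsOf("<foo><bar><baz>XXX</baz></bar></foo>"));
    }
}
```

Inside characters(), a handler built this way could call currentPath() to decide whether the text belongs to the element of interest.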
     
    Ian Wilson, Nov 16, 2006
    #4
  5. Donald Roby

    Donald Roby Guest

    Ian Wilson wrote:
    >
    > Here's a simple approach which I've used*:
    >
    > In startElement(), store the localName (or qName). For example you could
    > store it in an instance variable (i.e. a field) such as String
    > currentElementName.
    >

    In startElement(), also initialize a StringBuffer to collect the
    characters into.

    > In characters() retrieve the stored localName (or qName). You then have
    > both tagname ("ELEMENT") and content ("charaters: this is what I
    > want!!") together in one place.
    >

    You don't get them all at once necessarily. Collect them into the
    above-mentioned StringBuffer in the characters() method for use elsewhere.

    > If necessary, you could nullify the stored localName (or qName) in
    > endElement().
    >

    In endElement(), convert the StringBuffer to a String and at this point,
    you do have both the tag and the entire character contents.

    At this point I create whatever internal structure I'm building from the
    tag and the extracted contents, usually by calling a separate builder
    that was passed in via the handler's constructor, and then clear both
    out, ready for the next element parsed.
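
The three steps above might look something like the sketch below. It is an illustration under stated assumptions, not Donald's code: the class name TextCollectingHandler and the helper collect() are made up, a BiConsumer stands in for the "separate builder", and StringBuilder is used where the post says StringBuffer (either works here, since the handler is used from a single thread). The sketch only reports leaf elements; text mixed in between child elements is ignored.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates text per leaf element and hands it to a builder.
public class TextCollectingHandler extends DefaultHandler {
    // Stand-in for the "separate builder" passed in via the constructor.
    private final BiConsumer<String, String> builder;
    private String currentElementName;
    private StringBuilder text;

    public TextCollectingHandler(BiConsumer<String, String> builder) {
        this.builder = builder;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Remember the tag and start a fresh buffer for its character data.
        currentElementName = qName;
        text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now are the tag and its complete text both known.
        if (text != null) {
            builder.accept(currentElementName, text.toString());
        }
        // Clear state so a parent's end tag does not report stale data.
        currentElementName = null;
        text = null;
    }

    // Parses the XML text and returns leaf element names mapped to their text.
    public static Map<String, String> collect(String xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                        new TextCollectingHandler(result::put));
        return result;
    }

    public static void main(String[] args) throws Exception {
        // prints {name=Fred, species=Hippo}
        System.out.println(collect("<animal><name>Fred</name><species>Hippo</species></animal>"));
    }
}
```

To pick out one specific element, the builder (or endElement itself) can simply compare the reported name against the tag of interest.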
     
    Donald Roby, Nov 16, 2006
    #5
  6. Ian Wilson

    Ian Wilson Guest

    Donald Roby wrote:
    > Ian Wilson wrote:
    >
    >>
    >> Here's a simple approach which I've used*:
    >>
    >> In startElement(), store the localName (or qName). For example you
    >> could store it in an instance variable (i.e. a field) such as String
    >> currentElementName.
    >>

    > In startElement(), also initialize a StringBuffer to collect the
    > characters into.
    >
    >> In characters() retrieve the stored localName (or qName). You then
    >> have both tagname ("ELEMENT") and content ("charaters: this is what I
    >> want!!") together in one place.
    >>

    > You don't get them all at once necessarily. Collect them into the
    > above-mentioned StringBuffer in the characters() method for use elsewhere.
    >


    Thanks for pointing that out!

    On re-reading the javadocs for DefaultHandler I now see that it refers
    to "each chunk of character data", which is a clue I overlooked.

    I'm not sure if my testing has been lucky or my XML is sufficiently
    simple that the first "chunk" will always contain the whole character
    data for that element.

    Do you know of a simple XML example that illustrates characters()
    providing several chunks? Or is it some relatively unpredictable
    buffering-related phenomenon?

    >> If necessary, you could nullify the stored localName (or qName) in
    >> endElement().
    >>

    > In endElement(), convert the StringBuffer to a String and at this point,
    > you do have both the tag and the entire character contents.


    Noted :)
     
    Ian Wilson, Nov 16, 2006
    #6
  7. Ian Wilson

    Ian Wilson Guest

    Ian Wilson wrote:
    > Donald Roby wrote:
    >> Ian Wilson wrote:
    >>
    >>> In characters() retrieve the stored localName (or qName). You then
    >>> have both tagname ("ELEMENT") and content ("charaters: this is what I
    >>> want!!") together in one place.
    >>>

    >> You don't get them all at once necessarily. Collect them into the
    >> above-mentioned StringBuffer in the characters() method for use
    >> elsewhere.
    >>

    >
    > Do you know of a simple XML example that illustrates character()
    > providing several chunks? Or is it some relatively unpredictable
    > buffering related phenomenon?
    >


    It seems to happen if the character data contains newlines.

    <inventory>
      <animal type="mammal">
        <name>Fred</name>
        <species>Hippo</species>
        <weight units="Kg">1552</weight>
      </animal>
      <animal type="reptile">
        <name>
          Gert
          AKA Gertrude
          the galloping reptile
        </name>
        <species>Croc</species>
      </animal>
    </inventory>

    I find characters() is called separately for "Gert", "AKA Gertrude" and
    "the galloping reptile".

    My XML data has no newlines within character data, so I didn't have a
    problem. Nevertheless I have made the necessary changes just in case :)
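
A small experiment along these lines can show how a given parser splits the text. The sketch below is only an illustration: the class name ChunkLoggingHandler and the helper chunksOf() are invented for the example. How many chunks appear, and where they are split, depends on the parser implementation, so no particular output should be expected; the only thing that can be relied on is that the chunks concatenate to the element's full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every chunk passed to characters().
public class ChunkLoggingHandler extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the slice; the parser may reuse the array after this call returns.
        chunks.add(new String(ch, start, length));
    }

    // Parses the XML text and returns the chunks in the order they were delivered.
    public static List<String> chunksOf(String xml) throws Exception {
        ChunkLoggingHandler handler = new ChunkLoggingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name>\nGert\nAKA Gertrude\nthe galloping reptile\n</name>";
        List<String> chunks = chunksOf(xml);
        System.out.println(chunks.size() + " call(s) to characters()");
        for (String chunk : chunks) {
            // Show newlines visibly so the chunk boundaries are easy to read.
            System.out.println("[" + chunk.replace("\n", "\\n") + "]");
        }
    }
}
```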
     
    Ian Wilson, Nov 16, 2006
    #7
  8. Arne Vajhøj

    Mize-ze wrote:
    > Arne Vajhøj wrote:
    >> Mize-ze wrote:
    >>> I am using SAX to parse an XML file.
    >>> I want to get the "characters" of a specific tag (element)
    >>>
    >>> Right now I extend the DefaultHandler and override public void
    >>> characters(char[] ch, int start, int length) method. but this event is
    >>> raised whenever there is content in a tag.
    >>> How can I get a specific charcters from an element using SAX
    >>>
    >>> I don't have access to the qName from this event.

    >> Override:
    >>
    >> public void startElement(
    >> String namespaceURI,
    >> String localName,
    >> String rawName,
    >> Attributes atts)
    >> throws SAXException {
    >>
    >> Arne

    >
    >
    > But where will I have access to the "characters"? (not to the atts)


    You find the tag with startElement and the text inside with characters.

    Arne
     
    Arne Vajhøj, Nov 17, 2006
    #8
  9. vahan

    vahan Guest

    In the handler class:

    String localName = null;

    public void startElement(String uri, String localName,
            String qName, Attributes attributes) throws SAXException {
        this.localName = localName;
    }

    public void endElement(String uri, String localName,
            String qName) throws SAXException {
        this.localName = null;
    }

    public void characters(char ch[], int start, int length)
            throws SAXException {
        if ("YourTagName".equalsIgnoreCase(localName)) {
            String desiredContext = new String(ch, start, length);
        }
    }

    Arne Vajhøj wrote:
    > Mize-ze wrote:
    > > Arne Vajhøj wrote:
    > >> Mize-ze wrote:
    > >>> I am using SAX to parse an XML file.
    > >>> I want to get the "characters" of a specific tag (element)
    > >>>
    > >>> Right now I extend the DefaultHandler and override public void
    > >>> characters(char[] ch, int start, int length) method. but this event is
    > >>> raised whenever there is content in a tag.
    > >>> How can I get a specific charcters from an element using SAX
    > >>>
    > >>> I don't have access to the qName from this event.
    > >> Override:
    > >>
    > >> public void startElement(
    > >> String namespaceURI,
    > >> String localName,
    > >> String rawName,
    > >> Attributes atts)
    > >> throws SAXException {
    > >>
    > >> Arne

    > >
    > >
    > > But where will I have access to the "characters"? (not to the atts)

    >
    > You find the tag with startElement and the text inside with characters.
    >
    > Arne
     
    vahan, Nov 17, 2006
    #9
  10. Ian Wilson

    Ian Wilson Guest

    vahan wrote:

    <top-posted example code snipped>

    You're making the same mistake I did, see earlier in thread.

    For one element, characters() may be called several times, delivering
    the character data in several chunks.
     
    Ian Wilson, Nov 17, 2006
    #10
  11. Arne Vajhøj

    Ian Wilson wrote:
    > You're making the same mistake I did, see earlier in thread.
    >
    > For one element, character() may be called several times providing
    > character data in several chunks per element.


    Having characters() accumulate into a StringBuffer, combined with some
    logic in startElement and endElement, is rather standard.

    Arne
     
    Arne Vajhøj, Nov 18, 2006
    #11
  12. Ian Wilson

    Ian Wilson Guest

    Arne Vajhøj wrote:
    > Ian Wilson wrote:
    >
    >> You're making the same mistake I did, see earlier in thread.
    >>
    >> For one element, character() may be called several times providing
    >> character data in several chunks per element.

    >
    >
    > Having character accumulate in a StringBuffer combined with some logic
    > in startElement and endElement is rather standard.
    >


    I guess you mean standard as in "a customary programming idiom amongst
    experienced users of SAX" rather than standard as in "explicitly written
    down somewhere authoritative where people might be expected to easily
    find it"?

    I didn't find this idiom in the javadoc for DefaultHandler or in the
    Java books I have (which admittedly only cover SAX briefly).

    http://www.saxproject.org/quickstart.html doesn't describe this
    programming idiom either.

    When I Googled for "Java SAX example", the first three examples didn't
    show this idiom, however the fourth did
    (http://www.cafeconleche.org/slides/sd2002west/introxml/265.html)

    This wasn't intended to be a whinge, I'm just pointing out that the
    "standard" idiom may not be immediately obvious to people new to SAX.
     
    Ian Wilson, Nov 20, 2006
    #12
  13. Chris Uppal

    Chris Uppal Guest

    Ian Wilson wrote:

    > This wasn't intended to be a whinge, I'm just pointing out that the
    > "standard" idiom may not be immediately obvious to people new to SAX.


    This is an example of an API design which might almost have been designed
    to be misunderstood.

    Other examples are from java.io.InputStream (and friends) where the uselessness
    of available() and the not-totally-obvious semantics of read() seem to evade a
    good many programmers.

    -- chris
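
For readers unfamiliar with the read() point: InputStream.read(byte[], int, int) may return fewer bytes than requested even when more data will follow, much as characters() may deliver only part of an element's text. A minimal sketch of the usual loop follows; the class and method names (ReadFully, readFully) are chosen for the example, and it assumes the caller knows how many bytes to expect.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFully {
    // Keeps calling read() until the buffer is full, since one call may return less.
    public static void readFully(InputStream in, byte[] buffer) throws IOException {
        int filled = 0;
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n < 0) {
                throw new EOFException("stream ended after " + filled + " bytes");
            }
            filled += n;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("Hippo".getBytes(StandardCharsets.US_ASCII));
        byte[] buffer = new byte[5];
        readFully(in, buffer);
        // prints Hippo
        System.out.println(new String(buffer, StandardCharsets.US_ASCII));
    }
}
```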
     
    Chris Uppal, Nov 20, 2006
    #13