SAX parsing goes 'all funny' on value [en]

Discussion in 'XML' started by Fred, Dec 13, 2003.

  1. Fred

    Fred Guest

    Hi,

    I am parsing a small XML document and the parsing goes 'all funny'
    when parsing this element: <useragent>Mozilla/4.61 [en] (WinNT;
    I)</useragent>

    I've created a subclass of org.xml.sax.helpers.DefaultHandler, and an
    instance of this subclass is set on my
    org.apache.xerces.parsers.SAXParser:

    SAXParser parser = new SAXParser();
    parser.setContentHandler(pdh);
    parser.setErrorHandler(pdh);

    I've found that the

    public void characters(char[] ch, int offset, int length) throws
    SAXException

    method is called once per element parsed. My debug output confirms
    this. e.g. when parsing <useragent>MobileExplorer/3.00 (Mozilla/1.22;
    compatible; MMEF300; Microsoft; Windows; GenericLarge)</useragent> it
    reads:

    D: reading characters...(useragent) length=89, offset=721,
    found='MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300;
    Microsoft; Windows; GenericLarge)'
    D: ending element (useragent) current element value is :
    [MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft;
    Windows; GenericLarge)]


    But... when parsing <useragent>Mozilla/4.61 [en] (WinNT;
    I)</useragent>
    the debug output reads

    D: reading characters...(useragent) length=16, offset=1097,
    found='Mozilla/4.61 [en'
    D: reading characters...(useragent) length=1, offset=0, found=']'
    D: reading characters...(useragent) length=11, offset=1114, found='
    (WinNT; I)'
    D: ending (useragent) current element value is : [ (WinNT; I)]

    It calls the characters method thrice?!
    Does the [en] bit in the element value have anything to do with this?
    Would like to understand what and why.

    (As a 'temp fix' I thought to have the DefaultHandler's characters(...)
    method concatenate characters read, till the endElement(...) is
    invoked; but that seems to break everything.)

    Thanks for your input.
    Fred.
    Fred, Dec 13, 2003
    #1

  2. Fred wrote:

    > (As a 'temp fix' I thought to have the DefaultHandlers characters(...)
    > method concatenate characters read, till the endElement(...) is
    > invoked; but that seems to break everything.)


    I think that's how SAX is supposed to work. There's no guarantee that
    you're only getting a single event here.
    Julian Reschke, Dec 13, 2003
    #2

  3. Eric Bohlman

    Eric Bohlman Guest

    Julian Reschke wrote:

    > Fred wrote:
    >
    >> (As a 'temp fix' I thought to have the DefaultHandlers characters(...)
    >> method concatenate characters read, till the endElement(...) is
    >> invoked; but that seems to break everything.)

    >
    > I think that's how SAX is supposed to work. There's no guarantee that
    > you're only getting a single event here.


    It *is* how SAX is supposed to work. Keep in mind that character data in
    XML can be arbitrarily long; if a parser had to deliver character data in a
    single chunk, it could find itself constantly allocating and reallocating
    buffers. Not imposing such a requirement greatly simplifies buffer
    management in a parser; it can use a fixed-size internal buffer and just
    call the character handler when everything up to the end of the buffer is
    character data, rather than having to shift everything around. That can
    greatly speed up parsing.
    Eric Bohlman, Dec 14, 2003
    #3
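
The accumulate-until-endElement approach discussed in this thread can be sketched roughly as follows. This is only a minimal sketch under some assumptions: it uses the JDK's built-in JAXP parser instead of org.apache.xerces.parsers.SAXParser, the names AccumulatingHandler, getLastValue and parse are hypothetical, and it handles only simple elements without nested child elements. The key point is that characters(...) appends the given range to a buffer on every call, and the value is read only in endElement(...).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects all character chunks of an element
// and only treats the text as complete when endElement is reached.
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String lastValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start each element with an empty buffer (simple, non-nested case only).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element; append instead of overwrite.
        // Copy the range now, since only ch[start .. start+length) may be read.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("useragent".equals(qName)) {
            lastValue = text.toString();
        }
        text.setLength(0);
    }

    String getLastValue() {
        return lastValue;
    }

    // Convenience helper (an assumption for this sketch): parse a string
    // and return the text of the last <useragent> element seen.
    static String parse(String xml) throws Exception {
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLastValue();
    }
}
```

With a handler like this, the number of characters(...) calls the parser happens to make no longer matters: whether the text arrives in one chunk or three, the buffer holds the full element value by the time endElement(...) runs.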