SAX parsing problem

Discussion in 'Python' started by anon, Mar 16, 2005.

  1. anon

    anon Guest

    So I've encountered a strange behavior that I'm hoping someone can fill
    me in on. I've written a simple handler that works with one small
    exception: when the parser encounters a line with '&amp;' in it, it
    only returns the portion that follows the occurrence.

    For example, parsing a file with the line:
    <key>mykey</key><value>some%20&amp;%20value</value>

    results in getting "%20value" back from the characters method, rather
    than "some%20&%20value".

    After looking into this a bit, I found that SAX supports entities and
    that it is probably believing the &amp; to be an entity and processing
    it in some way that I'm unaware of. I'm using the default
    EntityResolver.

    Any help/info would be much appreciated.

    gh
     
    anon, Mar 16, 2005
    #1

  2. anon <> writes:

    > So I've encountered a strange behavior that I'm hoping someone can fill
    > me in on. I've written a simple handler that works with one small
    > exception: when the parser encounters a line with '&amp;' in it, it
    > only returns the portion that follows the occurrence.
    >
    > For example, parsing a file with the line:
    > <key>mykey</key><value>some%20&amp;%20value</value>
    >
    > results in getting "%20value" back from the characters method, rather
    > than "some%20&%20value".
    >
    > After looking into this a bit, I found that SAX supports entities and
    > that it is probably believing the &amp; to be an entity and processing
    > it in some way that I'm unaware of. I'm using the default
    > EntityResolver.


    Are you sure you're not actually getting three chunks: "some%20", "&",
    and "%20value"? The xml.sax.handler.ContentHandler.characters method
    (which I presume you're using for SAX, as you don't mention!) is not
    guaranteed to get all contiguous character data in one call. Also check
    if .skippedEntity() methods are firing.
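
    Both checks suggested above can be done with a small diagnostic handler
    that records every call. This is only a sketch: the class name
    ChunkLogger, the <entry> wrapper element, and the sample bytes are
    assumptions made for illustration, and how the text gets split is up to
    the parser.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkLogger(ContentHandler):
    """Hypothetical helper: record each characters() and skippedEntity() call."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []   # one item per characters() call
        self.skipped = []  # one item per skippedEntity() call

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


# Assumed sample input: the line from the question, wrapped in a single
# root element so that the document is well-formed.
SAMPLE = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"

logger = ChunkLogger()
xml.sax.parseString(SAMPLE, logger)
print(logger.chunks)
print(logger.skipped)
```

    With the default expat-based parser the value text typically arrives as
    several list items rather than one, and the skipped list stays empty
    because &amp; is a predefined entity that the parser expands itself.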

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
     
    David M. Cooke, Mar 16, 2005
    #2

  3. anon

    gh Guest

    In article <>, David M.
    Cooke <> wrote:

    > anon <> writes:
    >
    > > So I've encountered a strange behavior that I'm hoping someone can fill
    > > me in on. I've written a simple handler that works with one small
    > > exception: when the parser encounters a line with '&amp;' in it, it
    > > only returns the portion that follows the occurrence.
    > >
    > > For example, parsing a file with the line:
    > > <key>mykey</key><value>some%20&amp;%20value</value>
    > >
    > > results in getting "%20value" back from the characters method, rather
    > > than "some%20&%20value".
    > >
    > > After looking into this a bit, I found that SAX supports entities and
    > > that it is probably believing the &amp; to be an entity and processing
    > > it in some way that I'm unaware of. I'm using the default
    > > EntityResolver.

    >
    > Are you sure you're not actually getting three chunks: "some%20", "&",
    > and "%20value"? The xml.sax.handler.ContentHandler.characters method
    > (which I presume you're using for SAX, as you don't mention!) is not
    > guaranteed to get all contiguous character data in one call. Also check
    > if .skippedEntity() methods are firing.


    Ya, skippedEntity() wasn't firing, but you are correct about receiving
    three chunks. The characters handler routine is fired 3 times for a
    single text block. Why does it do this? Is there a way to prevent
    doing this?

    Much thanks.

    gh
     
    gh, Mar 16, 2005
    #3
  4. anon

    Uche Ogbuji Guest

    On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
    > The characters handler routine is fired 3 times for a
    > single text block. Why does it do this? Is there a way to prevent
    > doing this?


    Continuing in the vein of closing matters cross-posted to XML-SIG:

    http://mail.python.org/pipermail/xml-sig/2005-March/011013.html
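
    For the question of how to avoid handling split text: a SAX parser is
    allowed to deliver contiguous character data in several characters()
    calls, so the usual workaround is to buffer the pieces and join them when
    the element ends. The sketch below assumes simple elements that contain
    only text; the class name TextJoiningHandler, the texts attribute, and
    the <entry> wrapper element are made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextJoiningHandler(ContentHandler):
    """Hypothetical handler: collect complete text for <key> and <value>."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._buffer = []  # pieces seen since the last start tag
        self.texts = []    # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element (assumes no mixed content).
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text; just accumulate.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer)
        self._buffer = []
        if name in ("key", "value"):
            self.texts.append((name, text))


# Assumed sample input: the line from the question inside one root element.
SAMPLE = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"

handler = TextJoiningHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.texts)
```

    Because the buffer is reset on every start tag, text that surrounds child
    elements (mixed content) would be dropped by this version; a stack of
    buffers would be needed for that case.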

    --
    Uche Ogbuji Fourthought, Inc.
    http://uche.ogbuji.net http://4Suite.org http://fourthought.com
    Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
    Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
    Gems from the Mines: 2002 to 2003 - http://www.xml.com/pub/a/2005/03/02/pyxml.html
    Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
    Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
    Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html
     
    Uche Ogbuji, Mar 23, 2005
    #4
