RE: [XML-SIG] SAX characters() output on multiple lines for non-ascii

Discussion in 'Python' started by Brian Smith, Feb 2, 2008.

  1. Brian Smith


    > def characters(self, chars):
    >     newchars=[]
    >     newchars.append(chars.encode('ISO-8859-1'))


    A SAX parser may report a single contiguous block of text through several characters() calls, and your handler cannot predict where the splits fall. For example, for the input <foo>123</foo>, characters() could be called once:
        handler.characters("123")
    or twice:
        handler.characters("12")
        handler.characters("3")
    or:
        handler.characters("1")
        handler.characters("23")
    or three times:
        handler.characters("1")
        handler.characters("2")
        handler.characters("3")
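    To observe the fragmenting directly, here is a minimal Python 3 sketch (the original thread predates Python 3) that feeds the document to an incremental SAX parser in two pieces and records every characters() call. `ChunkRecorder` and `record_chunks` are hypothetical names used only for illustration. Where the text gets split depends on the underlying parser (expat, in CPython) and its buffering, so the sketch relies only on the fragments joining back into "123".

```python
import xml.sax


class ChunkRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records every characters() call verbatim."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Each call may carry only part of the element's text.
        self.chunks.append(content)


def record_chunks(pieces):
    """Feed the document piece by piece and return the characters() fragments."""
    handler = ChunkRecorder()
    parser = xml.sax.make_parser()  # expat-based incremental parser
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()  # flush any buffered text and finish the document
    return handler.chunks


chunks = record_chunks([b"<foo>12", b"3</foo>"])
print(chunks)           # one or more fragments; the exact split is not guaranteed
print("".join(chunks))  # the joined fragments are always the full text
```

    Only the concatenation of the fragments is something you can count on.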

    To get the whole text block, buffer the fragments in your handler and join them when the element ends:

    in __init__:
        self.newchars = []

    in startElement:
        self.newchars = []

    in characters:
        self.newchars.append(chars)

    in endElement:
        if self.newchars:
            combined = "".join(self.newchars).encode('ISO-8859-1')
            print "Stream read is '%s'" % combined
        # reset so the parent element does not print this text again
        self.newchars = []
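    A self-contained Python 3 version of the same buffering idea might look like the sketch below. `TextCollector` and its `texts` list are hypothetical names; instead of printing, it records (element name, text) pairs so the result can be checked. In Python 3 the parser already delivers `str`, so the ISO-8859-1 step is left out; call `combined.encode('ISO-8859-1')` only if you really need bytes.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that buffers characters() fragments per element."""

    def __init__(self):
        super().__init__()
        self.newchars = []  # fragments of the current text block
        self.texts = []     # collected (element name, combined text) pairs

    def startElement(self, name, attrs):
        self.newchars = []

    def characters(self, content):
        # Only buffer here: one text block may arrive in several calls.
        self.newchars.append(content)

    def endElement(self, name):
        if self.newchars:
            combined = "".join(self.newchars)
            self.texts.append((name, combined))
        # Reset so the parent element does not report this text again.
        self.newchars = []


handler = TextCollector()
xml.sax.parseString(b"<foo>caf&#233; &amp; 123</foo>", handler)
print(handler.texts)  # [('foo', 'café & 123')]

nested = TextCollector()
xml.sax.parseString(b"<a><b>1</b></a>", nested)
print(nested.texts)  # [('b', '1')]
```

    Because the buffer is reset in startElement, text that precedes a child element in mixed content (the "x" in <a>x<b>y</b>z</a>) is discarded; that is a limitation of this simple scheme, not of SAX itself.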

    I recommend using ElementTree instead.
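    For comparison, a minimal Python 3 sketch with the standard library's xml.etree.ElementTree: each element's `text` attribute already holds the complete, decoded text, so no manual joining is needed. For large inputs, `ET.iterparse` is a streaming alternative; at an "end" event the element and its text are complete. The sample documents are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Whole-document parse: .text is the full text of each element.
root = ET.fromstring("<doc><foo>caf&#233; &amp; 123</foo><bar>x</bar></doc>")
texts = {elem.tag: elem.text for elem in root}
print(texts)  # {'foo': 'café & 123', 'bar': 'x'}

# Streaming parse: at each "end" event the element (and its text) is complete.
streamed = []
source = io.BytesIO(b"<doc><foo>123</foo><bar>x</bar></doc>")
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.text is not None:
        streamed.append((elem.tag, elem.text))
    elem.clear()  # free memory held by elements already processed
print(streamed)  # [('foo', '123'), ('bar', 'x')]
```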

    - Brian
     
