Convert xml symbol notation

Discussion in 'Python' started by dumbkiwi, Apr 6, 2007.

  1. dumbkiwi

    dumbkiwi Guest

    Hi,

    I'm working on a script to download and parse a web page, and it
    includes xml symbol notation, such as &#39; for the ' character. Does
    anyone know of a pre-existing python script/lib to convert the xml
    notation back to the actual symbol it represents?
     
    dumbkiwi, Apr 6, 2007
    #1

  2. dumbkiwi wrote:

    > On Apr 7, 5:23 pm, "Gabriel Genellina" <> wrote:


    >>Try the htmlentitydefs module.

    >
    > Is that a standard module? I can't see it anywhere - googled it.


    Sure! For quite a while, at least, since Python 1.5 (I can't go earlier
    in time...)
    http://svn.python.org/view/python/trunk/Lib/htmlentitydefs.py
    Added Wed Sep 27 16:22:08 1995 UTC (11 years, 6 months ago) by guido

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 7, 2007
    #2

  3. dumbkiwi wrote:

    > I'm working on a script to download and parse a web page, and it
    > includes xml symbol notation, such as &#39; for the ' character. Does
    > anyone know of a pre-existing python script/lib to convert the xml
    > notation back to the actual symbol it represents?


    Try the htmlentitydefs module.

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 7, 2007
    #3
  4. dumbkiwi

    dumbkiwi Guest

    On Apr 7, 5:23 pm, "Gabriel Genellina" <> wrote:
    > dumbkiwi wrote:
    > > I'm working on a script to download and parse a web page, and it
    > > includes xml symbol notation, such as &#39; for the ' character. Does
    > > anyone know of a pre-existing python script/lib to convert the xml
    > > notation back to the actual symbol it represents?

    >
    > Try the htmlentitydefs module.


    Is that a standard module? I can't see it anywhere - googled it.
     
    dumbkiwi, Apr 7, 2007
    #4
  5. >> I'm working on a script to download and parse a web page, and it
    >> includes xml symbol notation, such as &#39; for the ' character. Does
    >> anyone know of a pre-existing python script/lib to convert the xml
    >> notation back to the actual symbol it represents?

    >
    > Try the htmlentitydefs module.


    That won't help: this is a character reference, not an entity reference.
    htmlentitydefs only contains the definitions of entities.
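
    To illustrate the distinction, here is a minimal sketch, assuming Python 3,
    where htmlentitydefs is named html.entities. The helper resolve_reference is
    hypothetical and not part of any library: named entity references are looked
    up in the module's table, while numeric character references carry their
    code point directly and never touch that table.

```python
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs


def resolve_reference(ref):
    """Resolve a single reference such as '&amp;' or '&#39;' to a character.

    Hypothetical helper for illustration only; it expects exactly one
    well-formed reference and raises KeyError for unknown entity names.
    """
    body = ref[1:-1]  # strip the leading '&' and the trailing ';'
    if body[:2] in ("#x", "#X"):
        # Hexadecimal character reference: the digits are the code point.
        return chr(int(body[2:], 16))
    if body.startswith("#"):
        # Decimal character reference: the digits are the code point.
        return chr(int(body[1:]))
    # Named entity reference: this is the only case the table covers.
    return chr(name2codepoint[body])


print(resolve_reference("&#39;"))   # decimal character reference
print(resolve_reference("&#x27;"))  # hexadecimal character reference
print(resolve_reference("&amp;"))   # named entity reference
```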

    Regards,
    Martin
     
    Martin v. Löwis, Apr 7, 2007
    #5
  6. > I'm working on a script to download and parse a web page, and it
    > includes xml symbol notation, such as &#39; for the ' character. Does
    > anyone know of a pre-existing python script/lib to convert the xml
    > notation back to the actual symbol it represents?


    If you have this given in an XML file (rather than an HTML file which
    is not well-formed XML), you could use an XML parser for the entire
    file. This would automatically unescape character references. Likewise,
    you can parse it with HTMLParser, which will invoke the handle_charref
    method for these.
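
    Both parser routes can be sketched as follows, assuming Python 3 (where
    HTMLParser lives in html.parser) and a made-up one-line input. The class
    name CharrefCollector is hypothetical. Passing convert_charrefs=False is
    what makes the Python 3 parser call handle_charref instead of decoding
    the references itself.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SAMPLE = "<p>It&#39;s here</p>"  # made-up input for illustration

# Route 1: a real XML parser resolves character references on its own.
xml_text = ET.fromstring(SAMPLE).text


# Route 2: HTMLParser hands each character reference to handle_charref.
class CharrefCollector(HTMLParser):
    def __init__(self):
        # convert_charrefs=False keeps references out of handle_data
        # so that handle_charref is invoked for them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is the text between '&#' and ';', e.g. '39' or 'x27'.
        if name[0] in "xX":
            self.parts.append(chr(int(name[1:], 16)))
        else:
            self.parts.append(chr(int(name)))


collector = CharrefCollector()
collector.feed(SAMPLE)
collector.close()
html_text = "".join(collector.parts)

print(xml_text)
print(html_text)
```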

    If you just want to unescape references, you can use the code in

    http://effbot.org/zone/re-sub.htm
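
    For readers who cannot reach that page, here is a rough sketch of the
    regular-expression approach, assuming Python 3. It is written for this
    thread and is not a copy of the linked code; the names unescape and fixup
    are illustrative. Each match is passed to a callback that decodes numeric
    character references directly and looks named entities up in
    html.entities, leaving anything unrecognised untouched.

```python
import re
from html.entities import name2codepoint


def unescape(text):
    """Replace character and entity references in text with characters."""

    def fixup(match):
        ref = match.group(0)
        if ref.startswith("&#"):
            # Numeric character reference, decimal or hexadecimal.
            try:
                if ref[2] in "xX":
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except (ValueError, OverflowError):
                return ref  # malformed or out of range: leave as is
        # Named entity reference.
        codepoint = name2codepoint.get(ref[1:-1])
        return chr(codepoint) if codepoint is not None else ref

    return re.sub(r"&#?\w+;", fixup, text)


print(unescape("It&#39;s &lt;b&gt;bold&lt;/b&gt; &bogus;"))
```

    On Python 3.4 and later, the standard library function html.unescape
    covers the same ground and is usually the simpler choice.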

    HTH,
    Martin
     
    Martin v. Löwis, Apr 7, 2007
    #6
  7. Martin v. Löwis wrote:

    > >> I'm working on a script to download and parse a web page, and it
    > >> includes xml symbol notation, such as &#39; for the ' character. Does

    > >
    > > Try the htmlentitydefs module.

    >
    > That won't help: this is a character reference, not an entity reference.
    > htmlentitydefs only contains the definitions of entities.


    Ouch! Sorry.

    --
    Gabriel Genellina
     
    Gabriel Genellina, Apr 7, 2007
    #7