Unicode to HTML entities

Discussion in 'Python' started by Clodoaldo, May 29, 2007.

  1. Clodoaldo

    Clodoaldo Guest

    I was looking for a function to transform a unicode string into
    htmlentities. Not only the usual html escaping thing but all
    characters.

    As I didn't find one, I wrote my own:

    # -*- coding: utf-8 -*-
    from htmlentitydefs import codepoint2name

    def unicode2htmlentities(u):

        htmlentities = list()

        for c in u:
            if ord(c) < 128:
                htmlentities.append(c)
            else:
                htmlentities.append('&%s;' % codepoint2name[ord(c)])

        return ''.join(htmlentities)

    print unicode2htmlentities(u'São Paulo')

    Is there a function like that in one of Python's built-in modules? If
    not, is there a better way to do it?
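
    For readers on Python 3, here is a minimal sketch of the same idea. It assumes `html.entities.codepoint2name`, the Python 3 location of `htmlentitydefs.codepoint2name`. As an addition that is not in the post above, it falls back to a numeric character reference when a character has no named entity, because the plain dictionary lookup would raise `KeyError` for such characters.

    ```python
    from html.entities import codepoint2name


    def unicode2htmlentities(text):
        """Replace every non-ASCII character with an HTML entity."""
        parts = []
        for ch in text:
            code = ord(ch)
            if code < 128:
                # Plain ASCII passes through unchanged.
                parts.append(ch)
            elif code in codepoint2name:
                # Named entity, e.g. &atilde;
                parts.append('&%s;' % codepoint2name[code])
            else:
                # Assumed fallback (not in the original post): numeric reference.
                parts.append('&#%d;' % code)
        return ''.join(parts)


    print(unicode2htmlentities('São Paulo'))  # S&atilde;o Paulo
    ```

    Like the original, this leaves `&`, `<` and `>` untouched, so it is not a substitute for HTML escaping.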

    Regards, Clodoaldo Pinto Neto
     
    Clodoaldo, May 29, 2007
    #1

  2. "Clodoaldo" <> wrote in message
    news:...

    >I was looking for a function to transform a unicode string into
    >htmlentities.


    >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
    'S&#227;o Paulo'
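
    The same error handler exists in Python 3, where `str.encode` returns `bytes`. A small sketch, assuming the result should end up as text again (the variable names are only illustrative):

    ```python
    text = 'São Paulo'

    # encode() returns bytes in Python 3; every character outside ASCII is
    # replaced by a numeric character reference such as &#227;
    encoded = text.encode('ascii', 'xmlcharrefreplace')

    # Decode back to str if the result is to be embedded in a text template.
    as_text = encoded.decode('ascii')

    print(encoded)   # b'S&#227;o Paulo'
    print(as_text)   # S&#227;o Paulo
    ```

    Note that this produces numeric references (`&#227;`), not named entities (`&atilde;`); browsers treat the two the same way.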
     
    Richard Brodie, May 29, 2007
    #2

  3. Clodoaldo

    Clodoaldo Guest

    On May 29, 12:57 pm, "Richard Brodie" <> wrote:
    > "Clodoaldo" <> wrote in message
    >
    > news:...
    >
    > >I was looking for a function to transform a unicode string into
    > >htmlentities.
    > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
    >
    > 'S&#227;o Paulo'


    That was a fast answer. I would never have found that myself.

    Thanks, Clodoaldo
     
    Clodoaldo, May 29, 2007
    #3
  4. Duncan Booth

    Duncan Booth Guest

    Clodoaldo <> wrote:

    > On May 29, 12:57 pm, "Richard Brodie" <> wrote:
    >> "Clodoaldo" <> wrote in message
    >>
    >> news:...
    >>
    >> >I was looking for a function to transform a unicode string into
    >> >htmlentities.
    >> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
    >>
    >> 'S&#227;o Paulo'

    >
    > That was a fast answer. I would never find that myself.
    >

    You might actually want:

    >>> import cgi
    >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
    'S&#227;o Paulo &amp; Esp&#237;rito Santo'

    as you have to be sure to escape any ampersands in your unicode
    string before doing the encode.
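
    On current Python 3 releases `cgi.escape` is no longer available; `html.escape` is the standard-library replacement (by default it also escapes quote characters). A sketch of the escape-then-encode order, with `escape_and_encode` as a hypothetical helper name:

    ```python
    import html


    def escape_and_encode(text):
        """Escape markup characters first, then replace non-ASCII characters."""
        escaped = html.escape(text)   # & < > and quotes become entities
        return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


    sample = 'São Paulo & Espírito Santo'
    print(escape_and_encode(sample))
    # S&#227;o Paulo &amp; Esp&#237;rito Santo

    # Doing the steps in the opposite order escapes the ampersands that the
    # encode step just produced, which corrupts the character references.
    wrong_order = html.escape(
        sample.encode('ascii', 'xmlcharrefreplace').decode('ascii'))
    print(wrong_order)
    # S&amp;#227;o Paulo &amp; Esp&amp;#237;rito Santo
    ```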
     
    Duncan Booth, May 30, 2007
    #4
  5. On 29 May 2007, at 17.52, Clodoaldo wrote:

    > I was looking for a function to transform a unicode string into
    > htmlentities. Not only the usual html escaping thing but all
    > characters.
    >
    > As I didn't find I wrote my own:
    >
    > # -*- coding: utf-8 -*-
    > from htmlentitydefs import codepoint2name
    >
    > def unicode2htmlentities(u):
    >
    >     htmlentities = list()
    >
    >     for c in u:
    >         if ord(c) < 128:
    >             htmlentities.append(c)
    >         else:
    >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
    >
    >     return ''.join(htmlentities)
    >
    > print unicode2htmlentities(u'São Paulo')
    >
    > Is there a function like that in one of python builtin modules? If not
    > is there a better way to do it?
    >
    > Regards, Clodoaldo Pinto Neto
    >

    In many cases, the need to use html/xhtml entities can be avoided by
    generating utf8-coded pages.
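
    As a rough illustration of that suggestion, the sketch below (Python 3) encodes a page as UTF-8 and declares the encoding both in the markup and in a `Content-Type` header. The page content and the `headers` list are made up for the example and are not tied to any particular web framework.

    ```python
    # Hypothetical page; the non-ASCII character is written literally.
    page = (
        '<!DOCTYPE html>\n'
        '<meta charset="utf-8">\n'
        '<title>São Paulo</title>\n'
    )

    # Encode once, at the output boundary.
    body = page.encode('utf-8')

    # Tell the client which encoding the bytes use.
    headers = [
        ('Content-Type', 'text/html; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]

    print(headers)
    ```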
    ------------------------------------------------------
    "Home is not where you are born, but where your heart finds peace" -
    Tommy Nordgren, "The dying old crone"
     
    Tommy Nordgren, May 30, 2007
    #5
  6. Clodoaldo

    Clodoaldo Guest

    On May 30, 8:53 am, Tommy Nordgren <> wrote:
    > On 29 May 2007, at 17.52, Clodoaldo wrote:
    >
    > > I was looking for a function to transform a unicode string into
    > > htmlentities. Not only the usual html escaping thing but all
    > > characters.
    >
    > > As I didn't find I wrote my own:
    >
    > > # -*- coding: utf-8 -*-
    > > from htmlentitydefs import codepoint2name
    >
    > > def unicode2htmlentities(u):
    >
    > >     htmlentities = list()
    >
    > >     for c in u:
    > >         if ord(c) < 128:
    > >             htmlentities.append(c)
    > >         else:
    > >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
    >
    > >     return ''.join(htmlentities)
    >
    > > print unicode2htmlentities(u'São Paulo')
    >
    > > Is there a function like that in one of python builtin modules? If not
    > > is there a better way to do it?
    >
    > > Regards, Clodoaldo Pinto Neto
    >
    > In many cases, the need to use html/xhtml entities can be avoided by
    > generating utf8-coded pages.


    Sure. All my pages are utf-8 encoded. The case I'm dealing with is an
    email link whose subject has non-ASCII characters, as in:

    <a href=mailto:?subject=Dúvidas>Mail to</a>

    Somehow, when the user clicks on the link, the subject reaches his email
    client with the non-ASCII chars as garbage.
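
    One possible cause, which this thread does not confirm, is that the subject is placed in the URL without percent-encoding. The sketch below (Python 3) assumes that percent-encoding the UTF-8 bytes of the subject with `urllib.parse.quote` is what the mail client expects; `build_mailto_link` and its parameters are hypothetical names, and whether a given client decodes the subject as UTF-8 is an assumption.

    ```python
    import html
    from urllib.parse import quote


    def build_mailto_link(subject, label, address=''):
        """Build an <a> element whose mailto subject is percent-encoded."""
        # quote() encodes the text as UTF-8 by default and percent-escapes
        # every byte that is not an unreserved ASCII character.
        subject_param = quote(subject, safe='')
        href = 'mailto:%s?subject=%s' % (address, subject_param)
        # Escape for the HTML attribute and for the element content.
        return '<a href="%s">%s</a>' % (html.escape(href), html.escape(label))


    print(build_mailto_link('Dúvidas', 'Mail to'))
    # <a href="mailto:?subject=D%C3%BAvidas">Mail to</a>
    ```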

    And before someone points out that I should not expose email addresses,
    the email is only linked with the consent of the owner and the source
    is obfuscated to make it harder for a robot to harvest it.

    Regards, Clodoaldo
     
    Clodoaldo, May 30, 2007
    #6
  7. Clodoaldo

    Clodoaldo Guest

    On May 30, 4:25 am, Duncan Booth <> wrote:
    > Clodoaldo <> wrote:
    > > On May 29, 12:57 pm, "Richard Brodie" <> wrote:
    > >> "Clodoaldo" <> wrote in message

    >
    > >>news:...

    >
    > >> >I was looking for a function to transform a unicode string into
    > >> >htmlentities.
    > >> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
    > >>
    > >> 'S&#227;o Paulo'

    >
    > > That was a fast answer. I would never find that myself.

    >
    > You might actually want:
    >
    > >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
    >
    > 'S&#227;o Paulo &amp; Esp&#237;rito Santo'
    >
    > as you have to be sure to escape any ampersands in your unicode
    > string before doing the encode.


    I will do it. Thanks.

    Regards, Clodoaldo.
     
    Clodoaldo, May 30, 2007
    #7