unescape HTML entities

Rares Vernica

Hi,

How can I unescape HTML entities like "&nbsp;"?

I know about xml.sax.saxutils.unescape(), but it only deals with "&amp;",
"&lt;", and "&gt;".

I also know about htmlentitydefs.entitydefs, but not only is this
dictionary the opposite of what I need, it also does not have "&nbsp;".

It has to work in Python 2.4.

Thanks a lot,
Ray
 
Jim

Rares said:
How can I unescape HTML entities like "&nbsp;"?
Can I ask what you mean by "unescaping"? Do you mean converting into
numeric references? Into Unicode?

Jim
 
Klaus Alexander Seistrup

Rares said:
How can I unescape HTML entities like "&nbsp;"?

I know about xml.sax.saxutils.unescape(), but it only deals with
"&amp;", "&lt;", and "&gt;".

I also know about htmlentitydefs.entitydefs, but not only is this
dictionary the opposite of what I need, it also does not have
"&nbsp;".

How about something like:

#v+
#!/usr/bin/env python
'''dehtml.py'''

import re
import htmlentitydefs

# Matches any named entity that htmlentitydefs knows, e.g. &aelig; or &nbsp;
myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')

def dehtml(s):
    # Replace each matched entity with the Unicode character it names.
    return re.sub(
        myrx,
        lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
        s
    )
# end def dehtml

if __name__ == '__main__':
    import sys
    print dehtml(sys.stdin.read()).encode('utf-8')
# end if

#v-

E.g.:

#v+

$ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
frække frølår
$

#v-
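
For comparison, xml.sax.saxutils.unescape() takes an optional dictionary of extra replacements as its second argument, so the named entities can also be handled by handing it a table built from name2codepoint. The following is only a sketch under some assumptions: the helper name unescape_named is made up for illustration, the import fallbacks are there so the same code runs on Python 2 and Python 3, and the input is assumed to be a Unicode string (or plain ASCII on Python 2).

```python
from xml.sax.saxutils import unescape

try:
    from htmlentitydefs import name2codepoint   # Python 2
except ImportError:
    from html.entities import name2codepoint    # Python 3

try:
    unichr                                       # Python 2 builtin
except NameError:
    unichr = chr                                 # Python 3: chr() covers Unicode

# Map "&name;" to its character. amp, lt and gt are left out because
# unescape() already handles them itself, replacing "&amp;" last so that
# text such as "&amp;nbsp;" is not unescaped twice.
_extra = dict([('&%s;' % name, unichr(cp))
               for name, cp in name2codepoint.items()
               if name not in ('amp', 'lt', 'gt')])

def unescape_named(s):
    # Hypothetical helper: named entities only, numeric references untouched.
    return unescape(s, _extra)
```

Like dehtml(), this leaves numeric references such as "&#39;" alone. It does one str.replace() per table entry, so the regular-expression version is probably the better choice for large inputs.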
 
Rares Vernica

Hi,

How does your code deal with entities like "&#39;"?

Thanks,
Ray
 
Klaus Alexander Seistrup

Rares said:
How does your code deal with entities like "&#39;"?

It doesn't, it deals with named entities only. But take a look
at Fredrik's example.

Cheers,
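
Because dehtml() only substitutes named entities, numeric references such as "&#39;" or "&#x27;" pass through unchanged. One way to cover both cases in a single pass might look like the sketch below. It is not Fredrik's example, which is not reproduced in this thread; the names unescape_entities and _replace are made up for illustration, and the import fallbacks are an assumption so that the same code runs on Python 2.4 and on Python 3.

```python
import re

try:
    from htmlentitydefs import name2codepoint   # Python 2
except ImportError:
    from html.entities import name2codepoint    # Python 3

try:
    unichr                                       # Python 2 builtin
except NameError:
    unichr = chr                                 # Python 3: chr() covers Unicode

# Group 1: decimal digits, group 2: hex digits, group 3: entity name.
_entity_rx = re.compile(
    r'&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));')

def _replace(m):
    dec, hx, name = m.groups()
    try:
        if dec:
            return unichr(int(dec))
        if hx:
            return unichr(int(hx, 16))
        return unichr(name2codepoint[name])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text as it was.
        return m.group(0)

def unescape_entities(s):
    # Hypothetical helper: decodes named, decimal and hex references once.
    return _entity_rx.sub(_replace, s)
```

Since every reference is decoded in one pass over the input, text such as "&amp;lt;" comes out as "&lt;" rather than being unescaped twice. Unknown names are left as they are.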
 
