William Heymann
How do I decode a string back to useful unicode that has xml numeric character
references in it?
Things like &#21344; (占)
William Heymann said:
How do I decode a string back to useful unicode that has xml numeric
character references in it?
Things like &#21344; (占)
Try something like this:
            )
        return match.group(0)
    return EntityPattern.sub(unescape, s.decode(encoding))
Obviously if you really do only want numeric references you can take out
the lines using name2codepoint and simplify the regex.
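Since only the tail of the posted code survives here, the following is a minimal sketch of the same regex-substitution idea, not the original poster's code. It assumes Python 3, where the input is already a str (decode bytes first, as the s.decode(encoding) call above does) and name2codepoint lives in html.entities. The names ENTITY_PATTERN, NUMERIC_PATTERN, decode_entities and decode_numeric_references are made up for illustration.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: decimal (&#246;), hexadecimal (&#xf6;) or named (&ouml;).
ENTITY_PATTERN = re.compile(
    r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));"
)

# Simplified pattern for numeric references only (no name lookup needed).
NUMERIC_PATTERN = re.compile(r"&#(?:(\d+)|[xX]([0-9a-fA-F]+));")


def decode_entities(text):
    """Replace numeric and named character references in a str."""
    def unescape(match):
        decimal, hexadecimal, name = match.groups()
        try:
            if decimal:
                return chr(int(decimal, 10))
            if hexadecimal:
                return chr(int(hexadecimal, 16))
            if name in name2codepoint:
                return chr(name2codepoint[name])
        except (ValueError, OverflowError):
            pass  # code point outside the Unicode range
        # Unknown or invalid reference: leave the text as it was.
        return match.group(0)

    return ENTITY_PATTERN.sub(unescape, text)


def decode_numeric_references(text):
    """Replace only decimal and hexadecimal references in a str."""
    def unescape(match):
        decimal, hexadecimal = match.groups()
        try:
            return chr(int(decimal, 10) if decimal else int(hexadecimal, 16))
        except (ValueError, OverflowError):
            return match.group(0)

    return NUMERIC_PATTERN.sub(unescape, text)
```

For input that arrives as bytes, something like decode_entities(raw.decode("utf-8")) would be the equivalent of the call in the fragment above. Unknown names such as &bogus; are passed through untouched in this sketch.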
William Heymann said:
How do I decode a string back to useful unicode that has xml numeric character
references in it?
Things like &#21344; (占)
BeautifulSoup can handle two of the three formats for html entities.
For instance, an 'o' with umlaut can be represented in three different
ways:
&_ouml_;
ö
ö
7stud said:
lol. It's hard to even make posts about this stuff because HTML
entities get converted by the forum software. Here are the three
different formats for an 'o with umlaut', with some underscores added
to keep the forum software from rendering the characters:
&_ouml_;
&_#246_;
&_#xf6_;
7stud said:
For instance, an 'o' with umlaut can be represented in three
different ways:
'&' followed by 'ouml;'
'&' followed by '#246;'
'&' followed by '#xf6;'
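To check that the three spellings listed above really name the same character, here is a small sketch using html.unescape from the Python 3 standard library (available since 3.4). This is a different route from BeautifulSoup and is only meant as an illustration; the variable names are made up.

```python
from html import unescape

# The three spellings of an 'o' with umlaut: named, decimal, hexadecimal.
references = ["&ouml;", "&#246;", "&#xf6;"]

# html.unescape handles all three formats.
decoded = [unescape(ref) for ref in references]

# Each entry should be the single character U+00F6.
all_same = all(ch == "\u00f6" for ch in decoded)
print(decoded, all_same)
```

The same call also covers the numeric reference from the original question, so unescape("&#21344;") gives the character 占.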