globophobe
This is likely an easy problem; however, I couldn't think of
appropriate keywords for Google.
Basically, I have some raw data that needs to be preprocessed before
it is saved to the database, e.g.
In [1]: unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'
I need to turn this into an ElementTree, but some of the data is
Japanese whereas the rest is HTML. This string contains a <br />.
In [2]: e = ET.fromstring('<data>%s</data>' % unicode_html)
In [3]: e.text
Out[3]: u'\u3055\u3080\u3044\uff0f\n\u3064\u3081\u305f\u3044\n'
In [4]: len(e)
Out[4]: 0
How can I decode the Unicode HTML <br /> into a string that
ElementTree can understand?
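
For comparison, here is a minimal sketch of how ElementTree treats a literal <br /> versus one that has been escaped as &lt;br /&gt;. Both sample strings are made up for illustration, and the idea that the tag might arrive escaped is only an assumption; it is not taken from the data above. The helper name to_element is hypothetical, and html.unescape assumes Python 3.

```python
import xml.etree.ElementTree as ET
from html import unescape  # Python 3 standard library

# Hypothetical sample data: Japanese text with a literal <br /> in the middle.
literal = u'\u3055\u3080\u3044\uff0f<br />\u3064\u3081\u305f\u3044'
# Hypothetical variant where the tag was escaped as entities before storage.
escaped = u'\u3055\u3080\u3044\uff0f&lt;br /&gt;\u3064\u3081\u305f\u3044'


def to_element(fragment):
    """Wrap a fragment in <data> and parse it; the fragment must be well-formed XML."""
    return ET.fromstring(u'<data>%s</data>' % fragment)


e_literal = to_element(literal)              # <br /> becomes a child element
e_escaped = to_element(escaped)              # &lt;br /&gt; stays plain text
e_unescaped = to_element(unescape(escaped))  # unescape first, then parse

print(len(e_literal), len(e_escaped), len(e_unescaped))
```

With a literal tag, the text before it ends up in e.text and the text after it in the tail of the br child. Note that unescape also turns &amp; into a bare &, which would make the fragment invalid XML, so this only suits input where the tag is the only escaped markup.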