Unicode problem.... as always

Todd Jenista

I am building a parser in Python and, unfortunately, people have
decided to put Unicode characters in the files I am parsing.
The parser seems to have a fit when I search for one \uXXXX symbol
and there is another Unicode symbol in the file. In this case, a
search and replace for © with a µ in the file causes the infamous
"ordinal not in range(128)" error.
My quick fix, because they have good context, is to replace them both
with the placeholder string "UTF8", and then attempt to replace the
"UTF8" at the end with the original µ. The problem is that I am getting
a µ when I try to re-insert using \u00b5, which is the Unicode code
point for µ.
Words of wisdom would be greatly appreciated.
 
Thomas Güttler

Todd said:
I have a parser I am building with python and, unfortunately, people
have decided to put unicode characters in the files I am parsing.

Maybe this helps you: convert the Latin-1 bytes to Unicode first,
and then convert the result to UTF-8.
You need to know the encoding of the input (UTF-8, UTF-16, ...).
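
A minimal sketch of that conversion, not the original snippet from this
reply. It assumes Python 3 and that the input really is Latin-1; the
sample bytes and variable names are made up for illustration. It also
shows one way a µ can appear: UTF-8 bytes being read back as Latin-1.

```python
# Assumption: Python 3, and the input file is Latin-1 encoded.
# Hypothetical sample data: 0xB5 is µ and 0xA9 is © in Latin-1.
raw = b"50 \xb5m \xa9 2004"

# Step 1: bytes -> str. Decode once, using the real input encoding.
text = raw.decode("latin-1")

# Step 2: search and replace on the decoded text, never on raw bytes.
text = text.replace("\u00a9", "\u00b5")  # © -> µ

# Step 3: str -> bytes. Encode once, on the way out.
utf8_bytes = text.encode("utf-8")

# How the stray "Â" can show up: µ is the two bytes C2 B5 in UTF-8,
# and reading those bytes as Latin-1 gives "Â" followed by "µ".
garbled = "\u00b5".encode("utf-8").decode("latin-1")
```

If the input is not Latin-1, only the codec name passed to decode()
changes; the replace step stays the same.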

thomas
 
