Unicode problem.... as always

Todd Jenista · Jul 1, 2003

I have a parser I am building with python and, unfortunately, people
have decided to put unicode characters in the files I am parsing.
The parser seems to have a fit when I search for one \uXXXX symbol
and there is another Unicode symbol in the file. In this case, a
search and replace of © with µ in the file causes the infamous
"ordinal not in range" error.
My quick fix, because both symbols have good context, is to change
them both to the placeholder "UTF8", and then attempt to replace the
placeholder at the end with the original µ. The problem is that I get
Âµ when I try to re-insert it using \u00b5, which I believe is the
UTF-8 code.
Words of wisdom would be greatly appreciated.

Thomas Güttler · Jul 1, 2003

Todd said:
I have a parser I am building with python and, unfortunately, people
have decided to put unicode characters in the files I am parsing.

Maybe this helps you. It converts a Latin-1 byte to Unicode
and then converts it to UTF-8.
You need to know the encoding of the input (UTF-8, UTF-16, ...).
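
A minimal sketch of that conversion, assuming Python 3 and a Latin-1 input file. The helper name replace_symbol and the sample bytes are made up for illustration and are not from the original post. It decodes the bytes to a Unicode string, does the replacement on the string, and only then encodes to UTF-8:

```python
# Assumption: Python 3, where bytes and str are distinct types.
# replace_symbol and the sample data are hypothetical, for illustration only.

def replace_symbol(raw: bytes, encoding: str, old: str, new: str) -> bytes:
    """Decode raw bytes, replace text on the decoded string, re-encode as UTF-8."""
    text = raw.decode(encoding)       # bytes -> str (Unicode code points)
    text = text.replace(old, new)     # replace on code points, not on bytes
    return text.encode("utf-8")       # str -> UTF-8 bytes


latin1_input = b"10 \xa9m"            # 0xA9 is the copyright sign in Latin-1
utf8_output = replace_symbol(latin1_input, "latin-1", "\u00a9", "\u00b5")

# U+00B5 (micro sign) is a code point; its UTF-8 encoding is the two bytes C2 B5.
assert utf8_output == b"10 \xc2\xb5m"

# Reading those UTF-8 bytes as if they were Latin-1 turns C2 into a stray "Â",
# which matches the "Âµ" symptom described in the question.
mojibake = utf8_output.decode("latin-1")
```

So \u00b5 names the code point rather than the UTF-8 bytes, and a "Âµ" usually means UTF-8 output is being viewed or re-decoded as Latin-1 somewhere downstream.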

thomas
