unicode codecs

Ivan Voras · Feb 9, 2004

When concatenating strings (actually, a constant and a string...) i get
the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

Now I don't think either string is unicode, but I'm working with
win32api so it might be...

The point is: I know all values will fit
in a particular code page (iso-8859-2), so how do I change the 'ascii'
codec in the above error into something that will work?

Christopher Koppler · Feb 9, 2004

Ivan said:
When concatenating strings (actually, a constant and a string...) i get
the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

Now I don't think either string is unicode, but I'm working with
win32api so it might be... The point is: I know all values will fit
in a particular code page (iso-8859-2), so how do I change the 'ascii'
codec in the above error into something that will work?

To get a real solution, you should also post the offending code, but
you might try to convert your values to unicode with the built-in
unicode() and the string method decode(). See the library reference
sections 2.1 and 2.2.6.
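
A minimal sketch of that suggestion, written for Python 3 as an assumption (there, bytes plays the role of the 8-bit str discussed here, and str plays the role of unicode). The value of raw_name is hypothetical; the idea is simply to decode with the known code page before concatenating:

```python
# Python 3 sketch: bytes stands in for the thread's 8-bit str,
# and str stands in for the thread's unicode type.
raw_name = b"R\xfcgenwald.txt"  # hypothetical value, iso-8859-2 encoded

# Decode explicitly with the code page the data is known to use.
text_name = raw_name.decode("iso-8859-2")

# Both operands are text now, so no implicit 'ascii' decoding is needed.
full_name = "x" + text_name
print(repr(full_name))
```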

Ivan Voras · Feb 9, 2004

Christopher said:
To get a real solution, you should also post the offending code, but
you might try to convert your values to unicode with the built-in
unicode() and the string method decode(). See the library reference
sections 2.1 and 2.2.6.

I tried that, without luck. It is somewhat difficult to reproduce the
problem, but here's how I see it:

- win32api function returns a string (8bit) with some of the characters
from the upper half of code page, let's call it s1
- a statement such as a='x'+s1 fails with the above error.

I don't really know why should concatenation check if characters are
7-bit clean (or indeed if they represent anything in whatever code page).

Since win32api functions exist also in unicode version, I tried this:

- call the unicode version of function. Returned is a unicode string
(checked, it really is unicode) like u'R\xfcgenwald.txt', let's call it s2
- a statement a='x'+s2.encode('iso-8859-2') also fails with the exact
same error.

It is strange that if I execute similar code in Idle (e.g. manually
assigning string constants to variables and concatenating), everything
works!

The exact error is:
  File "E:\develop\pynetdb\netdbcreate.py", line 32, in walkdirs
    fullname = root+'\\'+filename
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
ordinal not in range(128)

The filename variable contains (in my latest effort) utf-8 encoded value
'R\xc3\xbcgenwald.mp3', and root variable contains a normal non-unicode
string.
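
The byte reported in the error changed from 0xfc to 0xc3 because the same letter is stored differently in the two encodings. A small sketch (Python 3 syntax, assumed here for illustration) that shows the byte at position 1 in each case:

```python
name = "R\u00fcgenwald.mp3"  # the file name as text; the accented letter is at index 1

latin2_bytes = name.encode("iso-8859-2")
utf8_bytes = name.encode("utf-8")

print(hex(latin2_bytes[1]))  # the single byte used by iso-8859-2
print(hex(utf8_bytes[1]))    # the first byte of the two-byte UTF-8 sequence
print(utf8_bytes)
```

Either way the byte is outside the 7-bit range, so an implicit 'ascii' decode fails at position 1.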

I tried various combinations of unicode and non-unicode types, and they
all fail sooner or later when they meet with a non-unicode string that
is not 7-bit clean.

Martin v. Löwis · Feb 9, 2004

Ivan said:
When concatenating strings (actually, a constant and a string...) i get
the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

Now I don't think either string is unicode

This statement must be false. When concatenating two byte strings, no
codec is ever used. So, either
1. one of the strings is a Unicode object, or
2. you are not performing concatenation, or you get the exception
from an operation that is not concatenation, or
3. you are not getting this exception.

Most likely, it is 1)
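
To illustrate the point, here is a hedged sketch in Python 3 syntax (an assumption; the thread itself concerns Python 2). Joining two byte strings involves no codec, while decoding the same bytes as ASCII, which is the step Python 2 performed implicitly when the other operand was a Unicode object, produces the reported error:

```python
data = b"R\xfcbe"  # 8-bit string with one non-ASCII byte

# Byte string + byte string: plain byte copying, no codec involved.
joined = b"x" + data

# The step Python 2 attempted implicitly when the other operand was unicode:
try:
    data.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(joined)
print(message)
```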

The point is: I know all values will fit
in a particular code page (iso-8859-2), so how do I change the 'ascii'
codec in the above error into something that will work?

Explicitly encode the Unicode string in your concatenation as
iso-8859-2.
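
A minimal sketch of what that could look like, in Python 3 syntax (assumed for illustration; the names unicode_name and prefix are hypothetical):

```python
unicode_name = "R\u00fcgenwald.txt"  # text, e.g. from a wide-character API
prefix = b"x"                         # 8-bit constant, as in the example above

# Encode the text explicitly, then concatenate bytes with bytes.
combined = prefix + unicode_name.encode("iso-8859-2")
print(combined)
```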

Regards,
Martin

Martin v. Löwis · Feb 9, 2004

Ivan said:
- win32api function returns a string (8bit) with some of the characters
from the upper half of code page, let's call it s1

Are you absolutely certain that type(s1) is str?

- a statement such as a='x'+s1 fails with the above error.

Are you absolutely certain the constant is the literal string 'x'?

I don't really know why should concatenation check if characters are
7-bit clean (or indeed if they represent anything in whatever code page).

As you have shown, there would be no need, and indeed, Python will not
check code pages in this case. So you must be doing something else.

- call the unicode version of function. Returned is a unicode string
(checked, it really is unicode) like u'R\xfcgenwald.txt', let's call it s2
- a statement a='x'+s2.encode('iso-8859-2') also fails with the exact
same error.

How do you know it is the concatenation that causes the exception?

The exact error is:
File "E:\develop\pynetdb\netdbcreate.py", line 32, in walkdirs
fullname = root+'\\'+filename
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
ordinal not in range(128)

The filename variable contains (in my latest effort) utf-8 encoded value
'R\xc3\xbcgenwald.mp3', and root variable contains a normal non-unicode
string.

Which string precisely (what is its repr())?
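
One way to answer these questions is to print the type and repr() of each operand just before the failing line. A small sketch, with describe as a hypothetical helper name and Python 3 syntax assumed:

```python
def describe(label, value):
    """Return a one-line summary with the value's type name and repr()."""
    return "%s: %s %r" % (label, type(value).__name__, value)

# Hypothetical operands resembling the ones in the traceback.
print(describe("root", b"E:\\develop"))
print(describe("filename", b"R\xc3\xbcgenwald.mp3"))
```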

Regards,
Martin

Peter Otten · Feb 9, 2004

Ivan said:
When concatenating strings (actually, a constant and a string...) i get
the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

Now I don't think either string is unicode, but I'm working with
win32api so it might be... The point is: I know all values will fit
in a particular code page (iso-8859-2), so how do I change the 'ascii'
codec in the above error into something that will work?

You can either convert all strings to unicode or to iso-8859-2.
A hands on approach:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

This error is prevented by an explicit conversion:
'R\xfcbeR\xfcbe'

or
u'R\xfcbeR\xfcbe'

If you aren't sure which string is unicode and which is not:
...     if isinstance(s, unicode):
...         return u.encode("iso-8859-1")
...     return s
...
'R\xfcbeR\xfcbe'

Peter

Peter Otten · Feb 9, 2004

Peter said:
... if isinstance(s, unicode):
... return u.encode("iso-8859-1")
... return s
...
'R\xfcbeR\xfcbe'

Oops, that should be:
...     if isinstance(t, unicode):
...         return t.encode("iso-8859-1")
...     return t
...
'R\xfcbeR\xfcbe'
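
For reference, a self-contained sketch of the same idea in Python 3 syntax (an assumption: there the text type is str and the 8-bit type is bytes; to_latin1 is a hypothetical name):

```python
def to_latin1(value):
    """Return value as bytes, encoding text with iso-8859-1 if necessary."""
    if isinstance(value, str):
        return value.encode("iso-8859-1")
    return value

u = "R\u00fcbe"   # text
s = b"R\xfcbe"    # the same word, already encoded

result = to_latin1(u) + to_latin1(s)
print(result)
```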

Ivan Voras · Feb 9, 2004

Martin said:
Are you absolutely certain that type(s1) is str?

Yes. Plain string.

Are you absolutely certain the constant is the literal string 'x'?

Um, what else could it be? This is an example, in the real case the
literal string is something else (but of the same format).

How do you know it is the concatenation that causes the exception?

What else could cause it? It's a simple command, nothing fancy - a
concatenation and assignment.

I've tried converting everything to use unicode and I'm getting *really*
weird results now - it may be a bug in the win32api library.

Ivan Voras · Feb 9, 2004

Peter said:
You can either convert all strings to unicode or to iso-8859-2.
A hands on approach:

(u'R\xfcbe', 'R\xfcbe')

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 1:
ordinal not in range(128)

This error is prevented by an explicit conversion:

Thank you - I eventually found that out the hard way.

It was a mix of some bugs from my code and the win32api library code,
and I was seeing exceptions pop up from both of them depending on what
conditions were
met. Eventually I seem to have found a workaround for the library bugs
but I don't like it - it's a mixup of using unicode and code-page and
converting around when necessary. The good thing is that it doesn't seem
to influence performance a lot...

(Apparently, win32file.FindFilesW does something with its parameter that
breaks with the above error when the parameter is unicode.)

Thanks for the help, all!
