decode(..., errors='ignore') has no effect

Jens Müller

Hi,

I am trying to decode a string, e.g.
u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.decode('cp1252', 'ignore')
but even though I use errors='ignore', I get
UnicodeEncodeError: 'charmap' codec can't encode character u'\u02c8' in position 21: character maps to <undefined>

How come?

Thanks,
Jens
 
Ulrich Eckhardt

Wrong way? Don't you want to encode the Unicode string using codepage 1252
instead?

Uli
 
Peter Otten

To convert unicode into str you have to *encode()* it.

u"...".decode(...) will first implicitly encode the string with the default encoding (normally ASCII), i.e. it is
equivalent to

u"...".encode("ascii").decode(...)

Hence the error message

....codec can't encode character u'\u02c8'...
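
As a rough illustration of the two directions, here is a minimal sketch. It assumes Python 3, where the thread's unicode type is called str and the thread's str corresponds to bytes; the variable names are only for illustration.

```python
# Assumption: Python 3 (text is str, encoded data is bytes).
text = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'

# Text -> bytes is *encoding*. With 'ignore', every character that
# cp1252 cannot represent is silently dropped.
encoded = text.encode('cp1252', 'ignore')
print(encoded)

# Bytes -> text is *decoding*, which is what decode() is meant for.
decoded = encoded.decode('cp1252')
print(decoded)
```

The u with umlaut and the c with cedilla survive because cp1252 contains them; the three IPA characters do not.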

Peter
 
Jens Müller

Ah - yes of course.

And how can I use the system's default encoding with errors='ignore'?
By default encoding I mean the one that is used if no parameters are given to
"encode".

Thanks again!
 
Lie Ryan

>>> import sys
>>> u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'.encode(sys.getdefaultencoding(), 'ignore')
'Mnchen, pronounced [mnn]'


Unless this is for debugging, I doubt that ignoring errors is an acceptable
solution in this particular case (how do you pronounce [mnn]?)
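
If dropping characters is too lossy, the other standard error handlers may be worth a look. The following is only a sketch, again assuming Python 3; the handler names are the ones str.encode() accepts, while the variable names are made up for the example.

```python
# Assumption: Python 3. Compare the standard error handlers on the
# IPA part of the example string, encoding to plain ASCII.
ipa = u'[\u02c8m\u028fn\xe7\u0259n]'

handlers = ('ignore', 'replace', 'backslashreplace', 'xmlcharrefreplace')
results = {name: ipa.encode('ascii', name) for name in handlers}

for name in handlers:
    print(name, results[name])
```

Only 'ignore' hides the fact that something was lost; 'backslashreplace' and 'xmlcharrefreplace' keep enough information to recover the original characters later.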
 
Peter Otten

Also, I think on most systems sys.getdefaultencoding() returns
'ascii'

You might try
'UTF-8'

instead.
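
A small sketch of why UTF-8 avoids the problem, assuming Python 3 (variable names are illustrative only): every character of the example string can be encoded, so no error handler is needed and nothing is lost.

```python
# Assumption: Python 3.
text = u'M\xfcnchen, pronounced [\u02c8m\u028fn\xe7\u0259n]'

# UTF-8 can represent every Unicode character, so strict encoding succeeds.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)

# Decoding with the same codec gives back the original text unchanged.
round_tripped = utf8_bytes.decode('utf-8')
print(round_tripped == text)
```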

Peter
 
