unicode wrap unicode object?

ygao · Apr 8, 2006

import sys

how do I get ss from s?
Is there a way to do this?
thanks!

Fredrik Lundh · Apr 8, 2006

hmm. what kind of bootleg python is that?
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding. don't
do that; it'll only cause problems in the long run).
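A minimal sketch of the alternative, assuming Python 3 (the thread predates it): there the `sys` module has no `setdefaultencoding` at all, the default encoding is fixed at UTF-8, and text is converted explicitly where bytes meet text with `encode`/`decode` instead of through a global setting. The variable names are illustrative.

```python
import sys

# Python 3: the default encoding is fixed and cannot be changed at runtime.
print(sys.getdefaultencoding())             # utf-8
print(hasattr(sys, "setdefaultencoding"))   # False

# Instead of a global switch, convert explicitly at the boundary.
text = "\u9ad8"                 # one Chinese character (U+9AD8)
data = text.encode("utf-8")     # bytes for a file, socket, or database
assert data.decode("utf-8") == text
```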

how do I get ss from s?
Is there a way to do this?

you have UTF-8 *bytes* in a Unicode text string? sounds like
someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform that simply
converts each unicode character in the range 0-255 to the byte with the same value:
'CJK UNIFIED IDEOGRAPH-9AD8'
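The null-transform trick can be sketched in Python 3, where `str` is Unicode text and `bytes` holds raw bytes. Assuming `s` holds the three UTF-8 bytes of U+9AD8 smuggled into a text string as code points, encoding with `latin-1` (Python's alias for iso-8859-1) recovers the original bytes one for one, and decoding those as UTF-8 gives the intended character.

```python
import unicodedata

# A text string that wrongly contains UTF-8 *bytes* as code points 0-255.
s = "\xe9\xab\x98"

# latin-1 maps each code point 0-255 to the byte with the same value...
raw = s.encode("latin-1")
assert raw == b"\xe9\xab\x98"

# ...so decoding those bytes as UTF-8 yields the intended character.
ss = raw.decode("utf-8")
print(len(s), len(ss))          # 3 1
print(unicodedata.name(ss))     # CJK UNIFIED IDEOGRAPH-9AD8
```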

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions).
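One common way such data arises, sketched in Python 3 under the assumption that the input is a UTF-8 byte stream: reading it through a text layer configured for `iso-8859-1` silently yields one character per byte, while declaring the real encoding yields the Chinese text directly. `io.BytesIO` stands in here for a real file or socket.

```python
import io

payload = "\u9ad8".encode("utf-8")   # UTF-8 bytes as they arrive from outside

# Bogus conversion: the decoder is told the bytes are ISO-8859-1.
wrong = io.TextIOWrapper(io.BytesIO(payload), encoding="iso-8859-1").read()

# Fix at the source: decode with the encoding the data actually uses.
right = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8").read()

print(ascii(wrong))   # '\xe9\xab\x98'  (three wrong characters)
print(ascii(right))   # '\u9ad8'        (one Chinese character)
```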

</F>

ygao · Apr 8, 2006

sorry, my poor english.
I got a solution from others.
I must use utf-8 for chinese.

ygao · Apr 8, 2006

sorry, my poor english.
I got a solution from others.
I must use utf-8 for chinese.

True

Fredrik Lundh · Apr 8, 2006

"ygao" wrote:

I must use utf-8 for chinese.

yeah, but you shouldn't store it in a *Unicode* string. Unicode strings
are designed to hold things that you've already decoded (that is, your
chinese text), not the raw UTF-8 bytes.
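In Python 3 this distinction is enforced by the type system (the thread itself uses Python 2's `unicode` and 8-bit `str` types): raw UTF-8 lives in `bytes`, decoded text lives in `str`, and the two do not mix implicitly. A short sketch, with illustrative variable names:

```python
raw = b"\xe9\xab\x98"        # raw UTF-8 bytes: a bytes object
text = raw.decode("utf-8")   # decoded Chinese text: a str object

print(type(raw).__name__, type(text).__name__)   # bytes str
print(text == "\u9ad8")                           # True

# Mixing the two is an error rather than a silent guess at an encoding.
try:
    text + raw
except TypeError as exc:
    print("TypeError:", exc)
```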

if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:

b = "... some utf-8 data ..."

# turn it into a unicode string
u = unicode(b, "utf-8")

# ... do something with it ...

# turn it back into a utf-8 string
s = u.encode("utf-8")

# or use some other encoding
s = u.encode("big5")

e.g.
'\xb0\xaa'
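For readers on Python 3, where the `unicode` type became `str` and 8-bit strings became `bytes`, the same recipe might look like the sketch below. The sample character U+9AD8 is an assumption standing in for "... some utf-8 data ...", and `str(b, "utf-8")` is the Python 3 counterpart of `unicode(b, "utf-8")`.

```python
b = "\u9ad8".encode("utf-8")   # sample UTF-8 data (assumed for illustration)

# turn it into a text (Unicode) string
u = str(b, "utf-8")            # equivalent to b.decode("utf-8")

# ... do something with it ...

# turn it back into UTF-8 bytes
s = u.encode("utf-8")

# or use some other encoding
s_big5 = u.encode("big5")
print(ascii(s_big5))
```

Decoding always needs the encoding the bytes were written in; `s_big5.decode("big5")` gives back the same one-character string.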

</F>

ygao · Apr 8, 2006

thanks for your advice.

Martin v. Löwis · Apr 8, 2006

ygao said:
I must use utf-8 for chinese.

Sure. But please don't do that:

As Fredrik says, you should really avoid changing the
default encoding.

True

Ok. But how about that:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which uses 3 bytes in UTF-8.
In your example, ss has three characters, which are not Chinese,
but European.
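Martin's comparison can be sketched in Python 3, where `s.decode` becomes `bytes.decode` and `u'...'` literals are plain `str`: the three bytes decode to one CJK character under UTF-8, whereas reading the same three values as code points gives a three-character string of Latin-1 characters like the one Martin describes. `unicodedata.name` is given a default because U+0098 is a control character without a name.

```python
import unicodedata

s = b"\xe9\xab\x98"      # three bytes
ss = "\u9ad8"            # one character

s1 = s.decode("utf-8")
print(s1 == ss, len(s1), len(s))   # True 1 3

# The mistaken three-character string: same values, read as code points.
wrong = s.decode("latin-1")
for ch in wrong:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
```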

Regards,
Martin
