WTF? Printing unicode strings

John Salerno

Fredrik said:
so stdout on your machine is ascii, and you don't understand why you
cannot print a non-ascii unicode character to it? wtf?

</F>

AFAIK, I'm all ASCII (at least, I never made explicit changes to the
default Python install), so how am I able to print out the character?
 
Robert Kern

John said:
AFAIK, I'm all ASCII (at least, I never made explicit changes to the
default Python install), so how am I able to print out the character?

Because sys.stdout.encoding isn't determined by your Python configuration, but
your terminal's.
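
One way to see this on your own machine is to ask the interpreter what it picked up. This is only a sketch in Python 3 syntax (the thread itself is on Python 2.4), `describe_stdout` is a made-up helper name, and the values will differ from one terminal to the next:

```python
import locale
import sys


def describe_stdout():
    """Collect the encoding-related settings that affect printing."""
    return {
        # Encoding Python chose for stdout, taken from the terminal/locale.
        # On Python 2 this is None when output is piped or redirected.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the locale settings (LANG, LC_ALL, LC_CTYPE).
        "locale_encoding": locale.getpreferredencoding(False),
        # Python's own internal default, separate from the terminal's.
        "default_encoding": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_stdout().items():
        print(key, "=", value)
```

If `stdout_encoding` and `default_encoding` disagree, that is the situation John is describing: an untouched Python install printing to a terminal that advertises something richer than ASCII.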

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
skip

Robert> Because sys.stdout.encoding isn't determined by your Python
Robert> configuration, but your terminal's.

Learn something every day. I take it "646" is an alias for "ascii" (or vice
versa)?

% python
Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
[GCC 3.4.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
(<built-in function ascii_encode>, <built-in function ascii_decode>, <class encodings.ascii.StreamReader at 0x819aa4c>, <class encodings.ascii.StreamWriter at 0x819aa1c>)

Skip
 
John Salerno

Robert> Because sys.stdout.encoding isn't determined by your Python
Robert> configuration, but your terminal's.

Learn something every day. I take it "646" is an alias for "ascii" (or vice
versa)?


Hmm, not that this helps me any :)
(<bound method Codec.encode of <encodings.cp1252.Codec instance at
0x009D6670>>, <bound method Codec.decode of <encodings.cp1252.Codec
instance at 0x009D6698>>, <class encodings.cp1252.StreamReader at
 
skip

John> Hmm, not that this helps me any :)
John> 'cp1252'

Sure it does. You can print Unicode objects which map to cp1252. I assume
that means you're on Windows or that for some perverse reason you have your
Mac's Terminal window set to cp1252. (Does it go there? I'm at work right
now so I can't check).
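
To make "map to cp1252" concrete, here is a small sketch (Python 3 syntax, and `can_encode` is a hypothetical helper, not anything from the standard library) that checks whether a string survives encoding to a given codec:

```python
def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    half = "\u00bd"  # VULGAR FRACTION ONE HALF, i.e. unichr(0xbd)
    for name in ("ascii", "cp1252", "utf-8"):
        print(name, can_encode(half, name))
```

The one-half character has a slot in cp1252 but not in ASCII, which is why it prints on a cp1252 console and fails on an ASCII one.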

Skip
 
John Salerno

John> Hmm, not that this helps me any :)

John> 'cp1252'

Sure it does. You can print Unicode objects which map to cp1252. I assume
that means you're on Windows or that for some perverse reason you have your
Mac's Terminal window set to cp1252. (Does it go there? I'm at work right
now so I can't check).

Skip

You're right, I'm on XP. I just couldn't make sense of the lookup call,
although some of the names looked like .NET classes.
 
Robert Kern

Robert> Because sys.stdout.encoding isn't determined by your Python
Robert> configuration, but your terminal's.

Learn something every day. I take it "646" is an alias for "ascii" (or vice
versa)?

% python
Python 2.4.2 (#1, Feb 23 2006, 12:48:31)
[GCC 3.4.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
(<built-in function ascii_encode>, <built-in function ascii_decode>, <class encodings.ascii.StreamReader at 0x819aa4c>, <class encodings.ascii.StreamWriter at 0x819aa1c>)

Yes. In encodings/aliases.py in the standard library:

"""
aliases = {

# Please keep this list sorted alphabetically by value !

# ascii codec
'646' : 'ascii',

"""

--
Robert Kern
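
The alias entry quoted from encodings/aliases.py can be checked from the interpreter. A short sketch in Python 3 syntax; note that on Python 2.4, as in the session above, `codecs.lookup` returns a plain 4-tuple, so the `.name` attribute used here is an assumption that only holds on later versions:

```python
import codecs
from encodings.aliases import aliases

# The alias table maps normalized alias names to codec module names.
print(aliases["646"])

# codecs.lookup() resolves the alias and reports the canonical codec name.
print(codecs.lookup("646").name)
```

Both lines should name the ascii codec, which answers the "alias or vice versa" question: "646" is the alias, "ascii" is the codec.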
 
Ron Garret

Serge Orlov said:
That's recent enough. I guess the distribution you're using set LC_*
variables for no good reason.

Nope:

ron@www01:~$ export | grep LC
ron@www01:~$

Serge Orlov said:
Either unset all environment variables
starting with LC_ and set the LANG variable, or override the LC_CTYPE variable:

LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"

Should be working now :)

Nope:

ron@www01:~$ LC_CTYPE=en_US.utf-8 python -c "print unichr(0xbd)"
Traceback (most recent call last):
File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in
position 0: ordinal not in range(128)
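
For anyone stuck with an ASCII stdout in the meantime, one possible workaround is to encode by hand and fall back to escapes instead of letting print raise. This is only a sketch in Python 3 syntax, and `encode_or_escape` is a hypothetical helper name:

```python
def encode_or_escape(text, encoding="ascii"):
    """Encode text, escaping characters the target codec cannot represent."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Same failure as in the traceback above; substitute \xNN style
        # escapes for the offending characters instead of raising.
        return text.encode(encoding, errors="backslashreplace")


if __name__ == "__main__":
    print(encode_or_escape("\u00bd"))           # ASCII target, escaped
    print(encode_or_escape("\u00bd", "utf-8"))  # UTF-8 target, encoded
```

This does not fix the locale, it only keeps the program from dying while the real cause is tracked down.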

rg
 
Ron Garret

Serge Orlov said:
I've pulled myself together and installed Linux in VMware Player.
Apparently there is another way Linux distributors can screw up. I
chose a Debian 3.1 minimal network install and, after answering all the
installation questions, I found that only ASCII and Latin-1 English
locales were installed:
$ locale -a
C
en_US
en_US.iso88591
POSIX

In 2006, I would expect a UTF-8 English locale to be present even in a
minimal install. I had to edit /etc/locale.gen and run locale-gen as
root. After that Python started to print Unicode characters.
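
The same check that `locale -a` gives can be approximated from Python by trying to switch to the locale and seeing whether the C library accepts it. A rough sketch, assuming Python 3 syntax; `locale_available` is a hypothetical helper, and the locale name spelling (for example `en_US.UTF-8`) varies between systems:

```python
import locale


def locale_available(name):
    """Return True if the C library accepts name for LC_CTYPE."""
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # The locale has not been generated or installed on this system.
        return False
    else:
        return True
    finally:
        # Put the original setting back so the probe has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "en_US.UTF-8"):
        print(candidate, locale_available(candidate))
```

If the UTF-8 locale is reported as unavailable, setting LC_CTYPE to it will not help until the locale has been generated.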

That's it. Thanks!

rg
 
Martin v. Löwis

Learn something every day. I take it "646" is an alias for "ascii" (or vice
versa)?

Usage of "646" as an alias for ASCII is primarily a Sun invention. When
ASCII became an international standard, its standard number became
ISO/IEC 646:1968. It's not *quite* the same as ASCII, as it leaves a
certain number of code points unassigned that ASCII defines (most
notably, the dollar sign, and the square and curly braces). What Sun
means is probably the "International Reference Version" of ISO 646,
which is (now) identical to ASCII.

Regards,
Martin
 
