python3 - the hardest hello world ever?

Helmut Jarausch · Oct 14, 2008

Hi,

Am I missing something (I do hope so), or is switching to Python 3
really hard for Latin-1 users?

My simplest hello world script - which uses a few German
umlaut characters - doesn't look very intuitive.
I have to set an internal property (with leading underscore)
for each output file I'm using - right?

#!/usr/local/bin/python3.0
# _*_ coding: latin1 _*_

import sys

# the following call doesn't do the job
# sys.setfilesystemencoding('latin1')

# but this ugly one (to be done for each output file)
sys.stdout._encoding='latin1'

print("Hallo, Süßes Python")

Thanks for any enlightenment on that subject,
Helmut.

--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

pjacobi.de · Oct 14, 2008

Hi Helmut, All,

Am I missing something (I do hope so), or is switching to Python 3
really hard for Latin-1 users?

It's as complicated as ever -- if you have used unicode strings
in the past (as the 3.0 strings now are always unicode strings).

# sys.setfilesystemencoding('latin1')

This concerns the character encoding of filenames, not
of file content.

sys.setdefaultencoding('iso-8859-1') # or 'latin1'
would do the job, but only in sitecustomize.py. After
initializing, the function is no longer available.

And using it in sitecustomize.py is sort of discouraged.

IMHO the assumptions the typical Python installation makes
about the character encoding used in the system are much too
conservative. E.g. under Windows it should use
GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).

Then a lot of things would work out of the box. Of course
including some methods to shoot yourself in the foot, which
you are prevented from by the current behaviour.

Regards,
Peter

Martin v. Löwis · Oct 14, 2008

Am I missing something (I do hope so), or is switching to Python 3
really hard for Latin-1 users?

Why do you want to switch? sys.stdout.encoding should already be
iso-8859-1, if you are a Latin1-user.
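
For anyone following along, a quick way to see what Python has derived from the environment is something like the sketch below. The helper name `encoding_report` is only illustrative, and the values it prints depend entirely on the machine and locale it runs on.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the environment."""
    return {
        # Encoding used when print() writes str objects to stdout.
        # getattr guards against stdout replacements without the attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names, not for file content.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in encoding_report().items():
    print(name, "=", value)
```

If "stdout" shows an ASCII name here, printing umlauts is likely to fail in the way described later in this thread.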

Regards,
Martin

Brian Quinlan · Oct 14, 2008

Hey Helmut,

Did you try just:

print("Hallo, Süßes Python")

Cheers,
Brian

Helmut Jarausch · Oct 15, 2008

Martin said:
Why do you want to switch? sys.stdout.encoding should already be
iso-8859-1, if you are a Latin1-user.

What defines me as latin1-user?

Commenting out
# sys.stdout._encoding='latin1'

I get
Traceback (most recent call last):
File "latin1.py", line 8, in <module>

File "/usr/local/lib/python3.0/io.py", line 1485, in write
b = encoder.encode(s)
File "/usr/local/lib/python3.0/encodings/ascii.py", line 22, in encode
return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2:
ordinal not in range(128)

So my system seems to be an ASCII system?
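
The failure can be reproduced without involving the terminal at all, since it comes down to which codec is asked to encode the string. A minimal sketch, using the greeting from the original script (the variable names are only for illustration):

```python
text = "Hallo, Süßes Python"

# Latin-1 has code points for ü and ß, so encoding succeeds.
latin1_bytes = text.encode("latin-1")

# ASCII only covers code points 0-127, so the same string is rejected.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii codec refused:", exc.reason)

print(latin1_bytes)
```

print() does the same encoding step internally, using whatever sys.stdout.encoding says.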

Thanks,
Helmut

Helmut Jarausch · Oct 15, 2008

Ben said:
If you're referring to the source encoding declaration: No,
underscores have no effect. The specification is at

I'm not sure why you use underscores in this line. The usual form is
to use a mode line as recognised by Emacs::

# -*- coding: latin1 -*-

or Vim::

# vim: fileencoding=latin1 :

No, I meant the underscore in sys.stdout._encoding='latin1'
^

As for the source encoding, I have used the underscore version
which seems to work, as well.

Thanks,
Helmut.

Helmut Jarausch · Oct 15, 2008

Brian said:
Hey Helmut,

Did you try just:

print("Hallo, Süßes Python")

Yes, but that doesn't work here.
Please see my reply to Martin's reply.

Thanks,
Helmut.

Orestis Markou · Oct 15, 2008

I would just use UTF-8 and be done with it.

Set your editor to write UTF-8 files, put the correct coding declaration
in your Python script, make sure your terminal supports outputting UTF-8
characters (and your font has the correct glyphs) and everything
should be fine. No trickery required.

Even for Python 2.x, the only extra thing needed was the u"" kind of
strings. No other trickery in sys.stdout required. What platform do
you use?

Orestis

Paul Boddie · Oct 15, 2008

What defines me as latin1-user?

What does sys.stdout.encoding say? In Python 2.x, at least, that
attribute should reflect the capabilities of your environment
(specifically, the character encoding) and help determine whether it
makes sense for Python to try and encode Unicode objects (plain
strings in Python 3.x) using a particular output encoding when
printing those objects to the display.

Paul

Helmut Jarausch · Oct 15, 2008

Paul said:
What does sys.stdout.encoding say? In Python 2.x, at least, that

It says ansi_x3.4-1968

Where can I change this?

attribute should reflect the capabilities of your environment
(specifically, the character encoding) and help determine whether it
makes sense for Python to try and encode Unicode objects (plain
strings in Python 3.x) using a particular output encoding when
printing those objects to the display.

Thanks,
Helmut.

Paul Boddie · Oct 15, 2008

It says ansi_x3.4-1968

That's ASCII, yes.
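
This can be checked from Python itself: the codec registry treats that name as an alias. A small sketch using the standard codecs module:

```python
import codecs

# Look up the name reported by sys.stdout.encoding and show
# the canonical codec name Python maps it to.
canonical = codecs.lookup("ansi_x3.4-1968").name
print(canonical)

# "latin1" and "iso-8859-1" are likewise two names for one codec.
same_codec = codecs.lookup("latin1").name == codecs.lookup("iso-8859-1").name
print(same_codec)
```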

Where can I change this?

What's your locale? I can provoke the same setting if I run a Python
program like this:

LC_ALL=en_US.ascii python xxx.py

Are you running some kind of GNU/Linux distribution or something else?
If the former, have you installed various language/locale packages? If
you're not sure, which language or country did you select when
installing or configuring your system? This may seem like an odd line
of questioning, but UNIX-like systems have a history of treating
everything as bytes, which works acceptably until you have to take a
stand on what those bytes mean.

Another important question: what does Python 2.x do with the following
program...?

import sys
print sys.stdout.encoding
print u"\xe6\xf8\xe5"

You should get three Scandinavian characters if the encoding and
locales match. Otherwise, you'll either get a different output
(indicating a mismatch) or an error (indicating that the environment
cannot handle the characters output by the program). Sometimes you can
persuade a terminal to use a different character set, and this might
help, too.

Paul

Diez B. Roggisch · Oct 15, 2008

Helmut said:
It says ansi_x3.4-1968

Where can I change this?

By changing your console's terminal settings. See what

locale -a

outputs.

See this:

(devtools)dir@client8049:~$ locale -a
C
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
POSIX
(devtools)dir@client8049:~$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to rlcompleter2 0.96

Diez

Martin v. Löwis · Oct 15, 2008

What defines me as latin1-user?

That your locale is based on Latin-1, e.g. because it is a German
locale. How precisely that works depends on the operating system.

So my system seems to be an ASCII system?

At least that's what Python determined. If Python couldn't have found
out that you usually use Latin-1, your system is misconfigured. If
Python could have found out, but failed to do so, it's a bug in Python.

Regards,
Martin

Helmut Jarausch · Oct 16, 2008

Martin said:
That your locale is based on Latin-1, e.g. because it is a German
locale. How precisely that works depends on the operating system.

At least that's what Python determined. If Python couldn't have found
out that you usually use Latin-1, your system is misconfigured. If
Python could have found out, but failed to do so, it's a bug in Python.

Many thanks, it works when setting the LANG environment variable.
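
To see the effect of LANG without changing the current shell, one option is to start a child interpreter with a modified environment. This is only a sketch: the helper name is made up, the locale name `de_DE.ISO-8859-1` is an assumption that must match an entry from `locale -a` on the system, and newer Python releases may pick UTF-8 in some situations regardless of the locale.

```python
import os
import subprocess
import sys


def stdout_encoding_with_lang(lang):
    """Start a child interpreter with LANG set and report its sys.stdout.encoding."""
    env = dict(os.environ)
    # Drop settings that would take precedence over LANG.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONIOENCODING", "PYTHONUTF8"):
        env.pop(name, None)
    env["LANG"] = lang
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical locale name; if it is not installed, the child
# falls back to whatever the system default gives it.
print(stdout_encoding_with_lang("de_DE.ISO-8859-1"))
```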

Still, I wish it were possible to call sys.setdefaultencoding
at the very beginning of a script.

Why isn't that possible?

Helmut.

Martin v. Löwis · Oct 16, 2008

Still, I wish it were possible to call sys.setdefaultencoding
at the very beginning of a script.

Why isn't that possible?

The default encoding was used when combining byte-oriented
text and unicode-oriented text. Such combination is no longer
supported, hence the notion of a default encoding
has disappeared. You have to perform conversion between bytes
and strings now explicitly.
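
In practice, explicit conversion looks roughly like the sketch below. The sample bytes are made up for illustration; the point is that decode and encode always name the encoding, and that mixing the two types raises an error instead of guessing.

```python
raw = b"S\xfc\xdfes"            # bytes as a Latin-1 program would write them

text = raw.decode("latin-1")    # explicit bytes -> str
utf8 = text.encode("utf-8")     # explicit str -> bytes, different encoding

# Combining str and bytes no longer falls back to a default encoding.
try:
    combined = text + raw
except TypeError:
    combined = None

print(text, utf8, combined)
```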

Regards,
Martin

Helmut Jarausch · Oct 16, 2008

Martin said:
The default encoding was used when combining byte-oriented
text and unicode-oriented text. Such combination is no longer
supported, hence the notion of a default encoding
has disappeared. You have to perform conversion between bytes
and strings now explicitly.

I meant setting the default encoding that print (for example) uses when
writing the internal Unicode string to a file.
As far as I understand, I am currently limited to either setting
the locale or switching the setting for each output file (by setting
the _encoding property).
I wish I could override the locale settings from within a Python script.

Thanks,
Helmut.

Paul Boddie · Oct 16, 2008

I meant setting the default encoding that print (for example) uses when
writing the internal Unicode string to a file.
As far as I understand, I am currently limited to either setting
the locale or switching the setting for each output file (by setting
the _encoding property).
I wish I could override the locale settings from within a Python script.

You could use the locale module. ;-)
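
For reference, a minimal sketch of what the locale module offers here. It adopts the locale named by the environment and reports the encoding that goes with it. Note the assumption being illustrated: this changes the process locale only, and streams that are already open, such as sys.stdout, keep the encoding they were created with.

```python
import locale

# Adopt the locale named by LANG / LC_* in the environment.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale that is not installed.
    pass

# Encoding that goes with the locale now in effect.
preferred = locale.getpreferredencoding(False)
print(preferred)
```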

But seriously, I'd like to know whether the program I posted works
with Python 2.x because there could be differences between 2.x and
3.x, and we'd obviously like to solve your problems regardless of
which Python version you're using.

Paul

Helmut Jarausch · Oct 16, 2008

Paul said:
You could use the locale module. ;-)

But seriously, I'd like to know whether the program I posted works
with Python 2.x because there could be differences between 2.x and
3.x, and we'd obviously like to solve your problems regardless of
which Python version you're using.

Yes, of course.
I have always worked with Latin-1 strings with a US locale under
Python 2.x with x < 6 (I haven't tried 2.6, though). I hope to switch to 3.0
as soon as possible.

Martin v. Löwis · Oct 16, 2008

I meant setting the default encoding that print (for example) uses when
writing the internal Unicode string to a file.

Having such a thing would be conceptually wrong. What encoding should
be used depends on the file - different files may have different
encodings. When opening a file, you need to specify the encoding.

As far as I understand, I am currently limited to either setting
the locale or switching the setting for each output file (by setting
the _encoding property).

That's not true. You can also specify the encoding when opening the file.
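
A minimal sketch of that, using a temporary file so it can be run anywhere (the file name `hallo.txt` is only an example):

```python
import os
import tempfile

text = "Hallo, Süßes Python\n"
path = os.path.join(tempfile.mkdtemp(), "hallo.txt")

# The encoding is stated when the file is opened, independent of the locale.
# newline="\n" keeps the line ending the same on every platform.
with open(path, "w", encoding="latin-1", newline="\n") as f:
    f.write(text)

# Reading the file back in binary mode shows the Latin-1 bytes on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```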

I wished I could override the locale settings within a Python script.

You can monkey-patch locale.getpreferredencoding, which is used when
determining what encoding to use when opening new files. I don't
recommend doing so, though.
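
An alternative that needs neither the private _encoding attribute nor monkey-patching is to wrap a binary stream in a text wrapper with an explicit encoding. This is not something proposed elsewhere in this thread, just a sketch under that assumption. The helper name `latin1_writer` is made up, and an in-memory buffer stands in for sys.stdout.buffer so the result can be inspected.

```python
import io


def latin1_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as Latin-1."""
    return io.TextIOWrapper(
        binary_stream,
        encoding="latin-1",
        newline="\n",          # do not translate line endings
        line_buffering=True,
    )


buffer = io.BytesIO()          # stand-in for sys.stdout.buffer
out = latin1_writer(buffer)

print("Hallo, Süßes Python", file=out)
out.flush()

written = buffer.getvalue()
print(written)
```

For real output, the same wrapper could be applied once at the start of a script, along the lines of sys.stdout = latin1_writer(sys.stdout.buffer).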

Regards,
Martin
