python3 - the hardest hello world ever?

Discussion in 'Python' started by Helmut Jarausch, Oct 14, 2008.

  1. Hi,

    Am I missing something (I do hope so), or is switching to Python 3
    really hard for Latin-1 users?

    My simplest hello world script - which uses a few German
    umlaut characters - doesn't look very intuitive.
    I have to set an internal property (with leading underscore)
    for each output file I'm using - right?

    #!/usr/local/bin/python3.0
    # _*_ coding: latin1 _*_

    import sys

    # the following call doesn't do the job
    # sys.setfilesystemencoding('latin1')

    # but this ugly one (to be done for each output file)
    sys.stdout._encoding='latin1'

    print("Hallo, Süßes Python")


    Thanks for any enlightenment on that subject,
    Helmut.

    --
    Helmut Jarausch

    Lehrstuhl fuer Numerische Mathematik
    RWTH - Aachen University
    D 52056 Aachen, Germany
    Helmut Jarausch, Oct 14, 2008
    #1

  2. Hi Helmut, All,


    > do I miss something (I do hope so) or is switching to Python3
    > really hard for Latin1-users?


    It's as complicated as ever -- if you have used unicode strings
    in the past (as the 3.0 strings now are always unicode strings).

    > # sys.setfilesystemencoding('latin1')

    This concerns the character encoding of filenames, not
    of file content.

    sys.setdefaultencoding('iso-8859-1') # or 'latin1'
    would do the job, but only in sitecustomize.py. After
    initializing, the function is no longer available.

    And using it in sitecustomize.py is sort of discouraged.

    IMHO the assumptions the typical Python installation makes
    about the character encoding used in the system are much too
    conservative. E.g. under Windows it should use
    GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).

    Then a lot of things would work out of the box. Of course
    including some methods to shoot yourself in the foot, which
    you are prevented from by the current behaviour.


    Regards,
    Peter
    Peter, Oct 14, 2008
    #2

  3. > do I miss something (I do hope so) or is switching to Python3
    > really hard for Latin1-users?


    Why do you want to switch? sys.stdout.encoding should already be
    iso-8859-1, if you are a Latin1-user.

    Regards,
    Martin
    Martin v. Löwis, Oct 14, 2008
    #3
  4. Hey Helmut,

    Did you try just:

    print("Hallo, Süßes Python")

    Cheers,
    Brian

    Helmut Jarausch wrote:
    > Hi,
    >
    > do I miss something (I do hope so) or is switching to Python3
    > really hard for Latin1-users?
    >
    > My simplest hello world script - which uses a few German
    > umlaut characters - doesn't look very intuitive.
    > I have to set an internal property (with leading underscore)
    > for each output file I'm using - right?
    >
    > #!/usr/local/bin/python3.0
    > # _*_ coding: latin1 _*_
    >
    > import sys
    >
    > # the following call doesn't do the job
    > # sys.setfilesystemencoding('latin1')
    >
    > # but this ugly one (to be done for each output file)
    > sys.stdout._encoding='latin1'
    >
    > print("Hallo, Süßes Python")
    >
    >
    > Thanks for any enlightening on that subject,
    > Helmut.
    >
    Brian Quinlan, Oct 14, 2008
    #4
  5. Martin v. Löwis wrote:
    >> do I miss something (I do hope so) or is switching to Python3
    >> really hard for Latin1-users?

    >
    > Why do you want to switch? sys.stdout.encoding should already be
    > iso-8859-1, if you are a Latin1-user.
    >


    What defines me as latin1-user?

    When I comment out the line
    # sys.stdout._encoding='latin1'

    I get
    Traceback (most recent call last):
      File "latin1.py", line 8, in <module>
      File "/usr/local/lib/python3.0/io.py", line 1485, in write
        b = encoder.encode(s)
      File "/usr/local/lib/python3.0/encodings/ascii.py", line 22, in encode
        return codecs.ascii_encode(input, self.errors)[0]
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

    So my system seems to be an ASCII system?


    Thanks,
    Helmut

    Helmut Jarausch, Oct 15, 2008
    #5
  6. Ben Finney wrote:
    > Helmut Jarausch <> writes:
    >
    >> I have to set an internal property (with leading underscore)
    >> for each output file I'm using - right?

    >
    > If you're referring to the source encoding declaration: No,
    > underscores have no effect. The specification is at
    > <URL:http://www.python.org/doc/2.5.2/ref/encodings.html>.
    >
    >> #!/usr/local/bin/python3.0
    >> # _*_ coding: latin1 _*_

    >
    > I'm not sure why you use underscores in this line. The usual form is
    > to use a mode line as recognised by Emacs::
    >
    > # -*- coding: latin1 -*-
    >
    > or Vim::
    >
    > # vim: fileencoding=latin1 :
    >


    No, I meant the underscore in sys.stdout._encoding='latin1'
                                             ^

    As for the source encoding, I have used the underscore version
    which seems to work, as well.

    Thanks,
    Helmut.


    Helmut Jarausch, Oct 15, 2008
    #6
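The underscore variant of the source encoding comment works because the interpreter only looks for a `coding:` or `coding=` pattern in a comment on the first or second line and ignores the decoration around it. A minimal sketch using the regular expression given in PEP 263; the helper name `declared_encoding` is made up for illustration and is not part of any library:

```python
import re

# Pattern from PEP 263: a comment line containing "coding:" or "coding=".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding named in a source comment line, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


# The Emacs form, the underscore form from the thread, and the Vim form
# all name the same encoding.
print(declared_encoding("# -*- coding: latin1 -*-"))
print(declared_encoding("# _*_ coding: latin1 _*_"))
print(declared_encoding("# vim: fileencoding=latin1 :"))
```

Only the part after `coding` is significant, which is why both spellings behave the same.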
  7. Brian Quinlan wrote:
    > Hey Helmut,
    >
    > Did you try just:
    >
    > print("Hallo, Süßes Python")
    >


    Yes, but that doesn't work here.
    Please see my reply to Martin's reply.

    Thanks,
    Helmut.



    Helmut Jarausch, Oct 15, 2008
    #7
  9. I would just use UTF-8 and be done with it.

    Set your editor to write UTF-8 files, set the correct #coding at your
    python script, make sure your terminal supports outputting UTF-8
    characters (and your font has the correct glyphs) and everything
    should be fine. No trickery required.

    Even for Python 2.x, the only extra thing needed was the u"" kind of
    strings. No other trickery in sys.stdout required. What platform do
    you use?

    Orestis
    --

    http://orestis.gr/




    On 15 Oct 2008, at 11:12, Helmut Jarausch wrote:

    > Brian Quinlan wrote:
    >> Hey Helmut,
    >> Did you try just:
    >> print("Hallo, Süßes Python")

    >
    > Yes, but that doesn't work here.
    > Please see my reply to Martin's reply.
    >
    > Thanks,
    > Helmut.
    Orestis Markou, Oct 15, 2008
    #9
  10. On 15 Oct, 12:08, Helmut Jarausch <-aachen.de>
    wrote:
    >
    > What defines me as latin1-user?


    What does sys.stdout.encoding say? In Python 2.x, at least, that
    attribute should reflect the capabilities of your environment
    (specifically, the character encoding) and help determine whether it
    makes sense for Python to try and encode Unicode objects (plain
    strings in Python 3.x) using a particular output encoding when
    printing those objects to the display.

    Paul
    Paul Boddie, Oct 15, 2008
    #10
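To answer Paul's question on a given machine, the relevant values can be printed in a few lines. This is a small sketch for Python 3; the function name `describe_encodings` is an illustrative choice, not something from the thread:

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derived from the environment."""
    return {
        # Encoding used when print() writes text to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names, not for file content.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    print(name, "=", value)
```

If the first entry reports an ASCII encoding, printing umlauts will fail as described in the traceback earlier in the thread.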
  11. Paul Boddie wrote:
    > On 15 Oct, 12:08, Helmut Jarausch <-aachen.de>
    > wrote:
    >> What defines me as latin1-user?

    >
    > What does sys.stdout.encoding say? In Python 2.x, at least, that


    It says ansi_x3.4-1968

    Where can I change this?
    > attribute should reflect the capabilities of your environment
    > (specifically, the character encoding) and help determine whether it
    > makes sense for Python to try and encode Unicode objects (plain
    > strings in Python 3.x) using a particular output encoding when
    > printing those objects to the display.
    >


    Thanks,
    Helmut.


    Helmut Jarausch, Oct 15, 2008
    #11
  12. On 15 Oct, 17:59, Helmut Jarausch <> wrote:
    > Paul Boddie wrote:
    > > What does sys.stdout.encoding say? In Python 2.x, at least, that

    >
    > It says  ansi_x3.4-1968


    That's ASCII, yes.

    > Where can I change this?


    What's your locale? I can provoke the same setting if I run a Python
    program like this:

    LC_ALL=en_US.ascii python xxx.py

    Are you running some kind of GNU/Linux distribution or something else?
    If the former, have you installed various language/locale packages? If
    you're not sure, which language or country did you select when
    installing or configuring your system? This may seem like an odd line
    of questioning, but UNIX-like systems have a history of treating
    everything as bytes, which works acceptably until you have to take a
    stand on what those bytes mean.

    Another important question: what does Python 2.x do with the following
    program...?

    import sys
    print sys.stdout.encoding
    print u"\xe6\xf8\xe5"

    You should get three Scandinavian characters if the encoding and
    locales match. Otherwise, you'll either get a different output
    (indicating a mismatch) or an error (indicating that the environment
    cannot handle the characters output by the program). Sometimes you can
    persuade a terminal to use a different character set, and this might
    help, too.

    Paul
    Paul Boddie, Oct 15, 2008
    #12
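The name `ansi_x3.4-1968` is simply another label for ASCII, which can be checked from Python itself. The sketch below (Python 3) also reproduces the failure from the traceback by encoding the greeting explicitly, without touching any locale settings:

```python
import codecs

# "ansi_x3.4-1968" is registered as an alias of the ASCII codec.
print(codecs.lookup("ansi_x3.4-1968").name)

text = "Hallo, S\u00fc\u00dfes Python"

# ASCII has no code for the umlaut or the sharp s, so encoding fails.
try:
    text.encode("ansi_x3.4-1968")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# Latin-1 covers both characters, one byte each.
print(text.encode("latin-1"))
```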
  13. Helmut Jarausch wrote:

    > Paul Boddie wrote:
    >> On 15 Oct, 12:08, Helmut Jarausch <-aachen.de>
    >> wrote:
    >>> What defines me as latin1-user?

    >>
    >> What does sys.stdout.encoding say? In Python 2.x, at least, that

    >
    > It says ansi_x3.4-1968
    >
    > Where can I change this?


    By changing your console's terminal settings. See what

    locale -a

    outputs.

    See this:


    (devtools)dir@client8049:~$ locale -a
    C
    en_AU.utf8
    en_BW.utf8
    en_CA.utf8
    en_DK.utf8
    en_GB.utf8
    en_HK.utf8
    en_IE.utf8
    en_IN
    en_NZ.utf8
    en_PH.utf8
    en_SG.utf8
    en_US.utf8
    en_ZA.utf8
    en_ZW.utf8
    POSIX
    (devtools)dir@client8049:~$ python
    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Welcome to rlcompleter2 0.96
    for nice experiences hit <tab> multiple times
    >>> import sys
    >>> sys.stdout.encoding

    'UTF-8'
    >>>



    Diez
    Diez B. Roggisch, Oct 15, 2008
    #13
  14. > What defines me as latin1-user?

    That your locale is based on Latin-1, e.g. because it is a German
    locale. How precisely that works depends on the operating system.

    > So my system seems to be an ASCII system?


    At least that's what Python determined. If Python couldn't have found
    out that you usually use Latin-1, your system is misconfigured. If
    Python could have found out, but failed to do so, it's a bug in Python.

    Regards,
    Martin
    Martin v. Löwis, Oct 15, 2008
    #14
  15. Martin v. Löwis wrote:
    >> What defines me as latin1-user?

    >
    > That your locale is based on Latin-1, e.g. because it is a German
    > locale. How precisely that works depends on the operating system.
    >
    >> So my system seems to be an ASCII system?

    >
    > At least that's what Python determined. If Python couldn't have found
    > out that you usually use Latin-1, your system is misconfigured. If
    > Python could have found out, but failed to do so, it's a bug in Python.
    >


    Many thanks, it works when setting the LANG environment variable.

    Still, I wish it were possible to call sys.setdefaultencoding
    at the very beginning of a script.

    Why isn't that possible?

    Helmut.


    Helmut Jarausch, Oct 16, 2008
    #15
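Whether a given LANG value works depends on which locales are installed, so it is hard to demonstrate portably. As a related sketch, the `PYTHONIOENCODING` environment variable (not mentioned in the thread, so treat its use here as an assumption about what would suit this setup) overrides the encoding of the standard streams for a single interpreter run. The helper name `run_with_io_encoding` is made up for illustration:

```python
import os
import subprocess
import sys

# Child program: report the stdout encoding, then print text with umlauts.
CHILD = "import sys; print(sys.stdout.encoding); print('S\\xfc\\xdfes')"


def run_with_io_encoding(encoding):
    """Run CHILD in a fresh interpreter with PYTHONIOENCODING set.

    Returns the raw bytes the child wrote to its standard output.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


raw = run_with_io_encoding("latin-1")
print(raw)
```

The environment is changed only for the child process, so the calling script and the rest of the system keep their settings.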
  16. > Still, I wish it were possible to call sys.setdefaultencoding
    > at the very beginning of a script.
    >
    > Why isn't that possible?


    The default encoding was used when combining byte-oriented
    text and unicode-oriented text. Such combination is no longer
    supported, hence the notion of a default encoding
    has disappeared. You have to perform conversion between bytes
    and strings now explicitly.

    Regards,
    Martin
    Martin v. Löwis, Oct 16, 2008
    #16
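What explicit conversion looks like in Python 3 can be shown in a few lines. This is only an illustration of the point made above; the variable names are arbitrary:

```python
text = "S\u00fc\u00dfes"          # str: a sequence of Unicode code points
data = text.encode("latin-1")     # bytes: b'S\xfc\xdfes'

# Mixing the two types implicitly is rejected instead of guessing an encoding.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Decode explicitly, naming the encoding, before combining.
combined = text + data.decode("latin-1")
print(combined)
```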
  17. Martin v. Löwis wrote:
    >> Still, I wish it were possible to call sys.setdefaultencoding
    >> at the very beginning of a script.
    >>
    >> Why isn't that possible?

    >
    > The default encoding was used when combining byte-oriented
    > text and unicode-oriented text. Such combination is no longer
    > supported, hence the notion of a default encoding
    > has disappeared. You have to perform conversion between bytes
    > and strings now explicitly.
    >


    I meant setting the default encoding that print (for example) uses when
    writing the internal unicode string to a file.
    As far as I understand, I am currently limited to either setting
    the locale or switching the encoding for each output file (by setting
    the _encoding property).
    I wish I could override the locale settings within a Python script.

    Thanks,
    Helmut.


    Helmut Jarausch, Oct 16, 2008
    #17
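One way to pick the output encoding from inside a script without touching the private `_encoding` attribute is to wrap the underlying binary stream in a new text layer. A minimal sketch, assuming Python 3 and using the public `io.TextIOWrapper` class; the helper name `latin1_writer` is hypothetical, and an in-memory buffer stands in for the console so the result can be inspected:

```python
import io


def latin1_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as Latin-1."""
    return io.TextIOWrapper(binary_stream, encoding="latin-1", line_buffering=True)


# Demonstrated on an in-memory buffer. For the real console one would pass
# sys.stdout.buffer instead (or, on Python 3.7 and later, call
# sys.stdout.reconfigure(encoding="latin-1")).
buffer = io.BytesIO()
out = latin1_writer(buffer)
print("Hallo, S\u00fc\u00dfes Python", file=out)
out.flush()
print(buffer.getvalue())
```

This still has to be done per stream, but it relies only on documented interfaces.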
  18. On 16 Oct, 11:28, Helmut Jarausch <-aachen.de>
    wrote:
    >
    > I meant setting the default encoding which is used by print (e.g.) when
    > outputting the internal unicode string to a file.
    > As far as I understood, currently I am fixed to setting either
    > the 'locale' or to switch settings for each output file (by setting
    > the _encoding property).
    > I wished I could override the locale settings within a Python script.


    You could use the locale module. ;-)

    But seriously, I'd like to know whether the program I posted works
    with Python 2.x because there could be differences between 2.x and
    3.x, and we'd obviously like to solve your problems regardless of
    which Python version you're using.

    Paul
    Paul Boddie, Oct 16, 2008
    #18
  19. Paul Boddie wrote:
    > On 16 Oct, 11:28, Helmut Jarausch <-aachen.de>
    > wrote:
    >> I meant setting the default encoding which is used by print (e.g.) when
    >> outputting the internal unicode string to a file.
    >> As far as I understood, currently I am fixed to setting either
    >> the 'locale' or to switch settings for each output file (by setting
    >> the _encoding property).
    >> I wished I could override the locale settings within a Python script.

    >
    > You could use the locale module. ;-)
    >
    > But seriously, I'd like to know whether the program I posted works
    > with Python 2.x because there could be differences between 2.x and
    > 3.x, and we'd obviously like to solve your problems regardless of
    > which Python version you're using.
    >


    Yes, of course.
    I have always worked with Latin-1 strings with a US locale under
    Python 2.x with x < 6 (I haven't tried 2.6, though). I hope to switch to 3.0
    as soon as possible.


    Helmut Jarausch, Oct 16, 2008
    #19
  20. > I meant setting the default encoding which is used by print (e.g.) when
    > outputting the internal unicode string to a file.


    Having such a thing would be conceptually wrong. What encoding should
    be used depends on the file - different files may have different
    encodings. When opening a file, you need to specify the encoding.

    > As far as I understood, currently I am fixed to setting either
    > the 'locale' or to switch settings for each output file (by setting
    > the _encoding property).


    That's not true. You can also specify the encoding when opening the file.

    > I wished I could override the locale settings within a Python script.


    You can monkey-patch locale.getpreferredencoding, which is used when
    determining what encoding to use when opening new files. I don't
    recommend doing so, though.

    Regards,
    Martin
    Martin v. Löwis, Oct 16, 2008
    #20
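Specifying the encoding when opening a file, as suggested above, might look like the following sketch for Python 3. The file name `hallo.txt` and the temporary directory are placeholders chosen for the example:

```python
import os
import tempfile

text = "Hallo, S\u00fc\u00dfes Python\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hallo.txt")   # hypothetical file name

    # The encoding is a property of this file, independent of the locale.
    with open(path, "w", encoding="latin-1", newline="\n") as handle:
        handle.write(text)

    # Read the file back as raw bytes to see what was actually stored.
    with open(path, "rb") as handle:
        raw = handle.read()

print(raw)
```

Each file gets its own encoding this way, which is the reason given above for not having a process-wide default.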
