Can't print Chinese to HTTP

Discussion in 'Python' started by Gnarlodious, Nov 30, 2009.

  1. Gnarlodious

    Gnarlodious Guest

    Hello.
    The upgrade to Python 3.1 has been a disaster so far. I can't figure out how to print Chinese to a browser. If my script is:

    #!/usr/bin/python
    print("Content-type:text/html\n\n")
    print('晉')

    the Chinese string simply does not print. It works in the interactive Terminal with no problem, and also works in Python 2.6 (which my server is still running) in 4 different browsers. What am I doing wrong? BTW, I searched Google for 2 days and found no solution. If this doesn't get solved soon I will have to roll back to 2.6.

    Thanks for any clue.

    -- Gnarlie
    http://Gnarlodious.com
    Gnarlodious, Nov 30, 2009
    #1

  2. Gnarlodious wrote:
    > Hello. The upgrade to Python 3.1 has been a disaster so far. I can't
    > figure out how to print Chinese to a browser. If my script is:
    >
    > #!/usr/bin/python
    > print("Content-type:text/html\n\n")
    > print('晉')
    >
    > the Chinese string simply does not print. It works in interactive
    > Terminal no problem, and also works in Python 2.6 (which my server is
    > still running) in 4 different browsers. What am I doing wrong? BTW
    > searched Google for 2 days no solution, if this doesn't get solved
    > soon I will have to roll back to 2.6.
    >
    > Thanks for any clue.


    In the CGI case, Python cannot figure out what encoding to use for
    output, so it raises an exception. This exception should show up in
    the error log of your web server, please check.
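
    The failure can be reproduced without a web server. This is a minimal sketch, assuming the server gives the script an ASCII-only stdout; that condition is simulated here with an in-memory stream, and `fake_stdout` is a name made up for the illustration:

```python
import io

# Simulate a stdout whose text layer can only encode ASCII, which is
# roughly what a CGI process sees when no usable locale is configured.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    print("晉", file=fake_stdout)
    fake_stdout.flush()
    error = None
except UnicodeEncodeError as exc:
    # Under a real server, this traceback would land in the error log.
    error = exc

print(type(error).__name__)
```

    If the same error appears in the server log, the encoding of stdout is the likely culprit rather than the browser.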

    One way of working around this problem is to encode the output
    explicitly:

    #!/usr/bin/python
    print("Content-type:text/plain;charset=utf-8\n\n")
    sys.stdout.buffer.write('晉\n'.encode("utf-8"))

    FWIW, the Content-type in your example is wrong in two ways:
    what you produce is not HTML, and the charset parameter is
    missing.

    Regards,
    Martin
    Martin v. Löwis, Nov 30, 2009
    #2

  3. Gnarlodious

    Gnarlodious Guest

    Thanks for the help, but it doesn't work. All I get is an error like:

    UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
    position 0: ordinal not in range(128)

    It does work in Terminal interactively, after I import the sys module.
    But my script doesn't act the same. Here is my entire script:

    #!/usr/bin/python
    print("Content-type:text/plain;charset=utf-8\n\n")
    import sys
    sys.stdout.buffer.write('晉\n'.encode("utf-8"))

    All I get is the despised "Internal Server Error" with Console
    reporting:

    malformed header from script. Bad header=\xe6\x99\x89

    Strangely, if I run the script in Terminal it acts as expected.

    This is OSX 10.6.2, Python 3.1.1.
    And it is frustrating because my entire website is hung up on this one
    line I have been working on for 5 days.

    -- Gnarlie
    http://Gnarlodious.com
    Gnarlodious, Nov 30, 2009
    #3
  4. Gnarlodious

    Aahz Guest

    In article <>,
    Gnarlodious <> wrote:
    >
    >Thanks for the help, but it doesn't work. All I get is an error like:
    >
    >UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
    >position 0: ordinal not in range(128)


    No time to give you more info, but you probably need to change the
    encoding of sys.stdout.
    --
    Aahz () <*> http://www.pythoncraft.com/

    The best way to get information on Usenet is not to ask a question, but
    to post the wrong information.
    Aahz, Nov 30, 2009
    #4
  5. Gnarlodious

    Lie Ryan Guest

    On 12/1/2009 4:05 AM, Gnarlodious wrote:
    > Thanks for the help, but it doesn't work. All I get is an error like:
    >
    > UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
    > position 0: ordinal not in range(128)


    The error says it all; you're trying to encode the chinese character
    using 'ascii' codec.

    > malformed header from script. Bad header=\xe6\x99\x89


    Hmmm... strange. The \xe6\x99\x89 happens to coincide with UTF-8
    representation of 晉. Why is your content becoming a header?

    > #!/usr/bin/python

    do you know what python version, exactly, that gets called by this
    hashbang? You mentioned that you're using python 3, but I'm not sure
    that this hashbang will invoke python3 (unless Mac OSX has made a
    progress above other linux distros and made python 3 the default python).

    > Strangely, if I run the script in Terminal it acts as expected.


    I think I see it now. You're invoking python3 in the terminal, but your
    server invokes python 2. Python 2 uses byte-based string literals, while
    python 3 uses unicode-based string literals. When you try
    '晉\n'.encode("utf-8"), python 2 first tries to decode the byte string
    using the 'ascii' decoder, causing the exception.
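
    To make the str/bytes distinction concrete, here is a small Python 3 sketch (the variable names are only illustrative). It also shows that the three bytes in the Apache error message are exactly the UTF-8 encoding of the character:

```python
text = "晉"                  # str: a sequence of Unicode code points
data = text.encode("utf-8")  # bytes: what actually travels to the browser

print(len(text), len(data))  # one character, three bytes
print(data)

# Decoding with the same codec round-trips back to the original string.
assert data.decode("utf-8") == text
```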
    Lie Ryan, Nov 30, 2009
    #5
  6. Gnarlodious

    Ned Deily Guest

    In article
    <>,
    Gnarlodious <> wrote:

    > It does work in Terminal interactively, after I import the sys module.
    > But my script doesn't act the same. Here is my entire script:
    >
    > #!/usr/bin/python
    > print("Content-type:text/plain;charset=utf-8\n\n")
    > import sys
    > sys.stdout.buffer.write('晉\n'.encode("utf-8"))
    >
    > All I get is the despised "Internal Server Error" with Console
    > reporting:
    >
    > malformed header from script. Bad header=\xe6\x99\x89
    >
    > Strangely, if I run the script in Terminal it acts as expected.
    >
    > This is OSX 10.6.2, Python 3.1.1.


    Are you sure you are actually using Python 3? /usr/bin/python is the
    path to the Apple-supplied python 2.6.1. If you installed Python 3.1.1
    using the python.org OS X installer, the path should be
    /usr/local/bin/python3

    --
    Ned Deily,
    Ned Deily, Nov 30, 2009
    #6
  7. Gnarlodious

    Guest

    On 05:05 pm, wrote:
    >Thanks for the help, but it doesn't work. All I get is an error like:
    >
    >UnicodeEncodeError: 'ascii' codec can't encode character '\\u0107' in
    >position 0: ordinal not in range(128)
    >
    >It does work in Terminal interactively, after I import the sys module.
    >But my script doesn't act the same. Here is my entire script:
    >
    >#!/usr/bin/python
    >print("Content-type:text/plain;charset=utf-8\n\n")
    >import sys
    >sys.stdout.buffer.write('晉\n'.encode("utf-8"))
    >
    >All I get is the despised "Internal Server Error" with Console
    >reporting:
    >
    >malformed header from script. Bad header=\xe6\x99\x89


    As the error suggests, you're writing 晉 to the headers section of the
    response. This is because you're not ending the headers section with a
    blank line. Lines in HTTP end with \r\n, not with just \n.
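
    A minimal sketch of that layout, assuming a hypothetical helper named `build_cgi_response` (not part of any library): the header line and the empty line that ends the header section are both terminated with \r\n, and the body follows as UTF-8 bytes.

```python
def build_cgi_response(body, content_type="text/plain"):
    """Return a complete CGI response as bytes (illustrative helper)."""
    # One header line, then an empty line to close the header section.
    header = "Content-Type: {0}; charset=utf-8\r\n\r\n".format(content_type)
    return header.encode("ascii") + body.encode("utf-8")

response = build_cgi_response("晉\n")

# Split at the first blank line to see what the server treats as headers.
head, _, payload = response.partition(b"\r\n\r\n")
print(head)
print(payload)
```

    Building the whole response as one bytes object and writing it in a single call to sys.stdout.buffer.write() also avoids any question of which part reaches the server first.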

    Have you considered using something with fewer sharp corners than CGI?
    You might find it more productive.

    Jean-Paul
    , Nov 30, 2009
    #7
  8. Gnarlodious

    Gnarlodious Guest

    > you probably need to change the encoding of sys.stdout
    >>> sys.stdout.encoding

    'UTF-8'

    >> #!/usr/bin/python


    > do you know what python version, exactly, that gets called by this
    > hashbang?
    Verified in HTTP:
    >>> print(sys.version)

    3.1.1
    Is it possible modules are getting loaded from my old Python?

    I symlinked to the new Python, and no I do not want to roll it back
    because it is work (meaning I would have to type "sudo").
    ls /usr/bin/python
    lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
    Ugh, I have not been able to program in 11 days.

    Now I remember doing it that way because I could not figure out how to
    get Apache to find the new Python.

    ls /usr/local/bin/python3.1
    lrwxr-xr-x 1 root wheel 71 Nov 20 08:19 /usr/local/bin/python3.1 -> ../../../Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1

    So they are both pointing to the same Python.


    And yes, I would prefer easier http scripting, but don't know one.

    -- Gnarlie
    Gnarlodious, Dec 1, 2009
    #8
  9. Gnarlodious

    Gnarlodious Guest

    On Nov 30, 5:53 am, "Martin v. Löwis" wrote:

    > #!/usr/bin/python
    > print("Content-type:text/plain;charset=utf-8\n\n")
    > sys.stdout.buffer.write('晉\n'.encode("utf-8"))


    Does this work for anyone? Because all I get is a blank page. Nothing.
    If I can establish what SHOULD work, maybe I can diagnose this
    problem.

    -- Gnarlie
    Gnarlodious, Dec 1, 2009
    #9
  10. Gnarlodious

    Lie Ryan Guest

    On 12/2/2009 12:27 AM, Gnarlodious wrote:
    > On Nov 30, 5:53 am, "Martin v. Löwis" wrote:
    >
    >> #!/usr/bin/python
    >> print("Content-type:text/plain;charset=utf-8\n\n")
    >> sys.stdout.buffer.write('晉\n'.encode("utf-8"))

    >
    > Does this work for anyone? Because all I get is a blank page. Nothing.
    > If I can establish what SHOULD work, maybe I can diagnose this
    > problem.
    >


    with a minor fix (import sys) that runs without errors in Python 3.1
    (Vista), but the result is a bit disturbing...

    --------------------------
    晉
    Content-type:text/plain;charset=utf-8
    <BLANKLINE>
    <BLANKLINE>
    --------------------------

    (is this a bug? or just undefined behavior?)



    the following works correctly in python 3.1:

    ---------------------------
    #!/usr/bin/python
    import sys
    print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))
    print("Content-type:text/plain;charset=utf-8\n\n")
    print('晉\n')
    ----------------------------

    (and that code will definitely fail with python2 because of the print
    assignment, an insurance if your server happens to be misconfigured to
    run python2)
    Lie Ryan, Dec 1, 2009
    #10
  11. Gnarlodious

    Gnarlodious Guest

    On Dec 1, 8:36 am, Lie Ryan wrote:

    > #!/usr/bin/python
    > import sys
    > print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))
    > print("Content-type:text/plain;charset=utf-8\n\n")
    > print('晉\n')


    HA! IT WORKS! Thank you thank you thank you. I don't understand the
    lambda functionality but will figure it out. BTW this is OSX 10.6 and
    Python 3.1.1.

    Again, thank you for the help.

    -- Gnarlie
    Gnarlodious, Dec 1, 2009
    #11
  12. Gnarlodious

    Ned Deily Guest

    In article
    <>,
    Gnarlodious <> wrote:

    > I symlinked to the new Python, and no I do not want to roll it back
    > because it is work (meaning I would have to type "sudo").
    > ls /usr/bin/python
    > lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
    > Ugh, I have not been able to program in 11 days.


    You should *not* do this. The files in /usr/bin are installed and
    controlled by Apple and, in particular, /usr/bin/python is the Apple
    supplied python. By changing /usr/bin/python, you are risking incorrect
    operation of other system programs that may depend on it plus it is
    quite likely that an OS X software update will overwrite this location
    breaking your applications. Use /usr/local/bin/python3.1 instead.

    --
    Ned Deily,
    Ned Deily, Dec 1, 2009
    #12
  13. Gnarlodious

    Terry Reedy Guest

    Gnarlodious wrote:
    > On Dec 1, 8:36 am, Lie Ryan wrote:
    >
    >> #!/usr/bin/python
    >> import sys
    >> print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))


    This is almost exactly the same as

    def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

    except that the latter gives better error tracebacks.
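
    A small sketch of the difference, using an in-memory stream as a stand-in for sys.stdout.buffer (the names `write_lambda` and `write_def` are invented for the comparison). The function objects behave the same; only the name that would show up in a traceback differs:

```python
import io

buffer = io.BytesIO()  # stand-in for sys.stdout.buffer

write_lambda = lambda s: buffer.write(s.encode("utf-8"))

def write_def(s):
    return buffer.write(s.encode("utf-8"))

# Both return the number of bytes written.
count_lambda = write_lambda("晉\n")
count_def = write_def("晉\n")

# Only the def version carries a meaningful name.
print(write_lambda.__name__)
print(write_def.__name__)
```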

    >> print("Content-type:text/plain;charset=utf-8\n\n")
    >> print('晉\n')

    >
    > HA! IT WORKS! Thank you thank you thank you. I don't understand the
    > lambda functionality but will figure it out.


    Nothing to do with lambda, really. See above.

    tjr
    Terry Reedy, Dec 1, 2009
    #13
  14. On Tue, 1 Dec 2009 05:27:22 -0800 (PST), Gnarlodious
    <> declaimed the following in
    gmane.comp.python.general:

    > On Nov 30, 5:53 am, "Martin v. Löwis" wrote:
    >
    > > #!/usr/bin/python
    > > print("Content-type:text/plain;charset=utf-8\n\n")
    > > sys.stdout.buffer.write('晉\n'.encode("utf-8"))

    >
    > Does this work for anyone? Because all I get is a blank page. Nothing.
    > If I can establish what SHOULD work, maybe I can diagnose this
    > problem.
    >

    Have you tried

    sys.stdout.write("Content-type:text/plain;charset=utf-8\r\n\r\n")
    etc.

    Most internet protocols use the <cr><lf> sequence for line
    terminator; might be safer to specify the full sequence (or run on a
    Windows box where the I/O system may translate \n into \r\n for you <G>)
    --
    Wulfraed Dennis Lee Bieber KD6MOG
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Dec 3, 2009
    #14
  15. Gnarlodious

    Gnarlodious Guest

    On Dec 2, 11:58 pm, Dennis Lee Bieber wrote:

    >         Have you tried
    >
    >         sys.stdout.write("Content-type:text/plain;charset=utf-8\r\n\r\n")


    Yes I tried that when it was suggested, to no avail. All I get is
    "Internal server error". All I can imagine is that there is no
    "sys.stdout.write" in my Python. No idea why.

    -- Gnarlie K5ZN
    Gnarlodious, Dec 5, 2009
    #15
  16. Gnarlodious

    Gnarlodious Guest

    On Dec 1, 3:06 pm, Terry Reedy wrote:
    > def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))


    Here is a better solution that lets me send any string to the
    function:

    def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

    Why this changed in Python 3 I do not know, nor why it was nowhere to
    be found on the internet.

    Can anyone explain it?

    Anyway, I hope others with this problem can find this solution.

    -- Gnarlie
    Gnarlodious, Dec 5, 2009
    #16
  17. Gnarlodious

    Lie Ryan Guest

    On 12/5/2009 2:57 PM, Gnarlodious wrote:
    > On Dec 1, 3:06 pm, Terry Reedy wrote:
    >> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

    >
    > Here is a better solution that lets me send any string to the
    > function:
    >
    > def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))


    No, that's wrong. You're serving HTML with Content-type:text/plain, it
    should've been text/html or application/xhtml+xml (though technically
    correct some older browsers have problems with the latter).

    > Why this changed in Python 3 I do not know, nor why it was nowhere to
    > be found on the internet.
    >
    > Can anyone explain it?


    Python 3's str() is what was Python 2's unicode().
    Python 2's str() turned into Python 3's bytes().

    Python 3's print() now takes a unicode string, which is the regular string.

    Because of the switch to unicode str, a simple print('晉') should've
    worked flawlessly if your terminal can accept the character, but the
    problem is your terminal does not.

    The correct fix is to fix your terminal's encoding.

    In Windows, due to the prompt's poor support for Unicode, the only real
    solution is to switch to a better terminal.

    Another workaround is to use a real file:

    import sys
    f = open('afile.html', 'w', encoding='utf-8')
    print("晉", file=f)
    sys.stdout = f
    print("晉")

    or slightly better is to rewrap the buffer with io.TextIOWrapper:
    import sys, io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    print("晉")
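
    A sketch of the rewrapping approach applied to a whole response, with an in-memory `raw` stream standing in for sys.stdout.buffer so it can be run anywhere. The point it illustrates is an assumption about usage, not something the io docs mandate: create the wrapper first, then send headers and body through that same wrapper.

```python
import io

raw = io.BytesIO()  # stand-in for sys.stdout.buffer
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Headers and body all go through the same text layer, in order.
print("Content-Type: text/plain; charset=utf-8", file=out)
print(file=out)  # empty line ends the header section
print("晉", file=out)
out.flush()

result = raw.getvalue()
print(result)
```

    Because nothing bypasses the wrapper, the bytes come out in the order the print() calls were made.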
    Lie Ryan, Dec 5, 2009
    #17
  18. * Lie Ryan:
    > On 12/5/2009 2:57 PM, Gnarlodious wrote:
    >> On Dec 1, 3:06 pm, Terry Reedy wrote:
    >>> def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

    >>
    >> Here is a better solution that lets me send any string to the
    >> function:
    >>
    >> def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

    >
    > No, that's wrong. You're serving HTML with Content-type:text/plain, it
    > should've been text/html or application/xhtml+xml (though technically
    > correct some older browsers have problems with the latter).
    >
    >> Why this changed in Python 3 I do not know, nor why it was nowhere to
    >> be found on the internet.
    >>
    >> Can anyone explain it?

    >
    > Python 3's str() is what was Python 2's unicode().
    > Python 2's str() turned into Python 3's bytes().
    >
    > Python 3's print() now takes a unicode string, which is the regular string.
    >
    > Because of the switch to unicode str, a simple print('晉') should've
    > worked flawlessly if your terminal can accept the character, but the
    > problem is your terminal does not.
    >
    > The correct fix is to fix your terminal's encoding.
    >
    > In Windows, due to the prompt's poor support for Unicode, the only real
    > solution is to switch to a better terminal.


    A bit off-topic perhaps, but that last is a misconception. Windows' [cmd.exe]
    does have poor support for UTF-8, in short it Does Not Work in Windows XP, and
    probably does not work in Vista or Windows7 either. However, Windows console
    windows have full support for the Basic Multilingual Plane of Unicode: they're
    pure Unicode beasts.

    Thus, the problem is an interaction between two systems that Do Not Work: the
    [cmd.exe] program's practically non-existing support for UTF-8 (codepage 65001),
    and the very unfortunate confusion of stream i/o and interactive i/o in *nix,
    which has ended up as a "feature" (it's more like a design bug) in a lot of
    programming languages stemming from *nix origins, and that includes Python.

    Windows' "terminal", its console window support, is INNOCENT... :)

    In Windows, as opposed to *nix, interactive character i/o is separated at the
    API level. There is integration with stream i/o, but the interactive i/o can be
    accessed separately. This is the "console function" API.

    So for interactive console i/o one solution could be some Python module for
    interactive console i/o, on Windows internally using the Windows console
    function API, which is fully Unicode (based on UCS-2, i.e. the BMP).

    Cheers,

    - Alf


    > Another workaround is to use a real file:
    >
    > import sys
    > f = open('afile.html', 'w', encoding='utf-8')
    > print("晉", file=f)
    > sys.stdout = f
    > print("晉")
    >
    > or slightly better is to rewrap the buffer with io.TextIOWrapper:
    > import sys, io
    > sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    > print("晉")
    Alf P. Steinbach, Dec 5, 2009
    #18
  19. Gnarlodious

    Gnarlodious Guest

    On Dec 5, 3:54 am, Lie Ryan wrote:

    > Because of the switch to unicode str, a simple print('晉') should've
    > worked flawlessly if your terminal can accept the character, but the
    > problem is your terminal does not.


    There is nothing wrong with Terminal, Mac OSX supports Unicode from
    one end to the other.
    The problem is that your code works normally in Terminal but not in a
    browser.

    #!/usr/bin/python
    import sys, io
    print("Content-type:text/plain;charset=utf-8\n\n")
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    print("晉")

    The browser shows "Server error", Apache 2 reports error:

    [error] [client 127.0.0.1] malformed header from script. Bad header=
    \xe6\x99\x89: test.py

    So far every way to print Unicode to a browser looks very un-Pythonic.
    I am just wondering if I have a bug or am missing the right way
    entirely.

    -- Gnarlie
    Gnarlodious, Dec 6, 2009
    #19
  20. Gnarlodious

    Lie Ryan Guest

    On 12/6/2009 12:56 PM, Gnarlodious wrote:
    > On Dec 5, 3:54 am, Lie Ryan wrote:
    >
    >> Because of the switch to unicode str, a simple print('晉') should've
    >> worked flawlessly if your terminal can accept the character, but the
    >> problem is your terminal does not.

    >
    > There is nothing wrong with Terminal, Mac OSX supports Unicode from
    > one end to the other.
    > The problem is that your code works normally in Terminal but not in a
    > browser.
    >
    > #!/usr/bin/python
    > import sys, io
    > print("Content-type:text/plain;charset=utf-8\n\n")
    > sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    > print("晉")
    >
    > The browser shows "Server error", Apache 2 reports error:
    >
    > [error] [client 127.0.0.1] malformed header from script. Bad header=
    > \xe6\x99\x89: test.py


    As I posted before, for some reason it is not possible to mix writing
    with print() and writing to sys.stdout.buffer. On my machine, the output
    got mixed up:

    --------------------------
    晉
    Content-type:text/plain;charset=utf-8
    <BLANKLINE>
    <BLANKLINE>
    --------------------------

    notice that the chinese character is on top of the header. I guess this
    is due to the buffering from print.
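
    The reordering can be imitated with in-memory streams. This is only a sketch under the assumption that the text layer holds small writes until it is flushed, which is how CPython's default io.TextIOWrapper behaves; `emit`, `HEADER` and `BODY` are names invented here, and the real sys.stdout has an additional buffered layer underneath.

```python
import io

HEADER = "Content-type:text/plain;charset=utf-8\n\n"
BODY = "晉\n".encode("utf-8")

def emit(flush_first):
    """Write HEADER via the text layer and BODY via the byte layer."""
    raw = io.BytesIO()  # stand-in for the underlying byte stream
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    text.write(HEADER)      # held in the text layer for now
    if flush_first:
        text.flush()        # push the header out before bypassing the layer
    raw.write(BODY)         # goes straight to the byte stream
    text.flush()
    return raw.getvalue()

print(emit(flush_first=False))  # body lands ahead of the header
print(emit(flush_first=True))   # header first, as intended
```

    In a real script the equivalent step would be calling sys.stdout.flush() after printing the headers and before writing to sys.stdout.buffer.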

    > So far every way to print Unicode to a browser looks very un-Pythonic.
    > I am just wondering if I have a bug or am missing the right way
    > entirely.


    My *guess* is Apache does not request a utf-8 stdout. When run on the
    Terminal, the Terminal requested utf-8 stdout from python and the script
    runs correctly. I'm not too familiar with Apache's internals, nor with how
    python 3 figures out its stdout's encoding; you might want to ask on
    Apache's mailing list whether they have seen a similar case.

    PS: You might also want to look at this:
    http://stackoverflow.com/questions/984014/python-3-is-using-sys-stdout-buffer-write-good-style

    it says to try setting your PYTHONIOENCODING environment variable to "utf8"
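
    A quick way to see the effect of that variable, sketched with a child interpreter so the current process is left alone. `child_code` and `reported` are illustrative names; under Apache the variable would have to be set in the server's environment instead (for example with mod_env's SetEnv directive, which is an assumption about the setup in this thread).

```python
import os
import subprocess
import sys

# Copy the current environment and force the I/O encoding for the child.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child simply reports which encoding its stdout ended up with.
child_code = "import sys; sys.stdout.write(sys.stdout.encoding)"
reported = subprocess.check_output([sys.executable, "-c", child_code], env=env)

print(reported.decode("ascii"))
```

    Running the same one-liner as a CGI script, with and without the variable, would show which encoding Python picks under the web server.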
    Lie Ryan, Dec 6, 2009
    #20