Can't print Chinese to HTTP

Gnarlodious

Hello.
The "upgrade" to Python 3.1 has been a disaster so far. I can't figure out how to print Chinese to a browser. If my script is:

#!/usr/bin/python
print("Content-type:text/html\n\n")
print('晉')

the Chinese string simply does not print. It works in interactive Terminal no problem, and also works in Python 2.6 (which my server is still running) in 4 different browsers. What am I doing wrong? BTW searched Google for 2 days no solution, if this doesn't get solved soon I will have to roll back to 2.6.

Thanks for any clue.

-- Gnarlie
http://Gnarlodious.com
 
Martin v. Löwis

Gnarlodious said:
Hello. The "upgrade" to Python 3.1 has been a disaster so far. I can't
figure out how to print Chinese to a browser. If my script is:

#!/usr/bin/python
print("Content-type:text/html\n\n")
print('晉')

the Chinese string simply does not print. It works in interactive
Terminal no problem, and also works in Python 2.6 (which my server is
still running) in 4 different browsers. What am I doing wrong? BTW
searched Google for 2 days no solution, if this doesn't get solved
soon I will have to roll back to 2.6.

Thanks for any clue.

In the CGI case, Python cannot figure out what encoding to use for
output, so it raises an exception. This exception should show up in
the error log of your web server, please check.
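
A minimal sketch for checking this, assuming only the standard library. `can_encode` is a hypothetical helper name, not something from the posts. It reports which encoding Python picked for stdout and whether a given string survives that encoding. Under a web server the environment may carry no locale settings, in which case older Python 3 releases can fall back to ASCII.

```python
import sys

def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

if __name__ == "__main__":
    # Written to stderr so that, under CGI, the report lands in the web
    # server's error log instead of in the HTTP response.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print("stdout encoding:", encoding, file=sys.stderr)
    print("can encode U+6649:", can_encode("\u6649", encoding), file=sys.stderr)
```

Running the same file from a terminal and through the web server should make any difference between the two environments visible.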

One way of working around this problem is to encode the output
explicitly:

#!/usr/bin/python
import sys
print("Content-type:text/plain;charset=utf-8\n\n")
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

FWIW, the Content-type in your example is wrong in two ways:
what you produce is not HTML, and the charset parameter is
missing.

Regards,
Martin
 
Gnarlodious

Thanks for the help, but it doesn't work. All I get is an error like:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in
position 0: ordinal not in range(128)

It does work in Terminal interactively, after I import the sys module.
But my script doesn't act the same. Here is my entire script:

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")
import sys
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

All I get is the despised "Internal Server Error" with Console
reporting:

malformed header from script. Bad header=\xe6\x99\x89

Strangely, if I run the script in Terminal it acts as expected.

This is OSX 10.6.2, Python 3.1.1.
And it is frustrating because my entire website is hung up on this one
line I have been working on for 5 days.

-- Gnarlie
http://Gnarlodious.com
 
Aahz

Thanks for the help, but it doesn't work. All I get is an error like:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in
position 0: ordinal not in range(128)

No time to give you more info, but you probably need to change the
encoding of sys.stdout.
 
Lie Ryan

Thanks for the help, but it doesn't work. All I get is an error like:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in
position 0: ordinal not in range(128)

The error says it all; you're trying to encode the Chinese character
using the 'ascii' codec.

malformed header from script. Bad header=\xe6\x99\x89

Hmmm... strange. The \xe6\x99\x89 happens to coincide with the UTF-8
representation of 晉. Why is your content becoming a header?

#!/usr/bin/python

Do you know exactly which Python version gets called by this
hashbang? You mentioned that you're using Python 3, but I'm not sure
that this hashbang will invoke python3 (unless Mac OS X has progressed
beyond the Linux distros and made Python 3 the default python).

Strangely, if I run the script in Terminal it acts as expected.

I think I see it now. You're invoking python3 in the terminal, but your
server invokes Python 2. Python 2 uses byte-based string literals, while
Python 3 uses unicode-based string literals. When you try
'晉\n'.encode("utf-8"), Python 2 first tries to decode the byte string
using the 'ascii' decoder, causing the exception.
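
The byte sequence from the Apache log can be checked directly. This is a small Python 3 sketch with illustrative variable names. It shows only the Python 3 side: the UTF-8 bytes of the character, and the error raised when the same string is pushed through the 'ascii' codec.

```python
text = "\u6649"  # the character 晉

utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xe6\x99\x89'

# Encoding the same string as ASCII fails, as in the reported traceback.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(type(exc).__name__, "-", exc.reason)
```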
 
Ned Deily

Gnarlodious said:
It does work in Terminal interactively, after I import the sys module.
But my script doesn't act the same. Here is my entire script:

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")
import sys
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

All I get is the despised "Internal Server Error" with Console
reporting:

malformed header from script. Bad header=\xe6\x99\x89

Strangely, if I run the script in Terminal it acts as expected.

This is OSX 10.6.2, Python 3.1.1.

Are you sure you are actually using Python 3? /usr/bin/python is the
path to the Apple-supplied python 2.6.1. If you installed Python 3.1.1
using the python.org OS X installer, the path should be
/usr/local/bin/python3
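
One way to settle the question is to have the script itself report which interpreter the server runs. A minimal sketch follows, assuming the python.org install path mentioned above; `interpreter_report` is a hypothetical helper name. The output is ASCII-only, so it should work whatever encoding stdout ends up with.

```python
#!/usr/local/bin/python3
import sys

def interpreter_report():
    """Return a plain-text description of the running interpreter."""
    return "executable: {}\nversion: {}\n".format(sys.executable, sys.version)

if __name__ == "__main__":
    # Header block, then an empty line, then the body.
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(interpreter_report())
```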
 
exarkun

Thanks for the help, but it doesn't work. All I get is an error like:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in
position 0: ordinal not in range(128)

It does work in Terminal interactively, after I import the sys module.
But my script doesn't act the same. Here is my entire script:

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")
import sys
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

All I get is the despised "Internal Server Error" with Console
reporting:

malformed header from script. Bad header=\xe6\x99\x89

As the error suggests, you're writing 晉 to the headers section of the
response. This is because you're not ending the headers section with a
blank line. Lines in HTTP end with \r\n, not with just \n.
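
A sketch of what a correctly delimited response looks like as bytes. `build_cgi_response` is a hypothetical helper, not part of any library. The whole response is assembled as one byte string, so headers and body cannot arrive out of order.

```python
def build_cgi_response(body, content_type="text/plain"):
    """Return header block plus body as UTF-8 bytes (hypothetical helper)."""
    header = "Content-Type: {}; charset=utf-8\r\n".format(content_type)
    # The empty line (a bare CRLF) separates the headers from the body.
    return (header + "\r\n" + body).encode("utf-8")

response = build_cgi_response("\u6649\n")
print(response)

# In a CGI script under Python 3 the bytes would then be written with:
#     sys.stdout.buffer.write(response)
```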

Have you considered using something with fewer sharp corners than CGI?
You might find it more productive.

Jean-Paul
 
Gnarlodious

you probably need to change the encoding of sys.stdout
'UTF-8'


do you know what python version, exactly, that gets called by this
hashbang?
Verified in HTTP:3.1.1
Is it possible modules are getting loaded from my old Python?

I symlinked to the new Python, and no I do not want to roll it back
because it is work (meaning I would have to type "sudo").
ls /usr/bin/python
lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
Ugh, I have not been able to program in 11 days.

Now I remember doing it that way because I could not figure out how to
get Apache to find the new Python.

ls /usr/local/bin/python3.1
lrwxr-xr-x 1 root wheel 71 Nov 20 08:19 /usr/local/bin/python3.1 -> ../../../Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1

So they are both pointing to the same Python.


And yes, I would prefer easier http scripting, but don't know one.

-- Gnarlie
 
Gnarlodious

#!/usr/bin/python
print("Content-type:text/plain;charset=utf-8\n\n")
sys.stdout.buffer.write('晉\n'.encode("utf-8"))

Does this work for anyone? Because all I get is a blank page. Nothing.
If I can establish what SHOULD work, maybe I can diagnose this
problem.

-- Gnarlie
 
Lie Ryan

Does this work for anyone? Because all I get is a blank page. Nothing.
If I can establish what SHOULD work, maybe I can diagnose this
problem.

With a minor fix (import sys), that runs without errors in Python 3.1
(Vista), but the result is a bit disturbing...

--------------------------
晉
Content-type:text/plain;charset=utf-8
<BLANKLINE>
<BLANKLINE>
--------------------------

(is this a bug? or just undefined behavior?)



The following works correctly in Python 3.1:

---------------------------
#!/usr/bin/python
import sys
print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))
print("Content-type:text/plain;charset=utf-8\n\n")
print('晉\n')
---------------------------
 
Gnarlodious

#!/usr/bin/python
import sys
print = lambda s: sys.stdout.buffer.write(s.encode('utf-8'))
print("Content-type:text/plain;charset=utf-8\n\n")
print('晉\n')

HA! IT WORKS! Thank you thank you thank you. I don't understand the
lambda functionality but will figure it out. BTW this is OSX 10.6 and
Python 3.1.1.

Again, thank you for the help.

-- Gnarlie
 
Ned Deily

Gnarlodious said:
I symlinked to the new Python, and no I do not want to roll it back
because it is work (meaning I would have to type "sudo").
ls /usr/bin/python
lrwxr-xr-x 1 root wheel 63 Nov 20 21:24 /usr/bin/python -> /Library/Frameworks/Python.framework/Versions/3.1/bin/python3.1
Ugh, I have not been able to program in 11 days.

You should *not* do this. The files in /usr/bin are installed and
controlled by Apple and, in particular, /usr/bin/python is the Apple
supplied python. By changing /usr/bin/python, you are risking incorrect
operation of other system programs that may depend on it plus it is
quite likely that an OS X software update will overwrite this location
breaking your applications. Use /usr/local/bin/python3.1 instead.
 
Terry Reedy

This is almost exactly the same as

def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

except that the latter gives better error tracebacks.
HA! IT WORKS! Thank you thank you thank you. I don't understand the
lambda functionality but will figure it out.

Nothing to do with lambda, really. See above.
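
A short sketch of the difference, using the illustrative names `write_utf8_lambda` and `write_utf8` so that the built-in print() stays usable. The function's name is what a traceback shows for each frame, which is why the def form is easier to debug.

```python
import sys

# Same behavior, two spellings.
write_utf8_lambda = lambda s: sys.stdout.buffer.write(s.encode("utf-8"))

def write_utf8(s):
    return sys.stdout.buffer.write(s.encode("utf-8"))

print(write_utf8_lambda.__name__)  # <lambda>
print(write_utf8.__name__)         # write_utf8
```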

tjr
 
Dennis Lee Bieber

Does this work for anyone? Because all I get is a blank page. Nothing.
If I can establish what SHOULD work, maybe I can diagnose this
problem.
Have you tried

sys.stdout.write("Content-type:text/plain;charset=utf-8\r\n\r\n")
etc.

Most internet protocols use the <cr><lf> sequence for line
terminator; might be safer to specify the full sequence (or run on a
Windows box where the I/O system may translate \n into \r\n for you <G>)
 
Gnarlodious

        Have you tried

        sys.stdout.write("Content-type:text/plain;charset=utf-8\r\n\r\n")

Yes I tried that when it was suggested, to no avail. All I get is
"Internal server error". All I can imagine is that there is no
"sys.stdout.write" in my Python. No idea why.

-- Gnarlie K5ZN
 
Gnarlodious

def print(s): return sys.stdout.buffer.write(s.encode('utf-8'))

Here is a better solution that lets me send any string to the
function:

def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

Why this changed in Python 3 I do not know, nor why it was nowhere to
be found on the internet.

Can anyone explain it?

Anyway, I hope others with this problem can find this solution.

-- Gnarlie
 
Lie Ryan

Here is a better solution that lets me send any string to the
function:

def print(html): return sys.stdout.buffer.write(("Content-type:text/plain;charset=utf-8\n\n"+html).encode('utf-8'))

No, that's wrong. You're serving HTML with Content-type:text/plain; it
should've been text/html or application/xhtml+xml (though the latter,
while technically correct, gives some older browsers problems).

Why this changed in Python 3 I do not know, nor why it was nowhere to
be found on the internet.

Can anyone explain it?

Python 3's str() is what was Python 2's unicode().
Python 2's str() turned into Python 3's bytes().

Python 3's print() now takes a unicode string, which is the regular string.
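
A small Python 3 illustration of the split, with illustrative variable names. It also shows that a binary stream (here an in-memory stand-in for something like sys.stdout.buffer) accepts bytes only, which is why the workarounds in this thread call .encode() before writing.

```python
import io

s = "\u6649"            # Python 3 str (what Python 2 called unicode)
b = s.encode("utf-8")   # Python 3 bytes (roughly Python 2's str)

print(type(s).__name__, len(s))  # str 1
print(type(b).__name__, len(b))  # bytes 3

# A binary stream rejects str outright.
try:
    io.BytesIO().write(s)
    str_rejected = False
except TypeError:
    str_rejected = True
```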

Because of the switch to unicode str, a simple print('晉') should've
worked flawlessly if your terminal can accept the character, but the
problem is your terminal does not.

The correct fix is to fix your terminal's encoding.

In Windows, due to the prompt's poor support for Unicode, the only real
solution is to switch to a better terminal.

Another workaround is to use a real file:

import sys
f = open('afile.html', 'w', encoding='utf-8')
print("晉", file=f)   # either pass the file explicitly...
sys.stdout = f        # ...or redirect sys.stdout to it
print("晉")

Or, slightly better, rewrap the buffer with io.TextIOWrapper:

import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
print("晉")
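
A sketch of the rewrapping approach run against an in-memory stream, so the resulting bytes can be inspected. `utf8_text_stream` is a hypothetical helper name, and the `newline="\n"` argument is an assumption added here to keep the output identical across platforms.

```python
import io

def utf8_text_stream(binary_stream):
    """Wrap a binary stream so print() encodes as UTF-8 (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")

# Demonstration against an in-memory stream instead of the real stdout.
fake_stdout_buffer = io.BytesIO()
out = utf8_text_stream(fake_stdout_buffer)
print("Content-Type: text/plain; charset=utf-8", file=out)
print(file=out)              # the empty line ends the headers
print("\u6649", file=out)
out.flush()
result = fake_stdout_buffer.getvalue()
print(result)
```

In a real script the equivalent would be `sys.stdout = utf8_text_stream(sys.stdout.buffer)`. It would probably need to run before the first print() call, so that nothing is left sitting in the old wrapper's buffer.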
 
Alf P. Steinbach

* Lie Ryan:
No, that's wrong. You're serving HTML with Content-type:text/plain, it
should've been text/html or application/xhtml+xml (though technically
correct some older browsers have problems with the latter).


Python 3's str() is what was Python 2's unicode().
Python 2's str() turned into Python 3's bytes().

Python 3's print() now takes a unicode string, which is the regular string.

Because of the switch to unicode str, a simple print('晉') should've
worked flawlessly if your terminal can accept the character, but the
problem is your terminal does not.

The correct fix is to fix your terminal's encoding.

In Windows, due to the prompt's poor support for Unicode, the only real
solution is to switch to a better terminal.

A bit off-topic perhaps, but that last is a misconception. Windows' [cmd.exe]
does have poor support for UTF-8, in short it Does Not Work in Windows XP, and
probably does not work in Vista or Windows7 either. However, Windows console
windows have full support for the Basic Multilingual Plane of Unicode: they're
pure Unicode beasts.

Thus, the problem is an interaction between two systems that Do Not Work: the
[cmd.exe] program's practically non-existing support for UTF-8 (codepage 65001),
and the very unfortunate confusion of stream i/o and interactive i/o in *nix,
which has ended up as a "feature" (it's more like a design bug) in a lot of
programming languages stemming from *nix origins, and that includes Python.

Windows' "terminal", its console window support, is INNOCENT... :)

In Windows, as opposed to *nix, interactive character i/o is separated at the
API level. There is integration with stream i/o, but the interactive i/o can be
accessed separately. This is the "console function" API.

So for interactive console i/o one solution could be some Python module for
interactive console i/o, on Windows internally using the Windows console
function API, which is fully Unicode (based on UCS-2, i.e. the BMP).

Cheers,

- Alf
 
Gnarlodious

Because of the switch to unicode str, a simple print('晉') should've
worked flawlessly if your terminal can accept the character, but the
problem is your terminal does not.

There is nothing wrong with Terminal, Mac OSX supports Unicode from
one end to the other.
The problem is that your code works normally in Terminal but not in a
browser.

#!/usr/bin/python
import sys, io
print("Content-type:text/plain;charset=utf-8\n\n")
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
print("晉")

The browser shows "Server error", Apache 2 reports error:

[error] [client 127.0.0.1] malformed header from script. Bad header=
\xe6\x99\x89: test.py

So far every way to print Unicode to a browser looks very un-Pythonic.
I am just wondering if I have a bug or am missing the right way
entirely.

-- Gnarlie
 
Lie Ryan

Because of the switch to unicode str, a simple print('晉') should've
worked flawlessly if your terminal can accept the character, but the
problem is your terminal does not.

There is nothing wrong with Terminal, Mac OSX supports Unicode from
one end to the other.
The problem is that your code works normally in Terminal but not in a
browser.

#!/usr/bin/python
import sys, io
print("Content-type:text/plain;charset=utf-8\n\n")
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
print("晉")

The browser shows "Server error", Apache 2 reports error:

[error] [client 127.0.0.1] malformed header from script. Bad header=
\xe6\x99\x89: test.py

As I've already posted, for some reason it is not possible to mix
writing with print() and sys.stdout.buffer. On my machine, the output
got mixed up:

--------------------------
晉
Content-type:text/plain;charset=utf-8
<BLANKLINE>
<BLANKLINE>
--------------------------

Notice that the Chinese character is on top of the header. I guess this
is due to the buffering from print.

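
The buffering guess can be reproduced without a web server. In this sketch an in-memory BytesIO stands in for sys.stdout.buffer and a TextIOWrapper around it stands in for sys.stdout; `mixed_writes` is a hypothetical name. Whether this is exactly what happened on either machine is an assumption, but it produces the same swapped order.

```python
import io

def mixed_writes(flush_between):
    """Write a header via the text layer and a body via the byte layer."""
    raw = io.BytesIO()                                   # like sys.stdout.buffer
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")  # like sys.stdout
    text.write("Content-type:text/plain;charset=utf-8\n\n")  # what print() does
    if flush_between:
        text.flush()          # push the pending header down to the byte layer
    raw.write("\u6649\n".encode("utf-8"))                # like buffer.write()
    text.flush()
    return raw.getvalue()

print(mixed_writes(flush_between=False))  # body comes out before the header
print(mixed_writes(flush_between=True))   # header first, as intended
```

If this is the cause, calling sys.stdout.flush() after printing the headers and before writing to sys.stdout.buffer should keep the order intact.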
So far every way to print Unicode to a browser looks very un-Pythonic.
I am just wondering if I have a bug or am missing the right way
entirely.

My *guess* is that Apache does not request a utf-8 stdout. When run in
the Terminal, the Terminal requested a utf-8 stdout from Python and the
script ran correctly. I'm not too familiar with Apache's internals, nor
with how Python 3 figures out its stdout's encoding; you might want to
ask on Apache's mailing list whether they have seen a similar case.

PS: You might also want to look at this:
http://stackoverflow.com/questions/984014/python-3-is-using-sys-stdout-buffer-write-good-style

it says to try setting your PYTHONIOENCODING environment variable to "utf8"
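
A sketch of the effect of that variable, run from a parent script so the result can be captured. `run_with_io_encoding` is a hypothetical helper. It starts a child interpreter with PYTHONIOENCODING set and stdout connected to a pipe, which is roughly the situation of a CGI script. How the variable gets into Apache's environment for CGI scripts is not covered here.

```python
import os
import subprocess
import sys

def run_with_io_encoding(code, encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.check_output([sys.executable, "-c", code], env=env)

# The child reports its stdout encoding, then prints the character.
child_code = "import sys; print(sys.stdout.encoding); print('\\u6649')"
output = run_with_io_encoding(child_code, "utf-8")
print(output)
```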
 
