How to display Chinese in a list retrieved from database via python

zxo102 · Dec 25, 2008

Hi,
I retrieve some info in Chinese from postgresql and assign it to a
variable 'info' defined in the javascript of an html page:
var info = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
But I want it to be
var info = ['中文','中文','中文']
since in html pages (via javascript) the Chinese items in the former list,
['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'], are not
displayed correctly when inserted into the page. If the list is
var info = ['中文','中文','中文'], then everything works fine.

Does anybody know how to solve this problem?

Thanks in advance.

zxo102

zxo102 · Dec 25, 2008

Upgrading to Python 2.6 would probably be beneficial due to its better
handling of Unicode.
Also, posting some of the actual code you're using (to generate
JavaScript, I guess?) would definitely help.

Merry Christmas,
Chris

Hi Chris:

I have to use python2.4 since many other python packages have not been
updated to 2.6.
Here is my demo:
I create a table in postgresql:

create table my_table (
    id serial,
    name char(20),
    phone char(20)
);

and insert two records into the table
(4, '中文', '1334995555555')
(5, '中文', '3434343434343')

I would like to generate a html page dynamically, here is the demo
script

############################################################
def do_search(a):
    # Import the ODBC module
    import odbc
    # Connect to the corresponding database through the ODBC DSN "my_odbc"
    cc = odbc.odbc('dsn=wisco')
    cursor1 = cc.cursor()
    # Query "my_table" in the database
    #cursor1.execute("select * from my_table where name = '%s' "%a)
    cursor1.execute("select * from my_table where name like '%%%s%%' "%a)
    rr = cursor1.fetchall()
    # Display the data the user searched for
    row01 = rr[0]
    row02 = rr[1]
    print row01, row02
    html_str = ''
    #print "Content-Type: text/html\n\n"
    html_str += "<html><head> <title> test </title></head><body> \n"
    html_str += "<script language=javascript>\n"
    html_str += " var row01 = %s\n"
    html_str += " var row02 = %s\n"
    html_str += "</script>\n"
    html_str += " </body></html>\n"
    html_str = html_str%(row01,row02)
    f = open('c:\\xbop_sinopec\\apache\\htdocs\\test01.html','w')
    f.write(html_str)
    f.close()

do_search('中文')
#########################################################

The html code is as follows

<html><head> <title> test </title></head><body>
<script language=javascript>
var row01 = (1, '\xd6\xd0\xce\xc4', '1334995555555')
var row02 = (2, '\xd6\xd0\xce\xc4', '3434343434343')
</script>
</body></html>

But the '中文' appears as '\xd6\xd0\xce\xc4'. When row01 and row02 are used
from somewhere, '\xd6\xd0\xce\xc4' cannot be displayed correctly as '中文'
in the html environment.

Thanks for your help and Merry Christmas to you too.

ouyang

Jeroen Ruigrok van der Werven · Dec 25, 2008

-On [20081225 08:30], zxo102 said:
Anybody knows how to solve this problem?

You are assigning/pushing out Python byte sequences, not Unicode. Look at
u'' strings, x.encode() and x.decode() to help you.
It's widely documented on the Internet; a quick search for "Python Unicode"
(with encode or decode) should get you there.
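
A minimal sketch of the decode/encode round trip mentioned above. It is written for Python 3 (it should also run on 2.6 and later; on Python 2.4 the `b` prefix would have to be dropped). That the database hands back GB2312 bytes is an assumption based on the byte values shown in this thread.

```python
# Bytes as they might come back from the database (assumed to be GB2312).
raw = b'\xd6\xd0\xce\xc4'

# decode(): bytes -> Unicode text. The four bytes become two characters.
text = raw.decode('gb2312')

# encode(): Unicode text -> bytes, in whatever encoding the page declares.
as_gb2312 = text.encode('gb2312')
as_utf8 = text.encode('utf-8')

print(len(raw), len(text), len(as_utf8))
```

The same text has different byte lengths in different encodings, which is why the bytes alone are ambiguous until an encoding is named.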

Gabriel Genellina · Dec 25, 2008

You forgot to specify the page encoding, gb2312 presumably. If adding the
encoding does not help, I'd say the problem must reside in how you later
use row01 and row02 (your html page does not use those variables for
anything). '中文' is the same as '\xd6\xd0\xce\xc4', and both javascript
and Python share the same representation for strings (mostly) so this
should not be an issue.

My PC is unable to display those characters, but I get "true" from this:

<html><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset='gb2312'"><title> test </title></head>
<body><script language=javascript>alert('中文'=='\xd6\xd0\xce\xc4')</script></body></html>

zxo102 · Dec 25, 2008

I did that: <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset='gb2312'">,
but it does not work. alert('\xd6\xd0\xce\xc4') displays some junk. I am
thinking there may be some way to convert '\xd6\xd0\xce\xc4' to '中文' in
the list with python before I generate the html page, so that when I open
the html file with Vi I can see '中文' directly instead of
'\xd6\xd0\xce\xc4'. That would solve my problem.

Any ideas?

Ouyang

Mark Tolonen · Dec 26, 2008

Use charset=gb2312 instead of charset='gb2312' (remove the single quotes).

I was able to display 中文 successfully with this code:

f=open('test.html','wt')
f.write('''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title></head>
<body>\xd6\xd0\xce\xc4</body></html>''')
f.close()

-Mark

zxo102 · Dec 27, 2008

Mark,
I have copied your code exactly into the htdocs directory of my Apache
server,

<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title></head>
<body>\xd6\xd0\xce\xc4</body></html>

but it still shows me \xd6\xd0\xce\xc4. Any ideas?

ouyang

Gabriel Genellina · Dec 27, 2008

That's not the same thing as what Mark T. said.
The original was Python code to *write* a test file that you could open in
a browser. Things like "\xd6\xd0" are escape sequences interpreted by
Python, not meant to literally appear in a file. Like \n -- it means
"start a new line", one wants a new line in the output, *not* a backslash
and a letter n. "\xd6\xd0" generates *two* bytes, not eight. If the file is
interpreted as containing latin-1 characters, you see them as ÖÐ. But due
to the "charset=gb2312" line, those two bytes together make the ideograph
中.

So, write the Python code (from f=open... up to f.close()) in a file and
execute it. Then open the generated test.html file. You should see the two
ideographs.
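
The difference between an escape sequence and its literal spelling can be checked directly. This is a small sketch for Python 3 (also 2.6 and later); the variable names are only for illustration.

```python
# Escape sequence: Python turns each \xNN into a single byte.
two_bytes = b'\xd6\xd0'
# Raw literal: the backslashes, x's and hex digits are kept as typed.
eight_chars = br'\xd6\xd0'

print(len(two_bytes))    # 2
print(len(eight_chars))  # 8

# The same two bytes read under two different charsets.
as_latin1 = two_bytes.decode('latin-1')  # two accented Latin letters
as_gb2312 = two_bytes.decode('gb2312')   # one Chinese ideograph
print(len(as_latin1), len(as_gb2312))
```

Pasting the escape text into an html file by hand produces the eight-character form, which no charset declaration can turn back into Chinese.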

zxo102 · Dec 29, 2008

Thanks for your explanation. The example works now. It is close to my
real case.

I have a list in a dictionary and want to insert it into the html
file. I tested it with the following scripts: CASE 1, CASE 2 and CASE 3. I
can see "中文" in CASE 1 but that is not what I want. CASE 2 does not
show me the correct characters.
So, in CASE 3, I hacked the script of CASE 2 with a function,
conv_list2str(), to 'convert' the list into a string. CASE 3 can show
me "中文". I don't know what is wrong with CASE 2 and what is right with
CASE 3.

Without knowing why, I have just hard coded my python application
following CASE 3 for displaying Chinese characters from a list in a
dictionary in my web application.

Any ideas?

Happy New Year 2009

ouyang

CASE 1:
########################################################
f=open('test.html','wt')
f.write('''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = ['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>''')
f.close()

CASE 2:
#######################################################
mydict = {}
mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
f_str = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %(JUNK)s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

f_str = f_str%mydict
f=open('test02.html','wt')
f.write(f_str)
f.close()

CASE 3:
###################################################
mydict = {}
mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']

f_str = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %(JUNK)s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

import string

def conv_list2str(value):
    list_len = len(value)
    list_str = "["
    for ii in range(list_len):
        list_str += '"'+string.strip(str(value[ii])) + '"'
        if ii != list_len-1:
            list_str += ","
    list_str += "]"
    return list_str

mydict['JUNK'] = conv_list2str(mydict['JUNK'])

f_str = f_str%mydict
f=open('test03.html','wt')
f.write(f_str)
f.close()

Mark Tolonen · Dec 29, 2008

See below each case... 新年快乐 (Happy New Year)!

In CASE 1, the *4 bytes* D6 D0 CE C4 are written to the file, which is the
correct gb2312 encoding for 中文.

In CASE 2, the *16 characters* "\xd6\xd0\xce\xc4" are written to the file,
which is NOT the correct gb2312 encoding for 中文, and will be interpreted
however javascript pleases. This is because the str() representation of
mydict['JUNK'] in Python 2.x is the characters "['\xd6\xd0\xce\xc4',
'\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']": str() of a list uses repr() on
each element, and repr() escapes non-ASCII bytes.
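
The effect can be reproduced in a few lines. This is a sketch for Python 3, where byte strings carry a `b` prefix in their repr (Python 2.x prints the same escapes without the prefix); the variable names are illustrative, not taken from the original scripts.

```python
items = [b'\xd6\xd0\xce\xc4']

# Formatting the whole list calls repr() on each element, so the output
# contains backslash-x text instead of the raw bytes.
as_list_text = '%s' % items
print(as_list_text)

# Formatting each element separately, after decoding, keeps the characters.
texts = [item.decode('gb2312') for item in items]
js_array = u'[' + u','.join(u'"%s"' % t for t in texts) + u']'
print(js_array)
```

Building the array text element by element is what the hand-written conv_list2str() in CASE 3 does.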

CASE 3 works because you build your own, correct, gb2312 representation of
mydict['JUNK'] (value[ii] above is the correct 4-byte sequence for 中文).

That said, learn to use Unicode strings by trying the following program, but
set the first line to the encoding *your editor* saves files in. You can
use the actual Chinese characters instead of escape codes this way. The
encoding used for the source code and the encoding used for the html file
don't have to match, but the charset declared in the file and the encoding
used to write the file *do* have to match.

# coding: utf8

import codecs

mydict = {}
mydict['JUNK'] = [u'中文',u'中文',u'中文']

def conv_list2str(value):
    return u'["' + u'","'.join(s for s in value) + u'"]'

f_str = u'''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

s = conv_list2str(mydict['JUNK'])
f=codecs.open('test04.html','wt',encoding='gb2312')
f.write(f_str % s)
f.close()

-Mark

P.S. Python 3.0 makes this easier for what you want to do, because the
representation of a dictionary changes. You'll be able to skip the
conv_list2str() function and all strings are Unicode by default.
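
As a rough illustration of that P.S., the following Python 3 only sketch formats a list of strings directly. It relies on Python's repr() happening to produce text that javascript also accepts for simple strings like these, which is an assumption that breaks down for strings containing quotes or backslashes; `json.dumps(names, ensure_ascii=False)` would be the more robust choice.

```python
# Python 3 only: str is Unicode, and repr() keeps printable non-ASCII text.
names = ['中文', '中文', '中文']

# No helper function needed: the list formats with the characters intact.
js_line = 'var test = %s' % names
print(js_line)

# Encode once, at output time, to the charset the page declares.
page_bytes = js_line.encode('gb2312')
```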

zxo102 · Dec 29, 2008

Thanks for your comments, Mark. I understand it now. The list (escape
codes) ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'] comes
from a postgresql database via a "select" statement. I will check the
postgresql database configuration and see if it is possible to return
['中文','中文','中文'] directly from the "select" statement.

ouyang

Mark Tolonen · Dec 29, 2008

The trick to working with Unicode is to convert anything read into the
program (from a file, database, etc.) to Unicode characters, manipulate it,
then convert it back to a specific encoding when writing it out. So if
postgresql is returning gb2312 data, use:

data.decode('gb2312') to get the Unicode equivalent:
中文

Google for some Python Unicode tutorials.
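
Putting the decode-early, encode-late advice together, here is a hedged sketch for Python 3 (the `json` module also exists from Python 2.6, but not in 2.4). The function name `rows_to_js`, the sample rows, and the assumption that the driver returns gb2312 byte strings are all illustrative, not taken from the real application.

```python
import json

def rows_to_js(rows, db_encoding='gb2312'):
    """Decode byte-string columns and return a javascript array literal."""
    decoded = []
    for row in rows:
        decoded.append([
            # char(20) columns are space padded, hence the strip().
            col.decode(db_encoding).strip() if isinstance(col, bytes) else col
            for col in row
        ])
    # ensure_ascii=False keeps the Chinese characters themselves in the text.
    return json.dumps(decoded, ensure_ascii=False)

# Hypothetical rows shaped like the ones in this thread.
rows = [(1, b'\xd6\xd0\xce\xc4', b'1334995555555'),
        (2, b'\xd6\xd0\xce\xc4', b'3434343434343')]

js = rows_to_js(rows)
page = u'<script>var rows = %s</script>' % js
# Encode once at the end; this must match the charset in the META tag.
page_bytes = page.encode('gb2312')
print(js)
```

Writing `page_bytes` to a file opened in binary mode gives a page whose bytes agree with a charset=gb2312 declaration.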

-Mark
