How to display Chinese in a list retrieved from database via python

zxo102

Hi,
I retrieve some Chinese text from PostgreSQL and assign it to a
variable 'info' defined in the JavaScript of an HTML page:
var info = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
But I want it to be
var info = ['中文','中文','中文']
because the Chinese items in the former list,
['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'], are not
displayed correctly when inserted into an HTML page (via JavaScript). If
the list is var info = ['中文','中文','中文'], then everything works
fine.

Does anybody know how to solve this problem?

Thanks in advance.


zxo102
 
zxo102

Chris said:
Upgrading to Python 2.6 would probably be beneficial due to its better
handling of Unicode.
Also, posting some of the actual code you're using (to generate the
JavaScript, I guess?) would definitely help.

Merry Christmas,
Chris

Hi Chris:

I have to use Python 2.4 since many other Python packages have not been
updated to 2.6.
Here is my demo. I create a table in PostgreSQL:

create table my_table (
    id serial,
    name char(20),
    phone char(20)
);

and insert two records into the table:
(4, '中文', '1334995555555')
(5, '中文', '3434343434343')


I would like to generate an HTML page dynamically. Here is the demo
script:

############################################################
def do_search(a):
    # Import the ODBC module
    import odbc
    # Connect to the database through the ODBC data source "my_odbc"
    cc = odbc.odbc('dsn=wisco')
    cursor1 = cc.cursor()
    # Query "my_table" in the database
    #cursor1.execute("select * from my_table where name = '%s' "%a)
    cursor1.execute("select * from my_table where name like '%%%s%%' "%a)
    rr = cursor1.fetchall()
    # Show the rows the query returned
    row01 = rr[0]
    row02 = rr[1]
    print row01, row02
    html_str = ''
    #print "Content-Type: text/html\n\n"
    html_str += "<html><head> <title> test </title></head><body> \n"
    html_str += "<script language=javascript>\n"
    html_str += " var row01 = %s\n"
    html_str += " var row02 = %s\n"
    html_str += "</script>\n"
    html_str += " </body></html>\n"
    html_str = html_str%(row01,row02)
    f = open('c:\\xbop_sinopec\\apache\\htdocs\\test01.html','w')
    f.write(html_str)
    f.close()

do_search('中文')
#########################################################

The html code is as follows

<html><head> <title> test </title></head><body>
<script language=javascript>
var row01 = (1, '\xd6\xd0\xce\xc4', '1334995555555')
var row02 = (2, '\xd6\xd0\xce\xc4', '3434343434343')
</script>
</body></html>

But '中文' appears as '\xd6\xd0\xce\xc4'. When row01 and row02 are used
from somewhere, '\xd6\xd0\xce\xc4' cannot be displayed correctly as
'中文' in the HTML environment.

Thanks for your help and Merry Christmas to you too.

ouyang
 
Jeroen Ruigrok van der Werven

-On [20081225 08:30], zxo102 said:
Anybody knows how to solve this problem?

You are writing out Python byte strings, not Unicode. Look at
u'' string literals, x.encode() and x.decode() to help you.
It's widely documented on the Internet; a quick search for Python Unicode
(with encode or decode) should get you there.
 
Gabriel Genellina

You forgot to specify the page encoding, gb2312 presumably. If adding the
encoding does not help, I'd say the problem must lie in how you later
use row01 and row02 (your html page does not use those variables for
anything). '中文' is the same as '\xd6\xd0\xce\xc4', and both javascript
and Python share the same representation for strings (mostly) so this
should not be an issue.

My PC is unable to display those characters, but I get "true" from this:

<html><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset='gb2312'"><title> test </title></head>
<body><script
language=javascript>alert('中文'=='\xd6\xd0\xce\xc4')</script></body></html>
 
zxo102

On Thu, 25 Dec 2008 07:27:03 -0200, zxo102 wrote:
[snip]

I did that: <META HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset='gb2312'">, but it does not work. alert('\xd6\xd0\xce\xc4')
displays garbage. I am thinking there may be some way to convert
'\xd6\xd0\xce\xc4' to '中文' in the list with Python before I generate
the HTML page, so that when I open the HTML file with vi, I see '中文'
directly instead of '\xd6\xd0\xce\xc4'. That would solve my problem.

Any ideas?

Ouyang
 
Mark Tolonen

Use charset=gb2312 instead of charset='gb2312' (remove the single quotes).

I was able to display 中文 successfully with this code:

f=open('test.html','wt')
f.write('''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title></head>
<body>\xd6\xd0\xce\xc4</body></html>''')
f.close()

-Mark
 
zxo102

Mark,
I have exactly copied your code into the htdocs of my Apache
server,

<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title></head>
<body>\xd6\xd0\xce\xc4</body></html>

but it still shows me \xd6\xd0\xce\xc4. Any ideas?

ouyang
 
Gabriel Genellina

That's not the same thing as what Mark T. said.
The original was Python code to *write* a test file that you could open in
a browser. Things like "\xd6\xd0" are escape sequences interpreted by
Python, not meant to appear literally in a file. It is like \n: it means
"start a new line"; one wants a new line in the output, *not* a backslash
and a letter n. "\xd6\xd0" generates *two* bytes, not eight characters. If
the file is interpreted as containing latin-1 characters, you see them as
ÖÐ. But due to the "charset=gb2312" line, those two bytes together make the
ideograph 中.

So, write the Python code (from f=open... up to f.close()) in a file and
execute it. Then open the generated test.html file. You should see the two
ideographs.
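
The difference between the escape sequence and the literal text can be checked directly. This is only a small sketch, and it assumes Python 3 (not the Python 2.4 used in this thread), where a bytes literal plays the role of a Python 2 string literal; the variable names are made up for illustration.

```python
# Python 3 sketch: a bytes literal stands in for a Python 2 str literal.
two_bytes = b"\xd6\xd0"       # escape sequences: Python stores 2 bytes
literal_text = r"\xd6\xd0"    # raw string: the 8 characters exactly as typed

print(len(two_bytes))         # 2
print(len(literal_text))      # 8

# The same two bytes read under two different charsets.
as_latin1 = two_bytes.decode("latin-1")   # two separate characters
as_gb2312 = two_bytes.decode("gb2312")    # one ideograph

# Print code points rather than the characters, so any console can show them.
print([hex(ord(c)) for c in as_latin1])   # ['0xd6', '0xd0']
print(hex(ord(as_gb2312)))                # 0x4e2d
```

Copying the text \xd6\xd0 into an HTML file by hand corresponds to the `literal_text` case, which is why the browser showed the backslashes.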
 
zxo102

On Sat, 27 Dec 2008 03:03:24 -0200, zxo102 wrote:
[snip]

Thanks for your explanation. The example works now. It is close to my
real case.

I have a list in a dictionary and want to insert it into the HTML
file. I tested it with the following scripts: CASE 1, CASE 2 and CASE 3. I
can see "中文" in CASE 1, but that is not what I want. CASE 2 does not
display correctly.
So, in CASE 3, I hacked the script of CASE 2 with a function,
conv_list2str(), to 'convert' the list into a string. CASE 3 shows
me "中文". I don't know what is wrong with CASE 2 and what is right with
CASE 3.

Without knowing why, I have just hard-coded my Python application
following CASE 3 to display Chinese characters from a list in a
dictionary in my web application.

Any ideas?

Happy New Year 2009

ouyang



CASE 1:
########################################################
f=open('test.html','wt')
f.write('''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = ['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>''')
f.close()

CASE 2:
#######################################################
mydict = {}
mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
f_str = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %(JUNK)s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

f_str = f_str%mydict
f=open('test02.html','wt')
f.write(f_str)
f.close()

CASE 3:
###################################################
mydict = {}
mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']

f_str = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %(JUNK)s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

import string

def conv_list2str(value):
    list_len = len(value)
    list_str = "["
    for ii in range(list_len):
        list_str += '"'+string.strip(str(value[ii])) + '"'
        if ii != list_len-1:
            list_str += ","
    list_str += "]"
    return list_str

mydict['JUNK'] = conv_list2str(mydict['JUNK'])

f_str = f_str%mydict
f=open('test03.html','wt')
f.write(f_str)
f.close()
 
Mark Tolonen

See my comment on each case below. Happy New Year!

In CASE 1, the *4 bytes* D6 D0 CE C4 are written to the file, which is the
correct gb2312 encoding for 中文.

In CASE 2, the *16 characters* "\xd6\xd0\xce\xc4" are written to the file,
which is NOT the correct gb2312 encoding for 中文; JavaScript will interpret
those escapes in its own way (as the four characters U+00D6 U+00D0 U+00CE
U+00C4, not as gb2312 bytes). This is because the str() representation of
mydict['JUNK'] in Python 2.x is the characters "['\xd6\xd0\xce\xc4',
'\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']".
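
The CASE 2 behaviour can be reproduced in a few lines. The sketch below assumes Python 3, where a bytes object is the closest stand-in for a Python 2 string, so the printed form carries a b prefix that Python 2 would not show; the variable names are illustrative only.

```python
# Python 3 sketch; a bytes object plays the role of a Python 2 str here.
raw = b"\xd6\xd0\xce\xc4"     # 4 bytes: the gb2312 encoding of the two ideographs
items = [raw, raw, raw]

# Formatting a list with %s uses repr() of each element, so the result
# contains backslash escapes spelled out as ordinary characters.
as_list = "%s" % items
print(as_list)                # [b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4']

# Strip the b'...' wrapper from one element to count the escaped characters.
escaped = repr(raw)[2:-1]
print(len(raw))               # 4  (what CASE 1 and CASE 3 write per item)
print(len(escaped))           # 16 (what CASE 2 writes per item)
```

Formatting each element on its own, as conv_list2str() does, avoids the repr() step, which is why CASE 3 writes the real bytes.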

CASE 3 works because you build your own, correct, gb2312 representation of
mydict['JUNK'] (value[ii] above is the correct 4-byte sequence for 中文).

That said, learn to use Unicode strings by trying the following program, but
set the first line to the encoding *your editor* saves files in. You can
use the actual Chinese characters instead of escape codes this way. The
encoding used for the source code and the encoding used for the html file
don't have to match, but the charset declared in the file and the encoding
used to write the file *do* have to match.

# coding: utf8

import codecs

mydict = {}
mydict['JUNK'] = [u'中文',u'中文',u'中文']

def conv_list2str(value):
    return u'["' + u'","'.join(s for s in value) + u'"]'

f_str = u'''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %s
alert(test[0])
alert(test[1])
alert(test[2])
</script>
</head>
<body></body></html>'''

s = conv_list2str(mydict['JUNK'])
f=codecs.open('test04.html','wt',encoding='gb2312')
f.write(f_str % s)
f.close()


-Mark

P.S. Python 3.0 makes this easier for what you want to do, because the
representation of a dictionary changes. You'll be able to skip the
conv_list2str() function and all strings are Unicode by default.
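
As an alternative to hand-building the array string, the standard json module can produce a JavaScript-compatible list literal. This is a different technique from the conv_list2str() helper above, not something proposed in the thread, and the sketch assumes Python 3; the names `names`, `js_array` and `page` are made up for illustration.

```python
import json

# Three copies of the two ideographs, written as \u escapes so this source
# file stays pure ASCII.
names = ["\u4e2d\u6587"] * 3

# json.dumps emits a double-quoted array that is also a valid JavaScript
# literal; ensure_ascii=False keeps the characters instead of \uXXXX escapes.
js_array = json.dumps(names, ensure_ascii=False)

page = (
    '<html><head>\n'
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">\n'
    '<title>test</title>\n'
    '<script language=javascript>\n'
    'var test = %s\n'
    'alert(test[0])\n'
    '</script>\n'
    '</head>\n'
    '<body></body></html>\n'
) % js_array

# Encode with the same charset that the page declares.
encoded = page.encode("gb2312")
print(len(encoded))
```

The `encoded` bytes could then be written to a file opened in binary mode. Note that this sketch does not guard against data containing "</script>", which would need extra escaping in a real page.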
 
zxo102

Thanks for your comments, Mark. I understand it now. The list (escape
codes) ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'] comes
from a PostgreSQL database via a "select" statement. I will check the
PostgreSQL configuration and see if it is possible to return
['中文','中文','中文'] directly from the "select" statement.

ouyang
 
Mark Tolonen

The trick to working with Unicode is to convert anything read into the
program (from a file, database, etc.) to Unicode characters, manipulate it,
then convert it back to a specific encoding when writing it out. So if
PostgreSQL is returning gb2312 data, use:

data.decode('gb2312') to get the Unicode equivalent:
中文

Google for some Python Unicode tutorials.

-Mark
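
The decode-on-input, encode-on-output pattern described above might look roughly like this. It is a sketch under assumptions: it is written for Python 3, the `rows` list is invented sample data standing in for what a database driver could return as gb2312 bytes (including char(20)-style trailing padding), and `decode_row` is a hypothetical helper, not part of any driver API.

```python
# Made-up sample rows: (id, name, phone) with gb2312-encoded, padded fields.
rows = [
    (1, b"\xd6\xd0\xce\xc4    ", b"1334995555555       "),
    (2, b"\xd6\xd0\xce\xc4    ", b"3434343434343       "),
]

def decode_row(row, encoding="gb2312"):
    """Decode every bytes field to text and trim trailing padding."""
    return tuple(
        field.decode(encoding).strip() if isinstance(field, bytes) else field
        for field in row
    )

# 1. Decode as soon as the data enters the program.
text_rows = [decode_row(r) for r in rows]

# 2. Work with text only.
names = [name for _id, name, _phone in text_rows]
summary = ", ".join(names)

# 3. Encode once, at the point of output, using the charset the page declares.
output = summary.encode("gb2312")
print(len(output))            # 10: two 4-byte names plus ", "
```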
 
