Read utf-8 file


moonhkt

The file contains the text "Made in China":
中國製

http://www.fileformat.info/info/unicode/char/4e2d/index.htm
UTF-16 (hex) 0x4E2D (4e2d)
UTF-8 (hex) 0xE4 0xB8 0xAD (e4b8ad)
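To see where those three bytes come from, here is a minimal sketch that packs the code point 0x4E2D into the 3-byte UTF-8 pattern by hand and compares it with Python's own encoder. The helper name `utf8_three_byte` is made up for illustration, and the sketch only covers the U+0800..U+FFFF range (surrogates are not handled). It is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function
import binascii


def utf8_three_byte(code_point):
    """Hand-encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers the 3-byte range")
    return bytearray([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


# Hex of the hand-built bytes and of the built-in encoder's bytes.
manual_hex = binascii.hexlify(bytes(utf8_three_byte(0x4E2D))).decode("ascii")
builtin_hex = binascii.hexlify(u"\u4e2d".encode("utf-8")).decode("ascii")

print(manual_hex, builtin_hex)  # prints: e4b8ad e4b8ad
```

So `binascii.hexlify(ch.encode("utf-8"))` is probably the shortest way to get the `e4b8ad` form for a single character.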


Read by od -cx utf_a.text
0000000 中 ** ** 國 ** ** 製 ** ** \n
e4b8 ade5 9c8b e8a3 bd0a
0000012
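As a cross-check of the dump, the sketch below takes the same ten bytes (typed in as a hex string rather than read from `utf_a.text`, which is an assumption made to keep it self-contained), decodes them as UTF-8, and lists each character's code point, UTF-8 hex, and Unicode name. The `rows` list is just an illustrative name. It should run on Python 2 and Python 3.

```python
from __future__ import print_function
import binascii
import unicodedata

# The ten bytes from the od dump, written out as one hex string.
raw = binascii.unhexlify("e4b8ade59c8be8a3bd0a")
text = raw.decode("utf-8")

rows = []
for ch in text.strip():  # strip() drops the trailing newline, which has no name
    utf8_hex = binascii.hexlify(ch.encode("utf-8")).decode("ascii")
    rows.append(("U+%04X" % ord(ch), utf8_hex, unicodedata.name(ch)))

# One line per character: code point, UTF-8 bytes in hex, Unicode name.
for row in rows:
    print("%-7s %-7s %s" % row)
```

Each character takes three bytes, which matches the 0000012 (octal, so 10) byte count once the newline is included.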

Read by Python. Why does Python display it as below?

中國製

u'\u4e2d\u570b\u88fd\n' <--- Value 中國製
<-- UTF-8 value
u'\u4e2d' 中 CJK UNIFIED IDEOGRAPH-4E2D
u'\u570b' 國 CJK UNIFIED IDEOGRAPH-570B
u'\u88fd' 製 CJK UNIFIED IDEOGRAPH-88FD

import unicodedata
import codecs # UNICODE
.....

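file = codecs.open(options.filename, 'r', 'utf-8')
try:
    for line in file:
        #print repr(line)
        #print "========="
        # Each line is a unicode object; encode it back to UTF-8 bytes for printing
        print line.encode("utf-8")
        for keys in line.split(","):
            print repr(keys), " <--- Value", keys.encode("utf-8"), "<-- UTF-8 value"
            for key in keys:
                try:
                    name = unicodedata.name(unicode(key))
                    print "%-9s %-8s %-30s" % (repr(key), key.encode("utf-8"), name)
                except ValueError:
                    # unicodedata.name() raises ValueError for characters
                    # without a name, such as the trailing "\n"
                    pass
finally:
    file.close()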


How can I display e4b8ad for 中 in Python?
 