How to convert between Japanese coding systems?

Dietrich Bollmann · Feb 19, 2009

Hi,

Are there any functions in python to convert between different Japanese
coding systems?

I would like to convert between (at least) ISO-2022-JP, UTF-8, EUC-JP
and SJIS. I also need some function to encode / decode base64 encoded
strings.

I get the strings (which actually are emails) from a server on the
internet with:

import urllib
server = urllib.urlopen(serverURL, parameters)
email = server.read()

The coding systems are given in the response string:

Example:

email = '''[...]
Subject:
=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
[...]
Content-Type: text/plain; charset=EUC-JP
[...]
Content-Transfer-Encoding: base64
[...]

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K

'''

My idea is to first parse the 'email' string and extract the email
body as well as the values of the 'Subject: ', 'Content-Type: ' and
'Content-Transfer-Encoding: ' headers, and then use those values to
convert the subject and body to some other coding system.

Something along the lines of:

(subject, contentType, contentTransferEncoding, content) = parseEmail(email)

to = 'utf-8'
subjectUtf8 = decodeSubject(subject, to)

from = contentType
to = 'utf-8'
contentUtf8 = convertCodingSystem(decodeBase64(content), from, to)

The only problem is that I could not find any standard functionality to
convert between different Japanese coding systems.

Thanks,

Dietrich Bollmann

Peter Otten · Feb 19, 2009

Dietrich said:
I get the strings (which actually are emails) from a server on the
internet with:

import urllib
server = urllib.urlopen(serverURL, parameters)
email = server.read()

The coding systems are given in the response string:

Example:

email = '''[...]
Subject:
=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
[...]
Content-Type: text/plain; charset=EUC-JP
[...]
Content-Transfer-Encoding: base64
[...]

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K

'''

Is that an email? Maybe you can get it in a format that is supported by the
email package in the standard library.
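
For anyone reading this on Python 3, here is a minimal sketch of the email-package route, assuming the raw message is available as bytes. The RAW sample is rebuilt from the example in the question and is only illustrative. Passing policy.default is what makes header access return decoded text and enables get_content().

```python
from email import message_from_bytes, policy

# Illustrative raw message, rebuilt from the example in the question.
RAW = (
    b"Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=\r\n"
    b" =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=EUC-JP\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K\r\n"
)

msg = message_from_bytes(RAW, policy=policy.default)

subject = str(msg["Subject"])        # encoded words in the header decoded to str
charset = msg.get_content_charset()  # charset parameter from Content-Type
body = msg.get_content()             # base64 undone, then decoded with that charset

print(subject)
print(charset)
print(body.encode("utf-8"))          # re-encode to whatever target is needed
```

With this approach the Content-Transfer-Encoding and the charset never have to be parsed by hand.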

The only problem is that I could not find any standard functionality to
convert between different Japanese coding systems.

Then you didn't look hard enough:
'\x89\xef\x8e\xd0\x8aT\x97v'
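
The statement that produced this output did not survive in the archive. As a rough Python 3 illustration of the same point, the sketch below checks that the standard library ships codecs for the coding systems mentioned and round-trips the bytes shown above. It assumes those bytes are Shift_JIS, which is a guess from their shape and not something stated in the post.

```python
import codecs

# Bytes copied from the output above; assumed (not confirmed) to be Shift_JIS.
sjis_bytes = b'\x89\xef\x8e\xd0\x8aT\x97v'

# codecs.lookup() raises LookupError if a codec is unknown, so this loop
# doubles as a check that each coding system is supported.
for label in ('ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS'):
    print(label, '->', codecs.lookup(label).name)

# Convert by going through a Unicode str: decode once, encode per target.
text = sjis_bytes.decode('shift_jis')
converted = {label: text.encode(label)
             for label in ('ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS')}
for label, data in converted.items():
    print(label, repr(data))
```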

See also http://www.amk.ca/python/howto/unicode

Peter

Justin Ezequiel · Feb 19, 2009

Are there any functions in python to convert between different Japanese
coding systems?

I would like to convert between (at least) ISO-2022-JP, UTF-8, EUC-JP
and SJIS. I also need some function to encode / decode base64 encoded
strings.

Example:

email = '''[...]
Subject:
=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
[...]
Content-Type: text/plain; charset=EUC-JP
[...]
Content-Transfer-Encoding: base64
[...]

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K

'''

from = contentType
to = 'utf-8'
contentUtf8 = convertCodingSystem(decodeBase64(content), from, to)

The only problem is that I could not find any standard functionality to
convert between different Japanese coding systems.

Thanks,

Dietrich Bollmann

import base64

ENCODINGS = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS']

def decodeBase64(content):
    # Undo the base64 Content-Transfer-Encoding
    return base64.decodestring(content)

def convertCodingSystem(s, _from, _to):
    # Decode the byte string to unicode, then encode to the target charset
    text = s.decode(_from)
    return text.encode(_to)

if __name__ == '__main__':
    content = 'cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K'
    _from = 'EUC-JP'
    for _to in ENCODINGS:
        x = convertCodingSystem(decodeBase64(content), _from, _to)
        print _to, repr(x)
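
The script above targets Python 2. A rough Python 3 equivalent might look like the sketch below. The snake_case function names are my own renaming, and base64.b64decode stands in for base64.decodestring, which was removed in Python 3.9.

```python
import base64

ENCODINGS = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS']

def decode_base64(content):
    # Accepts str or bytes and returns the raw bytes.
    return base64.b64decode(content)

def convert_coding_system(data, src, dst):
    # bytes -> str using the source charset, then str -> bytes in the target.
    return data.decode(src).encode(dst)

content = 'cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K'
raw = decode_base64(content)
for dst in ENCODINGS:
    print(dst, repr(convert_coding_system(raw, 'EUC-JP', dst)))
```

In Python 3 the decoded value is an ordinary str, so the explicit source/target pair only matters at the point where bytes come in or go out.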

Justin Ezequiel · Feb 19, 2009

import email
from email.Header import decode_header
from unicodedata import name as un

MS = '''\
Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
Date: Thu, 19 Feb 2009 09:34:56 -0000
MIME-Version: 1.0
Content-Type: text/plain; charset=EUC-JP
Content-Transfer-Encoding: base64

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
'''

def get_header(msg, name):
    # Expects the header to decode to a single (value, charset) pair
    (value, charset), = decode_header(msg.get(name))
    if not charset: return value
    return value.decode(charset)

if __name__ == '__main__':
    msg = email.message_from_string(MS)
    s = get_header(msg, 'Subject')
    print repr(s)
    # Print the Unicode name of each character in the subject
    for c in s:
        try: print un(c)
        except ValueError: print repr(c)
    print

    # Charset from Content-Type; decode=True undoes the base64 encoding
    e = msg.get_content_charset()
    b = msg.get_payload(decode=True).decode(e)
    print repr(b)
    for c in b:
        try: print un(c)
        except ValueError: print repr(c)
    print
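
For reference, a sketch of how the same script might be written for Python 3, where the module is email.header (lowercase) and decode_header returns bytes for encoded parts. The describe helper and the use of make_header to join the decoded parts are my additions, not part of the original script.

```python
import email
import unicodedata
from email.header import decode_header, make_header

MS = """\
Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
Date: Thu, 19 Feb 2009 09:34:56 -0000
MIME-Version: 1.0
Content-Type: text/plain; charset=EUC-JP
Content-Transfer-Encoding: base64

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
"""

def get_header(msg, name):
    # decode_header() yields (value, charset) pairs; make_header() joins
    # them, so headers split into several encoded words also work.
    return str(make_header(decode_header(msg.get(name))))

def describe(text):
    # Unicode name of each character; repr() for unnamed ones such as CR/LF.
    return [unicodedata.name(c, repr(c)) for c in text]

msg = email.message_from_string(MS)
subject = get_header(msg, 'Subject')
# decode=True undoes the base64 transfer encoding and returns bytes.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())

print(subject)
print(describe(subject))
print(repr(body))
print(describe(body))
```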
