Shift-JIS to UTF-8 conversion

PyTJ

Hello everybody,

I need to convert a Japanese Shift-JIS CSV file to Unicode UTF-8.

My machine is an English Windows 98 computer running Python 2.3.4.

Any hints?
 
Jeff Epler

I think you do something like this (untested):

import codecs

def transcode(infile, outfile, incoding="shift-jis", outcoding="utf-8"):
    f = codecs.open(infile, "rb", incoding)
    g = codecs.open(outfile, "wb", outcoding)

    g.write(f.read())
    # If the file is so large that it can't be read at once, do a loop
    # which reads and writes smaller chunks:
    # while 1:
    #     block = f.read(4096000)
    #     if not block: break
    #     g.write(block)

    f.close()
    g.close()
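
For readers on a current interpreter, the chunked loop mentioned in the comments above might look roughly like the sketch below. It assumes Python 3, where a Shift-JIS codec ships with the standard library, and uses the built-in open() rather than codecs.open(). The function name transcode_chunked and the chunk_chars parameter are made up for illustration; this is not the code from the post, only one way the same idea could be written.

```python
def transcode_chunked(infile, outfile, incoding="shift_jis",
                      outcoding="utf-8", chunk_chars=64 * 1024):
    """Copy infile to outfile, re-encoding the text in fixed-size chunks.

    Hypothetical helper: the name and chunk_chars are illustrative only.
    """
    # newline="" turns off newline translation, so the CSV's original
    # line endings (for example CRLF) are passed through unchanged.
    with open(infile, "r", encoding=incoding, newline="") as src, \
         open(outfile, "w", encoding=outcoding, newline="") as dst:
        while True:
            # In text mode read() counts characters, not bytes, so a
            # multi-byte Shift-JIS character is never split by a chunk.
            block = src.read(chunk_chars)
            if not block:
                break
            dst.write(block)
```

Calling transcode_chunked("in.csv", "out.csv") would then convert a file without holding all of it in memory at once; both file names here are placeholders.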

Jeff

 
rbsharp

Hello,
I think the answer is basically correct, but shift-jis is not a standard
part of Python 2.3. You will need either to use Python 2.4, where the
cjkcodecs are integrated, or to install them under Python 2.3. The link is
http://cjkpython.i18n.org/

You then also need:
import cjkcodecs.aliases

Richard
 
George Yoshida

PyTJ said:
I need to convert a Japanese Shift-JIS CSV file to Unicode UTF-8.

My machine is an English Windows 98 computer running Python 2.3.4.

Any hints?

First, you need to install codecs to support Japanese encodings.
Python 2.3.* does not support SJIS by default.

I'll give you two options.

- Japanese Codecs
http://www.python.jp/Zope/download/JapaneseCodecs

http://ftp.python.jp/pub/JapaneseCodecs/JapaneseCodecs-1.4.10.win32-py2.3.exe

- CJKCodecs
http://cjkpython.i18n.org/
http://download.berlios.de/cjkpython/cjkcodecs-1.1.win32-py2.3.exe

If you only need Japanese support, Japanese Codecs might be handy.
On the other hand, CJKCodecs handles a much broader range of encodings.
Besides, starting with 2.4, Python ships with CJKCodecs,
so I'd recommend CJKCodecs without reservations.
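
Whichever package is installed, it may be worth confirming that the interpreter can actually find a Shift-JIS codec before running the conversion. A minimal sketch, assuming only the standard codecs.lookup() call, which raises LookupError for an unknown encoding name; the helper name has_codec is made up for illustration:

```python
import codecs

def has_codec(name):
    """Return True if this interpreter can find a codec called name.

    Hypothetical helper, not part of any of the packages discussed.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        # No registered codec search function recognised the name.
        return False
    return True

if __name__ == "__main__":
    for name in ("shift_jis", "utf-8"):
        print(name, has_codec(name))
```

If has_codec("shift_jis") comes back False, the codecs are presumably not installed yet, or not registered with the interpreter.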

-- george
 
Jeff Epler

rbsharp said:
Hello, I think the answer is basically correct but shift-jis is not a
standard part of Python 2.3.

Ah, I was fooled --- I tested on Python 2.3, but my packager must have
included the codecs you went on to mention.

Jeff

 
