Big5--->GB converter

Gordon Beaton

Can anyone give me an example? I know Java is able to do that.

There seem to be several "GB" encodings like GBK, GB18030, x-EUC-CN
and ISO2022_CN_GB. You can see for yourself here:
http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
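
Which of these names a particular JVM actually accepts can vary with the runtime, so it may be worth checking before relying on one. The following is a minimal sketch, assuming java.nio.charset (available since 1.4); the class name CharsetCheck and the helper isAvailable are made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    /** Returns true if this JVM can resolve the given encoding name. */
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown when the name is null or not a syntactically legal charset name
            return false;
        }
    }

    public static void main(String[] args) {
        // Names taken from the post above, plus the Big5 source encoding
        String[] names = {"Big5", "GBK", "GB18030", "x-EUC-CN", "ISO2022_CN_GB"};
        for (String name : names) {
            System.out.println(name + " supported: " + isAvailable(name));
        }
    }
}
```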

Create an InputStreamReader and OutputStreamWriter, specifying
appropriate encodings for each, then simply copy your data from one to
the other:

// data source
InputStream is = ...
InputStreamReader isr = new InputStreamReader(is,"Big5");
BufferedReader br = new BufferedReader(isr);

// destination
OutputStream os = ...
OutputStreamWriter osw = new OutputStreamWriter(os,"GBK");
BufferedWriter bw = new BufferedWriter(osw);

String line;

while ((line = br.readLine()) != null) {
    bw.write(line);
    bw.newLine();
}

br.close();
bw.close();
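
For reference, here is one way the same idea could look as a complete program. It is only a sketch: the class name Big5ToGbk, the helper names and the command-line arguments are assumptions, not part of the original post. It copies characters in blocks rather than line by line, so the original line terminators are kept as they are.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class Big5ToGbk {
    /** Decodes the input as Big5 and writes the same characters encoded as GBK. */
    static void transcode(InputStream in, OutputStream out) throws IOException {
        try (Reader reader = new BufferedReader(
                 new InputStreamReader(in, Charset.forName("Big5")));
             Writer writer = new BufferedWriter(
                 new OutputStreamWriter(out, Charset.forName("GBK")))) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } // closing the writer flushes any buffered output
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] transcodeBytes(byte[] big5) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(big5), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Big5ToGbk <big5-input> <gbk-output>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            transcode(in, out);
        }
    }
}
```

Note that this only changes the byte encoding. Each character that comes out is the same Unicode character that went in.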

/gordon
 
terry

However, I find that the character returned is incorrect.
For example, 歡 (6B61) returns 欢 (9919),
but I have found that 欢 is 6B22.
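
One way to narrow this down is to look at the Unicode code point after the conversion rather than at the raw bytes. The sketch below assumes that both the Big5 and GBK charsets are available in the JVM and that both contain 歡; the class name CodePointCheck and its helpers are made up for illustration.

```java
import java.nio.charset.Charset;

public class CodePointCheck {
    static final Charset BIG5 = Charset.forName("Big5");
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes Big5 bytes, re-encodes them as GBK, then decodes the GBK bytes again. */
    static String roundTrip(byte[] big5Bytes) {
        String decoded = new String(big5Bytes, BIG5);
        byte[] gbkBytes = decoded.getBytes(GBK);
        return new String(gbkBytes, GBK);
    }

    /** Formats bytes as upper-case hex for printing. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String traditional = "\u6B61"; // 歡
        byte[] big5 = traditional.getBytes(BIG5);
        String result = roundTrip(big5);
        System.out.println("Big5 bytes: " + hex(big5));
        System.out.println("GBK bytes:  " + hex(result.getBytes(GBK)));
        System.out.printf("Code point after conversion: U+%04X%n", result.codePointAt(0));
    }
}
```

If the code point printed at the end is still 6B61, the converter has kept the traditional character and only re-encoded it. It has not replaced it with the simplified form 6B22.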
 
Michael Lee

Converting Big5 text to GB text is not as simple as it seems.

Some facts first:

1. Big5 is the de facto Traditional Chinese encoding scheme.

2. Big5_HKSCS is Big5 plus the Hong Kong Supplementary Character Set, so it
is a superset of Big5. But note that HKSCS is only used in Hong Kong.

3. GBK is the de facto Simplified Chinese encoding scheme.

4. Both Big5_HKSCS and GBK are subsets of Unicode.

5. Part of GBK's character set is also in Big5_HKSCS, i.e. the two overlap.

There is no problem converting Big5 or GBK to Unicode. But due to the facts
listed above, it is obvious that not every character in Big5 has a
corresponding mapping in GBK.
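
Whether a particular piece of text has a mapping in the target encoding can be tested at runtime with CharsetEncoder.canEncode. This is a small sketch; the class name MappingCheck and the helper firstUnmappable are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class MappingCheck {
    /**
     * Returns the index of the first character that the named charset
     * cannot encode, or -1 if the whole string is encodable.
     */
    static int firstUnmappable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            // Test one full code point at a time so surrogate pairs stay together
            String single = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(single)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "\u6B61"; // 歡
        System.out.println("first unmappable index in GBK: "
                + firstUnmappable(sample, "GBK"));
    }
}
```

Running such a check over the decoded text before writing it out makes it visible where the target encoding would otherwise substitute a replacement character.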

It is still possible to perform such a conversion because almost every Big5
character has a corresponding GBK character *linguistically*, but Java's API
doesn't provide any means to perform this kind of conversion.
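
A linguistic conversion of this kind is usually done with a lookup table applied to the decoded text, before it is written out in the target encoding. The following is only a sketch of that idea: the class TraditionalToSimplified and its two-entry table are purely illustrative, and a usable table would need thousands of entries from an external source.

```java
import java.util.HashMap;
import java.util.Map;

public class TraditionalToSimplified {
    // Illustrative table only; a real converter needs a far larger mapping
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u6B61', '\u6B22'); // 歡 -> 欢 (the pair from this thread)
        TABLE.put('\u6F22', '\u6C49'); // 漢 -> 汉
    }

    /** Replaces each character that has a table entry; all others pass through unchanged. */
    static String simplify(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character mapped = TABLE.get(c);
            sb.append(mapped != null ? mapped.charValue() : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String simplified = simplify("\u6B61");
        System.out.printf("U+%04X%n", (int) simplified.charAt(0));
    }
}
```

In the copy loop from earlier in the thread, this would amount to calling simplify(line) on each line read with the Big5 reader before passing it to the GBK writer. A character-by-character table ignores cases where the right simplified form depends on context, so treat it as a starting point only.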


Michael Lee



----- Original Message -----
From: "terry"
Newsgroups: comp.lang.java.programmer
Sent: Sunday, November 02, 2003 11:28 PM
Subject: Re: Big5--->GB converter


However, I find the character returned is incorrect
For example, 歡(6B61) returns (欢)9919.
But I have found that 欢 is 6B22.
 
