Converting Unicode to ASCII

Joona I Palaste

Bill George said:
Any Java code examples for converting Unicode to ASCII?

As Unicode contains thousands of characters that aren't representable in
ASCII, I don't think converting from Unicode to ASCII is possible at
all.

--
/-- Joona Palaste ([email protected]) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo
 
Michael Borgwardt

Bill said:
Any Java code examples for converting Unicode to ASCII?

String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");
 
Michael Borgwardt

Michael said:
Bill said:
Any Java code examples for converting Unicode to ASCII?


String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

Oops, that should be:

byte[] bytes = unicode.getBytes("US-ASCII");
 
Steve Horsley

Any Java code examples for converting Unicode to ASCII?

Many thanks.

That's a very vague question. What is your input?

If it's a Java String, then yes, it is in Unicode, and
byte[] bytes = myString.getBytes("ASCII");
will return the ASCII. All the unicode values that aren't
represented in the ASCII characterset will be converted to '?'.
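
To make the '?' substitution concrete, here is a minimal sketch. It assumes Java 7 or later so that StandardCharsets is available (the Charset overload of getBytes also avoids the checked UnsupportedEncodingException); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class AsciiReplaceDemo {
    // Encodes the String as US-ASCII and decodes it again, so the effect
    // of the encoding step is visible as text.
    static String roundTrip(String input) {
        byte[] bytes = input.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "Unicode: " followed by five katakana characters, as posted earlier
        String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
        System.out.println(roundTrip(unicode)); // prints "Unicode: ?????"
    }
}
```

Each character outside ASCII is replaced by a single '?', so the original text cannot be recovered from the bytes.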

If it's a file on disk, then it isn't Unicode, it's Unicode that's been
encoded into bytes somehow and you need to know how before you can recover
and convert the unicode.
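
As a rough sketch of the file case: the example below assumes the file is known to be UTF-8 (that is purely an assumption, substitute whatever encoding the file really uses) and assumes Java 7 or later for java.nio.file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileToAscii {
    // Decodes the input file using a KNOWN source encoding (UTF-8 is an
    // assumption here), then re-encodes the text as US-ASCII.
    // Characters with no ASCII equivalent are written as '?'.
    static void transcode(Path in, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileToAscii <input file> <output file>
        transcode(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This reads the whole file into memory, which is fine for small files; for large ones a Reader/Writer pair with the same two charsets would be the usual alternative.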

Steve.
 
RC

Bill said:
Any Java code examples for converting Unicode to ASCII?

There are only 256 = 2 to the power 8 (from 00 to ff) values in ASCII.
Unicode runs from 0000 to ffff (2 to the power 16 = 65,536).
Everything in Unicode from 0000 to 00ff is exactly the same as
ASCII. You cannot convert Unicode values greater than 00ff to ASCII,
right?!

But you can convert other encodings (like Chinese, Japanese, Korean) into
Unicode.
 
John O'Conner

RC said:
There are only 256 = 2 to the power 8 (from 00 to ff) values in ASCII.
Unicode runs from 0000 to ffff (2 to the power 16 = 65,536).
Everything in Unicode from 0000 to 00ff is exactly the same as
ASCII. You cannot convert Unicode values greater than 00ff to ASCII,
right?!


Almost right...everything from 0x0000 through 0x007F is exactly the
same as ASCII. Everything from 0x0000 through 0x00FF is the same as ISO
Latin 1.

The safest way to create ASCII text is to use the built-in converters, i.e.
byte[] ascii = someString.getBytes("ASCII");
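
If silently turning unmappable characters into '?' is not acceptable, one alternative to getBytes is a CharsetEncoder configured to report errors. This is a different technique from the one-liner above, shown only as a sketch; it assumes Java 7 or later for StandardCharsets, and the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictAscii {
    // Encodes to US-ASCII, but throws instead of substituting '?'
    // when a character cannot be represented.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // True when every character of s is representable in US-ASCII.
    static boolean isPureAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(isPureAscii("plain text"));
        System.out.println(isPureAscii("caf\u00e9"));
        System.out.println(encodeStrict("plain text").length);
    }
}
```

Checking with isPureAscii first lets the caller decide what to do with non-ASCII input before any data is lost.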
 
Joona I Palaste

Almost right...everything from 0x0000 through 0x007F is exactly the
same as ASCII. Everything from 0x0000 through 0x00FF is the same as ISO
Latin 1.
The safest way to create ASCII text is to use the built-in converters, i.e.
byte[] ascii = someString.getBytes("ASCII");

Isn't ISO Latin 1 a superset of ASCII? In that case you could simply say
that everything from 0x0000 through 0x00FF is the same as ISO Latin 1.

--
/-- Joona Palaste ([email protected]) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"The truth is out there, man! Way out there!"
- Professor Ashfield
 
John O'Conner

Joona said:
Isn't ISO Latin 1 a superset of ASCII? In that case you could simply say
that everything from 0x0000 through 0x00FF is the same as ISO Latin 1.


Yes you are correct. However, the original post asked about ASCII. Only
telling them that 0x0000 through 0x00FF is the same as ISO Latin 1 would
not have helped them.

ASCII (0x0000 through 0x007F) is a subset of Latin 1 (0x0000 through
0x00FF). Both are contained within Unicode using the same codepoints.
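
The ranges can be checked directly on a char, since a Java char holds the code point value for everything in this range. A small sketch, with hypothetical helper names and assuming Java 7 or later for StandardCharsets:

```java
import java.nio.charset.StandardCharsets;

public class CodePointRanges {
    // ASCII covers 0x0000 through 0x007F.
    static boolean isAscii(char c) {
        return c <= 0x7F;
    }

    // ISO Latin 1 covers 0x0000 through 0x00FF, so it includes all of ASCII.
    static boolean isLatin1(char c) {
        return c <= 0xFF;
    }

    public static void main(String[] args) {
        char eAcute = '\u00e9';   // in Latin 1 but not in ASCII
        char katakana = '\u30e6'; // in neither
        System.out.println(isAscii('A') + " " + isLatin1('A'));
        System.out.println(isAscii(eAcute) + " " + isLatin1(eAcute));
        System.out.println(isAscii(katakana) + " " + isLatin1(katakana));

        // An ASCII character encodes to the same single byte in both charsets.
        byte a1 = "A".getBytes(StandardCharsets.US_ASCII)[0];
        byte a2 = "A".getBytes(StandardCharsets.ISO_8859_1)[0];
        System.out.println(a1 == a2);
    }
}
```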
 
Roedy Green

String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

does anything but Java source code understand \uxxxx sequences?
 
Jon Skeet

Roedy Green said:
String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

does anything but Java source code understand \uxxxx sequences?

Property files do - and it's also used in C#.
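
For the property file case, a minimal sketch: java.util.Properties translates \uXXXX escapes while loading. The example feeds the text through a StringReader instead of a real file, which assumes Java 6 or later for load(Reader); the key name and helper method are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesEscapeDemo {
    // Loads property-file text and returns the value stored under key.
    static String loadValue(String fileText, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(fileText));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The doubled backslash keeps the escape as literal text in the
        // "file", so Properties, not the Java compiler, translates it.
        String fileText = "name=\\u30e6";
        String value = loadValue(fileText, "name");
        System.out.println(value.length());             // prints 1
        System.out.println(value.charAt(0) == 0x30e6);  // prints true
    }
}
```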
 
KC Wong

String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

does anything but Java source code understand \uxxxx sequences?

Property files do - and it's also used in C#.

Be careful with the escape sequences though - you could break your code if
the Unicode escapes are converted to newlines, quotes, etc.


KC.
 
Michael Borgwardt

Roedy Green said:
String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

does anything but Java source code understand \uxxxx sequences?

C# uses the same syntax for escape sequences. I don't know
if there's anything else.
 
Chris Smith

KC said:
String unicode = "Unicode: \u30e6\u30eb\u30b3\u30fc\u30c9";
byte[] bytes = String.getBytes("US_ASCII");

does anything but Java source code understand \uxxxx sequences?

Property files do - and it's also used in C#.

Be careful with the escape sequences though - you could break your code if
the Unicode escapes are converted to newlines, quotes, etc.

Right. Unicode escapes are used to avoid typing characters that aren't
on your keyboard, NOT to escape characters with meaning to the language.
They are a whole different ball of wax from the traditional C-style
escape sequences for strings.

Of course, I could also argue that anyone who tries using the Unicode
value of a quote or newline in preference to \n or \" deserves whatever
they get. :)
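
A small sketch of what goes wrong, using \u0022 (the double quote). The Java compiler translates Unicode escapes before it splits the source into tokens, so an escaped quote behaves exactly like a typed one. The class and method names are just for illustration.

```java
public class UnicodeEscapeDemo {
    // The two escaped double quotes below END and RESTART the string
    // literal, because they are translated before tokenizing.
    // After translation the expression reads:  "a".length() + "b"
    static String surprising() {
        return "a\u0022.length() + \u0022b";
    }

    // The conventional C-style escape stays inside the literal.
    static String conventional() {
        return "a\".length() + \"b";
    }

    public static void main(String[] args) {
        System.out.println(surprising());   // prints 1b
        System.out.println(conventional()); // prints a".length() + "b
    }
}
```

The same early translation applies inside comments, which is why a newline escape in a comment can break a source file.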

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
