Creating a Java UTF-8 string


static

Hi,

I have a byte array of data in UTF-8 format and I would like to
convert it to a string in UTF-8 format. I've tried doing

// b_array_utf8 is a byte array holding UTF-8 encoded data

// This converts it to Unicode, so now str is a Unicode string
String str = new String(b_array_utf8, "UTF-8");

How can I convert str to a UTF-8 String? If I call
str.getBytes("UTF-8"), it will encode it back to bytes, but I really need a
String in UTF-8 format.

Thanks in advance :)

a
 
Ian Pilcher

static said:
How can I convert str to a UTF-8 String? If I call
str.getBytes("UTF-8"), it will encode it back to bytes, but I really need a
String in UTF-8 format.

What do you mean by "a String in UTF-8 format"? Java Strings are
composed of 16-bit chars, so they are effectively UTF-16 (although
surrogate pairs aren't handled properly until Java 1.5). UTF-8 is an
appropriate encoding for an array of bytes, which you already have.
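
To make the distinction between bytes and chars concrete, here is a minimal sketch. It assumes Java 7 or later for `StandardCharsets`; the class name `DecodeUtf8Demo` and the sample data are made up for illustration and do not come from the original post.

```java
import java.nio.charset.StandardCharsets;

final class DecodeUtf8Demo {
    // Decodes UTF-8 bytes into a String, which exposes 16-bit chars.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9, which takes two bytes in UTF-8 (0xC3 0xA9).
        byte[] bArrayUtf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        String str = decode(bArrayUtf8);

        System.out.println(bArrayUtf8.length);                    // 5 (bytes)
        System.out.println(str.length());                         // 4 (chars)
        System.out.println(Integer.toHexString(str.charAt(3)));   // e9
    }
}
```

The two UTF-8 bytes collapse into a single char whose value is the Unicode code point, which is why the String no longer "looks like" UTF-8.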
 
Michael Borgwardt

static said:
How can I convert str to a UTF-8 String?

There is no such thing as a "UTF-8 String". A String is composed of characters,
whereas UTF-8 is a method of converting between characters and bytes.
 
Anton Spaans

static said:
How can I convert str to a UTF-8 String? If I call
str.getBytes("UTF-8"), it will encode it back to bytes, but I really need a
String in UTF-8 format.

Like the other posters said: there is no such thing as a UTF-8 String. A String
is a sequence of characters, each of which can be returned by the
charAt(int pos) method. Maybe you want to iterate over the characters of the
String and see each character in its UTF-8 encoding?

So

for (int i = 0; i < str.length(); i++)
{
    // kar is UTF-16
    char kar = str.charAt(i);
}

won't work because kar will be UTF-16.

Are you trying to do something like this?

byte[] utf8 = str.getBytes("UTF-8");
for (int i = 0; i < str.length(); i++)
{
    // kar is UTF-8
    // (getUTF8CharAt is a hypothetical helper, not a JDK method)
    int kar = getUTF8CharAt(utf8, i);
}

Is this what you're looking for, the implementation of the getUTF8CharAt(...)
method?

-- Anton.
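
One way to get at "each character in its UTF-8 encoding" without writing a byte-level helper is to walk the String by code point and encode each one separately. This is only a sketch of that idea, not the getUTF8CharAt method discussed above: the class and method names are invented, and it assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8PerCharacter {
    // Returns the UTF-8 encoding of each code point in str, in order.
    static List<byte[]> utf8PerCodePoint(String str) {
        List<byte[]> result = new ArrayList<>();
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            String one = new String(Character.toChars(codePoint));
            result.add(one.getBytes(StandardCharsets.UTF_8));
            // Advance by 1 char, or by 2 for a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', U+00E9 and U+20AC need 1, 2 and 3 UTF-8 bytes respectively.
        for (byte[] encoded : utf8PerCodePoint("a\u00e9\u20ac")) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encoded) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(hex.toString().trim());
        }
        // prints:
        // 61
        // C3 A9
        // E2 82 AC
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane together, which a plain charAt loop would split into two surrogates.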
 
static

Michael Borgwardt said:
There is no such thing as a "UTF-8 String". A String is composed of characters,
whereas UTF-8 is a method of converting between characters and bytes.

Sorry guys...I guess I didn't make myself clear. I have a byte
array in UTF-8 format and would like to transfer all of that data to a
String without any character loss. So the String would contain all of
the UTF-8 data. Any ideas? Thanks.
 
Anton Spaans

static said:
Sorry guys...I guess I didn't make myself clear. I have a byte
array in UTF-8 format and would like to transfer all of that data to a
String without any character loss. So the String would contain all of
the UTF-8 data. Any ideas? Thanks.

So, you want to store a byte-array (in this case it contains characters in
UTF-8 format) into a String in such a way that the byte-array does not get
changed/encoded? You want to circumvent the UTF-8 to UTF-16 encoding? I
don't think that is possible.

A String contains an array of 'char', not an array of 'byte', and a char is
a UTF-16 code unit. A 'byte' is not a 'char', so conversion is necessary.
Any String constructor taking a byte array will do some kind of conversion
on the input byte array (to properly convert it into a char array).

Question: Why would you want to store a byte-array in a String if you are
just interested in the contents of the byte-array?
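
On the "without any character loss" concern: the conversion itself loses nothing when the input is valid UTF-8, but the plain String constructor silently substitutes a replacement character for malformed input. The following sketch uses a CharsetDecoder configured to report errors instead, so bad input fails loudly. The class and method names are hypothetical, and Java 7 or later is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StrictUtf8 {
    // Decodes UTF-8 strictly: malformed input raises an exception
    // instead of being replaced.
    static String decodeStrict(byte[] utf8Bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8Bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] valid = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // U+20AC
        String str = decodeStrict(valid);
        // Re-encoding a strictly decoded String gives back the same bytes.
        System.out.println(Arrays.equals(valid, str.getBytes(StandardCharsets.UTF_8))); // true

        byte[] malformed = {(byte) 0xC3}; // truncated two-byte sequence
        try {
            decodeStrict(malformed);
        } catch (CharacterCodingException e) {
            System.out.println("malformed input detected");
        }
    }
}
```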
 
A. Bolmarcich

static said:
Sorry guys...I guess I didn't make myself clear. I have a byte
array in UTF-8 format and would like to transfer all of that data to a
String without any character loss. So the String would contain all of
the UTF-8 data. Any ideas? Thanks.

Is the value of

new String(b, "ISO8859_1")

where b is the byte array what you mean by a "UTF-8 String"?
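
For readers unsure what the ISO-8859-1 suggestion would do: that charset maps every byte value 0x00 to 0xFF to the char with the same number, so the String ends up carrying the raw bytes one per char. A small sketch of the effect follows; the names `Latin1Carrier`, `wrap` and `unwrap` are invented for this example, and Java 7 or later is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1Carrier {
    // Each byte becomes the char with the same numeric value.
    static String wrap(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Reverses wrap(): each char 0x00..0xFF becomes one byte again.
    static byte[] unwrap(String carrier) {
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        String carrier = wrap(b);

        System.out.println(carrier.length());                        // 2
        System.out.println(Integer.toHexString(carrier.charAt(0)));  // c3
        System.out.println(Arrays.equals(b, unwrap(carrier)));       // true
    }
}
```

Note that such a String is only a container for bytes: it has two chars here, not the single character the bytes encode, so it is not meaningful as text.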
 
Michael Borgwardt

static said:
Sorry guys...I guess I didn't make myself clear. I have a byte
array in UTF-8 format and would like to transfer all of that data to a
string without any character loss. So the String would contain all of
the UTF-8 data. Any ideas?

No problem whatsoever then. Just use UTF-8 encoding for your Readers
and Writers (or whatever other ways you use to convert between bytes
and characters) and nothing can go wrong.
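
As an illustration of that advice, here is a sketch that names UTF-8 explicitly on both a Writer and a Reader rather than relying on the platform default. It uses in-memory streams so it is self-contained; the class and method names are assumptions for the example, and Java 7 or later is assumed for try-with-resources and `StandardCharsets`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8Streams {
    // Characters -> bytes, with the encoding named explicitly.
    static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Bytes -> characters, with the same encoding.
    static String readUtf8(byte[] utf8Bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";
        byte[] bytes = writeUtf8(text);
        System.out.println(bytes.length);                  // 5
        System.out.println(readUtf8(bytes).equals(text));  // true
    }
}
```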
 
