check if string is utf-8

alan_sec

Hi.
Is there any method or class in java for checking if string is utf-8
encoded?
I have to check if string is utf-8 encoded, and if not I have to
replace non-utf-8 characters with "?".
Any suggestion would be nice.
Thanks.
Alan
 
Guest

alan_sec said:
Is there any method or class in java for checking if string is utf-8
encoded?
I have to check if string is utf-8 encoded, and if not I have to
replace non-utf-8 characters with "?".

Do you have a byte[] that you need to check is valid UTF-8?

Or do you have a String (which is always UTF-16 internally!) and
want to check whether UTF-8 bytes were decoded as if they were
ISO-8859-1 bytes?

Or something else?
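
The second case above (UTF-8 bytes decoded as ISO-8859-1) can be sketched roughly as follows. This is only an illustration: the class name MisdecodedUtf8 and the method repair are made up for the example, and the round trip only works if the String contains nothing outside the ISO-8859-1 range, since anything else is lost when encoding back to bytes.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodedUtf8 {
    // Hypothetical helper: encode with ISO-8859-1 to recover the raw bytes,
    // then decode those bytes as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Wrong charset on purpose: each UTF-8 byte becomes one char.
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());
        System.out.println(repair(misdecoded).equals(original));
    }
}
```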

Arne
 
Arne Vajhøj

alan_sec said:
I have a byte[] that I need to check for valid UTF-8.

There is some useful stuff in java.nio!

I don't know your context, but here is a code
snippet to illustrate one of the possibilities:

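import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Returns true if b is a valid byte sequence in the charset named csnam.
public static boolean test(byte[] b, String csnam) {
    // A fresh decoder reports malformed input by throwing an exception.
    CharsetDecoder cd = Charset.forName(csnam).newDecoder();
    try {
        cd.decode(ByteBuffer.wrap(b));
    } catch (CharacterCodingException e) {
        return false; // malformed or unmappable bytes
    }
    return true;
}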

It is probably not the best fit for your application, because
you could instead do the conversion and specify a replacement
for invalid input.
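
A minimal sketch of that alternative, assuming the goal is to turn every invalid byte sequence into "?" while decoding. The class name Utf8Sanitizer and the method sanitize are invented for the example; the decoder calls themselves are standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Sanitizer {
    // Hypothetical helper: decode bytes as UTF-8, substituting "?" for
    // anything that is not valid UTF-8.
    static String sanitize(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?");
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = { 'a', (byte) 0xFF, 'b' }; // 0xFF never occurs in UTF-8
        System.out.println(sanitize(input));
    }
}
```

Without the replaceWith call the decoder would substitute U+FFFD, its default replacement, rather than "?".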

Arne
 
Chris Uppal

alan_sec said:
Is there any method or class in java for checking if string is utf-8
encoded?

Strings (instances of java.lang.String) are /never/ UTF-8 encoded. The only
kind of data that can possibly be UTF-8 is binary data (such as a byte[]).

(OK, you /can/ use instances of java.lang.String to hold binary data, but it's a
/really bad/ idea.)
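
To illustrate the distinction, here is a small sketch showing that an encoding only comes into play when a String is converted to bytes. The class name StringVsBytes and the method encodedLength are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringVsBytes {
    // Hypothetical helper: number of bytes the string occupies in a given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // one character: e with acute accent
        System.out.println(s.length());                                    // chars, not bytes
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // bytes as UTF-8
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // bytes as Latin-1
    }
}
```

The same String yields different byte sequences depending on the charset chosen, which is why "is this String UTF-8?" is a question about the bytes it came from, not about the String.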

I have to check if string is utf-8 encoded, and if not I have to
replace non-utf-8 characters with "?".

That doesn't really make sense. Perhaps you could say more about what you
are trying to do, where your data comes from (how you are reading it), and how
you represent it in your program.

-- chris
 
Tor Iver Wilhelmsen

alan_sec said:
Is there any method or class in java for checking if string is utf-8
encoded?
I have to check if string is utf-8 encoded, and if not I have to
replace non-utf-8 characters with "?".

The String constructor will do that for you - or use CharsetDecoder,
the class that the String(byte[], ...) javadocs refer to.
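
A rough sketch of the constructor route. One caveat: the constructor substitutes the charset's default replacement (U+FFFD for UTF-8), not "?", so the example swaps it afterwards. The class name LenientDecode and the method decodeLenient are invented for illustration, and the swap would also affect any genuine U+FFFD already present in the input.

```java
import java.nio.charset.StandardCharsets;

public class LenientDecode {
    // Hypothetical helper: let the String constructor replace invalid UTF-8
    // with U+FFFD, then turn each U+FFFD into '?'.
    static String decodeLenient(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.replace('\uFFFD', '?');
    }

    public static void main(String[] args) {
        byte[] input = { 'a', (byte) 0xFF, 'b' }; // 0xFF is never valid in UTF-8
        System.out.println(decodeLenient(input));
    }
}
```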