JDBC: Western European / UTF-8

Jeff

Is there a way to determine if what I pull out of a database is either
Western European or UTF-8?

TIA,
Jeff
 
Jon Skeet

Jeff said:
Is there a way to determine if what I pull out of a database is either
Western European or UTF-8?

If you're pulling out a string, it should just be a string - which
means it's just in Unicode. Whether or not it was stored in the
database as Western European or UTF-8 should be irrelevant. Can I ask
what you're concerned about?
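
To illustrate the point, here is a minimal sketch (not JDBC code) showing that the same character stored under two different encodings decodes to the same Java String. It assumes "Western European" means ISO-8859-1; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decode bytes that were written as ISO-8859-1 (assumed meaning of "Western European").
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decode bytes that were written as UTF-8.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};             // U+00E9 (e with acute) in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};  // the same character in UTF-8

        String a = decodeLatin1(latin1);
        String b = decodeUtf8(utf8);

        // Once decoded, the storage encoding is no longer visible.
        System.out.println(a.equals(b));
    }
}
```

With a correctly configured driver, this decoding step happens before the String reaches your code, so the question normally only arises if you read the raw bytes yourself.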
 
Thomas Weidenfeller

Jeff said:
Is there a way to determine if what I pull out of a database is either
Western European or UTF-8?

"Western European" is not a precise definition of a character set. I
assume you mean some 8-bit character set, such as ISO-8859-1 or
Windows-1252.

IMHO it is difficult, but not impossible. The first 128 characters are
encoded identically in UTF-8 and ASCII. So if your data contains only
bytes in the range 0 - 127, it is impossible to tell the two apart.
However, if you have bytes with bit 8 set to 1, you can check whether
each such byte and the bytes following it match the UTF-8 encoding
pattern.

UTF-8 encodes each character with one of the following byte patterns:

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

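The number of leading 1 bits in the first byte gives the total length of
the sequence. (The original UTF-8 definition continued this scheme up to
six bytes; the current standard, RFC 3629, stops at four.)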

So if every byte with the 8th bit set to 1 fits the above pattern, there
is a high probability that the data is UTF-8. Whether that is good enough
depends on your application.
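
The check described above could be sketched roughly as follows. This is only a pattern test under the stated assumptions: it works on raw bytes (for example from ResultSet.getBytes), it stops at four-byte sequences, and it does not reject overlong encodings or surrogates. The class and method names are invented for the example.

```java
final class Utf8Check {
    // Returns true if every byte fits the UTF-8 lead/continuation byte patterns.
    static boolean looksLikeUtf8(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            int trailing; // number of 10xxxxxx bytes expected after this one
            if ((b & 0x80) == 0x00) {
                trailing = 0;             // 0xxxxxxx
            } else if ((b & 0xE0) == 0xC0) {
                trailing = 1;             // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                trailing = 2;             // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                trailing = 3;             // 11110xxx
            } else {
                return false;             // stray continuation byte or invalid lead byte
            }
            if (i + trailing >= data.length) {
                return false;             // sequence is cut off at the end of the data
            }
            for (int k = 1; k <= trailing; k++) {
                if ((data[i + k] & 0xC0) != 0x80) {
                    return false;         // expected 10xxxxxx
                }
            }
            i += trailing + 1;
        }
        return true;
    }

    // Pure ASCII data matches both encodings, so the check above proves nothing for it.
    static boolean hasHighBytes(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }
}
```

A result is only informative when hasHighBytes(data) is true. If stricter validation is needed, decoding with a java.nio.charset.CharsetDecoder configured with CodingErrorAction.REPORT is probably a better choice than a hand-written check.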

/Thomas
 
