Charset auto detector


a.l

Hi folks,

Do you know if there is a way to automatically detect the charset of a
byte array? I would like to decode a byte array with the right charset
decoder, without knowing which charset was used to encode it.

The CharsetDecoder class has an "isAutoDetecting" boolean method, which
suggests that there should be a 'generic' charset decoder implementation
that can auto-detect the charset. Am I right?


Any suggestion would be appreciated,

Thanks, folks!


Antoine Larcher
 

Alan Moore

Unfortunately, that auto-detect feature is very limited. If you know
you're reading Chinese text, but don't know which of the several
Chinese encodings it was written in, you can use an auto-detecting
"wrapper" Charset that figures it out for you. I think there's one
for Japanese text as well, but there's no built-in universal
auto-detecting Charset.
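
To see which auto-detecting charsets a given JVM actually ships, something like the following sketch may help. It simply asks every installed charset's decoder whether it reports isAutoDetecting(). The class and method names (AutoDetectingCharsets, find) are made up for illustration, and the result depends on the runtime: the list may contain only a Japanese auto-detect charset, or it may be empty.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists installed charsets whose decoder claims to auto-detect.
class AutoDetectingCharsets {

    static List<String> find() {
        List<String> names = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            try {
                // The default CharsetDecoder.isAutoDetecting() returns false;
                // only special "wrapper" charsets override it.
                if (cs.newDecoder().isAutoDetecting()) {
                    names.add(cs.name());
                }
            } catch (UnsupportedOperationException e) {
                // Defensive: skip any charset that cannot create a decoder.
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM and installed charset providers.
        System.out.println(find());
    }
}
```

An ordinary charset such as UTF-8 never shows up in this list, which is why there is nothing "universal" to pick from.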

I use this tool:

http://glaforge.free.fr/wiki/index.php?wiki=GuessEncoding

It only works with a limited set of Unicode and Western encodings, but
it's perfect for my needs. If you need something with broader
applicability, look for the CharDet package from Mozilla.
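
For the simple "Unicode or Western" case, a rough heuristic can be sketched in plain Java. This is not how GuessEncoding or CharDet work internally (I have not reproduced their logic); it is only an illustration under these assumptions: a byte order mark, if present, is trusted; otherwise the bytes are treated as UTF-8 if they decode strictly without errors; otherwise a caller-supplied fallback such as ISO-8859-1 is used. The names SimpleCharsetGuesser, guess and decode are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: BOM check, then strict UTF-8 validation, then fallback.
class SimpleCharsetGuesser {

    static Charset guess(byte[] data, Charset fallback) {
        // 1. Trust a byte order mark if there is one.
        //    (FF FE could also start a UTF-32LE BOM; that case is ignored here.)
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Strictly decode as UTF-8; REPORT makes malformed input throw
        //    instead of being silently replaced.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8: use the caller's guess for legacy text.
            return fallback;
        }
    }

    // Decodes with the guessed charset. A leading BOM, if any, is not stripped.
    static String decode(byte[] data, Charset fallback) {
        return new String(data, guess(data, fallback));
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Typical use would be SimpleCharsetGuesser.decode(bytes, StandardCharsets.ISO_8859_1). Note that pure ASCII input is reported as UTF-8, and that the fallback is only a guess: this approach cannot tell one single-byte Western encoding from another.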
 
