Is there any way to discover what charset encoding a file is using?

James

Hi all,

Is there any way to discover what charset encoding a file is
actually using by reading its content?

For example, I may have a file that contains some Japanese characters.
How could I determine whether those characters are actually Japanese ones?

Thank You.

James
 
Michael Borgwardt

James said:
Is there any way to discover what charset encoding a file is
actually using by reading its content?

Not with anything remotely approaching certainty.

James said:
For example, I may have a file that contains some Japanese characters.
How could I determine whether those characters are actually Japanese ones?

Your best bet would be to take some common Japanese words, encode them
in each of the three(!) charsets commonly used in Japan, plus UTF-8
and UTF-16, and look for matches.
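
A rough sketch of that word-matching idea, in Python. Everything named here is an assumption for illustration: the probe words, the function names, and the reading of "the three charsets" as Shift_JIS, EUC-JP and ISO-2022-JP. It is only a heuristic and can be fooled by short or unusual files.

```python
# Hypothetical probe words; any short, very common Japanese words would do.
COMMON_WORDS = ["です", "ます", "日本"]

# Assumed candidates: three legacy Japanese charsets plus UTF-8 and UTF-16
# (both byte orders, since a file may lack a byte-order mark).
CANDIDATES = ["shift_jis", "euc_jp", "iso2022_jp", "utf-8", "utf-16-le", "utf-16-be"]


def count_hits(data: bytes, needle: bytes, alignment: int = 1) -> int:
    """Count occurrences of needle that start at a multiple of alignment."""
    hits = 0
    pos = data.find(needle)
    while pos != -1:
        if pos % alignment == 0:
            hits += 1
        pos = data.find(needle, pos + 1)
    return hits


def guess_encoding(data: bytes) -> list[tuple[str, int]]:
    """Return (encoding, hit count) pairs with at least one hit, best first."""
    scores = []
    for enc in CANDIDATES:
        # ISO-2022-JP only makes sense if the data switches to a two-byte set.
        if enc == "iso2022_jp" and b"\x1b$" not in data:
            continue
        # UTF-16 code units sit on even offsets; this avoids cross-unit matches.
        alignment = 2 if enc.startswith("utf-16") else 1
        hits = 0
        for word in COMMON_WORDS:
            needle = word.encode(enc)
            if enc == "iso2022_jp":
                # Drop the escape sequences the encoder wraps around the word.
                needle = needle.removeprefix(b"\x1b$B").removesuffix(b"\x1b(B")
            hits += count_hits(data, needle, alignment)
        if hits:
            scores.append((enc, hits))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


sample = "これは日本語のテキストです。よろしくお願いします。".encode("euc_jp")
print(guess_encoding(sample))
```

An empty result means none of the probe words turned up, which says nothing either way about the file. A longer word list should make accidental matches less likely.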

If you just have a file that might be any language in any encoding,
you're pretty much f*cked. In the worst case, it might be a *mix*
of languages encoded in ISO-2022 (which, if I understood it correctly,
is stateful and uses special command sequences to switch between modes
in which different languages can be encoded).
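
To illustrate the stateful switching, here is a small sketch using Python's iso2022_jp codec (the Japanese profile of ISO-2022). The helper name is made up, and it assumes every escape sequence is three bytes long, which holds for the sequences this codec emits here but not for ISO-2022 in general.

```python
ESC = b"\x1b"


def iso2022_jp_switches(data: bytes) -> list[bytes]:
    """Return the 3-byte escape sequences found in data, in order."""
    return [data[i:i + 3] for i in range(len(data) - 2) if data[i:i + 1] == ESC]


mixed = "abc 日本語 xyz".encode("iso2022_jp")
print(mixed)
print(iso2022_jp_switches(mixed))
```

The encoded bytes are all 7-bit, so without tracking the escape sequences there is no way to tell whether a given byte pair is a Japanese character or two ASCII characters.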
 
