Determining possible encodings of a given text

Nordlöw

How do I efficiently determine which possible encoding(s) a given text
is in? Can I use the iconv.h api somehow?

Thanks in advance,
Nordlöw
 
Jens Thoms Toerring

In comp.lang.c Nordloew said:
> How do I efficiently determine which possible encoding(s) a given text
> is in? Can I use the iconv.h api somehow?

Sorry, but that's not a question about the C programming
language but about a specific task and libraries (which may
be written in C, but that doesn't make it on-topic). The basic
question would remain the same whether you use C, C++, Perl
or any other programming language.

So just a few hints: figuring out which encoding a file uses is
probably a very difficult task, since it would require the
program to understand something about the content of the file.
It's probably possible to make a well-educated guess if the
file is long enough, but a method that always gets it right is,
as far as I can see, impossible. And libiconv isn't going to be
of any help: it converts from an already known encoding to
another and doesn't try to guess the source encoding (except in
the most trivial way, using the locale-dependent character
encoding when no source encoding has been specified).
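
That said, iconv can at least rule candidates out: a conversion
that fails with EILSEQ or EINVAL means the input is not valid in the
encoding you named. Below is a minimal sketch of that idea, assuming
a POSIX iconv(). The function name decodes_as is made up for
illustration and is not part of any library, and a "true" result
only means "not ruled out" (single-byte encodings such as
ISO-8859-1 typically accept any byte sequence).

```c
#include <errno.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..n) converts cleanly
 * from the encoding named by `candidate` to UTF-8. A false result
 * rules the candidate out; a true result proves nothing more than
 * that the bytes are valid in that encoding. */
bool decodes_as(const char *candidate, const char *buf, size_t n)
{
    iconv_t cd = iconv_open("UTF-8", candidate);
    if (cd == (iconv_t)-1)
        return false;          /* name not supported by this iconv */

    char *in = (char *)buf;    /* iconv wants char **, input is only read */
    size_t in_left = n;
    bool ok = true;

    while (in_left > 0) {
        char scratch[256];     /* converted output is discarded */
        char *out = scratch;
        size_t out_left = sizeof scratch;

        if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1) {
            if (errno == E2BIG)
                continue;      /* scratch buffer full, keep going */
            ok = false;        /* EILSEQ or EINVAL: invalid input */
            break;
        }
    }

    iconv_close(cd);
    return ok;
}
```

You would call it once per candidate, e.g. decodes_as("UTF-8", buf, n),
and drop every encoding for which it returns false. Choosing among
the survivors still needs some knowledge about the content.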

If you're interested in a more in-depth discussion it probably
would make sense to post to comp.programming instead.

Regards, Jens
 
Richard Tobin

> How do I efficiently determine which possible encoding(s) a given text
> is in? Can I use the iconv.h api somehow?

What do you need to know?

If it doesn't contain any bytes above 127, it's probably ASCII. If it
contains lots of zero bytes in the even or odd positions, it's probably
UTF-16. If it contains bytes above 127 *and* they're consistent with
UTF-8, then it's almost certainly UTF-8. If it contains a small
proportion of bytes above 127, it's quite likely some ISO-Latin-N
encoding. I don't know much about Far Eastern encodings.
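
The rules of thumb above could be sketched in C roughly as follows.
This is only an illustration: the enum, the function names, and
both thresholds (half of the even or odd positions being zero, and
fewer than 30% high bytes) are assumptions picked for the example,
not established values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the guesses; not from any library. */
enum guess { GUESS_ASCII, GUESS_UTF8, GUESS_UTF16, GUESS_LATIN, GUESS_UNKNOWN };

/* True if s[0..n) is well-formed UTF-8 (rejects overlong forms,
 * surrogates and code points above U+10FFFF). */
static bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp, min;

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;               /* stray continuation or bad lead byte */

        if (n - i < len) return false;   /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

/* Applies the heuristics in order; the result is a guess, not a proof. */
enum guess guess_encoding(const unsigned char *buf, size_t n)
{
    size_t high = 0, zero_even = 0, zero_odd = 0;

    if (n == 0) return GUESS_UNKNOWN;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] > 127) high++;
        if (buf[i] == 0) {
            if (i % 2 == 0) zero_even++; else zero_odd++;
        }
    }

    /* Assumed threshold: at least half of the even (or odd) positions are zero. */
    if (zero_even * 4 >= n || zero_odd * 4 >= n) return GUESS_UTF16;
    if (high == 0) return GUESS_ASCII;
    if (is_valid_utf8(buf, n)) return GUESS_UTF8;
    /* Assumed threshold: fewer than 30% of the bytes are above 127. */
    if (high * 10 < n * 3) return GUESS_LATIN;
    return GUESS_UNKNOWN;
}
```

The UTF-16 check comes first because zero bytes are below 128 and
would otherwise make the text look like ASCII. Short inputs give
these counts very little to work with, so treat the result as a
hint only.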

You might look at http://jchardet.sourceforge.net/

-- Richard
 
