Anno Siegel ([email protected]) wrote:
: >
: > > Is there any module/script that can detect the language of a text
: > > with 100% accuracy, for example English, German, French, ...
: > > (European languages, at least English)?
: >
: > 100%? No. What language is this string: "hotel"?
: Well, one-word-samples are hard, and 100% is unattainable.
: Entirely off topic, I have recently heard of an approach to text
: classification (with an eye to language recognition) that I found
: interesting.
: Use a Ziv-Lempel-like method to compress your sample. Then concatenate
: it with texts of similar lengths taken from known languages and compress
: again. If the compression rate is similar to or better than that of the
: original text, the appended text is similar to the original one. If
: the compression deteriorates, the texts are dissimilar.
: The source (some idle chat on IRC, sorry) said that this works for
: rather small samples of fewer than a hundred words. I have always been
: meaning to play with it, but haven't got around to it.
: Anno
Sounds reasonable; basically it would be testing for similarity of letter
sequences.
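
For what it's worth, here is a minimal Python sketch of the idea as I
understand it from the description above. It uses zlib (DEFLATE, which is
LZ77-based) as the Ziv-Lempel-like compressor. Instead of comparing
compression rates directly, it measures how many extra compressed bytes the
sample costs when appended to each reference text; that is my own variation,
not necessarily what the IRC source had in mind. The names compressed_size,
extra_cost, guess_language and the tiny REFERENCES texts are made up for
illustration.

```python
import zlib

# Hypothetical reference texts, one per language. These are far too short
# for serious use; they only illustrate the mechanics.
REFERENCES = {
    "English": ("the weather was fine and we went for a long walk through "
                "the woods before the rain came back in the evening").encode("utf-8"),
    "German": ("das wetter war schoen und wir gingen lange durch den wald "
               "spazieren bevor der regen am abend wieder kam").encode("utf-8"),
    "French": ("il faisait beau et nous avons fait une longue promenade dans "
               "la foret avant que la pluie ne revienne le soir").encode("utf-8"),
}


def compressed_size(data):
    """Number of bytes zlib needs for data at the highest compression level."""
    return len(zlib.compress(data, 9))


def extra_cost(reference, sample):
    """Extra compressed bytes needed when sample is appended to reference.

    If the sample shares many letter sequences with the reference, the
    compressor can point back into the reference and the cost stays small.
    """
    return compressed_size(reference + sample) - compressed_size(reference)


def guess_language(sample, references):
    """Return the key of the reference text that makes the sample cheapest."""
    return min(references, key=lambda lang: extra_cost(references[lang], sample))


sample = "we went back through the woods in the rain".encode("utf-8")
print(guess_language(sample, REFERENCES))
```

With references this small the guess is not to be trusted; longer reference
texts of comparable length per language would be the first thing to try.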