Identify the language of a String literal

javadev

Hello all

I was wondering if the following is possible:

Having read some text from a file in the form of a String, is it
possible to identify the language in which the text is? Maybe some sort
of a getLocale() method for a String....

Rgds
SS
 
Gordon Beaton

Having read some text from a file in the form of a String, is it
possible to identify the language in which the text is? Maybe some
sort of a getLocale() method for a String....

The String does not encode that information anywhere.

The problem isn't trivial and there are no foolproof solutions, but
common mechanisms for identifying the language of a text are to look
for known "stop words" (lists are available online), or to calculate
letter frequencies, and in both cases compare the results with what is
known about various languages.
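
As a rough illustration of the stop-word idea, here is a minimal sketch. The class name StopWordGuesser and the three tiny word lists are assumptions made up for this example; a usable version would need much longer lists (such as the ones available online) and many more languages. It counts how many tokens of the text appear in each list and returns the language with the most hits, or an empty Optional if nothing matches.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class StopWordGuesser {
    // Tiny illustrative stop-word lists (an assumption, far from complete).
    private static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("en", Set.of("the", "and", "is", "of", "to", "that", "it", "in"));
        STOP_WORDS.put("de", Set.of("der", "die", "das", "und", "ist", "nicht", "ein", "zu"));
        STOP_WORDS.put("fr", Set.of("le", "la", "les", "et", "est", "de", "un", "une"));
    }

    // Returns the language code whose stop words occur most often in the text,
    // or Optional.empty() if no stop word from any list occurs at all.
    static Optional<String> guess(String text) {
        // Split on anything that is not a letter; \p{L} also covers umlauts and accents.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOP_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the language listed first wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(guess("The cat is in the garden and it is happy"));
    }
}
```

Short texts and words shared between languages ("in", "die") will fool this easily, which is why the result is only a guess.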

/gordon
 
Adam Maass

javadev said:
Hello all

I was wondering if the following is possible:

Having read some text from a file in the form of a String, is it
possible to identify the language in which the text is? Maybe some sort
of a getLocale() method for a String....

The short answer: it can't be done with absolute reliability. If you have a
large(ish) sample, you might be able to make some pretty good guesses based
on character frequencies or some similar statistic.
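
One possible way to sketch the character-frequency idea, under some assumptions: the class LetterProfileGuesser and its method names are hypothetical, the reference profiles are built from whatever sample text you supply for each language (no frequency tables are built in), and cosine similarity is just one of several reasonable ways to compare two profiles.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class LetterProfileGuesser {
    // Language code -> relative letter frequencies learned from a reference sample.
    private final Map<String, Map<Character, Double>> profiles = new LinkedHashMap<>();

    // Relative frequency of each letter in the text (case-insensitive, non-letters ignored).
    static Map<Character, Double> profile(String text) {
        Map<Character, Double> counts = new HashMap<>();
        int total = 0;
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                counts.merge(Character.toLowerCase(c), 1.0, Double::sum);
                total++;
            }
        }
        final int n = total;
        counts.replaceAll((letter, count) -> count / n); // no-op when the map is empty
        return counts;
    }

    // Cosine similarity of two profiles: close to 1.0 for similar distributions, 0 if either is empty.
    static double cosine(Map<Character, Double> a, Map<Character, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Character, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Registers a reference sample for a language; larger samples give steadier profiles.
    void train(String language, String sample) {
        profiles.put(language, profile(sample));
    }

    // Returns the trained language whose profile is most similar to the text's profile.
    Optional<String> guess(String text) {
        Map<Character, Double> p = profile(text);
        String best = null;
        double bestScore = 0;
        for (Map.Entry<String, Map<Character, Double>> e : profiles.entrySet()) {
            double score = cosine(p, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

You would call train("en", someLongEnglishText) and so on for each language you care about, then guess(unknownText). With short inputs or closely related languages the scores will be close together, so treat the answer as a hint rather than a fact.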


-- Adam Maass
 
