Identify the language of a String literal

Discussion in 'Java' started by javadev, Apr 13, 2006.

  1. javadev

    javadev Guest

    Hello all

    I was wondering if the following is possible:

    Having read some text from a file in the form of a String, is it
    possible to identify the language in which the text is? Maybe some sort
    of a getLocale() method for a String....

    Rgds
    SS
    javadev, Apr 13, 2006
    #1

  2. On 13 Apr 2006 03:31:42 -0700, javadev wrote:
    > Having read some text from a file in the form of a String, is it
    > possible to identify the language in which the text is? Maybe some
    > sort of a getLocale() method for a String....


    The String does not encode that information anywhere.

    The problem isn't trivial and there are no foolproof solutions, but
    common mechanisms for identifying the language of a text are to look
    for known "stop words" (lists are available online), or to calculate
    letter frequencies, and in both cases compare the results with what is
    known about various languages.
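
The stop-word approach described above could be sketched roughly as follows. This is a minimal illustration, not a robust detector: the class name `StopWordGuesser`, the method `guess`, and the three tiny word lists are assumptions made up for this example, and real stop-word lists are far longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: guesses a language by counting stop-word hits.
class StopWordGuesser {
    // Deliberately tiny, illustrative lists (assumption: not complete or authoritative).
    private static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("en", new HashSet<>(Arrays.asList(
                "the", "and", "is", "of", "to", "in", "that", "it")));
        STOP_WORDS.put("de", new HashSet<>(Arrays.asList(
                "der", "die", "das", "und", "ist", "nicht", "ein", "zu")));
        STOP_WORDS.put("fr", new HashSet<>(Arrays.asList(
                "le", "la", "les", "et", "est", "de", "un", "une")));
    }

    /** Returns the language code with the most stop-word hits, or null if nothing matched. */
    static String guess(String text) {
        // Split on anything that is not a Unicode letter.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOP_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the earlier language in the map wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Short inputs, or text that mixes languages, can easily produce a wrong or null answer, which is consistent with the point that no foolproof solution exists.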

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
    Gordon Beaton, Apr 13, 2006
    #2

  3. Adam Maass

    Adam Maass Guest

    "javadev" <> wrote:
    > Hello all
    >
    > I was wondering if the following is possible:
    >
    > Having read some text from a file in the form of a String, is it
    > possible to identify the language in which the text is? Maybe some sort
    > of a getLocale() method for a String....
    >


    The short answer: it can't be done with absolute reliability. If you have a
    large(ish) sample, you might be able to make some pretty good guesses based
    on character frequencies or some other heuristic.
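
As a rough sketch of the character-frequency idea, the code below builds a letter-frequency profile for a text and compares it with reference profiles by cosine similarity. The class `LetterFrequency`, its method names, the restriction to the letters a-z, and the choice of cosine similarity are all assumptions made for illustration. The reference profiles would have to be built from sizeable samples of known-language text, which this sketch does not supply.

```java
import java.util.Locale;

// Hypothetical helper: compares letter-frequency profiles of texts.
class LetterFrequency {
    /** Relative frequency of each letter a-z in the text; other characters are ignored. */
    static double[] profile(String text) {
        double[] freq = new double[26];
        int total = 0;
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                freq[c - 'a']++;
                total++;
            }
        }
        if (total > 0) {
            for (int i = 0; i < freq.length; i++) {
                freq[i] /= total;
            }
        }
        return freq;
    }

    /** Cosine similarity of two profiles: near 1.0 for similar distributions, 0.0 for no overlap. */
    static double similarity(double[] p, double[] q) {
        double dot = 0, normP = 0, normQ = 0;
        for (int i = 0; i < p.length; i++) {
            dot += p[i] * q[i];
            normP += p[i] * p[i];
            normQ += q[i] * q[i];
        }
        if (normP == 0 || normQ == 0) {
            return 0.0; // an empty profile matches nothing
        }
        return dot / Math.sqrt(normP * normQ);
    }

    /** Index of the reference profile most similar to the text, or -1 if there are none. */
    static int closest(String text, double[][] references) {
        double[] p = profile(text);
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < references.length; i++) {
            double score = similarity(p, references[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

Because many languages share an alphabet and have broadly similar letter distributions, a comparison like this only becomes meaningful with a large sample, and even then it yields a guess rather than a certainty.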


    -- Adam Maass
    Adam Maass, Apr 14, 2006
    #3
