identify the language of a web page

Discussion in 'XML' started by usgog@yahoo.com, Apr 11, 2008.

  1. Guest

    Suppose I need to classify 10000 web pages based on their languages.
    What should I look for to determine the language of each web page? Any
    advice is welcome.
    , Apr 11, 2008
    #1

  2. On Thu, 10 Apr 2008, wrote:

    > Suppose I need to classify 10000 web pages based on their languages.
    > What should I look for to determine the language of each web page?


    The "lang" attribute in HTML; the "xml:lang" attribute in XHTML.
    Andreas Prilop, Apr 11, 2008
    #2
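A minimal sketch of the attribute check suggested in #2, using only Python's standard library. The names `LangAttrParser` and `declared_language` are made up for illustration, and the sketch assumes the declaration sits on the root `<html>` element, which many real pages omit or get wrong.

```python
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Records the language declared on the root <html> element, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and self.lang is None:
            attr_map = dict(attrs)
            # Prefer xml:lang (XHTML) and fall back to lang (HTML).
            self.lang = attr_map.get("xml:lang") or attr_map.get("lang")


def declared_language(html_text):
    """Return the declared language tag of a page, or None if absent."""
    parser = LangAttrParser()
    parser.feed(html_text)
    parser.close()
    return parser.lang


if __name__ == "__main__":
    print(declared_language('<html lang="de"><body>Hallo</body></html>'))
```

A result of None means the page declares nothing, so for a batch of 10000 pages some fallback based on the text itself is likely to be needed.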

  3. In article <>,
    <> wrote:

    >Suppose I need to classify 10000 web pages based on their languages.
    >What should I look for to determine the language of each web page? Any
    >advice is welcome.


    Assuming you want to do this by inspection of the text (rather than
    looking for xml:lang and the like), search Google for "language
    identification". The first page of results lists several tools and a
    research bibliography on the subject.

    -- Richard
    --
    :wq
    Richard Tobin, Apr 11, 2008
    #3
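To illustrate what identification "by inspection of the text" can look like, here is a toy sketch that counts common function words. It is not one of the tools referred to in #3; the `STOPWORDS` table and the `guess_language` function are hypothetical, the word lists are tiny hand-picked samples, and the input is assumed to be plain text with the markup already stripped.

```python
import re

# Assumption: a few very common function words per language. A real tool
# would use far larger lists or character n-gram statistics.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "des", "un", "une"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    # Split into runs of letters only (no digits, no underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("Der Hund ist nicht in dem Haus und die Katze ist müde"))
```

Short or mixed-language pages will defeat a counter this simple, so it is best treated as a fallback for pages that declare no language rather than a replacement for a dedicated tool.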
