Open Source English Language Parser

Discussion in 'Java' started by Luc The Perverse, Oct 29, 2005.

  1. I was wondering whether there exists

    a quality English-language parser that would be capable, with simple to
    moderate modification, of converting all words to their base form (plurals
    become singular, verbs are unconjugated, adjective endings drop off, etc.).

    A simple reverse dictionary will not work, because before a verb can be
    unconjugated it needs to be identified as a verb. Otherwise the noun
    "being" could become "be"!

    --
    LTP

    :)
     
    Luc The Perverse, Oct 29, 2005
    #1

  2. Roedy Green

    On Sat, 29 Oct 2005 15:09:53 -0600, "Luc The Perverse"
    <> wrote, quoted or indirectly
    quoted someone who said :

    >A quality English language parser, which would be capable, with simple to
    >moderate modification, of converting all words to their base form (plurals
    >become singular - verbs are unconjugated, adjective endings drop off etc.)


    There was one featured on Star Trek NG episode 47.

    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Oct 30, 2005
    #2

  3. Luc The Perverse wrote:

    > A quality English language parser, which would be capable, with simple to
    > moderate modification, of converting all words to their base form (plurals
    > become singular - verbs are unconjugated, adjective endings drop off etc.)
    >


    You ask for a language parser (which deals with the syntax of sentences),
    but what you describe concerns the morphology of words. It depends on
    what you want. Do you really want the linguistic base form, as you find
    it in human-readable dictionaries? Then a lemmatizer can give you the
    base form of a word, as well as telling you which form of the word you
    have at hand. There should definitely be something in Java: it is a very
    common problem, and implementations in various programming languages
    have been available for decades. Of course, the lemmatizer might not
    give you a unique answer for a word; "run", for instance, could be a
    verb as well as a noun. A so-called tagger can help here by reducing
    this ambiguity based on the context of the word. Tagging is also a
    standard problem, and a lot of implementations are around. A tagger
    might still not give you a unique answer, but for most practical
    applications this might not be a problem.
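
    To make the lemmatizer/tagger split concrete, here is a minimal Java
    sketch. Everything in it is a made-up illustration, not an existing
    library: the class name ToyLemmatizer, the five-word lexicon, the tag
    names NOUN and VERB, and the single context rule standing in for a real
    tagger.

```java
import java.util.Map;
import java.util.Set;

/** Toy lemmatizer plus one context rule; lexicon and tag names are hypothetical. */
final class ToyLemmatizer {
    // surface form -> (part of speech -> lemma). A real lexicon would be far larger.
    private static final Map<String, Map<String, String>> LEXICON = Map.of(
        "being", Map.of("NOUN", "being", "VERB", "be"),
        "beings", Map.of("NOUN", "being"),
        "runs", Map.of("NOUN", "run", "VERB", "run"),
        "is", Map.of("VERB", "be"),
        "was", Map.of("VERB", "be"));

    private static final Set<String> DETERMINERS = Set.of("a", "an", "the", "every", "this");

    /** All (tag -> lemma) readings of a word; unknown words yield an empty map. */
    static Map<String, String> analyses(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), Map.of());
    }

    /** Crude stand-in for a tagger: after a determiner prefer NOUN, otherwise VERB. */
    static String lemmaInContext(String previousWord, String word) {
        Map<String, String> readings = analyses(word);
        if (readings.isEmpty()) {
            return word.toLowerCase(); // unknown word: leave it unchanged
        }
        if (readings.size() == 1) {
            return readings.values().iterator().next(); // unambiguous
        }
        boolean afterDeterminer =
            previousWord != null && DETERMINERS.contains(previousWord.toLowerCase());
        String lemma = readings.get(afterDeterminer ? "NOUN" : "VERB");
        // If the preferred tag is missing, fall back to an arbitrary reading.
        return lemma != null ? lemma : readings.values().iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(lemmaInContext("a", "being"));  // noun reading
        System.out.println(lemmaInContext("is", "being")); // verb reading
    }
}
```

    With this toy lexicon, "being" after "a" stays "being", while "being"
    after "is" becomes "be". A real tagger would look at much more context
    than the single preceding word.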

    If you just want to do some search, dictionary lookup, etc., a stemmer
    might be what you are looking for. It reduces words not to the
    linguistic base form but to a stem, which is not necessarily an actual
    English word. This would be the simplest solution, used for instance in
    search engines. This has also been implemented by generations of
    computational linguists, so you might search for "stemmer Java" in
    Google and find something you could use.
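
    As a rough idea of what a stemmer does, here is a naive suffix-stripping
    sketch in Java. It is not the Porter algorithm or any other published
    stemmer; the class name ToyStemmer, the suffix list, and the
    minimum-stem-length rule are assumptions chosen only for illustration.

```java
import java.util.List;

/** Naive suffix stripper; the suffix list and length limit are hypothetical. */
final class ToyStemmer {
    // Longer suffixes come first so that "ies" is tried before "s".
    private static final List<String> SUFFIXES =
        List.of("ations", "ation", "ings", "ies", "ing", "ed", "es", "s");

    // Refuse to strip if fewer than this many characters would remain.
    private static final int MIN_STEM_LENGTH = 3;

    /** Strips the first matching suffix, or returns the lowercased word unchanged. */
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                return w.substring(0, stemLength);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : List.of("parsed", "parses", "parsing", "being")) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

    Here "parsed", "parses" and "parsing" all collapse to "pars", which is
    useful for matching but is not an English word, while "being" is left
    alone only because the remaining stem would be too short.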

    Greetings,
    Ralf
     
    Ralf Callenberg, Oct 30, 2005
    #3
  4. "Roedy Green" <> wrote in
    message news:...
    > On Sat, 29 Oct 2005 15:09:53 -0600, "Luc The Perverse"
    > <> wrote, quoted or indirectly
    > quoted someone who said :
    >
    >>A quality English language parser, which would be capable, with simple to
    >>moderate modification, of converting all words to their base form (plurals
    >>become singular - verbs are unconjugated, adjective endings drop off etc.)

    >
    > There was one featured on Star Trek NG episode 47.



    Next time the crew of the Enterprise is forced back to our time period, I
    will see if I can go back with them to procure it. Until then I am forced
    to use this inferior technology :(

    --
    LTP

    "Just like humans, a Java thread cannot paint in its sleep." - Roedy Green
     
    Luc The Perverse, Oct 31, 2005
    #4
  5. cjr9968

    Don't listen to Luc. He's just being an ass :yell:. However, this is a very academic issue. I haven't heard of any open source code to do language parsing but my best guess for writing one would be using a lexical analyzer in combination with a dictionary of word roots, endings and prefixes to accomplish your task.
    You'd need a very solid understanding of your target language, pretty good programming/engineering skills, and some experience with Latin wouldn't hurt.
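
    A minimal Java sketch of the dictionary idea, assuming three tiny
    hand-made tables of prefixes, roots and endings. The class name
    AffixAnalyzer and the table contents are hypothetical, and a real system
    would also need spelling rules (for example "running" to "run").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Splits a word into prefix + root + ending using small hypothetical tables. */
final class AffixAnalyzer {
    // The empty string means "no prefix" or "no ending".
    private static final Set<String> PREFIXES = Set.of("", "un", "re");
    private static final Set<String> ROOTS = Set.of("do", "play", "walk", "kind");
    private static final Set<String> ENDINGS = Set.of("", "s", "ed", "ing", "ness");

    /** Returns every split found in the tables, formatted as "prefix|root|ending". */
    static List<String> segment(String word) {
        String w = word.toLowerCase();
        List<String> results = new ArrayList<>();
        for (int i = 0; i <= w.length(); i++) {
            String prefix = w.substring(0, i);
            if (!PREFIXES.contains(prefix)) {
                continue;
            }
            // The root must be non-empty, so j starts one past i.
            for (int j = i + 1; j <= w.length(); j++) {
                String root = w.substring(i, j);
                String ending = w.substring(j);
                if (ROOTS.contains(root) && ENDINGS.contains(ending)) {
                    results.add(prefix + "|" + root + "|" + ending);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String word : List.of("replayed", "unkindness", "walks", "being")) {
            System.out.println(word + " -> " + segment(word));
        }
    }
}
```

    With these tables, "replayed" splits into re|play|ed, and a word with no
    known root, such as "being", yields no analysis at all.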

    It's really not as sci-fi as most people think it is.

    ...you could just go to grad school though...
     
    Last edited: Jun 28, 2011
    cjr9968, Jun 28, 2011
    #5
