Open Source English Language Parser

Luc The Perverse · Oct 29, 2005

I was wondering if there existed a quality English language parser that
would be capable, with simple to moderate modification, of converting all
words to their base form (plurals become singular, verbs are unconjugated,
adjective endings drop off, etc.).

A simple reverse dictionary will not work, because before a verb can be
unconjugated it needs to be identified as a verb. Otherwise the noun
"being" could become "be"!

Roedy Green · Oct 30, 2005

A quality English language parser, which would be capable, with simple to
moderate modification, of converting all words to their base form (plurals
become singular, verbs are unconjugated, adjective endings drop off, etc.)

There was one featured on Star Trek NG episode 47.

Ralf Callenberg · Oct 30, 2005

Luc said:
A quality English language parser, which would be capable, with simple to
moderate modification, of converting all words to their base form (plurals
become singular, verbs are unconjugated, adjective endings drop off, etc.)

You ask for a language parser (which deals with the syntax of sentences),
but what you describe is aimed at the morphology of words. So it depends
on what you want. Do you really want the linguistic base form, as you
find it in human-readable dictionaries? Then a lemmatizer can give you
the base form of a word, as well as tell you which form of the word
you have at hand. There should definitely be something in Java: it is
a very common problem, and implementations in various programming
languages have been available for decades. Of course, the lemmatizer
might not give you a unique answer for a word; "run", for instance, could
be a verb as well as a noun. A so-called tagger could help you then,
reducing this ambiguity based on the context of the word. Tagging is also
a standard problem, and a lot of implementations are around. A tagger
might still not give you a unique answer, but for most practical
applications this might not be a problem.
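
To make the lemmatizer-plus-tagger idea concrete, here is a minimal sketch
in Java. Everything in it is an assumption for illustration: the class name
TinyLemmatizer, the three part-of-speech tags, and the handful of lexicon
entries are made up, and a real lemmatizer would use a full lexicon and get
its tags from an actual tagger rather than from the caller.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TinyLemmatizer {
    // Hypothetical tag set; a real tagger would assign these from context.
    public enum Pos { NOUN, VERB, ADJECTIVE }

    // Toy lexicon keyed by "surface form/POS". Entries are illustrative only.
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("being/VERB", "be");
        LEXICON.put("being/NOUN", "being");
        LEXICON.put("beings/NOUN", "being");
        LEXICON.put("ran/VERB", "run");
        LEXICON.put("runs/VERB", "run");
        LEXICON.put("runs/NOUN", "run");
        LEXICON.put("bigger/ADJECTIVE", "big");
    }

    // Looks up the base form for a word with a known part of speech.
    // Unknown combinations fall back to the lower-cased word itself.
    public static String lemma(String word, Pos pos) {
        String lower = word.toLowerCase(Locale.ROOT);
        return LEXICON.getOrDefault(lower + "/" + pos, lower);
    }

    public static void main(String[] args) {
        // The same surface form yields different base forms depending on the tag.
        System.out.println(lemma("being", Pos.VERB));
        System.out.println(lemma("being", Pos.NOUN));
    }
}
```

The point of keying on both the word and its tag is that "being" as a verb
and "being" as a noun stay separate entries, which is what a plain reverse
dictionary cannot do.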

If you just want to do some searching, dictionary lookup, etc., a stemmer
might be what you are looking for. It reduces words not to the
linguistic base form but to a stem, which is not necessarily an actual
English word. This would be the simplest solution, and is used, for
instance, in search engines. Stemmers have also been implemented by
generations of computational linguists, so you might search for
"stemmer Java" in Google and find something you could use.
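
As a rough illustration of what a stemmer does, here is a deliberately naive
suffix-stripping sketch in Java. It is not the Porter algorithm or any other
published stemmer; the class name NaiveStemmer, the suffix list, and the
minimum stem length are all assumptions picked for the example.

```java
import java.util.Locale;

public class NaiveStemmer {
    // Suffixes are tried in order, longer ones first, so "ies" wins over "s".
    private static final String[] SUFFIXES = {
        "ations", "ation", "ing", "ies", "ed", "es", "s"
    };

    // Refuse to strip if fewer than this many characters would remain.
    private static final int MIN_STEM = 3;

    // Strips the first matching suffix; returns the lower-cased word otherwise.
    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= MIN_STEM) {
                return w.substring(0, stemLength);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // All three forms collapse to the same stem, which is not an English word.
        System.out.println(stem("parsing"));
        System.out.println(stem("parsed"));
        System.out.println(stem("parses"));
    }
}
```

Note that the result for the three forms above is "pars", not "parse". That
is fine for matching search terms against each other, but it is exactly why
a stemmer is not a substitute for a lemmatizer if you need real base forms.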

Greetings,
Ralf

Luc The Perverse · Oct 31, 2005

Roedy Green said:
There was one featured on Star Trek NG episode 47.

Next time the crew of the Enterprise is forced back to our time period, I
will see if I can go back with them to procure it. Until then I am forced
to use this inferior technology.

cjr9968 · Jun 28, 2011

Luc The Perverse said:
Next time the crew of the Enterprise is forced back to our time period, I
will see if I can go back with them to procure it. Until then I am forced
to use this inferior technology.

Don't listen to Luc. He's just being an ass. However, this is a very academic issue. I haven't heard of any open source code to do language parsing, but my best guess for writing one would be to use a lexical analyzer in combination with a dictionary of word roots, endings, and prefixes.
You'd need a very solid understanding of your target language and pretty good programming/engineering skills, and some experience with Latin wouldn't hurt.
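
A minimal sketch of that roots-endings-prefixes idea in Java might look like the following. The class name AffixAnalyzer and the tiny word lists are invented for the example, and it only handles plain concatenation (no spelling changes such as "happier" from "happy"), so treat it as a starting point rather than a working analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AffixAnalyzer {
    // Toy data, invented for illustration. The empty string means "no affix".
    private static final Set<String> ROOTS =
        new HashSet<>(Arrays.asList("do", "happy", "walk", "be"));
    private static final String[] PREFIXES = { "", "un", "re" };
    private static final String[] ENDINGS = { "", "s", "ed", "ing" };

    // Returns every prefix+root+ending split whose root is in the dictionary.
    public static List<String> analyze(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> results = new ArrayList<>();
        for (String prefix : PREFIXES) {
            if (!w.startsWith(prefix)) continue;
            for (String ending : ENDINGS) {
                if (!w.endsWith(ending)) continue;
                int start = prefix.length();
                int end = w.length() - ending.length();
                if (start >= end) continue; // affixes would overlap or leave no root
                String root = w.substring(start, end);
                if (ROOTS.contains(root)) {
                    results.add(prefix + "+" + root + "+" + ending);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(analyze("undoing"));
        System.out.println(analyze("being"));
    }
}
```

Notice that "being" is split into the root "be" plus the ending "ing" no matter how it is used in the sentence, so this approach alone still cannot tell the noun from the verb.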

It's really not as sci-fi as most people think it is.

...you could just go to grad school though...
