-------- Original Message --------
Date: Sun, 25 May 2008 00:27:48 +0900
From: "Robert Dober" <[email protected]>
To: (e-mail address removed)
Subject: Re: #plural? or #singular?
You will always have problems with collective nouns (brood, flock,
pride, etc), especially if you train yourself on languages that aren't
spoken.
I think many people balk at your question because you didn't specify
the terms of the problem. What language? What vernacular? What
venue?
cheerio (plural),
Todd
Dear Todd,
Well, I didn't start the thread ... so I don't have to specify the problem.
The OP wanted to decide whether a given noun is singular or plural.
As I see it, English nouns fall into four groups:
1) Those that form a plural by adding an 's', e.g., house -> houses
2) Those that don't belong to the first group and have different forms
for singular and plural, e.g., man -> men, mouse -> mice
3) Those that don't belong to the first two groups, because singular and
plural forms both exist and coincide, e.g., moose -> moose
4) Those that don't belong to the previous groups, as they don't have two
forms, because they describe some collective (e.g., police, at least in
British English) or something uncountable (e.g., pride).
The first two groups and the last can be dealt with by a program
that generates a plural from a singular (i.e., the linguistics gem).
Especially because of the group 3 nouns, a program that 'pluralizes'
a given noun doesn't answer the OP's question: without information about
the context, it cannot decide whether a given noun is singular or plural.
Dave and Robert gave several examples for this.
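To make the limitation concrete, here is a minimal Ruby sketch of a lookup that works from the word form alone. It is not the linguistics gem's API; the method name `number_from_form` and the tiny word lists are made up for illustration. The point is only that for group 3 the best such a lookup can honestly return is "ambiguous".

```ruby
# Hypothetical, hand-made word lists; a real program would need far larger ones.
IRREGULAR = { "man" => "men", "mouse" => "mice" }.freeze  # group 2
INVARIANT = %w[moose sheep deer].freeze                   # group 3
NO_PAIR   = %w[police pride].freeze                       # group 4

# Guess the grammatical number of a bare noun from its form alone.
# Returns :singular, :plural, :ambiguous (group 3) or :no_pair (group 4).
def number_from_form(word)
  w = word.downcase
  return :no_pair   if NO_PAIR.include?(w)
  return :ambiguous if INVARIANT.include?(w)
  return :singular  if IRREGULAR.key?(w)
  return :plural    if IRREGULAR.value?(w)
  # Group 1, very naive: any other word ending in "s" counts as plural,
  # so a singular such as "bus" is misclassified.
  w.end_with?("s") ? :plural : :singular
end

%w[house houses men moose police].each do |noun|
  puts "#{noun}: #{number_from_form(noun)}"
end
```

For "moose" the answer stays :ambiguous no matter how large the word lists get; only the surrounding sentence can settle it.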
My point is that there exists a type of software - part-of-speech
taggers - that can resolve these questions from contextual information.
Not always correctly, as it's a computer program relying on probabilities,
but remarkably well.
I didn't understand your point about languages that aren't spoken ...
If you had a Latin text, say (there's a large collection available
on Project Gutenberg), and you manually tagged a part of it to let
a Bayesian classification program learn probabilities, it would be able
to identify the parts of speech in another Latin text, e.g., identify the
plural nouns in it. That's certainly much easier in Latin than in English,
as there's hardly anything in group 3 for Latin. I'd bet you'd find a nice
little list of such words printed in bold in every grammar (oh, please
remember: 'hand' is 'manus' and 'hands' is also 'manus').
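As a rough illustration of what "learning probabilities from a manually tagged text" means, here is a toy naive Bayes classifier in Ruby. It is only a sketch under my own assumptions: the class name `ContextClassifier`, the choice of neighbouring words as features, and the six hand-tagged examples are all invented, and no real tagger is claimed to work exactly like this. It decides whether an ambiguous noun such as "moose" is singular or plural from the words around it.

```ruby
# Toy naive Bayes classifier: picks :singular or :plural for a noun
# from the words that surround it (hypothetical sketch, not a real tagger).
class ContextClassifier
  def initialize
    @label_counts   = Hash.new(0)                            # label => examples seen
    @feature_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => { word => count }
    @vocabulary     = {}                                     # every context word seen
  end

  # context: array of neighbouring words; label: :singular or :plural
  def train(context, label)
    @label_counts[label] += 1
    context.each do |word|
      w = word.downcase
      @feature_counts[label][w] += 1
      @vocabulary[w] = true
    end
  end

  # Returns the most probable label, or nil if nothing has been trained.
  def classify(context)
    total = @label_counts.values.sum.to_f
    scores = @label_counts.keys.map do |label|
      score = Math.log(@label_counts[label] / total)         # log prior
      feature_total = @feature_counts[label].values.sum
      context.each do |word|
        count = @feature_counts[label][word.downcase]
        # add-one smoothing so unseen words don't zero out a label
        score += Math.log((count + 1.0) / (feature_total + @vocabulary.size))
      end
      [label, score]
    end
    scores.max_by { |_, s| s }&.first
  end
end

# Invented, hand-tagged training data: words next to the noun, plus its number.
tagger = ContextClassifier.new
tagger.train(%w[a is],     :singular)
tagger.train(%w[the was],  :singular)
tagger.train(%w[this has], :singular)
tagger.train(%w[two are],  :plural)
tagger.train(%w[the were], :plural)
tagger.train(%w[many have], :plural)

p tagger.classify(%w[a was])        # "a moose was ..."       => :singular
p tagger.classify(%w[several are])  # "several moose are ..." => :plural
```

With only six examples this is a toy, of course; the quality of a real tagger comes from the amount and the quality of the manually tagged text behind it.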
What language? What vernacular? What
venue?
I assume that the OP is talking about some standard written form
of a language, like standard English, French, German, etc.
Now, you can get ready-made taggers on the net for some
of these languages, so your computer can say that this Italian word is
a plural noun, even if you don't know any Italian.
If you wanted to tell plural nouns from singular ones in Turkish, you could
still use e.g. TreeTagger for that, but you would first have to get a Turkish
text tagged manually, to teach the program the probabilities that a given
word form is a plural or a singular. It pays to have a native speaker of
Turkish do that.
For those languages for which ready-made solutions are offered, somebody
has already taken a large amount of typical text (novels, newspaper
articles, poems, etc.), tagged it manually, and provided parameter
files for download, so no training on the user's part is necessary anymore.
Best regards,
Axel