#plural? or #singular?

Mark Dodwell

Does anybody know an easy way to test if a word is singular or plural --
something a bit smarter than just checking if there is an s on the end!

Thanks,

~ Mark
 
Michael W. Ryder

Mark said:
Does anybody know an easy way to test if a word is singular or plural --
something a bit smarter than just checking if there is an s on the end!

Thanks,

~ Mark

You might be able to use some form of dictionary lookup and that will
help with words like mice, but it still will not help with words like
moose where the singular and plural are the same.
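To make the dictionary idea concrete, here is a minimal Ruby sketch. The word lists and the method name `grammatical_number` are made up for illustration; a real lexicon would be far larger. It returns a third answer for words whose singular and plural coincide, since a lookup alone cannot decide those.

```ruby
# Tiny illustrative lexicon (hypothetical); a real one would be far larger.
IRREGULAR_PLURALS = { "mice" => "mouse", "men" => "man", "geese" => "goose" }.freeze
INVARIANT_NOUNS   = %w[moose sheep fish].freeze

# Returns :singular, :plural, or :ambiguous for a single word without context.
def grammatical_number(word)
  w = word.downcase
  return :ambiguous if INVARIANT_NOUNS.include?(w)
  return :plural    if IRREGULAR_PLURALS.key?(w)
  return :singular  if IRREGULAR_PLURALS.value?(w)
  # Fallback: the naive "ends in s" check the original question wants to improve on.
  w.end_with?("s") ? :plural : :singular
end

%w[mice mouse moose houses].each do |w|
  puts "#{w}: #{grammatical_number(w)}"
end
```

The fallback is still the naive check, so a word like "bus" would be misclassified unless it is added to the lexicon.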
 
Axel Etzold

-------- Original Message --------
Date: Sat, 24 May 2008 07:45:01 +0900
From: "Michael W. Ryder" <[email protected]>
To: (e-mail address removed)
Subject: Re: #plural? or #singular?
You might be able to use some form of dictionary lookup and that will
help with words like mice, but it still will not help with words like
moose where the singular and plural are the same.

Dear Mark,

for the simpler task, where there are different forms for singular and
plural (eg., mouse-mice, house-houses), you could use this:

http://api.rubyonrails.org/classes/Inflector.html

For the more difficult cases, where singular and plural forms coincide
(and for the easier cases as well), a part-of-speech tagger can be helpful.

I don't know of any written in Ruby, but I can recommend the tree-tagger,
which you can script from Ruby to suit your needs.
It is available for several languages, so you can find irregular plurals
of words in different languages ....

It is here :

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/


Best regards,

Axel
 
Trans

Does anybody know an easy way to test if a word is singular or plural --
something a bit smarter than just checking if there is an s on the end!

English gem may help. If you devise #plural? and #singular? I'd be
happy to add them to the API.

T.
 
Dave Bass

There are lots of difficulties here.

Is "sheep" singular or plural?
Is "fish" singular or plural?
Is "the government" singular or plural?
Is "England" singular or plural? (England is a country / England are
bound to lose the match)
Is "English" singular or plural? (English is a language / The English
are eccentric)

So even an exhaustive list of words is not going to give you the right
answer all the time. You need to take the word in context, i.e. you need
to parse the sentence grammatically. Here There Be Dragons.
 
Robert Dober

Even worse, sometimes it is undefined, I guess, or capitalization may play a role.

I can see data.

Maybe some native speakers will tell me that this is not a correct
sentence, I do not know, but then there is

I can see Data.

Languages (plural) are just a big mess (singular) ;)

Robert
 
Axel Etzold

-------- Original Message --------
Date: Sun, 25 May 2008 00:27:48 +0900
From: "Robert Dober" <[email protected]>
To: (e-mail address removed)
Subject: Re: #plural? or #singular?
Even worse sometimes it is undefined I guess, or caption may play a role.

I can see data.

Maybe some native speakers will tell me that this is not a correct
sentence, I do not know, but than there is

I can see Data.

Languages (plural) are just a big mess (singular) ;)

Robert

Dear Robert and Dave,

well, this is what tree-tagger (see tags output below, for the tagset
see my previous post) says:

I can see data. (noun plural)
I can see Data. (proper noun singular)
England is a country. (proper noun singular)
England are bound to lose the match. (proper noun singular) (nobody is perfect).
English is a language. (proper noun singular)
The English are eccentric. (noun plural)
Languages (noun plural) are just a big mess (noun singular).

Part-of-speech tagging uses a Bayesian decision model, requiring
training on a set of human-tagged text.
There are large amounts of text available for many languages, such
as newspaper articles.
The authors of TreeTagger claim about 96% correct tagging somewhere
in the docs (I can't find it right now).
It's also fast - you can tag an entire novel in just a few seconds -
and it's available for several major languages, not just English.


Best regards,

Axel

-----------------------------------

I          PP    I
can        MD    can
see        VV    see
data       NNS   datum
.          SENT  .
I          PP    I
can        MD    can
see        VV    see
Data       NP    Data
.          SENT  .
England    NP    England
is         VBZ   be
a          DT    a
country    NN    country
.          SENT  .
England    NP    England
are        VBP   be
bound      VVN   bind
to         TO    to
lose       VV    lose
the        DT    the
match      NN    match
.          SENT  .
English    NP    English
is         VBZ   be
a          DT    a
language   NN    language
.          SENT  .
The        DT    the
English    NNS   English
are        VBP   be
eccentric  JJ    eccentric
.          SENT  .
Languages  NNS   language
are        VBP   be
just       RB    just
a          DT    a
big        JJ    big
mess       NN    mess
.          SENT  .
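
For anyone who wants to use such output from Ruby, here is a minimal sketch that reads "token tag lemma" lines like the ones above and reports the number of each noun. The method name `noun_numbers` and the tag table are illustrative assumptions: NN, NNS and NP appear in the output above, while NPS (proper noun plural) is assumed from the Penn-style tagset and does not occur in this sample. It needs Ruby 2.7 or later for `filter_map`.

```ruby
# Map noun tags to grammatical number.
# NN, NNS and NP appear in the tagger output above; NPS is an assumption.
NOUN_NUMBER = {
  "NN"  => :singular,
  "NP"  => :singular,
  "NNS" => :plural,
  "NPS" => :plural
}.freeze

# Parse "token tag lemma" lines and return [token, number] pairs for nouns only.
def noun_numbers(tagged_text)
  tagged_text.each_line.filter_map do |line|
    token, tag, _lemma = line.split
    number = NOUN_NUMBER[tag]
    [token, number] if number
  end
end

sample = <<~TAGGED
  The DT the
  English NNS English
  are VBP be
  eccentric JJ eccentric
  . SENT .
TAGGED

p noun_numbers(sample)
```

Non-noun lines, including the sentence marker, are simply skipped because their tag is not in the table.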
 
Ray Baxter

well, this is what tree-tagger (see tags output below, for the tagset
see my previous post) says:
England are bound to lose the match. (proper noun singular) (nobody
is perfect).

The collective noun in American English is singular, while in British
English the collective noun is plural. In American English, we would
say "England is bound to lose the match," so your results are correct,
if the language under consideration is American English. (Although I'm
not sure what to make of the plural verb.)
Parts-of-speech tagging uses a Bayesian decision model, requiring
training on a set of human-tagged text.

Did you train tree-tagger on a data set of American English?

Ray
 
Robert Dober

I can see data. (noun plural)
I can see Data. (proper noun singular)
England is a country. (proper noun singular)
England are bound to lose the match. (proper noun singular) (nobody is perfect).
English is a language. (proper noun singular)
The English are eccentric. (noun plural)
Languages (noun plural) are just a big mess (noun singular).
Impressive, I have to admit :)
Parts-of-speech tagging uses a Bayesian decision model, requiring
training on a set of human-tagged text.
There are large amounts of texts available for many languages, such
as newspaper articles.
The authors of tree-taggers claim about 96 % correct tagging somewhere
in the docs ( can't find it right now).
It's also fast - you can tag an entire novel in just a few seconds -
and it's available for several major languages, not just English.
Even more so !!! Thanx for sharing
R.
 
Todd Benson

-------- Original Message --------


Dear Robert and Dave,

well, this is what tree-tagger (see tags output below, for the tagset
see my previous post) says:

I can see data. (noun plural)
I can see Data. (proper noun singular)
England is a country. (proper noun singular)
England are bound to lose the match. (proper noun singular) (nobody is perfect).
English is a language. (proper noun singular)
The English are eccentric. (noun plural)
Languages (noun plural) are just a big mess (noun singular).

You will always have problems with collective nouns (brood, flock,
pride, etc), especially if you train yourself on languages that aren't
spoken.
Parts-of-speech tagging uses a Bayesian decision model, requiring
training on a set of human-tagged text.
There are large amounts of texts available for many languages, such
as newspaper articles.
The authors of tree-taggers claim about 96 % correct tagging somewhere
in the docs ( can't find it right now).
It's also fast - you can tag an entire novel in just a few seconds -
and it's available for several major languages, not just English.

I think many people balk at your question because you didn't specify
the terms of the problem. What language? What vernacular? What
venue?

cheerio (plural),
Todd
 
Axel Etzold

The collective noun in American English is singular, while in British
English the collective noun is plural. In American English, we would
say "England is bound to lose the match," so your results are correct,
if the language under consideration is American English. (Although I'm
not sure what to make of the plural verb.)


Did you train tree-tagger on a data set of American English?


Dear Ray,

I didn't know that American and British English differ in whether a
collective noun is treated as singular or plural.
I gather from the docs that the authors trained TreeTagger on the Wall
Street Journal and some other American English sources.
I am not a native English speaker myself. So, being easily impressionable
as a continental European from Germany, at some point in time, I was
sent to an English school in south-west England (it's called a grammar
school, even though they teach many subjects and mostly to English
people), where I was taught that

a) Speaking "proper English" is of paramount importance (see the musical
"My Fair Lady").
b) Proper English is spoken only in England.
c) Americans don't use English at all - don't believe them if they claim
they do. (See the musical "My Fair Lady", song: "Why Can't the English?",
"... well in America, they haven't used it [English] for years").

further

a), b) and c) are true because

d) The terms "proper English" and "Queen's English" can be used interchangeably.
e) Americans have continuously failed to come up with a Queen - Jackie
Kennedy or a future female President are no acceptable substitutes.
f) Admitting anything else would harm or destroy the very profitable language industry in England.

It seems I am still somewhat under the influence of that .... :)

Best regards,

Axel
 
Jeremy McAnally

A really cheap way to do this with ActiveSupport would be to do
something like this:

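require 'active_support/core_ext/string/inflections'

class String
  # True when singularizing leaves the word unchanged.
  def singular?
    singularize == self
  end

  # True when pluralizing leaves the word unchanged.
  def plural?
    pluralize == self
  end
end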

In the console, it looks like this:
=> true

I don't know if that's the best solution, but it works. :)

--Jeremy


Does anybody know an easy way to test if a word is singular or plural --
something a bit smarter than just checking if there is an s on the end!

Thanks,

~ Mark



--
http://jeremymcanally.com/
http://entp.com

Read my books:
Ruby in Practice (http://manning.com/mcanally/)
My free Ruby e-book (http://humblelittlerubybook.com/)

Or, my blogs:
http://mrneighborly.com
http://rubyinpractice.com
 
Axel Etzold

-------- Original Message --------
Date: Sun, 25 May 2008 03:14:18 +0900
From: "Todd Benson" <[email protected]>
To: (e-mail address removed)
Subject: Re: #plural? or #singular?
-------- Original Message --------
Date: Sun, 25 May 2008 00:27:48 +0900
From: "Robert Dober" <[email protected]>
To: (e-mail address removed)
Subject: Re: #plural? or #singular?

You will always have problems with collective nouns (brood, flock,
pride, etc), especially if you train yourself on languages that aren't
spoken.




I think many people balk at your question because you didn't specify
the terms of the problem. What language? What vernacular? What
venue?

cheerio (plural),
Todd

Dear Todd,

well, I didn't start the thread ... so I don't have to specify the problem.
The OP wanted to decide whether a given noun is singular or plural.

As I see it, English nouns can be grouped into four groups:

1) Those that form a plural by adding an 's', e.g., house -> houses
2) Those that don't belong to the first group and have different forms
for singular and plural, e.g., man -> men, mouse -> mice
3) Those that don't belong to the first two groups, because singular and
plural forms both exist and coincide (e.g., moose -> moose)
4) Those that don't belong to the previous groups, as they don't have two
forms, because they describe some collective (e.g., police, at least in
British English) or something uncountable (e.g., pride).

The first two groups and the last can be dealt with by a program
that generates a plural from a singular (e.g., the linguistics gem).
Mainly because of the group 3 nouns, a program that 'pluralizes'
a given noun doesn't answer the OP's question, because without
context it cannot decide whether a given noun is singular or plural.
Dave and Robert gave several examples of this.
My point is that there exists a type of software - part-of-speech
taggers - that can resolve these questions from context - not always
correctly, as it's a computer program relying on probabilities, but
remarkably well.
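
As a rough illustration of why group 3 defeats a test built on a pluralizer, here is a minimal Ruby sketch. The lookup tables and the `pluralize` / `singularize` helpers are hypothetical stand-ins for a real inflection library, not its API; only the round-trip idea matters.

```ruby
# Hypothetical stand-in for an inflection library: one noun from each of groups 1-3.
PLURAL_OF = {
  "house" => "houses", # group 1: regular 's'
  "mouse" => "mice",   # group 2: irregular
  "moose" => "moose"   # group 3: both forms coincide
}.freeze
SINGULAR_OF = PLURAL_OF.invert.freeze

# Words that are not a known singular are returned unchanged.
def pluralize(word)
  PLURAL_OF.fetch(word, word)
end

# Words that are not a known plural are returned unchanged.
def singularize(word)
  SINGULAR_OF.fetch(word, word)
end

# Classify by round trip: a word that survives singularize looks singular,
# one that survives pluralize looks plural, and group 3 nouns survive both.
def number_by_round_trip(word)
  looks_singular = singularize(word) == word
  looks_plural   = pluralize(word) == word
  return :undecidable if looks_singular && looks_plural
  looks_plural ? :plural : :singular
end

%w[house houses mouse mice moose].each do |w|
  puts "#{w}: #{number_by_round_trip(w)}"
end
```

Words the tables do not know at all also come out as undecidable, which is the honest answer for an isolated word without context.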

I didn't understand your point about languages that aren't spoken ...
If you had a Latin text, say (there's a large collection available
on Project Gutenberg), and you manually tagged a part of it to let
a Bayesian classification program learn probabilities, it would be able
to identify the parts of speech of another Latin text, e.g., identify
plural nouns in it. That's certainly much easier in Latin than in
English, as there's hardly anything in group 3 for Latin - I'd bet
you'd find a nice little list of such words printed in bold in every
grammar (oh, please remember - 'hand' is 'manus' and 'hands' is also 'manus').

What language? What vernacular? What
venue?

I assume that the OP is talking about some standard written form
of a language, like standard English, French, German, etc ..

Now, you can get ready-made taggers on the net for some
of these languages, so your computer can say that this Italian word is a
plural noun, even if you don't know any Italian.
If you wanted to tell plural nouns from singular ones in Turkish, you
could still use e.g. TreeTagger for that, but you would first have to get
a Turkish text tagged manually to teach the program the probabilities
that a given word form is a plural or a singular - it pays to have a
native Turkish speaker do that. :)
For those languages for which ready-made solutions are offered, somebody
has already taken a large amount of typical texts (novels, newspaper
articles, poems etc.), tagged them manually and provided parameter
files for download, so no training on the user's part is necessary anymore.

Best regards,

Axel
 
Sean O'Halpin

The collective noun in American English is singular, while in British
English the collective noun is plural.

While this is completely off-topic, I feel impelled to correct your
assumption here: there is no such prescription in so-called 'British
English'[1]. I quite regularly hear people using both the singular and
the plural referring to the same collective noun even in the same
breath, e.g. "The government's in disarray. They're going to have a
tough time recovering from this defeat".

Regards,
Sean

[1] Here, in Britain we tend to think that what we speak is yer actual
original English, so it doesn't require qualification :)
 
Mark Wilden

The collective noun in American English is singular, while in British
English the collective noun is plural.

While this is completely off-topic, I feel impelled to correct your
assumption here: there is no such prescription in so-called 'British
English'[1]. I quite regularly hear people using both the singular and
the plural referring to the same collective noun even in the same
breath, e.g. "The government's in disarray. They're going to have a
tough time recovering from this defeat".

My favourite (sic) is the British use of the possessive for no reason:
"We're going to Tesco's". :)

///ark
 
Dave Bass

Mark said:
My favourite (sic) is the British use of the possessive for no reason:
"We're going to Tesco's". :)

I think you'll find it's short for "We're going to Tesco's [store]".

Similar in principle to "We met up at Fred's [house]" or "Homer is often
to be found in Moe's [bar]".

A fairly recent usage is the possessive pronoun for a similar purpose:
"After dinner we went back to hers" meaning "...back to her place".
 
Robert Dober

Mark said:
My favourite (sic) is the British use of the possessive for no reason:
"We're going to Tesco's". :)

I think you'll find it's short for "We're going to Tesco's [store]".
And if I might add, I guess this too will be used frequently:
We're going to Tesco's because *they* have some new -- dunoo what they
are selling tho ;)
Which, if applied often in the training text might become some strange
plural form, but I am really imagining here ;)
But in reality we are not talking plural or singular form anymore we
are right into the miraculous wonders of languages where social
context mixes with syntax, semantics and meaning.

Cheers
Robert
http://ruby-smalltalk.blogspot.com/
 
Todd Benson

Dear Todd,

well, I didn't start the thread ... so I don't have to specify the problem.
The OP wanted to decide whether a given noun is singular or plural.

I was talking to the OP, but I guess I didn't say that outright.
As I see it, in English, nouns can be grouped into four groups:

1) Those that form a plural by adding an 's' : eg., house -> houses
2) Those that don't belong to the first group and have different forms
for singular and plural : eg., man -> men, mouse->mice
3) Those that don't belong to the first two groups, because singular and
plural forms both exist and coincide (eg. moose->moose)
4) Those that don't belong to the previous groups, as they don't have two
forms, because they describe some collective (eg. police (at least in British English)) or something uncountable (eg. pride).

The first two groups and the last can be dealt with by a program
that generates a plural from a singular (ie., the linguistics gem).
Especially due to the group 3 nouns, a program that 'pluralizes'
a given noun doesn't answer the OP's question, because it cannot decide
(from the missing information of the circumstances) whether a given noun is singular or plural.
Dave and Robert gave several examples for this.
My point is that there exists a type of software - parts-of-speech taggers - that can resolve these questions from circumstance information - not always correctly, as it's a computer program relying on probabilities, but remarkably well.

I didn't understand your point about languages that aren't spoken ...
if you had a Latin text, say, (there's a large collection available
on project Gutenberg), and you manually tagged a part of it, to let
a Bayesian classification program learn probabilities, it would be able to identify the parts-of-speech of another Latin text, e.g., identify plural nouns in it in Latin (that's certainly much easier than in English, as there's hardly anything in the group 3 for Latin - I'd bet you'd find a nice little list of words printed in fat in every grammar (oh, please remember - hand is 'manus' and 'hands' is also 'manus').



I assume that the OP is talking about some standard written form
of a language, like standard English, French, German, etc ..

Hmm. Most people unconsciously change their use of communication by
location or the company they are in.
Now, you get ready-made taggers on the net for some
of these languages, so your computer can say, this Italian word is a plural
noun, even if you don't know any Italian.
If you wanted to identify plural nouns from singular ones in Turkish, you could still use eg. treetagger for that, but you have to get a Turkish text tagged manually first to teach the program the probabilities that a given
word form is a plural or a singular - it pays to have a native-language Turk to do that. :)
For those language that there are ready-made solutions offered, somebody
has already taken a large amount of typical texts (novels, newspaper
articles, poems etc.), tagged them manually and provided parameter
files for download, so no training from the user's part is necessary anymore.

Best regards,

Axel

I pretty much agree with you, but I still think the side cases pop up
more frequently than we think.

With the non-spoken languages point, I meant things like symbology,
programming languages, formal logic, and the like. "Plural" may take
on a different meaning.
 
Mark Wilden

Mark said:
My favourite (sic) is the British use of the possessive for no
reason:
"We're going to Tesco's". :)

I think you'll find it's short for "We're going to Tesco's [store]".
And if I might add, I guess this too will be used frequently:
We're going to Tesco's because *they* have some new -- dunoo what they
are selling tho ;)

Well, OK, but it seems to me the same as saying "We're going to
England's." I mean, when you think about it, any possessive -could-
have an implied noun.

I think the usage simply arises from the fact that many stores do have
possessive names, so it feels "natural."

///ark
 
