WordNet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary acc


Token Type

In order to solve the following question, http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html:
★ Use one of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-****, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string.

(1) First, I put the word pairs in a list, e.g.
pairs = [('car', 'automobile'), ('gem', 'jewel'), ('journey', 'voyage')]. According to http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html, I need to put them in the following format to calculate the semantic similarity: wn.synset('right_whale.n.01').path_similarity(wn.synset('minke_whale.n.01')).

In this case, I need to use a loop to iterate over each element in the above pairs. How can I refer to each element, i.e. in pairs = [('car', 'automobile'), ('gem', 'jewel'), ('journey', 'voyage')]? What's the index for 'car' and for 'automobile'? Thanks for your tips.

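(2) Since I couldn't solve the above index issue, I tried using a dictionary as follows:

from nltk.corpus import wordnet as wn
pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
for key in pairs:
    word1 = wn.synset(str(key) + '.n.01')
    word2 = wn.synset(str(pairs[key])+'.n.01')
    similarity = word1.path_similarity(word2)
    print key+'-'+pairs[key],similarity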


car-automobile 1.0
journey-voyage 0.25
gem-jewel 0.125

Now it seems that I can calculate the semantic similarity for each pair in the above dictionary. However, I want to sort by the similarity value before printing the results. Can dictionary elements be sorted according to their values? This is one of the requirements of the exercise. How can I sort the word pairs (e.g. car-automobile, journey-voyage, gem-jewel) according to their similarity value?
Thanks for your tips.
 

Mark Lawrence

In your for loop, save the data in a list rather than printing it out, and
then sort the list as described here:
http://wiki.python.org/moin/HowTo/Sorting#Operator_Module_Functions
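
A minimal sketch of that suggestion, written for Python 3 and assuming the scores have already been computed. The values are the three path similarities reported in the first post; the names `scores` and `ranked` are only illustrative.

```python
from operator import itemgetter

# Path similarities reported earlier in the thread (not recomputed here).
scores = {'car-automobile': 1.0, 'journey-voyage': 0.25, 'gem-jewel': 0.125}

# A dict has no sort method, but its (key, value) items can be sorted
# into a new list; itemgetter(1) selects the value as the sort key.
ranked = sorted(scores.items(), key=itemgetter(1), reverse=True)

for label, score in ranked:
    print(label, score)
```

The result is a list of tuples, not a dictionary, which is usually what a ranking needs anyway.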
 

Terry Reedy

In this case, I need to use loop to iterate each element in the above
pairs. How can I refer to each element in the above pairs, i.e. pairs
= [(car, automobile), (gem, jewel), (journey, voyage) ]. What's the
index for 'car' and for 'automobile'? Thanks for your tips.

>>> pairs = [('car', 'automobile'), ('gem', 'jewel')]
>>> pairs[0][0]
'car'
>>> pairs[1][1]
'jewel'
>>> for a, b in pairs: a, b
...
('car', 'automobile')
('gem', 'jewel')
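
To connect this back to the question, here is a small sketch of how indexing and unpacking could feed the synset names. It assumes the first noun sense ('.n.01') is the one wanted, as in the original post; `synset_name` is a hypothetical helper, not part of NLTK.

```python
pairs = [('car', 'automobile'), ('gem', 'jewel'), ('journey', 'voyage')]

def synset_name(word):
    # Hypothetical helper: assumes the first noun sense ('.n.01') is wanted.
    return word + '.n.01'

# Indexing: pairs[i] is a tuple; pairs[i][0] and pairs[i][1] are its two words.
first_word = pairs[0][0]
second_word = pairs[0][1]

# Unpacking in the loop header avoids the indices altogether.
names = [(synset_name(a), synset_name(b)) for a, b in pairs]
```

Each string in `names` is in the form that wn.synset() expects in the examples above.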
 

yujian

I want to save all the URLs in current opened windows, and then close
all the windows.
 

Token Type

Yes, thanks for all your tips. I did try sorted with itemgetter. However, the sorted results are the same, as shown below, whether I set reverse=True or reverse=False. Isn't it strange? Thanks.

for key in pairs:
    list_simi=[]
    from operator import itemgetter
    word1 = wn.synset(str(key) + '.n.01')
    word2 = wn.synset(str(pairs[key])+'.n.01')
    similarity = word1.path_similarity(word2)
    list_simi.append((key+'-'+pairs[key],similarity))
    sorted(list_simi, key=itemgetter(1), reverse=True)

[('car-automobile', 1.0)]
[('journey-voyage', 0.25)]
[('gem-jewel', 0.125)]

for key in pairs:
    list_simi=[]
    from operator import itemgetter
    word1 = wn.synset(str(key) + '.n.01')
    word2 = wn.synset(str(pairs[key])+'.n.01')
    similarity = word1.path_similarity(word2)
    list_simi.append((key+'-'+pairs[key],similarity))
    sorted(list_simi, key=itemgetter(1), reverse=False)

[('car-automobile', 1.0)]
[('journey-voyage', 0.25)]
[('gem-jewel', 0.125)]
 

Token Type

Dear all, the problem has been solved as follows. Thanks anyway:
import nltk
from nltk.corpus import wordnet as wn
pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
list_simi=[]
for key in pairs:
    word1 = wn.synset(str(key) + '.n.01')
    word2 = wn.synset(str(pairs[key])+'.n.01')
    similarity = word1.path_similarity(word2)
    list_simi.append((key+'-'+pairs[key],similarity))

from operator import itemgetter
sorted(list_simi, key=itemgetter(1), reverse=False)
[('gem-jewel', 0.125), ('journey-voyage', 0.25), ('car-automobile', 1.0)]
sorted(list_simi, key=itemgetter(1), reverse=True)
[('car-automobile', 1.0), ('journey-voyage', 0.25), ('gem-jewel', 0.125)]
sorted(list_simi, key=itemgetter(1))
[('gem-jewel', 0.125), ('journey-voyage', 0.25), ('car-automobile', 1.0)]
 
 

Ian Kelly

yes, thanks all your tips. I did try sorted with itemgetter. However, the sorted results are same as follows whether I set reverse=True or reverse= False. Isn't it strange? Thanks.

First of all, "sorted" does not sort the list in place as you seem to
be expecting.
It returns a new sorted list. Since your code does not store the
return value of the sorted call anywhere, the sorted list is discarded
and only the original list is kept. If you want to sort a list in
place, use the list.sort method instead.
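
A short illustration of that difference, using the three scores from the thread as sample data (the variable names are only for this sketch):

```python
from operator import itemgetter

list_simi = [('gem-jewel', 0.125), ('car-automobile', 1.0), ('journey-voyage', 0.25)]

# sorted() returns a new list and leaves the original untouched,
# so the result has to be bound to a name to be kept.
descending = sorted(list_simi, key=itemgetter(1), reverse=True)
unchanged = list(list_simi)  # copy taken here to show the original order survived

# list.sort() reorders the list in place and returns None.
result = list_simi.sort(key=itemgetter(1))
```

After the last line, `list_simi` itself is in ascending order and `result` is None, which is why writing `x = x.sort()` loses the list.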

Second, you're not sorting the overall list. On each iteration your
code: 1) assigns a new empty list to list_simi; 2) processes one of
the pairs; 3) adds the pair to the empty list; and 4) sorts the list.
On the next iteration you then start all over again with a new empty
list, and so when you get to the sorting step you're only sorting one
item each time. You need to accumulate the list instead of wiping it
out on each iteration, and only sort it after the loop has completed.
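
The accumulate-then-sort shape could look roughly like this. It is only a sketch: `rank_pairs` and `lookup_similarity` are made-up names, and the scorer is a stand-in table holding the values reported in the thread, so the example runs without NLTK.

```python
from operator import itemgetter

def rank_pairs(pairs, similarity):
    """Score every pair, then sort once, highest similarity first."""
    scored = []  # created once, before the loop
    for word1, word2 in pairs:
        scored.append((word1 + '-' + word2, similarity(word1, word2)))
    # Sort only after the loop has collected every pair.
    return sorted(scored, key=itemgetter(1), reverse=True)

# Stand-in scorer (an assumption for this sketch). With NLTK installed, a
# function wrapping wn.synset(...).path_similarity(...) could be passed instead.
known = {('car', 'automobile'): 1.0,
         ('gem', 'jewel'): 0.125,
         ('journey', 'voyage'): 0.25}

def lookup_similarity(word1, word2):
    return known[(word1, word2)]

ranking = rank_pairs(list(known), lookup_similarity)
```

Passing the scorer in as an argument keeps the loop and the sort independent of how the similarity is actually computed.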
 

alex23

yes, thanks all your tips. I did try sorted with itemgetter.
However, the sorted results are same as follows whether I
set reverse=True or reverse= False. Isn't it strange? Thanks.

That's because you're sorting each entry individually, not the entire
result. For every key-value pair, you create a new empty list, append
one tuple, and then sort it. The consistent order you're seeing is the
outcome of stepping through the dictionary keys.

This is untested, but it should be closer to what you're after, I
think. First it creates `list_simi` as a generator, then it sorts it.

import nltk
from nltk.corpus import wordnet as wn
from operator import itemgetter

pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}

def find_similarity(word1, word2):
    # Look up the first noun sense ('.n.01') of each word.
    as_synset = lambda word: wn.synset( str(word) + '.n.01' )
    return as_synset(word1).path_similarity( as_synset(word2) )

similarity_value = itemgetter(1)

# Generator of (label, similarity) tuples; sorted() consumes it below.
list_simi = (
    ('%s-%s' % (word1, word2), find_similarity(word1, word2) )
    for word1, word2 in pairs.items()  # items() works in Python 2 and 3
)
list_simi = sorted(list_simi, key=similarity_value, reverse=True)
 

Token Type

Thanks indeed for all your suggestions. When I run my code above, what puzzles me is that as the dictionary grows, some data goes missing from the sorted result. Quite odd. The pairs include {'journey':'voyage'}, but the sorted result has no ('journey-voyage', 0.25), which did appear in my first post, a small-scale experiment. I am quite puzzled.
pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage','boy':'lad','coast':'shore', 'asylum':'madhouse', 'magician':'wizard', 'midday':'noon', 'furnace':'stove', 'food':'fruit', 'bird':'****', 'bird':'crane', 'tool':'implement', 'brother':'monk', 'lad':'brother', 'crane':'implement', 'journey':'car', 'monk':'oracle', 'cemetery':'woodland', 'food':'rooster', 'coast':'hill', 'forest':'graveyard', 'shore':'woodland', 'monk':'slave', 'coast':'forest','lad':'wizard', 'chord':'smile', 'glass':'magician', 'rooster':'voyage', 'noon':'string'}
list_simi=[]
for key in pairs:
    word1 = wn.synset(str(key) + '.n.01')
    word2 = wn.synset(str(pairs[key])+'.n.01')
    similarity = word1.path_similarity(word2)
    list_simi.append((key+'-'+pairs[key],similarity))
sorted(list_simi, key=itemgetter(1), reverse=True)

[('midday-noon', 1.0), ('car-automobile', 1.0), ('tool-implement', 0.5), ('boy-lad', 0.3333333333333333), ('lad-wizard', 0.2), ('monk-slave', 0.2), ('shore-woodland', 0.2), ('magician-wizard', 0.16666666666666666), ('brother-monk', 0.125), ('asylum-madhouse', 0.125), ('gem-jewel', 0.125), ('cemetery-woodland', 0.1111111111111111), ('bird-crane', 0.1111111111111111), ('glass-magician', 0.1111111111111111), ('crane-implement', 0.1), ('chord-smile',0.09090909090909091), ('coast-forest', 0.09090909090909091), ('furnace-stove', 0.07692307692307693), ('forest-graveyard', 0.07142857142857142), ('food-rooster', 0.0625), ('noon-string', 0.058823529411764705), ('journey-car',0.05), ('rooster-voyage', 0.041666666666666664)]
 

alex23

When I try my above codes, what puzzles me is that when
the data in the dictionary increase, some data become
missing in the sorted result. Quite odd. In the pairs,
we have {'journey':'voyage'} but in the sorted result no (
'journey-voyage',0.25)

Keys are unique in dictionaries. You have two uses of 'journey'; the
second will overwrite the first.
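
A quick way to see the overwrite, and to find which first words are repeated, is sketched below. The three pairs are a small subset of the exercise list, and the variable names are only illustrative.

```python
from collections import Counter

# A repeated key in a dict literal is not an error; the last value wins.
pairs_dict = {'journey': 'voyage', 'journey': 'car'}

# A list of tuples keeps both pairs, and Counter can report which first
# words occur more than once (the ones a dict would silently collapse).
pairs_list = [('journey', 'voyage'), ('bird', 'crane'), ('journey', 'car')]
first_word_counts = Counter(word1 for word1, _ in pairs_list)
repeated = [word for word, n in first_word_counts.items() if n > 1]
```

Here `pairs_dict` ends up with a single entry, while `pairs_list` still has all three pairs.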

Do you _need_ these items to be a dictionary? Are you doing any look
up? If not, just make it a list of tuples:

pairs = [ ('car', 'automobile'), ('gem', 'jewel') ...]

Then make your main loop:

for word1, word2 in pairs:

If you do need a dictionary for other reasons, you might want to try a
dictionary of lists:

pairs = {
    'car': ['automobile', 'vehicle'],
    'gem': ['jewel'],
}

for word1, synonyms in pairs.items():
    for word2 in synonyms:
        ...
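
If the data starts out as a list of tuples, a dictionary of lists can be built from it with collections.defaultdict. This is just one possible sketch; the three pairs are sample data from the exercise list.

```python
from collections import defaultdict

pairs = [('journey', 'voyage'), ('journey', 'car'), ('gem', 'jewel')]

# Group second words under their first word without losing repeats.
grouped = defaultdict(list)
for word1, word2 in pairs:
    grouped[word1].append(word2)

# .items() yields (key, list) tuples; iterating the dict alone yields only keys.
flattened = [(word1, word2)
             for word1, synonyms in grouped.items()
             for word2 in synonyms]
```

Flattening the grouped form gives back every original pair, so nothing is lost the way it is with a plain word-to-word dictionary.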
 

Token Type

Thanks indeed for your tips. Now I understand the difference between tuples and dictionaries better.
 
