Problem loading a file of words

teoryn

I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)
    word_list = []
    for char in word:
        word_list.append(char)
    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case and remove the return at the end of the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exists, add to existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for testing
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break
*--end of file--*


If you create dictionary.txt as suggested in the comments, it should
work fine (assuming you pass a word that creates a valid key; I'll
have to add exception handling later). The problem is that with a large
dictionary.txt file (the one I tested is 2.9 MB) it always gives an
error, specifically:
(Note: ccehimnostyz is the key for zymotechnics, which is in the large
dictionary)


*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
File "unscram.py", line 62, in ?
results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*


If you'd like a copy of the dictionary I'm using email me at teoryn at
gmail dot com or leave your email here and I'll send it to you (It's
702.2 KB compressed)

Thanks,
Kevin
 
Devan L

Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))

f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)
lookup = ''
while lookup != 'quit':
    lookup = raw_input('Enter a scrambled word:')
    if dictionary.has_key(lookup):
        word = dictionary[lookup]
    else:
        word = 'Not found.'
    print word

You need python 2.4 to use this example.
 
Robert Kern

teoryn said:
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)
    word_list = []
    for char in word:
        word_list.append(char)
    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

An idiomatic Python 2.4 version of this function would be:

def sort_string(word):
    word = word.lower()
    sorted_list = sorted(word)
    sorted_word = ''.join(sorted_list)
    return sorted_word

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
    line = f.readline()
    if len(line) == 0:
        break
    line = str.lower(line[:-1]) # convert to lowercase just in case and remove the return at the end of the line
    sline = sort_string(line)
    if sline in dictionary: # this key already exist, add to existing list
        dictionary[sline].append(line)
        print 'Added %s to key %s' % (line,sline) #for testing
    else: # create new key and list
        dictionary[sline] = [line]
        print 'Created key %s for %s' % (sline,line) #for testing
f.close()

# this really should all be within a function, but let's just carry on
dictionary = {}
f = open('dictionary.txt')
try:
    # enclose this in a try: finally: block in case something goes wrong
    for line in f:
        line = line.strip().lower()
        sline = sort_string(line)
        val = dictionary.setdefault(sline, [])
        val.append(line)
        print "Added %s to key %s" % (line, sline)
finally:
    f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break
*--end of file--*


If you create dictionary.txt as suggested in the comments, it should
work fine (assumeing you pass a word that creates a valid key, I'll
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)

Well, my version works (using /usr/share/dict/words from Debian as
dictionary.txt). Yours does, too. Are you sure that you are using the
right dictionary.txt?

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Robert Kern

Devan said:
Heh, it reminds me of the code I used to write.

def sort_string(word):
    return ''.join(sorted(list(word.lower())))
f = open('dictionary.txt','r')
lines = [line.rstrip('\n') for line in f.readlines()]
f.close()
dictionary = dict((sort_string(line),line) for line in lines)

That's definitely not the kind of dictionary that he wants.
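
To make the difference concrete, here is a small sketch. It uses Python 3 syntax so it runs on a current interpreter, the word list is the sample from the original post, and `build_flat` and `build_grouped` are names made up for this illustration. Building the dictionary from (key, word) pairs keeps only one word per key, while `setdefault` collects every anagram in a list.

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

# Sample words from the original post; 'pear' and 'pare' share a key.
words = ['test', 'hello', 'quit', 'night', 'pear', 'pare']

def build_flat(words):
    # One value per key: a later word silently replaces an earlier anagram.
    return dict((sort_string(w), w) for w in words)

def build_grouped(words):
    # One list per key: every anagram is kept.
    grouped = {}
    for w in words:
        grouped.setdefault(sort_string(w), []).append(w)
    return grouped

flat = build_flat(words)
grouped = build_grouped(words)
print(flat['aepr'])     # pare
print(grouped['aepr'])  # ['pear', 'pare']
```

With the flat version, looking up a scramble of "pear" can only ever return "pare", because the earlier entry was overwritten.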

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Terrance N. Phillip

Kevin,
I'm pretty new to Python too. I'm not sure why you're seeing this
problem... is it possible that this is an "out-by-one" error? Is
zymotechnics the *last* word in dictionary.txt? Try this slightly
simplified version of your program and see if you have the same problem....

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    return "".join(sorted(list(word.lower())))

dictionary = {}
f = open('/usr/bin/words') # or whatever file you like
for line in f:
    sline = sort_string(line[:-1])
    if sline in dictionary:
        dictionary[sline].append(line)
    else:
        dictionary[sline] = [line]
f.close()

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except:
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')


Good luck,

Nick.
 
Devan L

Robert said:
That's definitely not the kind of dictionary that he wants.

Oh, I missed the part where he put values in a list.
 
Peter Otten

teoryn said:
I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:
line = str.lower(line[:-1]) # convert to lowercase just in case
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the large
dictionary)


*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
File "unscram.py", line 62, in ?
results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*

If 'zymotechnics' is the last line and that line is missing a trailing
newline

line[:-1]

mutilates 'zymotechnics' to 'zymotechnic'. In that case the dictionary would
contain the key 'ccehimnotyz'. Another potential problem could be
leading/trailing whitespace. Both problems can be fixed by using
line.strip() instead of line[:-1] as in Robert Kern's code.
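
A quick way to see the first case, as a sketch in Python 3 syntax. The two sample lines are invented for the illustration, with the final newline deliberately missing.

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

# Hypothetical file contents: the last line has no trailing newline.
lines = ['zymosis\n', 'zymotechnics']

sliced = [line[:-1] for line in lines]       # drops the last character, whatever it is
stripped = [line.strip() for line in lines]  # drops only surrounding whitespace

print(sliced)    # ['zymosis', 'zymotechnic']
print(stripped)  # ['zymosis', 'zymotechnics']
print(sort_string(sliced[-1]))    # ccehimnotyz
print(sort_string(stripped[-1]))  # ccehimnostyz
```

Only the stripped version produces the key that the lookup for 'ccehimnostyz' expects.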

Peter
 
Steven D'Aprano

I've been spending today learning python and as an exercise I've ported
a program I wrote in java that unscrambles a word. Before describing
the problem, here's the code:

*--beginning of file--*
#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word = str.lower(word)

It is generally considered better form to write that line as:

    word = word.lower()

    word_list = []
    for char in word:
        word_list.append(char)

If you want a list of characters, the best way of doing that is just:

    word_list = list(word)

    word_list.sort()
    sorted_word = ''
    for char in word_list:
        sorted_word += char
    return sorted_word

And the above four lines are best written as:

    return ''.join(word_list)

print 'Building dictionary...',

dictionary = { }

# Notice that you need to have a file named 'dictionary.txt'
# in the same directory as this file. The format is to have
# one word per line, such as the following (of course without
# the # marks):

#test
#hello
#quit
#night
#pear
#pare

f = file('dictionary.txt')

# This loop builds the dictionary, where the key is
# the string after calling sort_string(), and the value
# is the list of all 'regular' words (from the dictionary,
# not sorted) that passing to sort_string() returns the key

while True:
line = f.readline()
if len(line) == 0:
break
line = str.lower(line[:-1]) # convert to lowercase just in case
and
# remove the return at the end of
the line
sline = sort_string(line)
if sline in dictionary: # this key already exist, add to
existing list
dictionary[sline].append(line)
print 'Added %s to key %s' % (line,sline) #for testing
else: # create new key and list
dictionary[sline] = [line]
print 'Created key %s for %s' % (sline,line) #for
testing
f.close()

Your while-loop seems to have been mangled a little thanks to word-wrap.
In particular, I can't work out what that "and" is doing in the middle of
it.

Unless you are expecting really HUGE dictionary files (hundreds of
millions of lines) perhaps a better way of writing the above while-loop
would be:

print 'Building dictionary...',
dictionary = { }
f = file('dictionary.txt', 'r')
for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

# This loop lets the user input a scrambled word, look for it in
# dictionary, and print all matching unscrambled words.
# If the user types 'quit' then the program ends.
while True:
    lookup = raw_input('Enter a scrambled word : ')

    results = dictionary[sort_string(lookup)]

This will fail if the scrambled word you enter is not in the dictionary.

    for x in results:
        print x,

    print

    if lookup == 'quit':
        break

You probably want the test for quit to happen before printing the
"unscrambled" words.
*--end of file--*
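
Both remarks about the lookup loop can be sketched as follows. This is Python 3 syntax (`input` instead of `raw_input`), and `find_words`, `run` and the `read_input` parameter are hypothetical names introduced only for this example, not part of the original script. The quit test comes first, and `dict.get` with a default avoids the KeyError for unknown keys.

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

def find_words(dictionary, lookup):
    """Return the list of unscrambled words, or [] if the key is missing."""
    return dictionary.get(sort_string(lookup), [])

def run(dictionary, read_input=input):
    # read_input is a parameter only so the loop can be driven without a keyboard.
    while True:
        lookup = read_input('Enter a scrambled word : ')
        if lookup == 'quit':  # test for quit before looking anything up
            break
        results = find_words(dictionary, lookup)
        if results:
            print(' '.join(results))
        else:
            print('No match for %r' % lookup)

# Tiny hand-built dictionary standing in for the one read from the file.
sample = {'aepr': ['pear', 'pare'], 'estt': ['test']}
print(find_words(sample, 'reap'))  # ['pear', 'pare']
print(find_words(sample, 'xyz'))   # []
```

Calling `run(sample)` starts the interactive loop. An unknown word prints a message instead of ending the program with a traceback.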


If you create dictionary.txt as suggested in the comments, it should
work fine (assumeing you pass a word that creates a valid key, I'll
have to add exceptions later). The problem is when using a large
dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
always gives an error, specifically:
(Note: ccehimnostyz is for zymotechnics, which is in the
large dictionary)


*--beginning of example--*
Enter a scrambled word : ccehimnostyz
Traceback (most recent call last):
File "unscram.py", line 62, in ?
results = dictionary[sort_string(lookup)]
KeyError: 'ccehimnostyz'
*--end of example--*

If this error is always happening for the LAST line in the text file, I'm
guessing there is no newline after the word. So when you read the text
file and build the dictionary, you inadvertently remove the "s" from the
word before storing it in the dictionary.
 
teoryn

Thanks to everyone for all the help!

Here's the (at least for now) final script, although note I'm using
2.3.5, not 2.4, so I can't use some of the tips that were given.

#!/usr/bin/python
# Filename: unscram.py

def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word_list = list(word.lower())
    word_list.sort()
    return ''.join(word_list)

print 'Building dictionary...',

dictionary = { }

f = file('/usr/share/dict/words', 'r')

for line in f.readlines():
    line = line.strip() # remove whitespace at both ends
    if line: # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            #print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            #print 'Created key %s for %s' % (sline,line)
f.close()

print 'Ready!'

lookup = raw_input('Enter a scrambled word : ')
while lookup:
    try:
        results = dictionary[sort_string(lookup)]
        for x in results:
            print x,
        print
    except:
        print "?????"
    lookup = raw_input('Enter a scrambled word : ')



As for the end of the file idea, that word wasn't at the end of the
file, and there was a blank line, so that's out of the question. The
word list I was using was 272,520 words long, and I got it a while back
when doing this same thing in java, but as you can see now I'm just
using /usr/share/dict/words which I found after not finding it in the
place listed in Nick's comment.

I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Thanks for all the help,
Kevin
 
Peter Otten

teoryn said:
I'm still lost as to why my old code would only work for the small
file, and another interesting note is that with the larger file, it
would only write "zzz for zzz" (or whatever each word was) instead of
"Created key zzz for zzz". However, it works now, so I'm happy.

Happy as long as you don't know what happened? How can that be?
Another guess then -- there may be inconsistent newlines, some "\n" and some
"\r\n":
>>> garbled = "garbled\r\n"[:-1]
>>> print "created key %s for %s" % ("".join(sorted(garbled)), garbled)
abdeglr for garbled
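
The effect on the dictionary keys can be sketched like this, in Python 3 syntax. The raw lines are invented literals, used because Python 3's text-mode `open()` would normally translate "\r\n" to "\n" by itself.

```python
def sort_string(word):
    """Return the letters of word, lowercased and sorted."""
    return ''.join(sorted(word.lower()))

# Hypothetical raw lines: one ends in "\n", the other in "\r\n".
raw_lines = ['pear\n', 'pare\r\n']

# line[:-1] removes only the "\n", so a "\r" survives and sorts to the front of the key.
sliced_keys = [sort_string(line[:-1]) for line in raw_lines]
# strip() removes "\r" and "\n" alike.
stripped_keys = [sort_string(line.strip()) for line in raw_lines]

print(sliced_keys)    # ['aepr', '\raepr']
print(stripped_keys)  # ['aepr', 'aepr']
```

A key that starts with "\r" can never match what the user types, and when it is printed the carriage return moves the cursor back to the start of the line, which would hide the "Created key" text as in the session above.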

Peter
 
teoryn

I was just happy that it worked, but was still curious as to why it
didn't before. Thanks for the idea, I'll look into it and see if this
is the case.

Thanks,
Kevin
 
teoryn

I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.

Thanks!
 
Peter Hansen

teoryn said:
I changed to using line = line.strip() instead of line = line[:-1] in
the original and it worked.

Just to be clear, these don't do nearly the same thing in general,
though in your specific case they might appear similar.

The line[:-1] idiom says 'return a string which is a copy of the
original but with the last character, if any, removed, regardless of
what character it is'.

The line.strip() idiom says 'return a string with all whitespace
characters removed from the end *and* start of the string'.

In certain cases, you might reasonably prefer .rstrip() (which removes
only from the right-hand side, or end), or even something like
.rstrip('\n') which would remove only newlines from the end.
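
A side-by-side comparison of the four idioms, as a sketch in Python 3 syntax. The sample strings and the `variants` helper are made up for the illustration.

```python
def variants(s):
    """Return s cleaned four ways: [:-1], strip(), rstrip(), rstrip('\\n')."""
    return (s[:-1], s.strip(), s.rstrip(), s.rstrip('\n'))

# A normal line, a line without a newline, and a padded line with "\r\n".
samples = ['word\n', 'word', ' word \r\n']

for s in samples:
    print(repr(s), '->', variants(s))
# 'word\n' -> ('word', 'word', 'word', 'word')
# 'word' -> ('wor', 'word', 'word', 'word')
# ' word \r\n' -> (' word \r', 'word', ' word', ' word \r')
```

The four only agree on the first sample, a plain line ending in a single "\n".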

-Peter
 
