Rookie question about data types (hashtables)

S

Steve D. Perkins

Hello all -

I'm a Java and C++ developer, looking to pick up some Python and
wxPython skills for prototyping and quick projects. I've been developing
for about 15 years, and am very familiar with the wxWindows framework (I
wrote one of the early Java bindings for it), but I've only been exposed
to Python for a few days.

I like to learn by doing, so I decided to just dive in with a small
project I have a need for... a Spanish-English dictionary that lives in
the system tray and pops up on demand with a hotkey. I've borrowed the
dictionary data from the Pythoñol project on SourceForge, and learned the
Python methods for file I/O and parsing strings. I now have 80,000 or so
cleanly-parsed Spanish-English word pairs, and need to find the
appropriate data structure to store them in.

What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key. I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word. I figure that this will be
easier than having users enter entire words in one shot and then try to
implement some intelligence for near-matches.

Therefore, it would be great if there was some hashtable-like data
structure that lets me say "Return all values where the key starts with
'an'". Actually, it would be even better if I could provide multiple
options and get a union in one pass... such as "Return all values where
the key starts with 'an' or 'añ'".

I haven't yet turned up EXACTLY what I'm looking for, but I thought
I would ask in case everyone would say, "Duh! There's a native data type
for that, silly!". Even if that's not the case, does anyone know of any
examples out there I can look at where such a data structure has been
implemented manually? Thanks!

Steve
 
R

Rainer Deyke

Steve said:
What I would LIKE to have is a hashtable-like structure that lets
me retrieve multiple values based on a partial key.

Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.
 
S

Sidharth Kuruvila

the bsddb module has a binary tree database but that might be overkill.
Other than that I doubt there's anything in standard python that would help.

If you don't plan to add any words to your dictionary you could order the
list and use a binary search to find the entries.
you could use python list for that :)
 
A

Anton Vredegoor

Sounds like a sorted list would work best. A binary search (see the bisect
module) lets find all matches in O(log n) time.

Probably yes. I'm replying just in case someone would like to have
their hash values sorted in alphabetical order too. The code below
might be of interest to hobbyist programmers who don't care about
practical issues :) (also not tested beyond what you see)

Anton

from string import ascii_lowercase as alc

def antonian_sorted_unrank(k,base,ndig):
res = []
while k > -1 :
r,k = divmod(k,int('1'*ndig,base))
k,ndig = k-1,ndig-1
res.append(r)
return res

def antonian_sorted_rank(seq, base,ndig):
res = 0
for x in seq:
res = x*int('1'*ndig,base)+res+1
ndig -= 1
return res-1

def word_rank(word,ndig):
d = [alc.index(c) for c in word]
return antonian_sorted_rank(d,len(alc),ndig)

def word_unrank(i,ndig):
L = antonian_sorted_unrank(i,len(alc),ndig)
return "".join([alc[j] for j in L])

def test():
L = ["a","ab","aa","b","bab","ba"]
maxlen = max(map(len,L))
print L
def wu(i): return word_unrank(i,maxlen)
def wr(w): return word_rank(w,maxlen)
H = map(wr,L)
print H
H.sort()
print H
R = map(wu,H)
print R
L.sort()
assert L == R

if __name__=='__main__':
test()
 
A

Axel Mittendorf

Steve D. Perkins said:
Hello all - Hi

when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until

Maybe you could use a normal python dictionary (you call it hashtable).
Look at this script (You would need one dictionary for each direction.):

#!/usr/bin/python

# generate dictionaries (AKA hashtables).
# direction: English --> German
EN2DE = {}
# direction: German --> English
DE2EN = {}

# inserts a word and its translation into the dict d
def insertWord( d, key, val):
# does the key exist
if d.has_key( key):
# yes
d[ key].append( val)
else:
# no
d[ key] = [val]

# returns a list words that begin with s
def getBeginWith( d, s):
l = s.__len__()
res = []
keys = d.keys()
for k in keys:
if k.__len__()>=l and k[:l] == s:
# use "k[:l].lower() == s.lower()" to ignore case
res.append( k)
return res

# insert some entries
insertWord( EN2DE, "someword", "meaning 1")
insertWord( EN2DE, "someword", "meaning 2")
insertWord( EN2DE, "my_codingstyle", "evil")
insertWord( EN2DE, "son", "Sohn")
insertWord( EN2DE, "sunday", "Sonntag")
insertWord( EN2DE, "wednesday", "Mittwoch")
insertWord( EN2DE, "Weapon of mass destruction", "George Walker Bush")
insertWord( EN2DE, "sold", "verkauft")

# show that it works
print "all keys that begin with 'so': ", getBeginWith( EN2DE, "so")
print "translations for 'someword': ", EN2DE['someword']

Instead of doing this you could maybe subclass UserList from the standard
library.

HTH, Axel
 
E

Eric Brunel

Steve D. Perkins wrote:
[snip]
What I would LIKE to have is a hashtable-like structure that lets me
retrieve multiple values based on a partial key.

AFAIK, if you can do queries on a "hashtable-like structure" with a partial key,
it cannot be a real hashtable...
> I want to have a text
field for the user to type in a Spanish word, and a box below containing
a list of English words. With each character that the user types, I
would like to narrow down the words listed in the box below. That is,
when you type the letter "a" the box will contain all English words
starting with the letter "a". When you then type "n", the box will
narrow further and contain all words starting with "an", etc... until
you've narrowed it down to appropriate word.

I'd use a Python dictionary indexed not on words, but on letters in the word.
So, for a dictionary translating english words into french, you'd have for example:

dictionary['m']['o']['n']['d']['a']['y']['translation'] = 'lundi'

(NB: having 'translation' as the last key is to solve the problem of words
included in another; see 'sun' and 'sunday'. BTW, any key that is not a single
character can be used...)

Here is an example implementation:

--------------------------------------------------------
def addWord(myDict, originalWord, translatedWord):
d = myDict
for c in originalWord:
if not d.has_key(c): d[c] = {}
d = d[c]
d['translation'] = translatedWord

def _allTranslations(rootWord, d):
res = []
for c in d.keys():
if c == 'translation':
res.append((rootWord, d[c]))
else:
res += _allTranslations(rootWord + c, d[c])
return res

def getTranslations(myDict, word):
d = myDict
for c in word:
if not d.has_key(c): return []
d = d[c]
return _allTranslations(word, d)
--------------------------------------------------------

And then you'd do:

frenchDict = {}
## Week days
addWord(frenchDict, 'monday', 'lundi')
addWord(frenchDict, 'tuesday', 'mardi')
# ...
addWord(frenchDict, 'sunday', 'dimanche')
## Months
addWord(frenchDict, 'january', 'janvier')
addWord(frenchDict, 'february', 'fevrier')
# ...
addWord(frenchDict, 'december', 'decembre')

print getTranslations(frenchDict, 'wednesday')
print getTranslations(frenchDict, 'ju')
print getTranslations(frenchDict, 's')

I don't know how it scales up for big dictionaries, not to mention huge ones...

HTH
 
S

Steve D. Perkins

What I would LIKE to have is a hashtable-like structure that lets
Sounds like a sorted list would work best. A binary search (see the
bisect module) lets find all matches in O(log n) time.


Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101? I'm embarrassed to say that this didn't even cross my
mind originally. I've been coding in Java for too long, when I encounter
programs my instincts are to find an open-source plug-in that's already
been written to solve the problem.
 
R

Rainer Deyke

Steve said:
Hmm... a double-linked-list with recursion, like back in Freshman
Programming 101?

I was thinking about a standard Python list, which is actually implemented
as an array, not a linked list. Binary searches don't work on linked lists.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top