Looking things up by string prefix

T

Tom Anderson

Hi all,

I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

Am i right in thinking that there's nothing in the standard library that
does this? I'm writing something that uses the NavigableMap methods on
TreeMap to do it, but it's a bit grim, and not guaranteed to be as
efficient as it could be.

Am i further right in thinking that there's nothing in Apache's Commons
Collections or Google's Guava Collections that does this?

Is there any reasonable open-source library that does?

If i write this myself, is there a data structure better than a Patricia
tree for it?

Cheers,
tom
 
J

John B. Matthews

Tom Anderson said:
I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

Am i right in thinking that there's nothing in the standard library that
does this? I'm writing something that uses the NavigableMap methods on
TreeMap to do it, but it's a bit grim, and not guaranteed to be as
efficient as it could be.

Am i further right in thinking that there's nothing in Apache's Commons
Collections or Google's Guava Collections that does this?

Is there any reasonable open-source library that does?

Have you seen this?
 
E

Eric Sosman

Hi all,

I want to store some things, each filed under a string key. I then want
to look those things up with other strings, finding the thing whose key
is the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?
If i write this myself, is there a data structure better than a Patricia
tree for it?

The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially. If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia. If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).
 
R

Roedy Green

I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

here are three approaches:

1. use SQL

2. use a TreeMap. Find entry before and after the lookup and analyse.

3. use a HashMap. Keep chopping the key till you get a match on
lookup.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is no harm in being sometimes wrong especially if one is promptly found out.
~ John Maynard Keynes (born: 1883-06-05 died: 1946-04-21 at age: 62)
 
A

Andreas Leitgeb

Tom Anderson said:
I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.
So, if i store:
/products -> foo
/products/fruit -> bar
Then lookups go like:
/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>

To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)

To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.

If the key lengths of your keys are very sparsely distributed,
replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>>

PS: I think, there is no such Collection ready made in Java standard
library. I have no idea whether it exists in apache's or other class
libraries. I don't know Patricia Trees, either, so cannot answer
those direct questions.
 
T

Tom Anderson

I think this Radix Tree does what you want (it's the same thing as a
Patricia tree):

http://code.google.com/p/radixtree/

It could certainly be modified to do what i want, yes. Thanks for finding
that.

It's a shame his RadixTree doesn't implement the standard Map interface; i
don't particularly need it to, but it would be a more well-rounded class
if it did.

tom
 
T

Tom Anderson

If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?

foo. Neither of the other keys is a prefix of the search term.
The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially. If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia. If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).

I evidently haven't explained this clearly. A lookup with /products/f
would never match either /products/fruit or /products/fuzz as entries,
because neither is a prefix of the lookup term.

What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
set up a mapping for /user/editProfile/*, then i want to catch requests
for /user/editProfile/jim, /user/editProfile/bob, etc.

tom
 
T

Tom Anderson

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>

To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)

To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.

Interesting, thanks. That would certainly work. At first glance, it seems
a bit clunky, but you only need to to do lookups for the lengths that have
keys, and each one is a hashtable lookup, so it should be quite quick.

For large sets of keys, or rather for sets of keys with many different
lengths, i suspect it wouldn't be very fast, but my key set is likely to
be quite small.

tom
 
E

Eric Sosman

foo. Neither of the other keys is a prefix of the search term.


I evidently haven't explained this clearly. A lookup with /products/f
would never match either /products/fruit or /products/fuzz as entries,
because neither is a prefix of the lookup term.

What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
set up a mapping for /user/editProfile/*, then i want to catch requests
for /user/editProfile/jim, /user/editProfile/bob, etc.

Okay. I don't know what's already out there and ready-written,
but if I were going to write such a thing from scratch I'd probably
come up with something like (just typed in; not compiled or tested)

class Node {

/** Mapping if we match this far and no further */
private final Target target;

/** Links to deeper Nodes. */
private final Map<String,Node> links;

Node(Target target) {
this.target = target;
this.links = new HashMap<String,Node>();
}

void setLink(String key, Node dest) {
links.put(key, dest);
}

Node getLink(String key) {
return links.get(key);
}

Target match(Iterator<String> keys) {
if (! keys.hasNext())
return null;
Node next = getLink(keys.next());
Target t = (next == null) ? target : next.match(keys);
return (t == null) ? target : t;
}
}

The internal Map might be something lighter-weight if the "degrees"
of the Nodes are expected to be small, but that's the idea. It's a
lot like traversing a directory tree, with the added fillip of a
default value for incomplete matches.
 
A

Andreas Leitgeb

Tom Anderson said:
Interesting, thanks. That would certainly work. At first glance, it seems
a bit clunky, but you only need to to do lookups for the lengths that have
keys, and each one is a hashtable lookup, so it should be quite quick.

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:
maintain an int[longestKey.length()] to count each length's key-count
and keep all the key-value pairs together in one HashMap.

Only obtain lookupString.substring(0,i) for each i whose array-element
is non-zero. For those, however, you'll search the complete HashMap,
which may be slightly less performant than the original.
 
R

Robert Klemme

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:

Hmm, if the data set is small enough you can get by with a wide range of
solutions - even linear search in a list or array may be fast enough.
This problem however is naturally implemented with some Trie variant.
That's the data structure that fits the problem best IMHO.

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,772
Messages
2,569,593
Members
45,111
Latest member
VetaMcRae
Top