Looking things up by string prefix

Tom Anderson · Jul 2, 2010

Hi all,

I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

Am i right in thinking that there's nothing in the standard library that
does this? I'm writing something that uses the NavigableMap methods on
TreeMap to do it, but it's a bit grim, and not guaranteed to be as
efficient as it could be.

Am i further right in thinking that there's nothing in Apache's Commons
Collections or Google's Guava Collections that does this?

Is there any reasonable open-source library that does?

If i write this myself, is there a data structure better than a Patricia
tree for it?

Cheers,
tom

John B. Matthews · Jul 2, 2010

Tom Anderson said:
I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

Am i right in thinking that there's nothing in the standard library that
does this? I'm writing something that uses the NavigableMap methods on
TreeMap to do it, but it's a bit grim, and not guaranteed to be as
efficient as it could be.

Am i further right in thinking that there's nothing in Apache's Commons
Collections or Google's Guava Collections that does this?

Is there any reasonable open-source library that does?

Have you seen this?

markspace · Jul 2, 2010

Tom said:
Is there any reasonable open-source library that does?

I think this Radix Tree does what you want (it's the same thing as a
Patricia tree):

http://code.google.com/p/radixtree/

Eric Sosman · Jul 2, 2010

Hi all,

I want to store some things, each filed under a string key. I then want
to look those things up with other strings, finding the thing whose key
is the longest available prefix of the lookup string.

So, if i store:

/products -> foo
/products/fruit -> bar

Then lookups go like:

/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?

If i write this myself, is there a data structure better than a Patricia
tree for it?

The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially. If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia. If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).

Roedy Green · Jul 4, 2010

I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.

here are three approaches:

1. use SQL

2. use a TreeMap. Find entry before and after the lookup and analyse.

3. use a HashMap. Keep chopping the key till you get a match on
lookup.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is no harm in being sometimes wrong especially if one is promptly found out.
~ John Maynard Keynes (born: 1883-06-05 died: 1946-04-21 at age: 62)

Andreas Leitgeb · Jul 4, 2010

Tom Anderson said:
I want to store some things, each filed under a string key. I then want to
look those things up with other strings, finding the thing whose key is
the longest available prefix of the lookup string.
So, if i store:
/products -> foo
/products/fruit -> bar
Then lookups go like:
/products/furniture/chairs -> foo
/products/fruit/bananas -> bar

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>

To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)

To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.

If the key lengths of your keys are very sparsely distributed,
replace the ArrayList<?> with TreeMap<Integer,HashMap<String,Thing>>

PS: I think, there is no such Collection ready made in Java standard
library. I have no idea whether it exists in apache's or other class
libraries. I don't know Patricia Trees, either, so cannot answer
those direct questions.

Tom Anderson · Jul 4, 2010

Have you seen this?

<http://code.google.com/p/patricia-trie/>

At the point at which i posted, i had not. I stumbled across it a bit
later and kicked myself. I think it should be pretty easy to add the
method i want to that.

tom

Tom Anderson · Jul 4, 2010

I think this Radix Tree does what you want (it's the same thing as a
Patricia tree):

http://code.google.com/p/radixtree/

It could certainly be modified to do what i want, yes. Thanks for finding
that.

It's a shame his RadixTree doesn't implement the standard Map interface; i
don't particularly need it to, but it would be a more well-rounded class
if it did.

tom

Tom Anderson · Jul 4, 2010

If you add "/products/fuzz -> baz" to the collection, what
should a lookup on "/products/f" return?

foo. Neither of the other keys is a prefix of the search term.

The scale of the problem depends on whether a key's "words" must
match in their entirety or only partially. If it's a "word-by-word"
discipline (so /products/f yields foo), I think an ordinary multi-way
tree would be fine; no need for anything as fancy as Patricia. If
/products/f matches both /products/fruit and /products/fuzz, something
trickier may be needed (at the least, you've got to decide what to do
about the ambiguity).

I evidently haven't explained this clearly. A lookup with /products/f
would never match either /products/fruit or /products/fuzz as entries,
because neither is a prefix of the lookup term.

What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
set up a mapping for /user/editProfile/*, then i want to catch requests
for /user/editProfile/jim, /user/editProfile/bob, etc.

tom

Tom Anderson · Jul 4, 2010

My approach would be, to partition the keys by length, first,
and for each length maintain a separate HashMap.

The structure to hold the key-thing pairs:
ArrayList<HashMap<String,Thing>>

To add a pair: put it into the hashmap indexed by key.length()
in the ArrayList (extending the ArrayList and creating a new
HashMap as necessary)

To lookup a string:
reverse-iterate the ArrayList from .size()-1 downto 1: for each
index i that has a non-empty HashMap stored in the ArrayList,
obtain lookupString.substring(0,i) and look it up in that
non-empty HashMap stored at i.

Interesting, thanks. That would certainly work. At first glance, it seems
a bit clunky, but you only need to to do lookups for the lengths that have
keys, and each one is a hashtable lookup, so it should be quite quick.

For large sets of keys, or rather for sets of keys with many different
lengths, i suspect it wouldn't be very fast, but my key set is likely to
be quite small.

tom

Eric Sosman · Jul 4, 2010

foo. Neither of the other keys is a prefix of the search term.

I evidently haven't explained this clearly. A lookup with /products/f
would never match either /products/fruit or /products/fuzz as entries,
because neither is a prefix of the lookup term.

What i'm trying to do is a bit (okay, a lot) like servlet mapping. If i
set up a mapping for /user/editProfile/*, then i want to catch requests
for /user/editProfile/jim, /user/editProfile/bob, etc.

Okay. I don't know what's already out there and ready-written,
but if I were going to write such a thing from scratch I'd probably
come up with something like (just typed in; not compiled or tested)

class Node {

/** Mapping if we match this far and no further */
private final Target target;

/** Links to deeper Nodes. */
private final Map<String,Node> links;

Node(Target target) {
this.target = target;
this.links = new HashMap<String,Node>();
}

void setLink(String key, Node dest) {
links.put(key, dest);
}

Node getLink(String key) {
return links.get(key);
}

Target match(Iterator<String> keys) {
if (! keys.hasNext())
return null;
Node next = getLink(keys.next());
Target t = (next == null) ? target : next.match(keys);
return (t == null) ? target : t;
}
}

The internal Map might be something lighter-weight if the "degrees"
of the Nodes are expected to be small, but that's the idea. It's a
lot like traversing a directory tree, with the added fillip of a
default value for incomplete matches.

Andreas Leitgeb · Jul 4, 2010

Tom Anderson said:
Interesting, thanks. That would certainly work. At first glance, it seems
a bit clunky, but you only need to to do lookups for the lengths that have
keys, and each one is a hashtable lookup, so it should be quite quick.

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:
maintain an int[longestKey.length()] to count each length's key-count
and keep all the key-value pairs together in one HashMap.

Only obtain lookupString.substring(0,i) for each i whose array-element
is non-zero. For those, however, you'll search the complete HashMap,
which may be slightly less performant than the original.

Robert Klemme · Jul 4, 2010

If your set of keys is going to be small enough, you can get by with a
more lightweight variant:

Hmm, if the data set is small enough you can get by with a wide range of
solutions - even linear search in a list or array may be fast enough.
This problem however is naturally implemented with some Trie variant.
That's the data structure that fits the problem best IMHO.

Kind regards

robert

PATCH: Speed up direct string concatenation by 20+%!	34	Sep 29, 2006
Sencha Touch--Support 2 browsers in just 228K!	64	Jul 16, 2010
7.0 wishlist?	321	Oct 29, 2008
Meta-Meta-Programming	29	Feb 7, 2006
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008
Ruby Weekly News 6th - 12th June 2005	0	Jun 14, 2005
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	May 1, 2007
comp.lang.vhdl FAQ part 1 of 4: general	0	Jul 8, 2003

Looking things up by string prefix

Tom Anderson

John B. Matthews

markspace

Eric Sosman

Roedy Green

Andreas Leitgeb

Tom Anderson

Tom Anderson

Tom Anderson

Tom Anderson

Eric Sosman

Andreas Leitgeb

Robert Klemme

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads