Faster datastructure for lookups wanted

Discussion in 'Ruby' started by m94asr@gmail.com, Sep 7, 2006.

  1. Guest

    Hi all,

    maybe somebody can recommend me the right datastructure or
    any other advice would be a big help.

    My code spends most of its execution time doing lookups from
    a hashtable with about 1M keys. The keys are strings and the values
    are arrays of integers. Most of the time only of length 1.

    I do not care how long the construction of the datastructure takes,
    but the lookup should be as fast as possible.

    xs.each{|x|
    if found = hash[x]
    #do sth.
    end
    }


    Thanks a lot!
    -Armin
     
    , Sep 7, 2006
    #1
    1. Advertising

  2. On Fri, Sep 08, 2006 at 06:55:12AM +0900, wrote:
    > maybe somebody can recommend me the right datastructure or
    > any other advice would be a big help.
    >
    > My code spends most of its execution time doing lookups from
    > a hashtable with about 1M keys. The keys are strings and the values
    > are arrays of integers. Most of the time only of length 1.
    >
    > I do not care how long the construction of the datastructure takes,
    > but the lookup should be as fast as possible.


    It hardly gets faster than a Hash in Ruby.
    You can also try a trie (Patricia tree if you have long keys and care about
    space), or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't expect
    major performance gains (Judy::JudySL, being more specialized than Hash,
    might have a chance)...

    --
    Mauricio Fernandez - http://eigenclass.org - singular Ruby
     
    Mauricio Fernandez, Sep 7, 2006
    #2
    1. Advertising

  3. Mauricio Fernandez wrote:
    > On Fri, Sep 08, 2006 at 06:55:12AM +0900, wrote:
    >> maybe somebody can recommend me the right datastructure or
    >> any other advice would be a big help.
    >>
    >> My code spends most of its execution time doing lookups from
    >> a hashtable with about 1M keys. The keys are strings and the values
    >> are arrays of integers. Most of the time only of length 1.
    >>
    >> I do not care how long the construction of the datastructure takes,
    >> but the lookup should be as fast as possible.

    >
    > It hardly gets faster than a Hash in Ruby.
    > You can also try a trie (Patricia tree if you have long keys and care
    > about
    > space),


    A Trie optimised by cutting off unambiguous traversal would
    be a definite possibility.

    > or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't
    > expect
    > major performance gains (Judy::JudySL, being more specialized than Hash,
    > might have a chance)...



    --
    Posted via http://www.ruby-forum.com/.
     
    Eero Saynatkari, Sep 8, 2006
    #3
  4. On Fri, Sep 08, 2006 at 08:28:33AM +0900, Eero Saynatkari wrote:
    } Mauricio Fernandez wrote:
    } > On Fri, Sep 08, 2006 at 06:55:12AM +0900, wrote:
    } >> maybe somebody can recommend me the right datastructure or
    } >> any other advice would be a big help.
    } >>
    } >> My code spends most of its execution time doing lookups from
    } >> a hashtable with about 1M keys. The keys are strings and the values
    } >> are arrays of integers. Most of the time only of length 1.
    } >>
    } >> I do not care how long the construction of the datastructure takes,
    } >> but the lookup should be as fast as possible.
    } >
    } > It hardly gets faster than a Hash in Ruby.
    } > You can also try a trie (Patricia tree if you have long keys and care
    } > about
    } > space),
    }
    } A Trie optimised by cutting off unambiguous traversal would
    } be a definite possibility.

    There is a trie gem that implements a Patricia Trie.

    http://gemjack.com/gems/trie-0.0.1/classes/Trie.html

    Of course, a Patricia Trie assumes no a priori knowledge of your string
    inputs. If you know something about your keys, you may be able to do better
    with a hash of hashes (to however many layers is appropriate), splitting as
    appropriate for your key space. For example, if you know that your keys are
    IPv4 addresses that come in dotted quad notation (e.g. 127.0.0.1), you
    could do better with (note: untested):

    class SplittableHash
    def initialize(split)
    @split = split
    @root = {}
    end

    def [](key)
    key.split(@split).inject(@root) { |h,k| h[k] if h }
    end

    def []=(key, val)
    path = key.split(@split)
    key = path.pop
    path.inject(@root) { |h,k| h[k] ||= {} }[key] = val
    end

    end

    --Greg
     
    Gregory Seidman, Sep 8, 2006
    #4
  5. wrote:
    > Hi all,
    >
    > maybe somebody can recommend me the right datastructure or
    > any other advice would be a big help.
    >
    > My code spends most of its execution time doing lookups from
    > a hashtable with about 1M keys. The keys are strings and the values
    > are arrays of integers. Most of the time only of length 1.
    >
    > I do not care how long the construction of the datastructure takes,
    > but the lookup should be as fast as possible.
    >
    > xs.each{|x|
    > if found = hash[x]
    > #do sth.
    > end
    > }


    As others said already, a Hash is pretty much the fastest for the
    general case. How do your string keys look like? Maybe it is worth
    trying symbols instead of strings?

    If you unveil a bit more about your application we might be able to come
    up with more suggestions.

    Kind regards

    robert
     
    Robert Klemme, Sep 8, 2006
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jeff Cope

    Lookups

    Jeff Cope, Aug 26, 2003, in forum: ASP .Net
    Replies:
    1
    Views:
    485
    Natty Gur
    Aug 26, 2003
  2. Harvey
    Replies:
    0
    Views:
    720
    Harvey
    Jul 16, 2004
  3. Harvey
    Replies:
    1
    Views:
    856
    Daniel
    Jul 16, 2004
  4. =?Utf-8?B?cm9kY2hhcg==?=

    table lookups

    =?Utf-8?B?cm9kY2hhcg==?=, May 20, 2005, in forum: ASP .Net
    Replies:
    1
    Views:
    450
    =?Utf-8?B?cm9kY2hhcg==?=
    Jun 3, 2005
  5. Rob Baxter
    Replies:
    11
    Views:
    665
    bugbear
    Aug 24, 2004
Loading...

Share This Page