btree

Discussion in 'Perl Misc' started by George Mpouras, May 11, 2012.

  1. I ve done some tests and found out
    that hash is faster than 'something' ~~ @array
    From theory the btree alogorythm is faster than the hash (even more if the
    hash has conflicts)
    Is there any way to store my values as btree instead of hash in Perl ?
    George Mpouras, May 11, 2012
    #1
    1. Advertising

  2. On Fri, 11 May 2012 13:45:45 +0300, George Mpouras wrote:

    > I ve done some tests and found out that hash is faster than
    > 'something' ~~ @array From theory the btree alogorythm is faster than
    > the hash (even more if the hash has conflicts)
    > Is there any way to store my values as btree instead of hash in Perl ?


    Native? No. But there probably is some module that helps on CPAN.

    M4
    Martijn Lievaart, May 14, 2012
    #2
    1. Advertising

  3. George Mpouras

    Tim Watts Guest

    XeCycle wrote:

    > "George Mpouras" <> writes:
    >
    >> I ve done some tests and found out
    >> that hash is faster than 'something' ~~ @array
    >> From theory the btree alogorythm is faster than the hash (even more if
    >> the hash has conflicts)
    >> Is there any way to store my values as btree instead of hash in Perl ?

    >
    > Use Berkeley DB, dude.
    >
    > Never think about such efficiency problems before your data grew
    > very large.
    >


    Yep - exactly how much data does the OP have? Typically, you can have tens
    to hundreds of thousands of keys in a perl hash without much of a problem,
    unless:

    1) Speed is becoming a problem;

    2) You are implementing on constrained hardware (eg embedded).

    Also, there are various tree modules available on CPAN, but few are likely
    to compare to the internal perl hashing unless they are written in C and
    have been specifically designed for speed/efficiency.
    --
    Tim Watts
    Tim Watts, May 14, 2012
    #3
  4. On 2012-05-14 07:33, Tim Watts <> wrote:
    > XeCycle wrote:
    >> "George Mpouras" <> writes:
    >>> Is there any way to store my values as btree instead of hash in Perl ?

    >>
    >> Use Berkeley DB, dude.
    >>
    >> Never think about such efficiency problems before your data grew
    >> very large.
    >>

    >
    > Yep - exactly how much data does the OP have? Typically, you can have
    > tens to hundreds of thousands of keys in a perl hash without much of a
    > problem, unless:
    >
    > 1) Speed is becoming a problem;


    As long as the hash fits into memory, each operation (insert, lookup,
    delete) is almost certainly faster on a hash than on a btree.

    A btree becomes faster when we are dealing with *persistent* data and a
    relatively small number of lookups and/or changes:

    With a hash, you need to load all the data into memory at program start,
    and if you make any changens you have to write it back to disk before
    the program terminates.

    With a btree, you only have to read/write the blocks you need.

    hp


    --
    _ | Peter J. Holzer | Deprecating human carelessness and
    |_|_) | Sysadmin WSR | ignorance has no successful track record.
    | | | |
    __/ | http://www.hjp.at/ | -- Bill Code on
    Peter J. Holzer, May 14, 2012
    #4
  5. On Mon, 14 May 2012 09:12:46 +0100, Ben Morrow wrote:

    > DB_File is a core module. If you use the $DB_BTREE mode, and undef
    > instead of a filename, it will give you a Perl-level hash implemented by
    > an in-memory btree.


    Nice! Thanks. Added to the toolbox.

    M4
    Martijn Lievaart, May 14, 2012
    #5
  6. "George Mpouras" <> writes:

    [...]

    > From theory the btree alogorythm is faster than the hash (even more if the
    > hash has conflicts)


    As I already showed in another posting: This is completely
    wrong. First, if there are now 'conflicts' in the hash table, hash
    lookup is O(1) -- you calculate the hash value, select the proper
    table slot and do at most a single key comparison. That's better than
    anything which can be achieved with any tree structure. Second, if
    there are conflicts, the maximum amount of key comparisons necessary
    for a search (successful or unsuccessful) depends on the average
    length of the list which needs to be searched. If these lists are
    short compared to the height of some search tree used for the same set
    of elements, hashed lookups will still be faster. And 'short-ness' of
    the lists can be ensured by using a suitably large table.
    Rainer Weikusat, May 14, 2012
    #6
  7. XeCycle <> writes:
    > "George Mpouras" <> writes:
    >
    >> I ve done some tests and found out
    >> that hash is faster than 'something' ~~ @array
    >> From theory the btree alogorythm is faster than the hash (even more if the
    >> hash has conflicts)
    >> Is there any way to store my values as btree instead of hash in Perl ?

    >
    > Use Berkeley DB, dude.
    >
    > Never think about such efficiency problems before your data grew
    > very large.


    The absolutely last thing you want in practice is working code which
    needs a fundamental change until "yesterday" because the core data
    structure used by it turns out to be unsuitable for a real-world
    problem it is supposed to deal with.
    Rainer Weikusat, May 14, 2012
    #7
  8. George Mpouras

    Willem Guest

    XeCycle wrote:
    ) Rainer Weikusat <> writes:
    )> The absolutely last thing you want in practice is working code which
    )> needs a fundamental change until "yesterday" because the core data
    )> structure used by it turns out to be unsuitable for a real-world
    )> problem it is supposed to deal with.
    )
    ) Another way is to use a compatible interface for the new data
    ) backend. Berkeley DB can be compatible with Perl arrays and
    ) hashes.
    )
    ) Of course you may not be able to use the new features in this
    ) way, but when things get really large, the original algorithm may
    ) not suffice. Therefore a refactor is needed, in case you're
    ) scaling it too large. A moderate scale up may be able to go
    ) without those features provided by a more capable data backend.

    And then it turns out that the API to the backend is what causes
    it to scale badly and you're screwed anyway.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
    Willem, May 14, 2012
    #8
  9. George Mpouras

    Ted Zlatanov Guest

    On Mon, 14 May 2012 14:46:05 +0800 XeCycle <> wrote:

    X> Use Berkeley DB, dude.

    cfengine (a configuration management tool I use) switched from Berkeley
    DB to Tokyo Cabinet recently, and I had a chance to look at TC. It's a
    good C kv database with good performance and has a Perl API. It
    compares well to Berkeley DB, especially in capacity and bug density,
    so maybe it's useful to others too. Take a look at their site,
    http://fallabs.com/tokyocabinet/

    They also offer Kyoto Cabinet, a C++ successor to TC, but my experience
    with C++ portability and readability has been less pleasant...

    Ted
    Ted Zlatanov, May 16, 2012
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Nobody
    Replies:
    5
    Views:
    770
    Claudio Puviani
    Feb 9, 2004
  2. Nobody
    Replies:
    2
    Views:
    789
    John Harrison
    Feb 10, 2004
  3. Eloff

    Hashtable or Btree?

    Eloff, Dec 23, 2004, in forum: C++
    Replies:
    4
    Views:
    1,136
    Martin Stettner
    Dec 26, 2004
  4. Spam Me Please

    BTree beginner

    Spam Me Please, Mar 19, 2005, in forum: C++
    Replies:
    4
    Views:
    3,365
    Karl Heinz Buchegger
    Mar 21, 2005
  5. os2

    btree

    os2, Oct 25, 2003, in forum: C Programming
    Replies:
    4
    Views:
    500
    Ravi Uday
    Oct 27, 2003
Loading...

Share This Page