btree

George Mpouras

I've done some tests and found out
that a hash is faster than 'something' ~~ @array.
In theory the btree algorithm is faster than the hash (even more so if the
hash has conflicts).
Is there any way to store my values as a btree instead of a hash in Perl?
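The kind of test the OP describes can be reproduced with the core Benchmark module. A minimal sketch (the key names and sizes are arbitrary, and `grep` stands in for the linear scan done by `~~ @array`, since smartmatch was later deprecated):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array  = map { "key$_" } 1 .. 10_000;
my %hash   = map { $_ => 1 } @array;
my $needle = 'key9999';           # near the end: close to worst case for the scan

cmpthese(-1, {
    'hash lookup' => sub { my $found = exists $hash{$needle} },
    'array scan'  => sub { my $found = grep { $_ eq $needle } @array },
});
```

The hash lookup wins by orders of magnitude here, because the scan is O(n) per lookup while the hash is O(1) amortized.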
 
Martijn Lievaart

I've done some tests and found out that a hash is faster than
'something' ~~ @array. In theory the btree algorithm is faster than
the hash (even more so if the hash has conflicts).
Is there any way to store my values as a btree instead of a hash in Perl?

Native? No. But there probably is some module that helps on CPAN.

M4
 
Tim Watts

XeCycle said:
Use Berkeley DB, dude.

Never think about such efficiency problems before your data grows
very large.

Yep - exactly how much data does the OP have? Typically, you can have tens
to hundreds of thousands of keys in a perl hash without much of a problem,
unless:

1) Speed is becoming a problem;

2) You are implementing on constrained hardware (eg embedded).

Also, there are various tree modules available on CPAN, but few are likely
to compare to the internal perl hashing unless they are written in C and
have been specifically designed for speed/efficiency.
 
Peter J. Holzer

Yep - exactly how much data does the OP have? Typically, you can have
tens to hundreds of thousands of keys in a perl hash without much of a
problem, unless:

1) Speed is becoming a problem;

As long as the hash fits into memory, each operation (insert, lookup,
delete) is almost certainly faster on a hash than on a btree.

A btree becomes faster when we are dealing with *persistent* data and a
relatively small number of lookups and/or changes:

With a hash, you need to load all the data into memory at program start,
and if you make any changes you have to write it back to disk before
the program terminates.

With a btree, you only have to read/write the blocks you need.
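The persistent case can be sketched with the core DB_File module (assuming the Berkeley DB library is available; `counts.db` is a made-up filename):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# Tie a hash to an on-disk btree: only the pages actually touched
# are read from or written to disk, not the whole data set.
my $file = 'counts.db';
tie my %count, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_BTREE
    or die "Cannot open $file: $!";

$count{widgets}++;          # a single update touches a few btree pages
my $n = $count{widgets};    # a lookup reads only the pages on its path

untie %count;
```

No load-everything-at-start or write-everything-at-exit step is needed, which is exactly the advantage described above.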

hp
 
Martijn Lievaart

DB_File is a core module. If you use the $DB_BTREE mode, and undef
instead of a filename, it will give you a Perl-level hash implemented by
an in-memory btree.

Nice! Thanks. Added to the toolbox.
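For the record, a minimal sketch of that tip (per the DB_File documentation, passing undef instead of a filename creates an anonymous in-memory database):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);

# undef instead of a filename => anonymous, in-memory btree
tie my %h, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $DB_BTREE
    or die "tie failed: $!";

$h{$_} = 1 for qw(pear apple banana);

# Unlike a native hash, a btree-tied hash iterates in sorted key order
print join(',', keys %h), "\n";   # prints "apple,banana,pear"
```

The sorted iteration order is a side benefit of the btree representation that a native hash cannot give you.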

M4
 
Rainer Weikusat

[...]
In theory the btree algorithm is faster than the hash (even more so if the
hash has conflicts)

As I already showed in another posting: this is completely
wrong. First, if there are no 'conflicts' in the hash table, hash
lookup is O(1) -- you calculate the hash value, select the proper
table slot and do at most a single key comparison. That's better than
anything which can be achieved with any tree structure. Second, if
there are conflicts, the maximum number of key comparisons necessary
for a search (successful or unsuccessful) depends on the average
length of the collision list which needs to be searched. If these lists
are short compared to the height of a search tree built over the same
set of elements, hashed lookups will still be faster. And shortness of
the lists can be ensured by using a suitably large table.
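Perl's own hash implementation applies that last point: it grows the bucket array as keys are added, keeping the collision chains short. The bucket statistics can be inspected with Hash::Util (a sketch; `num_buckets` and `used_buckets` require a reasonably recent perl, roughly 5.26 or later):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Hash::Util qw(num_buckets used_buckets);

my %h = map { $_ => 1 } 1 .. 10_000;

# Because Perl enlarges the table as the hash grows, the average
# collision chain stays short regardless of the number of keys.
printf "keys: %d  buckets: %d  used: %d  avg chain length: %.2f\n",
    scalar(keys %h), num_buckets(%h), used_buckets(%h),
    10_000 / used_buckets(%h);
```

The average chain length stays close to 1, which is why lookups remain effectively O(1) in practice.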
 
Rainer Weikusat

XeCycle said:
Use Berkeley DB, dude.

Never think about such efficiency problems before your data grew
very large.

The absolutely last thing you want in practice is working code which
needs a fundamental change by "yesterday" because the core data
structure used by it turns out to be unsuitable for a real-world
problem it is supposed to deal with.
 
Willem

XeCycle wrote:
)> The absolutely last thing you want in practice is working code which
)> needs a fundamental change by "yesterday" because the core data
)> structure used by it turns out to be unsuitable for a real-world
)> problem it is supposed to deal with.
)
) Another way is to use a compatible interface for the new data
) backend. Berkeley DB can be compatible with Perl arrays and
) hashes.
)
) Of course you may not be able to use the new features in this
) way, but when things get really large, the original algorithm may
) not suffice, so a refactor is needed if you're scaling it
) too far. A moderate scale-up may be able to go without those
) features provided by a more capable data backend.

And then it turns out that the API to the backend is what causes
it to scale badly and you're screwed anyway.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Ted Zlatanov

X> Use Berkeley DB, dude.

cfengine (a configuration management tool I use) switched from Berkeley
DB to Tokyo Cabinet recently, and I had a chance to look at TC. It's a
good C key-value database with good performance and has a Perl API. It
compares well to Berkeley DB, especially in capacity and bug density,
so maybe it's useful to others too. Take a look at their site,
http://fallabs.com/tokyocabinet/

They also offer Kyoto Cabinet, a C++ successor to TC, but my experience
with C++ portability and readability has been less pleasant...

Ted
 
