BTree examples ??

K

Kenjis Kaan

Hello. I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive. I have a need to store hundred of thousands of
key, data values but told not to use a RDBMS. It would be nice if I
can get some sample of that actually shows the write to disk and
subsequent retrieval from it, which then allows you to do search for
records by supplying a key. TIA
 
R

Russ Jones

(e-mail address removed) (Kenjis Kaan) wrote in
Hello. I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive. I have a need to store hundred of thousands of
key, data values but told not to use a RDBMS. It would be nice if I
can get some sample of that actually shows the write to disk and
subsequent retrieval from it, which then allows you to do search for
records by supplying a key. TIA

Using a hash to hold your data and then using the Storable module to save
and retrieve it comes immediately to mind, but that "hundreds of
thousands" of records sounds pretty big.

The venerable "Algorithms + Data Structures = Programs," Niklaus Wirth,
1975, contains (as I recall) some BTree code. It's in Pascal, but it might
be a starting place.
 
M

Malte Ubl

Kenjis said:
Hello. I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive. I have a need to store hundred of thousands of
key, data values but told not to use a RDBMS. It would be nice if I
can get some sample of that actually shows the write to disk and
subsequent retrieval from it, which then allows you to do search for
records by supplying a key. TIA


Wouldn't DBD::SQLite be a solution? This module includes a complete
public domain RDBMS.

http://search.cpan.org/author/MSERGEANT/DBD-SQLite-0.25/lib/DBD/SQLite.pm

You could use it and not tell anybody :)

malte
 
P

Paul Marquess

Kenjis Kaan said:
Hello. I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive. I have a need to store hundred of thousands of
key, data values but told not to use a RDBMS. It would be nice if I
can get some sample of that actually shows the write to disk and
subsequent retrieval from it, which then allows you to do search for
records by supplying a key. TIA

The DB_File module, that comes with Perl, has a btree opion. You can easily
store hundres of thousands of key/data pairs in it.

The example below is derived from the docuemntation that comes with DB_File

use warnings ;
use strict ;
use DB_File ;

my %h ;
tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0666, $DB_BTREE
or die "Cannot open file 'tree': $!\n" ;

# Add a key/value pair to the file
$h{'Wall'} = 'Larry' ;
$h{'Smith'} = 'John' ;
$h{'mouse'} = 'mickey' ;
$h{'duck'} = 'donald' ;

# Delete
delete $h{"duck"} ;

# Cycle through the keys printing them in order.
# Note it is not necessary to sort the keys as
# the btree will have kept them in order automatically.
foreach (keys %h)
{ print "$_\n" }

untie %h ;

Paul
 
V

Vlad Tepes

Paul Marquess said:
The DB_File module, that comes with Perl, has a btree opion. You can
easily store hundres of thousands of key/data pairs in it.


The example below is derived from the docuemntation that comes with
DB_File

use warnings ;
use strict ;
use DB_File ;

my %h ;
tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0666, $DB_BTREE
or die "Cannot open file 'tree': $!\n" ;

# Add a key/value pair to the file
$h{'Wall'} = 'Larry' ;
$h{'Smith'} = 'John' ;
$h{'mouse'} = 'mickey' ;
$h{'duck'} = 'donald' ;

# Delete
delete $h{"duck"} ;

# Cycle through the keys printing them in order.
# Note it is not necessary to sort the keys as
# the btree will have kept them in order automatically.
foreach (keys %h)
{ print "$_\n" }

untie %h ;

Paul

( I'm not sure about this, but I think I'm right: )

A small note: If you have a very big hash, you'd might like to save
memory by doing

while ( (my $k, undef) = each %h ) {
print "$k\n";
}

instead of using the foreach loop in the above example.
 
K

Kenjis Kaan

Jörg Westheide said:
Hi!


Have a look at the DB File module


That's no problem (I used it with some millions)
The only backdraw I discovered, is the really bad performance on
Win(2K), it's fine on Linux though

J rg

But Win2k is what I am doing all this on. Using Activestate Perl.
 
K

Kenjis Kaan

There has go to be a way to use BTree.pm (its floating around the
internet) without resorting to the fancy DB_File with all the bells
and whistles. All I want is just ot create a simple file containing
BTree objects so I can then search and replace records at will. I
think DB_File is an overkill unless I can put everything in the
development directory which becomes part of the install package. I
notice the only thing BTree.pm is missing is the delete operation.
Which is really weird, I remember studying this stuffs back in school,
and the books shows you some code with insert etc, but the delete is
left as an exercise. Its beyond me that some guy would write the
BTree.pm but left out the delete operations. I know its a bit harder
to implement, but if you are going to do it, might as well make it
complete so people can use it.
 
M

Mark Jason Dominus

There has go to be a way to use BTree.pm (its floating around the
internet) without resorting to the fancy DB_File with all the bells
and whistles.

Are you talking about the BTree.pm at

http://perl.plover.com/BTree/

? Or have you found some other package? If you want help with a piece
of software, you need to tell people what and where it is or else they
won't know and they can't help you with it.
Its beyond me that some guy would write the BTree.pm but left out the
delete operations. I know its a bit harder to implement, but if you
are going to do it, might as well make it complete so people can use
it.

If you *are* talking about the BTree.pm at
http://perl.plover.com/BTree/, then I can tell you the answer to your
question. The BTree.pm module was written as example code to
accompany a magazine article about how B-Trees work. It was not
intended to be used for real projects, because any such use would be
silly even if the module did contain 'delete'. The author left out
'delete' to keep the code short enough for publication and because the
article didn't deal with 'delete'.

However, if you really think that a complete BTree.pm is what you
need, I expect that for a fee, the module author would be willing to
add those functions to the module, and also extend it to store its
values on the disk instead of in memory. You could spend some of the
money you are saving by not using an RDBMS.
 
P

Paul Marquess

Hi!
I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive.

Have a look at the DB_File module
I have a need to store hundred of thousands of
key, data values

That's no problem (I used it with some millions)
The only backdraw I discovered, is the really bad performance on
Win(2K), it's fine on Linux though

Jörg
 
P

Paul Marquess

Vlad Tepes said:
( I'm not sure about this, but I think I'm right: )

A small note: If you have a very big hash, you'd might like to save
memory by doing

while ( (my $k, undef) = each %h ) {
print "$k\n";
}

instead of using the foreach loop in the above example.

Good catch. You are quite correct. If you have millions of records in the
database, this is the only way to iterate through them without running out
of memory.

Using "keys" of "values" will result in *all* the keys or values being read
into memory in one go. The "each" function will only read a single key/value
pair at a time.

Paul
 
P

Paul Marquess

Jörg Westheide said:
Hi!


Have a look at the DB_File module


That's no problem (I used it with some millions)
The only backdraw I discovered, is the really bad performance on
Win(2K), it's fine on Linux though

Can you elaborate on the performance issues on Win2k?

I don't do any development on Windows myself, and haven't had any reports of
bad performance (as far as I can remember). If there is a performance issue
with Windows, I'd like to fix it (if it is fixable), or document it if it is
not.

cheers
Paul
 
C

ctcgag

Hello. I am wondering if anyone has an example of using BTree for
storing data on a disk file.?? I found some BTree modules on the
web, but none give example of how you would then store it and retrieve
it from disk drive. I have a need to store hundred of thousands of
key, data values but told not to use a RDBMS.

Were you told *why* not to use a RDBMS? If there is a valid reason
to exclude RDBMS, that same reason might exclude other suggestions.

Xho
 
P

Paul Marquess

Jörg Westheide said:
Hi Paul!


Well, we have a program that reads data, stores them in a DB_File BTree
and then reads and prints them (sorted).
With some test data we had times of 40 minutes for a whole run under
Linux. Under Win2K we killed it after 24 hours while it was still
reading and storing the data. Profiling showed that inserting the data
is the problem (and not in our code)


Maybe others don't use it with that much data. We had more than a
million entries in the BTree and the db file grew to more than 1GB. It
seemed to me that the perfomance became more worse the more data was
stored. With only some 10,000 entries there was not a big difference


I would like to see it fixed.
If you need additional information, let me know

Hmm, that certainly is a big difference!

To take this any further, I need to reproduce the behaviour myself, so I
need a few more details on the data you are storing - for example, are all
keys fixed length? Are you using the default sorting algorithm that comes
with Btree? If not, can I see the user-defined sorting sub please? Is the
data in a completely random order or partially sorted?

Ideally, I'd like to have a look at the script you are using, plus a sample
of real data.

Paul
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top