MLDBM tie is very slow


Rob Z

Hi all,

I am working with MLDBM to access a static "database file" (written
once, never altered, only read). The file is ~75MB and is a 4-level
HoH, i.e. a hash of hashes of hashes of hashes. It is running on Linux
on a 2x CPU XServe with Perl 5.8.

The trouble is that the tie() command is taking ~10 seconds when first
connecting to the database file. I would like to shorten this as much
as possible. I don't need the file read into memory at the beginning; I
can read in each entry as it is needed later. I would actually like to
leave as much data out of memory as I can until it is really needed.
As far as I can tell, the whole file isn't being read into memory
(memory use is ~50MB for the process after the tie()), but a good
portion of it is. My concern is that this file will grow by about 8x over
the next few months, to 500+MB.

Anyway, I am looking for alternatives or options for speeding up that
initial tie() and keeping the up-front memory commitment as small as
possible. Any ideas?


Thanks,
Rob
 

xhoster

Rob Z said:
> Hi all,
>
> I am working with MLDBM to access a static "database file" (written
> once, never altered, only read). The file is ~75MB and is a 4-level
> HoH, i.e. a hash of hashes of hashes of hashes. It is running on Linux
> on a 2x CPU XServe with Perl 5.8.
>
> The trouble is that the tie() command is taking ~10 seconds when first
> connecting to the database file.

Just saying you use MLDBM is not sufficient. Please provide two pieces of
runnable code, one that creates a structure similar to what you are working
with and writes it out, and one that times the opening of that structure.
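Something along these lines would be enough. This is only a rough sketch; the
file name, key names, and data sizes below are placeholders, not your real data:

# build_test.pl -- create a structure roughly like the one described
use strict;
use warnings;
use MLDBM qw(DB_File Storable);   # assuming a DB_File back end with Storable
use Fcntl;

tie my %db, 'MLDBM', 'test.db', O_CREAT | O_RDWR, 0640
    or die "Cannot tie test.db: $!";

for my $i (1 .. 1000) {
    my %entry;
    for my $j (1 .. 10) {
        for my $k (1 .. 10) {
            $entry{"j$j"}{"k$k"}{payload} = 'x' x 100;
        }
    }
    $db{"i$i"} = \%entry;   # MLDBM wants the whole top-level value in one assignment
}
untie %db;

# time_tie.pl -- time only the tie() against the file built above
use strict;
use warnings;
use MLDBM qw(DB_File Storable);
use Fcntl;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
tie my %db2, 'MLDBM', 'test.db', O_RDONLY, 0640
    or die "Cannot tie test.db: $!";
printf "tie() took %.3f seconds\n", tv_interval($t0);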

> I would like to shorten this as much
> as possible. I don't need the file read into memory at the beginning; I
> can read in each entry as it is needed later.

I could be wrong, but I don't think that this is the nature of MLDBM.

> I would actually like to
> leave as much data out of memory as I can until it is really needed.
> As far as I can tell, the whole file isn't being read into memory
> (memory use is ~50MB for the process after the tie()),

This doesn't mean much. It could just mean that the on-disk format of
MLDBM data is 50% less space-efficient than the in-memory format.

> but a good
> portion of it is. My concern is that this file will grow by about 8x over
> the next few months, to 500+MB.

I thought the file never changed?

> Anyway, I am looking for alternatives or options for speeding up that
> initial tie()

How about not doing a tie at all? Store the data in a file using Storable
directly, retrieve it into a hashref directly with Storable.
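A rough sketch of that approach, with a made-up path and made-up key names:

use strict;
use warnings;
use Storable qw(nstore retrieve);

# One-time write, wherever the structure gets built (path is a placeholder):
# nstore( \%huge_hoh, '/path/to/data.storable' );

# In the reader -- note that this slurps the whole structure into memory:
my $data  = retrieve('/path/to/data.storable');
my $value = $data->{lvl1}{lvl2}{lvl3}{lvl4};   # hypothetical keys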
> and keeping the up-front memory commitment as small as
> possible. Any ideas?

Why? If you are ultimately going to end up having it all in memory anyway
(which I assume you are because you say "up front"), why not just load it
into memory and get it over with?

Xho
 

Brian Wakem

Rob said:
> Hi all,
>
> I am working with MLDBM to access a static "database file" (written
> once, never altered, only read). The file is ~75MB and is a 4-level
> HoH, i.e. a hash of hashes of hashes of hashes. It is running on Linux
> on a 2x CPU XServe with Perl 5.8.
>
> The trouble is that the tie() command is taking ~10 seconds when first
> connecting to the database file. I would like to shorten this as much
> as possible. I don't need the file read into memory at the beginning; I
> can read in each entry as it is needed later. I would actually like to
> leave as much data out of memory as I can until it is really needed.
> As far as I can tell, the whole file isn't being read into memory
> (memory use is ~50MB for the process after the tie()), but a good
> portion of it is. My concern is that this file will grow by about 8x over
> the next few months, to 500+MB.


You said it will never be altered.

> Anyway, I am looking for alternatives or options for speeding up that
> initial tie() and keeping the up-front memory commitment as small as
> possible. Any ideas?


When dealing with large amounts of data you should be thinking RDBMS.
 

Rob Z

I apologize, I should have been more specific, since this is what
everyone is latching on to:

The file will never be altered once it is written. Over the coming
months, new files of the exact same name and hierarchical structure
will be written over the original. Those files will become
increasingly large, eventually reaching 500+MB.



As far as why not read the whole thing into memory at the front, there
are a few reasons, but the easiest to explain is: If a user wants to
make a query for a single data element, having to wait (eventually up
to a minute maybe) for a response just because we are reading the
entire DB into memory is a bit frustrating.

Good point about memory vs. disk size efficiency though, Xho. I will
also look into using Storable directly.

As far as RDBMS, I am trying to avoid it, since it will require
installation and configuration on many computers I have no control over
(customer machines, etc.).
 

A. Sinan Unur

Rob Z said:
> As far as RDBMS, I am trying to avoid it, since it will require
> installation and configuration on many computers I have no control over
> (customer machines, etc.).

SQLite?
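DBD::SQLite bundles the whole engine in the module, so there is nothing extra
to install or configure on the customer machines. A rough sketch of a lookup;
the table and column names here are invented for the example:

use strict;
use warnings;
use DBI;

# Open (or create) the embedded SQLite database file.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=data.sqlite', '', '',
                        { RaiseError => 1 } );

# Fetch a single element by its four-level key.
my ($value) = $dbh->selectrow_array(
    'SELECT value FROM entries WHERE k1 = ? AND k2 = ? AND k3 = ? AND k4 = ?',
    undef, 'a', 'b', 'c', 'd',
);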
 

xhoster

Rob Z said:
> As far as why not read the whole thing into memory at the front, there
> are a few reasons, but the easiest to explain is: If a user wants to
> make a query for a single data element, having to wait (eventually up
> to a minute maybe) for a response just because we are reading the
> entire DB into memory is a bit frustrating.

You could use the program interactively and keep it running between
queries.

Anyway, I was pleasantly surprised to discover that I had confused MLDBM with
some other DBM-like thing, and that MLDBM does not keep everything in
memory. In my tests, I've seen neither slowness nor large memory usage upon
tying a large pre-existing file. So without seeing the specifics of your
code/model system, there isn't much more I can say.

> As far as RDBMS, I am trying to avoid it, since it will require
> installation and configuration on many computers I have no control over
> (customer machines, etc.).

Installing and configuring some of the DBM modules is no walk in the park,
either.

Xho
 

Stephan Titard

Rob said:
> I apologize, I should have been more specific, since this is what
> everyone is latching on to:
>
> The file will never be altered once it is written. Over the coming
> months, new files of the exact same name and hierarchical structure
> will be written over the original. Those files will become
> increasingly large, eventually reaching 500+MB.
>
> As far as why not read the whole thing into memory at the front, there
> are a few reasons, but the easiest to explain is: If a user wants to
> make a query for a single data element, having to wait (eventually up
> to a minute maybe) for a response just because we are reading the
> entire DB into memory is a bit frustrating.
>
> Good point about memory vs. disk size efficiency though, Xho. I will
> also look into using Storable directly.
Maybe you can read the original file and transform it into something
that can load quickly...
(maybe even various files, and some kind of index file)
DBM::Deep is pure Perl and performs well.
SQLite could be of interest also.
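A rough sketch of the DBM::Deep route; the file name and keys are placeholders:

use strict;
use warnings;
use DBM::Deep;

# DBM::Deep stores nested structures natively and fetches records
# lazily as they are accessed.
my $db = DBM::Deep->new('transformed.db');

# One-time transform step: copy the original structure in.
# $db->{$k1}{$k2}{$k3}{$k4} = $value;

# Later, a single lookup reads only what it needs from disk:
my $value = $db->{a}{b}{c}{d};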
 

Paul Marquess

Brian Wakem said:
> You said it will never be altered.
>
> When dealing with large amounts of data you should be thinking RDBMS.

If the application needs the flexibility/infrastructure that an RDBMS
gives, then yes, go down that route, but the amount of data being processed
is not, on its own, a good enough reason to jump ship. I know that DB_File can
easily handle this amount of data, and I'd imagine that GDBM_File can as
well. None of the DBM implementations read the complete database into memory
(unless you have explicitly set them up to do so) - they all use a small
cache.
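For what it's worth, with DB_File that cache is under your control at tie()
time. A small sketch, with made-up names and an arbitrary cache size:

use strict;
use warnings;
use DB_File;
use Fcntl;

# Tune the per-tie cache before tying (value here is just an example).
$DB_BTREE->{cachesize} = 1_000_000;   # ~1MB of cache for this tie

tie my %h, 'DB_File', 'data.db', O_RDONLY, 0640, $DB_BTREE
    or die "Cannot tie data.db: $!";

my $record = $h{some_key};   # only the pages needed for this key are read
untie %h;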

Regarding the performance problem at hand, a 10-second startup time implies
there is something fundamentally wrong, either with the way the code has
been written or with the environment it is running under. To be able to help,
we need to see some code.

cheers
Paul
 

Bill Davidsen

Rob said:
> I apologize, I should have been more specific, since this is what
> everyone is latching on to:
>
> The file will never be altered once it is written. Over the coming
> months, new files of the exact same name and hierarchical structure
> will be written over the original. Those files will become
> increasingly large, eventually reaching 500+MB.

It would seem that "changing the file" and "replacing the file with one
which has changed" is a distinction without a difference. It still
precludes any solution involving leaving a program connected, building a
fast and fancy index, etc.
 
