MLDBM tie is very slow

Discussion in 'Perl Misc' started by Rob Z, Oct 26, 2005.

  1. Rob Z (Guest)

    Hi all,

    I am working with MLDBM to access a static "database file". (Written
    once, never altered, only read.) The file is ~75MB and is a 4-level
    HOH. i.e. hash of hashes of hashes of hashes. It is running on Linux
    on a 2x CPU XServe with Perl 5.8.

    The trouble is that the tie() command is taking ~10 seconds when first
    connecting to the database file. I would like to shorten this as much
    as possible. I don't need the file read into memory at the beginning;
    I can read in each entry as it is needed later. I would actually like
    to leave as much data out of memory as I can, until it is really
    needed. As far as I can find, the whole file isn't being read into
    memory (memory use is ~50MB for the process after the tie()), but a
    good portion is. My concern is that this file will grow by about 8x
    over the next few months, to 500+MB.

    Anyway, I am looking for alternatives or options for speeding up that
    initial tie() and making the smallest possible memory commitment up
    front. Any ideas?
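
    For illustration, the kind of tie I mean looks roughly like this (a
    minimal sketch only, assuming MLDBM over DB_File with the Storable
    serializer; the file name and keys are placeholders, not my real ones):

        use strict;
        use warnings;
        use MLDBM qw(DB_File Storable);   # assumed setup: DB_File storage, Storable serializer
        use Fcntl qw(O_RDONLY);

        my %db;
        tie %db, 'MLDBM', 'data.db', O_RDONLY, 0644
            or die "Cannot tie data.db: $!";

        # Reading one leaf deserializes everything stored under the
        # top-level key, but not the rest of the file.
        my $leaf = $db{level1}{level2}{level3}{level4};

        untie %db;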


    Thanks,
    Rob
    Rob Z, Oct 26, 2005
    #1

  2. Xho (Guest)

    "Rob Z" <> wrote:
    > Hi all,
    >
    > I am working with MLDBM to access a static "database file". (Written
    > once, never altered, only read.) The file is ~75MB and is a 4-level
    > HOH. i.e. hash of hashes of hashes of hashes. It is running on Linux
    > on a 2x CPU XServe with Perl 5.8.
    >
    > The trouble is that the tie() command is taking ~10 seconds when first
    > connecting to the database file.


    Just saying you use MLDBM is not sufficient. Please provide two pieces of
    runnable code, one that creates a structure similar to what you are working
    with and writes it out, and one that times the opening of that structure.


    > I would like to shorten this as much
    > as possible. I don't need the file read into memory at the beginning; I
    > can read in each entry as it is needed later.


    I could be wrong, but I don't think that this is the nature of MLDBM.

    > I would actually like to
    > leave as much data out of memory as I can, until it is really needed.
    > As far as I can find, the whole file isn't being read into memory
    > (memory use is ~50MB for the process after the tie()),


    This doesn't mean much. It could just mean that the on-disk format of
    MLDBM data is 50% less space-efficient than the in-memory format.

    > but a good
    > portion is. My concern is that this file will grow by about 8x over
    > the next few months, to 500+MB.


    I thought the file never changed?

    > Anyway, I am looking for alternatives or options for speeding up that
    > initial tie()


    How about not doing a tie at all? Store the data in a file using Storable
    directly, retrieve it into a hashref directly with Storable.
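
    Something along these lines (a minimal sketch; the file name and keys
    are made up, and note that retrieve() does bring the whole structure
    back into memory at once):

        use strict;
        use warnings;
        use Storable qw(nstore retrieve);

        # One-time writer:
        my %hoh = ( level1 => { level2 => { level3 => { level4 => 'value' } } } );
        nstore(\%hoh, 'data.storable') or die "nstore failed: $!";

        # Reader: loads the entire structure back as a hashref.
        my $data = retrieve('data.storable');
        print $data->{level1}{level2}{level3}{level4}, "\n";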

    > and making the smallest possible memory commitment up front.
    > Any ideas?


    Why? If you are ultimately going to end up having it all in memory anyway
    (which I assume you are because you say "up front"), why not just load it
    into memory and get it over with?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    Xho, Oct 26, 2005
    #2

  3. Brian Wakem (Guest)

    Rob Z wrote:

    > Hi all,
    >
    > I am working with MLDBM to access a static "database file". (Written
    > once, never altered, only read.) The file is ~75MB and is a 4-level
    > HOH. i.e. hash of hashes of hashes of hashes. It is running on Linux
    > on a 2x CPU XServe with Perl 5.8.
    >
    > The trouble is that the tie() command is taking ~10 seconds when first
    > connecting to the database file. I would like to shorten this as much
    > as possible. I don't need the file read into memory at the beginning; I
    > can read in each entry as it is needed later. I would actually like to
    > leave as much data out of memory as I can, until it is really needed.
    > As far as I can find, the whole file isn't being read into memory
    > (memory use is ~50MB for the process after the tie()), but a good
    > portion is. My concern is that this file will grow by about 8x over
    > the next few months, to 500+MB.



    You said it will never be altered.


    > Anyway, I am looking for alternatives or options for speeding up that
    > initial tie() and making the smallest possible memory commitment up
    > front. Any ideas?



    When dealing with large amounts of data you should be thinking RDBMS.


    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
    Brian Wakem, Oct 26, 2005
    #3
  4. Rob Z (Guest)

    I apologize, I should have been more specific, since this is what
    everyone is latching on to:

    The file will never be altered once it is written. Over the coming
    months, new files of the exact same name and hierarchical structure
    will be written over the original. The size of those files will become
    increasingly large up to 500+MB.



    As far as why not read the whole thing into memory at the front, there
    are a few reasons, but the easiest to explain is: If a user wants to
    make a query for a single data element, having to wait (eventually up
    to a minute maybe) for a response just because we are reading the
    entire DB into memory is a bit frustrating.

    Good point about memory vs. disk size efficiency though, Xho. I will
    also look into using Storable directly.

    As far as RDBMS, I am trying to avoid it, since it will require
    installation and configuration on many computers I have no control over
    (customer machines, etc.).
    Rob Z, Oct 26, 2005
    #4
  5. "Rob Z" <> wrote in news:1130362985.744801.26250
    @z14g2000cwz.googlegroups.com:

    > As far as RDBMS, I am trying to avoid it, since it will require
    > installation and configuration on many computers I have no control over
    > (customer machines, etc.).


    SQLite?
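
    Something like this, for instance (a rough sketch only; DBD::SQLite
    bundles its own engine, so there is no server to install on the
    customer machines, and the table/column names here are invented):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:SQLite:dbname=data.sqlite', '', '',
                               { RaiseError => 1 });

        # Fetch a single leaf value by its four-level key; nothing else
        # needs to be read into memory.
        my ($value) = $dbh->selectrow_array(
            'SELECT value FROM entries WHERE k1=? AND k2=? AND k3=? AND k4=?',
            undef, 'a', 'b', 'c', 'd');

        $dbh->disconnect;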
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Oct 26, 2005
    #5
  6. Xho (Guest)

    "Rob Z" <> wrote:
    >
    > As far as why not read the whole thing into memory at the front, there
    > are a few reasons, but the easiest to explain is: If a user wants to
    > make a query for a single data element, having to wait (eventually up
    > to a minute maybe) for a response just because we are reading the
    > entire DB into memory is a bit frustrating.


    You could use the program interactively and keep it running between
    queries.

    Anyway, I was pleasantly surprised to discover that I confused MLDBM with
    some other DBM-like thing, and that MLDBM does not keep everything in
    memory. In my tests, I've seen neither slowness nor large memory usage upon
    tying a large pre-existing file. So without seeing the specifics of your
    code/model system, there isn't much more I can say.


    > As far as RDBMS, I am trying to avoid it, since it will require
    > installation and configuration on many computers I have no control over
    > (customer machines, etc.).


    Installing and configuring some of the DBM modules is no walk in the park,
    either.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    Xho, Oct 27, 2005
    #6
  7. Stephan Titard

    Rob Z wrote:
    > I apologize, I should have been more specific, since this is what
    > everyone is latching on to:
    >
    > The file will never be altered once it is written. Over the coming
    > months, new files of the exact same name and hierarchical structure
    > will be written over the original. The size of those files will become
    > increasingly large up to 500+MB.
    >
    >
    >
    > As far as why not read the whole thing into memory at the front, there
    > are a few reasons, but the easiest to explain is: If a user wants to
    > make a query for a single data element, having to wait (eventually up
    > to a minute maybe) for a response just because we are reading the
    > entire DB into memory is a bit frustrating.
    >
    > Good point about memory vs. disk size efficiency though, Xho. I will
    > also look into using Storable directly.

    Maybe you can read the original file and transform it into something
    that can load quickly (maybe even several files, and some kind of
    index file). DBM::Deep is pure-Perl and performs well. SQLite could
    be of interest also.
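
    For example, something like this (a rough sketch only; the file name
    is invented, and the one-time conversion step that builds the file is
    not shown):

        use strict;
        use warnings;
        use DBM::Deep;

        # DBM::Deep keeps the whole multi-level structure on disk and
        # only reads the pieces that are actually touched.
        my $db = DBM::Deep->new('data.dbm');

        my $leaf = $db->{level1}{level2}{level3}{level4};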
    >
    > As far as RDBMS, I am trying to avoid it, since it will require
    > installation and configuration on many computers I have no control over
    > (customer machines, etc.).
    >
    Stephan Titard, Oct 27, 2005
    #7
  8. "Brian Wakem" <> wrote in message
    news:...
    > Rob Z wrote:
    >
    > > Hi all,
    > >
    > > I am working with MLDBM to access a static "database file". (Written
    > > once, never altered, only read.) The file is ~75MB and is a 4-level
    > > HOH. i.e. hash of hashes of hashes of hashes. It is running on Linux
    > > on a 2x CPU XServe with Perl 5.8.
    > >
    > > The trouble is that the tie() command is taking ~10 seconds when first
    > > connecting to the database file. I would like to shorten this as much
    > > as possible. I don't need the file read into memory at the beginning; I
    > > can read in each entry as it is needed later. I would actually like to
    > > leave as much data out of memory as I can, until it is really needed.
    > > As far as I can find, the whole file isn't being read into memory
    > > (memory use is ~50MB for the process after the tie()), but a good
    > > portion is. My concern is that this file will grow by about 8x over
    > > the next few months, to 500+MB.

    >
    >
    > You said it will never be altered.
    >
    >
    > > Anyway, I am looking for alternatives or options for speeding up that
    > > initial tie() and making the smallest possible memory commitment up
    > > front. Any ideas?

    >
    >
    > When dealing with large amounts of data you should be thinking RDBMS.


    If the application needs the flexibility/infrastructure that an RDBMS
    gives, then yes, go down that route, but the amount of data being
    processed on its own is not a good enough reason to jump ship. I know
    that DB_File can easily handle this amount of data, and I'd imagine
    that GDBM_File can as well. None of the DBM implementations read the
    complete database into memory (unless you have explicitly set it up to
    do so); they all use a small cache.
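
    If it helps, the DB_File cache can be made bigger when tying (a rough
    sketch; the 4MB figure and file name are only illustrative, and the
    values stored this way are flat strings unless something like
    MLDBM/Storable sits on top):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl qw(O_RDONLY);

        # Ask Berkeley DB for a larger cache than the default.
        $DB_HASH->{cachesize} = 4 * 1024 * 1024;

        my %db;
        tie %db, 'DB_File', 'data.db', O_RDONLY, 0644, $DB_HASH
            or die "Cannot tie data.db: $!";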

    Regarding the performance problem at hand, a 10-second startup time implies
    there is something fundamentally wrong, either with the way the code has
    been written or with the environment it is running under. To be able to help
    we need to see some code.

    cheers
    Paul
    Paul Marquess, Oct 27, 2005
    #8
  9. Bill Davidsen

    Rob Z wrote:
    > I apologize, I should have been more specific, since this is what
    > everyone is latching on to:
    >
    > The file will never be altered once it is written. Over the coming
    > months, new files of the exact same name and hierarchical structure
    > will be written over the original. The size of those files will become
    > increasingly large up to 500+MB.


    It would seem that "changing the file" versus "replacing the file with
    one which has changed" is a distinction without a difference. It still
    precludes any solution involving leaving a program connected, building a
    fast and fancy index, etc.

    --
    bill davidsen
    SBC/Prodigy Yorktown Heights NY data center
    http://newsgroups.news.prodigy.com
    Bill Davidsen, Oct 28, 2005
    #9
