[OT?] SDBM file HUGE on disk


Guest

Hi all,

I tied a hash containing ca. 19000 key/value-pairs via SDBM to an
external file. Most of the values are strings less than 20 characters
long, only occasionally there are strings longer than 40...50 characters,
and rolled out as a flat file the 19000-odd lines occupy ca. 650kB.
Reading these data into a hash and tying that to an external file
creates a db file on disk which is around 16MB in size.

Is there any chance to create a smaller file?

Even though this question may be a little off topic I'd appreciate
any hint, thank you,

Oliver.
 

Gunnar Hjalmarsson

: I tied a hash containing ca. 19000 key/value-pairs via SDBM to an
: external file. Most of the values are strings less than 20
: characters long, only occasionally there are strings longer than
: 40...50 characters, and rolled out as a flat file the 19000-odd
: lines occupy ca. 650kB. Reading these data into a hash and tying
: that to an external file creates a db file on disk which is around
: 16MB in size.
:
: Is there any chance to create a smaller file?

Try another type of DBM file, e.g. DB_File.

The AnyDBM_File POD includes a comparison table:
http://search.cpan.org/perldoc?AnyDBM_File
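
For what it's worth, switching the tie over to DB_File could look roughly like the sketch below. The file name `mongdb.db` and the sample key/value are made up for illustration, and the `eval { require ... }` guard is only there because DB_File may not be present in every installation.

```perl
use strict;
use warnings;
no warnings 'once';     # $DB_File::DB_HASH is only named once in this file
use Fcntl;              # O_RDWR, O_CREAT

# Hypothetical file name, chosen for this sketch only.
my $file = 'mongdb.db';

if (eval { require DB_File; 1 }) {
    my %mongol;
    # $DB_File::DB_HASH selects the Berkeley DB hash format.
    tie %mongol, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_File::DB_HASH
        or die "Couldn't tie DB_File '$file': $!\n";
    $mongol{'0001.1'} = 'example value';    # invented sample record
    untie %mongol;
    printf "%s is %d bytes\n", $file, (-s $file) || 0;
}
else {
    print "DB_File is not installed here\n";
}
```

Whether the resulting file is really smaller than the SDBM one for a given data set is something to measure, not something this sketch proves.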
 

Guest

: The AnyDBM_File POD includes a comparison table:
: http://search.cpan.org/perldoc?AnyDBM_File

Hi Gunnar,

I consulted that table (which I had seen earlier), it says:

                odbm    ndbm    sdbm    gdbm    bsd-db
                ----    ----    ----    ----    ------
Database Size   ?       ?       small   big?    ok[1]

I must confess that I cannot really accept 16MB for 19000 strings
filling 500kB as flat file, as "small". The keys are all six ASCII
characters "long".

I am also afraid that NDBM requires additional software which is
not readily available on my remote outdated Win98 target machine
which, worst of all, has no net access (at least it has ASPerl
in a fairly recent version (5.8.0) installed). So I'll have to live
with SDBM's voracious appetite for disk space.

Oliver.
 

Guest

: (e-mail address removed)-berlin.de wrote:

: >
: > : The AnyDBM_File POD includes a comparison table:
: > : http://search.cpan.org/perldoc?AnyDBM_File
: >
: > Hi Gunnar,
: >
: > I consulted that table (which I had seen earlier), it says:
: >
: >                 odbm    ndbm    sdbm    gdbm    bsd-db
: >                 ----    ----    ----    ----    ------
: > Database Size   ?       ?       small   big?    ok[1]
: >
: > I must confess that I cannot really accept 16MB for 19000 strings
: > filling 500kB as flat file, as "small". The keys are all six ASCII
: > characters "long".


: I agree, in that in my experience SDBM has ballooned a flat file by
: factors of perhaps 3 to 7 or so, not the factor of 32 you are reporting.
: But SDBM is pretty rudimentary, and who knows what a near worst-case set
: of keys could produce.

Well, it looks as if the answer is hidden here. The keys I use are
strictly consecutive in order, running from "0001.1" to "4973.4" with
the digit after the dot being in a range of [1..4]. Perhaps some scrambling
on my side will produce seemingly "random" (but still unique) keys which
will make a smaller SDBM file? Sounds worth trying.
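
A minimal sketch of what such scrambling could look like, assuming plain string reversal as the scrambling step (the helper names `scramble` and `unscramble` are made up here). Reversal is its own inverse, so the keys stay unique and can be mapped back. The loop walks the whole key range from "0001.1" to "4973.4", i.e. 4973 x 4 = 19892 keys, and checks both properties. Whether reversed keys actually lead to a smaller SDBM file is untested.

```perl
use strict;
use warnings;

# Hypothetical helpers: reversing a string twice gives the original back,
# so scrambled keys remain unique and recoverable.
sub scramble   { return scalar reverse $_[0] }
sub unscramble { return scalar reverse $_[0] }

my %seen;
for my $n (1 .. 4973) {
    for my $d (1 .. 4) {
        my $key = sprintf '%04d.%d', $n, $d;    # "0001.1" .. "4973.4"
        my $s   = scramble($key);               # e.g. "1.1000"
        die "collision on $key\n" if $seen{$s}++;
        die "round trip failed for $key\n" unless unscramble($s) eq $key;
    }
}
printf "%d keys scrambled without collision\n", scalar keys %seen;
```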

: You didn't do anything like copy the SDBM file
: in from another OS or an older version of Perl (or another DBM-type
: [shudder]), did you?

No no. I use AS Perl 5.8.0 and a

use Fcntl;        # provides O_RDWR and O_CREAT
use SDBM_File;
tie (%mongol, 'SDBM_File', 'mongdb', O_RDWR|O_CREAT, 0666)
or die "Couldn't tie SDBM file 'mongdb': $!; aborting.\n";

statement, and then initially fill the hash %mongol at the first run
with data from a keyed list with approx. 19000 entries.
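
To make the size comparison reproducible, something like the following sketch could be used. It fills an SDBM file with keys in the same "0001.1" to "4973.4" pattern and compares the on-disk size with the size of the equivalent flat text. The base name `mongdb_test` and the values are invented placeholders, not the real data, so the ratio it prints says nothing definite about the real file.

```perl
use strict;
use warnings;
use Fcntl;          # O_RDWR, O_CREAT
use SDBM_File;

# Hypothetical base name; SDBM_File appends .pag and .dir itself.
my $base = 'mongdb_test';
unlink "$base.pag", "$base.dir";    # start from an empty database

my %db;
tie %db, 'SDBM_File', $base, O_RDWR|O_CREAT, 0666
    or die "Couldn't tie SDBM file '$base': $!\n";

my $flat_bytes = 0;
for my $n (1 .. 4973) {
    for my $d (1 .. 4) {
        my $key   = sprintf '%04d.%d', $n, $d;
        my $value = "placeholder value $n";     # invented test data
        $db{$key} = $value;
        # Size the same record would take as "key<TAB>value" in a flat file.
        $flat_bytes += length("$key\t$value\n");
    }
}
untie %db;

# -s reports the logical file size; on file systems with sparse-file
# support the space actually allocated may be smaller.
my $db_bytes = 0;
$db_bytes += (-s $_) || 0 for "$base.pag", "$base.dir";

printf "flat: %d bytes, SDBM: %d bytes, ratio %.1f\n",
    $flat_bytes, $db_bytes, $db_bytes / $flat_bytes;
```

Running it once with the keys as they are and once with scrambled keys would show whether the key pattern matters.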


: There is no guarantee of portability with any
: DBM-type file, even between machines with the same OS and Perl version
: (the admin who set up the system could have configured different
: parameters for two different systems). If the contents are to be

Data porting is not an issue, the plain, flat ASCII list is my transport
format of choice in this particular case.


: One other possibility: Are you using a DBM-type tied hash in
: a CGI application without proper locking? Proper locking is essential.

No, just a plain single-user GUI.


: I don't think you'll find any of the Unix-specific DBM's ported to
: Windows. Gunnar was suggesting DB_File, which is standard (that means
: "comes with") ActiveState Perl in all recent versions. In fact, I think
: maybe it is the default for dbmopen/dbmclose these days, instead of
: SDBM. Try that as he suggested. See:

: perldoc DB_File

Won't work on an AS Perl 5.8.0 WinXP installation!


Thank you a lot for the comments and discussion,

Oliver.
 

Joe Smith

: perldoc DB_File

Won't work on an AS Perl 5.8.0 WinXP installation!

Why not? What part does not work on WinXP?

Start -> All Programs -> ActiveState ActivePerl 5.8 -> Perl Package Manager
ppm> install DB_File
====================
Install 'DB_File' version 1.810 in ActivePerl 5.8.0.806.
====================
Downloaded 288942 bytes.
Extracting 10/10: blib/arch/auto/DB_File/DB_File.lib
Installing C:\Perl\site\lib\auto\DB_File\DB_File.bs
Installing C:\Perl\site\lib\auto\DB_File\DB_File.dll
Installing C:\Perl\site\lib\auto\DB_File\DB_File.exp
Installing C:\Perl\site\lib\auto\DB_File\DB_File.lib
Installing C:\Perl\html\site\lib\DB_File.html
Files found in blib\arch: installing files in blib\lib into architecture
dependent library tree
Installing C:\Perl\site\lib\DB_File.pm
Installing C:\Perl\site\lib\auto\DB_File\autosplit.ix
Successfully installed DB_File version 1.810 in ActivePerl 5.8.0.806.
ppm>

-Joe
 

Guest

: (e-mail address removed)-berlin.de wrote:

: > : perldoc DB_File
: >
: > Won't work on an AS Perl 5.8.0 WinXP installation!

: Why not? What part does not work on WinXP?

: Start -> All Programs -> ActiveState ActivePerl 5.8 -> Perl Package Manager
: ppm> install DB_File
: ====================
: Install 'DB_File' version 1.810 in ActivePerl 5.8.0.806.
: ====================
: Downloaded 288942 bytes.


Hi Joe,

Thank you for this demonstration which leaves me silent and blushing.
I simply did not find DB_File in my installation (a minute ago), and
since I never ever really develop things on Windows XP (only for my
team mates who just use XP; for myself, I stick with Linux), I didn't
even think of using DB_File because I thought it is just an interface
into a Berkeley db library. Well, I'll tell later which version
(Tie::File vs. DB_File) brings the better performance.

Thank you,

Oliver.
 

Guest

(e-mail address removed)-berlin.de wrote:


: Hi Joe,

: Thank you for this demonstration which leaves me silent and flushing.


Yet, I forgot to mention, the target machine has no connection to the
net, and data to be "carried" there (literally) must fit on a floppy.

In the meantime I made Tie::File work with my data set, it's a bit slow,
but it works.

More details later,

Oliver.
 

Guest

(e-mail address removed)-berlin.de wrote:

: In the meantime I made Tie::File work with my data set, it's a bit slow,
: but it works.

: More details later,

As promised, more details. I use the Tie::File module to open 10 files,
each with approx. 19000 lines, and usually the response speed is quite
acceptable. Only when my first seek targets a record around no. 17000
does it get slow: approx. one second per file, so 10 seconds in total.
However, all following seek operations take place in "real time". Attaching
the utf8 discipline slows down the operation by about 30 percent, so a
seek without utf8 takes 8 seconds, with utf8 discipline takes 12 seconds.
However, the memory footprint is quite small, and the disk file is just
the 600kB (+/-) so for my purpose (line-based editing of pseudo database)
the Tie::File module is exactly what I need, and no remote installation
(which would take place without my supervision) of any additional software
is necessary as AS Perl 5.8.0 is up and running on the target machine.
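
For anyone who wants to try the same approach, here is a minimal sketch of the Tie::File usage described above. The file name `records_test.txt` and its contents are invented test data. As far as I understand the module, it reads the file lazily and remembers record offsets as it goes, which would fit the pattern of a slow first access to a late record followed by fast ones.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical file name and invented contents, just to have 19000 lines.
my $file = 'records_test.txt';
open my $out, '>', $file or die "Can't write $file: $!\n";
print {$out} "record $_\n" for 0 .. 18999;
close $out;

my @lines;
tie @lines, 'Tie::File', $file or die "Can't tie $file: $!\n";

# The first access to a late record has to scan the file up to that point.
my $first = $lines[17000];
# A nearby record afterwards can use what was learned during that scan.
my $again = $lines[17001];

# Line-based editing: the assignment is written through to the file.
$lines[17000] = 'record 17000 (edited)';

untie @lines;
print "$first\n$again\n";
```

The array holds one element per line of the file, with the line ending removed, so record numbers are simply array indices starting at 0.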

Thanks again for all comments and ideas,

Oliver.
 
