Question about simple databases in Perl

J

January Weiner

Hello,

I have the following problem:

I have a very simple relation keyword -- number. I need to access it
very quickly for huge data sets.

For now, I was using NDBM, tying a hash with an indexed database. However,
this approach has drawbacks: large sizes of the index files,
platform-dependent index files, hard to store and retrieve additional
information etc. However, for someone like me it was a very easy and
straightforward approach.

I have now tested the sqlite which gives me the power of SQL. However, it
turned out to be roughly 100-500 times slower than NDBM, and the speed here
is of utter importance. First, a code snippet showing how I populated the
database:

my $dbh = DBI->connect('dbi:SQLite:dbname=blah.foo.sql','','');
$sth = $dbh->prepare( 'PRAGMA synchronous = OFF' ) ;
$sth->execute( ) ;
$sth = $dbh->prepare( 'create table t ( name TEXT, cont TEXT )' ) ;
$sth->execute( ) ;

$sth = $dbh->prepare( "insert into t ( name, cont ) values ( ?, ? )" ) ;

while( keys %records ) {
$sth->execute( $_, $records{$_} ) ;
}


And here is a code snippet showing how get the records:

my $dbh = DBI->connect('dbi:SQLite:dbname=blah.foo.sql','','');
my $sth = $dbh->prepare( "select * from t where name=?" ) ;

my $ntests = 10000 ; # number of tests
my $nrec = 700000 ; # number of records ;
# @keys hold all the keys, don't worry where I got it from

while( $ntests > 0 ) {

$i = int(rand($nrec)) ;
$key = $keys[ $i ] ;

# $t0 = [ gettimeofday ] ;
$sth->execute( $key ) ;
$all = $sth->fetchall_arrayref( ) ;
# $dt = tv_interval( $t0, [ gettimeofday ] ) ;

$ntests-- ;

}

Now, my questions are:

1) I don't know much about SQL or sqlite, maybe NDBM is expected to be so
much faster?
2) if not -- what do I do wrong? Should I ask in an SQL newsgroup for help
on optimizing my queries / database structure?

3) what other means could I use to speed up the access to these simple
records?

Best regards,

January


--
 
P

Paul Lalli

while( keys %records ) {
$sth->execute( $_, $records{$_} ) ;
}

There's no way that works. It simply repeatedly tests to see if there
are any elements in the hash, and if so, executes that SQL. It does
not assign any elements to $_, and seeing as nothing in the block
changes %records, it's an infinite loop.

Paul Lalli
 
X

xhoster

January Weiner said:
Hello,

I have the following problem:

I have a very simple relation keyword -- number. I need to access it
very quickly for huge data sets.

For now, I was using NDBM, tying a hash with an indexed database.
However, this approach has drawbacks: large sizes of the index files,
platform-dependent index files, hard to store and retrieve additional
information etc. However, for someone like me it was a very easy and
straightforward approach.

I have now tested the sqlite which gives me the power of SQL. However,
it turned out to be roughly 100-500 times slower than NDBM, and the speed
here is of utter importance. First, a code snippet showing how I
populated the database:

my $dbh = DBI->connect('dbi:SQLite:dbname=blah.foo.sql','','');
$sth = $dbh->prepare( 'PRAGMA synchronous = OFF' ) ;
$sth->execute( ) ;
$sth = $dbh->prepare( 'create table t ( name TEXT, cont TEXT )' ) ; .....
my $sth = $dbh->prepare( "select * from t where name=?" ) ;


Now, my questions are:

1) I don't know much about SQL or sqlite, maybe NDBM is expected to be so
much faster?
2) if not -- what do I do wrong? Should I ask in an SQL newsgroup for
help on optimizing my queries / database structure?

Other than the infinite loop mentions by someone else, you didn't build an
index on the "name" column or table "t". If you don't build an index,
then the program will have to scan the whole dataset to find what you
are looking for.

$dbh->do('create index asdfasdf on t(name)');

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
J

January Weiner

Paul Lalli said:
There's no way that works. It simply repeatedly tests to see if there
are any elements in the hash, and if so, executes that SQL. It does
not assign any elements to $_, and seeing as nothing in the block
changes %records, it's an infinite loop.

Whoups, my error while simplifying the code snippet. It should have been

for( keys %... ) {


}

j.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top