design strategy on handling large DB

Ela

Apart from using MySQL or related programs, how can Perl be used to deal with
large databases? Furthermore, how can I load that large DB into a cache in
advance and later use another Perl program to access/search that DB?
 
Ted Zlatanov

E> Apart from using MySQL or related programs, how can Perl be used to deal
E> with large databases?

In order (you may already know some of these):

1) learn SQL and at least some RDBMS design theory (select+join, update,
delete, indexing, foreign keys, normalization, data types, etc.)

2) read the DBI documentation

3) try Rose::DB::Object or DBIx::Class to see if they will work for you
instead of direct DBI queries (a plain-DBI sketch follows below).
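
To make the DBI step concrete, here is a minimal sketch; the database,
user, table, and column names are invented for illustration, so substitute
your own:

    use strict;
    use warnings;
    use DBI;

    # Connect to a hypothetical MySQL database "mydb".
    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1, AutoCommit => 1 });

    # Let the server do the searching and return only matching rows,
    # instead of pulling the whole table into Perl.
    my $sth = $dbh->prepare("SELECT id, name FROM items WHERE name LIKE ?");
    $sth->execute('%foo%');

    while (my ($id, $name) = $sth->fetchrow_array) {
        print "$id\t$name\n";
    }

    $dbh->disconnect;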

I hope by "large" you mean at most a hundred gigabytes of data (which is
considered a mid-size database by most people). Large databases are
usually in the hundreds of gigabytes to terabytes range and are very
different beasts.

E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?

I don't understand exactly what you want; there are 3 possibilities I
can think of and I'd rather not guess. Can you please explain what you
want to accomplish in more detail?

Ted
 
Ela

I hope by "large" you mean at most a hundred gigabytes of data (which is
considered a mid-size database by most people). Large databases are
usually in the hundreds of gigabytes to terabytes range and are very
different beasts.

It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...
E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?

I don't understand exactly what you want; there are 3 possibilities I
can think of and I'd rather not guess. Can you please explain what you
want to accomplish in more detail?

Ted

In MySQL, the DB is already structured and therefore the runtime program can
access the structured DB. But in Perl, I don't know whether I can put the DB
flat file into some data structures in advance, so users can query the DB
whenever they want instead of dynamically creating the DB every time.
 
RedGrittyBrick

Ela said:
It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...

3 is usually greater than 2. Unless the DBMS is very space-inefficient
at storing data and the Java program is very space-efficient at storing
data.


"Cache" usually means memory, doesn't it?


There must be something I haven't understood that explains how 3 GB data
can be expected to fit into less than 2GB of memory.
 
xhoster

Ela said:
Apart from using MySQL or related programs, how can Perl be used to deal
with large databases?

Many ways: flat files you write your own access code for, flat files
accessed through some DBD module, special formats like DBM::Deep, and so
on.
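
For instance, with DBM::Deep one script can build a disk-backed structure
in advance and a separate script can reopen and query it later, which
sounds like what you're after. A sketch (file name and keys invented):

    use strict;
    use warnings;
    use DBM::Deep;

    # Builder script: parse your flat file once and store the
    # resulting structure on disk.
    my $db = DBM::Deep->new("records.db");
    $db->{counts}{apple} = 42;
    $db->{counts}{pear}  = 7;

    # Query script (run later, as a separate process): reopen the
    # file and look values up without rebuilding anything.
    my $db2 = DBM::Deep->new("records.db");
    print $db2->{counts}{apple}, "\n";    # prints 42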
Furthermore, how can I load that large DB into a
cache in advance

To my mind, a DB that can be loaded into cache is by definition not large.
and later use another Perl program to access/search that DB?

I don't know of any way of reliably doing this that is less annoying
than just running MySQL or related programs. After all, that is what they
are designed for.

Xho

 
Ted Zlatanov

E> It's only 3 GB... but I find that when I run the current Java program with
E> MySQL, it uses up all 2 GB of memory...

You are loading all your data into memory. That's almost certainly the
wrong approach. Do you need all the data? What does the data look
like? What operations are you doing across the data? SQL has almost
every operation covered, and it's rare indeed that you will find it
inadequate for managing structured data.

E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?
E> In MySQL, the DB is already structured and therefore the runtime program can
E> access the structured DB. But in Perl, I don't know whether I can put the DB
E> flat file into some data structures in advance, so users can query the DB
E> whenever they want instead of dynamically creating the DB every time.

I still don't understand why Perl can't query the database directly, but
you have to answer the questions above before I or anyone else can
suggest the next step without guessing.

Ted
 
Ela

Ted Zlatanov said:
You are loading all your data into memory. That's almost certainly the
wrong approach. Do you need all the data? What does the data look
like? What operations are you doing across the data? SQL has almost
every operation covered, and it's rare indeed that you will find it
inadequate for managing structured data.


Do you think, according to the following statistics, that the bottleneck is
mysqld?

08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
           cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
           cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
Mem:  2061424k av, 2043432k used,   17992k free,  0k shrd,  17308k buff
                    515388k actv,      12k in_d,  37452k in_c
Swap: 2040212k av,       0k used, 2040212k free            1278708k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 3825 ela       25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
 3815 ela       24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
 4373 ela       15   0  1044 1044   808 R     0.2  0.0   0:00   1 top
 
xhoster

Ela said:
It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...

What uses up all 2 GB? MySQL or Java? If MySQL, then you have it
poorly configured. If Java, then you must be loading all the data into
memory at one time, either accidentally or on purpose. Bad concepts aren't
going to become good ones just by rewriting in a different language.


Xho

 
Peter J. Holzer

What uses up all 2 GB? MySQL or Java? If MySQL, then you have it
poorly configured. If Java, then you must be loading all the data into
memory at one time, either accidentally or on purpose.

I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...
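
Put together, a streaming read looks like this (a sketch with placeholder
connection details and a stub row handler):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1 });

    # Stream rows from the server instead of buffering the whole
    # result set on the client first.
    $dbh->{mysql_use_result} = 1;

    my $sth = $dbh->prepare("SELECT * FROM largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

    # Stub handler for the sketch; replace with real per-row work.
    sub process_one_row {
        my ($r) = @_;
        print join("\t", map { defined $_ ? $_ : '' } values %$r), "\n";
    }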
Bad concepts aren't going to become good ones just by rewriting in a
different language.

Right.

hp
 
szr

Peter said:
I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You
need to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished
reading the results ...

Couldn't one get around that by just using multiple DBD objects (with
mysql_use_result set)?
 
Peter J. Holzer

Couldn't one get around that by just using multiple DBD objects (with
mysql_use_result set)?

Yes, that would work. But then you have two sessions and have to be
careful with transactions, locking, etc. All solvable, of course, you
just have to be aware of it.
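
A sketch of that workaround, with Peter's caveats in mind (placeholder
connection details, and a second table `details' invented for the lookup):

    use strict;
    use warnings;
    use DBI;

    # Two independent connections to the same database.
    my $dbh_stream = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                                  { RaiseError => 1 });
    $dbh_stream->{mysql_use_result} = 1;

    my $dbh_other = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                                 { RaiseError => 1 });

    my $sth = $dbh_stream->prepare("SELECT id FROM largetable");
    $sth->execute();

    while (my ($id) = $sth->fetchrow_array) {
        # The second handle is free for other queries while the
        # first is still streaming its result set.
        my ($detail) = $dbh_other->selectrow_array(
            "SELECT detail FROM details WHERE id = ?", undef, $id);
        print "$id: ", defined $detail ? $detail : "(none)", "\n";
    }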

hp
 
nolo contendere

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...


This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?
 
nolo contendere

This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?

Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?
 
Ted Zlatanov

E> Do you think the bottleneck according to the following statistics is from
E> mysqld?

E> 08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
E> 84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
E> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
E>            total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
E>            cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
E>            cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
E> Mem:  2061424k av, 2043432k used,   17992k free,  0k shrd,  17308k buff
E>                     515388k actv,      12k in_d,  37452k in_c
E> Swap: 2040212k av,       0k used, 2040212k free            1278708k cached

E>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
E>  3825 ela       25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
E>  3815 ela       24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
E>  4373 ela       15   0  1044 1044   808 R     0.2  0.0   0:00   1 top

I don't mean to be rude, but you haven't answered any of the questions
above, and as I said, no one can help you without understanding what
you're trying to do.

You could post your source code, but that's less useful than some
answers.

For what it's worth, the Java process is taking up only about 60 MB of
memory in the `top' output you've pasted, and MySQL is not using much
either. Hit `M' in `top' to see the processes sorted by memory usage,
which should be helpful.

Ted
 
ccc31807

Apart from using MySQL or related programs, how can Perl be used to deal with
large databases? Furthermore, how can I load that large DB into a cache in
advance and later use another Perl program to access/search that DB?

What kind of user interface do you want? If you have your database,
you already have all you need to access and search the data, i.e.,
SQL. You can certainly do it from the command line, and depending on
the kind of database you may have graphical utilities: Enterprise
Manager for SQL Server, Toad for MySQL, or even Access (via ODBC)
for MySQL.

I would suggest that using Access as a front end for your database is a
good, efficient, easy approach for users who are comfortable with
Access. They can easily do inserts, updates, deletes, and select
queries using query objects and the graphical query builder.

If this doesn't work, you can always build an HTML-based web interface
that uses a browser for user interaction. This is commonly done on the
web and isn't difficult. You can build dynamic queries on the fly from
user-supplied parameters, as in the sketch below.
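
For instance, here is a minimal CGI sketch of that idea (the script,
table, and parameter names are invented; note the placeholder, which keeps
user input out of the SQL string itself):

    #!/usr/bin/perl
    # search.cgi -- hypothetical browser front end to the database
    use strict;
    use warnings;
    use CGI;
    use DBI;

    my $q    = CGI->new;
    my $term = $q->param('term') || '';

    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1 });

    print $q->header('text/html'), "<ul>\n";

    # Bind the user-supplied term as a placeholder value rather
    # than interpolating it into the SQL.
    my $sth = $dbh->prepare("SELECT name FROM items WHERE name LIKE ?");
    $sth->execute('%' . $term . '%');

    while (my ($name) = $sth->fetchrow_array) {
        print "<li>", $q->escapeHTML($name), "</li>\n";
    }
    print "</ul>\n";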

The important thing to remember is that the RDBMS already does
everything you want -- no need to reinvent the wheel. All you have to
do is connect it to your chosen interface. Perl can easily translate
between the two.

CC
 
nolo contendere

Because DBI is providing an abstraction layer -- it is a "quirk" of
the DBD::mysql driver that the row methods have the same footprint as
the "all" methods.

OK, I didn't know if there was some non-obvious benefit for the mysql
driver to implement it that way...
 
xhoster

nolo contendere said:
Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?

For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible. The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory-efficient, as it seems to
store all the data in a highly compact low-level structure. fetchrow then
parcels it out into memory-inefficient Perl structures one row at a time,
while fetchall stores the whole result set in Perl structures, taking
several times as much memory.
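
In code, given a statement handle $sth that has been prepared and executed
as in the earlier snippets, the contrast is:

    # Option 1 -- fetchrow: at most one row lives in Perl structures
    # at a time, on top of the driver's compact client-side buffer.
    while (my $row = $sth->fetchrow_arrayref) {
        # work with @$row here
    }

    # Option 2 -- fetchall (instead of the loop above): the entire
    # result set is materialized as Perl arrays at once, which can
    # take several times the memory.
    my $all = $sth->fetchall_arrayref;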

Xho

 
xhoster

Peter J. Holzer said:
I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...

I wasn't aware of that last limitation, or at least had forgotten it. I
thought store_result was the default because MySQL locks the table during
the read and they wanted to make those locks last as briefly as possible,
by reading all of the data up front. But I guess it is also there to work
around this other limitation you point out.


Xho

 
nolo contendere

For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible.  The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory-efficient, as it seems to
store all the data in a highly compact low-level structure. fetchrow then
parcels it out into memory-inefficient Perl structures one row at a time,
while fetchall stores the whole result set in Perl structures, taking
several times as much memory.

Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
using DBI with mysql, as Peter pointed out earlier in this thread.
Thx.
 
