design strategy on handling large DB

Ela

Apart from using MySQL or related programs, how can Perl be used to deal with
large databases? Furthermore, how can I load that large DB into a cache in
advance and later use another Perl program to access/search that DB?
 
Ted Zlatanov

E> Apart from using MySQL or related programs, how can Perl be used to deal
E> with large databases?

In order (you may already know some of these):

1) learn SQL and at least some RDBMS design theory (select+join, update,
delete, indexing, foreign keys, normalization, data types, etc.)

2) read the DBI documentation

3) try Rose::DB::Object or DBIx::Class to see if they will work for you
instead of direct DBI queries (a plain-DBI sketch follows below).
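
To make the DBI step concrete, here is a minimal sketch; the database,
user, table, and column names are invented for illustration, so substitute
your own:

    use strict;
    use warnings;
    use DBI;

    # Connect to a hypothetical MySQL database "mydb".
    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1, AutoCommit => 1 });

    # Let the server do the searching and return only matching rows,
    # instead of pulling the whole table into Perl.
    my $sth = $dbh->prepare("SELECT id, name FROM items WHERE name LIKE ?");
    $sth->execute('%foo%');

    while (my ($id, $name) = $sth->fetchrow_array) {
        print "$id\t$name\n";
    }

    $dbh->disconnect;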

I hope by "large" you mean at most a hundred gigabytes of data (which is
considered a mid-size database by most people). Large databases are
usually in the hundreds of gigabytes to terabytes range and are very
different beasts.

E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?

I don't understand exactly what you want; there are 3 possibilities I
can think of and I'd rather not guess. Can you please explain what you
want to accomplish in more detail?

Ted
 
Ela

I hope by "large" you mean at most a hundred gigabytes of data (which is
considered a mid-size database by most people). Large databases are
usually in the hundreds of gigabytes to terabytes range and are very
different beasts.

It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...
E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?

I don't understand exactly what you want; there are 3 possibilities I
can think of and I'd rather not guess. Can you please explain what you
want to accomplish in more detail?

Ted

In MySQL, the DB is already structured and therefore the runtime program can
access the structured DB. But in Perl, I don't know whether I can put the DB
flat file into some data structures in advance, so users can query the DB
whenever they want instead of dynamically creating the DB every time.
 
RedGrittyBrick

Ela said:
It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...

3 is usually greater than 2. Unless the DBMS is very space-inefficient
at storing data and the Java program is very space-efficient at storing
data.


"Cache" usually means memory, doesn't it?


There must be something I haven't understood that explains how 3 GB data
can be expected to fit into less than 2GB of memory.
 
xhoster

Ela said:
Apart from using MySQL or related programs, how can Perl be used to deal
with large databases?

Many ways: flat files you write your own access code for, flat files
accessed through some DBD module, special formats like DBM::Deep, and so
on.
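
For instance, with DBM::Deep one script can build a disk-backed structure
in advance and a separate script can reopen and query it later, which
sounds like what you're after. A sketch (file name and keys invented):

    use strict;
    use warnings;
    use DBM::Deep;

    # Builder script: parse your flat file once and store the
    # resulting structure on disk.
    my $db = DBM::Deep->new("records.db");
    $db->{counts}{apple} = 42;
    $db->{counts}{pear}  = 7;

    # Query script (run later, as a separate process): reopen the
    # file and look values up without rebuilding anything.
    my $db2 = DBM::Deep->new("records.db");
    print $db2->{counts}{apple}, "\n";    # prints 42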
Furthermore, how can I load that large DB into a
cache in advance

To my mind, a DB that can be loaded into cache is by definition not large.
and later use another Perl program to access/search that DB?

I don't know of any way of reliably doing this that is less annoying
than just running MySQL or related programs. After all, that is what they
are designed for.

Xho

 
Ted Zlatanov

E> It's only 3 GB... but I find that when I run the current Java program with
E> MySQL, it uses up all 2 GB of memory...

You are loading all your data into memory. That's almost certainly the
wrong approach. Do you need all the data? What does the data look
like? What operations are you doing across the data? SQL has almost
every operation covered, and it's rare indeed that you will find it
inadequate for managing structured data.

E> Furthermore, how can I load that large DB into a cache in advance and
E> later use another Perl program to access/search that DB?
E> In MySQL, the DB is already structured and therefore the runtime program can
E> access the structured DB. But in Perl, I don't know whether I can put the DB
E> flat file into some data structures in advance, so users can query the DB
E> whenever they want instead of dynamically creating the DB every time.

I still don't understand why Perl can't query the database directly, but
you have to answer the questions above before I or anyone else can
suggest the next step without guessing.

Ted
 
Ela

Ted Zlatanov said:
You are loading all your data into memory. That's almost certainly the
wrong approach. Do you need all the data? What does the data look
like? What operations are you doing across the data? SQL has almost
every operation covered, and it's rare indeed that you will find it
inadequate for managing structured data.


Do you think, according to the following statistics, that the bottleneck is
mysqld?

08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
           cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
           cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
Mem:  2061424k av, 2043432k used,   17992k free,  0k shrd,  17308k buff
                    515388k actv,      12k in_d,  37452k in_c
Swap: 2040212k av,       0k used, 2040212k free            1278708k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 3825 ela       25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
 3815 ela       24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
 4373 ela       15   0  1044 1044   808 R     0.2  0.0   0:00   1 top
 
xhoster

Ela said:
It's only 3 GB... but I find that when I run the current Java program with
MySQL, it uses up all 2 GB of memory...

What uses up all 2 GB? MySQL or Java? If MySQL, then you have it
poorly configured. If Java, then you must be loading all the data into
memory at one time, either accidentally or on purpose. Bad concepts aren't
going to become good ones just by rewriting in a different language.


Xho

 
Peter J. Holzer

What uses up all 2 GB? MySQL or Java? If MySQL, then you have it
poorly configured. If Java, then you must be loading all the data into
memory at one time, either accidentally or on purpose.

I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...
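
Put together, a streaming read looks like this (a sketch with placeholder
connection details and a stub row handler):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1 });

    # Stream rows from the server instead of buffering the whole
    # result set on the client first.
    $dbh->{mysql_use_result} = 1;

    my $sth = $dbh->prepare("SELECT * FROM largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

    # Stub handler for the sketch; replace with real per-row work.
    sub process_one_row {
        my ($r) = @_;
        print join("\t", map { defined $_ ? $_ : '' } values %$r), "\n";
    }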
Bad concepts aren't going to become good ones just by rewriting in a
different language.

Right.

hp
 
szr

Peter said:
I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You
need to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished
reading the results ...

Couldn't one get around that by just using multiple DBD objects (with
mysql_use_result set)?
 
Peter J. Holzer

Couldn't one get around that by just using multiple DBD objects (with
mysql_use_result set)?

Yes, that would work. But then you have two sessions and have to be
careful with transactions, locking, etc. All solvable, of course, you
just have to be aware of it.
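
A sketch of that workaround, with Peter's caveats in mind (placeholder
connection details, and a second table `details' invented for the lookup):

    use strict;
    use warnings;
    use DBI;

    # Two independent connections to the same database.
    my $dbh_stream = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                                  { RaiseError => 1 });
    $dbh_stream->{mysql_use_result} = 1;

    my $dbh_other = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                                 { RaiseError => 1 });

    my $sth = $dbh_stream->prepare("SELECT id FROM largetable");
    $sth->execute();

    while (my ($id) = $sth->fetchrow_array) {
        # The second handle is free for other queries while the
        # first is still streaming its result set.
        my ($detail) = $dbh_other->selectrow_array(
            "SELECT detail FROM details WHERE id = ?", undef, $id);
        print "$id: ", defined $detail ? $detail : "(none)", "\n";
    }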

hp
 
nolo contendere

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...


This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?
 
nolo contendere

This is weird. So what would be the difference between any of the
fetch/select<row> methods and the fetch/select<all> methods for mysql
if the default 'mysql_store_result' were used?

Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?
 
Ted Zlatanov

E> Do you think the bottleneck according to the following statistics is from
E> mysqld?

E> 08:56:56  up 11:45,  5 users,  load average: 1.00, 1.00, 1.00
E> 84 processes: 82 sleeping, 2 running, 0 zombie, 0 stopped
E> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
E>            total   47.5%    0.0%    2.6%   0.0%     0.0%    0.0%   49.8%
E>            cpu00   95.1%    0.0%    4.8%   0.0%     0.0%    0.0%    0.0%
E>            cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.0%   99.6%
E> Mem:  2061424k av, 2043432k used,   17992k free,  0k shrd,  17308k buff
E>                     515388k actv,      12k in_d,  37452k in_c
E> Swap: 2040212k av,       0k used, 2040212k free            1278708k cached

E>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
E>  3825 ela       25   0 20732  20M  3024 R    49.7  1.0 550:05   0 mysqld
E>  3815 ela       24   0 60772  59M 10428 S     0.2  2.9   1:11   0 java
E>  4373 ela       15   0  1044 1044   808 R     0.2  0.0   0:00   1 top

I don't mean to be rude, but you haven't answered any of the questions
above, and as I said, no one can help you without understanding what
you're trying to do.

You could post your source code, but that's less useful than some
answers.

For what it's worth, the Java process is taking up only about 60 MB of
memory in the `top' output you've pasted, and MySQL is not using much
either. Hit `M' in `top' to see the processes sorted by memory usage,
which should be helpful.

Ted
 
ccc31807

Apart from using MySQL or related programs, how can Perl be used to deal with
large databases? Furthermore, how can I load that large DB into a cache in
advance and later use another Perl program to access/search that DB?

What kind of user interface do you want? If you have your database,
you already have all you need to access and search the data, i.e.,
SQL. You can certainly do it from the command line, and depending on
the kind of database you may have graphical utilities: Enterprise
Manager for SQL Server, Toad for MySQL, or even Access (via ODBC)
for MySQL.

I would suggest that using Access as a front end for your database is a
good, efficient, easy approach for users who are comfortable with
Access. They can easily do inserts, updates, deletes, and select
queries using query objects and the graphical query builder.

If this doesn't work, you can always build an HTML-based web interface
that uses a browser for user interaction. This is commonly done on the
web and isn't difficult. You can build dynamic queries on the fly from
user-supplied parameters, as in the sketch below.
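
For instance, here is a minimal CGI sketch of that idea (the script,
table, and parameter names are invented; note the placeholder, which keeps
user input out of the SQL string itself):

    #!/usr/bin/perl
    # search.cgi -- hypothetical browser front end to the database
    use strict;
    use warnings;
    use CGI;
    use DBI;

    my $q    = CGI->new;
    my $term = $q->param('term') || '';

    my $dbh = DBI->connect("dbi:mysql:database=mydb", "user", "password",
                           { RaiseError => 1 });

    print $q->header('text/html'), "<ul>\n";

    # Bind the user-supplied term as a placeholder value rather
    # than interpolating it into the SQL.
    my $sth = $dbh->prepare("SELECT name FROM items WHERE name LIKE ?");
    $sth->execute('%' . $term . '%');

    while (my ($name) = $sth->fetchrow_array) {
        print "<li>", $q->escapeHTML($name), "</li>\n";
    }
    print "</ul>\n";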

The important thing to remember is that the RDBMS already does
everything you want -- no need to reinvent the wheel. All you have to
do is connect it to your chosen interface. Perl can easily translate
between the two.

CC
 
nolo contendere

Because DBI is providing an abstraction layer -- it is a "quirk" of
the DBD::mysql driver that the row methods have the same footprint as
the "all" methods.

OK, I didn't know if there was some non-obvious benefit for the mysql
driver to implement it that way...
 
xhoster

nolo contendere said:
Sorry, my question should be: Why would you choose to use any of the
'row' methods over the 'all' methods if the default were being used?

For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible. The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory-efficient, as it seems to
store all the data in a highly compact low-level structure. fetchrow then
parcels it out into memory-inefficient Perl structures one row at a time,
while fetchall stores the whole result set in Perl structures, taking
several times as much memory.
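
In code, given a statement handle $sth that has been prepared and executed
as in the earlier snippets, the contrast is:

    # Option 1 -- fetchrow: at most one row lives in Perl structures
    # at a time, on top of the driver's compact client-side buffer.
    while (my $row = $sth->fetchrow_arrayref) {
        # work with @$row here
    }

    # Option 2 -- fetchall (instead of the loop above): the entire
    # result set is materialized as Perl arrays at once, which can
    # take several times the memory.
    my $all = $sth->fetchall_arrayref;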

Xho

 
xhoster

Peter J. Holzer said:
I don't know about Java, but in Perl this is easy to do accidentally.

If you do something like

    my $sth = $dbh->prepare("select * from largetable");
    $sth->execute();

    while (my $r = $sth->fetchrow_hashref) {
        process_one_row($r);
    }

You read one row at a time into memory. Right?

Wrong. DBD::mysql by default uses mysql_store_result, so the
$sth->execute will (try to) load the entire table into memory. You need
to tell DBD::mysql to use mysql_use_result instead:

    $dbh->{mysql_use_result} = 1;

But then you cannot issue other queries until you have finished reading
the results ...

I wasn't aware of that last limitation, or at least had forgotten it. I
thought store_result was the default because MySQL locks the table during
the read and they wanted to make those locks last as briefly as possible,
by reading all of the data up front. But I guess it is also there to work
around this other limitation you point out.


Xho

 
nolo contendere

For one, because you are using DBI and trying to abstract away the
peculiarities of the particular DBDs, to the extent possible.  The behavior
of DBD::mysql is certainly peculiar; you should only bow down to it as
a last resort.

For another, store_result is much more memory-efficient, as it seems to
store all the data in a highly compact low-level structure. fetchrow then
parcels it out into memory-inefficient Perl structures one row at a time,
while fetchall stores the whole result set in Perl structures, taking
several times as much memory.

Ok, that makes sense--just seems like a 'gotcha' to keep in mind when
using DBI with mysql, as Peter pointed out earlier in this thread.
Thx.
 
