How to run each loop iteration in a concurrent thread to improve while-loop performance?

www

Hi,

I have a while loop that runs 360 times. Each iteration takes about 100 ms, so
in total it takes 36 seconds, which is very long.


for (int i = 0; i < 360; i++)
{
    // ... code preparing for the method call at the end
    doIt(); // this method takes time; it inserts data into the database
}


Right now, the flow is:

first iteration -> second iteration -> ... -> 360th iteration

I am wondering whether I can make the iterations run more or less concurrently,
so that the next iteration does not have to wait for the previous one to finish:

first iteration ->
second iteration ->
...
360th iteration ->

Could you please give me some help? Thank you.
 
Patricia Shanahan

www said:
I have a while loop that runs 360 times. Each iteration takes about 100 ms, so
in total it takes 36 seconds, which is very long.
[...]
I am wondering whether I can make the iterations run more or less concurrently,
so that the next iteration does not have to wait for the previous one to finish.

Is most of doIt's time spent waiting for the database insert? If so,
there may be potential, depending on the capabilities of the database.

You will need to use multiple threads to run the doIt calls. At the
other extreme from using a single thread to do all the calls, you could
start a new thread for each call. However, that will probably involve
more thread start overhead than is needed.

I think you will get better control over resources if you use the new
java.util.concurrent features. See the API documentation introduction to
java.util.concurrent.ThreadPoolExecutor.
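
A minimal sketch of that approach, assuming the loop body can be split into
independent per-iteration tasks; prepareIt() and doIt() are hypothetical
stand-ins for the code in the original post:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ConcurrentInserts {
    public static void main(String[] args) throws InterruptedException {
        // 8 worker threads and a bounded queue; when the queue is full,
        // the submitting thread runs the task itself (back-pressure).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(64),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 360; i++) {
            final int iteration = i;
            pool.execute(new Runnable() {
                public void run() {
                    prepareIt(iteration); // hypothetical per-iteration setup
                    doIt(iteration);      // the slow database insert
                }
            });
        }

        pool.shutdown();                            // accept no new tasks
        pool.awaitTermination(5, TimeUnit.MINUTES); // wait for the queue to drain
    }

    private static void prepareIt(int i) { /* placeholder */ }
    private static void doIt(int i) { /* placeholder */ }
}

Note that each worker will then need its own database connection; a single
JDBC Connection is not safe to share between concurrently executing inserts.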

Patricia
 
buggy

www said:
doIt(); // this method takes time; it inserts data into the database

Have doIt() store the information to be inserted into the database in a
list.

After the loop has completed, create an SQL prepared statement, then loop
through the saved list, filling the values into the prepared statement.

This will let the database engine compile the insert statement once,
rather than 360 times.
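
A minimal sketch of that idea; the table my_table, its two columns, and the
Row holder class are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PreparedInsert {
    // Hypothetical holder for the values gathered during the loop.
    static class Row {
        final int id;
        final String value;
        Row(int id, String value) { this.id = id; this.value = value; }
    }

    static void insertAll(Connection con, List<Row> rows) throws SQLException {
        // Compiled once by the database, executed once per saved row.
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO my_table (id, value) VALUES (?, ?)");
        try {
            for (Row row : rows) {
                ps.setInt(1, row.id);
                ps.setString(2, row.value);
                ps.executeUpdate();
            }
        } finally {
            ps.close();
        }
    }
}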
 
Daniel Pitts

www said:
I have a while loop that runs 360 times. Each iteration takes about 100 ms, so
in total it takes 36 seconds, which is very long.
[...]
Could you please give me some help? Thank you.


First, look into using Batches instead of concurrency.
If you find that you absolutely can't use batches, then look into
java.util.concurrent.Executors
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html>
It will help you create a set of worker threads. This helps in two ways.
One is that you don't create 360 threads (which could cause serious
resource problems). The other is that you don't have to worry about
queuing the work up yourself; it's built into the executors.
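
A minimal sketch using the Executors factory; as in the earlier sketch,
prepareIt() and doIt() are hypothetical stand-ins for the original poster's
code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledInserts {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // 8 workers, not 360

        List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
        for (int i = 0; i < 360; i++) {
            final int iteration = i;
            tasks.add(new Callable<Void>() {
                public Void call() {
                    prepareIt(iteration); // hypothetical per-iteration setup
                    doIt(iteration);      // the slow database insert
                    return null;
                }
            });
        }

        pool.invokeAll(tasks); // queues the tasks and blocks until all finish
        pool.shutdown();
    }

    private static void prepareIt(int i) { /* placeholder */ }
    private static void doIt(int i) { /* placeholder */ }
}

invokeAll() only returns once every task has completed, so no separate
await step is needed.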

You may find that there are ways to speed up your database access.
 
Lew

buggy said:
doIt(); // this method takes time; it inserts data into the database

Have doIt() store the information to be inserted into the database in a
list.

After the loop has completed, create an SQL prepared statement, then loop
through the saved list, filling the values into the prepared statement.

This will let the database engine compile the insert statement once,
rather than 360 times.

Keep an eye on transaction integrity with this approach if you are not
auto-committing, because it could place all the inserts into one
transaction. If you want them to commit individually, you will need to
attend to that. OTOH, this is a powerful idiom when you do want all-or-nothing
for a transaction.
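
A minimal sketch of that choice, reusing the hypothetical PreparedInsert
sketch from earlier in the thread; with auto-commit off, the commit() call
sets the transaction boundary:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class TransactionalInsert {
    // All-or-nothing: every insert joins one transaction until commit().
    static void insertAllOrNothing(Connection con, List<PreparedInsert.Row> rows)
            throws SQLException {
        con.setAutoCommit(false);
        try {
            PreparedInsert.insertAll(con, rows); // the prepared-statement loop above
            con.commit();   // all inserts become visible together
        } catch (SQLException e) {
            con.rollback(); // one failure undoes every insert
            throw e;
        }
    }
}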

- Lew
 
Chris Uppal

Daniel said:
First, look into using Batches instead of concurrency.

The way you have capitalised "Batches" makes it sound as if there's a specific
software package with that name which would help with this sort of problem. I
haven't heard of one myself (and Google shows nothing obviously helpful); am I
missing something interesting?

-- chris
 
Daniel Pitts

Chris said:
The way you have capitalised "Batches" makes it sound as if there's a specific
software package with that name which would help with this sort of problem. I
haven't heard of one myself (and Google shows nothing obviously helpful); am I
missing something interesting?

Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).
When you are inserting or updating many rows in a database, you can
often "batch" the process to improve throughput. Most database
interfaces support batching.

Basically, the concept goes like this:
1. Start a batch.
2. Insert a bunch of rows.
3. Commit the batch.
4. All of the inserts get sent to the DB in one go.

This has the downside that you can't rely on side-effects of the
inserts until after commit. Specifically, you can't get the auto-
generated primary key for each insert.
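
In JDBC this is spelled addBatch()/executeBatch(); a minimal sketch, again
using the hypothetical my_table and Row holder from the earlier sketch:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    static void insertBatch(Connection con, List<PreparedInsert.Row> rows)
            throws SQLException {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO my_table (id, value) VALUES (?, ?)");
        try {
            for (PreparedInsert.Row row : rows) {
                ps.setInt(1, row.id);
                ps.setString(2, row.value);
                ps.addBatch();     // queue the insert locally
            }
            ps.executeBatch();     // send all queued inserts in one go
        } finally {
            ps.close();
        }
    }
}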
 
Lew

Daniel said:
Basically, the concept goes like this:
1. Start a batch.
2. Insert a bunch of rows.
3. Commit the batch.
4. All of the inserts get sent to the DB in one go.

Pros:

- good use of the connection, and potentially of PreparedStatement, to improve
performance.
- the only way to maintain consistency across related modifications.
- if one part of the transaction fails, the whole thing rolls back, if you're
vigilant.

Cons:

- if one part of the transaction fails, the whole thing rolls back, unless
you're vigilant.
- ties up a thread until it's all over.
- ties up db resources (e.g., the connection) until it's all over.

Daniel also said:
This has the downside that you can't rely on side-effects of the
inserts until after commit. Specifically, you can't get the auto-
generated primary key for each insert.

The use of auto-generated items as keys is controversial, and at best fraught
with peril. This downside would not exist if one used real keys, i.e., columns
that correspond to attributes of the model. Auto-generated values require
special handling for data loads and unloads, and they need to be kept hidden
from the model domain.

There are apologists for the route of using only auto-generated values as
keys. They feel the cited difficulties to be worth the effort.

There are those in the latter group who go beyond any justifiable use of
auto-generated key values to assign single-column keys to multi-column-key
(relationship) tables, those whose composite keys comprise only a
concatenation of foreign-key references.

I used to use auto-generated keys all over the place. (Not in composite-key
tables, however.) Now I'm in the natural-key (a.k.a., "real-key") camp.

- Lew
 
christopher

There are two more alternatives. One is to save all the data as you loop
and write it to the database once, *without* using PreparedStatement;
this still writes the data in order but only opens the database
connection once. The other is to use connection pooling, which can
maintain an open connection. The point is that opening a database
connection can be *very* slow. It should be easy to check whether
this is what is slowing you down.
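
A quick way to make that check; the JDBC URL, driver class, and credentials
below are placeholders to adjust for your setup:

import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionTiming {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver"); // pre-JDBC-4 driver registration

        long start = System.currentTimeMillis();
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password");
        long opened = System.currentTimeMillis();
        con.close();

        // If this dominates your 100 ms per iteration, reuse one
        // connection (or a pool) instead of opening one per insert.
        System.out.println("Opening the connection took "
                + (opened - start) + " ms");
    }
}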
 
Chris Uppal

Daniel Pitts wrote:

[me:]
The way you have capitalised "Batches" makes it sound as if there's a
specific software package with that name which would help with this
sort of problem.
[...]
Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).

No problem.

But now I'm wondering if there's useful mileage in abstracting the batching
pattern out into some sort of framework -- something like

interface BatchProcessor
{
    void submitTask(Runnable action);
    void implementAbortBy(Runnable action);
    void implementCommitBy(Runnable action);
    void abort();
    void commit();
    ....
}

(with extensions for threading and the like). Probably overkill, or at least
over-engineering something simple, but... it might make more sense if the
BatchProcessor were specific to use in DB contexts, since there is a fair
amount of common extra semantics to be managed in such cases.

Hey ho.

-- chris
 
Patricia Shanahan

christopher said:
There are two more alternatives. One is to save all the data as you loop
and write it to the database once, *without* using PreparedStatement;
this still writes the data in order but only opens the database
connection once. The other is to use connection pooling, which can
maintain an open connection. The point is that opening a database
connection can be *very* slow. It should be easy to check whether
this is what is slowing you down.

I now have a question that is very similar to this one.

I have some data I need to examine in many different ways. The main
files, which represent one logical table, total a bit over 10GB, about
88 million lines of 123 bytes each.

I'm considering converting this to a MySQL database, and accessing it
through Java.

What is the best way of inserting the 88 million rows in the main
table? Do it in batches of some reasonable size?

Patricia
 
Alex Hunsley

Patricia said:
I now have a question that is very similar to this one.

I have some data I need to examine in many different ways. The main
files, which represent one logical table, total a bit over 10GB, about
88 million lines of 123 bytes each.

I'm considering converting this to a MySQL database, and accessing it
through Java.

What is the best way of inserting the 88 million rows in the main
table? Do it in batches of some reasonable size?

Yup, I've done something similar before.
For loading a large database, it's worth spending some time benchmarking
what an efficient 'load chunk' size is (for the method, or methods, you
are using for your load).
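
A minimal sketch of such a benchmark, reusing the hypothetical BatchInsert
sketch from earlier in the thread; the candidate chunk sizes are arbitrary
starting points:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class ChunkBenchmark {
    // Times one full load at a given chunk size; smaller chunks commit
    // more often, larger chunks use more memory on both ends.
    static long timeLoad(Connection con, List<PreparedInsert.Row> rows,
            int chunkSize) throws SQLException {
        long start = System.currentTimeMillis();
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            BatchInsert.insertBatch(con, rows.subList(from, to));
        }
        return System.currentTimeMillis() - start;
    }

    static void benchmark(Connection con, List<PreparedInsert.Row> rows)
            throws SQLException {
        for (int chunk : new int[] { 100, 1000, 10000, 100000 }) {
            // For a fair comparison, empty the table between runs.
            System.out.println(chunk + " rows/chunk: "
                    + timeLoad(con, rows, chunk) + " ms");
        }
    }
}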
 
Chris Uppal

Patricia said:
I have some data I need to examine in many different ways. The main
files, which represent one logical table, total a bit over 10GB, about
88 million lines of 123 bytes each.

I'm considering converting this to a MySQL database, and accessing it
through Java.

What is the best way of inserting the 88 million rows in the main
table? Do it in batches of some reasonable size?

If you haven't already, then I suggest you look into "bulk load" or "bulk
insert". Some links (for MySQL)
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html

(There's a comment from a "Nathan Huebner" near the bottom of the first page
which describes how he loaded data with fixed size columns but without
separators using LOAD DATA INFILE.)

Also consider "standard" tricks like turning off all indexing, triggers,
referential integrity constraints, etc., while doing the insert.

Again, if you haven't already, then it's worth considering whether you require
transactional integrity on the DB you're building. Presumably MySQL works
faster for non-transactional table types.
http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html
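
A minimal sketch combining both suggestions, assuming a MyISAM table named
big_table and a tab-delimited input file; the path and names are made up:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkLoad {
    static void load(Connection con) throws SQLException {
        Statement st = con.createStatement();
        try {
            // Skip index maintenance during the load (MyISAM).
            st.execute("ALTER TABLE big_table DISABLE KEYS");

            // Bulk-load the tab-delimited, newline-terminated file.
            st.execute("LOAD DATA LOCAL INFILE '/data/main_table.txt' "
                     + "INTO TABLE big_table "
                     + "FIELDS TERMINATED BY '\\t' "
                     + "LINES TERMINATED BY '\\n'");

            // Rebuild the indexes in one pass at the end.
            st.execute("ALTER TABLE big_table ENABLE KEYS");
        } finally {
            st.close();
        }
    }
}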

-- chris
 
Patricia Shanahan

Chris said:
If you haven't already, then I suggest you look into "bulk load" or "bulk
insert". Some links (for MySQL)
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html

(There's a comment from a "Nathan Huebner" near the bottom of the first page
which describes how he loaded data with fixed size columns but without
separators using LOAD DATA INFILE.)

Yup, I tracked down LOAD DATA INFILE after posting, and that seems to be
the way to go. I've converted my text file to tab delimited columns,
newline at end of row, and loaded up an extract that way.

Chris also said:
Also consider "standard" tricks like turning off all indexing, triggers,
referential integrity constraints, etc., while doing the insert.

Again, if you haven't already, then it's worth considering whether you require
transactional integrity on the DB you're building. Presumably MySQL works
faster for non-transactional table types.
http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html

Thanks for the tips. I'm mining a fixed body of data. Once I get it
loaded I don't plan to change the table contents, so I don't see any
need at all for transactional integrity.

Patricia
 
