How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

Discussion in 'Java' started by www, Feb 1, 2007.

  1. www

    www Guest

    Hi,

    I have a while-loop which runs 360 times. Each iteration takes 100 ms,
    so in total it takes 36 seconds, which is very long.


    for (int i = 0; i < 360; i++) // loops 360 times
    {
    ....//code for preparation of the method call at the end

    doIt(); //this method takes time. It inserts data into the database
    }


    Right now, the flow is:

    first looping -> second looping -> ..... -> 360th looping

    I am wondering if I can make the iterations more or less concurrent, so
    that each iteration does not have to wait for the previous one to finish:

    first looping ->
    second looping ->
    ....
    360th looping ->

    Could you please give me some help? Thank you.
     
    www, Feb 1, 2007
    #1

  2. www wrote:
    > Hi,
    >
    > I have a while-loop which loops 360 times. Each looping takes 100ms, so
    > in total it takes 36 seconds, which is very long.
    >
    >
    > while(true) //looping 360 times
    > {
    > ....//code for preparation of the method calling in the end
    >
    > doIt(); //this method takes time. It inserts data into database
    > }
    >
    >
    > Right now, the flow is:
    >
    > first looping -> second looping -> ..... -> 360th looping
    >
    > I am wondering if I can make the loopings more or less concurrent so no
    > need for next looping to wait for the previous looping ends:
    >
    > first looping ->
    > second looping ->
    > ...
    > 360th looping ->
    >
    > Could you please give me some help? Thank you.


    Is most of doIt's time spent waiting for the database insert? If so,
    there may be potential, depending on the capabilities of the database.

    You will need to use multiple threads to run the doIt calls. At the
    other extreme from using a single thread to do all the calls, you could
    start a new thread for each call. However, that will probably involve
    more thread start overhead than is needed.

    I think you will get better control over resources if you use the new
    java.util.concurrent features. See the API documentation introduction to
    java.util.concurrent.ThreadPoolExecutor.
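    A minimal sketch of that approach, using a fixed-size pool from
    java.util.concurrent. The doIt() here is only a stand-in that sleeps
    briefly, since the original method isn't shown; the pool size of 8 is an
    arbitrary assumption to be tuned against what the database can handle.

    ```java
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ConcurrentInserts {
        static final AtomicInteger completed = new AtomicInteger();

        // Stand-in for the poster's doIt(): simulate a slow database insert.
        static void doIt() {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            completed.incrementAndGet();
        }

        public static void main(String[] args) throws InterruptedException {
            // A small fixed pool instead of 360 threads; tune the size to
            // what the database can actually service in parallel.
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int i = 0; i < 360; i++) {
                // ... per-iteration preparation would go here ...
                pool.submit(ConcurrentInserts::doIt);
            }
            pool.shutdown();                            // accept no new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all 360
            System.out.println(completed.get());        // prints 360
        }
    }
    ```

    Note that the 360 inserts no longer run in order; if later iterations
    depend on earlier ones, this doesn't apply.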

    Patricia
     
    Patricia Shanahan, Feb 1, 2007
    #2

  3. www

    buggy Guest

    doIt(); //this method takes time. It inserts data into database

    Have doIt() store the information to be inserted into the database in a
    list.

    After the loop has completed, create an SQL prepared statement then loop
    through the saved list filling in the values into the prepared statement.

    This will let the database engine compile the insert statement once,
    rather than 360 times.
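    A sketch of this idea in JDBC, assuming the rows have been saved into a
    list during the loop. The table and column names (readings, angle, value)
    are invented for illustration; the original post doesn't say what is
    being inserted.

    ```java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class DeferredInserts {

        // One row of data saved during the loop instead of inserted by doIt().
        static class Reading {
            final int angle;
            final double value;
            Reading(int angle, double value) { this.angle = angle; this.value = value; }
        }

        // Hypothetical table and column names, for illustration only.
        static String insertSql() {
            return "INSERT INTO readings (angle, value) VALUES (?, ?)";
        }

        // Compile the statement once, then reuse it for every saved row.
        static void insertAll(Connection conn, List<Reading> rows) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(insertSql())) {
                for (Reading r : rows) {
                    ps.setInt(1, r.angle);
                    ps.setDouble(2, r.value);
                    ps.executeUpdate();
                }
            }
        }
    }
    ```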
     
    buggy, Feb 1, 2007
    #3
  4. www

    Daniel Pitts Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    On Feb 1, 5:40 am, www <> wrote:
    > Hi,
    >
    > I have a while-loop which loops 360 times. Each looping takes 100ms, so
    > in total it takes 36 seconds, which is very long.
    >
    > while(true) //looping 360 times
    > {
    > ....//code for preparation of the method calling in the end
    >
    > doIt(); //this method takes time. It inserts data into database
    >
    > }
    >
    > Right now, the flow is:
    >
    > first looping -> second looping -> ..... -> 360th looping
    >
    > I am wondering if I can make the loopings more or less concurrent so no
    > need for next looping to wait for the previous looping ends:
    >
    > first looping ->
    > second looping ->
    > ...
    > 360th looping ->
    >
    > Could you please give me some help? Thank you.



    First, look into using Batches instead of concurrency.
    If you find that you absolutely can't use batches, then look into
    java.util.concurrent.Executors
    <http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html>
    It will help you create a pool of worker threads. This helps in two ways.
    One is that you don't create 360 threads (which could cause serious
    resource problems). The other is that you don't have to worry about
    queuing the work up yourself; it's built into the executors.

    You may find that there are ways to speed up your database access.
     
    Daniel Pitts, Feb 1, 2007
    #4
  5. www

    Lew Guest

    buggy wrote:
    > doIt(); //this method takes time. It inserts data into database
    >
    > Have doIt() store the information to be inserted into the database in an
    > list
    >
    > After the loop has completed, create an SQL prepared statement then loop
    > through the saved list filling in the values into the prepared statement.
    >
    > This will let the database engine compile the insert statement once,
    > rahte than 360 times.


    Keep an eye on transaction integrity with this approach if you are not
    auto-committing, because it could place all the inserts into one
    transaction. If you want them to commit individually, you would need to
    attend to that. OTOH, this is a powerful idiom when you do want
    all-or-nothing for a transaction.

    - Lew
     
    Lew, Feb 1, 2007
    #5
  6. www

    Chris Uppal Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    Daniel Pitts wrote:

    > First, look into using Batches instead of concurrency.


    The way you have capitalised "Batches" makes it sound as if there's a specific
    software package with that name which would help with this sort of problem. I
    haven't heard of one myself (and Google shows nothing obviously helpful); am I
    missing something interesting ?

    -- chris
     
    Chris Uppal, Feb 2, 2007
    #6
  7. www

    Daniel Pitts Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    On Feb 2, 4:08 am, "Chris Uppal" <-
    THIS.org> wrote:
    > Daniel Pitts wrote:
    > > First, look into using Batches instead of concurrency.

    >
    > The way you have capitalised "Batches" makes it sound as if there's a specific
    > software package with that name which would help with this sort of problem. I
    > haven't heard of one myself (and Google shows nothing obviously helpful); am I
    > missing something interesting ?
    >
    > -- chris


    Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).
    When you are inserting or updating many rows in a database, you can
    often "batch" the process to improve throughput. Most database
    interfaces support batching.

    Basically, the concept goes like this:
    1. Start batch
    2. Insert a bunch of rows
    3. Commit batch
    4. All of the inserts get sent to the DB in one go.

    This has the downside that you can't rely on side-effects of the
    inserts until after commit. Specifically, you can't get the auto-
    generated primary key for each insert.
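    In JDBC terms, the steps above look roughly like this. The table and
    column names are invented, and exact transaction behaviour depends on the
    driver and storage engine.

    ```java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchedInserts {

        // Hypothetical table and column names, for illustration only.
        static String insertSql() {
            return "INSERT INTO readings (angle, value) VALUES (?, ?)";
        }

        // Each row is {angle, value}; the whole list goes to the DB in one go.
        static void insertBatch(Connection conn, List<double[]> rows) throws SQLException {
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false);                 // 1. start the batch
            try (PreparedStatement ps = conn.prepareStatement(insertSql())) {
                for (double[] row : rows) {            // 2. insert a bunch of rows
                    ps.setInt(1, (int) row[0]);
                    ps.setDouble(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();                     // all inserts sent in one go
                conn.commit();                         // 3. commit the batch
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }
    ```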
     
    Daniel Pitts, Feb 2, 2007
    #7
  8. www

    Lew Guest

    Daniel Pitts wrote:
    > Basically, the concept goes like this:
    > 1. Start batch
    > 2. Insert a bunch of rows
    > 3. commit batch
    > 4. --- All of the inserts get sent to the DB in one go.


    Pros:

    - good use of the connection, and potentially of PreparedStatement, to
    improve performance.
    - the only way to maintain consistency across related modifications.
    - if one part of the transaction fails, the whole thing rolls back, if
    you're vigilant.

    Cons:

    - if one part of the transaction fails, the whole thing rolls back,
    unless you're vigilant.
    - ties up a thread until it's all over.
    - ties up db resources (e.g., the connection) until it's all over.

    > This has the downside that you can't rely on side-effects of the
    > inserts until after commit. Specifically, you can't get the auto-
    > generated primary key for each insert.


    The use of auto-generated items as keys is controversial, and at best fraught
    with peril. This downside would not exist if one used real keys, i.e., columns
    that correspond to attributes of the model. Auto-generated values require
    special handling for data loads and unloads, and they need to be kept hidden
    from the model domain.

    There are apologists for the route of using only auto-generated values as
    keys. They feel the cited difficulties to be worth the effort.

    There are those in the latter group who go beyond any justifiable use of
    auto-generated key values to assign single-column keys to multi-column-key
    (relationship) tables, those whose composite keys comprise only a
    concatenation of foreign-key references.

    I used to use auto-generated keys all over the place. (Not in composite-key
    tables, however.) Now I'm in the natural-key (a.k.a., "real-key") camp.

    - Lew
     
    Lew, Feb 2, 2007
    #8
  9. www

    Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    There are two more alternatives. One is to save all the data as you
    loop and write it to the database once *without* using
    PreparedStatement, which still writes the data in order but only opens
    the database once. The other is to use connection pooling, which can
    maintain an open connection. The point here is that opening a database
    connection can be *very* slow. It should be easy to check whether this
    is what is slowing you down.


    On Feb 1, 5:40 am, www <> wrote:
    > Hi,
    >
    > I have a while-loop which loops 360 times. Each looping takes 100ms, so
    > in total it takes 36 seconds, which is very long.
    >
    > while(true) //looping 360 times
    > {
    > ....//code for preparation of the method calling in the end
    >
    > doIt(); //this method takes time. It inserts data into database
    >
    > }
    >
    > Right now, the flow is:
    >
    > first looping -> second looping -> ..... -> 360th looping
    >
    > I am wondering if I can make the loopings more or less concurrent so no
    > need for next looping to wait for the previous looping ends:
    >
    > first looping ->
    > second looping ->
    > ...
    > 360th looping ->
    >
    > Could you please give me some help? Thank you.
     
    , Feb 5, 2007
    #9
  10. www

    Chris Uppal Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    Daniel Pitts wrote:

    [me:]
    > > The way you have capitalised "Batches" makes it sound as if there's a
    > > specific software package with that name which would help with this
    > > sort of problem.

    [...]
    > Ah, sorry. I was struck by the RCM (Random Capitalisation Monster).


    No problem.

    But now I'm wondering if there's useful mileage in abstracting the batching
    pattern out into some sort of framework -- something like

    interface BatchProcessor
    {
        void submitTask(Runnable action);
        void implementAbortBy(Runnable action);
        void implementCommitBy(Runnable action);
        void abort();
        void commit();
        ....
    }

    (with extensions for threading and the like). Probably overkill, or at least
    over-engineering something simple, but... it might make more sense if the
    BatchProcessor were specific to use in DB contexts, since there is a fair
    amount of common extra semantics to be managed in such cases.

    Hey ho.

    -- chris
     
    Chris Uppal, Feb 5, 2007
    #10
  11. wrote:
    > there are 2 more alternatives -- one is to save all the data as you
    > loop and write it to the database once*without* using
    > PreparedStatement, which still writes the data in order but only opens
    > the database once. the other is use connection pooling, which can
    > maintain an open connection. The point here is opening a database
    > connection can be *very* slow. it should be easy to check and see if
    > this is what is slowing you down.


    I now have a question that is very similar to this one.

    I have some data I need to examine in many different ways. The main
    files, which represent one logical table, total a bit over 10GB, about
    88 million lines of 123 bytes each.

    I'm considering converting this to a MySQL database, and accessing it
    through Java.

    What is the best way of inserting the 88 million rows in the main
    table? Do it in batches of some reasonable size?

    Patricia
     
    Patricia Shanahan, Feb 5, 2007
    #11
  12. www

    Alex Hunsley Guest

    Patricia Shanahan wrote:
    > wrote:
    >> there are 2 more alternatives -- one is to save all the data as you
    >> loop and write it to the database once*without* using
    >> PreparedStatement, which still writes the data in order but only opens
    >> the database once. the other is use connection pooling, which can
    >> maintain an open connection. The point here is opening a database
    >> connection can be *very* slow. it should be easy to check and see if
    >> this is what is slowing you down.

    >
    > I now have a question that is very similar to this one.
    >
    > I have some data I need to examine in many different ways. The main
    > files, which represent one logical table, total a bit over 10GB, about
    > 88 million lines of 123 bytes each.
    >
    > I'm considering converting this to a MySQL database, and accessing it
    > through Java.
    >
    > What is the best way of inserting the 88 million rows in the main
    > table? Do it in batches of some reasonable size?


    Yup, I've done something similar before.
    For loading a large database, it's worth spending some time benchmarking
    what an efficient 'load chunk' size is (for the method, or methods, you
    are using for your load).

    >
    > Patricia
     
    Alex Hunsley, Feb 5, 2007
    #12
  13. www

    Chris Uppal Guest

    Re: How to make each loop iteration a concurrent thread to improve WHILE-loop performance?

    Patricia Shanahan wrote:

    > I have some data I need to examine in many different ways. The main
    > files, which represent one logical table, total a bit over 10GB, about
    > 88 million lines of 123 bytes each.
    >
    > I'm considering converting this to a MySQL database, and accessing it
    > through Java.
    >
    > What is the best way of inserting the 88 million rows in the main
    > table? Do it in batches of some reasonable size?


    If you haven't already, then I suggest you look into "bulk load" or "bulk
    insert". Some links (for MySQL)
    http://dev.mysql.com/doc/refman/5.1/en/load-data.html
    http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html

    (There's a comment from a "Nathan Huebner" near the bottom of the first page
    which describes how he loaded data with fixed size columns but without
    separators using LOAD DATA INFILE.)

    Also consider "standard" tricks like turning off all indexing, triggers,
    referential integrity constraints, etc, while doing the insert.

    Again, if you haven't already, it's worth considering whether you require
    transactional integrity on the DB you're building. Presumably MySQL works
    faster for non-transactional table types.
    http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html

    -- chris
     
    Chris Uppal, Feb 7, 2007
    #13
  14. Chris Uppal wrote:
    > Patricia Shanahan wrote:
    >
    >
    >>I have some data I need to examine in many different ways. The main
    >>files, which represent one logical table, total a bit over 10GB, about
    >>88 million lines of 123 bytes each.
    >>
    >>I'm considering converting this to a MySQL database, and accessing it
    >>through Java.
    >>
    >>What is the best way of inserting the 88 million rows in the main
    >>table? Do it in batches of some reasonable size?

    >
    >
    > If you haven't already, then I suggest you look into "bulk load" or "bulk
    > insert". Some links (for MySQL)
    > http://dev.mysql.com/doc/refman/5.1/en/load-data.html
    > http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html
    >
    > (There's a comment from a "Nathan Huebner" near the bottom of the first page
    > which describes how he loaded data with fixed size columns but without
    > separators using LOAD DATA INFILE.)


    Yup, I tracked down LOAD DATA INFILE after posting, and that seems to be
    the way to go. I've converted my text file to tab delimited columns,
    newline at end of row, and loaded up an extract that way.
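    Issued from Java, that load can be a single statement. The file path and
    table name below are placeholders, and LOCAL loading must be enabled on
    both sides (the allowLoadLocalInfile connection property in Connector/J).

    ```java
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class BulkLoad {

        // Build the load statement for a tab-delimited, newline-terminated file.
        // The path and table name are hypothetical, for illustration only.
        static String loadSql(String path, String table) {
            return "LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE " + table
                 + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
        }

        // Run the bulk load over an existing connection.
        static void load(Connection conn, String path, String table) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute(loadSql(path, table));
            }
        }
    }
    ```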

    >
    > Also consider "standard" tricks like turning off all indexing, triggers,
    > referential integrity constraints, etc, while doing the insert.
    >
    > Again, if you haven't already, then its worth considering whether you require
    > transactional integrity on the DB you're building. Presumably MySQL works
    > faster for non-transactional table-types.
    > http://dev.mysql.com/doc/refman/5.1/en/storage-engine-compare-transactions.html


    Thanks for the tips. I'm mining a fixed body of data. Once I get it
    loaded I don't plan to change the table contents, so I don't see any
    need at all for transactional integrity.

    Patricia
     
    Patricia Shanahan, Feb 7, 2007
    #14