Best way to query an enormous database table

Discussion in 'Java' started by Lonifasiko, Oct 19, 2006.

  1. Lonifasiko

    Lonifasiko Guest

    Hi,

    I'm working with MySQL as the back-end and I must perform a heavy select
    against a 250-column table that could store approximately 1 million
    records.

    What's the best way to do this? I'm a newbie to the Java language and
    I've started with:

    ResultSet rs = stmt.executeQuery("SELECT ....");

    However, the JVM is throwing an OutOfMemoryError while querying the
    table, which today has "only" 18,000 records.

    I would really appreciate any ideas.

    Thanks very much in advance.
     
    Lonifasiko, Oct 19, 2006
    #1

  2. Lonifasiko

    AW på ZRX Guest

    > ResultSet rs = stmt.executeQuery("SELECT ....");
    >
    > However, the JVM is throwing an OutOfMemoryError while querying the
    > table, which today has "only" 18,000 records.
    >
    > I would really appreciate any ideas.


    Are you trying to read all the records into the Java program?
    It is better to let the database do the selection with a WHERE clause,
    so that you only get the records you need.
    The size of the table should not be a problem.
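
    For illustration, a minimal sketch of pushing the filtering into the
    database with a parameterized WHERE clause (the table and column names
    here are invented):

    // assumes: import java.sql.*; and an open Connection conn
    PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM orders WHERE created >= ?");
    ps.setDate(1, java.sql.Date.valueOf("2006-10-01"));
    ResultSet rs = ps.executeQuery(); // only the matching rows come back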

    Also note that a common newbie error is to use Strings instead of
    StringBuffers when doing a lot of String processing on the data.
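
    A tiny illustration of the difference (a hypothetical loop over a
    ResultSet): building the result with += copies the whole string on
    every pass, while a StringBuffer appends in place.

    StringBuffer sb = new StringBuffer();
    while (rs.next()) {
        sb.append(rs.getString("name")).append('\n'); // no repeated copying
    }
    String report = sb.toString();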

    Maybe a bit more code could help....

    Allan
     
    AW på ZRX, Oct 19, 2006
    #2

  3. On 19.10.2006 15:11 Lonifasiko wrote:
    > Hi,
    >
    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.
    >
    > What's the best way to do this? I'm a newbie to the Java language and
    > I've started with:
    >
    > ResultSet rs = stmt.executeQuery("SELECT ....");
    >
    > However, the JVM is throwing an OutOfMemoryError while querying the
    > table, which today has "only" 18,000 records.
    >
    > I would really appreciate any ideas.


    If it's throwing the OOME during executeQuery(), you should check the docs
    for the MySQL driver on how to disable the caching. I assume the driver
    tries to load all rows into memory.

    If this happens during your processing of the rows, then the answer is
    simple: Don't keep all rows in memory :)

    Thomas

    --
    It's not a RootKit - it's a Sony
     
    Thomas Kellerer, Oct 19, 2006
    #3
    If you are selecting the rows to display them in some sort of UI, try
    pagination and limit the set you pick up from the database. Refer to
    the MySQL manual for the LIMIT clause. This will help you restrict the
    number of rows returned. You can change the limit values for every page
    and select a manageable number of records rather than the entire table.
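
    For example, a rough sketch of one page per query (table and column
    names invented; MySQL's LIMIT takes an offset and a row count):

    // assumes: import java.sql.*; and an open Connection conn
    int page = 0;        // zero-based page requested by the UI
    int pageSize = 100;  // a manageable chunk
    PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM orders ORDER BY id LIMIT ?, ?");
    ps.setInt(1, page * pageSize); // offset of the first row of the page
    ps.setInt(2, pageSize);        // rows per page
    ResultSet rs = ps.executeQuery();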

    -cheers,
    Manish
     
    Manish Pandit, Oct 19, 2006
    #4
  5. Lonifasiko

    javajoker Guest

    I agree with the other posters: turn off caching, write more selective
    queries (no full table scans), and paginate with query limits where
    applicable. You might also think about normalizing that table. It would
    be easier to troubleshoot if you gave an example of how you plan to use
    the data after you retrieve it. In most cases you shouldn't need every
    record stored in memory.

    --Nick
    http://www.lla50.com


    Lonifasiko wrote:
    > Hi,
    >
    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.
    >
    > What's the best way to do this? I'm a newbie to the Java language and
    > I've started with:
    >
    > ResultSet rs = stmt.executeQuery("SELECT ....");
    >
    > However, the JVM is throwing an OutOfMemoryError while querying the
    > table, which today has "only" 18,000 records.
    >
    > I would really appreciate any ideas.
    >
    > Thanks very much in advance.
     
    javajoker, Oct 19, 2006
    #5
  7. Lonifasiko

    Lonifasiko Guest

    Hi, and thanks for the replies.

    In fact, I'm going to filter the data from the database using a WHERE
    clause. It was just an example ;-)

    I know it's not a good idea to load all rows into memory. I'll look
    for an alternative way to do it.

    I'll also take a look at the MySQL driver properties for the caching
    issue you mention.

    Thanks very much.
     
    Lonifasiko, Oct 19, 2006
    #7
  8. Lonifasiko

    Simon Brooke Guest

    Lonifasiko wrote:

    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.


    I would get someone who knows what they're doing to design the database
    schema first. Then I would use a proper heavyweight database engine (such
    as Postgres). Then I would look up the SQL keywords 'LIMIT' and 'COUNT'.

    Trying to schlurp a whole table into RAM is not a good idea at the best of
    times.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; Woz: 'All the best people in life seem to like LINUX.'
    ;; <URL:http://www.woz.org/woz/cresponses/response03.html>
     
    Simon Brooke, Oct 19, 2006
    #8
  9. Lonifasiko

    steve Guest

    On Fri, 20 Oct 2006 05:03:47 +0800, Lonifasiko wrote:

    > Hi, and thanks for the replies.
    >
    > In fact, I'm going to filter the data from the database using a WHERE
    > clause. It was just an example ;-)
    >
    > I know it's not a good idea to load all rows into memory. I'll look
    > for an alternative way to do it.
    >
    > I'll also take a look at the MySQL driver properties for the caching
    > issue you mention.
    >
    > Thanks very much.
    >




    If you listen to any of these newbies you will get yourself into trouble.

    Caching, pagination, limiting queries, "normalization of data"... WTF!!!

    1. Bring only the columns you absolutely need into the client.
    2. Start by limiting your query using the WHERE clause.
    3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
    This is part of the ODBC standard.

    This gives you a "window" onto your record selection.

    You set up:
    how many records you want to see in the "window",
    whether you want to scroll forward, backward, or both,
    whether the resultset is updatable,
    and whether an updatable resultset is visible to other users.



    Ultimately you do ONE query, then let the ODBC driver do the hard work;
    using "count" and other shitty solutions means you need to issue
    multiple queries.
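
    For illustration, a rough sketch of that windowing through the JDBC
    scrollable-cursor API (table and column names invented; whether the
    driver really scrolls server-side or just buffers the whole result
    client-side depends on the driver, as later replies note):

    // assumes: import java.sql.*; and an open Connection conn
    Statement stmt = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE,
        ResultSet.CONCUR_READ_ONLY);
    ResultSet rs = stmt.executeQuery(
        "SELECT id, name FROM orders WHERE status = 'OPEN'");
    rs.absolute(100);                        // position the cursor on row 100
    for (int i = 0; i < 50 && rs.next(); i++) {
        String name = rs.getString("name");  // process rows 101-150
    }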


    Steve
     
    steve, Oct 19, 2006
    #9
  10. Lonifasiko

    jmcgill Guest

    Lonifasiko wrote:
    > Hi,
    >
    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.


    Those are just numbers. They don't necessarily correlate with the
    complexity of executing the query, or with the size of the result set.
    It would be more meaningful if you showed your table definition and
    the output of "EXPLAIN <yourquery>;".

    Indexes can make a dramatic difference in the work a query must do.
    Are your tables indexed in a way that is meaningful to your "heavy" select?

    > What's the best way to do this? I'm a newbie to the Java language and
    > I've started with:



    > ResultSet rs = stmt.executeQuery("SELECT ....");


    > However, the JVM is throwing an OutOfMemoryError while querying the
    > table, which today has "only" 18,000 records.


    Of course it does. You need to query on a cursor, which may not be
    possible with the MySQL JDBC driver.

    You can keep track of how many trips you have made, and use LIMIT and
    OFFSET in your queries, to get precise control over the result-set
    chunks. This has concurrency problems if you cannot lock while in
    process.

    stmt.setFetchSize(##something reasonable##) might help.

    This is from the manual:

    stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                                java.sql.ResultSet.CONCUR_READ_ONLY);
    // Asks the MySQL driver to stream rows one at a time instead of
    // buffering the whole result set in memory.
    stmt.setFetchSize(Integer.MIN_VALUE);
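
    Used that way, you just walk the rows forward and let the driver feed
    them one at a time; a minimal sketch (process() is a hypothetical
    stand-in for your per-row handling):

    ResultSet rs = stmt.executeQuery("SELECT ....");
    while (rs.next()) {
        process(rs); // handle one row; keep no reference to it afterwards
    }
    rs.close();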

    It will lock the connection until you are done, which might be good or
    bad depending on your situation.

    The bottom line is that the ResultSet's default behavior is to suck the
    entire result into a Vector. Personally, I think it is horrible that
    this is a path to a runtime exception.
     
    jmcgill, Oct 20, 2006
    #10
  11. Lonifasiko

    jmcgill Guest

    steve wrote:
    > On Fri, 20 Oct 2006 05:03:47 +0800, Lonifasiko wrote:
    >
    >> Hi, and thanks for the replies.
    >>
    >> In fact, I'm going to filter the data from the database using a WHERE
    >> clause. It was just an example ;-)
    >>
    >> I know it's not a good idea to load all rows into memory. I'll look
    >> for an alternative way to do it.
    >>
    >> I'll also take a look at the MySQL driver properties for the caching
    >> issue you mention.
    >>
    >> Thanks very much.
    >>

    >
    >
    >
    > If you listen to any of these newbies you will get yourself into trouble.
    >
    > Caching, pagination, limiting queries, "normalization of data"... WTF!!!
    >
    > 1. Bring only the columns you absolutely need into the client.
    > 2. Start by limiting your query using the WHERE clause.
    > 3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
    > This is part of the ODBC standard.


    ODBC, but not JDBC. A JDBC query will (try to) suck the entire row set
    into memory. There are some workarounds. See my prior post.

    > This gives you a "window" onto your record selection.
    >
    > You set up:
    > how many records you want to see in the "window",
    > whether you want to scroll forward, backward, or both,
    > whether the resultset is updatable,
    > and whether an updatable resultset is visible to other users.


    If you know a way to do this via JDBC and the ConnectorJ with MySQL,
    please give an example.
     
    jmcgill, Oct 20, 2006
    #11
  12. Lonifasiko

    jmcgill Guest

    Simon Brooke wrote:
    > Trying to schlurp a whole table into RAM is not a good idea at the best of
    > times.


    Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
    team.
     
    jmcgill, Oct 20, 2006
    #12
  13. Lonifasiko

    jmcgill Guest

    javajoker wrote:
    > I agree with the other posters: turn off caching, write more selective
    > queries (no full table scans), and paginate with query limits where
    > applicable. You might also think about normalizing that table. It would
    > be easier to troubleshoot if you gave an example of how you plan to use
    > the data after you retrieve it. In most cases you shouldn't need every
    > record stored in memory.


    Your answer does not really address the basic problem: the JDBC
    ResultSet, at least with ConnectorJ and MySQL, by default stores the
    entire rowset in a Vector. You can use setFetchSize(), but even that is
    "merely a hint" as to how much to fetch in each pass, and the end result
    is still the entire rowset in a Vector. This is true no matter what form
    of caching is used, no matter how the tables are indexed, and no matter
    how normalized the table(s) in the join happen to be.

    The client must paginate explicitly, e.g., with COUNT and LIMIT. Unless
    he locks every table in the join, there's a concurrency problem: later
    queries run against a different state than earlier ones, so there's a
    chance of duplicates or omissions; in other words, there is always the
    chance that the resulting aggregation of queries _is not a partition_ of
    the original full query.
     
    jmcgill, Oct 20, 2006
    #13
  14. "Lonifasiko" <> wrote in message
    news:...
    > Hi,
    >
    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.
    >
    > What's the best way to do this? I'm a newbie to the Java language and
    > I've started with:
    >
    > ResultSet rs = stmt.executeQuery("SELECT ....");
    >
    > However, the JVM is throwing an OutOfMemoryError while querying the
    > table, which today has "only" 18,000 records.
    >
    > I would really appreciate any ideas.
    >
    > Thanks very much in advance.


    Besides limiting it via the WHERE clause, see also
    CallableStatement.setMaxRows or PreparedStatement.setMaxRows.
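
    A small sketch of that approach (hypothetical table and column names;
    setMaxRows makes the driver silently drop any rows past the cap):

    // assumes: import java.sql.*; and an open Connection conn
    PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM orders WHERE status = ?");
    ps.setMaxRows(500);      // never receive more than 500 rows
    ps.setString(1, "OPEN");
    ResultSet rs = ps.executeQuery();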
     
    Charles Hottel, Oct 20, 2006
    #14
  15. Lonifasiko

    Xagyg Guest

    Lonifasiko wrote:
    > Hi,
    >
    > I'm working with MySQL as the back-end and I must perform a heavy select
    > against a 250-column table that could store approximately 1 million
    > records.
    >
    > What's the best way to do this?


    <stuff deleted>

    "Distributed result set iterator: a design pattern for efficient
    retrieval of large result sets from remote data sources"

    See http://dx.doi.org/10.1016/j.jpdc.2004.01.001

    If you can't get access to this article, I'll post a version on a
    website in a day or 2. Let me know.
     
    Xagyg, Oct 20, 2006
    #15
  16. Lonifasiko

    Simon Brooke Guest

    Xagyg wrote:

    > Lonifasiko wrote:
    >> Hi,
    >>
    >> I'm working with MySQL as the back-end and I must perform a heavy select
    >> against a 250-column table that could store approximately 1 million
    >> records.
    >>
    >> What's the best way to do this?

    >
    > <stuff deleted>
    >
    > "Distributed result set iterator: a design pattern for efficient
    > retrieval of large result sets from remote data sources"
    >
    > See http://dx.doi.org/10.1016/j.jpdc.2004.01.001
    >
    > If you can't get access to this article, I'll post a version on a
    > website in a day or 2. Let me know.


    From the abstract, that looks like a very interesting article, which
    could solve some problems I have. If you have the right to post copies,
    I'd appreciate a look at it, too.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    ;; Life would be much easier if I had the source code.
     
    Simon Brooke, Oct 20, 2006
    #16
  17. Lonifasiko

    Simon Brooke Guest

    in message <meUZg.1505$rS.170@fed1read05>, jmcgill wrote:

    > Simon Brooke wrote:
    >> Trying to schlurp a whole table into RAM is not a good idea at the best
    >> of times.

    >
    > Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
    > team.


    No. JDBC is not, in my opinion, a hugely well-engineered solution, but it
    does have the benefit of being very simple and very general. And, to be
    fair, it really ought to be possible to build a ResultSet conforming to
    the API which actually fetched rows a few at a time, although you would
    need specific server-side support for this.

    I seem to remember the Oracle JDBC drivers may actually do this, but it's
    too long since I was seriously using Oracle in anger.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    'graveyards are full of indispensable people'
     
    Simon Brooke, Oct 20, 2006
    #17
  18. Lonifasiko

    Guest

    Hi there,

    It would be great if you could share the article.

    Thx.

    Xagyg wrote:
    > Lonifasiko wrote:
    > > Hi,
    > >
    > > I'm working with MySQL as the back-end and I must perform a heavy select
    > > against a 250-column table that could store approximately 1 million
    > > records.
    > >
    > > What's the best way to do this?

    >
    > <stuff deleted>
    >
    > "Distributed result set iterator: a design pattern for efficient
    > retrieval of large result sets from remote data sources"
    >
    > See http://dx.doi.org/10.1016/j.jpdc.2004.01.001
    >
    > If you can't get access to this article, I'll post a version on a
    > website in a day or 2. Let me know.
     
    , Oct 20, 2006
    #18
  19. Lonifasiko

    Xagyg Guest

    > From the abstract, that looks like a very interesting article, which
    > could solve some problems I have. If you have the right to post copies,
    > I'd appreciate a look at it, too.


    http://www.longbrothers.net/brad/

    You want the linked papers (#4 and its companion, the one at the very
    bottom of the page).
     
    Xagyg, Oct 20, 2006
    #19
  20. Lonifasiko

    Simon Brooke Guest

    Xagyg wrote:

    >> From the abstract, that looks like a very interesting article, which
    >> could solve some problems I have. If you have the right to post
    >> copies, I'd appreciate a look at it, too.

    >
    > http://www.longbrothers.net/brad/
    >
    > You want the linked papers (#4 and it's companion, the one at the very
    > bottom of the page).


    Thank you very much.

    --
    (Simon Brooke) http://www.jasmine.org.uk/~simon/

    [ This .sig subject to change without notice ]
     
    Simon Brooke, Oct 20, 2006
    #20
