Best way to query an enormous database table


Lonifasiko

Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Thanks very much in advance.
 

AW på ZRX

ResultSet rs = stmt.executeQuery("SELECT ....");
However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Are you trying to read all records into the Java program?
It is better to let the database do the selection based on a WHERE clause,
so that you only get the records needed.
The size of the table should not be a problem.
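
For example, a minimal sketch of pushing the filter down to the database
with a PreparedStatement (conn is an open java.sql.Connection; the table,
column names and date value are made up for illustration):

// Hypothetical table/column names -- fetch only the rows and columns you need
PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM big_table WHERE created_date >= ?");
ps.setDate(1, java.sql.Date.valueOf("2006-01-01"));
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // process one row at a time here
}
rs.close();
ps.close();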

Also note that a common newbie error is to use Strings instead of
StringBuffers if you are doing a lot of String processing on the data.
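
A quick illustration of the difference (fetchRows() is just a hypothetical
stand-in for wherever the many small strings come from):

String[] rowData = fetchRows();   // hypothetical source of many strings

// Concatenating in a loop copies the whole string on every pass (quadratic work):
String s = "";
for (int i = 0; i < rowData.length; i++) {
    s += rowData[i];
}

// A StringBuffer appends in place, so the same loop stays roughly linear:
StringBuffer sb = new StringBuffer();
for (int i = 0; i < rowData.length; i++) {
    sb.append(rowData[i]);
}
String result = sb.toString();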

Maybe a bit more code could help....

Allan
 

Thomas Kellerer

Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

If it's throwing the OOME during executeQuery() you should check the docs
for the MySQL driver on how to disable the caching. I assume that the
driver tries to load all rows into memory.

If this happens during your processing of the rows, then the answer is
simple: Don't keep all rows in memory :)
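
Something along these lines, assuming each row can be handled on its own
(stmt is a plain Statement; processRow() and the column names are hypothetical):

// Don't copy every row into a List and work on the List afterwards;
// handle each row inside the loop and let it be garbage collected.
ResultSet rs = stmt.executeQuery("SELECT id, amount FROM orders WHERE ...");
while (rs.next()) {
    processRow(rs.getLong("id"), rs.getBigDecimal("amount"));
}
rs.close();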

Thomas
 

Manish Pandit

If you are selecting the rows to display them on some sort of a UI, try
to use pagination and limit the set you pick up from the database.
Refer to the MySQL manual for the 'limit' clause. This will help you restrict
the number of rows returned. You can change the limit values for every
page and select a manageable number of records rather than the entire
table.
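
Roughly like this, where pageSize is whatever the UI can comfortably show
(stmt is an ordinary java.sql.Statement; the table/column names are made up):

// MySQL's syntax is LIMIT <offset>, <row count>
int pageSize = 500;
for (int offset = 0; ; offset += pageSize) {
    ResultSet rs = stmt.executeQuery(
        "SELECT id, name FROM big_table ORDER BY id LIMIT " + offset + ", " + pageSize);
    int rows = 0;
    while (rs.next()) {
        rows++;
        // render this row on the current page
    }
    rs.close();
    if (rows < pageSize) {
        break;   // last (possibly empty) page reached
    }
}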

-cheers,
Manish
 

javajoker

I agree with the other posters: turn off caching, make more
selective queries (no full table scans), and use pagination with query
limits if applicable. You should also think about normalizing that
table. Maybe it would be easier to troubleshoot if you gave an
example of how you planned to use the data after you retrieved it. In
most cases you shouldn't need every record stored in memory.

--Nick
http://www.lla50.com
 

Lonifasiko

Hi, and thanks for both replies.

In fact, I'm going to filter data from the database using a WHERE clause.
It was just an example ;-)

I know it's not a good idea to load all rows into memory. I'll look for
an alternative way to do it.

I'll also take a look at the MySQL driver properties for the caching
issue you mention.

Thanks very much.
 

Simon Brooke

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

I would get someone who knows what they're doing to design the database
schema first. Then I would use a proper heavyweight database engine (such
as Postgres). Then I would look up the SQL keywords 'LIMIT' and 'COUNT'.

Trying to schlurp a whole table into RAM is not a good idea at the best of
times.
 

steve

Hi, and thanks for both replies.

In fact, I'm going to filter data from the database using a WHERE clause.
It was just an example ;-)

I know it's not a good idea to load all rows into memory. I'll look for
an alternative way to do it.

I'll also take a look at the MySQL driver properties for the caching
issue you mention.

Thanks very much.



Listen to any of these noobies and you will get yourself into trouble.

Caching, pagination, limiting queries, "normalization of data"... WTF!!!

1. Bring only the columns you absolutely need into the client.
2. Start by limiting your query using the WHERE clause.
3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
This is part of the ODBC standard.

This gives you a "window" on your RECORD selection.

You set up:
how many records you want to see in the "window",
whether you want to scroll forward/backward or both,
whether the resultset is updatable,
and whether the updatable resultset is visible to other users.



Ultimately you do ONE query, then let the ODBC driver do the hard work;
using "count" and other shitty solutions means you need to issue multiple
queries.


Steve
 

jmcgill

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

Those are just numbers. They don't necessarily correlate to the
complexity of the execution of the query, or the size of the result set.
It would be more meaningful if you showed your table definition, and
the output of "EXPLAIN <yourquery>;"

Indexes can make a dramatic difference in the work a query must do.
Are your tables indexed in a way that is meaningful to your "heavy" select?
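
For instance, something like this (stmt is a plain Statement; the query is
made up) will pull the plan back over JDBC, since MySQL returns EXPLAIN
output as an ordinary result set:

ResultSet plan = stmt.executeQuery(
        "EXPLAIN SELECT id, name FROM big_table WHERE created_date >= '2006-01-01'");
while (plan.next()) {
    // 'key' shows which index (if any) the optimizer picked,
    // 'rows' is its estimate of how many rows it must examine
    System.out.println(plan.getString("table") + " type=" + plan.getString("type")
            + " key=" + plan.getString("key") + " rows=" + plan.getString("rows"));
}
plan.close();
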
What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");
However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

Of course it does. You need to query on a cursor, which may not be
possible on the MySQL JDBC driver.

You can keep track of how many trips you have made, and use LIMIT and
OFFSET in your queries, and then have precise control of the resultset
chunks. This has concurrency problems if you cannot lock while in process.

stmt.setFetchSize(##something reasonable##) might help.

This is from the manual
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

It will lock the connection until you are done, which might be good or
bad depending on your situation.

The bottom line is that the ResultSet default behavior is to suck the
entire result into a Vector. I think it is horrible that it is a path
to a runtime exception, personally.
 

jmcgill

steve said:
Listen to any of these noobies and you will get yourself into trouble.

Caching, pagination, limiting queries, "normalization of data"... WTF!!!

1. Bring only the columns you absolutely need into the client.
2. Start by limiting your query using the WHERE clause.
3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
This is part of the ODBC standard.

ODBC, but not JDBC. A JDBC query will (try to) suck the entire row set
into memory. There are some workarounds. See my prior post.
This gives you a "window" on your RECORD selection.

You set up:
how many records you want to see in the "window",
whether you want to scroll forward/backward or both,
whether the resultset is updatable,
and whether the updatable resultset is visible to other users.

If you know a way to do this via JDBC and the ConnectorJ with MySQL,
please give an example.
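
For what it's worth, the standard JDBC API for a scrollable/updatable result
set looks roughly like this; whether Connector/J keeps only a window of rows
in memory, or quietly buffers the whole result client-side, is a separate
question (conn is an open Connection; the table/column names are made up):

// Scrollable + updatable result set via plain JDBC (actual driver behaviour varies)
Statement stmt = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE,   // scroll forward and backward
        ResultSet.CONCUR_UPDATABLE);         // allow updateRow() on the current row
stmt.setFetchSize(100);                      // hint: fetch about 100 rows per round trip
ResultSet rs = stmt.executeQuery("SELECT id, name FROM big_table");

rs.absolute(1000);                           // jump straight to row 1000
String name = rs.getString("name");
rs.relative(-10);                            // scroll back ten rows
rs.updateString("name", "changed");          // modify the current row...
rs.updateRow();                              // ...and write the change back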
 

jmcgill

Simon said:
Trying to schlurp a whole table into RAM is not a good idea at the best of
times.

Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
team.
 

jmcgill

javajoker said:
I agree with the other posters: turn off caching, make more
selective queries (no full table scans), and use pagination with query
limits if applicable. You should also think about normalizing that
table. Maybe it would be easier to troubleshoot if you gave an
example of how you planned to use the data after you retrieved it. In
most cases you shouldn't need every record stored in memory.

Your answer does not really address the basic problem: The JDBC
ResultSet, at least with ConnectorJ and MySQL, by default stores the
entire rowset into a Vector. You can use setFetchSize(), but even that is
"merely a hint" as to how much to query in each pass, and the end result
is still to store the entire rowset in a Vector. This is true no matter
what form of caching is used, no matter how the tables are indexed, no
matter how normalized the table(s) in the join happen to be.

The client must paginate explicitly, e.g., with COUNT and LIMIT. Unless
he locks every table in the join, there's a concurrency problem: later
queries see a different state than earlier ones, so there's a chance of
duplicates or omissions. In other words, there is always the chance that
the resulting aggregation of queries _is not a partition_ of the
original full query.
 

Charles Hottel

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Thanks very much in advance.

Besides limiting it via the WHERE clause, see also
CallableStatement.setMaxRows or PreparedStatement.setMaxRows (both inherited
from Statement).
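
For example (conn is an open Connection; the cap of 1,000 rows and the query
itself are arbitrary):

// setMaxRows puts a hard cap on how many rows the driver will hand back for
// this statement; anything beyond the limit is silently dropped.
PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM big_table WHERE status = ?");  // hypothetical query
ps.setMaxRows(1000);
ps.setString(1, "OPEN");
ResultSet rs = ps.executeQuery();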
 

Xagyg

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this?

<stuff deleted>

"Distributed result set iterator: a design pattern for efficient
retrieval of large result sets from remote data sources"

See http://dx.doi.org/10.1016/j.jpdc.2004.01.001

If you can't get access to this article, I'll post a version on a
website in a day or 2. Let me know.
 

Simon Brooke

Xagyg said:
<stuff deleted>

"Distributed result set iterator: a design pattern for efficient
retrieval of large result sets from remote data sources"

See http://dx.doi.org/10.1016/j.jpdc.2004.01.001

If you can't get access to this article, I'll post a version on a
website in a day or 2. Let me know.

From the abstract, that looks a very interesting article, which could solve
some problems I have. If you have the right to post copies, I'd appreciate
a look at it, too.
 

Simon Brooke

jmcgill said:
Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
team.

No. JDBC is not, in my opinion, a hugely well engineered solution, but it
does have the benefit that it is a very simple and very general solution.
And, to be fair, it really ought to be possible to build a ResultSet
conforming to the API which actually fetched rows a few at a time,
although you would need specific server-side support for this.

I seem to remember the Oracle JDBC drivers may actually do this, but it's
too long since I was seriously using Oracle in anger.
 
