Best way to query an enormous database table


Lonifasiko

Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Thanks very much in advance.
 

AW på ZRX

ResultSet rs = stmt.executeQuery("SELECT ....");
However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Are you trying to read all records into the Java program?
It is better to let the database do the selection based on a WHERE clause,
so that you only get the records needed.
The size of the table should not be a problem.
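
For example, a minimal sketch of pushing the filter down to the database
with a PreparedStatement (conn is an open java.sql.Connection; the table,
column names and date value are made up for illustration):

// Hypothetical table/column names -- fetch only the rows and columns you need
PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM big_table WHERE created_date >= ?");
ps.setDate(1, java.sql.Date.valueOf("2006-01-01"));
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // process one row at a time here
}
rs.close();
ps.close();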

Also note that a common newbie error is to use Strings instead of
StringBuffers if you are doing a lot of String processing on the data.
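
A quick illustration of the difference (fetchRows() is just a hypothetical
stand-in for wherever the many small strings come from):

String[] rowData = fetchRows();   // hypothetical source of many strings

// Concatenating in a loop copies the whole string on every pass (quadratic work):
String s = "";
for (int i = 0; i < rowData.length; i++) {
    s += rowData[i];
}

// A StringBuffer appends in place, so the same loop stays roughly linear:
StringBuffer sb = new StringBuffer();
for (int i = 0; i < rowData.length; i++) {
    sb.append(rowData[i]);
}
String result = sb.toString();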

Maybe a bit more code could help....

Allan
 

Thomas Kellerer

Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

If it's throwing the OOME during executeQuery() you should check the docs
for the MySQL driver on how to disable the caching. I assume that the
driver tries to load all rows into memory.

If this happens during your processing of the rows, then the answer is
simple: Don't keep all rows in memory :)
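
Something along these lines, assuming each row can be handled on its own
(stmt is a plain Statement; processRow() and the column names are hypothetical):

// Don't copy every row into a List and work on the List afterwards;
// handle each row inside the loop and let it be garbage collected.
ResultSet rs = stmt.executeQuery("SELECT id, amount FROM orders WHERE ...");
while (rs.next()) {
    processRow(rs.getLong("id"), rs.getBigDecimal("amount"));
}
rs.close();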

Thomas
 

Manish Pandit

If you are selecting the rows to display them on some sort of a UI, try
to use pagination and limit the set you pick up from the database.
Refer to the MySQL manual for the 'limit' clause. This will help you restrict
the number of rows returned. You can change the limit values for every
page and select a manageable number of records rather than the entire
table.
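
Roughly like this, where pageSize is whatever the UI can comfortably show
(stmt is an ordinary java.sql.Statement; the table/column names are made up):

// MySQL's syntax is LIMIT <offset>, <row count>
int pageSize = 500;
for (int offset = 0; ; offset += pageSize) {
    ResultSet rs = stmt.executeQuery(
        "SELECT id, name FROM big_table ORDER BY id LIMIT " + offset + ", " + pageSize);
    int rows = 0;
    while (rs.next()) {
        rows++;
        // render this row on the current page
    }
    rs.close();
    if (rows < pageSize) {
        break;   // last (possibly empty) page reached
    }
}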

-cheers,
Manish
 

javajoker

I agree with the other posters: turn off caching, make more
selective queries (no full table scans), and use pagination with query
limits if applicable. You should also think about normalizing that
table. Maybe it would be easier to troubleshoot if you gave an
example of how you planned to use the data after you retrieved it. In
most cases you shouldn't need every record stored in memory.

--Nick
http://www.lla50.com
 

Lonifasiko

Hi, and thanks for both replies.

In fact, I'm going to filter data from the database using a WHERE clause.
It was just an example ;-)

I know it's not a good idea to load all rows into memory. I'll look for
an alternative way to do it.

I'll also take a look at the MySQL driver properties for the caching
issue you mention.

Thanks very much.
 

Simon Brooke

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

I would get someone who knows what they're doing to design the database
schema first. Then I would use a proper heavyweight database engine (such
as Postgres). Then I would look up the SQL keywords 'LIMIT' and 'COUNT'.

Trying to schlurp a whole table into RAM is not a good idea at the best of
times.
 

steve

Hi, and thanks for both replies.

In fact, I'm going to filter data from the database using a WHERE clause.
It was just an example ;-)

I know it's not a good idea to load all rows into memory. I'll look for
an alternative way to do it.

I'll also take a look at the MySQL driver properties for the caching
issue you mention.

Thanks very much.



Listen to any of these noobies and you will get yourself into trouble.

Caching, pagination, limiting queries, "normalization of data"... WTF!!!

1. Bring only the columns you absolutely need into the client.
2. Start by limiting your query using the WHERE clause.
3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
This is part of the ODBC standard.

This gives you a "window" on your RECORD selection.

You set up:
how many records you want to see in the "window",
whether you want to scroll forward/backward or both,
whether the resultset is updatable,
and whether the updatable resultset is visible to other users.



Ultimately you do ONE query, then let the ODBC driver do the hard work;
using "count" and other shitty solutions means you need to issue multiple
queries.


Steve
 

jmcgill

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

Those are just numbers. They don't necessarily correlate to the
complexity of the execution of the query, or the size of the result set.
It would be more meaningful if you showed your table definition, and
the output of "EXPLAIN <yourquery>;"

Indexes can make a dramatic difference in the work a query must do.
Are your tables indexed in a way that is meaningful to your "heavy" select?
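
For instance, something like this (stmt is a plain Statement; the query is
made up) will pull the plan back over JDBC, since MySQL returns EXPLAIN
output as an ordinary result set:

ResultSet plan = stmt.executeQuery(
        "EXPLAIN SELECT id, name FROM big_table WHERE created_date >= '2006-01-01'");
while (plan.next()) {
    // 'key' shows which index (if any) the optimizer picked,
    // 'rows' is its estimate of how many rows it must examine
    System.out.println(plan.getString("table") + " type=" + plan.getString("type")
            + " key=" + plan.getString("key") + " rows=" + plan.getString("rows"));
}
plan.close();
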
What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");
However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

Of course it does. You need to query on a cursor, which may not be
possible on the MySQL JDBC driver.

You can keep track of how many trips you have made, and use LIMIT and
OFFSET in your queries, and then have precise control of the resultset
chunks. This has concurrency problems if you cannot lock while in process.

stmt.setFetchSize(##something reasonable##) might help.

This is from the manual
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

It will lock the connection until you are done, which might be good or
bad depending on your situation.

The bottom line is that the ResultSet default behavior is to suck the
entire result into a Vector. I think it is horrible that it is a path
to a runtime exception, personally.
 

jmcgill

steve said:
Listen to any of these noobies and you will get yourself into trouble.

Caching, pagination, limiting queries, "normalization of data"... WTF!!!

1. Bring only the columns you absolutely need into the client.
2. Start by limiting your query using the WHERE clause.
3. USE A SCROLLABLE RESULTSET / UPDATABLE SCROLLABLE RESULTSET.
This is part of the ODBC standard.

ODBC, but not JDBC. A JDBC query will (try to) suck the entire row set
into memory. There are some workarounds. See my prior post.
This gives you a "window" on your RECORD selection.

You set up:
how many records you want to see in the "window",
whether you want to scroll forward/backward or both,
whether the resultset is updatable,
and whether the updatable resultset is visible to other users.

If you know a way to do this via JDBC and the ConnectorJ with MySQL,
please give an example.
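
For what it's worth, the standard JDBC API for a scrollable/updatable result
set looks roughly like this; whether Connector/J keeps only a window of rows
in memory, or quietly buffers the whole result client-side, is a separate
question (conn is an open Connection; the table/column names are made up):

// Scrollable + updatable result set via plain JDBC (actual driver behaviour varies)
Statement stmt = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE,   // scroll forward and backward
        ResultSet.CONCUR_UPDATABLE);         // allow updateRow() on the current row
stmt.setFetchSize(100);                      // hint: fetch about 100 rows per round trip
ResultSet rs = stmt.executeQuery("SELECT id, name FROM big_table");

rs.absolute(1000);                           // jump straight to row 1000
String name = rs.getString("name");
rs.relative(-10);                            // scroll back ten rows
rs.updateString("name", "changed");          // modify the current row...
rs.updateRow();                              // ...and write the change back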
 

jmcgill

Simon said:
Trying to schlurp a whole table into RAM is not a good idea at the best of
times.

Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
team.
 

jmcgill

javajoker said:
I agree with the other posters: turn off caching, make more
selective queries (no full table scans), and use pagination with query
limits if applicable. You should also think about normalizing that
table. Maybe it would be easier to troubleshoot if you gave an
example of how you planned to use the data after you retrieved it. In
most cases you shouldn't need every record stored in memory.

Your answer does not really address the basic problem: The JDBC
ResultSet, at least with ConnectorJ and MySQL, by default stores the
entire rowset into a Vector. You can use setFetchSize(), but even that is
"merely a hint" as to how much to query in each pass, and the end result
is still to store the entire rowset in a Vector. This is true no matter
what form of caching is used, no matter how the tables are indexed, no
matter how normalized the table(s) in the join happen to be.

The client must paginate explicitly, e.g., with COUNT and LIMIT. Unless
he locks every table in the join, there's a concurrency problem: later
queries see a different state than earlier ones, so there's a chance of
duplicates or omissions. In other words, there is always the chance that
the resulting aggregation of queries _is not a partition_ of the
original full query.
 

Charles Hottel

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this? I'm a newbie to the Java language and I've
started with:

ResultSet rs = stmt.executeQuery("SELECT ....");

However, the JVM is throwing an OutOfMemoryError while querying the
table, which today has "only" 18,000 records.

I would really appreciate any ideas.

Thanks very much in advance.

Besides limiting it via the WHERE clause, see also
CallableStatement.setMaxRows or PreparedStatement.setMaxRows (both inherited
from Statement).
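
For example (conn is an open Connection; the cap of 1,000 rows and the query
itself are arbitrary):

// setMaxRows puts a hard cap on how many rows the driver will hand back for
// this statement; anything beyond the limit is silently dropped.
PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM big_table WHERE status = ?");  // hypothetical query
ps.setMaxRows(1000);
ps.setString(1, "OPEN");
ResultSet rs = ps.executeQuery();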
 

Xagyg

Lonifasiko said:
Hi,

I'm working with MySQL as the back-end and I must perform a heavy select
against a 250-column table that could store approximately 1 million
records.

What's the best way to do this?

<stuff deleted>

"Distributed result set iterator: a design pattern for efficient
retrieval of large result sets from remote data sources"

See http://dx.doi.org/10.1016/j.jpdc.2004.01.001

If you can't get access to this article, I'll post a version on a
website in a day or 2. Let me know.
 

Simon Brooke

Xagyg said:
<stuff deleted>

"Distributed result set iterator: a design pattern for efficient
retrieval of large result sets from remote data sources"

See http://dx.doi.org/10.1016/j.jpdc.2004.01.001

If you can't get access to this article, I'll post a version on a
website in a day or 2. Let me know.

From the abstract, that looks a very interesting article, which could solve
some problems I have. If you have the right to post copies, I'd appreciate
a look at it, too.
 

Simon Brooke

jmcgill said:
Nobody seems to have told this to the JDBC team, or MySQL's ConnectorJ
team.

No. JDBC is not, in my opinion, a hugely well engineered solution, but it
does have the benefit that it is a very simple and very general solution.
And, to be fair, it really ought to be possible to build a ResultSet
conforming to the API which actually fetched rows a few at a time,
although you would need specific server-side support for this.

I seem to remember the Oracle JDBC drivers may actually do this, but it's
too long since I was seriously using Oracle in anger.
 
