Java heap and big database queries

P

Patricia Shanahan

I'm using Java to query a MySQL database through com.mysql.jdbc.Driver.

My Java program calculates a relatively small summary of the query
result, and I'm sure my Java structure would fit in the default heap memory.

However, I run out of heap memory while running with -Xmx700m.

The program runs perfectly on a million line subset of the database, but
fails on the full database, over 88 million lines.

My query is:

String getData = "select count(*), "
+ "sum(net_amt), "
+ "exists(select * from stores where stores.store_id =
transactions.store_id) "
+ "from transactions group by txn_id;";

....

Statement st = con.createStatement();
ResultSet result = st.executeQuery(getData);

There are about 10,160,053 distinct values of txn_id.

Any ideas for what I may be doing wrong? Suggestions for how to obtain
and scan a large query result without Java heap problems?

Thanks,

Patricia
 
P

Patricia Shanahan

Chris said:
See this page:

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

and scan down about 2/3 to a sub-section titled "ResultSet".

Thanks. That certainly matches the symptoms I'm seeing.
(Made my hair stand on end... There is no way, just /no way/, I am ever going
to choose MySQL for /anything/.)

I picked MySQL because I had worked on a project that used it, but
for far smaller tables, and the DB comparison web pages I found seemed
to indicate it was reasonable performance for large tables.

I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

Patricia
 
P

Paul Tomblin

In a previous article said:
I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

Postgresql implemented more of the SQL standard (although MySQL has caught
up in the last couple of releases as long as you use InnoDB) and is really
free, even for commercial use
 
C

Chris Smith

Patricia Shanahan said:
I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

I second the recommendation for PostgreSQL. I have used it extensively
from Java applications with large amounts of data, and have run into few
problems. I did run into a quirk in the 7.x release cycle, but I was
able to easily patch the JDBC driver, and the problem is completely
solved with the JDBC drivers for the 8.x versions.

PostgreSQL still does not implement all of JDBC perfectly. In
particular, PostgreSQL does fall back to pre-fetching more complex
result sets (e.g., scrollable result sets). A full list of limitations
is available at http://jdbc.postgresql.org/todo.html

That said, I am not aware of any low-cost database that does better than
PostgreSQL. It is far better than anything else I've tried.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Patricia said:
I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

MySQL, PostgreSQL and FireBird are the usual suspects for a
free database server.

Arne
 
D

David Segall

Patricia Shanahan said:
Thanks. That certainly matches the symptoms I'm seeing.


I picked MySQL because I had worked on a project that used it, but
for far smaller tables, and the DB comparison web pages I found seemed
to indicate it was reasonable performance for large tables.

I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.
I can't compare them but I have what I believe to be a complete list
of freely distributable "heavy duty" databases at
<http://database.profectus.com.au/>. Of these, MySql has the most
restrictive license and that includes IBM, Oracle and Microsoft.
 
M

Mike Schilling

I'm very open to suggestions for better choices, especially after this
experience. I need free or cheap for academic and personal use. No need
for commercial use.

Piggybacking in Patricia's question, I'm curious if anyone has experience
with the newly open-sourced Ingres. (The commerical version, not University
Ingres.)
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

David said:
I can't compare them but I have what I believe to be a complete list
of freely distributable "heavy duty" databases at
<http://database.profectus.com.au/>. Of these, MySql has the most
restrictive license and that includes IBM, Oracle and Microsoft.

The text for MySQL is rather imprecise.

You can use MySQL all you want with closed source.

You can use the MySQL connectors all you want with closed source.

As long as you do not distribute.

If you distribute your stuff and it is "linked" with MySQL
code then then you need to buy a commercial license or
make your stuff GPL (or at least FLOSS when it is MySQL).

How to define "linked" can be a bit fuzzy.

But it is not something special for MySQL - this is
how GPL works.

It is rather unusual to use GPL for a library - often
LGPL (or an Apache/BSD style license) is chosen.

The true GPL believers will not consider that a problem.

But there are a reason that Apache/BSD sometimes is called
business friendly open source license.

Arne
 
D

David Segall

Arne Vajhøj said:
If you distribute your stuff and it is "linked" with MySQL
code then then you need to buy a commercial license or
make your stuff GPL (or at least FLOSS when it is MySQL).

How to define "linked" can be a bit fuzzy.

But it is not something special for MySQL - this is
how GPL works.
I am not qualified to interpret the GPL license but MySQL AB's
interpretation contrasts with the thousands of commercial applications
that are "linked" to Linux and dozens of other GPLd programs. Since,
as far as I am aware, these problems don't arise with the other freely
distributable databases I list at <http://database.profectus.com.au>,
including those from Oracle and Microsoft, it seems foolish to choose
MySQL.
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

David said:
I am not qualified to interpret the GPL license but MySQL AB's
interpretation contrasts with the thousands of commercial applications
that are "linked" to Linux and dozens of other GPLd programs. Since,
as far as I am aware, these problems don't arise with the other freely
distributable databases I list at <http://database.profectus.com.au>,
including those from Oracle and Microsoft, it seems foolish to choose
MySQL.

PostgreSQL uses BSD license.

Firebird uses a derivative of MPL.

Both are "business friendly".

Linux is not a problem for two reasons:
- applications usually does not interact directly
with Linux but uses GLIBC which is LGPL
- Linus Torvalds has a definition of linking that
is a bit narrower than MySQL's

No huge contrast - just a small difference.

If your page only is for those that produce closed
source to distribute to others, then you are correct,
but should probably state so on the page so other
people does not waste their time reading your advice.

Arne
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Arne said:
If your page only is for those that produce closed
source to distribute to others, then you are correct,
but should probably state so on the page so other
people does not waste their time reading your advice.

Well actually you do so.

So the above is not correct.

But I still do not like it - all the relevant
keywords: GPL, linking and distribution is missing.

Arne
 
C

Chris Smith

Arne Vajhøj said:
The text for MySQL is rather imprecise.

You can use MySQL all you want with closed source.

You can use the MySQL connectors all you want with closed source.

As long as you do not distribute.

See http://www.mysql.com/company/legal/licensing/commercial-license.html
for information, though, on MySQL's definition of "distribute". It may
surprise you. If you write them for clarification, their answers may do
even more to surprise you. MySQL feels that any time that anyone might
possibly use your application with MySQL, you should pay them money.
They call this distribution, despite the obviously contrary common
meaning of that word. I was told, at one point, that they believe this
even if all I do is write an application to conform to a standard API
such as JDBC. In their view, if you write a portable database
application, and later someone else installs MySQL and uses your
application with it, you owe them money. Apparently, they feel that
JDBC, ODBC, and other APIs are all incorporated into their product
because they provide an implementation of the APIs.

Could you fight them on this and win in court? I don't know. Most
likely, it depends on how much money you have, among other factors.
Ultimately, though, you're ethically bound to respect their decisions
with regard to the distribution of their software. You are also well-
advised from a practical perspective not to deal with anyone who is so
deluded and hires lawyers on a regular basis.

I just wish they would stop their campaign of misinformation about the
GPL license. Everything else they do is easy to avoid quite neatly.
PostgreSQL works fine, so use it.
 
D

David Segall

Arne Vajhøj said:
Well actually you do so.

So the above is not correct.

But I still do not like it - all the relevant
keywords: GPL, linking and distribution is missing.

Arne
 
D

David Segall

I cannot accept "only" in that sentence. My intention was that
_everyone_ could use the software I listed in any way they chose.
Well actually you do so.

So the above is not correct.

But I still do not like it - all the relevant
keywords: GPL, linking and distribution is missing.

Arne, I want you to like it! If I can achieve this by adding some
brief, relevant information to the web page I would be pleased to do
so and would appreciate it if you would post the information here or
email it to david at segall dot net. I am reluctant to include any
detailed licensing information because each of the products listed
comes with several pages of legal restrictions that I do not wish to
maintain.

I added the licensing qualifications to the MySQL entry in my list
because I was concerned by a previous post from Chris Smith that
contained the information that he has posted in this thread. Because
MySQL is so popular and many people, encouraged by MySQL AB, consider
it to be freely distributable I chose to qualify the entry in my list
rather than simply delete it.
 
L

Lew

David said:
Because
MySQL is so popular and many people, encouraged by MySQL AB, consider
it to be freely distributable I chose to qualify the entry in my list
rather than simply delete it.

MySQL has two licenses, one of which is the GPL. It is no more nor less
restricted than other GPLed code.

What people "consider" does not determine the license. The license terms
published by MySQL AB determine the license. You could go straight to them for
the definitive information instead of relying on rumor.

- Lew
 
?

=?ISO-8859-1?Q?Arne_Vajh=F8j?=

Lew said:
MySQL has two licenses, one of which is the GPL. It is no more nor less
restricted than other GPLed code.

What people "consider" does not determine the license. The license terms
published by MySQL AB determine the license. You could go straight to
them for the definitive information instead of relying on rumor.

Which is not true.

MySQL choose the GPL license.

It is not up to them to interpret the GPL license.

FSF must be the authoritative source on how the GPL
license should be interpreted.

Especially since MySQL has an economic interest in
FUD'ing a bit.

Arne
 
?

=?ISO-8859-15?Q?Arne_Vajh=F8j?=

Chris said:
See http://www.mysql.com/company/legal/licensing/commercial-license.html
for information, though, on MySQL's definition of "distribute". It may
surprise you. If you write them for clarification, their answers may do
even more to surprise you. MySQL feels that any time that anyone might
possibly use your application with MySQL, you should pay them money.
They call this distribution, despite the obviously contrary common
meaning of that word. I was told, at one point, that they believe this
even if all I do is write an application to conform to a standard API
such as JDBC. In their view, if you write a portable database
application, and later someone else installs MySQL and uses your
application with it, you owe them money. Apparently, they feel that
JDBC, ODBC, and other APIs are all incorporated into their product
because they provide an implementation of the APIs.

Could you fight them on this and win in court? I don't know. Most
likely, it depends on how much money you have, among other factors.
Ultimately, though, you're ethically bound to respect their decisions
with regard to the distribution of their software.

No.

They decided to use the GPL license.

We need to respect the GPL license.

I think it will be considered extremely unethical by the GPL
software community to allow "extra conditions" on GPL'ed code.

If they want a MySQL Open Source License with a twist
compared to standard GPL, then they should make one.

Arne
 
C

Chris Smith

Lew said:
MySQL has two licenses, one of which is the GPL. It is no more nor less
restricted than other GPLed code.

In a legal sense, that's true. I absolutely agree that the GPL doesn't
say what MySQL says that it does. Their interpretation is hard to
believe in terms of the text of the agreement (IANAL), and common sense
tells us they are completely loony.

Nevertheless, do you want to defend yourself in court if they sue you?
That's my main concern. There are a lot of crazy people with whom I
avoid any kind of relationship. Even if they can't legally do anything
to hurt me, they can cause quite a few problems along the way.

(My other concern is that their web page misleads people on what the
GPL, in general, does. That seems to be rather widespread, though. The
whole free software world often seems to subscribe to the "wishful
thinking" approach to interpreting legal agreements.)
 
D

David Segall

Lew said:
What people "consider" does not determine the license. The license terms
published by MySQL AB determine the license. You could go straight to them for
the definitive information instead of relying on rumor.
Why bother? Chris Smith has pointed out that MySQL AB are deliberately
vague about their license and Arne Vajhøj has put forward a good case
for interpreting their license as a "legally defined" GPL licence.

I have not seen _any_ argument for preferring MySQL over the other
databases I listed at http://database.profectus.com.au. Why risk a
legal battle when you have several alternatives that do not attempt to
restrict how you use the database?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top