Java heap and big database queries

Discussion in 'Java' started by Patricia Shanahan, Feb 10, 2007.

  1. I'm using Java to query a MySQL database through com.mysql.jdbc.Driver.

    My Java program calculates a relatively small summary of the query
    result, and I'm sure my Java structure would fit in the default heap memory.

    However, I run out of heap memory while running with -Xmx700m.

    The program runs perfectly on a million line subset of the database, but
    fails on the full database, over 88 million lines.

    My query is:

    String getData = "select count(*), "
    + "sum(net_amt), "
    + "exists(select * from stores where stores.store_id =
    transactions.store_id) "
    + "from transactions group by txn_id;";

    ....

    Statement st = con.createStatement();
    ResultSet result = st.executeQuery(getData);

    There are about 10,160,053 distinct values of txn_id.

    Any ideas for what I may be doing wrong? Suggestions for how to obtain
    and scan a large query result without Java heap problems?

    Thanks,

    Patricia
     
    Patricia Shanahan, Feb 10, 2007
    #1
    1. Advertising

  2. Patricia Shanahan

    Chris Uppal Guest

    Patricia Shanahan wrote:

    > The program runs perfectly on a million line subset of the database, but
    > fails on the full database, over 88 million lines.


    See this page:

    http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

    and scan down about 2/3 to a sub-section titled "ResultSet".

    (Made my hair stand on end... There is no way, just /no way/, I am ever going
    to choose MySQL for /anything/.)

    -- chris
     
    Chris Uppal, Feb 10, 2007
    #2
    1. Advertising

  3. Chris Uppal wrote:
    > Patricia Shanahan wrote:
    >
    >> The program runs perfectly on a million line subset of the database, but
    >> fails on the full database, over 88 million lines.

    >
    > See this page:
    >
    > http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
    >
    > and scan down about 2/3 to a sub-section titled "ResultSet".


    Thanks. That certainly matches the symptoms I'm seeing.

    > (Made my hair stand on end... There is no way, just /no way/, I am ever going
    > to choose MySQL for /anything/.)


    I picked MySQL because I had worked on a project that used it, but
    for far smaller tables, and the DB comparison web pages I found seemed
    to indicate it was reasonable performance for large tables.

    I'm very open to suggestions for better choices, especially after this
    experience. I need free or cheap for academic and personal use. No need
    for commercial use.

    Patricia
     
    Patricia Shanahan, Feb 10, 2007
    #3
  4. Patricia Shanahan

    Paul Tomblin Guest

    In a previous article, Patricia Shanahan <> said:
    >I'm very open to suggestions for better choices, especially after this
    >experience. I need free or cheap for academic and personal use. No need
    >for commercial use.


    Postgresql implemented more of the SQL standard (although MySQL has caught
    up in the last couple of releases as long as you use InnoDB) and is really
    free, even for commercial use


    --
    Paul Tomblin <> http://blog.xcski.com/
    Last I checked, it wasn't the power cord for the Clue Generator
    that was sticking up your ass.
    -- John Novak
     
    Paul Tomblin, Feb 10, 2007
    #4
  5. Patricia Shanahan

    Chris Smith Guest

    Patricia Shanahan <> wrote:
    > I'm very open to suggestions for better choices, especially after this
    > experience. I need free or cheap for academic and personal use. No need
    > for commercial use.


    I second the recommendation for PostgreSQL. I have used it extensively
    from Java applications with large amounts of data, and have run into few
    problems. I did run into a quirk in the 7.x release cycle, but I was
    able to easily patch the JDBC driver, and the problem is completely
    solved with the JDBC drivers for the 8.x versions.

    PostgreSQL still does not implement all of JDBC perfectly. In
    particular, PostgreSQL does fall back to pre-fetching more complex
    result sets (e.g., scrollable result sets). A full list of limitations
    is available at http://jdbc.postgresql.org/todo.html

    That said, I am not aware of any low-cost database that does better than
    PostgreSQL. It is far better than anything else I've tried.

    --
    Chris Smith
     
    Chris Smith, Feb 10, 2007
    #5
  6. Patricia Shanahan wrote:
    > I'm very open to suggestions for better choices, especially after this
    > experience. I need free or cheap for academic and personal use. No need
    > for commercial use.


    MySQL, PostgreSQL and FireBird are the usual suspects for a
    free database server.

    Arne
     
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=, Feb 10, 2007
    #6
  7. Patricia Shanahan

    David Segall Guest

    Patricia Shanahan <> wrote:

    >Chris Uppal wrote:
    >> Patricia Shanahan wrote:
    >>
    >>> The program runs perfectly on a million line subset of the database, but
    >>> fails on the full database, over 88 million lines.

    >>
    >> See this page:
    >>
    >> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
    >>
    >> and scan down about 2/3 to a sub-section titled "ResultSet".

    >
    >Thanks. That certainly matches the symptoms I'm seeing.
    >
    >> (Made my hair stand on end... There is no way, just /no way/, I am ever going
    >> to choose MySQL for /anything/.)

    >
    >I picked MySQL because I had worked on a project that used it, but
    >for far smaller tables, and the DB comparison web pages I found seemed
    >to indicate it was reasonable performance for large tables.
    >
    >I'm very open to suggestions for better choices, especially after this
    >experience. I need free or cheap for academic and personal use. No need
    >for commercial use.

    I can't compare them but I have what I believe to be a complete list
    of freely distributable "heavy duty" databases at
    <http://database.profectus.com.au/>. Of these, MySql has the most
    restrictive license and that includes IBM, Oracle and Microsoft.
     
    David Segall, Feb 11, 2007
    #7
  8. "Patricia Shanahan" <> wrote in message
    news:eql7a4$ivp$...

    > I'm very open to suggestions for better choices, especially after this
    > experience. I need free or cheap for academic and personal use. No need
    > for commercial use.


    Piggybacking in Patricia's question, I'm curious if anyone has experience
    with the newly open-sourced Ingres. (The commerical version, not University
    Ingres.)
     
    Mike Schilling, Feb 11, 2007
    #8
  9. David Segall wrote:
    > I can't compare them but I have what I believe to be a complete list
    > of freely distributable "heavy duty" databases at
    > <http://database.profectus.com.au/>. Of these, MySql has the most
    > restrictive license and that includes IBM, Oracle and Microsoft.


    The text for MySQL is rather imprecise.

    You can use MySQL all you want with closed source.

    You can use the MySQL connectors all you want with closed source.

    As long as you do not distribute.

    If you distribute your stuff and it is "linked" with MySQL
    code then then you need to buy a commercial license or
    make your stuff GPL (or at least FLOSS when it is MySQL).

    How to define "linked" can be a bit fuzzy.

    But it is not something special for MySQL - this is
    how GPL works.

    It is rather unusual to use GPL for a library - often
    LGPL (or an Apache/BSD style license) is chosen.

    The true GPL believers will not consider that a problem.

    But there are a reason that Apache/BSD sometimes is called
    business friendly open source license.

    Arne
     
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=, Feb 12, 2007
    #9
  10. Patricia Shanahan

    David Segall Guest

    Arne Vajhøj <> wrote:

    >If you distribute your stuff and it is "linked" with MySQL
    >code then then you need to buy a commercial license or
    >make your stuff GPL (or at least FLOSS when it is MySQL).
    >
    >How to define "linked" can be a bit fuzzy.
    >
    >But it is not something special for MySQL - this is
    >how GPL works.

    I am not qualified to interpret the GPL license but MySQL AB's
    interpretation contrasts with the thousands of commercial applications
    that are "linked" to Linux and dozens of other GPLd programs. Since,
    as far as I am aware, these problems don't arise with the other freely
    distributable databases I list at <http://database.profectus.com.au>,
    including those from Oracle and Microsoft, it seems foolish to choose
    MySQL.
     
    David Segall, Feb 12, 2007
    #10
  11. David Segall wrote:
    > Arne Vajhøj <> wrote:
    >> If you distribute your stuff and it is "linked" with MySQL
    >> code then then you need to buy a commercial license or
    >> make your stuff GPL (or at least FLOSS when it is MySQL).
    >>
    >> How to define "linked" can be a bit fuzzy.
    >>
    >> But it is not something special for MySQL - this is
    >> how GPL works.

    > I am not qualified to interpret the GPL license but MySQL AB's
    > interpretation contrasts with the thousands of commercial applications
    > that are "linked" to Linux and dozens of other GPLd programs. Since,
    > as far as I am aware, these problems don't arise with the other freely
    > distributable databases I list at <http://database.profectus.com.au>,
    > including those from Oracle and Microsoft, it seems foolish to choose
    > MySQL.


    PostgreSQL uses BSD license.

    Firebird uses a derivative of MPL.

    Both are "business friendly".

    Linux is not a problem for two reasons:
    - applications usually does not interact directly
    with Linux but uses GLIBC which is LGPL
    - Linus Torvalds has a definition of linking that
    is a bit narrower than MySQL's

    No huge contrast - just a small difference.

    If your page only is for those that produce closed
    source to distribute to others, then you are correct,
    but should probably state so on the page so other
    people does not waste their time reading your advice.

    Arne
     
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=, Feb 13, 2007
    #11
  12. Arne Vajhøj wrote:
    > If your page only is for those that produce closed
    > source to distribute to others, then you are correct,
    > but should probably state so on the page so other
    > people does not waste their time reading your advice.


    Well actually you do so.

    So the above is not correct.

    But I still do not like it - all the relevant
    keywords: GPL, linking and distribution is missing.

    Arne
     
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=, Feb 13, 2007
    #12
  13. Patricia Shanahan

    Chris Smith Guest

    Arne Vajhøj <> wrote:
    > The text for MySQL is rather imprecise.
    >
    > You can use MySQL all you want with closed source.
    >
    > You can use the MySQL connectors all you want with closed source.
    >
    > As long as you do not distribute.


    See http://www.mysql.com/company/legal/licensing/commercial-license.html
    for information, though, on MySQL's definition of "distribute". It may
    surprise you. If you write them for clarification, their answers may do
    even more to surprise you. MySQL feels that any time that anyone might
    possibly use your application with MySQL, you should pay them money.
    They call this distribution, despite the obviously contrary common
    meaning of that word. I was told, at one point, that they believe this
    even if all I do is write an application to conform to a standard API
    such as JDBC. In their view, if you write a portable database
    application, and later someone else installs MySQL and uses your
    application with it, you owe them money. Apparently, they feel that
    JDBC, ODBC, and other APIs are all incorporated into their product
    because they provide an implementation of the APIs.

    Could you fight them on this and win in court? I don't know. Most
    likely, it depends on how much money you have, among other factors.
    Ultimately, though, you're ethically bound to respect their decisions
    with regard to the distribution of their software. You are also well-
    advised from a practical perspective not to deal with anyone who is so
    deluded and hires lawyers on a regular basis.

    I just wish they would stop their campaign of misinformation about the
    GPL license. Everything else they do is easy to avoid quite neatly.
    PostgreSQL works fine, so use it.

    --
    Chris Smith
     
    Chris Smith, Feb 13, 2007
    #13
  14. Patricia Shanahan

    David Segall Guest

    Arne Vajhøj <> wrote:

    >Arne Vajhøj wrote:
    >> If your page only is for those that produce closed
    >> source to distribute to others, then you are correct,
    >> but should probably state so on the page so other
    >> people does not waste their time reading your advice.

    >
    >Well actually you do so.
    >
    >So the above is not correct.
    >
    >But I still do not like it - all the relevant
    >keywords: GPL, linking and distribution is missing.
    >
    >Arne
     
    David Segall, Feb 13, 2007
    #14
  15. Patricia Shanahan

    David Segall Guest

    Arne Vajhøj <> wrote:

    >Arne Vajhøj wrote:
    >> If your page only is for those that produce closed
    >> source to distribute to others,

    I cannot accept "only" in that sentence. My intention was that
    _everyone_ could use the software I listed in any way they chose.
    >>then you are correct,
    >> but should probably state so on the page so other
    >> people does not waste their time reading your advice.

    >
    >Well actually you do so.
    >
    >So the above is not correct.
    >
    >But I still do not like it - all the relevant
    >keywords: GPL, linking and distribution is missing.


    Arne, I want you to like it! If I can achieve this by adding some
    brief, relevant information to the web page I would be pleased to do
    so and would appreciate it if you would post the information here or
    email it to david at segall dot net. I am reluctant to include any
    detailed licensing information because each of the products listed
    comes with several pages of legal restrictions that I do not wish to
    maintain.

    I added the licensing qualifications to the MySQL entry in my list
    because I was concerned by a previous post from Chris Smith that
    contained the information that he has posted in this thread. Because
    MySQL is so popular and many people, encouraged by MySQL AB, consider
    it to be freely distributable I chose to qualify the entry in my list
    rather than simply delete it.
     
    David Segall, Feb 13, 2007
    #15
  16. Patricia Shanahan

    Lew Guest

    David Segall wrote:
    > Because
    > MySQL is so popular and many people, encouraged by MySQL AB, consider
    > it to be freely distributable I chose to qualify the entry in my list
    > rather than simply delete it.


    MySQL has two licenses, one of which is the GPL. It is no more nor less
    restricted than other GPLed code.

    What people "consider" does not determine the license. The license terms
    published by MySQL AB determine the license. You could go straight to them for
    the definitive information instead of relying on rumor.

    - Lew
     
    Lew, Feb 13, 2007
    #16
  17. Lew wrote:
    > MySQL has two licenses, one of which is the GPL. It is no more nor less
    > restricted than other GPLed code.
    >
    > What people "consider" does not determine the license. The license terms
    > published by MySQL AB determine the license. You could go straight to
    > them for the definitive information instead of relying on rumor.


    Which is not true.

    MySQL choose the GPL license.

    It is not up to them to interpret the GPL license.

    FSF must be the authoritative source on how the GPL
    license should be interpreted.

    Especially since MySQL has an economic interest in
    FUD'ing a bit.

    Arne
     
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=, Feb 14, 2007
    #17
  18. Chris Smith wrote:
    > Arne Vajhøj <> wrote:
    >> The text for MySQL is rather imprecise.
    >>
    >> You can use MySQL all you want with closed source.
    >>
    >> You can use the MySQL connectors all you want with closed source.
    >>
    >> As long as you do not distribute.

    >
    > See http://www.mysql.com/company/legal/licensing/commercial-license.html
    > for information, though, on MySQL's definition of "distribute". It may
    > surprise you. If you write them for clarification, their answers may do
    > even more to surprise you. MySQL feels that any time that anyone might
    > possibly use your application with MySQL, you should pay them money.
    > They call this distribution, despite the obviously contrary common
    > meaning of that word. I was told, at one point, that they believe this
    > even if all I do is write an application to conform to a standard API
    > such as JDBC. In their view, if you write a portable database
    > application, and later someone else installs MySQL and uses your
    > application with it, you owe them money. Apparently, they feel that
    > JDBC, ODBC, and other APIs are all incorporated into their product
    > because they provide an implementation of the APIs.
    >
    > Could you fight them on this and win in court? I don't know. Most
    > likely, it depends on how much money you have, among other factors.
    > Ultimately, though, you're ethically bound to respect their decisions
    > with regard to the distribution of their software.


    No.

    They decided to use the GPL license.

    We need to respect the GPL license.

    I think it will be considered extremely unethical by the GPL
    software community to allow "extra conditions" on GPL'ed code.

    If they want a MySQL Open Source License with a twist
    compared to standard GPL, then they should make one.

    Arne
     
    =?ISO-8859-15?Q?Arne_Vajh=F8j?=, Feb 14, 2007
    #18
  19. Patricia Shanahan

    Chris Smith Guest

    Lew <> wrote:
    > MySQL has two licenses, one of which is the GPL. It is no more nor less
    > restricted than other GPLed code.


    In a legal sense, that's true. I absolutely agree that the GPL doesn't
    say what MySQL says that it does. Their interpretation is hard to
    believe in terms of the text of the agreement (IANAL), and common sense
    tells us they are completely loony.

    Nevertheless, do you want to defend yourself in court if they sue you?
    That's my main concern. There are a lot of crazy people with whom I
    avoid any kind of relationship. Even if they can't legally do anything
    to hurt me, they can cause quite a few problems along the way.

    (My other concern is that their web page misleads people on what the
    GPL, in general, does. That seems to be rather widespread, though. The
    whole free software world often seems to subscribe to the "wishful
    thinking" approach to interpreting legal agreements.)

    --
    Chris Smith
     
    Chris Smith, Feb 14, 2007
    #19
  20. Patricia Shanahan

    David Segall Guest

    Lew <> wrote:

    >
    >What people "consider" does not determine the license. The license terms
    >published by MySQL AB determine the license. You could go straight to them for
    >the definitive information instead of relying on rumor.

    Why bother? Chris Smith has pointed out that MySQL AB are deliberately
    vague about their license and Arne Vajhøj has put forward a good case
    for interpreting their license as a "legally defined" GPL licence.

    I have not seen _any_ argument for preferring MySQL over the other
    databases I listed at http://database.profectus.com.au. Why risk a
    legal battle when you have several alternatives that do not attempt to
    restrict how you use the database?
     
    David Segall, Feb 15, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ian Roddis

    xslt queries in xml to SQL queries

    Ian Roddis, Feb 26, 2006, in forum: Python
    Replies:
    3
    Views:
    1,561
    Crutcher
    Feb 26, 2006
  2. Shaguf
    Replies:
    0
    Views:
    560
    Shaguf
    Dec 24, 2008
  3. Shaguf
    Replies:
    0
    Views:
    507
    Shaguf
    Dec 26, 2008
  4. Shaguf
    Replies:
    0
    Views:
    277
    Shaguf
    Dec 26, 2008
  5. Shaguf
    Replies:
    0
    Views:
    258
    Shaguf
    Dec 24, 2008
Loading...

Share This Page