Stack overflow . . . what should I do about it?

Discussion in 'C++' started by Aguilar, James, Sep 4, 2004.

  1. Hello all,

    To begin, yes, this -is- a homework assignment. However, it is for my
    Algorithms class, and my instructor has given us explicit permission to use
    "expert" groups like newsgroups, so if that's your only reason not to help,
    please do. Otherwise, I guess it's OK. But, just remember, I'm not asking
    you to do my work for me, just to point out my error.

    My problem is not with the algorithm itself (standard divide and conquer on
    a set of points to find the closest pair), but with the C++, that is, my
    writing of it. I seem to be getting a stack overflow exception when the
    input is too large, and I don't know what to do about it. This exception
    has occurred when I compile on g++ in Cygwin and when I compile using VC++
    (whatever the newest version is) on Windows. Cygwin does not do anything
    when it fails, it simply stops in the middle of printing the diagnostics
    (that is, printing a running tally of the closest points). I loaded up the
    Beta version of the Visual C++ ide since I don't know gdb very well/at all.
    When I got to a certain point in the code, VC++ told me that a stack
    overflow exception had been thrown.

    I know my algorithm is correct, because it works on random cases between 10
    and 100, so I think that if I can defeat this stack overflow, I will have
    defeated the entire problem. I am including the code that I think is
    relevant: there is other code in the project, but it just performs
    administrative tasks. This is the entire algorithm. More talking at the
    end.

    ----- CODE BEGINS -----
    // CLOSEST-PAIR.CC
    // Implementation of the closest-pair algorithm

    #include <iostream>
    #include <cfloat>
    #include "closest-pair.hh"

    const double INFINITY = DBL_MAX; //bigger than any possible interpoint
    distance
    void printPointPair(const Point&, const Point&);
    void printPointPair(const PointPair&);
    PointPair divide(Point**, Point**, int);
    /*
    Closest Pair Functions

    Given a collection of nPoints points, find and ***print***
    * the closest pair of points
    * the distance between them
    in the form "(x1, y1) (x2, y2) distance"

    INPUTS:
    - points sorted in nondecreasing order by X coordinate
    - points sorted in nondecreasing order by Y coordinate
    - # of input points
    */


    PointPair bruteForce(Point* pointsByX[], Point* pointsByY[], int nPoints)
    {
    Point a, b;
    double min = INFINITY, dist;
    for (int i = 0; i < nPoints - 1; ++i)
    for (int j = i + 1; j < nPoints; ++j) {
    dist = pointsByX->distance(pointsByX[j]);
    if (min > dist) {
    min = dist;
    a = *pointsByX;
    b = *pointsByX[j];
    }
    }
    printPointPair(a, b);
    return PointPair(a, b);
    }

    PointPair divCon(Point* pointsByX[], Point* pointsByY[], int nPoints)
    {
    PointPair p = divide(pointsByX, pointsByY, nPoints);
    printPointPair(p);
    return p;
    }

    PointPair divide(Point* pointsByX[], Point* pointsByY[], int nPoints)
    {
    using namespace std;
    if (nPoints > 3) {
    //divide
    // set line
    int bisect = pointsByX[nPoints/2]->x();
    // sort Y lists to create new ones
    Point* yL[nPoints];
    Point* yR[nPoints];
    int lenL = 0, lenR = 0;
    for (int i = 0; i < nPoints; ++i) {
    if (pointsByY->x() <= bisect) {
    yL[lenL] = pointsByY;
    ++lenL;
    }
    if (pointsByY->x() >= bisect) {
    yR[lenR] = pointsByY;
    ++lenR;
    }
    }

    //conquer
    // pass sorted arrays for repeat
    PointPair pair1 = divide(pointsByX, yL, lenL);
    PointPair pair2 = divide(pointsByX + nPoints - lenR, yR, lenR);
    PointPair correct;
    //combine
    // find mindist
    double p1Dist = pair1.distance(),
    p2Dist = pair2.distance(),
    min = 0;
    if (p1Dist < p2Dist) {
    correct = pair1;
    min = p1Dist;
    } else {
    correct = pair2;
    min = p2Dist;
    }

    // remove all non-2(mindist) points from y array
    // I reuse yL here, since it is the longer than the array I need anyway
    int stripLen = 0;
    for (int i = 0; i < nPoints; ++i) {
    if (pointsByY->x() >= bisect - min
    || pointsByY->x() <= bisect + min) {
    yL[stripLen] = pointsByY;
    ++stripLen;
    }
    }
    // then check next seven repeated until the end of the list
    for (int i = 0; i < stripLen; ++i)
    for (int j = i+1; j-i < 8 && j < stripLen; ++j)
    if (yL->distance(yL[j]) < min) {
    min = yL->distance(yL[j]);
    correct.p1() = yL;
    correct.p2() = yL[j];
    }
    return correct;
    } else {
    return bruteForce(pointsByX, pointsByY, nPoints);
    }
    }

    void printPointPair(const PointPair& pp)
    {
    using namespace std;
    double dist = pp.distance();

    cout << pp << " " << dist << endl;
    }

    void printPointPair(const Point& a, const Point& b)
    {
    using namespace std;
    double dist = a.distance(&b);
    cout << a << " " << b << " " << dist << endl;
    }

    ----- CODE ENDS -----

    I thought that I might be concerned about the temporary arrays that I
    create. However, I've been informed that since they are temporary (here I
    am referring to yL and yR) they are deleted every time I exit their scope.
    If this is the case, it would seem (to me) logical that the stack overflow
    would occur on the first branch, but this is not the case. The first
    branch, even on very large inputs (up to n = 10000) completes successfully
    (according to the debugger and my diagnostics). Also, for reference, when I
    compile it on G++ in Cygwin, n = 187 completes successfully, but n = 188
    does not. This is consistently the case, and I can show it to repeat.

    If needed, I can post the entire project, but I think with as many smart
    guys as there are here, someone will find my problem just in this post.

    Yours,

    James
     
    Aguilar, James, Sep 4, 2004
    #1
    1. Advertising

  2. "Aguilar, James" <> wrote in message
    news:chbsir$2jl$...
    > Hello all,
    >
    > To begin, yes, this -is- a homework assignment. However, it is for my
    > Algorithms class, and my instructor has given us explicit permission to
    > use "expert" groups like newsgroups, so if that's your only reason not to
    > help, please do. Otherwise, I guess it's OK. But, just remember, I'm not
    > asking you to do my work for me, just to point out my error.
    >
    > My problem is not with the algorithm itself (standard divide and conquer
    > on a set of points to find the closest pair), but with the C++, that is,
    > my writing of it. I seem to be getting a stack overflow exception when
    > the input is too large, and I don't know what to do about it. This
    > exception has occurred when I compile on g++ in Cygwin and when I compile
    > using VC++ (whatever the newest version is) on Windows. Cygwin does not
    > do anything when it fails, it simply stops in the middle of printing the
    > diagnostics (that is, printing a running tally of the closest points). I
    > loaded up the Beta version of the Visual C++ ide since I don't know gdb
    > very well/at all. When I got to a certain point in the code, VC++ told me
    > that a stack overflow exception had been thrown.
    >
    > I know my algorithm is correct, because it works on random cases between
    > 10 and 100, so I think that if I can defeat this stack overflow, I will
    > have defeated the entire problem. I am including the code that I think is
    > relevant: there is other code in the project, but it just performs
    > administrative tasks. This is the entire algorithm. More talking at the
    > end.
    >
    > ----- CODE BEGINS -----


    [snip]

    >
    > PointPair divide(Point* pointsByX[], Point* pointsByY[], int nPoints)
    > {
    > using namespace std;
    > if (nPoints > 3) {
    > //divide
    > // set line
    > int bisect = pointsByX[nPoints/2]->x();
    > // sort Y lists to create new ones
    > Point* yL[nPoints];
    > Point* yR[nPoints];
    > int lenL = 0, lenR = 0;
    > for (int i = 0; i < nPoints; ++i) {
    > if (pointsByY->x() <= bisect) {
    > yL[lenL] = pointsByY;
    > ++lenL;
    > }
    > if (pointsByY->x() >= bisect) {
    > yR[lenR] = pointsByY;
    > ++lenR;
    > }
    > }
    >
    > //conquer
    > // pass sorted arrays for repeat
    > PointPair pair1 = divide(pointsByX, yL, lenL);
    > PointPair pair2 = divide(pointsByX + nPoints - lenR, yR, lenR);
    > PointPair correct;
    > //combine
    > // find mindist
    > double p1Dist = pair1.distance(),
    > p2Dist = pair2.distance(),
    > min = 0;
    > if (p1Dist < p2Dist) {
    > correct = pair1;
    > min = p1Dist;
    > } else {
    > correct = pair2;
    > min = p2Dist;
    > }
    >
    > // remove all non-2(mindist) points from y array
    > // I reuse yL here, since it is the longer than the array I need anyway
    > int stripLen = 0;
    > for (int i = 0; i < nPoints; ++i) {
    > if (pointsByY->x() >= bisect - min
    > || pointsByY->x() <= bisect + min) {
    > yL[stripLen] = pointsByY;
    > ++stripLen;
    > }
    > }
    > // then check next seven repeated until the end of the list
    > for (int i = 0; i < stripLen; ++i)
    > for (int j = i+1; j-i < 8 && j < stripLen; ++j)
    > if (yL->distance(yL[j]) < min) {
    > min = yL->distance(yL[j]);
    > correct.p1() = yL;
    > correct.p2() = yL[j];
    > }
    > return correct;
    > } else {
    > return bruteForce(pointsByX, pointsByY, nPoints);
    > }
    > }
    >


    [snip]

    >
    > ----- CODE ENDS -----
    >
    > I thought that I might be concerned about the temporary arrays that I
    > create. However, I've been informed that since they are temporary (here I
    > am referring to yL and yR) they are deleted every time I exit their scope.


    Not necessarily. Many compilers will only reclaim the space of those
    temporary arrays when the whole function is exitted. In other words stack
    operations are only performed on entry and exit of a function.

    Also that code is illegal. An array size must be a compile time constant;
    which nPoints clearly is not. g++ only allows this code as a language
    extension, try compiling with the --ansi option (I think) and you should get
    a error.

    So, one solution is to dynamically allocate those arrays, that will make
    your code more standard and will reduce the stack usage.

    > If this is the case, it would seem (to me) logical that the stack overflow
    > would occur on the first branch, but this is not the case. The first
    > branch, even on very large inputs (up to n = 10000) completes successfully
    > (according to the debugger and my diagnostics). Also, for reference, when
    > I compile it on G++ in Cygwin, n = 187 completes successfully, but n = 188
    > does not. This is consistently the case, and I can show it to repeat.
    >
    > If needed, I can post the entire project, but I think with as many smart
    > guys as there are here, someone will find my problem just in this post.
    >
    > Yours,
    >
    > James


    john
     
    John Harrison, Sep 4, 2004
    #2
    1. Advertising

  3. "John Harrison" <> wrote in message
    news:...
    >
    > [snip info about dynamic allocation]


    Well, I tried that, and you're right, it was a GNU extension. I repaired
    the error, but it seems that my problem still remains. In fact, there was
    no difference in performance after the edit (i.e. still fails at 188, still
    succeeds at 187) so I'm still up in the air about what this might be. Any
    other takers?

    James
     
    Aguilar, James, Sep 4, 2004
    #3
  4. Aguilar, James wrote:
    >
    > I know my algorithm is correct, because it works on random cases between 10
    > and 100,


    That way you know it is correct for the random cases you've
    tried, but not that it is correct in general. To know that you'd
    have to try it for all possible cases.

    To address your problem -- you either need to allocate your arrays
    dynamically, using 'new' -- that way they will be allocated from
    the heap, not from the stack (let's forget for a moment that the
    standard does not define 'heap' and 'stack') or you can try an
    environment-specific way to increase the stack size. Go for the
    first solution!

    Under WIN32 or UNIX you can use this snippet to check how big
    the stack is:

    #include <iostream>

    #ifndef __WIN32__
    #include <sys/time.h> // getrlimit()
    #include <sys/resource.h> // getrlimit()
    #include <unistd.h> // getrlimit()
    #else
    #include <malloc.h>
    #endif

    using namespace std;

    void check_stack() {
    cerr << "Stack size is: ";
    #ifndef __WIN32__
    rlimit lim;
    getrlimit(RLIMIT_STACK,&lim);
    cerr << lim.rlim_cur << endl;
    #else
    size_t avail=stackavail();
    cerr << avail << endl;
    #endif
    }

    HTH,
    - J.
     
    Jacek Dziedzic, Sep 4, 2004
    #4
  5. "Aguilar, James" <> skrev i en meddelelse
    news:chbu7k$33u$...
    > "John Harrison" <> wrote in message
    > news:...
    > >
    > > [snip info about dynamic allocation]

    >
    > Well, I tried that, and you're right, it was a GNU extension. I repaired
    > the error, but it seems that my problem still remains. In fact, there was
    > no difference in performance after the edit (i.e. still fails at 188,

    still
    > succeeds at 187) so I'm still up in the air about what this might be. Any
    > other takers?
    >
    > James
    >
    >


    I have not checked your code, but if you replace a stack-based array with a
    dynamically allocated one and get the stack overflow at the same location in
    problem space, there almost certainly is a problem with your algorithm. Why
    don't you debug the program?

    /Peter
     
    Peter Koch Larsen, Sep 4, 2004
    #5
  6. "Peter Koch Larsen" <> wrote in message
    news:ZPf_c.43075$...
    >
    > "Aguilar, James" <> skrev i en meddelelse
    > news:chbu7k$33u$...
    >> "John Harrison" <> wrote in message
    >> news:...
    >> >
    >> > [snip info about dynamic allocation]

    >>
    >> Well, I tried that, and you're right, it was a GNU extension. I repaired
    >> the error, but it seems that my problem still remains. In fact, there
    >> was
    >> no difference in performance after the edit (i.e. still fails at 188,

    > still
    >> succeeds at 187) so I'm still up in the air about what this might be.
    >> Any
    >> other takers?
    >>
    >> James
    >>
    >>

    >
    > I have not checked your code, but if you replace a stack-based array with
    > a
    > dynamically allocated one and get the stack overflow at the same location
    > in
    > problem space, there almost certainly is a problem with your algorithm.
    > Why
    > don't you debug the program?
    >
    > /Peter
    >


    Code seems wrong to me. Looks to me like the divide could fail and the
    entire x and y arrays be passed to one of the recursive calls.

    Obviously this would result in an infinite recursion and a stack overflow.

    john
     
    John Harrison, Sep 4, 2004
    #6
  7. "John Harrison" <> wrote in message
    news:...
    >
    > "Peter Koch Larsen" <> wrote in message
    > news:ZPf_c.43075$...
    >>
    >> "Aguilar, James" <> skrev i en meddelelse
    >> news:chbu7k$33u$...
    >>> "John Harrison" <> wrote in message
    >>> news:...
    >>> >
    >>> > [snip info about dynamic allocation]
    >>>
    >>> Well, I tried that, and you're right, it was a GNU extension. I
    >>> repaired
    >>> the error, but it seems that my problem still remains. In fact, there
    >>> was
    >>> no difference in performance after the edit (i.e. still fails at 188,

    >> still
    >>> succeeds at 187) so I'm still up in the air about what this might be.
    >>> Any
    >>> other takers?
    >>>
    >>> James
    >>>
    >>>

    >>
    >> I have not checked your code, but if you replace a stack-based array with
    >> a
    >> dynamically allocated one and get the stack overflow at the same location
    >> in
    >> problem space, there almost certainly is a problem with your algorithm.
    >> Why
    >> don't you debug the program?
    >>
    >> /Peter
    >>

    >
    > Code seems wrong to me. Looks to me like the divide could fail and the
    > entire x and y arrays be passed to one of the recursive calls.
    >
    > Obviously this would result in an infinite recursion and a stack overflow.
    >
    > john


    I'll check it out and report back.

    James
     
    Aguilar, James, Sep 4, 2004
    #7
  8. "John Harrison" <> wrote in message
    news:...
    >
    > Code seems wrong to me. Looks to me like the divide could fail and the
    > entire x and y arrays be passed to one of the recursive calls.
    >
    > Obviously this would result in an infinite recursion and a stack overflow.
    >
    > john


    I found the problem. It was in this snippet:

    ----- CODE BEGINS -----

    int bisect = pointsByX[nPoints/2]->x();
    // sort Y lists to create new ones
    Point* yL[nPoints];
    Point* yR[nPoints];
    int lenL = 0, lenR = 0;
    for (int i = 0; i < nPoints; ++i) {
    if (pointsByY->x() <= bisect) {
    yL[lenL] = pointsByY;
    ++lenL;
    }
    if (pointsByY->x() >= bisect) {
    yR[lenR] = pointsByY;
    ++lenR;
    }
    }

    ----- CODE ENDS -----

    Which now looks like this:

    ----- CODE BEGINS -----

    Point* bisect = pointsByX[nPoints/2];
    // sort Y lists to create new ones
    Point** yL = new Point*[nPoints];
    Point** yR = new Point*[nPoints];
    int lenL = 0, lenR = 0;
    for (int i = 0; i < nPoints; ++i) {
    if (pointsByY->isLeftOf(bisect)) {
    yL[lenL] = pointsByY;
    ++lenL;
    }
    if (!(pointsByY->isLeftOf(bisect))) {
    yR[lenR] = pointsByY;
    ++lenR;
    }
    }
    ----- CODE ENDS -----

    isLeftOf takes into account height, so that even if two points are on the
    bisector line (which becomes more and more likely with larger and larger
    random inputs), the ones below the point which is being checked will return
    true for isLeftOf and the ones above will return false. This is OK because
    after pL and pR have been divided, they are merged for the checking of the
    2*(minDist) strip, which means that even if the closest pair is one of the
    two points on the line, it will still be caught.

    The algorithm works like a charm, making relatively quick work of input
    sizes up to n = 1000000 (13000 milliseconds). However, with inputs that
    large, my computer really starts to load up, since that's 150 MiB just in
    terms of the points that are actually created at the start. I guess that
    means I could conceivably sort four million points on my computer, but
    beyond that, it's probably going to overflow my memory.

    The cool thing is how, even on inputs of as small as 10000, the divide and
    conquer is already orders of magnitude faster than the brute force version.

    Yours,

    James
     
    Aguilar, James, Sep 4, 2004
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?amJpeEBuZXdzZ3JvdXBzLm5vc3BhbQ==?=

    Stack overflow exception

    =?Utf-8?B?amJpeEBuZXdzZ3JvdXBzLm5vc3BhbQ==?=, Apr 20, 2004, in forum: ASP .Net
    Replies:
    5
    Views:
    7,302
    Rick Spiewak
    Apr 22, 2004
  2. Mr m?ll
    Replies:
    2
    Views:
    1,428
    Mr m?ll
    Oct 16, 2004
  3. Replies:
    0
    Views:
    521
  4. Replies:
    0
    Views:
    417
  5. Kenneth McDonald

    Why stack overflow with such a small stack?

    Kenneth McDonald, Aug 30, 2007, in forum: Ruby
    Replies:
    7
    Views:
    293
    Kenneth McDonald
    Sep 1, 2007
Loading...

Share This Page