std::sort causes segfault when sorting class arrays


Philip Pemberton

Hi,
I'm trying to write some image processing software, and as part of that I
need to sort an array of pixel values by their luminance...

I'm using std::sort to do this, which works OK.. as long as no two pixels
in the array have the same luminance value. In that case.... it
segfaults. Here's my code:

////// CODE BEGINS

// gcc -o test test.cpp
#include <algorithm>

using namespace std;

class Pixel {
public:
    double r, g, b;

    Pixel(double _r=0, double _g=0, double _b=0) {
        r = _r;
        g = _g;
        b = _b;
    }

    // Return the CIE luminance of this pixel
    double get_luminance() const {
        return (r * 255.0 * 0.212671) + (g * 255.0 * 0.715160)
             + (b * 255.0 * 0.072169);
    }

    bool operator<(const Pixel &rhs) const {
        return (get_luminance() < rhs.get_luminance());
    }
};

int main(int argc, char **argv)
{
    for (int gx=0; gx<1000; gx++) {
        Pixel pixels[15 * 15];
        unsigned int pixel_count = 0;

        for (int ym=0; ym<15; ym++) {
            for (int xm=0; xm<15; xm++) {
#ifndef MAKE_PIXELS_THE_SAME
                pixels[pixel_count].r = (xm*15)+ym;
                pixels[pixel_count].g = (xm*15)+ym;
                pixels[pixel_count].b = (xm*15)+ym;
#else
                pixels[pixel_count].r = 0.12;
                pixels[pixel_count].g = 0.93;
                pixels[pixel_count].b = 0.32;
#endif
                pixel_count++;
            }
        }

        sort(pixels, pixels+pixel_count);
    }

    return 0;
}

////// CODE ENDS

The segfault always happens on the call to sort().

I know I'm probably missing something blindingly obvious here, but I've
spent an hour hacking away at this and still can't see what I'm doing
wrong...

Can anyone help me figure out why my code isn't working?
(this is only the second time I've used sort(), and the first time I've
used it on an array, but my C++ book says it is possible to do the
latter... is my book wrong?)

Thanks,
Phil.
 

fedor.chugunov

Philip said:
[original post and code snipped]

sort(pixels, pixels+pixel_count);

sort(pixels, pixels+pixel_count*sizeof(Pixel));
 

Kai-Uwe Bux

Philip said:
[original post and code snipped]

fedor.chugunov said:

sort(pixels, pixels+pixel_count*sizeof(Pixel));

No, that would try to access elements beyond the bounds of the array.


To the OP: I cannot reproduce the segfault; and I cannot find an obvious
flaw in the code. Are you sure the flaw is within the snippet you posted?
Maybe, you trimmed it down too much?


Best

Kai-Uwe Bux
 

Philip Pemberton

To the OP: I cannot reproduce the segfault; and I cannot find an obvious
flaw in the code. Are you sure the flaw is within the snippet you
posted? Maybe, you trimmed it down too much?

Nope -- I've checked on my machine and it segfaults repeatably... :(

I rewrote it to use a vector<Pixel *> instead, which works fine, but is
slower than I'd have hoped (probably because of all the Pixel objects
being created).

This is on an x86 laptop, running Ubuntu 8.10 with the latest security
patches. gcc version is 4.3.2 (Ubuntu 4.3.2-1ubuntu12).

Thanks,
Phil.
 

Kai-Uwe Bux

Philip said:
Nope -- I've checked on my machine and it segfaults repeatably... :(

I rewrote it to use a vector<Pixel *> instead, which works fine, but is
slower than I'd have hoped (probably because of all the Pixel objects
being created).

And what happens if you use vector<Pixel> instead? What happens with other
containers?

Also, what does your comparison functor look like? Maybe, there are some
subtle differences there.
This is on an x86 laptop, running Ubuntu 8.10 with the latest security
patches. gcc version is 4.3.2 (Ubuntu 4.3.2-1ubuntu12).

Hm, I'm running gcc 4.3.1 on an x86 desktop.


Best

Kai-Uwe Bux
 

Thomas J. Gritzan

Philip said:
Hi,
I'm trying to write some image processing software, and as part of that I
need to sort an array of pixel values by their luminance...

I'm using std::sort to do this, which works OK.. as long as no two pixels
in the array have the same luminance value. In that case.... it
segfaults. Here's my code: [...]
class Pixel {
public:
double r, g, b; [...]
// Return the CIE luminance of this pixel
double get_luminance() const {
return (r * 255.0 * 0.212671) + (g * 255.0 *
0.715160) + (b * 255.0 * 0.072169);
}

bool operator<(const Pixel &rhs) const { return
(get_luminance() < rhs.get_luminance()); }
}; [...]
sort(pixels, pixels+pixel_count);
}

std::sort requires a strict weak ordering as its comparison function. AFAIK
your operator< is a strict weak ordering, but only as long as
get_luminance() calculates the same value for the same pixel every time.

The problem with floating point is that you might get two different
results if you calculate the same equation twice. The reason is that
some implementations use a higher precision when calculating than fits
in a double.

To avoid this problem, you could use a compiler specific parameter (like
-ffloat-store for GCC, please refer to the GCC manual).

A (more robust) C++ language solution would be to calculate the
luminance value for every pixel once, store it somewhere, and use this
for sorting.
You could sort an index like std::pair< luminance, Pixel* > and use it
to bring your pixels in the right order, or cache the luminance inside
the Pixel class.
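For illustration, the pair-index approach could be sketched like this (a minimal sketch, not from the thread; the Pixel struct is a trimmed-down copy of the OP's class and the function name is made up):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Pixel {
    double r, g, b;
    double get_luminance() const {
        return (r * 255.0 * 0.212671)
             + (g * 255.0 * 0.715160)
             + (b * 255.0 * 0.072169);
    }
};

// Build an index of (cached luminance, pointer) pairs and sort that.
// Each luminance is computed exactly once and stored in a 64-bit
// double, so every comparison sees the same value -- no excess
// precision can make operator< answer inconsistently, and the
// default operator< on std::pair is a strict weak ordering here.
std::vector< std::pair<double, Pixel*> >
sorted_by_luminance(std::vector<Pixel>& pixels) {
    std::vector< std::pair<double, Pixel*> > index;
    index.reserve(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        index.push_back(std::make_pair(pixels[i].get_luminance(), &pixels[i]));
    std::sort(index.begin(), index.end());
    return index;
}
```

The sorted index can then be walked to visit the pixels in luminance order, without ever moving the Pixel objects themselves.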
 

Juha Nieminen

Thomas said:
The problem with floating point is that you might get two different
results if you calculate the same equation twice.

I'm sorry, but this sounds like BS to me.

Care to give an example where the exact *same* equation (using the
exact same, unmodified variables) can give *different* results from
evaluation to evaluation?

Also, care to explain the process by which this happens?

The only way this would happen is if the FPU hardware (or whichever
chip is doing the FP calculations) deliberately gives random results
each time it performs the exact same calculations. Not that this
wouldn't be conceivable (eg. random rounding of the last bit), but I
have never heard of any hardware doing that. Would it even conform to
the related IEEE standard?

A third thing I would like to hear is how a broken comparison can
cause std::sort to segfault, rather than simply causing the array
not to be properly sorted. I can't think of any sorting algorithm which
would start indexing out of bounds even if the comparisons return
invalid results.
 

Kai-Uwe Bux

Juha said:
I'm sorry, but this sounds like BS to me.

That might be because you snipped the context.

To restore the context, let me reproduce the operator< in question:

double get_luminance() const {
return (r * 255.0 * 0.212671)
+ (g * 255.0 * 0.715160)
+ (b * 255.0 * 0.072169);
}

bool operator<(const Pixel &rhs) const {
return (get_luminance() < rhs.get_luminance());
}

This is implicitly used in a call to sort().

What is in question is whether this comparison predicate will define a
strict weak order.
Care to give an example where the exact *same* equation (using the
exact same, unmodified variables) can give *different* results from
evaluation to evaluation?

Well, I just had this program:

#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}

and the output was "false".
Also, care to explain the process by which this happens?

The program above prints "false" on my machine because of clause [5/10].
Whether the two subexpressions "sin(x)+cos(y)" qualify as the "exact *same*
equation" I don't know. But something like this _could_ cause the operator<
to yield inconsistent answers.
The only way this would happen is if the FPU hardware (or whichever
chip is doing the FP calculations) deliberately gives random results
each time it performs the exact same calculations. Not that this
wouldn't be conceivable (eg. random rounding of the last bit), but I
have never heard of any hardware doing that. Would it even conform to
the related IEEE standard?

Rounding of last bits can occur between registers in the CPU and floating
point objects in memory. A write-reread can therefore change the value.
Clause [5/10] gives a lot of lenience in this way.
A third thing I would like to hear is how a broken comparison can
cause std::sort to segfault, rather than simply causing the array to not
to be properly sorted. I can't think of any sorting algorithm which
would start indexing out of boundaries even if the comparisons return
invalid results.

Hm, I can think of partition exchange implementations that would use the
pivot as a sentinel value to find the left-most index for which

*left < pivot

is false and the right-most index for which

pivot < *right

is false. One would expect that the two expressions _must_ become false
eventually since the pivot is part of the sequence. But if operator< is
broken, then left could be incremented beyond the bound of the sequence.

Let's see what the implementation on my machine does:

#include <vector>
#include <algorithm>

struct A {

bool operator< ( A const & rhs ) const {
return true;
}

};

int main ( void ) {
std::vector< A > av;
for ( int n = 0; n < 32; ++n ) {
av.push_back( A() );
}
std::sort( av.begin(), av.end() );
}

This actually seems to run forever, but it does not segfault.


Best

Kai-Uwe Bux
 

Juha Nieminen

Kai-Uwe Bux said:
Well, I just had this program:

#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}

and the output was "false".

I honestly can't understand what's happening there. Even if it's
changed to this:

#include <iostream>
#include <iomanip>
#include <cmath>

double f(double x, double y)
{
return std::sin(x) + std::cos(y);
}

int main ( void ) {
double x = 1;
double y = 1;
std::cout << std::boolalpha
<< ( f(x, y) == f(x, y) ) << std::endl;
}

it still prints false. OTOH it does so only when compiling without
optimizations. With optimizations it prints true.

Compiling to asm (without optimizations) and examining the result is
only more puzzling. The f() function appears, quite naturally, only once
in the result, and it's called twice from main() with the *exact same
code* in both cases. I can't see any difference whatsoever between the
two calls. The exact same asm instructions are used to put the two
variables into the FPU and calling the one and same f() function. Then
an FPU comparison opcode is executed.

In other words, the exact same, bit-by-bit completely identical
machine code is executed twice, and then the results compared, and the
results are unequal? I can't understand how that is even physically
possible, unless the FPU is rounding differently at different times,
based on some randomness or side-effects of previous calls. I have never
heard of the Intel FPU doing that.

So I suppose I must admit that I was wrong. The exact same code can
give different results from execution to execution, although it's still
a complete mystery to me how that is even physically possible (at least
in a machine that is supposed to be deterministic). One would assume
this is something you definitely don't want to happen, ie. a function
which should not have side-effects having side-effects nevertheless, and
returning different values from identical parameters at different times.
One could assume this would break a lot of programs (and programming
languages) out there which assume that functions without side-effects
always return the same value for the same parameters.
 

Juha Nieminen

Victor said:
I don't know what you're talking about, guys. I just took your
code, plugged it into my test project, compiled, ran, and got
'true'. Changed FP settings, changed to optimized, same thing.
Maybe Visual C++ 2008 is wrong somehow, and I am supposed to get
'false'?

Using gcc 4.3.1 on a Pentium4 (running linux) here. Without
optimizations it prints false, with optimizations it prints true. I
wouldn't have believed it if I didn't test it myself.
What compiler, what hardware? Can you post the assembly?

The unoptimized asm is rather verbose, but here it is:

.file "test.cc"
.section
..text._ZStorSt13_Ios_FmtflagsS_,"axG",@progbits,_ZStorSt13_Ios_FmtflagsS_,comdat
.weak _ZStorSt13_Ios_FmtflagsS_
.type _ZStorSt13_Ios_FmtflagsS_, @function
_ZStorSt13_Ios_FmtflagsS_:
..LFB567:
pushl %ebp
..LCFI0:
movl %esp, %ebp
..LCFI1:
movl 8(%ebp), %edx
movl 12(%ebp), %eax
orl %edx, %eax
popl %ebp
ret
..LFE567:
.size _ZStorSt13_Ios_FmtflagsS_, .-_ZStorSt13_Ios_FmtflagsS_
.section
..text._ZStoRRSt13_Ios_FmtflagsS_,"axG",@progbits,_ZStoRRSt13_Ios_FmtflagsS_,comdat
.weak _ZStoRRSt13_Ios_FmtflagsS_
.type _ZStoRRSt13_Ios_FmtflagsS_, @function
_ZStoRRSt13_Ios_FmtflagsS_:
..LFB569:
pushl %ebp
..LCFI2:
movl %esp, %ebp
..LCFI3:
subl $8, %esp
..LCFI4:
movl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call _ZStorSt13_Ios_FmtflagsS_
movl %eax, %edx
movl 8(%ebp), %eax
movl %edx, (%eax)
movl 8(%ebp), %eax
leave
ret
..LFE569:
.size _ZStoRRSt13_Ios_FmtflagsS_, .-_ZStoRRSt13_Ios_FmtflagsS_
.section
..text._ZNSt8ios_base4setfESt13_Ios_Fmtflags,"axG",@progbits,_ZNSt8ios_base4setfESt13_Ios_Fmtflags,comdat
.align 2
.weak _ZNSt8ios_base4setfESt13_Ios_Fmtflags
.type _ZNSt8ios_base4setfESt13_Ios_Fmtflags, @function
_ZNSt8ios_base4setfESt13_Ios_Fmtflags:
..LFB597:
pushl %ebp
..LCFI5:
movl %esp, %ebp
..LCFI6:
subl $24, %esp
..LCFI7:
movl 8(%ebp), %eax
movl 12(%eax), %eax
movl %eax, -4(%ebp)
movl 8(%ebp), %eax
leal 12(%eax), %edx
movl 12(%ebp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call _ZStoRRSt13_Ios_FmtflagsS_
movl -4(%ebp), %eax
leave
ret
..LFE597:
.size _ZNSt8ios_base4setfESt13_Ios_Fmtflags,
..-_ZNSt8ios_base4setfESt13_Ios_Fmtflags
.section
..text._ZSt9boolalphaRSt8ios_base,"axG",@progbits,_ZSt9boolalphaRSt8ios_base,comdat
.weak _ZSt9boolalphaRSt8ios_base
.type _ZSt9boolalphaRSt8ios_base, @function
_ZSt9boolalphaRSt8ios_base:
..LFB608:
pushl %ebp
..LCFI8:
movl %esp, %ebp
..LCFI9:
subl $8, %esp
..LCFI10:
movl $1, 4(%esp)
movl 8(%ebp), %eax
movl %eax, (%esp)
call _ZNSt8ios_base4setfESt13_Ios_Fmtflags
movl 8(%ebp), %eax
leave
ret
..LFE608:
.size _ZSt9boolalphaRSt8ios_base, .-_ZSt9boolalphaRSt8ios_base
.text
.type _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
..LFB1062:
pushl %ebp
..LCFI11:
movl %esp, %ebp
..LCFI12:
subl $24, %esp
..LCFI13:
cmpl $1, 8(%ebp)
jne .L11
cmpl $65535, 12(%ebp)
jne .L11
movl $_ZStL8__ioinit, (%esp)
call _ZNSt8ios_base4InitC1Ev
movl $_ZNSt8ios_base4InitD1Ev, %eax
movl $__dso_handle, 8(%esp)
movl $_ZStL8__ioinit, 4(%esp)
movl %eax, (%esp)
call __cxa_atexit
..L11:
leave
ret
..LFE1062:
.size _Z41__static_initialization_and_destruction_0ii,
..-_Z41__static_initialization_and_destruction_0ii
.type _GLOBAL__I__Z1fdd, @function
_GLOBAL__I__Z1fdd:
..LFB1063:
pushl %ebp
..LCFI14:
movl %esp, %ebp
..LCFI15:
subl $8, %esp
..LCFI16:
movl $65535, 4(%esp)
movl $1, (%esp)
call _Z41__static_initialization_and_destruction_0ii
leave
ret
..LFE1063:
.size _GLOBAL__I__Z1fdd, .-_GLOBAL__I__Z1fdd
.section .ctors,"aw",@progbits
.align 4
.long _GLOBAL__I__Z1fdd
.text
..globl _Z1fdd
.type _Z1fdd, @function
_Z1fdd:
..LFB1053:
pushl %ebp
..LCFI17:
movl %esp, %ebp
..LCFI18:
subl $40, %esp
..LCFI19:
movl 8(%ebp), %eax
movl %eax, -8(%ebp)
movl 12(%ebp), %eax
movl %eax, -4(%ebp)
movl 16(%ebp), %eax
movl %eax, -16(%ebp)
movl 20(%ebp), %eax
movl %eax, -12(%ebp)
fldl -8(%ebp)
fstpl (%esp)
call sin
fstpl -24(%ebp)
fldl -16(%ebp)
fstpl (%esp)
call cos
faddl -24(%ebp)
leave
ret
..LFE1053:
.size _Z1fdd, .-_Z1fdd
..globl main
.type main, @function
main:
..LFB1054:
leal 4(%esp), %ecx
..LCFI20:
andl $-16, %esp
pushl -4(%ecx)
..LCFI21:
pushl %ebp
..LCFI22:
movl %esp, %ebp
..LCFI23:
pushl %ebx
..LCFI24:
pushl %ecx
..LCFI25:
subl $48, %esp
..LCFI26:
fld1
fstpl -24(%ebp)
fld1
fstpl -16(%ebp)
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)
call _Z1fdd
fstpl -32(%ebp)
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)
call _Z1fdd
fldl -32(%ebp)
fucompp
fnstsw %ax
sahf
sete %al
setnp %dl
andl %edx, %eax
movzbl %al, %ebx
movl $_ZSt9boolalphaRSt8ios_base, 4(%esp)
movl $_ZSt4cout, (%esp)
call _ZNSolsEPFRSt8ios_baseS0_E
movl %ebx, 4(%esp)
movl %eax, (%esp)
call _ZNSolsEb
movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
movl %eax, (%esp)
call _ZNSolsEPFRSoS_E
movl $0, %eax
addl $48, %esp
popl %ecx
popl %ebx
popl %ebp
leal -4(%ecx), %esp
ret
..LFE1054:
.size main, .-main
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.weakref _ZL20__gthrw_pthread_oncePiPFvvE,pthread_once
.weakref _ZL27__gthrw_pthread_getspecificj,pthread_getspecific
.weakref _ZL27__gthrw_pthread_setspecificjPKv,pthread_setspecific
.weakref
_ZL22__gthrw_pthread_createPmPK14pthread_attr_tPFPvS3_ES3_,pthread_create
.weakref _ZL22__gthrw_pthread_cancelm,pthread_cancel
.weakref
_ZL26__gthrw_pthread_mutex_lockP15pthread_mutex_t,pthread_mutex_lock
.weakref
_ZL29__gthrw_pthread_mutex_trylockP15pthread_mutex_t,pthread_mutex_trylock
.weakref
_ZL28__gthrw_pthread_mutex_unlockP15pthread_mutex_t,pthread_mutex_unlock
.weakref
_ZL26__gthrw_pthread_mutex_initP15pthread_mutex_tPK19pthread_mutexattr_t,pthread_mutex_init
.weakref
_ZL30__gthrw_pthread_cond_broadcastP14pthread_cond_t,pthread_cond_broadcast
.weakref
_ZL25__gthrw_pthread_cond_waitP14pthread_cond_tP15pthread_mutex_t,pthread_cond_wait
.weakref _ZL26__gthrw_pthread_key_createPjPFvPvE,pthread_key_create
.weakref _ZL26__gthrw_pthread_key_deletej,pthread_key_delete
.weakref
_ZL30__gthrw_pthread_mutexattr_initP19pthread_mutexattr_t,pthread_mutexattr_init
.weakref
_ZL33__gthrw_pthread_mutexattr_settypeP19pthread_mutexattr_ti,pthread_mutexattr_settype
.weakref
_ZL33__gthrw_pthread_mutexattr_destroyP19pthread_mutexattr_t,pthread_mutexattr_destroy
.section .eh_frame,"a",@progbits
..Lframe1:
.long .LECIE1-.LSCIE1
..LSCIE1:
.long 0x0
.byte 0x1
..globl __gxx_personality_v0
.string "zP"
.uleb128 0x1
.sleb128 -4
.byte 0x8
.uleb128 0x5
.byte 0x0
.long __gxx_personality_v0
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x88
.uleb128 0x1
.align 4
..LECIE1:
..LSFDE9:
.long .LEFDE9-.LASFDE9
..LASFDE9:
.long .LASFDE9-.Lframe1
.long .LFB1062
.long .LFE1062-.LFB1062
.uleb128 0x0
.byte 0x4
.long .LCFI11-.LFB1062
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI12-.LCFI11
.byte 0xd
.uleb128 0x5
.align 4
..LEFDE9:
..LSFDE15:
.long .LEFDE15-.LASFDE15
..LASFDE15:
.long .LASFDE15-.Lframe1
.long .LFB1054
.long .LFE1054-.LFB1054
.uleb128 0x0
.byte 0x4
.long .LCFI20-.LFB1054
.byte 0xc
.uleb128 0x1
.uleb128 0x0
.byte 0x9
.uleb128 0x4
.uleb128 0x1
.byte 0x4
.long .LCFI21-.LCFI20
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x4
.long .LCFI22-.LCFI21
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI23-.LCFI22
.byte 0xd
.uleb128 0x5
.byte 0x4
.long .LCFI25-.LCFI23
.byte 0x84
.uleb128 0x4
.byte 0x83
.uleb128 0x3
.align 4
..LEFDE15:
.ident "GCC: (SUSE Linux) 4.3.1 20080507 (prerelease) [gcc-4_3-branch
revision 135036]"
.section .note.GNU-stack,"",@progbits
 

Thomas J. Gritzan

Sorry for being off-topic, but this is directly related to the C++
problem of this thread.

Juha said:
Using gcc 4.3.1 on a Pentium4 (running linux) here. Without
optimizations it prints false, with optimizations it prints true. I
wouldn't have believed it if I didn't test it myself.


The unoptimized asm is rather verbose, but here it is:

You don't need to post all the iostream stuff. Cutting down to essential
parts:

in main():
fld1
fstpl -24(%ebp)
fld1
fstpl -16(%ebp)
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)

The above puts x and y onto the stack.
call _Z1fdd
fstpl -32(%ebp)

This calls f(x,y) and stores the result into a temporary.
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)
call _Z1fdd

Again loading x and y and calling f(x,y).
fldl -32(%ebp)

This loads the first result back from the temporary, while the result of
the second call to f is still loaded.

Compares the result of the first call to the result of the second call.

So what's the difference between the two results? The first result was
stored in a temporary, the second result remained in the floating point
unit.

The x86 floating point co-processor operates on 80 bit floating point
values. Temporaries stored in memory are in 64 bit double precision
format. So by storing the first result into a temporary, it was
truncated and rounded to 64 bit, and by loading it back, it was extended
to 80 bit again. The second result still is 80 bit when the two are
compared.

Sounds like BS to you, Juha? ;-)

That's why your teacher told you never to compare two floating point
values using equality.
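The write-reread that truncates the excess precision can be forced in code, too. A minimal sketch (illustrative, not from the thread; -ffloat-store achieves the same thing globally):

```cpp
#include <cmath>

// Force a value through a 64-bit double in memory. The volatile
// qualifier stops the compiler from keeping the value in an 80-bit
// x87 register across the store, so excess precision is discarded.
double truncate_to_double(double x) {
    volatile double stored = x;
    return stored;
}

// Compare two doubles after both have been rounded to memory
// precision, so neither side carries hidden extra bits.
bool reliably_equal(double a, double b) {
    return truncate_to_double(a) == truncate_to_double(b);
}
```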
 

Juha Nieminen

Thomas said:
Sounds like BS to you, Juha? ;-)

Sounds like a compiler bug to me. Especially the fact that compiling
with optimizations results in different behavior than without
optimizations is rather telling.

On the other hand, I suppose this is, once again, one of those things
where compilers are free to do whatever they want, so even if one call
to f() results in -2.1 and a second call results in pi, that would still
be "allowed". After all, does the standard impose any requirements on
the correctness of floating point calculations?

I actually find it rather ironic that it gives the incorrect answer
without optimizations and the correct answer with optimizations. One
would think that, if it makes any difference, it would be the optimized
version which cuts corners at the cost of accuracy, and the unoptimized
version would do all steps in the same way (because speed is not, after
all, a concern).
 

Philip Pemberton

And what happens if you use vector<Pixel> instead? what happens with
other containers?

Segfault. :(
Also, what does your comparison functor look like? Maybe, there are some
subtle differences there.

There isn't a comparison functor. std::sort is using the overloaded
operator< in the Pixel class.

I did try refactoring the code to use a functor instead, and got the same
result (segfault).

Thanks,
Phil.
 

Kai-Uwe Bux

Thomas said:
Sorry for being off-topic, but this is directly related to the C++
problem of this thread.

No, the outcome is implementation defined. Arguably, your implementation has
the _better_ result.

You don't need to post all the iostream stuff. Cutting down to essential
parts:

in main():

The above puts x and y onto the stack.


This calls f(x,y) and stores the result into a temporary.


Again loading x and y and calling f(x,y).


This loads the first result back from the temporary, while the result of
the second call to f is still loaded.


Compares the result of the first call to the result of the second call.

Great explanation! Thanks.
So whats the difference between the two results? The first result was
stored in a temporary, the second result remained in the floating point
unit.

The x86 floating point co-processor operates on 80 bit floating point
values. Temporaries stored in memory are in 64 bit double precision
format. So by storing the first result into a temporary, it was
truncated and rounded to 64 bit, and by loading it back, it was extended
to 80 bit again. The second result still is 80 bit when the two are
compared.

Yup, and that's ok with the standard by [5/10].
Sounds like BS to you, Juha? ;-)

That's why your teacher told you never to compare two floating point
values using equality.

Actually, _that_ is not the reason I heard. I thought equality is very
likely not what you want anyway because of what the floating point numbers
in your program represent.

However, it is true that the excess precision in registers can screw up
comparisons. But it can be prevented by forcing a write-reread cycle. The
program

#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

bool equal ( double const & lhs, double const & rhs ) {
return ( lhs == rhs );
}

int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< equal( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
}

prints true. I am not certain whether the standard requires that or not, but
since the OP also uses gcc on an intel platform, the example pertains to
the behavior he observes.

BTW:

#include <iostream>
#include <iomanip>
#include <cmath>
#include <functional>

using namespace std;

int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< std::equal_to< double >()( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
}

prints "true", as well: the std::equal_to() also forces a write-reread
cycle. The OP therefore could test whether this is the problem by changing
the operator< to:

bool operator<(const Pixel &rhs) const {
return std::less< double >()( get_luminance(), rhs.get_luminance() );
}


Best

Kai-Uwe Bux
 

Kai-Uwe Bux

Philip said:
Segfault. :(


There isn't a comparison functor. std::sort is using the "operator<"
overloaded operator in the Pixel class.

Huh? If you are using std::vector< Pixel* >, then there _must_ be a
comparison functor since otherwise you are using

std::less< Pixel* >

which will compare the addresses of the Pixels and not their luminances.
You would need the functor to forward the comparison from the pointer to
the pointee.

I did try refactoring the code to use a functor instead, and got the same
result (segfault).

Out of curiosity, I suggested elsethread the following implementation for
Pixel::operator<:

bool operator<(const Pixel &rhs) const {
return std::less< double >()( get_luminance(), rhs.get_luminance() );
}

Could you try that?


Best

Kai-Uwe Bux
 

James Kanze

I'm sorry, but this sounds like BS to me.

Regretfully, it's not.
Care to give an example where the exact *same* equation (using
the exact same, unmodified variables) can give *different*
results from evaluation to evaluation?

It depends on the machine. The standard explicitly allows
expressions to be evaluated with greater precision than
requested, and many compilers for Intel processors will
systematically use long double for all intermediate results.

The standard does require that if the value is assigned to a
double, or used to initialize a double, it should be truncated
to double. G++ (and probably other compilers as well) ignores
this, at least by default---if I'm not mistaken, it returns
doubles in a floating point register, *without* truncating the
precision to double. So if you write something like:

f( a, b, c ) == f( a, b, c )

(where a, b and c are doubles, and f simply returns some
arbitrary expression), it's entirely possible that one of the
return values will be spilled to memory (and thus truncated to
double), and the other not. The results of the function are the
same in both cases, but one has been truncated to 52 bits
precision, and the other maintains 64 bits precision. Unless
the results are exactly representable in 52 bits, the above won't
compare equal.
Also, care to explain the process by which this happens?
The only way this would happen is if the FPU hardware (or
whichever chip is doing the FP calculations) deliberately
gives random results each time it performs the exact same
calculations. Not that this wouldn't be conceivable (eg.
random rounding of the last bit), but I have never heard of
any hardware doing that. Would it even conform to the related
IEEE standard?
A third thing I would like to hear is how a broken comparison
can cause std::sort to segfault, rather than simply causing
the array to not to be properly sorted. I can't think of any
sorting algorithm which would start indexing out of boundaries
even if the comparisons return invalid results.

Quick sort will.
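For illustration, the sentinel scan inside such a quicksort might look like this minimal sketch (illustrative, not from any real library; an explicit bound is added so the overrun is reported instead of faulting):

```cpp
#include <cstddef>
#include <functional>

// Classic quicksort partition scan that trusts the pivot as a
// sentinel: "left must stop at the pivot, so no bounds check is
// needed". With a comparator that is not a strict weak ordering
// (e.g. one that always returns true), less(data[left], pivot)
// never becomes false and the scan walks off the end of the array.
// Real sentinel code omits the "left < n" guard used here.
template <typename T, typename Less>
std::size_t sentinel_scan(const T* data, std::size_t n,
                          const T& pivot, Less less) {
    std::size_t left = 0;
    while (left < n && less(data[left], pivot))
        ++left;
    return left;  // == n means the unguarded scan would have overrun
}

// A broken comparator: always "less than", so no strict weak ordering.
struct AlwaysLess {
    bool operator()(int, int) const { return true; }
};
```

With std::less<int> the scan stops at the pivot's position; with AlwaysLess it reaches the end of the array, which is exactly the out-of-bounds walk a real unguarded implementation would turn into a segfault.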
 

James Kanze

That might be because you snipped the context.
To restore the context, let me reproduce the operator< in question:
double get_luminance() const {
return (r * 255.0 * 0.212671)
+ (g * 255.0 * 0.715160)
+ (b * 255.0 * 0.072169);
}
bool operator<(const Pixel &rhs) const {
return (get_luminance() < rhs.get_luminance());
}
This is implicitly used in a call to sort().
What is in question is whether this comparison predicate will
define a strict weak order.
Well, I just had this program:
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}
and the output was "false".
The program above prints "false" on my machine because of
clause [5/10]. Whether the two subexpressions "sin(x)+cos(y)"
qualify as the "exact *same* equation" I don't know. But
something like this _could_ cause the operator< to yield
inconsistent answers.

Paragraph [5/10] only applies within expressions. When the
value is used to initialize an object (e.g. the return value),
the standard requires it to be truncated to the correct
precision. Some compilers don't do this, at least by default.
Rounding of last bits can occur between registers in the CPU
and floating point objects in memory. A write-reread can
therefore change the value. Clause [5/10] gives a lot of
lenience in this way.

But only within an expression. Technically, the return
statement initializes a variable of the given type, so the
excess precision must be lost at this point. (Formally, Juha's
suggestion concerning some random element in rounding is
conformant. Like Juha, I don't know of a machine where this
actually happens, however, and it wouldn't be IEEE conformant. I'm
pretty sure that we're dealing with a lack of conformity here,
however, and that extended precision is being carried into places
where it isn't allowed.)
Hm, I can think of partition exchange implementations that
would use the pivot as a sentinel value to find the left-most
index for which
*left < pivot
is false and the right-most index for which
pivot < *right
is false. One would expect that the two expressions _must_
become false eventually since the pivot is part of the
sequence. But if operator< is broken, then left could be
incremented beyond the bound of the sequence.
Let's see what the implementation on my machine does:
#include <vector>
#include <algorithm>
struct A {
    bool operator< ( A const & rhs ) const {
        return true;
    }
};

int main ( void ) {
    std::vector< A > av;
    for ( int n = 0; n < 32; ++n ) {
        av.push_back( A() );
    }
    std::sort( av.begin(), av.end() );
}
This actually seems to run forever, but it does not segfault.

Yes. But as I'm sure you know, this doesn't prove anything. It
all depends on the values, where they're situated in the array,
and some of the finer details of how the partition is actually
implemented. You might access outside array bounds; you might
run forever; you might return with the array incorrectly sorted,
or it might even work.

We've certainly seen examples here before where an incorrect
comparison algorithm resulted in core dumps.
 
J

James Kanze

Juha said:
Kai-Uwe Bux said:
Well, I just had this program:
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}
and the output was "false".
I honestly can't understand what's happening there. Even if
it's changed to this:
#include <iostream>
#include <iomanip>
#include <cmath>
double f(double x, double y)
{
return std::sin(x) + std::cos(y);
}
int main ( void ) {
double x = 1;
double y = 1;
std::cout << std::boolalpha
<< ( f(x, y) == f(x, y) ) << std::endl;
}
it still prints false. [..]
I don't know what you're talking about, guys. I just took
your code, plugged it into my test project, compiled, ran, and
got 'true'. Changed FP settings, changed to optimized, same
thing. Maybe Visual C++ 2008 is wrong somehow, and I am
supposed to get 'false'?
What compiler, what hardware? Can you post the assembly?

You're supposed to get true, at least if the floating point
hardware doesn't do any random rounding. (I think we all agree
that if you change the rounding mode of the FPU between the
calls, all bets are off.) This is a well known issue with g++
(I'd call it a bug, but it is intentional), which optimizes more
than is allowed by the standard.

Note that if you change the code to manually inline the
functions (and thus eliminate the initialization of the double
return value), the standard does allow the results not to
compare equal. Curiously enough, with g++, it's more likely to
work in the case where it's not guaranteed; unless the
expression requires enough temporaries to spill to memory,
everything will be done with 64 bits precision, and you'll get
the wrong results. The problem here is that the compiler cannot
see the register use in the function, so after the first call,
it must assume that all floating point registers are used, and
spill those results to memory. It then (incorrectly) compares
the 64 bit results with the truncated results.

On an Intel, and probably on other processors which do all
floating point arithmetic with extended precision by default.
On a Sparc (which doesn't have a 64 bit precision double), there
is no problem.
 
J

James Kanze

Sounds like a compiler bug to me. Especially the fact that
compiling with optimizations results in different behavior
than without optimizations is rather telling.
On the other hand, I suppose this is, once again, one of those
things where compilers are free to do whatever they want, so
even if one call to f() results in -2.1 and a second call
results in pi, that would still be "allowed". After all, does
the standard impose any requirements on the correctness of
floating point calculations?

Actually, in this particular case, it is a case of g++ not being
conformant. The problem is still present in general; Kai-Uwe cited
the passage which creates the problem.
I actually find it rather ironic that it gives the incorrect
answer without optimizations and the correct answer with
optimizations. One would think that, if it makes any
difference, it would be the optimized version which cuts
corners at the cost of accuracy, and the unoptimized version
would do all steps in the same way (because speed is not,
after all, a concern).

The difference is probably due to the fact that, with optimizing,
if the function is defined in the same translation unit, g++
will inline it, thus seeing that it has registers to spare, and
doesn't need to store the results of the first call in memory.
In other words, it cuts the same corner both times, resulting in
the same formally wrong value being used on both sides of the
equation. A case of two wrongs making a right.
 
K

Kai-Uwe Bux

James said:
Juha said:
Kai-Uwe Bux wrote:
Well, I just had this program:
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
int main ( void ) {
double x = 1;
double y = 1;
cout
<< boolalpha
<< ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}
and the output was "false".
I honestly can't understand what's happening there. Even if
it's changed to this:
#include <iostream>
#include <iomanip>
#include <cmath>
double f(double x, double y)
{
return std::sin(x) + std::cos(y);
}
int main ( void ) {
double x = 1;
double y = 1;
std::cout << std::boolalpha
<< ( f(x, y) == f(x, y) ) << std::endl;
}
it still prints false. [..]
I don't know what you're talking about, guys. I just took
your code, plugged it into my test project, compiled, ran, and
got 'true'. Changed FP settings, changed to optimized, same
thing. Maybe Visual C++ 2008 is wrong somehow, and I am
supposed to get 'false'?
What compiler, what hardware? Can you post the assembly?

You're supposed to get true, at least if the floating point
hardware doesn't do any random rounding. (I think we all agree
that if you change the rounding mode of the FPU between the
calls, all bets are off.) This is a well known issue with g++
(I'd call it a bug, but it is intentional), which optimizes more
than is allowed by the standard.

I don't see how to prove from the standard that the code should
yield "true". For

... << ( sin(x)+cos(y) == sin(x)+cos(y) ) ...

it is clear that [5/10] gives the permission to print "false". For

... << ( f(x, y) == f(x, y) ) ...

I am not sure. What I don't see is whether return value optimization may
allow the implementation to _not_ truncate precision.

Another observation:

... << std::equal_to< double >()( sin(x)+cos(y), sin(x)+cos(y) ) ...

prints "true". This would be in line with RVO being the issue, since
passing values into a function initializes the double parameter
objects, which is not subject to the extended-precision latitude the
standard allows within expressions.

[snip]


Best

Kai-Uwe
 
