Portability Problem; Minimal Example

dgoodmaniii · Aug 5, 2010

Summary: code works correctly on x86_64; fails on i686;
both systems running the same kernel and the same operating
system. When running the code through valgrind, however, it
works on both systems. valgrind finds 0 errors in 0
contexts. A minimal example which reproduces this error is
attached to the end.

To compile in all instances: gcc -lm -o doztest doztest.c

Long story: I'm writing a base-ten to base-twelve
converter, very simple, that works fine on my x86_64 box,
running Debian GNU/Linux (stable) on a 2.6.26 kernel. It
passed all my tests; in particular, it correctly converted
this:

0.3333333333333333 --> 0;4000

When I cloned the repository and built it on my i686 box,
however, running the same Debian on the same kernel, this is
what came out:

0.3333333333333333 --> 0;3000
0.3333333333333 --> 0;3EEE

This is wrong, of course, but I can't figure out how. So I
ran the following:

valgrind -v --leak-check=full

followed by my program. It says there are 0 errors from 0
contexts; however, when run through valgrind the proper
answer comes out! That is, the code that produces this:

0.3333333333333333 --> 0;3000

when run by itself, produces this:

0.3333333333333333 --> 0;4000

when run through valgrind. I'm completely mystified. So I
isolated the code that seems to be causing the trouble,
which is the dectodoz function, which only calls two of my
own functions, and have posted it here as a minimal example.
This code reproduces the same strange results in all contexts.

#include<stdio.h>
#include<float.h>
#include<math.h>
#include<string.h>

void reverse(char *s)
{
    int i, j;
    char tmp;
    size_t length;

    length = strlen(s) - 1;
    for (i=0, j=length; i<j; ++i, --j) {
        tmp = *(s+i);
        *(s+i) = *(s+j);
        *(s+j) = tmp;
    }
}

char dozenify(char num)
{
    switch (num) {
    case 0: case 1: case 2: case 3: case 4: case 5: case 6:
    case 7: case 8: case 9:
        return (num % 10) + '0';
    case 10:
        return 'X';
    case 11:
        return 'E';
    }
}

int dectodoz(char *doznum, double decnum)
{
    int i = 0; int sign = 0; int j = 0;
    double wholedec; /* whole number portion of decnum */
    double partholder; /* someplace for modf to dump integral */

    if (decnum < 0) {
        decnum = -decnum;
        sign = 1;
    }
    partholder = modf(decnum,&wholedec);
    decnum -= wholedec;
    while (wholedec >= 12) {
        *(doznum+(i++)) = dozenify(fmod(wholedec,12.0));
        wholedec /= 12;
    }
    *(doznum+(i++)) = dozenify(fmod(wholedec,12));
    if (sign == 1)
        *(doznum+(i++)) = '-';
    *(doznum+i) = '\0';
    reverse(doznum);
    if (decnum > 0) {
        *(doznum+(i++)) = ';';
        for (i=i; i <= DBL_MAX_10_EXP; ++i) {
            *(doznum+i) = dozenify((int)(decnum * 12));
            decnum = modf(decnum*12,&partholder);
        }
        *(doznum+i) = '\0';
    }
    return 0;
}

int main(void)
{
    char doznum[2000] = "";
    double decnum = 0.33333333333333333333333333333333333;

    dectodoz(doznum,decnum);
    printf("%s\n",doznum);
    return 0;
}

Seebs · Aug 5, 2010

> followed by my program. It says there are 0 errors from 0
> contexts; however, when run through valgrind the proper
> answer comes out! That is, the code that produces this:
>
> 0.3333333333333333 --> 0;3000

Wait, 3000 or 3EEE?

More interestingly...

> when run through valgrind. I'm completely mystified. So I
> isolated the code that seems to be causing the trouble,
> which is the dectodoz function, which only calls two of my
> own functions, and have posted it here as a minimal example.
> This code reproduces the same strange results in all contexts.

When I run this, I get:

0;40000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000

(line breaks added).

This may have to do with the value of DBL_MAX_10_EXP being different
on different targets, but I remain a bit confused.
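
For context on that loop bound: DBL_MAX_10_EXP is the largest decimal exponent a double can hold (308 for IEEE 754 doubles, which both x86 targets here use), not a count of significant digits, so the loop in the original post emits several hundred digits on either machine. Below is a minimal sketch, assuming a radix-2 double, of one way to estimate how many base-twelve digits the mantissa can actually distinguish. The helper name doz_digits_for_double is made up for illustration and is not part of the original program.

```c
#include <float.h>

/* Hypothetical helper (not in the original program): returns the
 * smallest n such that 12^n >= FLT_RADIX^DBL_MANT_DIG, i.e. roughly
 * how many base-twelve digits are needed to tell apart neighbouring
 * doubles of similar magnitude. */
int doz_digits_for_double(void)
{
    double limit = 1.0;   /* becomes FLT_RADIX^DBL_MANT_DIG */
    double power = 1.0;   /* becomes 12^n */
    int n = 0;
    int k;

    for (k = 0; k < DBL_MANT_DIG; ++k)
        limit *= FLT_RADIX;
    while (power < limit) {
        power *= 12.0;
        ++n;
    }
    return n;
}
```

With a 53-bit mantissa this comes to 15, so a bound of that order would probably be a more meaningful limit for the fractional loop than 308.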

> void reverse(char *s)
> {
>     int i, j;
>     char tmp;
>     size_t length;
>
>     length = strlen(s) - 1;
>     for (i=0, j=length; i<j; ++i, --j) {
>         tmp = *(s+i);
>         *(s+i) = *(s+j);
>         *(s+j) = tmp;
>     }
> }

This seems like it should be irrelevant, because after all, it wouldn't
be able to generate E's from whole cloth.

Let's remove it, for now, since it's irrelevant (we hope) in this case.

> char dozenify(char num)
> {
>     switch (num) {
>     case 0: case 1: case 2: case 3: case 4: case 5: case 6:
>     case 7: case 8: case 9:
>         return (num % 10) + '0';
>     case 10:
>         return 'X';
>     case 11:
>         return 'E';
>     }
> }

It isn't obvious why you use 'char' num instead of, say, 'int' num here,
but it looks harmless.

> int dectodoz(char *doznum, double decnum)
> {
>     int i = 0; int sign = 0; int j = 0;
>     double wholedec; /* whole number portion of decnum */
>     double partholder; /* someplace for modf to dump integral */
>
>     if (decnum < 0) {
>         decnum = -decnum;
>         sign = 1;
>     }
>     partholder = modf(decnum,&wholedec);
>     decnum -= wholedec;

After you have done this, 'partholder' and 'decnum' are identical. I would
say drop one of them.

>     while (wholedec >= 12) {
>         *(doznum+(i++)) = dozenify(fmod(wholedec,12.0));
>         wholedec /= 12;
>     }

Okay, you're doing floating point math where you don't need to, but okay.

>     *(doznum+(i++)) = dozenify(fmod(wholedec,12));
>
>     if (sign == 1)
>         *(doznum+(i++)) = '-';
>     *(doznum+i) = '\0';
>
>     reverse(doznum);

And we can remove this "reverse" and it shouldn't matter (for this specific
case) and further simplifies the code. And in any event, it doesn't seem
that this part is producing "wrong" results.

>     if (decnum > 0) {
>         *(doznum+(i++)) = ';';
>         for (i=i; i <= DBL_MAX_10_EXP; ++i) {
>             *(doznum+i) = dozenify((int)(decnum * 12));
>             decnum = modf(decnum*12,&partholder);
>         }
>         *(doznum+i) = '\0';
>     }
>     return 0;
> }

Hmm.

Seems to me you could simplify this quite a bit. Let's assume we're not
concerned with the accuracy of the whole result, but merely with reproducing
the apparent bug.

int dectodoz(char *doznum, double decnum)
{
    int i = 0;
    double wholedec; /* whole number portion of decnum */

    decnum = modf(decnum,&wholedec);
    if (decnum > 0) {
        *(doznum++) = ';';
        for (i=0; i <= DBL_MAX_10_EXP; ++i) {
            *(doznum++) = dozenify((int)(decnum * 12));
            decnum = modf(decnum*12,&wholedec);
        }
        *(doznum++) = '\0';
    }
    return 0;
}

This seems as though it should do the same thing. We drop the unneeded
elaborate indexing into doznum. (Since it's a parameter, we're working
with a local copy of the pointer, so we can just iterate through it.)

This produces the same results, for me. We can even clean up that inner
loop a bit more:

            decnum = modf(decnum * 12, &wholedec);
            *(doznum++) = dozenify(wholedec);

And I still get the same results.

-s

Seebs · Aug 5, 2010

I messed around with this more.

        for (i=0; i <= DBL_MAX_10_EXP; ++i) {
#ifdef HAX
            printf("%f\n", decnum * 12);
#endif
            *(doznum++) = dozenify((int)(decnum * 12));
            decnum = modf(decnum*12,&wholedec);
            if (decnum == 0)
                break;
        }

On several versions of gcc I've tried, compiling -DHAX produces
4.000000
;4
and compiling without -DHAX produces
;3

So I think this is a compiler bug.

By happy coincidence, it occurs in a compiler for which I have access to an
active support arrangement. I'm filing a bug report to see what obvious
explanation I missed.

-s

Seebs · Aug 5, 2010

> I messed around with this more.

And a bit more.

>         for (i=0; i <= DBL_MAX_10_EXP; ++i) {
> #ifdef HAX
>             printf("%f\n", decnum * 12);
> #endif
>             *(doznum++) = dozenify((int)(decnum * 12));
>             decnum = modf(decnum*12,&wholedec);
>             if (decnum == 0)
>                 break;
>         }
>
> On several versions of gcc I've tried, compiling -DHAX produces
> 4.000000
> ;4
> and compiling without -DHAX produces
> ;3

Changing "decnum * 12" to "decnum" in printf undoes this change.
Printing with %.30f, "decnum" is a bunch of threes and then some smaller
digits (which I'd expect), and "decnum * 12" is EXACTLY 4.0000.

Which I did not expect.

I think someone's doing a premature optimization.

-s

dgoodmaniii · Aug 5, 2010

Thanks for the responses.

gcc 4.3.2 (Debian 4.3.2-1.1), target i486-linux-gnu, doesn't
do this on the i686 machine. Compiling with -DHAX produces
4.000000
;3
This holds true whether I've got .30 in printf or not.

> I think someone's doing a premature optimization.

Do you mean me, or the compiler? I normally wouldn't dare
ask the question except that you've mentioned you think it
might be a compiler problem. Is there something I can do
about this? E.g., set a compiler flag, use a different
compiler? Or am I stuck for now running my program only on
an AMD64 machine?

Seebs · Aug 5, 2010

> gcc 4.3.2 (Debian 4.3.2-1.1), target i486-linux-gnu, doesn't
> do this on the i686 machine. Compiling with -DHAX produces
> 4.000000
> ;3
> This holds true whether I've got .30 in printf or not.

Interesting.

> Do you mean me, or the compiler? I normally wouldn't dare
> ask the question except that you've mentioned you think it
> might be a compiler problem. Is there something I can do
> about this? E.g., set a compiler flag, use a different
> compiler? Or am I stuck for now running my program only on
> an AMD64 machine?

Answer: This is well-known.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

You should be able to make it go away with:

-msse -mfpmath=sse

because your problem is that the x87 FPU is annoying and quirky.

-s
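
One way to see which behaviour a given build has is the C99 macro FLT_EVAL_METHOD from <float.h>. What follows is a minimal sketch, assuming a C99-or-later compiler. The function names describe_eval_method and current_eval_method are invented for this illustration; only the macro itself and the meanings of its values 0, 1 and 2 come from the standard.

```c
#include <float.h>

/* Hypothetical helper: maps a FLT_EVAL_METHOD value to a description.
 * The meanings of 0, 1 and 2 are those given by C99; other values are
 * indeterminable (-1) or implementation-defined. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "operations evaluated in their own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "everything evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Reports the method for this build. FLT_EVAL_METHOD is only
 * guaranteed to exist in C99 and later, so fall back to -1. */
int current_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}
```

A value of 2 is what one would expect from an x87 build and 0 from an SSE2 build, so printing describe_eval_method(current_eval_method()) once at startup is a cheap diagnostic.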

James Dow Allen · Aug 6, 2010

> Summary: code works correctly on x86_64; fails on i686;
> this:
> 0.3333333333333333 --> 0;4000
... [changed incorrectly to this]:
> 0.3333333333333333 --> 0;3000

What is happening, quite simply, is that you have
a double quite close in value to an integer, specifically
4.000000 in your case. Some library routines seem
to treat it as having integer part 4, but when you pass
it to dozenify(char num = ~ 4.000000) dozenify sees 3.
This is a commonish peril of floating point.

The question arises: Are your compiler and library
conforming to The Standard, or are they instead doing
that which they are not allowed to do? I don't know
the answer to that question.

I *do* know this problem would never arise the way I
program. On the precept that if you want to have a
calculation always give the same result, then do that
calculation only once, I write
int int_part = f;
when I need such an integer, and use int_part thereafter.

I'll admit I program this way as much to avoid confusing
*myself* as to avoid confusing the compiler, but your
example suggests my way was right all along.

James Dow Allen
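
The precept above, applied to the fractional-digit loop from the original post, might look like the following. This is a minimal sketch, not the original author's code: the function name frac_to_doz and its parameters are made up for illustration, and it assumes 0 <= frac < 1 and a buffer of at least ndigits + 1 bytes. The product is computed once per digit, and both the digit and the remainder are taken from that one value.

```c
/* Hypothetical helper: writes up to ndigits base-twelve digits of frac
 * (assumed to satisfy 0 <= frac < 1) into out and NUL-terminates it.
 * Stops early once the remainder reaches zero. Returns the number of
 * digits written. */
int frac_to_doz(double frac, char *out, int ndigits)
{
    static const char digits[] = "0123456789XE";
    int n = 0;

    while (n < ndigits && frac > 0.0) {
        double scaled = frac * 12.0;  /* computed exactly once */
        int digit = (int)scaled;      /* integer part, taken once */

        if (digit > 11)               /* defensive; not expected for frac < 1 */
            digit = 11;
        out[n++] = digits[digit];
        frac = scaled - digit;        /* remainder from the same value */
    }
    out[n] = '\0';
    return n;
}
```

On an x87 build the compiler may still hold scaled in an extended-precision register for one use and reload a rounded copy for the other, so this narrows the room for the digit and the remainder to disagree rather than ruling it out.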

James Dow Allen · Aug 6, 2010

> ... if you want to have a
> calculation always give the same result, then do that
> calculation only once ...

BTW, here's a smaller program to exhibit the same "faulty"
behavior under gcc:

#include <stdio.h>

int main()
{
double x1, x2;

x1 = 0.33333333333333333333;
x2 = x1 * 3;
printf("%d %d\n", (int)(x1 * 3), (int)x2);
return 0;
}

James Dow Allen
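
A common workaround for this kind of discrepancy is to push the product through a stored double before truncating, so that any extended-precision excess is rounded away first. Here is a minimal sketch, assuming a compiler that honours volatile stores. The function name truncate_product is made up for illustration.

```c
/* Hypothetical helper: multiplies a by b, stores the result in a
 * volatile double so that it is rounded to double precision, then
 * truncates the stored value toward zero. */
int truncate_product(double a, double b)
{
    volatile double stored = a * b;  /* forces a store to memory */
    return (int)stored;
}
```

Routing both (int)(x1 * 3) and (int)x2 from the program above through a helper like this would put them through the same rounding step. gcc's -ffloat-store option has a broadly similar effect for variables.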

dgoodmaniii · Aug 8, 2010

> Answer: This is well-known.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
>
> You should be able to make it go away with:
>
> -msse -mfpmath=sse
>
> because your problem is that the x87 FPU is annoying and quirky.

Wow. I've heard programmers complain about the chipset, but
I've never experienced the problem. I refused to even
consider that this might be a hardware problem; I'm shocked
that it was one.

This solution didn't work; but perusing the link provided, I
included:
-march=pentium4 -msse -mfpmath=sse
and all was well. The difference is that in x86_64
chipsets, I've learned, gcc compiles for this SSE FPU by
default, while in 32-bit mode gcc compiles for x87 by
default. It seems that all "newer" processors have this SSE
FPU (in 2008), so it's probably safe. I'll have to include
a warning for those who try to use it with older x87 FPUs.

Thank you very much for puzzling this out for me. I'm
positive I never would have figured it out on my own.
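
On why -msse alone was not enough: the original SSE instructions handle only single-precision floats. Double-precision arithmetic arrived with SSE2, which -march=pentium4 enables. For the warning mentioned above, one option is a check at build time. The following is a minimal sketch, assuming gcc's predefined macros __i386__ and __SSE2_MATH__ (the latter is defined when double arithmetic is compiled for SSE2). The function name doubles_use_x87 is made up for illustration.

```c
/* Hypothetical helper: returns 1 if this translation unit was built
 * for 32-bit x86 without SSE2 floating-point math (so doubles most
 * likely go through the x87 FPU), and 0 otherwise. Relies on gcc's
 * predefined macros; other compilers may need different checks. */
int doubles_use_x87(void)
{
#if defined(__i386__) && !defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}
```

The program could call this at startup and print a note that results may differ in the last digit, or the same condition could drive a #warning so the issue shows up at compile time.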

Noob · Aug 9, 2010

dgoodmaniii said:
> -march=pentium4 -msse -mfpmath=sse
> and all was well. The difference is that in x86_64
> chipsets, I've learned, gcc compiles for this SSE FPU by
> default, while in 32-bit mode gcc compiles for x87 by
> default. It seems that all "newer" processors have this SSE
> FPU (in 2008), so it's probably safe. I'll have to include
> a warning for those who try to use it with older x87 FPUs.

FYI

http://en.wikipedia.org/wiki/SSE2#Notable_IA-32_CPUs_not_supporting_SSE2
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
