32/64 bit cc differences


Eric Sosman

JohnF said:
I'm getting a tiny-cum-microscopic, but nevertheless fatal,
difference in the behavior of the exact same C code compiled
on one 64-bit linux machine...

I concur with Ben Bacarisse's assessment of the code's
readability; my head hurts, too! But I toughed it out with
Tylenol long enough to notice one possible source of trouble:
Your pseudo-random number generator produces `float' values.
Even if the internal mechanics of the generator are accurately
reproduced on all systems, the actual values used in subsequent
computations might not be. The C Standard gives implementations
some freedom in floating-point calculation, in particular:

"Except for assignment and cast (which remove all extra
range and precision), the values yielded by operators with
floating operands and values subject to the usual arithmetic
conversions and of floating constants are evaluated to a
format whose range and precision may be greater than
required by the type. [...]" -- 5.2.4.2.2p9

So: The computations involving the `float' numbers you generate
might come out differently on different machines, even if you
manage to generate exactly the same `float' numbers. One machine
could use `float' precision, another could use `double', yet
another might use `long double' or even some hardware-internal
precision not directly accessible from C. The upshot is that a
low-order bit might round differently every now and again. (And
because of the way your program operates, there's a decent chance
that the damage could affect only a single block and leave any
subsequent blocks intact.)

I'm not saying this *does* happen, only that it might. Unless
you find some other smoking gun -- or even if you do! -- I'd suggest
eliminating all floating-point calculation from the program. Use
a purely-integer PRNG, and use purely-integer arithmetic on what
it generates.
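
For illustration, here is a minimal sketch of that suggestion. It is not
the program's actual ran1() or permutation code; it uses a hypothetical
32-bit xorshift generator and draws each index with an integer modulo,
so no floating point is involved anywhere:

  #include <stdio.h>
  #include <stdint.h>

  /* hypothetical integer PRNG; any deterministic integer generator will do */
  static uint32_t prng_state = 1u;

  static uint32_t xorshift32(void)
  {
      uint32_t x = prng_state;
      x ^= x << 13;
      x ^= x >> 17;
      x ^= x << 5;
      return prng_state = x;
  }

  /* fill p[0..n-1] with a pseudo-random permutation of 1..n, integers only */
  static void int_permutation(int *p, int n)
  {
      int i;
      for (i = 0; i < n; i++)
          p[i] = i + 1;
      for (i = 0; i < n - 1; i++) {
          int j = i + (int)(xorshift32() % (uint32_t)(n - i)); /* i..n-1 */
          int t = p[i]; p[i] = p[j]; p[j] = t;
      }
  }

  int main(void)
  {
      int p[16], i;
      int_permutation(p, 16);
      for (i = 0; i < 16; i++)
          printf("%d ", p[i]);
      putchar('\n');
      return 0;
  }

The modulo has a tiny bias for large n, but every run on every conforming
implementation produces bit-for-bit the same permutation, which is the
property that matters here.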
 

Mikko Rauhala

I concur with Ben Bacarisse's assessment of the code's
readability; my head hurts, too! But I toughed it out with
Tylenol long enough to notice one possible source of trouble:
Your pseudo-random number generator produces `float' values.

Good catch. This is definitely something that might cause the
problem. To add some detail, the x87 FPU on x86 uses 80-bit extended
precision internally, while x86-64 compilers have essentially retired
that old FPU interface (it still exists in long mode, but isn't used by
default). x86-64 compilers use SSE2 as the baseline for float
functionality these days, and _there_ each operation is rounded to its
own type (32-bit float or 64-bit double), with no extended-precision
intermediates.
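
To make the failure mode concrete, here is a small self-contained demo
(not the OP's code; the generator is a throwaway LCG). It compares the
same product rounded to float against the product kept in double, and
prints the cases where truncation to int gives different answers: the
same one-count difference you get when one platform truncates a
float-rounded product and another truncates a wider intermediate. (On an
old x87 build you may need -ffloat-store or -mfpmath=sse for the float
variable to actually be rounded to float.)

  #include <stdio.h>

  int main(void)
  {
      unsigned seed = 12345u;
      long trial;
      int found = 0;

      for (trial = 0; trial < 10000000L && found < 5; trial++) {
          seed = seed * 1103515245u + 12345u;                  /* throwaway LCG */
          float  r  = (float)((seed >> 8) & 0xFFFFFFu) / 16777216.0f; /* [0,1) */
          int    n  = 1 + (int)(seed % 29616u);                /* 1..29616      */
          float  pf = (float)n * r;          /* product rounded to float        */
          double pd = (double)n * (double)r; /* same product, exact in double   */

          if ((int)pf != (int)pd) {
              printf("n=%5d  r=%.9g  (int)float=%5d  (int)double=%5d\n",
                     n, r, (int)pf, (int)pd);
              found++;
          }
      }
      return 0;
  }

Each line it prints is a case where the exact product lies just below an
integer but rounds up to it in float, which is precisely the one-lower /
one-higher pattern that shows up in the diffs later in the thread.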
 

glen herrmannsfeldt

I concur with Ben Bacarisse's assessment of the code's
readability; my head hurts, too! But I toughed it out with
Tylenol long enough to notice one possible source of trouble:
Your pseudo-random number generator produces `float' values.
Even if the internal mechanics of the generator are accurately
reproduced on all systems, the actual values used in subsequent
computations might not be. The C Standard gives implementations
some freedom in floating-point calculation, in particular:

I hadn't looked at it at all before my post. Yes, don't use float
if you want predictable results. Knuth has written that floating
point shouldn't be used for financial calculations or for typesetting.
Seems to me that it shouldn't be used for encryption, either.

The other thing I thought about, which comes up just often enough,
is bit shifts greater than or equal to the word size.

Many machines treat shift modulo the width of the word, such
that shifting a 32 bit int 32 bits doesn't shift at all.
On a 64 bit system, with a 64 bit int, it would shift.

-- glen
 

Keith Thompson

glen herrmannsfeldt said:
Many machines treat shift modulo the width of the word, such
that shifting a 32 bit int 32 bits doesn't shift at all.
On a 64 bit system, with a 64 bit int, it would shift.

Which is why such shifts have undefined behavior (depending on the width
of the left operand, not necessarily on the width of a "word"):

N1570 6.5.7p3:
The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand.
If the value of the right operand is negative or is greater
than or equal to the width of the promoted left operand, the
behavior is undefined.
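
A tiny illustration of the trap (hypothetical values, nothing to do with
the OP's code). The undefined shift is left commented out; the two
well-defined alternatives below it are what portable code has to do when
the count can reach the width:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint32_t x = 1u;
      unsigned s = 32u;   /* count equal to the width of the promoted operand */

      /* x << s;   undefined: on x86 the hardware masks the count to 5 bits,
                   so it often "shifts by 0", but the compiler may do anything */

      uint64_t wide = (uint64_t)x << s;           /* defined: width is 64 now */
      printf("%llu\n", (unsigned long long)wide); /* 4294967296               */

      uint32_t y = (s >= 32u) ? 0u : (x << s);    /* defined: full-width count
                                                     handled explicitly       */
      printf("%u\n", (unsigned)y);                /* 0                        */

      return 0;
  }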
 

Ben Bacarisse

Mikko Rauhala said:
Good catch. This is definitely something that might cause the
problem.

Yes, good spot. It might not be a problem though because the program
scatters "random" data about which is then filtered out on decryption.
FP differences will just add another dimension to the "randomness".
However, if the RNG is also used for deterministic but chaotic data in
the heart of the algorithm then, yes, there will be trouble ahead.

<snip>
 

glen herrmannsfeldt

Which is why such shifts have undefined behavior (depending on the width
of the left operand, not necessarily on the width of a "word"):

If you define "word" as the size of the left operand, then it is word,
but yes. Well, after the appropriate promotions.
N1570 6.5.7p3:
The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand.
If the value of the right operand is negative or is greater
than or equal to the width of the promoted left operand, the
behavior is undefined.

In the development of the Hercules IBM mainframe emulator, when
OS/360 was almost, but not quite, running, I remembered that S/360
does shifts (32 and 64 bit) modulo 64. As best I remember, without
looking at the code, I figured out that that might be the problem.

-- glen

 

Kaz Kylheku

If you define "word" as the size of the left operand, then it is word,
but yes. Well, after the appropriate promotions.


In the development of the Hercules IBM mainframe emulator, when
OS/360 was almost, but not quite, running, I remembered that S/360
does shifts (32 and 64 bit) modulo 64. As best I remember, without
looking at the code, I figured out that that might be the problem.

Other undefined behaviors can happen "modulo". For instance, suppose that a
structure contains some small array near the beginning, so that statically
addressed entries in this array, like foo.a[3], can be addressed using some
displaced addressing mode where the displacement is a small field in the
instruction opcode, say 6 bits wide or whatever. Then foo.a[257] could be
reduced to foo.a[1] (modulo 64): the displacement just gets stuffed into the
opcode and truncated. "Undefined behavior" allows that.
 

JohnF

James Kuyper said:
The random numbers in your program make it difficult to debug, because
the behavior can be different each time it's run.

Not really. Seeds are generated from hash-like functions of the user's
input key(s). How could it later decrypt what it had previously
encrypted if seeds were generated differently every run?
But I did find the problem (see subsequent followup),
and you're closer than you might think, based on your preceding remark.
 

JohnF

BartC said:
How big are int, long and long long on your machines?

Yeah, I suppose I should put a verbose/debugging printf
for sizeof(this),sizeof(that),sizeof(the_other_thing),etc,
just to easily and immediately see what's going on in
every environment I run it on.
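
Something like this little diagnostic (hypothetical, not part of fm.c)
answers that question immediately on each platform; the casts keep the
printf portable to pre-C99 compilers:

  #include <stdio.h>
  #include <stddef.h>

  int main(void)
  {
      printf("int=%lu long=%lu long long=%lu void*=%lu size_t=%lu\n",
             (unsigned long)sizeof(int),
             (unsigned long)sizeof(long),
             (unsigned long)sizeof(long long),
             (unsigned long)sizeof(void *),
             (unsigned long)sizeof(size_t));
      return 0;
  }

A typical ILP32 build prints int=4 long=4 long long=8 void*=4 size_t=4,
while x86-64 Linux (I32LP64) prints int=4 long=8 long long=8 void*=8
size_t=8, and 64-bit Windows keeps long at 4.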
BartC said:
I seem to remember that gcc 'long' was 32-bit under Windows, and 64-bit
under Linux. (If you expect long to be 64-bits, and ever want to run under
Windows, you might want to upgrade the type.)

(I've looked at your code; I've no idea what the problem might be, but if
you're porting from an int=32, long=64 implementation to an int=64, long=64
one, I'm surprised you haven't got bigger differences. Certainly my own code
would have a load of problems!)

The code's supposed to be portable, and not care as long as int>=32 bits.
The one place it needed strictly more than 32 bits I used long long
(despite obnoxious -Wall warnings about it). Anyway, I found the problem,
explained in a subsequent followup; it's kind of along the lines you're
suggesting, but a rounding problem.
 

Werner Wenzel

On 11.01.2014 01:06, Keith Thompson wrote:
N1570 6.5.7p3:

Please tell a non-native speaker what "p" in "6.5.7p3" stands for:

"point"? "phrase"?

Assuming that it refers to the number in the margin, I would call it
"margin number", "marginal number", or "recital".

Thanks in advance.
 

JohnF

Stephen Sprunk said:
Are all of those systems ILP32, or is the Alpha one ILP64? (I've never
used OpenVMS.) Have you tried it on Win64, which is IL32LLP64, or just
Win32?

Linux/x86-64 is I32LP64, so if that's the first such system you've
ported to, that may indicate where the problem lies. If your code works
on ILP64 systems, type problems seem unlikely, but you could still have
int/long mismatches that wouldn't show up there--or on IL32LLP64.

Also, you're using three different versions of GCC; it's possible that
one of them has a bug (or "feature") that's triggered by some UB in your
code and results in slight output changes. I'd highly suggest using the
same version on all your systems, if possible, to eliminate that as a
potential source of differences.

Aside: Why roll your own encryption algorithm rather than use a proven,
off-the-shelf algorithm, e.g. AES? Depending on OpenSSL's libcrypto is
pretty standard these days; no sense reinventing the square wheel.

Aw, c'mon Stephen, everybody rolls their own (oh, do you mean programs?:).
Though note that the program's forkosh.com/fm.html page (copy included
in forkosh.com/fm.zip distribution) says exactly what you're saying --
lots of existing stuff already does what it does. And fm.html provides
several links to lists of encryption utilities, i.e., a list of lists
of existing encryption programs. So, yeah, obviously, lots of stuff
already out there.
 

JohnF

Aleksandar Kuktin said:
Wait, I don't understand this.
Correct me if I'm wrong:
1. same input file;
2. output of 32-bit executable and of 64-bit executable are bit-for-bit
identical;
3. output of executable, when decrypted by same executable that outputted
it, is bit-for-bit identical to the input;
4. output of 32-bit executable, when decrypted on 64-bit produces wrong
output;
5. output of 64-bit executable, when decrypted on 32-bit produces wrong
output.
Obviously, this can not be.

Your #2 is wrong, as explicitly stated in the last parenthetical
"btw" sentence you quoted from me. Correct that, and it now can be.
In fact, it is.
 

ais523

Werner said:
On 11.01.2014 01:06, Keith Thompson wrote:

Please tell a non-native speaker what "p" in "6.5.7p3" stands for:

"point"? "phrase"?

Assuming that it refers to the number in the margin, I would call it
"margin number", "marginal number", or "recital".

"paragraph".
 

luser- -droog

On 11.01.2014 01:06, Keith Thompson wrote:


Please tell a non-native speaker what "p" in "6.5.7p3" stands for:

"point"? "phrase"?

Assuming that it refers to the number in the margin, I would call it
"margin number", "marginal number", or "recital".

I believe it stands for "paragraph".
 

JohnF

Eric Sosman said:
I concur with Ben Bacarisse's assessment of the code's
readability; my head hurts, too!

Sorry about that, too (in addition to apology to Ben).
If either of you guys are ever in NYC, I'll buy you both
a few beers. And Eric gets an extra -- you pretty much
nailed the problem, which was more precisely a rounding
(actually, intentional truncating) difference when
I generated permutations (see below).
But I toughed it out with
Tylenol long enough to notice one possible source of trouble:
Your pseudo-random number generator produces `float' values.
Even if the internal mechanics of the generator are accurately
reproduced on all systems, the actual values used in subsequent
computations might not be. The C Standard gives implementations
some freedom in floating-point calculation, in particular:

"Except for assignment and cast (which remove all extra
range and precision), the values yielded by operators with
floating operands and values subject to the usual arithmetic
conversions and of floating constants are evaluated to a
format whose range and precision may be greater than
required by the type. [...]" -- 5.2.4.2.2p9

So: The computations involving the `float' numbers you generate
might come out differently on different machines, even if you
manage to generate exactly the same `float' numbers. One machine
could use `float' precision, another could use `double', yet
another might use `long double' or even some hardware-internal
precision not directly accessible from C. The upshot is that a
low-order bit might round differently every now and again. (And
because of the way your program operates, there's a decent chance
that the damage could affect only a single block and leave any
subsequent blocks intact.)

I'm not saying this *does* happen, only that it might. Unless
you find some other smoking gun -- or even if you do! -- I'd suggest
eliminating all floating-point calculation from the program. Use
a purely-integer PRNG, and use purely-integer arithmetic on what
it generates.

Well, go ahead and say it. Not only might it happen,
it pretty much does...

I first fixed up Ben's original guess about bitwise operations
on signed ints, but that was no help. So I turned off xor/xnor
operations for testing, but the problem remained with just
bit-permutation encryption. Then I turned off permutations,
leaving xor/xnor, and the problem disappeared.
And that was the beginning of the end for that bug.
I'd originally dismissed permutation problems,
thinking they would mess up everything or nothing at all -- once
any random number changes, the seed gets changed, and then
the entire subsequent sequence changes. And that reasoning
was right: the random numbers were being generated identically.
But truncating a floating 0.0-1.0 value to get an integer index
was behaving differently -- in about one of every 5000 cases
in the test below.

Some new debugging output in the permutation() func now displays the
entire sequence of random ints used to generate a permutation.
In the test case, it was called for n=29616, i.e., a random
permutation of 1,2,3,...,29616. It initializes p[i]=i+1 and
then generates random swaps in a loop, swapping p[i]<-->p[j],
where i=0...n-1 and
int j = i + (int)(((float)(n-i))*ran1(stream,seed)); /*random i...n-1*/

I printf'ed all j's, redirected output to files, and diffed.
Diff output reproduced below: 6 numbers different out of 29616.
The top diff line of each pair is from the 32-bit run, the bottom line
from the 64-bit run. It's got to be the intentional truncating in the
above j=i+... line that's very occasionally behaving differently on 32-
vs 64-bit, when ran1() returns 0.999... Note that the int from the
32-bit run always has a one-lower value.
And that's what allows the occasional permutation error
without messing up the entire sequence. That never occurred to me,
so I paid very little attention to the permutation part of the code.
Thanks for all the help. Fyi, those diffs are...
271c271
< 15314 3090 5212 6467 9922 25042 15143 7413 25123 9071
---
> 15314 3090 5212 6467 9922 25042 15143 7414 25123 9071
322c322
< 20092 22664 21806 14761 4275 13431 8709 27394 19911 8583
---
> 20092 22664 21806 14761 4275 13431 8709 27395 19911 8583
449c449
< 17749 23068 27147 6689 25466 24802 7244 8462 7406 23700
---
> 17749 23068 27147 6689 25466 24802 7244 8462 7406 23701
1155c1155
< 25740 19807 16586 13952 25819 13810 27766 18252 15360 22940
---
> 25740 19807 16586 13952 25819 13810 27767 18252 15360 22940
1476c1476
< 21053 29163 28366 15367 21197 22616 17552 26731 20982 22816
---
> 21053 29163 28366 15367 21197 22616 17553 26731 20982 22816
1921c1921
< 22473 20864 29287 22011 19527 28185 22095 24022 22117 21598
---
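
For what it's worth, here is a sketch of a change to that one line that
should make the truncation consistent, assuming ran1() itself returns
bit-identical floats everywhere: round the product to float explicitly
before truncating, since per 5.2.4.2.2p9 an assignment (or a cast to
float) discards any extra precision. One caveat: old x87 builds of gcc
are known to need -ffloat-store (or -mfpmath=sse) before they honor that
reliably, and the all-integer approach Eric suggested remains the more
robust fix.

  float t = ((float)(n-i)) * ran1(stream,seed);  /* rounded to float here  */
  int   j = i + (int)t;                          /* then truncated, same on
                                                    both 32- and 64-bit    */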
 

Jorgen Grahn

Ben Bacarisse said:
Well, it's only about 1100 lines in one file, and about 740 that are not
comment lines. Unfortunately it has inconsistent (and odd) spacing and
indentation. It made my head hurt reading it!

There are signs that it's not written by someone who knows C well --
some odd idioms,

And a unique layout. I've not seen anything like it before. A FORTRAN
influence perhaps?
unnecessary casts, not using prototype declarations,

That last one is a good way of making code break on 64-bit Linux. If the
compiler believes all function arguments are int, passing a long or
unsigned long will work on x86 because all the types involved are 32 bits.
In contrast, on x86_64 an int is 32 bits and a long is 64 bits (see the
sketch below).
using signed types for bit operations... These things mean it's going
to be harder work than one might hope. You'll need to decide if the
pay-off is worth it.

Especially since free, standardized and widely used encryption
software is available.

/Jorgen
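
To make the missing-prototype point concrete, here is a minimal two-file
sketch (hypothetical names, nothing from fm.c). The caller sees only an
old-style declaration, so the argument is passed as an int while the
callee reads a long:

  /* twice.c */
  long twice(long x) { return 2 * x; }

  /* main.c */
  #include <stdio.h>

  long twice();   /* non-prototype declaration: parameter types unchecked */

  int main(void)
  {
      /* 5 is passed as an int.  On an ILP32 system int and long are both
         32 bits, so this happens to work.  On I32LP64 (x86-64 Linux) the
         callee reads a 64-bit long: the behavior is undefined, and the
         upper half of the argument may be garbage. */
      printf("%ld\n", twice(5));
      return 0;
  }

With a real prototype, long twice(long);, the compiler converts the
argument and the call is correct on every platform.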
 

glen herrmannsfeldt

JohnF said:
(snip)
But I toughed it out with
Tylenol long enough to notice one possible source of trouble:
Your pseudo-random number generator produces `float' values.
Even if the internal mechanics of the generator are accurately
reproduced on all systems, the actual values used in subsequent
computations might not be. The C Standard gives implementations
some freedom in floating-point calculation, in particular:
"Except for assignment and cast (which remove all extra
range and precision), the values yielded by operators with
floating operands and values subject to the usual arithmetic
conversions and of floating constants are evaluated to a
format whose range and precision may be greater than
required by the type. [...]" -- 5.2.4.2.2p9

Pretty much never use floating point when you need exact
predictability. In the documentation for TeX, Knuth notes that
no floating point is used where it affects typesetting decisions
for similar reasons. One small change can affect line splitting
which affects page splitting, and pretty soon the result is
completely different. TeX does use floating point in some
messages that it prints out, though.

(snip)
Well, go ahead and say it. Not only might it happen,
it pretty much does...
(snip)
And that was the beginning of the end for that bug.
I'd originally dismissed permutation problems,
thinking that would mess up entirely or not at all -- once
any random number changes, the seed gets changed, and then
the entire subsequent sequence changes. And that reasoning
was right, and random numbers were being generated identically.
But truncating to get an integer index from a floating 0.0-1.0
was behaving differently -- once in every 5000 cases
in test below.

It used to be that many random number generators were less random
in the low bits, and so the more obvious way of generating an
integer between 0 and N-1, taking the generated value modulo N,
wasn't recommended.

But this one looks like it shouldn't be so bad, and integer
modulo should be repeatable.

-- glen
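
For completeness, the usual integer-only recipe looks something like this
(a sketch with a throwaway LCG standing in for the real generator): take
the value modulo N, and reject the top partial bucket so the result is
unbiased. If the generator's low bits are suspect, feed it the high bits
instead (e.g. r >> 16 for small N); either way the answer is exactly
repeatable.

  #include <stdio.h>
  #include <stdint.h>

  static uint32_t state = 1u;

  static uint32_t rand32(void)               /* throwaway LCG, demo only */
  {
      return state = state * 1664525u + 1013904223u;
  }

  /* uniform integer in 0..n-1, no floating point, no modulo bias */
  static uint32_t uniform_below(uint32_t n)
  {
      uint32_t limit = (UINT32_MAX / n) * n; /* largest accepted multiple of n */
      uint32_t r;
      do {
          r = rand32();
      } while (r >= limit);
      return r % n;
  }

  int main(void)
  {
      int i;
      for (i = 0; i < 5; i++)
          printf("%u\n", (unsigned)uniform_below(29616u));
      return 0;
  }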
 

Jorgen Grahn

Especially since free, standardized and widely used encryption
software is available.

I missed the fact that the OP is also the author of the software; I
somehow got the impression that he had downloaded it.

That certainly makes a difference -- I use my own programs instead of
standard ones, too.

/Jorgen
 

James Dow Allen

One specific suggestion is to avoid all floating-point.
Use an integer-only PRNG.

James
 
