bit-twiddling on 32-bit machines

S

Sal

I recently implemented a C hashing algorithm in Perl and wasted about
8 hours of my time trying to debug it because of a stupid assumption
on my part. When doing a lot of shifts and bit-wise xor's it's easy
for 32-bit integers to overflow into double precision floating point
values. I recognized this early on and thought I was capturing only
the lowest 32 bits with ($x &= 0xffffffff). That simply does not work!
The way I got around it was with ($x %= 4294967296). In hindsight it's
easy for me now to recognize why the first not only doesn't work but
doesn't even make sense with the mixed number types. I hope this helps
someone else. Happy bit-twiddling!
 
S

Sal

perldoc -f integer

Ben

Tried it, didn't work with the integer pragma. The only way I could
get it to work properly was to allow an integer promotion to a C
double and then do the modulo arithmetic to capture the least-
significant 32 bits.
 
I

Ilya Zakharevich

Depending on what you're doing, promotion to double *should* safely give
you 53 bits of integer precision, but you may want to consider
rebuilding your perl with -Duse64bitint to avoid type conversion all
over the place. The standard FreeBSD build has been built with this
option for a while now, and there is some talk of using it for the
Debian builds of 5.12.

Having int wider than float type is a VERY large can of worms (well,
at least it was the last time I was looking; did things change
recently?). What is reported above (which is, essentially, a bug of
the implementation of `use integer') is a very small can of worms.

Does not look like a fair trade...
Ilya
 
I

Ilya Zakharevich

That's all been fixed, I think by Nick Clark during 5.8. A whole lot of
code in {pp,sv}.c is now conditional on NV_PRESERVES_UV, which is
determined by Configure.

Looks like I might have been not very specific. The can of worms I
meant is with the semantic, not implementation - things "not behaving
as expected" vs things behaving "obviously wrong".

This thread is one example of the "first category" - if one forgets
what happens with `use integer', which is obviously a bug (BTW, is it
reported yet?).

Yours,
Ilya
 
P

Peter J. Holzer

^^ .........^^
This is redundant. There is only unsigned and int.

Wrong. "integer" is not the same as "int". The C standard knows five
standard signed integer types and six standard unsigned integer types.
And "int" and "unsigned" are just abbreviations of "signed int" and
"unsigned int", respectively.

hp
 
I

Ilya Zakharevich

Can you give an example of things not behaving as expected?

I was presuming that such things happens left and right, so I did not
try to remember the problems people were reporting. But I may try to guess...

And: I do not have Perl with 64-bit IV AND 64-bit NV around, so all
this is just a conjecture: I expect the the following things to break:

rounding by int($n + 0.t);
$x + 0.5 >= $x

etc... (These are just the first things which come to my mind; real
problems are usually less "obvious"...)
Not by me. With a perl with 32bit IVs I get

~% perl -E'
my $x = (1<<30); say $x;
$x <<= 1; say $x;
$x <<= 1; say $x;'
1073741824
-2147483648
0

I presume it's the final '0' you consider a bug, not the negative value?

For bitwise operations, the negative value "behaves the same" as the
positive one, right? (At least it WAS the reason why I designed the
Perl "numeric system" the way it is...) Then I have no problem with
the negative number. And I definitely have no problem with 0.

I was under assumption that the OP's problem was that shifts were
propagating to floats, and the results (above UV_MAX) converted to
UVs were becoming UV_MAX... But maybe I misread what he wrote...

Yours,
Ilya
 
S

sln

Wrong. "integer" is not the same as "int". The C standard knows five
standard signed integer types and six standard unsigned integer types.
And "int" and "unsigned" are just abbreviations of "signed int" and
"unsigned int", respectively.

hp

Technically there are no "integer types", they are integral types.
"int" is referred to as the machine word (the natural width of the
data registers) which provide for a bit width in the range of
char <= short <= int <= long. All being unsigned.
"signed" is the shortcut here, not unsigned.

"Integers" are defined by the set of whole numbers {..,-2,-1,0,1,2,..}.
I sure would hate to look at C source code that had
"signed" in it, although some compilers have the option to
convert the default to unsigned (/J on MS), except where
"signed" is denoted.

So, we could agree that C program integral types are either
unsigned, or the default, which is signed (the synonym for the default),
and that signed int is redundant (in the default case), and is would be
more specific as a 32-bit long.

-sln
 
P

Peter J. Holzer

I was presuming that such things happens left and right, so I did not
try to remember the problems people were reporting. But I may try to guess...

And: I do not have Perl with 64-bit IV AND 64-bit NV around, so all
this is just a conjecture: I expect the the following things to break:

rounding by int($n + 0.t);

what is $n here (conceptually)? An integer? Then it doesn't make sense
to round it. A floating point value? Then adding 0.5 doesn't change the
type and whether IVs are 32 bit or 64 bit doesn't matter.
$x + 0.5 >= $x

Again, this is already "broken" (i.e., a false assumption) on large
numbers, making IVs 64 bit doesn't change that.

A better example would be where something like

$large_even_integer/2

where the result is exact in integer arithmetic but must be rounded in
FP arithmetic. However, perl (at least since 5.8.8) does the right thing
here: The result is an IV with the exact result. (how does it do this?)

hp
 
S

sln

Please read the standard.
Integer is what it is, a single whole number from the set of all whole numbers.
There is no type of integer. There are subsets, they are called integral types
in computing science.

If the phrase is loosely used, thats a shame. C/C++ standard and all references to
it use integral types as classifier of the subsets.

-sln
 
I

Ilya Zakharevich

what is $n here (conceptually)?

A number.
An integer? Then it doesn't make sense to round it.

Why? I can round 3 without any problem here...
A floating point value? Then adding 0.5 doesn't change the
type and whether IVs are 32 bit or 64 bit doesn't matter.

perldoc perlnumber
Again, this is already "broken" (i.e., a false assumption) on large
numbers, making IVs 64 bit doesn't change that.

??? Are you sure you know what you are talking about?

Yours,
Ilya
 
P

Peter J. Holzer

A number.

Not specific enough.

Why? I can round 3 without any problem here...

Because it is already an integer. Rounding it is useless since it is
guarantueed to give you the same value (unless you want to round to the
nearest integer representable as an FP value, then the expression does
what you want).


perldoc perlnumber

Please be more specific.

??? Are you sure you know what you are talking about?

Sorry, I misread the ">=" as ">".

But from what I see, ($x + 0.5 >= $x) actually seems to be true for all
numbers, because the comparison is done on the NVs.

That, however, does lead to a result which may surprise the unwary:

#!/usr/bin/perl
use warnings;
use strict;
no warnings 'portable';

use Devel::peek;

my $x1 = 18014398509481728.000000; # exact FP value
my $x2 = 18014398509481729; # exact 64 bit int value
if ($x1 >= $x2) {
print "surprise!\n";
}

hp
 
I

Ilya Zakharevich

Not specific enough.

Specific enough for me. What part of

perldoc perlnumber

do you not agree with?
Because it is already an integer. Rounding it is useless since it is
guarantueed to give you the same value (unless you want to round to the
nearest integer representable as an FP value, then the expression does
what you want).

I do not see how a correct answer may be deemed "useless". Do you
want to allow sqrt(1) to be 75 since "it does not make sense to apply
sqrt() to 1"?
But from what I see, ($x + 0.5 >= $x) actually seems to be true for all
numbers, because the comparison is done on the NVs.

So one moved dust to be under a different carpet. This comparison
"looks" true, but only due to another problem...
my $x1 = 18014398509481728.000000; # exact FP value
my $x2 = 18014398509481729; # exact 64 bit int value
if ($x1 >= $x2) {
print "surprise!\n";
}

Yes, this is what I meant by "looking true..."...

Yours,
Ilya
 
P

Peter J. Holzer

Specific enough for me.

Not for me. A number in a program isn't just a number. It represents
something. To determine which operations on that number make sense you
need to know what it represents, how accurate that represention is (and
needs to be), what properties it has. Then you can say whether a given
representation in a progrem (e.g. as a 64-bit FP number) is adequate and
whether computing int($n + 0.5) makes sense or not.
What part of

perldoc perlnumber

do you not agree with?

Who says I don't agree with it? You were vaguely referring to it in
response to a very specific claim:

I don't know why you referred to it, or which part of perldoc perlnumber
is supposed to proof that my claim is wrong, and I won't guess. Either
you refer to the specific part of perldoc perlnumber which is applicable
here or you let it rest.

I do not see how a correct answer may be deemed "useless".

Not the answer is useless, but the operation. If I have a variable $x in
my program which at some point contains an integral value, why would I
write

$y = int($x + 0.5);

instead of simply

$y = $x;

at this point? I could just as well write

$y = $x * 1;

or

$y = $x + 0;

or

$y = $x + 1E300 - 1E300;

which are similarly useless (unless executed specifically for the
side-effects caused by the representation).

Do you want to allow sqrt(1) to be 75 since "it does not make sense to
apply sqrt() to 1"?

No, but I do want to allow sin(1E100) to return an (more or less) random
number between -1 and +1, because ε(1E100) ≫ 2π. The input to the
function is useless, so I must expect the output to be useless, too.
Similarly ($n + 0.5) is only useful if the result can actually be
represented in an FP number with adequate precision. Perl doesn't save
you from thinking about stuff like that. Actually, you have to think
more than in statically typed languages: A typeless language gives you
more freedom but demands more discipline.

hp
 
I

Ilya Zakharevich

my $x1 = 18014398509481728.000000; # exact FP value
my $x2 = 18014398509481729; # exact 64 bit int value
if ($x1 >= $x2) {
print "surprise!\n";
}

Thinking about it more, it is not a "surprise", but just a plain bug.

It is trivial to write down a macro ncmp_IV_NV(iv,nv) which would do
the "correct thing" in both cases (powerful- and braindamaged-NVs).
So the fact that Perl does not use such a construct is a bug.

Hope this helps,
Ilya
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top