overflow problem?

Greg

Hi all,
I'm having a bit of a problem with a piece of code I've written that I
think comes down to essentially an overflow, and I thought I'd check
with this group :) The code is included below, and I'd just like to say
before all the flames happen that i) I *know* it's extremely poor
programming style, but it is intended to be that hard to read, and ii)
that I don't think anyone really has to try and follow the code to
answer my question.

I'm still ducking though.

So, the problem is this: I'm using one element of my char array, let's
call it mask for clarity, as a bit mask. An ascii char value is then
bitwise anded with this. Am I right in thinking that when I leftshift
10000000 for a char that it rolls over (so to speak) and I get 11111111
back out? The behaviour of the programme seems to confirm this, but I'd
like an expert opinion :)

CODE:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke */

int bzs(char*c){
for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
( *++c << 1 ), *c<<=1 ) ;
return (*++c&1 << (*c- * --c))-1;

}

int main( int argc, char **argv ) {
if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
+5))) ) {
FILE*f,* g;
if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
);
}
return 0;

}

Incidentally, the code compiles cleanly :)
 
Keith Thompson

Greg said:
for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
( *++c << 1 ), *c<<=1 ) ;
[...]

"*c-- += *c++" has undefined behavior; it modifies an object twice
between sequence points.
 
James Kuyper

That program makes extensive use of plain char; plain char can be either
signed or unsigned, and the answers are quite different in each case.
I'll assume, as you seem to be, that CHAR_BIT==8.

If char is signed:
If the bit pattern is 10000000, the '1' is presumably the sign bit.
Which number it represents would depend upon which of the three
representations allowed by C is used:
sign and magnitude - negative zero
two's complement: -256
one's complement: -255
On signed integers, if E1 has a non-negative value, and E1 * pow(2,E2)
is representable in the result type, then that is the resulting value;
Otherwise the behavior is undefined. The only representation where
10000000 is non-negative is sign and magnitude, in which case the result
is exactly 0.

Unsigned:
The bits 10000000 represent 128. The value of E1 << E2 is E1*pow(2,E2),
reduced modulo 1 more than the maximum representable value; in this
case, 256. For any value of E2 greater than 0, the result is therefore 0.

Therefore, the only way you could get the binary value 11111111 by
shifting 10000000 by a valid shift count using a conforming
implementation of C would be if the behavior was undefined. If no other
aspect of the program were suspect, that would imply that on your system
char is signed, and uses either a two's complement or a one's complement
representation. Unfortunately, your code has undefined behavior for
several other reasons, so you can't be sure which issue is the one
actually responsible for that value.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke */

int bzs(char*c){
for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
( *++c << 1 ), *c<<=1 ) ;

The first expression in the for statement changes the value of c twice
without an intervening sequence point. The third expression does so
three times. Those both have undefined behavior.

If *c is negative, then log(*c--) has undefined behavior. Since the
value of *c depends upon the input file, if char is signed, this program
does nothing to avoid that possibility.
return (*++c&1 << (*c- * --c))-1;

That expression updates the value of c twice without an intervening
sequence point. More undefined behavior.
}

int main( int argc, char **argv ) {
if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
+5))) ) {

Since your code calls strlen() without a declaration of that function in
scope, if it compiled without diagnostics you must be using C90, where
it will be implicitly declared as returning an 'int'. Since it actually
returns a size_t, that alone is sufficient to cause problems,
particularly if sizeof(size_t) is different from sizeof(int).
FILE*f,* g;
if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
);
}
return 0;

}

I think I understand what main() is doing; at least I understand it
better than I understand what bzs() was intended to do; but I wouldn't
dare declare main() free of defects.
 
Nobody

I'll assume, as you seem to be, that CHAR_BIT==8.

Actually, you appear to be assuming that CHAR_BIT==9 ;)
If char is signed:
If the bit pattern is 10000000, the '1' is presumably the sign bit.
Which number it represents would depend upon which of the three
representations allowed by C is used:
sign and magnitude - negative zero
two's complement: -256
one's complement: -255

The last two should be -128 and -127 respectively for CHAR_BIT==8.
 
James Kuyper

Actually, you appear to be assuming that CHAR_BIT==9 ;)


The last two should be -128 and -127 respectively for CHAR_BIT==8.

It seems like almost every post I've made for the past several days has
had at least one silly stupid mistake. I guess I need a vacation.

Someone has been saving up a list of several dozen "undocumented data
outages" for several years, and it suddenly occurred to them that maybe
they should report all of these outages to someone who can actually
investigate why they happened, in time for the start of Collection 6
reprocessing. It's dull boring work, with no one I can delegate any of
it to. The boredom is relieved only by the terror of worrying about
whether one of the outages might turn out to be due to a defect in one
of my programs which will need a last-minute fix.
 
Greg

Thanks to all who responded; that clears up what was going on. As far as
it goes, I know I'm using undefined behaviour and taking advantage of the
way my particular compiler and operating system work together, and I know
this is severely frowned upon: that's why I didn't ask anyone to try and
follow the code through. Nonetheless, had I not known it, I'd have
appreciated having it pointed out to me :)
 
Nick Keighley

in real world programs some exploitation of undefined behaviour is
probably inevitable. But relying on particular behaviour for
*++c= !(*++c = 1) just seems pure madness! If you change the compiler the
behaviour may change. If you upgrade the compiler the behaviour may
change. If you change optimisation settings the behaviour may change.
A small change to your program text may radically change the
behaviour. The benefit/cost ratio (what benefit?) just seems too low
to me.
 
