overflow problem?

Discussion in 'C Programming' started by Greg, Nov 8, 2011.

  1. Greg

    Greg Guest

    Hi all,
    I'm having a bit of a problem with a piece of code I've written that I
    think comes down to essentially an overflow, and I thought I'd check
    with this group :) The code is included below, and I'd just like to say
    before all the flames happen that i) I *know* it's extremely poor
    programming style, but it is intended to be that hard to read, and ii)
    that I don't think anyone really has to try and follow the code to
    answer my question.

    I'm still ducking though.

    So, the problem is this: I'm using one element of my char array, let's
    call it mask for clarity, as a bit mask. An ascii char value is then
    bitwise anded with this. Am I right in thinking that when I leftshift
    10000000 for a char that it rolls over (so to speak) and I get 11111111
    back out? The behaviour of the programme seems to confirm this, but I'd
    like an expert opinion :)

    CODE:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Any sufficiently advanced technology is indistinguishable from magic.

    Arthur C. Clarke */

    int bzs(char*c){
    for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
    ( *++c << 1 ), *c<<=1 ) ;
    return (*++c&1 << (*c- * --c))-1;

    }

    int main( int argc, char **argv ) {
    if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
    +5))) ) {
    FILE*f,* g;
    if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
    fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
    while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
    );
    }
    return 0;

    }

    Incidentally, the code compiles cleanly :)
     
    Greg, Nov 8, 2011
    #1

  2. Greg <> writes:
    [...]
    > for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
    > ( *++c << 1 ), *c<<=1 ) ;

    [...]

    "*c-- += *c++" has undefined behavior; it modifies an object twice
    between sequence points.
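
    A minimal sketch of the usual way out, assuming the goal of such a step
    is to add one array element into its neighbour: give each side effect
    its own statement and leave the pointer itself unmodified. The function
    name add_next_into_current is invented for illustration, and this is
    not a claim about what the original loop was meant to compute.

```c
/* Adds c[1] into c[0]. The pointer c is only read, and each object is
   written at most once per statement, so there is no unsequenced
   modification. */
void add_next_into_current(unsigned char *c)
{
    unsigned char next = c[1];            /* read only */
    c[0] = (unsigned char)(c[0] + next);  /* single write to c[0] */
}
```

    Calling it on an array holding {3, 4} leaves {7, 4}.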

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 8, 2011
    #2

  3. James Kuyper

    James Kuyper Guest

    On 11/08/2011 05:16 PM, Greg wrote:
    > Hi all,
    > I'm having a bit of a problem with a piece of code I've written that I
    > think comes down to essentially an overflow, and I thought I'd check
    > with this group :) The code is included below, and I'd just like to say
    > before all the flames happen that i) I *know* it's extremely poor
    > programming style, but it is intended to be that hard to read, and ii)
    > that I don't think anyone really has to try and follow the code to
    > answer my question.
    >
    > I'm still ducking though.
    >
    > So, the problem is this: I'm using one element of my char array, let's
    > call it mask for clarity, as a bit mask. An ascii char value is then
    > bitwise anded with this. Am I right in thinking that when I leftshift
    > 10000000 for a char that it rolls over (so to speak) and I get 11111111
    > back out? The behaviour of the programme seems to confirm this, but I'd
    > like an expert opinion :)


    That program makes extensive use of plain char; plain char can be either
    signed or unsigned, and the answers are quite different in each case.
    I'll assume, as you seem to be, that CHAR_BIT==8.
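
    Which case applies on a given implementation can be checked from
    <limits.h>. A small sketch; the helper name plain_char_is_signed is made
    up for illustration.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

    The same header provides CHAR_BIT, so the 8-bit assumption can be
    checked in the same way.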

    If char is signed:
    If the bit pattern is 10000000, the '1' is presumably the sign bit.
    Which number it represents would depend upon which of the three
    representations allowed by C is used:
    sign and magnitude - negative zero
    two's complement: -256
    one's complement: -255
    For signed integers, if E1 has a non-negative value and E1 * pow(2,E2)
    is representable in the result type, then that is the resulting value;
    otherwise the behavior is undefined. The only representation where
    10000000 is non-negative is sign and magnitude, in which case the result
    is exactly 0.

    Unsigned:
    The bits 10000000 represent 128. The value of E1 << E2 is E1*pow(2,E2),
    reduced modulo 1 more than the maximum representable value; in this
    case, 256. For any value of E2 greater than 0, the result is therefore 0.

    Therefore, the only way you could get the binary value 11111111 by
    shifting 10000000 by a valid shift count using a conforming
    implementation of C would be if the behavior was undefined. If no other
    aspect of the program were suspect, that would imply that on your system
    char is signed, and uses either a two's complement or a one's complement
    representation. Unfortunately, your code has undefined behavior for
    several other reasons, so you can't be sure which issue is the one
    actually responsible for that value.
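
    The unsigned case can be sketched as follows, assuming CHAR_BIT == 8.
    The helper name shl1_uchar is hypothetical. Note that the shift itself
    is carried out in int after integer promotion; the reduction modulo 256
    happens when the result is converted back to unsigned char.

```c
/* Shifts an unsigned char left by one bit and stores the result back in
   an unsigned char. The operand is promoted to int, so 0x80 << 1 is 0x100
   as an int; the cast then reduces it modulo UCHAR_MAX + 1. */
unsigned char shl1_uchar(unsigned char mask)
{
    return (unsigned char)(mask << 1);
}
```

    With 8-bit chars, shl1_uchar(0x80) yields 0 rather than 0xFF, and
    shl1_uchar(0x40) yields 0x80.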

    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <math.h>
    >
    > /* Any sufficiently advanced technology is indistinguishable from magic.
    >
    > Arthur C. Clarke */
    >
    > int bzs(char*c){
    > for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
    > ( *++c << 1 ), *c<<=1 ) ;


    The first expression in the for statement changes the value of c twice
    without an intervening sequence point. The third expression does so
    three times. Those both have undefined behavior.

    If *c is negative, then log(*c--) has undefined behavior. Since the
    value of *c depends upon the input file, if char is signed, this program
    does nothing to avoid that possibility.
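
    If the loop condition is only meant to track how far the mask bit has
    moved (an assumption; the intent of the original is not clear), integer
    arithmetic on an unsigned value sidesteps the question of what log()
    does with a zero or negative argument. A sketch, with the invented name
    highest_bit:

```c
/* Returns the index of the highest set bit in v (0 for the least
   significant bit), or -1 if v is zero. No floating point is involved. */
int highest_bit(unsigned char v)
{
    int n = -1;
    while (v != 0) {
        v >>= 1;   /* well defined for unsigned operands */
        n++;
    }
    return n;
}
```

    For example, highest_bit(1) is 0 and highest_bit(128) is 7.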

    > return (*++c&1 << (*c- * --c))-1;


    That expression updates the value of c twice without an intervening
    sequence point. More undefined behavior.

    > }
    >
    > int main( int argc, char **argv ) {
    > if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
    > +5))) ) {


    Since your code calls strlen() without a declaration of that function in
    scope, if it compiled without diagnostics you must be using C90, where
    it will be implicitly declared as returning an 'int'. Since it actually
    returns a size_t, that alone is sufficient to cause problems,
    particularly if sizeof(size_t) is different from sizeof(int).
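
    A sketch of the allocation with <string.h> included, so that strlen()
    is declared as returning size_t. The function name make_output_name is
    hypothetical, and it returns a fresh buffer rather than overwriting
    argv[0] as the original does.

```c
#include <stdlib.h>
#include <string.h>

/* Builds "<name>.bzs" in newly allocated storage. Returns NULL if the
   allocation fails; the caller is responsible for calling free(). */
char *make_output_name(const char *name)
{
    size_t len = strlen(name);
    char *out = malloc(len + sizeof ".bzs");   /* sizeof counts the '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, name, len);
    memcpy(out + len, ".bzs", sizeof ".bzs");  /* copies the terminator too */
    return out;
}
```

    Given "in.txt" this produces "in.txt.bzs".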

    > FILE*f,* g;
    > if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
    > fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
    > while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
    > );
    > }
    > return 0;
    >
    > }


    I think I understand what main() is doing; at least I understand it
    better than I understand what bzs() was intended to do; but I wouldn't
    dare declare main() free of defects.
     
    James Kuyper, Nov 8, 2011
    #3
  4. Nobody

    Nobody Guest

    On Tue, 08 Nov 2011 18:26:46 -0500, James Kuyper wrote:

    > I'll assume, as you seem to be, that CHAR_BIT==8.


    Actually, you appear to be assuming that CHAR_BIT==9 ;)

    > If char is signed:
    > If the bit pattern is 10000000, the '1' is presumably the sign bit.
    > Which number it represents would depend upon which of the three
    > representations allowed by C is used:
    > sign and magnitude - negative zero
    > two's complement: -256
    > one's complement: -255


    The last two should be -128 and -127 respectively for CHAR_BIT==8.
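
    The arithmetic can be checked with a short sketch that decodes an 8-bit
    pattern by hand under each representation. The function names are
    invented for illustration, and the result is returned as an int, so
    negative zero simply comes out as 0.

```c
/* Each function interprets the low 8 bits of 'bits' as a signed value
   under one of the three representations C has allowed. */
int as_twos_complement(unsigned bits)
{
    bits &= 0xFFu;
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

int as_ones_complement(unsigned bits)
{
    bits &= 0xFFu;
    return (bits & 0x80u) ? (int)bits - 255 : (int)bits;
}

int as_sign_magnitude(unsigned bits)
{
    int magnitude = (int)(bits & 0x7Fu);
    return (bits & 0x80u) ? -magnitude : magnitude;
}
```

    For the pattern 10000000 (0x80) these give -128, -127 and 0
    respectively.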
     
    Nobody, Nov 9, 2011
    #4
  5. James Kuyper

    James Kuyper Guest

    On 11/09/2011 04:05 AM, Nobody wrote:
    > On Tue, 08 Nov 2011 18:26:46 -0500, James Kuyper wrote:
    >
    >> I'll assume, as you seem to be, that CHAR_BIT==8.

    >
    > Actually, you appear to be assuming that CHAR_BIT==9 ;)
    >
    >> If char is signed:
    >> If the bit pattern is 10000000, the '1' is presumably the sign bit.
    >> Which number it represents would depend upon which of the three
    >> representations allowed by C is used:
    >> sign and magnitude - negative zero
    >> two's complement: -256
    >> one's complement: -255

    >
    > The last two should be -128 and -127 respectively for CHAR_BIT==8.


    It seems like almost every post I've made for the past several days has
    had at least one silly stupid mistake. I guess I need a vacation.

    Someone has been saving up a list of several dozen "undocumented data
    outages" for several years, and it suddenly occurred to them that maybe
    they should report all of these outages to someone who can actually
    investigate why they happened, in time for the start of Collection 6
    reprocessing. It's dull boring work, with no one I can delegate any of
    it to. The boredom is relieved only by the terror of worrying about
    whether one of the outages might turn out to be due to a defect in one
    of my programs which will need a last-minute fix.
    --
    James Kuyper
     
    James Kuyper, Nov 9, 2011
    #5
  6. Greg

    Greg Guest

    James Kuyper writes:

    > On 11/09/2011 04:05 AM, Nobody wrote:
    >> On Tue, 08 Nov 2011 18:26:46 -0500, James Kuyper wrote:
    >>
    >>> I'll assume, as you seem to be, that CHAR_BIT==8.

    >>
    >> Actually, you appear to be assuming that CHAR_BIT==9 ;)
    >>
    >>> If char is signed:
    >>> If the bit pattern is 10000000, the '1' is presumably the sign bit.
    >>> Which number it represents would depend upon which of the three
    >>> representations allowed by C is used:
    >>> sign and magnitude - negative zero
    >>> two's complement: -256
    >>> one's complement: -255

    >>
    >> The last two should be -128 and -127 respectively for CHAR_BIT==8.

    >
    > It seems like almost every post I've made for the past several days has
    > had at least one silly stupid mistake. I guess I need a vacation.
    >
    > Someone has been saving up a list of several dozen "undocumented data
    > outages" for several years, and it suddenly occurred to them that maybe
    > they should report all of these outages to someone who can actually
    > investigate why they happened, in time for the start of Collection 6
    > reprocessing. It's dull boring work, with no one I can delegate any of
    > it to. The boredom is relieved only by the terror of worrying about
    > whether one of the outages might turn out to be due to a defect in one
    > of my programs which will need a last-minute fix.


    Thanks to all who responded; that clears up what was going on. As far as
    it goes, I know I'm using undefined behaviour and taking advantage of the
    way my particular compiler and operating system work together, and I know
    this is severely frowned upon: that's why I didn't ask anyone to try and
    follow the code through. Nonetheless, had I not known it, I'd have
    appreciated having it pointed out to me :)
     
    Greg, Nov 9, 2011
    #6
  7. On Nov 9, 8:01 pm, Greg <> wrote:

    <snip>

    > Thanks to all who responded; that clears up what was going on.  As far as
    > it goes, I know I'm using undefined behaviour and taking advantage of the
    > way my particular compiler and operating system work together, and I know
    > this is severely frowned upon: that's why I didn't ask anyone to try and
    > follow the code through.  Nonetheless, had I not known it, I'd have
    > appreciated having it pointed out to me :)


    In real world programs some exploitation of undefined behaviour is
    probably inevitable. But relying on particular behaviour for
    *++c= !(*++c = 1) just seems pure madness! If you change the compiler the
    behaviour may change. If you upgrade the compiler the behaviour may
    change. If you change optimisation settings the behaviour may change.
    A small change to your program text may radically change the
    behaviour. The benefit/cost ratio (what benefit?) just seems too low
    to me.
     
    Nick Keighley, Nov 10, 2011
    #7
