Re: A portable code to create a 4-bytes Big Endian twos complement

Discussion in 'C Programming' started by Spiros Bousbouras, Mar 17, 2011.

  1. On Thu, 17 Mar 2011 19:25:40 +0100
    pozz <> wrote:
    > Il 17/03/2011 18:16, pozz ha scritto:
    > > On pag. 179 of "C Unleashed" book (by Heathfield, Kirby et al.), there
    > > is this code for a similar task (ifp is an input stream):
    > > [...]

    >
    > And another thing I couldn't understand on that book, for a similar topic.
    >
    > On pag. 178 it explains how to write and read a two-bytes integer value
    > on a portable data file (writing/reading to a file is a similar task to
    > sending/receiving to/from a network).


    [...]

    > How is possible the book is wrong?


    Books can have mistakes. But in this case I don't know if it has made a
    mistake because I don't know what "two-bytes integer value" means.
    Could you provide a precise definition ?

    > If I want to fix this, how can I do?


    Exactly what do you want to achieve ?
     
    Spiros Bousbouras, Mar 17, 2011
    #1

  2. On Thu, 17 Mar 2011 21:03:19 +0100
    pozz <> wrote:
    > Il 17/03/2011 19:43, Spiros Bousbouras ha scritto:
    > > Books can have mistakes. But in this case I don't know if it has made a
    > > mistake because I don't know what "two-bytes integer value" means.
    > > Could you provide a precise definition ?

    >
    > The book was talking about the issue to store an integer value on a file
    > in a portable way.
    > "Suppose we decide that the int as represented in the data file will be
    > two bytes in little-endian order [...]".
    > After the writing (putc) instructions, the author says:
    > "Key point number two is that we're not concerned with how big an int
    > happens to be on this machine; [...]"
    >
    > So I think that the author has presented a code that works on every C
    > implementation (16-, 32-, 64-bits sized int).


    I don't have the book but I don't think that 32 or 64 bits sized int is
    meant to count as 2 bytes. My guess is that when it says 2 bytes it
    means that the whole bit pattern which represents the number is stored
    in 2 bytes.
     
    Spiros Bousbouras, Mar 17, 2011
    #2

  3. pozz <> writes:

    > Il 17/03/2011 21:15, Spiros Bousbouras ha scritto:
    >>> So I think that the author has presented a code that works on every C
    >>> implementation (16-, 32-, 64-bits sized int).

    >>
    >> I don't have the book but I don't think that 32 or 64 bits sized int is
    >> meant to count as 2 bytes. My guess is that when it says 2 bytes it
    >> means that the whole bit pattern which represents the number is stored
    >> in 2 bytes.

    >
    > The size of int variable in memory depends on implementation (16, 32
    > or 64 bits or maybe other values), but the value stored in the file is
    > fixed arbitrarily to 2-bytes little-endian.
    > So the author proposes the code:
    > putc(i & 0xff, ofp);
    > putc((i >> 8) & 0xff, ofp);
    > Indeed, i (an int variable) could be 16, 32 or 64 bits, but the 2
    > bytes written to ofp will be always the same (of course, the value of
    > i must be between -32767 and +32767).
    >
    > For example, suppose i contains the value 600. It could be represented
    > in memory as:
    > 0258 (16-bits big-endian)
    > 00000258 (32-bits big-endian)
    > 5802 (16-bits little-endian)
    > 58020000 (32-bits little-endian)
    > With the above two lines of code, I'll write always the same two bytes
    > on the file ofp: 0x58 (the first) and 0x02 (the second).
    >
    > Now I want to write a function that returns the same value in a int
    > variable, starting from 0x58 and 0x02 bytes read from the file. The
    > code should be portable on 16-, 32- and maybe 64-bits int
    > implementations.
    > And the value stored in the file could be negative.
    >
    > The goal of this will be to have a code that writes and reads a
    > *portable data file*, so a file that can be created by an application
    > on a machine and read by another application on another machine.
    > Or a data packet sent from an application running on a machine and
    > received by an application running on a different machine.
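
A minimal sketch of the writing side described in the quote above, assuming the value lies in the range -32768 to 32767. It differs from the book's two putc lines in one respect: the int is converted to unsigned int before masking and shifting, because right-shifting a negative int is implementation-defined, whereas int-to-unsigned conversion is defined modulo UINT_MAX + 1 everywhere. The names encode_le16 and write_le16 are hypothetical, not taken from the book.

```c
#include <stdio.h>

/* Hypothetical helper (not from the book): split a value into the two
   bytes of a 16-bit little-endian two's complement field.
   Assumes -32768 <= value <= 32767. */
void encode_le16(int value, unsigned char out[2])
{
    /* int -> unsigned is reduced modulo UINT_MAX + 1 (a power of two that
       is at least 65536), so the low 16 bits hold the two's complement
       pattern whatever the native representation of int is. */
    unsigned int u = (unsigned int)value;

    out[0] = (unsigned char)(u & 0xffu);        /* low byte first */
    out[1] = (unsigned char)((u >> 8) & 0xffu); /* then high byte */
}

/* Write the two bytes to a stream; returns 0 on success, -1 on error. */
int write_le16(int value, FILE *ofp)
{
    unsigned char b[2];

    encode_le16(value, b);
    if (putc(b[0], ofp) == EOF || putc(b[1], ofp) == EOF)
        return -1;
    return 0;
}
```

With this sketch, 600 is written as 0x58 0x02 and -1 as 0xFF 0xFF, independent of the width of int.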


    If you can, avoid using signed binary numbers as a portable
    representation but if you can't you can read the two bytes and pack
    them into an unsigned int:

    int a = fgetc(fp);
    int b = fgetc(fp);
    unsigned int ux = (((unsigned)b) << 8) | a;

    (obviously check for EOF and errors in the real code). Then you need to
    convert this to an int. One portable way is like this:

    int x;
    if (ux >= 0x8000u) {
        x = 0xffffu - ux;
        x = -x - 1;
    }
    else x = ux; /* conversion is in range possible */
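
The two fragments above can be put together roughly as follows, with the EOF checks added. This is only a sketch: the function names decode_le16 and read_le16 are made up for illustration, and the file is assumed to hold the low byte first.

```c
#include <stdio.h>

/* Hypothetical helper: turn the two bytes (low, high) of a 16-bit
   little-endian two's complement field into an int, using only
   conversions whose value is in range for int. */
int decode_le16(unsigned int lo, unsigned int hi)
{
    unsigned int ux = ((hi & 0xffu) << 8) | (lo & 0xffu);
    int x;

    if (ux >= 0x8000u) {
        x = (int)(0xffffu - ux); /* 0..32767, always fits in an int */
        /* -1..-32768; the pattern 0x8000 needs INT_MIN <= -32768,
           which not every permitted representation provides. */
        x = -x - 1;
    } else {
        x = (int)ux;             /* 0..32767, in range */
    }
    return x;
}

/* Read one value from fp; returns 0 on success, -1 on EOF or error. */
int read_le16(FILE *fp, int *out)
{
    int a = fgetc(fp);
    if (a == EOF)
        return -1;

    int b = fgetc(fp);
    if (b == EOF)
        return -1;

    *out = decode_le16((unsigned int)a, (unsigned int)b);
    return 0;
}
```

Separating the pure conversion from the stream handling keeps the arithmetic easy to check on its own.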

    This came up here some time ago (July 2010) and Tim Rentsch came up
    with:

    (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);

    which has several advantages over my suggestion.
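
To see why the expression works, here it is wrapped in a function (the name u16_to_int is hypothetical), assuming ux holds a value no larger than 0xffff. In two's complement, bit 15 carries the weight -32768. The term (ux/2 & 16384) equals 16384 exactly when bit 15 of ux is set, so subtracting it twice subtracts 32768 while every operand and intermediate result stays within -32768..32767.

```c
/* Hypothetical wrapper around the expression quoted above.
   Assumes ux <= 0xffff. */
int u16_to_int(unsigned int ux)
{
    return (int)(ux & 32767)        /* low 15 bits: 0..32767        */
         - (int)(ux / 2 & 16384)    /* bit 15 set? subtract 16384   */
         - (int)(ux / 2 & 16384);   /* and again, -32768 in total   */
}
```

For ux = 0xffff this evaluates as 32767 - 16384 - 16384 = -1, and for ux = 0x0258 as 600 - 0 - 0 = 600.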

    --
    Ben.
     
    Ben Bacarisse, Mar 18, 2011
    #3
  4. On 18 Mar, 03:23, Ben Bacarisse <> wrote:
    > If you can, avoid using signed binary numbers as a portable
    > representation


    Unfortunately I can't avoid it, because the data packet format
    is fixed not by me.


    > but if you can't you can read the two bytes and pack
    > them into an unsigned int:


    So you agree with me that the two reading instructions can't work with
    signed values on 32 bits machines.

    I found the same code on Question 12.42 of comp.lang.c FAQ ("How can I
    write code to conform to these old, binary data file formats?"). There
    the following struct is defined:
    struct mystruct {
    char c;
    long int i32;
    int i16;
    } s;
    and the following code is used to read the 16 bits value:
    s.i16 = getc(fp) << 8;
    s.i16 |= getc(fp);

    If we assume the values stored in the file are unsigned (0-65535), the
    member i16 should have been defined as unsigned int (not signed int),
    otherwise the value 40000 can't be correctly received on 16-bits
    machines.

    If we assume the values stored in the file are signed (-32767 -
    +32767), the code to read them doesn't work for negative values on
    implementations with int size greater than 16 bits (the value -1 is
    written in the file as 0xFF 0xFF, but is read back as 65535 on 32-bits
    platforms).

    In both cases, the code doesn't work and should be fixed.
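
A small sketch of the second case, assuming int is wider than 16 bits. The function faq_style_read is a hypothetical stand-in that applies the same two statements to bytes passed as arguments instead of calling getc.

```c
/* Mimics the two FAQ statements with the bytes supplied by the caller.
   hi and lo are assumed to be in 0..255. The result is only meaningful
   where int is wider than 16 bits. */
int faq_style_read(int hi, int lo)
{
    int i16 = hi << 8; /* high byte moved to bits 8..15 */
    i16 |= lo;         /* low byte in bits 0..7         */
    return i16;        /* no sign extension takes place */
}
```

With a 32-bit int, faq_style_read(0xff, 0xff) yields 65535, not -1, because nothing propagates bit 15 into the upper bits of the int.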


    >   int a = fgetc(fp);
    >   int b = fgetc(fp);
    >   unsigned int ux = (((unsigned)b) << 8) | a;
    >[...]
    >   int x;
    >   if (ux >= 0x8000u) {
    >       x = 0xffffu - ux;
    >       x = -x - 1;
    >   }
    >   else x = ux; /* conversion is in range possible */


    Ok, it could be a solution. Another solution would be (if I assume the
    presence of int16_t):
    int i = (int16_t)(((unsigned)b) << 8) | a;


    > This came up here some time ago (July 2010) and Tim Rentsch came up
    > with:
    >
    >   (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);
    >
    > which has several advantages over my suggestion.


    I found the _very long_ thread. I'll try to understand it alone, but
    it seems difficult for my small knowledge of C language.
     
    pozz, Mar 18, 2011
    #4
  5. pozz <> writes:

    > On 18 Mar, 03:23, Ben Bacarisse <> wrote:
    >> [...] you can read the two bytes and pack
    >> them into an unsigned int:

    >
    > So you agree with me that the two reading instructions can't work with
    > signed values on 32 bits machines.


    I am not exactly sure what you think I am agreeing with. You showed
    code that looked wrong but without context it's hard to know what the
    code was supposed to do. For example, this:

    > I found the same code on Question 12.42 of comp.lang.c FAQ ("How can I
    > write code to conform to these old, binary data file formats?"). There
    > the following struct is defined:
    > struct mystruct {
    > char c;
    > long int i32;
    > int i16;
    > } s;
    > and the following code is used to read the 16 bits value:
    > s.i16 = getc(fp) << 8;
    > s.i16 |= getc(fp);
    >
    > If we assume the values stored in the file are unsigned (0-65535), the
    > member i16 should have been defined as unsigned int (not signed int),
    > otherwise the value 40000 can't be correctly received on 16-bits
    > machines.


    is not as bad as you think since the context makes it clear that the
    purpose is to read signed 16-bit ints. 40000 is not an option.

    > If we assume the values stored in the file are signed (-32767 -
    > +32767), the code to read them doesn't work for negative values on
    > implementations with int size greater than 16 bits (the value -1 is
    > written in the file as 0xFF 0xFF, but is read back as 65535 on 32-bits
    > platforms).


    That's not a good assumption. The context is clear: ints are 16 bits.
    I agree that it could be made more explicit, and it should certainly
    mention the reliance on an implementation-defined conversion, but the
    code is not intended to work with all int sizes. (I think the FAQ dates
    from before C99 so there is no possibility of a signal being raised.)

    > In both cases, the code doesn't work and should be fixed.


    Have you told the people concerned?

    >>   int a = fgetc(fp);
    >>   int b = fgetc(fp);
    >>   unsigned int ux = (((unsigned)b) << 8) | a;
    >>[...]
    >>   int x;
    >>   if (ux >= 0x8000u) {
    >>       x = 0xffffu - ux;
    >>       x = -x - 1;
    >>   }
    >>   else x = ux; /* conversion is in range possible */

    >
    > Ok, it could be a solution. Another solution would be (if I assume the
    > presence of int16_t):
    > int i = (int16_t)(((unsigned)b) << 8) | a;


    No, that does not address the question -- the implementation-defined
    conversion when an unsigned int value is out of range for int. You may
    be happy with assuming that it works as you expect, but you seemed to
    want a 100% portable solution.

    >> This came up here some time ago (July 2010) and Tim Rentsch came up
    >> with:
    >>
    >>   (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);
    >>
    >> which has several advantages over my suggestion.

    >
    > I found the _very long_ thread. I'll try to understand it alone, but
    > it seems difficult for my small knowledge of C language.


    If I recall, after solutions were posted a lot of the thread was about
    readability and the ability of compilers to optimise the resulting code.

    --
    Ben.
     
    Ben Bacarisse, Mar 18, 2011
    #5
  6. On 18 Mar, 13:43, Ben Bacarisse <> wrote:
    > > I found the same code on Question 12.42 of comp.lang.c FAQ ("How can I
    > > write code to conform to these old, binary data file formats?"). There
    > > the following struct is defined:
    > >   struct mystruct {
    > >     char c;
    > >     long int i32;
    > >     int i16;
    > >   } s;
    > > and the following code is used to read the 16 bits value:
    > >   s.i16 = getc(fp) << 8;
    > >   s.i16 |= getc(fp);

    >
    > > If we assume the values stored in the file are unsigned (0-65535), the
    > > member i16 should have been defined as unsigned int (not signed int),
    > > otherwise the value 40000 can't be correctly received on 16-bits
    > > machines.

    >
    > is not as bad as you think since the context makes it clear that the
    > purpose is to read signed 16-bit ints.  40000 is not an option.


    Ok, for me it wasn't so clear...


    > > If we assume the values stored in the file are signed (-32767 -
    > > +32767), the code to read them doesn't work for negative values on
    > > implementations with int size greater than 16 bits (the value -1 is
    > > written in the file as 0xFF 0xFF, but is read back as 65535 on 32-bits
    > > platforms).

    >
    > That's not a good assumption.  The context is clear: ints are 16 bits.


    Ok, even in this case I made the assumption the code worked also for
    32 bits machines.

    Anyway you agree with me when I say that the code in the FAQ doesn't
    work for 32 bits integers, don't you?


    > > Ok, it could be a solution. Another solution would be (if I assume the
    > > presence of int16_t):
    > >   int i = (int16_t)(((unsigned)b) << 8) | a;

    >
    > No, that does not address the question -- the implementation-defined
    > conversion when an unsigned int value is out of range for int.  You may
    > be happy with assuming that it works as you expect, but you seemed to
    > want a 100% portable solution.


    You are right. Now I understand that unsigned->int conversion is
    bad :)
     
    pozz, Mar 18, 2011
    #6
  7. pozz <> writes:

    > On 18 Mar, 13:43, Ben Bacarisse <> wrote:
    >> > I found the same code on Question 12.42 of comp.lang.c FAQ ("How can I
    >> > write code to conform to these old, binary data file formats?"). There
    >> > the following struct is defined:
    >> >   struct mystruct {
    >> >     char c;
    >> >     long int i32;
    >> >     int i16;
    >> >   } s;
    >> > and the following code is used to read the 16 bits value:
    >> >   s.i16 = getc(fp) << 8;
    >> >   s.i16 |= getc(fp);

    <snip>
    >> > If we assume the values stored in the file are signed (-32767 -
    >> > +32767), the code to read them doesn't work for negative values on
    >> > implementations with int size greater than 16 bits (the value -1 is
    >> > written in the file as 0xFF 0xFF, but is read back as 65535 on 32-bits
    >> > platforms).

    >>
    >> That's not a good assumption.  The context is clear: ints are 16 bits.

    >
    > Ok, even in this case I made the assumption the code worked also for
    > 32 bits machines.
    >
    > Anyway you agree with me when I say that the code in the FAQ doesn't
    > work for 32 bits integers, don't you?


    Yes, nor for 18-bit ints or 64 bit ones. The code has a purpose and it
    does not do anything else. If your point is "I'd rather the FAQ had an
    example of reading a 16-bit signed 2's complement integer into an int of
    any permitted size" then I agree that might be more interesting but I
    don't think it makes the current code wrong.
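
For what it's worth, a reader of that kind might look like the sketch below. It keeps the FAQ's byte order (high byte first) but is not the FAQ's code; the name decode_be16 is made up, and the caller is assumed to have already checked both getc results for EOF.

```c
/* Hypothetical helper: combine two bytes, high byte first, of a 16-bit
   two's complement field into an int of any permitted width.
   The pattern 0x8000 requires INT_MIN <= -32768. */
int decode_be16(unsigned int hi, unsigned int lo)
{
    unsigned int ux = ((hi & 0xffu) << 8) | (lo & 0xffu);

    if (ux & 0x8000u) {
        /* Bit 15 set: low 15 bits minus 32768, computed in two steps so
           that no intermediate result drops below -32768. */
        return (int)(ux & 0x7fffu) - 32767 - 1;
    }
    return (int)ux; /* 0..32767, always in range */
}
```

Here the bytes 0xFF 0xFF give -1 and 0x02 0x58 give 600 regardless of how wide int is.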

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Mar 18, 2011
    #7
