One for the language lawyers

Discussion in 'C Programming' started by Kenny McCormack, Jun 9, 2008.

  1. Here is a commonly used technique, that will, of course, work fine on
    any reasonably modern, normal hardware. But, does it pass the CLC test?

    /* Assume well-formed input - of course, you can always break it by
    * feeding it bad input */

    struct foo { int field1, field2; char nl; } *bar;
    char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

    int main(void) {
    bar = (struct foo *) buffer;
    fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    /* Now access the members of the struct (using, e.g., bar -> field1).
    * Note that no actual struct was ever declared - we are using
    * buffer as if it were the struct */
    }
     
    Kenny McCormack, Jun 9, 2008
    #1

  2. On Mon, 09 Jun 2008 17:08:20 +0000, Kenny McCormack wrote:
    > Here is a commonly used technique,


    It is? Where have you seen it used?

    > that will, of course, work fine on
    > any reasonably modern, normal hardware. But, does it pass the CLC test?


    No.

    > /* Assume well-formed input - of course, you can always break it by
    > * feeding it bad input */
    >
    > struct foo { int field1, field2; char nl; } *bar;


    What's the nl member for?

    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
    >
    > int main(void) {
    > bar = (struct foo *) buffer;


    This assumes that buffer is appropriately aligned for a struct foo. When
    you access *bar, you also ignore C's aliasing rules. Both problems can be
    avoided by using a union.
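
    A minimal sketch of the union approach, not code from the thread. BUFSIZE, union record, bytes_are_aligned and read_record are hypothetical names, and fread is used on the assumption that the input is raw binary records. The alignment check relies on uintptr_t arithmetic, which is implementation-defined but behaves as expected on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BUFSIZE 64  /* hypothetical stand-in for SOMENUMBERWHATEVERFLOATSYOURBOAT */

struct foo { int field1, field2; char nl; };

/* The union is at least as strictly aligned as struct foo, and the
 * struct is read through the union object itself rather than through
 * a cast char pointer. */
union record {
    struct foo rec;
    unsigned char bytes[BUFSIZE];
};

/* Returns 1 if the byte array inside the union is aligned for struct foo. */
int bytes_are_aligned(const union record *u)
{
    return (uintptr_t)(const void *)u->bytes % _Alignof(struct foo) == 0;
}

/* Reads one record's worth of raw bytes. Returns 1 on success, 0 otherwise. */
int read_record(FILE *fp, union record *u)
{
    assert(sizeof(struct foo) <= sizeof u->bytes);
    return fread(u->bytes, sizeof(struct foo), 1, fp) == 1;
}
```

    After a successful read_record(fp, &u), the members are available as u.rec.field1 and u.rec.field2. This only addresses alignment and aliasing; it still assumes the bytes were written with the same struct layout.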

    > fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);


    Did you mean fread, or were you really asking about fgets? If you meant
    fread, I don't see the point of a nl member at all. If you meant fgets, I
    don't see the point of a nl member at the very end.

    > /* Now access the members of the struct (using, e.g., bar -> field1).
    > * Note that no actual struct was ever declared - we are using
    > * buffer as if it were the struct */
    > }
     
    Harald van Dijk, Jun 9, 2008
    #2

  3. In article <g2jo24$ilh$>,
    Kenny McCormack <> wrote:
    >Here is a commonly used technique, that will, of course, work fine on
    >any reasonably modern, normal hardware. But, does it pass the CLC test?


    >/* Assume well-formed input - of course, you can always break it by
    > * feeding it bad input */


    >struct foo { int field1, field2; char nl; } *bar;
    >char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


    >int main(void) {
    > bar = (struct foo *) buffer;
    > fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    > /* Now access the members of the struct (using, e.g., bar -> field1).
    > * Note that no actual struct was ever declared - we are using
    > * buffer as if it were the struct */
    > }


    There may be unnamed padding between struct members for any reason,
    so unless the data being read from stdin via fgets was written
    with exactly the same compiler version on exactly the same target,
    the code is not certain to work.

    Some of the compilers I use *do* put unnamed padding in places
    where it is not obvious to do so, in order to achieve better caching
    performance.
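
    One way to see what a particular compiler does with the layout is to ask it. This is only a sketch: the helper names are hypothetical and the numbers it reports are implementation-specific, which is exactly the point being made above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct foo { int field1, field2; char nl; };

/* Padding bytes between field1 and field2 (often 0, but not guaranteed). */
size_t padding_after_field1(void)
{
    return offsetof(struct foo, field2)
         - (offsetof(struct foo, field1) + sizeof(int));
}

/* Padding bytes after the last member, up to sizeof(struct foo). */
size_t trailing_padding(void)
{
    return sizeof(struct foo) - (offsetof(struct foo, nl) + sizeof(char));
}

/* Prints the layout this compiler chose for struct foo. */
void print_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "sizeof=%zu field1@%zu field2@%zu nl@%zu\n",
            sizeof(struct foo),
            offsetof(struct foo, field1),
            offsetof(struct foo, field2),
            offsetof(struct foo, nl));
}
```

    Calling print_layout(stdout) on the writing and the reading system and comparing the two lines shows whether the raw bytes can be exchanged at all.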


    --
    "Any sufficiently advanced bug is indistinguishable from a feature."
    -- Rich Kulawiec
     
    Walter Roberson, Jun 9, 2008
    #3
  4. Kenny McCormack <> wrote:
    > Here is a commonly used technique, that will, of course, work fine on
    > any reasonably modern, normal hardware. But, does it pass the CLC test?


    > /* Assume well-formed input - of course, you can always break it by
    > * feeding it bad input */


    > struct foo { int field1, field2; char nl; } *bar;
    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


    > int main(void) {
    > bar = (struct foo *) buffer;
    > fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    > /* Now access the members of the struct (using, e.g., bar -> field1).
    > * Note that no actual struct was ever declared - we are using
    > * buffer as if it were the struct */
    > }


    As long as sizeof(struct foo) isn't larger than
    SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
    It's rather obfuscated and I dare to doubt that this is
    a "commonly used technique", but 'buffer' is memory
    you own so you can do with it whatever you want. Of
    course, all hinges on your primary assumption that the
    input is well-formed (it may be difficult to make it
    non-well-formed for the types of members the structure
    has on mainstream hardware, but there might be some
    systems where certain bit-patterns don't represent ints
    and thus you may run into danger of undefined behaviour).
    So figuring out what's well-formed can be a bit of a
    bother but as long as you do that there's no problem.

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, Jun 9, 2008
    #4
  5. Kenny McCormack writes:
    > Here is a commonly used technique, (...)


    I hope not.

    > struct foo { int field1, field2; char nl; } *bar;
    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
    >
    > int main(void) {
    > bar = (struct foo *) buffer;
    > fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    > /* Now access the members of the struct (using, e.g., bar -> field1).


    This breaks e.g. if there is a 0x0A byte (newline) in the integer
    representation of the would-be bar->field1 value. And as Harald
    said, it breaks if buffer is not properly aligned for a struct foo.

    Also when I see fgets() I suspect the file has been opened in text
    instead of binary mode, which means there may be bugs from converting
    between newline and the file system's representation of end-of-line.
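
    A small sketch of the first problem, assuming an ASCII-based execution character set where '\n' is 0x0A. The function names are hypothetical. An int whose bytes are all 0x0A is written to a temporary binary file; fgets stops after the first byte, while fread returns the whole value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Creates a temporary binary file holding one int made of 0x0A bytes. */
static FILE *file_with_newline_int(void)
{
    int value;
    FILE *fp = tmpfile();               /* opened in binary update mode */

    if (fp == NULL)
        return NULL;
    memset(&value, 0x0A, sizeof value);
    if (fwrite(&value, sizeof value, 1, fp) != 1) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/* Number of bytes fgets() stores before it stops at the newline byte. */
size_t bytes_via_fgets(void)
{
    char buffer[64];
    size_t n = 0;
    FILE *fp = file_with_newline_int();

    assert(sizeof(int) < sizeof buffer);
    if (fp == NULL)
        return 0;
    if (fgets(buffer, sizeof buffer, fp) != NULL)
        n = strlen(buffer);
    fclose(fp);
    return n;
}

/* Number of bytes fread() delivers for the same file. */
size_t bytes_via_fread(void)
{
    char buffer[64];
    size_t n = 0;
    FILE *fp = file_with_newline_int();

    if (fp == NULL)
        return 0;
    n = fread(buffer, 1, sizeof(int), fp);
    fclose(fp);
    return n;
}
```

    With fgets, the rest of the would-be struct is left unread, so bar->field1 would be built from one real byte and whatever else was in the buffer.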

    --
    Hallvard
     
    Hallvard B Furuseth, Jun 9, 2008
    #5
  6. Chris Torek

    >Kenny McCormack <> wrote:
    >> Here is a commonly used technique, that will, of course, work fine on
    >> any reasonably modern, normal hardware. But, does it pass the CLC test?

    >
    >> /* Assume well-formed input - of course, you can always break it by
    >> * feeding it bad input */
    >> struct foo { int field1, field2; char nl; } *bar;
    >> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

    >
    >> int main(void) {
    >> bar = (struct foo *) buffer;
    >> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    >> /* Now access the members of the struct (using, e.g., bar -> field1).
    >> * Note that no actual struct was ever declared - we are using
    >> * buffer as if it were the struct */
    >> }


    In article <-berlin.de>,
    Jens Thoms Toerring <> wrote:
    >As long as sizeof(struct foo) isn't larger than
    >SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


    When I first built the 4.xBSD system for the SPARC, tftp broke,
    precisely because it used this kind of trick. (In tftp's case,
    it was a more complex variant of the "struct hack".)

    >It's rather obfuscated and I dare to doubt that this is
    >a "commonly used technique", but 'buffer' is memory
    >you own so you can do with it whatever you want. Of
    >course, all hinges on your primary assumption that the
    >input is well-formed ...


    More importantly, it depends on the variable "buffer" being
    properly aligned for all member accesses.

    This was not true on the SPARC, where the compiler put the
    big buffer on an odd byte boundary.

    As a quick fix, I wrapped the buffer up into a union, which
    forced gcc to align the entire thing on an appropriate boundary.

    The trick also works if you use malloc() to obtain the buffer.
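
    A sketch of the malloc() variant, assuming raw binary records read with fread. read_record_malloc is a hypothetical helper. Storage returned by malloc() is suitably aligned for any object type, so the alignment concern does not arise for this buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int field1, field2; char nl; };

/* Reads one raw record into freshly allocated storage.
 * Returns NULL on allocation or read failure; the caller frees the result. */
struct foo *read_record_malloc(FILE *fp)
{
    struct foo *bar;

    assert(fp != NULL);
    bar = malloc(sizeof *bar);
    if (bar == NULL)
        return NULL;
    if (fread(bar, sizeof *bar, 1, fp) != 1) {
        free(bar);
        return NULL;
    }
    return bar;
}
```

    The caller uses bar->field1 as in the original post and calls free(bar) when done. The layout assumptions about the input still apply.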

    In any case, it is not a very good idea to write the code this way,
    because it places such strong constraints on what constitutes "well
    formed" input. You need to make sure that these severe restrictions
    on whatever uses the code are paid-for by whatever benefit you are
    getting from this "commonly used technique" (which, in my experience,
    was used perhaps once in the entire 4.xBSD code base -- that seems
    to argue against the claim that it is "commonly used").
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: gmail (figure it out) http://web.torek.net/torek/index.html
     
    Chris Torek, Jun 9, 2008
    #6
  7. Chris Torek <> wrote:
    > >Kenny McCormack <> wrote:
    > >> Here is a commonly used technique, that will, of course, work fine on
    > >> any reasonably modern, normal hardware. But, does it pass the CLC test?

    > >
    > >> /* Assume well-formed input - of course, you can always break it by
    > >> * feeding it bad input */
    > >> struct foo { int field1, field2; char nl; } *bar;
    > >> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

    > >
    > >> int main(void) {
    > >> bar = (struct foo *) buffer;
    > >> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    > >> /* Now access the members of the struct (using, e.g., bar -> field1).
    > >> * Note that no actual struct was ever declared - we are using
    > >> * buffer as if it were the struct */
    > >> }


    > In article <-berlin.de>,
    > Jens Thoms Toerring <> wrote:
    > >As long as sizeof(struct foo) isn't larger than
    > >SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


    > When I first built the 4.xBSD system for the SPARC, tftp broke,
    > precisely because it used this kind of trick. (In tftp's case,
    > it was a more complex variant of the "struct hack".)


    > >It's rather obfuscated and I dare to doubt that this is
    > >a "commonly used technique", but 'buffer' is memory
    > >you own so you can do with it whatever you want. Of
    > >course, all hinges on your primary assumption that the
    > >input is well-formed ...


    > More importantly, it depends on the variable "buffer" being
    > properly aligned for all member accesses.


    > This was not true on the SPARC, where the compiler put the
    > big buffer on an odd byte boundary.


    Yes, that's a point I forgot about. Should have known better,
    being bitten more than once by this issue when trying to port
    (mostly other people's ;-) code to a different architecture. I
    guess I am not too good a language lawyer;-)

    Best regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, Jun 9, 2008
    #7
  8. rahul

    On Jun 10, 3:30 am, Chris Torek <> wrote:
    >
    > As a quick fix, I wrapped the buffer up into a union, which
    > forced gcc to align the entire thing on an appropriate boundary.


    A bit off the topic:

    We can also use compiler-specific extensions to meet the alignment
    and padding requirements. In the case of gcc, __attribute__((packed))
    eliminates padding in structures, and the aligned attribute can be
    applied to the buffer to coerce its alignment.
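
    A sketch of both attributes, assuming GCC or a compatible compiler such as Clang; it will not compile elsewhere. BUFSIZE, struct foo_packed and the two helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFSIZE 64  /* hypothetical stand-in for SOMENUMBERWHATEVERFLOATSYOURBOAT */

struct foo { int field1, field2; char nl; };

/* GCC extension: same members, but no padding anywhere in the layout. */
struct foo_packed { int field1, field2; char nl; } __attribute__((packed));

/* GCC extension: force the char array onto a boundary fit for struct foo. */
static char buffer[BUFSIZE]
    __attribute__((aligned(__alignof__(struct foo))));

/* Returns 1 if buffer starts on a boundary suitable for struct foo. */
int buffer_is_aligned(void)
{
    assert(sizeof(struct foo) <= sizeof buffer);
    return (uintptr_t)(void *)buffer % __alignof__(struct foo) == 0;
}

/* Size of the packed variant: the members only, with no padding. */
size_t packed_size(void)
{
    return sizeof(struct foo_packed);
}
```

    The aligned attribute deals with alignment only; the aliasing concern raised earlier in the thread is not changed by it.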
     
    rahul, Jun 10, 2008
    #8
  9. On 9 Jun, 18:08, (Kenny McCormack)
    wrote:

    > Here is a commonly used technique, that will, of course, work fine on
    > any reasonably modern, normal hardware.  But, does it pass the CLC test?
    >
    > /* Assume well-formed input - of course, you can always break it by
    >  * feeding it bad input */
    >
    > struct foo { int field1, field2; char nl; } *bar;
    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
    >
    > int main(void) {
    >     bar = (struct foo *) buffer;
    >     fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    >     /* Now access the members of the struct (using, e.g., bar -> field1).
    >      * Note that no actual struct was ever declared - we are using
    >      * buffer as if it were the struct */
    >     }


    I used it on real systems. Now it makes me nervous.
    I've seen a system break when an OS was upgraded
    due to this.

    To use this I'd want to be *very* sure there was an
    identical system at both ends. And always would be.


    --
    Nick Keighley
     
    Nick Keighley, Jun 10, 2008
    #9
  10. On 10 Jun, 05:30, rahul <> wrote:
    > On Jun 10, 3:30 am, Chris Torek <> wrote:
    >
    >
    >
    > > As a quick fix, I wrapped the buffer up into a union, which
    > > forced gcc to align the entire thing on an appropriate boundary.

    >
    > A bit off the topic:
    >
    > We can also use compiler specific extensions to achieve the alignment
    > and padding
    > requirements. In case of gcc, __attribute__((packed)) for eliminating
    > padding for structures.
    > We can also use aligned attributes for buffer to coerce the alignment.


    eek!!! These things are different on every compiler. And sometimes
    don't exist. Some hardware cannot support it (or it becomes *very*
    inefficient).

    I worked on systems that turned it on and off for
    each structure in a large header...

    I've hunted bugs when different packed/not packed options
    had been used in different object files. It *linked* fine.

    --
    Nick Keighley

    "Almost every species in the universe has an irrational fear of
    #pragma packed. But they're wrong"
     
    Nick Keighley, Jun 10, 2008
    #10
  11. Guest

    Kenny the Troll wrote:
    > Here is a commonly used technique, that will, of course, work fine on

    How did you come to the conclusion that this technique is common?
    Where did you see or hear about it?
    > any reasonably modern, normal hardware. But, does it pass the CLC test?

    It certainly won't work for the "unreasonably modern/antique"
    "abnormal hardware/software".
    > /* Assume well-formed input - of course, you can always break it by
    > * feeding it bad input */

    You *can't* always break it by feeding it bad input as long as it's
    properly programmed.
    > struct foo { int field1, field2; char nl; } *bar;
    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
    >
    > int main(void) {
    > bar = (struct foo *) buffer;
    > fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);

    You don't check the return value of fgets, nor do you include <stdio.h>
    for it.
    > /* Now access the members of the struct (using, e.g., bar -> field1).

    Where? I don't see the code accessing said members.
    > * Note that no actual struct was ever declared - we are using

    There was - struct foo { int field1, field2; char nl; }.
    > * buffer as if it were the struct */

    No you are not.
    > }

    You don't return a value from main().
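
    The fgets points above could be addressed along these lines. This is a sketch with a hypothetical helper, read_line, and a hypothetical BUFSIZE; it fixes only the missing header and the unchecked return value, not the alignment or aliasing problems discussed elsewhere in the thread.

```c
#include <assert.h>
#include <stdio.h>   /* declares fgets() and FILE */

#define BUFSIZE 64   /* hypothetical stand-in for SOMENUMBERWHATEVERFLOATSYOURBOAT */

/* Reads one line into buf. Returns 1 on success and 0 on end-of-file or
 * read error, so the caller never inspects a buffer fgets() did not fill. */
int read_line(char *buf, int size, FILE *fp)
{
    assert(buf != NULL && size > 0 && fp != NULL);
    return fgets(buf, size, fp) != NULL;
}
```

    In main, this would be used as if (!read_line(buffer, BUFSIZE, stdin)) return EXIT_FAILURE; with <stdlib.h> included, followed by an explicit return 0; at the end.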
     
    , Jun 10, 2008
    #11
  12. Serve Lau

    "Nick Keighley" <> schreef in bericht
    news:...
    > On 10 Jun, 05:30, rahul <> wrote:
    >> On Jun 10, 3:30 am, Chris Torek <> wrote:
    >>
    >>
    >>
    >> > As a quick fix, I wrapped the buffer up into a union, which
    >> > forced gcc to align the entire thing on an appropriate boundary.

    >>
    >> A bit off the topic:
    >>
    >> We can also use compiler specific extensions to achieve the alignment
    >> and padding
    >> requirements. In case of gcc, __attribute__((packed)) for eliminating
    >> padding for structures.
    >> We can also use aligned attributes for buffer to coerce the alignment.

    >
    > eek!!! These things are different on every compiler. And sometimes
    > don't exist. Some hardware cannot support it (or it becomes *very*
    > inefficient).


    *very* inefficient is *very* relative. It all depends on the structure of
    your code. So I would not worry about the efficiency aspect of unaligned
    access, only about the incorrectness aspect :)
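
    For the correctness side, one commonly used way to read an int from a position that may be misaligned is to copy its bytes into a real int first. memcpy is not mentioned in the thread; this is a sketch with a hypothetical helper name, and it assumes the bytes were produced with the same int representation.

```c
#include <assert.h>
#include <string.h>

/* Copies sizeof(int) bytes from p into a properly aligned int.
 * p may point anywhere inside a byte buffer, aligned or not. */
int load_int(const unsigned char *p)
{
    int value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

    Because the access to the int itself goes through an actual int object, no misaligned load is ever requested from the hardware.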
     
    Serve Lau, Jun 10, 2008
    #12