One for the language lawyers

Kenny McCormack · Jun 9, 2008

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

Harald van Dĳk · Jun 9, 2008

Here is a commonly used technique,

It is? Where have you seen it used?

that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

No.

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;

What's the nl member for?

char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;

This assumes that buffer is appropriately aligned for a struct foo. When
you access *bar, you also ignore C's aliasing rules. Both problems can be
avoided by using a union.
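
A minimal sketch of the union approach, assuming a binary record read with fread. BUFFER_SIZE, union record and read_record are hypothetical names used for illustration only, and BUFFER_SIZE is assumed to be at least sizeof(struct foo). The union is aligned for its strictest member, and reading one member after writing another is generally taken to be permitted in C99 and later, which a plain pointer cast does not give you.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 64  /* hypothetical; stands in for the original size macro */

struct foo { int field1, field2; char nl; };

/* The union is aligned for struct foo, so bytes[] is too. */
union record {
    struct foo rec;
    unsigned char bytes[BUFFER_SIZE];
};

/* Reads sizeof(struct foo) raw bytes from fp into the union.
 * Returns 1 on success, 0 on a short read or error. */
int read_record(FILE *fp, union record *r)
{
    assert(fp != NULL && r != NULL);
    return fread(r->bytes, 1, sizeof r->rec, fp) == sizeof r->rec;
}
```

After a successful call the members are read as r.rec.field1 and so on. This only addresses alignment and aliasing; it still assumes the bytes were written with the same struct layout.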

fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);

Did you mean fread, or were you really asking about fgets? If you meant
fread, I don't see the point of a nl member at all. If you meant fgets, I
don't see the point of a nl member at the very end.

Walter Roberson · Jun 9, 2008

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

There may be unnamed padding between struct members for any reason,
so unless the data being read from stdin via fgets was written
with exactly the same compiler version on exactly the same target,
the code is not certain to work.

Some of the compilers I use *do* put unnamed padding in places
where it is not obvious to do so, in order to achieve better caching
performance.
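
A small sketch for inspecting the layout your own compiler chooses, using sizeof and offsetof. The function names are hypothetical, and the numbers it reports are implementation-defined, which is exactly the point: they need not match on the system that wrote the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct foo { int field1, field2; char nl; };

/* Bytes in struct foo that belong to no named member. */
size_t foo_padding_bytes(void)
{
    return sizeof(struct foo) - (2 * sizeof(int) + sizeof(char));
}

/* Prints the size and member offsets chosen by this implementation. */
void print_foo_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "sizeof(struct foo) = %zu\n", sizeof(struct foo));
    fprintf(out, "offset of field1   = %zu\n", offsetof(struct foo, field1));
    fprintf(out, "offset of field2   = %zu\n", offsetof(struct foo, field2));
    fprintf(out, "offset of nl       = %zu\n", offsetof(struct foo, nl));
    fprintf(out, "padding bytes      = %zu\n", foo_padding_bytes());
}
```

Calling print_foo_layout(stdout) on two different targets, or with two different compilers, is a quick way to see whether they agree.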

Jens Thoms Toerring · Jun 9, 2008

Kenny McCormack said:
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

As long as sizeof(struct foo) isn't larger than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed (it may be difficult to make it
non-well-formed for the types of members the structure
has on main-stream hardware, but there might be some
systems where certain bit-patterns don't represent ints
and thus you may run into danger of undefined behaviour).
So figuring out what's well-formed can be a bit of a
bother but as long as you do that there's no problem.

Regards, Jens

Hallvard B Furuseth · Jun 9, 2008

Kenny said:
Here is a commonly used technique, (...)

I hope not.

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).

This breaks e.g. if there is a 0x0A byte (newline) in the integer
representation of the would-be bar->field1 value. And as Harald
said, it breaks if buffer is not properly aligned for a struct foo.

Also when I see fgets() I suspect the file has been opened in text
instead of binary mode, which means there may be bugs from converting
between newline and the file system's representation of end-of-line.
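
A sketch of the first failure, assuming an ASCII system where newline is the byte 0x0A. The helper name is hypothetical. It writes raw bytes to a temporary binary file, reads them back with fgets, and reports how far fgets got.

```c
#include <assert.h>
#include <stdio.h>

/* Writes len raw bytes to a temporary file, reads them back with fgets,
 * and returns the number of bytes fgets consumed, or -1 on error. */
long bytes_consumed_by_fgets(const unsigned char *data, size_t len)
{
    char buffer[64];
    long consumed;
    FILE *fp;

    assert(data != NULL && len < sizeof buffer);
    fp = tmpfile();                     /* binary update mode ("wb+") */
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    if (fgets(buffer, (int)sizeof buffer, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    consumed = ftell(fp);               /* position just past what fgets read */
    fclose(fp);
    return consumed;
}
```

Given the five bytes 0x01 0x0A 0x02 0x03 0x04, fgets stops after the second one, so the rest of the would-be struct is never filled in. fread with an explicit byte count does not have this problem.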

Chris Torek · Jun 9, 2008

Kenny McCormack said:
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

As long as sizeof(struct foo) isn't larger than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.

When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)

It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed ...

More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.

This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

The trick also works if you use malloc() to obtain the buffer.
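
A minimal sketch of the malloc variant, assuming a binary record read with fread. read_foo_malloc is a hypothetical name. Memory returned by malloc is suitably aligned for any object type, so the alignment problem does not arise here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int field1, field2; char nl; };

/* Reads one record into freshly allocated storage.
 * Returns NULL on allocation failure or short read; the caller frees. */
struct foo *read_foo_malloc(FILE *fp)
{
    struct foo *rec;

    assert(fp != NULL);
    rec = malloc(sizeof *rec);
    if (rec == NULL)
        return NULL;
    if (fread(rec, 1, sizeof *rec, fp) != sizeof *rec) {
        free(rec);
        return NULL;
    }
    return rec;
}
```

As with the union, this says nothing about padding or byte order; the input still has to match the local layout of struct foo.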

In any case, it is not a very good idea to write the code this way,
because it places such strong constraints on what constitutes "well
formed" input. You need to make sure that these severe restrictions
on whatever uses the code are paid-for by whatever benefit you are
getting from this "commonly used technique" (which, in my experience,
was used perhaps once in the entire 4.xBSD code base -- that seems
to argue against the claim that it is "commonly used").

Jens Thoms Toerring · Jun 9, 2008

Chris Torek said:
Kenny McCormack said:

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)

More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.

This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

Yes, that's a point I forgot about. Should have known better,
being bitten more than once by this issue when trying to port
(mostly other people's ;-) code to a different architecture. I
guess I am not too good a language lawyer;-)

Best regards, Jens

rahul · Jun 10, 2008

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

A bit off the topic:

We can also use compiler-specific extensions to meet the alignment
and padding requirements. In the case of gcc, __attribute__((packed))
eliminates padding in structures, and the aligned attribute can be
applied to the buffer to force its alignment.
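
A sketch of what that looks like with GCC-compatible compilers. struct foo_packed, aligned_buffer and the function names are hypothetical, and the code deliberately refuses to compile elsewhere because the attribute syntax is not standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if !defined(__GNUC__)
#error "this sketch relies on GCC-style __attribute__ extensions"
#endif

struct foo { int field1, field2; char nl; };

/* Same members, but with all padding removed. */
struct foo_packed { int field1, field2; char nl; } __attribute__((packed));

/* A char buffer forced to the alignment struct foo requires. */
static char aligned_buffer[64]
    __attribute__((aligned(__alignof__(struct foo))));

size_t packed_foo_size(void)
{
    return sizeof(struct foo_packed);
}

int buffer_is_aligned_for_foo(void)
{
    return (uintptr_t)aligned_buffer % __alignof__(struct foo) == 0;
}

/* Aborts if the attributes did not have the expected effect. */
void check_gcc_layout(void)
{
    assert(packed_foo_size() == 2 * sizeof(int) + sizeof(char));
    assert(buffer_is_aligned_for_foo());
}
```

The aligned attribute only fixes alignment. Accessing the char buffer through a struct foo pointer still runs into the aliasing rules mentioned earlier in the thread.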

Nick Keighley · Jun 10, 2008

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

I used it on real systems. Now it makes me nervous.
I've seen a system break when an OS was upgraded
due to this.

To use this I'd want to be *very* sure there was an
identical system at both ends. And always would be.
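
When the two ends cannot be guaranteed identical, one alternative (not proposed in the thread, so treat it as an aside) is to define the byte layout explicitly and decode field by field. The sketch below assumes a made-up 9-byte format: two 32-bit big-endian integers followed by one character. WIRE_SIZE, get_u32_be and decode_foo are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo { int field1, field2; char nl; };

#define WIRE_SIZE 9  /* assumed format: 4 + 4 + 1 bytes, no padding */

/* Assembles a 32-bit value from four bytes, most significant first. */
static uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decodes one record from wire[0..len). Returns 1 on success, 0 if the
 * input is too short. Values above INT32_MAX convert to int32_t in an
 * implementation-defined way; this sketch assumes two's complement. */
int decode_foo(const unsigned char *wire, size_t len, struct foo *out)
{
    assert(wire != NULL && out != NULL);
    if (len < WIRE_SIZE)
        return 0;
    out->field1 = (int)(int32_t)get_u32_be(wire);
    out->field2 = (int)(int32_t)get_u32_be(wire + 4);
    out->nl = (char)wire[8];
    return 1;
}
```

This costs a few shifts per field, but the result no longer depends on the compiler's padding, the buffer's alignment, or the host byte order.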

Nick Keighley · Jun 10, 2008

A bit off the topic:

We can also use compiler-specific extensions to meet the alignment
and padding requirements. In the case of gcc, __attribute__((packed))
eliminates padding in structures, and the aligned attribute can be
applied to the buffer to force its alignment.

eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

I worked on systems that turned it on and off for
each structure in a large header...

I've hunted bugs when different packed/not packed options
had been used in different object files. It *linked* fine.

vippstar · Jun 10, 2008

Kenny said:
Here is a commonly used technique, that will, of course, work fine on

How did you come to the conclusion that this technique is common?
Where did you see or hear about it?

any reasonably modern, normal hardware. But, does it pass the CLC test?

It certainly won't work for the "unreasonably modern/antique"
"abnormal hardware/software".

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

You *can't* always break it by feeding it bad input as long as it's
properly programmed.

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);

You don't check the return value of fgets.

/* Now access the members of the struct (using, e.g., bar -> field1).

Where? I don't see the code accessing said members.

* Note that no actual struct was ever declared - we are using

There was - struct foo { int field1, field2; char nl; }.

* buffer as if it were the struct */

No, you are not.

}

You don't return a value from main().

Serve Lau · Jun 10, 2008

Nick Keighley said:
eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

*very* inefficient is *very* relative. It all depends on the structure of
your code. So I would not worry about the efficiency aspect of unaligned
access, only about the incorrectness aspect.
