One for the language lawyers

  • Thread starter Kenny McCormack

Kenny McCormack

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}
 

Harald van Dijk

Here is a commonly used technique,

It is? Where have you seen it used?

that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

No.

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;

What's the nl member for?

char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;

This assumes that buffer is appropriately aligned for a struct foo. When
you access *bar, you also ignore C's aliasing rules. Both problems can be
avoided by using a union.

fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);

Did you mean fread, or were you really asking about fgets? If you meant
fread, I don't see the point of a nl member at all. If you meant fgets, I
don't see the point of a nl member at the very end.
 

Walter Roberson

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?
/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

There may be unnamed padding between struct members for any reason,
so unless the data being read from stdin via fgets was written
with exactly the same compiler version on exactly the same target,
the code is not certain to work.

Some of the compilers I use *do* put unnamed padding in places
where it is not obvious to do so, in order to achieve better caching
performance.
 

Jens Thoms Toerring

Kenny McCormack said:
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?
/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

As long as sizeof(struct foo) isn't smaller than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed (it may be difficult to make it
non-well-formed for the types of members the structure
has on main-stream hardware, but there might be some
systems where certain bit-patterns don't represent ints
and thus you may run into danger of undefined behaviour).
So figuring out what's well-formed can be a bit of a
bother but as long as you do that there's no problem.

Regards, Jens
 

Hallvard B Furuseth

Kenny said:
Here is a commonly used technique, (...)

I hope not.
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).

This breaks e.g. if there is a 0x0A byte (newline) in the integer
representation of the would-be bar->field1 value. And as Harald
said, it breaks if buffer is not properly aligned for a struct foo.

Also when I see fgets() I suspect the file has been opened in text
instead of binary mode, which means there may be bugs from converting
between newline and the file system's representation of end-of-line.
 

Chris Torek

Kenny McCormack said:
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?
/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}

As long as sizeof(struct foo) isn't smaller than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.

When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)
It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed ...

More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.

This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

The trick also works if you use malloc() to obtain the buffer.

In any case, it is not a very good idea to write the code this way,
because it places such strong constraints on what constitutes "well
formed" input. You need to make sure that these severe restrictions
on whatever uses the code are paid-for by whatever benefit you are
getting from this "commonly used technique" (which, in my experience,
was used perhaps once in the entire 4.xBSD code base -- that seems
to argue against the claim that it is "commonly used").
 

Jens Thoms Toerring

Chris Torek said:
Kenny McCormack said:
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?
/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */
struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}
When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)
More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.
This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

Yes, that's a point I forgot about. Should have known better,
being bitten more than once by this issue when trying to port
(mostly other people's ;-) code to a different architecture. I
guess I am not too good a language lawyer;-)

Best regards, Jens
 

rahul

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

A bit off the topic:

We can also use compiler-specific extensions to meet the alignment and
padding requirements. In the case of gcc, __attribute__((packed))
eliminates padding in structures. We can also use the aligned attribute
on the buffer to coerce its alignment.
 

Nick Keighley

Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware.  But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
 * feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
    bar = (struct foo *) buffer;
    fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    /* Now access the members of the struct (using, e.g., bar -> field1).
     * Note that no actual struct was ever declared - we are using
     * buffer as if it were the struct */
    }

I used it on real systems. Now it makes me nervous.
I've seen a system break when an OS was upgraded
due to this.

To use this I'd want to be *very* sure there was an
identical system at both ends. And always would be.
 

Nick Keighley

A bit off the topic:

We can also use compiler-specific extensions to meet the alignment and
padding requirements. In the case of gcc, __attribute__((packed))
eliminates padding in structures. We can also use the aligned attribute
on the buffer to coerce its alignment.

eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

I worked on systems that turned it on and off for
each structure in a large header...

I've hunted bugs when different packed/not packed options
had been used in different object files. It *linked* fine.
 

vippstar

Kenny said:
Here is a commonly used technique, that will, of course, work fine on

How did you come to the conclusion that this technique is common?
Where did you see or hear about it?

any reasonably modern, normal hardware. But, does it pass the CLC test?

It certainly won't work for the "unreasonably modern/antique"
"abnormal hardware/software".

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

You *can't* always break it by feeding it bad input as long as it's
properly programmed.

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);

You don't check the return value of fgets.

/* Now access the members of the struct (using, e.g., bar -> field1).

Where? I don't see the code accessing said members.

* Note that no actual struct was ever declared - we are using

There was - struct foo { int field1, field2; char nl; }.

* buffer as if it were the struct */

No you are not.

}

You don't return a value from main().
 

Serve Lau

Nick Keighley said:
eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

*very* inefficient is *very* relative. It all depends on the structure of
your code. So I would not worry about the efficiency aspect of unaligned
access, only on the incorrectness aspect :)
 
