Memory alignment

CBFalconer

No, the padding bytes are most definitely *not* yours to write to.
Although most implementations will let you get away with it, there
are implementations that do careful memory bounds checking and won't.

Bear in mind that Twink is a troll, whose objective is to disrupt
this newsgroup.
 
Keith Thompson

Eric Sosman said:
Keith said:
Lowell Gilbert said:
On Oct 3, 12:48 pm, (e-mail address removed) wrote: [...]
Why would you want to declare a 1 char array to store 2 anyway?
Good question. This is found in some real embedded
code to make more efficient use of memory. As I understood
it, the last s[1] is just a placeholder as you can
allocate more memory when needed. For example:

my_struct = malloc(sizeof(my_struct_t) + MY_PAYLOAD_STRING_SIZE);
or even more likely, something more like
my_struct = malloc(sizeof(my_struct_t) + strlen(my->struct->s));
You'd need "+ 1" there, since the length returned by strlen() doesn't
include the terminating '\0'.

The struct has `char s[1]' as its last element[*], and the
size of that element is already included in sizeof(my_struct_t).

So it does.

Seeing strlen() in a computation of how much memory to allocate sets
off my alarm bells. If I were going to write something like that in
real code, it would be heavily commented.
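To make the arithmetic concrete, here is a minimal sketch of the struct-hack allocation under discussion. It assumes the some_struct_t layout quoted later in this thread (with `short k;` properly terminated) and uses hypothetical helper names, size_for_string() and make_with_string(). It sizes the block with offsetof() plus strlen() + 1, so neither the s[1] slot nor any trailing padding is being counted on. Writing past s[0] is still the struct hack, so this relies on the implementation not bounds-checking inside a malloc()ed block.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy, strcmp */

/* The layout from the thread, with `short k;` correctly terminated. */
typedef struct some_struct {
    int i;
    short k;
    int m;
    char s[1];   /* struct-hack placeholder for a variable-length string */
} some_struct_t;

/* Bytes needed for the fixed members plus a copy of str, including its
   terminating '\0'. offsetof() counts only what precedes s, so the s[1]
   slot and any trailing padding are neither relied on nor double-counted. */
size_t size_for_string(const char *str)
{
    size_t need = offsetof(some_struct_t, s) + strlen(str) + 1;

    /* Never hand out less than a whole some_struct_t. */
    return need < sizeof(some_struct_t) ? sizeof(some_struct_t) : need;
}

/* Hypothetical helper: allocate a struct-hack object holding str. */
some_struct_t *make_with_string(const char *str)
{
    size_t len = strlen(str);
    some_struct_t *p = malloc(size_for_string(str));

    if (p != NULL)
        memcpy(p->s, str, len + 1);   /* copies the '\0' as well */
    return p;
}
```

Callers free() the result as usual; the '\0' is accounted for explicitly rather than hidden inside sizeof.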

[...]
 
Why Tea

typedef struct some_struct
{
    int i;
    short k;
    int m;
    char s[1];
} some_struct_t;
But strcpy(my->struct->s, "AB"); is OK if there is
padding, isn't it? Please go easy on me here.
I just want to understand how this really works...

Yes, of course, if there are two or more padding bytes at the end of the
struct then that's your memory to write to, whether it's on the stack if
my_struct is an automatic variable, or on the heap if you got the memory
from malloc().

You should be aware that the regulars here aren't interested in your
wish for a pragmatic answer that's true in practice: they'll just
bombard you with hypothetical answers that are true in theory.

Thanks Antoninus. I appreciate your answer. Eric S. is very
knowledgeable and I respect that. But to say "It's not OK at all"
categorically without considering cases when it COULD be OK
can give a wrong impression of the problem.

For example, suppose there is a system crash in an embedded system and
the dump shows it was due to a wild pointer. When you see a piece
of code that uses the struct hack (a term I just learned) and it
forgets to include the '\0' in the size allocated, you can't really
swear by the bible that it caused the crash when strcpy overstepped
the allocated memory. I understand it's bad and dangerous
programming, but the question is: can we be 100% sure that existing
code like that causes memory corruption? Based on the
little I know, I don't think it's a simple answer of YES or NO.
That's why I think Antoninus has given a more accurate answer.

 
Keith Thompson

Why Tea said:
Thanks Antoninus. I appreciate your answer. Eric S. is very
knowledgeable and I respect that. But to say "It's not OK at all"
categorically without considering cases when it COULD be OK
can give a wrong impression of the problem.

For example, suppose there is a system crash in an embedded system and
the dump shows it was due to a wild pointer. When you see a piece
of code that uses the struct hack (a term I just learned) and it
forgets to include the '\0' in the size allocated, you can't really
swear by the bible that it caused the crash when strcpy overstepped
the allocated memory. I understand it's bad and dangerous
programming, but the question is: can we be 100% sure that existing
code like that causes memory corruption? Based on the
little I know, I don't think it's a simple answer of YES or NO.
That's why I think Antoninus has given a more accurate answer.


Um, did you read my answer, in which I specifically acknowledged that
you can probably get away with accessing padding bytes but also
explained why it's a bad idea to depend on it?

There is nothing "pragmatic" about encouraging you to assume that
there are a certain number of padding bytes at the end of a structure,
and that you can safely use them for whatever you want.
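The amount of trailing padding can be computed instead of assumed. A minimal sketch, using a hypothetical struct foo and a hypothetical foo_trailing_padding(): whatever number it returns describes one implementation only, which is the point being made above.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical example type; how much padding follows y is up to
   the implementation. */
struct foo {
    int x;
    char y;   /* last member */
};

/* Unnamed padding bytes after the last member on *this* implementation.
   It might be 0, 3, 7, ...; a different compiler, different flags, or an
   edited declaration can change it. */
size_t foo_trailing_padding(void)
{
    struct foo tmp;   /* only used as a sizeof operand; never evaluated */

    return sizeof tmp - (offsetof(struct foo, y) + sizeof tmp.y);
}
```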

"Antoninus Twink" is a troll. Please do us all a favor and ignore
him.
 
CBFalconer

Richard said:
.... snip ...

The best solution would be for all the trolls to either:

(a) become C experts - of which there seems little or no hope; or
(b) stop posting C advice - of which there seems little or no
hope, although ISTR that Kenny McCormack, at least, has
the grace to realise that he doesn't know spit about C
and consequently limits his articles to obnoxiousness and
misconceived attempts at irony; or
(c) stop trolling - of which there seems to be no hope whatsoever.

Since none of these solutions is going to happen, we are left with
"to killfile or not to killfile", and - as we have seen - each has
its problems.

There is also:

(d) Await a quote from a reply to the troll, and reply to that.
That works adequately with a killfile.
 
Antoninus Twink

There is nothing "pragmatic" about encouraging you to assume that
there are a certain number of padding bytes at the end of a structure,
and that you can safely use them for whatever you want.

Why don't you get off your high horse for a minute and try to separate
the two issues in your mind?

The question you're addressing is: Is it OK to make certain padding
assumptions about structs, and based on those assumptions to write to
memory beyond the last field in the struct? The answer is, of course
that's not a good idea in general, and I'd never advise anyone to do it.

But that wasn't what the OP was asking. He was talking about a specific
compiler on a specific system, where he'd verified that there were some
number of padding bytes at the end of the struct. He asked, in that
specific situation, whether writing into those padding bytes could cause
his program to blow up?

The pragmatic answer is no, unless (as someone else pointed out) the
compiler comes with some elaborate bounds-checking "feature", a
possibility which for all practical purposes can be ignored, because
it's vanishingly unlikely.

struct foo *p = malloc(sizeof(struct foo));
char *q = (char *) p;
q[sizeof(struct foo) - 1]=0;

Are you seriously saying that there's any real-world system that will
blow up here if it happens that there's padding at the end of struct
foo?
 
Kenny McCormack

Keith Thompson <[email protected]> got a little confused (as is the norm
for him) and wrote something else. What he meant to write was:
....
"Antoninus Twink" is one of the few people here who tells the truth.
This pisses us off and so we call him (and others like him) a "troll".

Please do us all a favor (because this newsgroup is the only thing many
of us have that passes for a social life) and pretend that you too are
as dumb as we are.

Corrections done. No thanks are necessary (but cash is always accepted).
 
Kenny McCormack

Antoninus Twink said:
Are you seriously saying that there's any real-world system that will
blow up here if it happens that there's padding at the end of struct
foo?

Real world systems are OT here. Surely you know that by now.
 
Richard Tobin

Antoninus Twink said:
Yes, of course, if there are two or more padding bytes at the end of the
struct then that's your memory to write to, whether it's on the stack if
my_struct is an automatic variable, or on the heap if you got the memory
from malloc().

I think it's guaranteed to be safe to write to them through a char
pointer, but you can't rely on the value you write staying the same.
For example, a little-endian system writing a 16-bit unsigned short
value into an 8-bit unsigned char followed by padding might just write
the whole short, clobbering the first padding byte.

And as has already been pointed out, structure assignment may not
copy padding.
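One practical consequence of that last point, sketched with a hypothetical struct foo and hypothetical helpers foo_copy() and foo_equal(): compare structs member by member, because memcmp() over the whole object also looks at padding bytes whose values assignment need not preserve.

```c
#include <string.h>  /* memset */

/* Hypothetical example type; y is usually followed by padding. */
struct foo {
    int x;
    char y;
};

/* Compare member by member. memcmp() over the whole object would also
   compare the padding bytes, whose values are unspecified and need not
   be carried over by structure assignment. */
int foo_equal(const struct foo *a, const struct foo *b)
{
    return a->x == b->x && a->y == b->y;
}

/* Structure assignment into an object whose bytes, padding included,
   were deliberately scribbled on first. Only the members are
   guaranteed to end up equal to *src. */
struct foo foo_copy(const struct foo *src)
{
    struct foo dst;

    memset(&dst, 0xAA, sizeof dst);
    dst = *src;
    return dst;
}
```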

-- Richard
 
Richard Tobin

Antoninus Twink said:
struct foo *p = malloc(sizeof(struct foo));
char *q = (char *) p;
q[sizeof(struct foo) - 1]=0;

Are you seriously saying that there's any real-world system that will
blow up here if it happens that there's padding at the end of struct
foo?

I don't think this particular case is a real-world vs theoretical issue.
It's always legal to write malloc()ed memory through a byte pointer.

-- Richard
 
Keith Thompson

Malcolm McLean said:
I think you've hit on a glitch in the standard there.
Use of memset() to zero out memory is well-established, but could lead
to writing to padding bytes, which strictly isn't allowed. Which leads
to the issue of whether a custom "zero-memory", just a hand-coded
memset() with a hard zero, would lead to UB, whilst memset()
doesn't. That's a nonsense rule.

I don't see a glitch. You're permitted to access any object as if it
were an array of unsigned char, which is what memset (or a hand-coded
equivalent) does.

That includes accessing padding bytes. For example:

#include <stddef.h>   /* offsetof */

struct foo { int x; char y; } obj;

int main(void)
{
    if (sizeof obj > offsetof(struct foo, y) + 1) {
        /* struct foo has one or more padding bytes at the end */
        /* We can do what we like with those padding bytes. */
        ((unsigned char*)&obj)[sizeof obj - 1] = 42;

        /* But updating obj.y might clobber the padding bytes. */
        obj.y = 'y';
        /* We don't know what value the padding byte now has. */
    }
    return 0;
}

And if you happen to know, after carefully reading your
implementation's documentation, that there are one or more padding
bytes at the end of struct foo, then you can get away with writing to
them. But then your code will be non-portable -- and it could very
easily break if the declaration of struct foo is changed during
maintenance.
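Keith's first point, that any object may be accessed as an array of unsigned char, is what makes a hand-coded memset() just as legitimate as the library one. A minimal sketch, with a hypothetical name fill_bytes() and a hypothetical struct foo:

```c
#include <stddef.h>  /* size_t */

/* A hand-coded equivalent of memset(p, value, n). Any object may be
   accessed as an array of unsigned char, padding bytes included, so
   this is exactly as valid as calling memset() itself. */
void *fill_bytes(void *p, int value, size_t n)
{
    unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++)
        b[i] = (unsigned char)value;
    return p;
}

/* Hypothetical example type with (probably) some trailing padding. */
struct foo {
    int x;
    char y;
};
```

Filling with zero leaves integer members equal to 0; filling with other values is fine too, as long as nothing later reads a member whose type could have trap representations.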
 
Harald van Dijk

[...]
However C0C0C0C0 is the pointer trap representation. So

memset(&x, 0xC0, sizeof(struct foo));

will cause a program termination, probably when the structure is
accessed, even though ptr isn't written through or read from.

"The value of a structure or union object is never a trap representation,
even though the value of a member of the structure or union object may be
a trap representation."

You're allowed to set a pointer to C0C0C0C0 the way you do, and you're
allowed to do pretty much anything using a structure containing that
pointer, so long as you don't look at that pointer specifically.
But the question is, can the padding bytes have a similar trap
representation? If so, can it be all bits zero, and so can Ben's example
blow up?

No, padding bytes cannot affect whether the bits represent a value.
 
Keith Thompson

Malcolm McLean said:
Consider this

struct foo
{
    char *ptr;
    /* we've got a few padding bytes here */
};

Now we happen to know that null is all zeroes on our particular machine, so

struct foo x;
memset(&x, 0, sizeof(struct foo));

is OK.

However C0C0C0C0 is the pointer trap representation. So

memset(&x, 0xC0, sizeof(struct foo));

will cause a program termination, probably when the structure is
accessed, even though ptr isn't written through or read from.

The latter won't cause a program termination *unless* we access x.ptr.
This is just a simple case of type punning; it doesn't really have
anything to do with padding bytes.
But the question is, can the padding bytes have a similar trap
representation? If so, can it be all bits zero, and so can Ben's
example blow up?

No. n1256 6.2.6.1p6:

When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object
representation that correspond to any padding bytes take
unspecified values. The value of a structure or union object is
never a trap representation, even though the value of a member of
the structure or union object may be a trap representation.

(There's a change bar on the last sentence; I think it was added post-C99.)
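If the aim of Malcolm's memset() is simply a struct foo whose pointer is null, an initializer does that without assuming anything about null's bit pattern. A minimal sketch with a hypothetical helper make_empty_foo(); nothing in it depends on the padding bytes, whose values stay unspecified per the paragraph quoted above.

```c
#include <stddef.h>  /* NULL */

/* Malcolm's example type. */
struct foo {
    char *ptr;
};

/* An initializer of {0} gives ptr a null pointer value whatever bit
   pattern null has on the machine, whereas memset(&x, 0, sizeof x)
   only produces a null pointer where null happens to be all bits zero. */
struct foo make_empty_foo(void)
{
    struct foo x = {0};
    return x;
}
```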
 
Martien Verbruggen

struct foo { int x; char y; } obj;

if (sizeof obj > offsetof(struct foo, y) + 1) {

/* struct foo has one or more padding bytes at the end */
/* We can do what we like with those padding bytes. */
((unsigned char*)&obj)[sizeof obj - 1] = 42;

/* But updating foo.y might clobber the padding bytes. */
obj.y = 'y';
/* We don't know what value the padding byte now has. */
}

And if you happen to know, after carefully reading your
implementation's documentation, that there are one or more padding
bytes at the end of struct foo, then you can get away with writing to

If you've used the above test, would you still need to read your
compiler's documentation? Isn't the test enough?

IOW, why did you include the phrase "after carefully reading your
implementation's documentation", rather than leave it unqualified?


Martien
 
Keith Thompson

Martien Verbruggen said:
struct foo { int x; char y; } obj;

if (sizeof obj > offsetof(struct foo, y) + 1) {

/* struct foo has one or more padding bytes at the end */
/* We can do what we like with those padding bytes. */
((unsigned char*)&obj)[sizeof obj - 1] = 42;

/* But updating foo.y might clobber the padding bytes. */
obj.y = 'y';
/* We don't know what value the padding byte now has. */
}

And if you happen to know, after carefully reading your
implementation's documentation, that there are one or more padding
bytes at the end of struct foo, then you can get away with writing to

If you've used the above test, would you still need to read your
compiler's documentation? Isn't the test enough?

IOW, why did you include the phrase "after carefully reading your
implementation's documentation", rather than leave it unqualified?

Unclear writing on my part. The "And if you happen to know ..." part
was intended to refer to accessing the padding bytes in general
(without the "if"), not specifically to the code above.

The whole idea is frankly a bit silly. If you want to access bytes
within a struct, why on Earth would you not declare members to cover
those bytes? (The struct hack isn't an exception to this; it
deliberately accesses bytes that may be outside the struct, but within
a block allocated by malloc.) And the proposed "if" just makes it
sillier; what are you going to do if there isn't any padding at the
end, and why not just do that unconditionally? (That's a generic
"you".)
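Keith's alternative, declaring members to cover the bytes you want, might look like this. A sketch under stated assumptions: the tag member, its size (room for the "AB" from earlier in the thread), and set_tag() are all hypothetical. On many implementations the new member simply occupies what used to be trailing padding, so the struct may not even grow.

```c
#include <string.h>  /* strlen, memcpy, strcmp */

/* Rather than hoping for spare padding after y, declare the spare bytes.
   Their size is now part of the declaration, so it survives a change of
   compiler, flags, or members during maintenance. */
struct foo {
    int x;
    char y;
    char tag[3];   /* hypothetical: room for "AB" plus '\0' */
};

/* Hypothetical setter: copies at most sizeof f->tag - 1 characters and
   always terminates, truncating instead of writing past the member. */
void set_tag(struct foo *f, const char *s)
{
    size_t n = strlen(s);

    if (n > sizeof f->tag - 1)
        n = sizeof f->tag - 1;
    memcpy(f->tag, s, n);
    f->tag[n] = '\0';
}
```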
 
Why Tea

Keith Thompson said:


It is a very simple, but two-fold, answer - YES, you can be sure that
writing into memory you don't own causes memory corruption; and NO, you
can't be sure that the effect of this corruption will always be
noticeable. When you write into memory you don't own, the behaviour of
your program is undefined, and the rules of C no longer apply - so
anything can happen, including (but by no means limited to) what you
expected to happen.

Thanks Richard. I understood what you said. I'd like to apologize for
asking more questions. But I really would like to get to the
bottom of this.

If the memory is corrupted, wouldn't the system eventually crash
if you run it long enough? If so, then we can be sure that the
corruption will be noticeable.

I went back to the c-faq and read 2.6 again, many times. I'll paste
the code here for easy reference.

#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[1];
};

struct name *makename(char *newname)
{
    struct name *ret =
        malloc(sizeof(struct name)-1 + strlen(newname)+1);
                /* -1 for initial [1]; +1 for \0 */
    if(ret != NULL) {
        ret->namelen = strlen(newname);
        strcpy(ret->namestr, newname);
    }
    return ret;
}

Although not specifically stated, padding is likely
to occur for namestr. So strcpy must have written into
the padding bytes. The faq says "... has deemed that
it is not strictly conforming with the C Standard,
although it does seem to work under all known
implementations...".

I know it's bad and it shouldn't be done. But I'm
looking at tens of thousands of lines of code written
by someone else, and much of it makes use of this
hack. What can we conclude? Perhaps it does work,
just like the faq says.

I appreciate all of you who took the time to answer
my questions. Not all of us work with C every day;
that's why we ask questions here. Of course we try
to google and ask our colleagues before turning to
the group, but it doesn't always work, as this
"struct hack" showed. It doesn't help to answer
the question with an almighty attitude. Again, as
this "struct hack" has shown, although the
consensus is not to do it, no one can say for
sure whether the system will eventually die or crash.
Very often, we ask a question because we badly
need help and we know there are many knowledgeable
and competent people here. Thanks again for your
time.
 
Nick Keighley

Keith Thompson said:


It is clear from the above that at least some newbies are *not* ignoring
the technically incompetent answers provided by Mr Twink. Note that
"technically incompetent" and "trollish" are not the same thing. The
problem with Mr Twink (and it is not a problem that is unique to him) is
that he's *both*. The regular contributors to this group know full well
that he's a troll. He is in many killfiles (including mine). And therefore
he can often get away with spouting any old rubbish without being
challenged, and thus newbies can be misled into following his "advice" (as
appears to have happened in this case).

which is why I don't have him killfiled.

So the killfile "solution" is problematic, because it allows trolls like Mr
Twink the freedom to give stupid advice to up-lapping newbies with little
or no risk of being corrected. Unfortunately, the non-killfile "solution"
is also problematic, because it raises the overall temperature of the
group. If, for example, I were to remove Mr Twink from my killfile, I know
from experience that within a week there'd be a flame war several hundreds
of articles long.

My rule of two (or sometimes three) hopefully keeps me from this.
If my protagonist and I have repeated the same positions three times,
then I declare it a stalemate and retire from the discussion.
The analogy is with the similar chess rule.

I usually try to avoid replying to Twink but instead reply
to his repliers. Sometimes he's so potentially misleading
I feel I have to respond for the sake of the lurkers.

<snip>

Kenny's easier as he's just tedious.
 
Keith Thompson

Why Tea said:
Thanks Richard. I understood what you said. I'd like to apologize for
asking more questions. But I really would like to get to the
bottom of this.

If the memory is corrupted, wouldn't the system eventually crash
if you run it long enough? If so, then we can be sure that the
corruption will be noticeable.

Maybe, maybe not. There are no guarantees, one way or the other.

For example, if by "corrupting memory" you clobber the value of some
other variable, maybe it's a variable that isn't used again, so it has
no visible effect on the program. Or maybe you clobber memory that's
outside the object you're trying to access, but also outside any other
object. Or maybe you set some variable to a value that happens to be
correct.

There are any number of ways you can corrupt memory with no visible
effect. The risk is that the effect could become visible at the least
convenient possible time -- say, when your software has been deployed
to customers, or when you're demonstrating it to somebody important,
or years later when all the people who are familiar with the code have
left the company. Such is the nature of undefined behavior.

I went back to the c-faq and read 2.6 again, many times. I'll paste
the code here for easy reference.

#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[1];
};

struct name *makename(char *newname)
{
    struct name *ret =
        malloc(sizeof(struct name)-1 + strlen(newname)+1);
                /* -1 for initial [1]; +1 for \0 */
    if(ret != NULL) {
        ret->namelen = strlen(newname);
        strcpy(ret->namestr, newname);
    }
    return ret;
}

Although not specifically stated, padding is likely
to occur for namestr. So strcpy must have written into
the padding bytes. The faq says "... has deemed that
it is not strictly conforming with the C Standard,
although it does seem to work under all known
implementations...".

Proper use of the struct hack does *not* depend on padding bytes. It
writes outside the bounds of the array, and of the struct that
contains it, but *within* the bounds of the chunk of memory allocated
by malloc. For this to work, you need an implementation that doesn't
do bounds checking; almost all existing implementations qualify. (In
fact, since the struct hack is a common trick, a compiler that broke
it would probably fail in the marketplace.)

On the other hand, there's some risk that an optimizing compiler could
cause problems. Since violating array bounds invokes undefined
behavior, an optimizing compiler is allowed to *assume* that you
haven't done so, even if it doesn't generate code for explicit
run-time bounds checks. But again, the struct hack is common enough
that you should be ok.

I know it's bad and it shouldn't be done. But I'm
looking at tens of thousands of lines of code written
by someone else, and much of it makes use of this
hack. What can we conclude? Perhaps it does work,
just like the faq says.

The struct hack itself *probably* violates the rules of the language,
but it's generally supported -- and C99 explicitly supports it in a
different form. Code that assumes the presence of padding bytes, on
the other hand, is more dangerous. For example, if your declaration
changes from this:
struct name {
    int namelen;
    char namestr[1];
};
to this:
struct name {
    int namelen;
    short something;
    unsigned char something_else;
    char namestr[1];
};
then it's likely (given 4-byte int, 2-byte short, and, of course,
1-byte char) that the structure will be 8 bytes with *no* padding.
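For comparison, here is a sketch of the FAQ's makename() using C99's flexible array member, the "different form" mentioned above. It assumes a C99 compiler, and the parameter becomes const char * because the string is only read; otherwise it follows the FAQ version.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy, strcmp */

/* C99 spelling of the struct hack: a flexible array member. Unlike
   namestr[1], it contributes nothing to sizeof(struct name). */
struct name {
    int namelen;
    char namestr[];   /* flexible array member */
};

struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    /* No "-1" needed: only the characters and the '\0' are added. */
    struct name *ret = malloc(sizeof(struct name) + len + 1);

    if (ret != NULL) {
        ret->namelen = (int)len;
        memcpy(ret->namestr, newname, len + 1);   /* includes the '\0' */
    }
    return ret;
}
```

Because the layout no longer has a one-element placeholder, adding members before namestr cannot silently change how much of the string fits.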

There might be some confusion here. I haven't gone back to the
original article, and I'm not certain that the code you originally
posted actually assumed the existence of padding bytes rather than
just making ordinary use of the struct hack.

[...]
 
Nick Keighley

On 4 Oct 2008 at 5:02, Keith Thompson wrote:

Why don't you get off your high horse for a minute and try to separate
the two issues in your mind?

The question you're addressing is: Is it OK to make certain padding
assumptions about structs, and based on those assumptions to write to
memory beyond the last field in the struct? The answer is, of course
that's not a good idea in general, and I'd never advise anyone to do it.
ok

But that wasn't what the OP was asking. He was talking about a specific
compiler on a specific system,

and perhaps he should have asked on a specific newsgroup.

where he'd verified that there were some
number of padding bytes at the end of the struct. He asked, in that
specific situation, whether writing into those padding bytes could cause
his program to blow up?

The pragmatic answer is no,

only if "pragmatic" means "wrong". The problem is his program is now
non-portable. This non-portability *could* involve different compilers
on the same platform. Or different versions of the same compiler.
Or changes to flag settings of the compiler (particularly
optimisation flags).

It's quite easy for you to end up writing to bytes that
don't belong to you. And that's an accident waiting to happen.

unless (as someone else pointed out) the
compiler comes with some elaborate bounds-checking "feature", a
possibility which for all practical purposes can be ignored, because
it's vanishingly unlikely.

struct foo *p = malloc(sizeof(struct foo));
char *q = (char *) p;
q[sizeof(struct foo) - 1]=0;

Are you seriously saying that there's any real-world system that will
blow up here if it happens that there's padding at the end of struct
foo?

No, no-one knows of one. The standard committee still doesn't think
highly of the "struct hack", of which this is a variant.


--
Nick Keighley
"Almost every species in the universe has an irrational fear of the
dark. But they're wrong, 'cos it's not irrational. It's Vashta Nerada."
The Doctor
 
Nick Keighley

Why Tea said:
On Oct 3, 12:48 pm, (e-mail address removed) wrote:
Good question. This is found in some real embedded
code to make more efficient use of memory. As I understood
it, the last s[1] is just a placeholder as you can
allocate more memory when needed. For example:
my_struct = malloc(sizeof(my_struct_t) + MY_PAYLOAD_STRING_SIZE);

This is called the "struct hack". It has been formalised in C99 so if
you can use C99 then all will be well.

um. I thought the position of the struct hack was the same
in C99 as in C90. What *did* change was the addition of
flexible array members, which were intended to remove the need for TSH.

The same my_struct_t is used throughout the code for
signal sending. If s[] is used to carry binary data, the
size is specified by an int preceding s[]. I'd be
interested to hear comments from the experts about
this approach.

It is considered to be "a bit dodgy" (that is the technical term) but
it generally works. I am not sure there is really much more to say
about it though I get the feeling I will be proved very much wrong
about that!

I think on a reasonably sane embedded system it is almost
certain to work. Obviously you run some tests. I've used
TSH heavily.

--
Nick Keighley

"If, indeed the subatomic energy in the stars is being freely
used to maintain their great furnaces, it seems to bring a little
nearer to fulfillment our dreams of controlling this latent
power for the well-being of the human race - or for its suicide."
Arthur S. Eddington "The Internal Constitution of the Stars" 1926
 
