Malloc Query

Uncle Steve

I should also have said: especially if the macro does not have an
ALL_CAPS name flagging its macrosity.

But as some feel here, macrosity is a vanishing distinction. I use
all-caps macro names for constant expressions and the like. If I
think a macro will be more readable in lower-case, I'll do just that.
Shouted macro definitions interrupt the flow of source text when a
human is reading it, and I don't think I need that kind of distraction
in the code as much as some do.
They should, but the onus is on the programmer to make that reasonable
(and where there are side effects in the expressions used in the macro
"call" this may not be possible).


Yes, using an inline function is better yet. You seemed dead against
them which is why I commented on the macro. Use a function and the
issue disappears.

I'm not "dead against" function calls for simple things. I'm merely
predisposed against them.
I don't see how you can say that. The speed advantage that used to be
one of the main reasons to take the risk with macros has gone, but all
the potential bugs that function-like macros can hide are still there.

Sorry, I should say 'moot' in consideration of the now historic
performance advantage of macros over function calls. In reality,
function calls /are/ slower than macro instances but the compiler
masks this when it inlines a function by omitting function preamble
and jinking the basic blocks of code to flow together as if they were
part of the immediate text.
<snip>
#define ARENA_TYPE struct foo

struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE *arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.

No worries here. I've edited for brevity in posting, but I always
protect macro arguments from side-effects these days. It doesn't look
as clean and concise, which is why I omitted those statements from
example code. No biggie.

OK. I'll try to remember to assume any errors in future code were added
for brevity and I'll save time by not commenting on it.

Please do. I find it's the cut-and-paste-modify step that introduces
errors into program code more so than writing it in the first place.
In this newsgroup is it really preferable to see

#define arena_object_addr(x, n) ({ \
struct arena_s X = x; \
int N = n; \
*X->arena[N]; }) \

for every dumb example that is posted?

The correct version of the macro is simpler than the one you seem to
think you'd write to make it safe. Why would you write the above (it's
wrong, so it's obviously not what you really use) instead of

*(x)->arena[n]

That's only correct if the macro is a simple one-liner. If (x) is
used more than once and has side-effects, such as in (x++), you're
toast.
? Also, why would you use a non-standard feature when it's not needed? There
might be a reason to use ({...}) when a macro parameter is used more
than once that's not the case here. I suppose if you know you'll never
use another compiler, tying code to language extensions is not a problem.

I'm a creature of habit, I guess. Note that while I didn't learn C
under gcc, it's about the only compiler I've used in the last
seventeen years. Consequently, avoiding its extensions is not an
automatic process. I'm making an effort to avoid non-portable code,
but it is slow going.
One of the bug-bears of Usenet is "oh, that's not the real code". This
translates to "you've just wasted your time". I'd vote for real code
(at the least, code that's gone through a compiler) every time.

I look at the majority of the example code posted in Usenet messages
as being contrived for the purposes of demonstration. When I was a
total noob, comp.sources.unix and a few others were places where real
code was made available to Usenet readers. So called real code is no
longer posted to newsgroups. I'm not ecstatic that my contrived
examples are not perfect, but the alternative is to post my 'real
code' and annoy people who would prefer the exemplar pared down to the
bare minimum.

The arena allocator I've been on about is instantiated as a .h file
and is over 500 lines of code at this time. It carries with it all
the 'baggage' of my development environment and would make the
discussion of its data structures and algorithm very awkward. A
reasonable compromise would be to ensure that posted code compiles
and works as expected, but even then that won't correct for all
possible errors: note that I buggered up the for() loop in the
arena_creat() function without triggering the error during testing.

I'll endeavor to post better code in subsequent messages, but it is
not always practical to test quick one-off constructs completely when
so many things compete for my (and your) time. I suppose I might
sound more intelligent and authoritative if my messages never
contained errors, but then I might never post at all if that were a
basic prerequisite for Usenet participation. Somewhere there is a
happy medium that will satisfy everyone here. I'll look more closely
for it. :)



Regards,

Uncle Steve
 
Ian Collins

Not so fast, Mr. Collins. Macros are useful in numerous situations
where function calls would be inefficient.

The only real use for macros is passing __FILE__ and __LINE__ to debug logs!
For my purposes, it is
sometimes nice to write a macro which uses conditionals to select a
particular code path through the macro. If the conditional can be
resolved at compile-time, the dead code branches in the macro are
pruned, saving space. A similarly structured function call won't be
optimized in this fashion unless (maybe) it is inlined.

Why not? Whether the function is called or in line, the conditional
compiles are still there.
Not enough? Is it smart to rely on the compiler to optimize the crap
out of everything, or is it better to write efficient code at the
outset?

Inlining is a very simple optimisation. You also overlook the opposite
optimisation - not inlining. There are situations where a smaller
executable performs better (or even fits on the target device!). The
compiler should be able to judge where it is appropriate to inline.
Compilers targeted at embedded devices often have options to tune these
heuristics. Decent compilers for hosted environments offer profile
feedback to further tune optimisations.

In short, the more choice you give the compiler, the better job it can do.
There are a number of C compilers out there that are not gcc,
and which may or may not generate equivalent code for otherwise
similar macros/functions. I'm O.K. with peppering my code with
well-designed macros and ignoring functions for simple common
procedures, but then I'm biased that way from habit.

I guess working with C++ for many years has biased me the other way.
Aggressive inlining has long been a feature of C++ compilers. It used
to be a unique feature, but those days are long past.
 
luser- -droog

I look at the majority of the example code posted in Usenet messages
as being contrived for the purposes of demonstration.  When I was a
total noob, comp.sources.unix and a few others were places where real
code was made available to Usenet readers.  So called real code is no
longer posted to newsgroups.  I'm not ecstatic that my contrived
examples are not perfect, but the alternative is to post my 'real
code' and annoy people who would prefer the exemplar pared down to the
bare minimum.

Cast my vote for real code. Unless you're telling a story with little
pieces
of code. That's good too.
 
luser- -droog

I didn't mean to suggest any consideration for negative 'ptrdiff_t'
values.  It's a signed integer type just as 'int' is a signed integer
type.  But because a pointer to the top of your arena minus a pointer to
the bottom of it yields a 'ptrdiff_t', perhaps it could be enjoyable to
use 'ptrdiff_t' for the addition, just for the sake of symmetry. :)

If ever there were a place to use one, it'd be here. :)
 
Shao Miller

True in spirit, but the 'array' aspect is an implementation detail.

Heheheh. I was suggesting that it's more than "philosophically" and
more than "in spirit." I think you are implying that a user of your
allocator needn't be aware of the fact that the arena is an array, or
that there is an "arena" at all.
Well, if C implementations pad out structures this way automatically
that would be a big win, but I have no way of knowing ahead of time
whether this is true for any given C compiler.

I fail to understand why any C implementation would not pad structures
in this way. Suppose you have an implementation that doesn't. Suppose
an 'int' has a size and alignment of 4 bytes. Suppose you have:

struct foo {
    int i;       /* 0 through 3 */
    char ca[3];  /* 4 through 6 */
    /* No padding */
};

struct foo bar[2];

It would seem that the C implementation would have the 'bar' array's
second element's 'i' member starting at offset 7, counting from 0 and
relative to the first byte of the array. That would be misaligned.
Adding one byte of padding would do the trick.

Now I can understand if an implementation has a documented extension
that allows you to override such sensible padding and pack a structure
or specify your own padding, but use of such an extension would, in my
opinion, put the onus on the user to know what they're doing...

If they really want the structure above to be 7 bytes and you allocate
the second slot in your arena for such a structure and they try to
access the 'i' member at offset 7 from the beginning of your arena and
their CPU is angry about it... Too bad for the user, in my opinion.
They need to re-think using the "packing" extension for their scenario.

I don't believe that your allocator should need to be concerned about
padding, but I could be mistaken.
And for platforms
like x86 where unaligned access works with a small cycle penalty you
can choose to pack your nine-byte structures so they don't require an
extra three bytes of alignment. That saves twenty percent or
thereabouts for each array slot, which adds up bigtime if you have
thousands or millions of data elements.

Ok. So if it works with a small cycle penalty, then the CPU won't be
angry about it and 'i' will be read at offset 7 (continuing my example
above) and your allocator still works without having been worried about
padding. When they chose to pack, they opted for the small cycle
penalty. Should your allocator be burdened to guess at and provide
additional padding in order to remove the small cycle penalty?
'You' is whoever is hypothetically padding out structures to satisfy
alignment concerns.

Ok.


Well, if one is using a platform that requires four-byte alignment to
access ints (for instance), but you know the algorithm only needs 24
bits of data, you /can/ pack it in to three bytes and save 25% of the
nominal storage requirements of the array.

If your allocator uses padding in the arena, where are the savings?
Except on x86, which does not demand strict alignment. I'm leaving
alignment constraints up to the application programmer to define. If
someone wants to shoot themselves in the foot on a platform that
requires strict alignment, who am I to argue against it? Bloody toes
might be part of the application spec., and would therefore represent
an administrative policy beyond the scope of the allocator proper.

Agreed. So why attempt to pad on their behalf within the allocator?
I doubt it. If you're careful not to do pointer arithmetic where you
might get a negative offset, life is good. While I would not
disparage the utility of negative pointer offsets in certain
situations, it isn't completely necessary. "ptr - unsigned int"
should do the right thing. So far, I have not come across a situation
where I would need negative pointer offsets, making the use of
ptrdiff_t a moot issue, IMO. YMMV.

I didn't mean to suggest any consideration for negative 'ptrdiff_t'
values. It's a signed integer type just as 'int' is a signed integer
type. But because a pointer to the top of your arena minus a pointer to
the bottom of it yields a 'ptrdiff_t', perhaps it could be enjoyable to
use 'ptrdiff_t' for the addition, just for the sake of symmetry. :)
 
Ben Bacarisse

Uncle Steve said:
But as some feel here, macrosity is a vanishing distinction.

There you go again. I won't say why not another time. Ian's also said
why the distinction is changing but certainly not vanishing. Presumably
you don't accept the arguments, presumably because you never make a
mistake with your macros. Sorry, I just spotted "some". Who? You seem
to suggest you don't hold this view anymore.
I use
all-caps macro names for constant expressions and the like. If I
think a macro will be more readable in lower-case, I'll do just that.
Shouted macro definitions interrupt the flow of source text when a
human is reading it, and I don't think I need that kind of distraction
in the code as much as some do.

Ah. Your code is for your eyes only. Not only should it not be commented
on because you've edited for brevity, it should not be commented on for
readability or clarity because only you'll read it. I get it. Nothing
in your posted code should be commented on.
I'm not "dead against" function calls for simple things. I'm merely
predisposed against them.

It no longer matters. Your code is for your eyes only and it does not
have any of the errors that plague other people when they use macros.
Any evidence to the contrary here is due to your having posted
versions edited for brevity.
Sorry, I should say 'moot' in consideration of the now historic
performance advantage of macros over function calls. In reality,
function calls /are/ slower than macro instances but the compiler
masks this when it inlines a function by omitting function preamble
and jinking the basic blocks of code to flow together as if they were
part of the immediate text.

I am very happy to have the slowness of something "masked" by the
compiler making it fast. In fact, I'd go so far as to say that it's not
slow anymore, but that's just my way of looking at it.
<snip>
#define ARENA_TYPE struct foo

struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE *arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.

No worries here. I've edited for brevity in posting, but I always
protect macro arguments from side-effects these days. It doesn't look
as clean and concise, which is why I omitted those statements from
example code. No biggie.

OK. I'll try to remember to assume any errors in future code were added
for brevity and I'll save time by not commenting on it.

Please do. I find it's the cut-and-paste-modify step that introduces
errors into program code more so than writing it in the first place.
In this newsgroup is it really preferable to see

#define arena_object_addr(x, n) ({ \
struct arena_s X = x; \
int N = n; \
*X->arena[N]; }) \

for every dumb example that is posted?

The correct version of the macro is simpler than the one you seem to
think you'd write to make it safe. Why would you write the above (it's
wrong, so it's obviously not what you really use) instead of

*(x)->arena[n]

That's only correct if the macro is a simple one-liner. If (x) is
used more than once and has side-effects, such as in (x++), you're
toast.

You posted an incorrect one-liner. You then said it's OK because you
really use 4 lines and a non-C construct to do these things. That seems
daft to me. If the code is short (and this code was short) just write
the short version correctly. Errors happen, so more lines are often
worse than fewer. I know, errors don't happen in your real code so you
can be as verbose as you like, but for other people reading this shorter
macros are almost always better.
I'm a creature of habit, I guess. Note that while I didn't learn C
under gcc, it's about the only compiler I've used in the last
seventeen years. Consequently, avoiding its extensions is not an
automatic process. I'm making an effort to avoid non-portable code,
but it is slow going.

gcc -std=c99 -pedantic if you want to avoid extensions. gcc -ansi
-pedantic if you want to avoid more non-portable features. I am sure
you know this, and are using it in your real builds, but I comment in case
anyone else could use a reminder of how to make gcc limit itself to
standard C.
I look at the majority of the example code posted in Usenet messages
as being contrived for the purposes of demonstration.

Maybe, but it's real code in the sense that people want comments on it.
When I was a
total noob, comp.sources.unix and a few others were places where real
code was made available to Usenet readers. So called real code is no
longer posted to newsgroups. I'm not ecstatic that my contrived
examples are not perfect, but the alternative is to post my 'real
code' and annoy people who would prefer the exemplar pared down to the
bare minimum.

Maybe just prefix the posting with "This is not real code. If there
are mistakes they are probably not there in the original so don't spend
time on the details. Stick to the gist of it."
The arena allocator I've been on about is instantiated as a .h file
and is over 500 lines of code at this time.

Hmmm.... I wonder if that affects the performance. Maybe not.
It carries with it all
the 'baggage' of my development environment and would make the
discussion of its data structures and algorithm very awkward. A
reasonable compromise would be to ensure that posted code compiles
and works as expected, but even then that won't correct for all
possible errors: note that I buggered up the for() loop in the
arena_creat() function without triggering the error during testing.

I'll endeavor to post better code in subsequent messages, but it is
not always practical to test quick one-off constructs completely when
so many things compete for my (and your) time. I suppose I might
sound more intelligent and authoritative if my messages never
contained errors, but then I might never post at all if that were a
basic prerequisite for Usenet participation. Somewhere there is a
happy medium that will satisfy everyone here. I'll look more closely
for it. :)

Don't worry too much about it. No one else seems to mind, and I've got a
solution that works for me.
 
Uncle Steve

Heheheh. I was suggesting that it's more than "philosophically" and
more than "in spirit." I think you are implying that a user of your
allocator needn't be aware of the fact that the arena is an array, or
that there is an "arena" at all.

More like a willful suspension of disbelief. There are a few gotchas
with this arrangement, but the organization of the data behind the API
is mostly immaterial to any algorithm using it. realloc() can move
the array, so you have to account for that when using a pointer
calculated from arena_obj_pointer() macro/function, but that's about
it, modulo corner cases such as the situation that comes about with
co-processors or with DMA from external devices.
Well, if C implementations pad out structures this way automatically
that would be a big win, but I have no way of knowing ahead of time
whether this is true for any given C compiler.

I fail to understand why any C implementation would not pad structures
in this way. Suppose you have an implementation that doesn't. Suppose
an 'int' has a size and alignment of 4 bytes. Suppose you have:

struct foo {
    int i;       /* 0 through 3 */
    char ca[3];  /* 4 through 6 */
    /* No padding */
};

struct foo bar[2];

It would seem that the C implementation would have the 'bar' array's
second element's 'i' member starting at offset 7, counting from 0 and
relative to the first byte of the array. That would be misaligned.
Adding one byte of padding would do the trick.

The question is whether the padding is added at the structure
definition or if it is accounted for when the bar[2] array is used.
No idea what the standard says about that. As gcc supplies a 'packed'
attribute to use, I can assume that it is safe in this regard.
Now I can understand if an implementation has a documented extension
that allows you to override such sensible padding and pack a structure
or specify your own padding, but use of such an extension would, in my
opinion, put the onus on the user to know what they're doing...

__attribute__(( packed )) applied to struct foo {...} ought to
misalign the structure elements properly.
If they really want the structure above to be 7 bytes and you allocate
the second slot in your arena for such a structure and they try to
access the 'i' member at offset 7 from the beginning of your arena and
their CPU is angry about it... Too bad for the user, in my opinion.
They need to re-think using the "packing" extension for their scenario.

Yes, which translates to reading and understanding the API
documentation. I realize that is a nearly unacceptable burden to put
on the user, but I see no reasonable substitute for informed literacy.
I don't believe that your allocator should need to be concerned about
padding, but I could be mistaken.

I'm taking the easy way out and leaving that decision to the
programmer. A sensible default aligns to machine-word boundaries, but
allows it to be overridden.
Ok. So if it works with a small cycle penalty, then the CPU won't be
angry about it and 'i' will be read at offset 7 (continuing my example
above) and your allocator still works without having been worried about
padding. When they chose to pack, they opted for the small cycle
penalty. Should your allocator be burdened to guess at and provide
additional padding in order to remove the small cycle penalty?

There are two choices: yes and no. The allocator will do either one,
so the question becomes: which would you choose? MIPS users may have
a different preference from those who are using x86, but then they
have to know what they're doing. The Linux kernel apparently traps
unaligned accesses and emulates them:

http://bit.ly/mwGxRV

PowerPC has issues with unaligned floating-point access as well
according to IBM Developer Works documentation:

http://ibm.co/6OdYj (I guess we know who runs bit.ly?)

That article goes into some detail on the issue. One thing that stood
out is that atomic operations fail on unaligned access for PowerPC (If
I am reading it correctly), which could cause huge problems if that
were brought about by careless use of the allocator.

I think it's safe to say that there is no substitute for reading and
understanding the documentation for your system.
If your allocator uses padding in the arena, where are the savings?

It doesn't /have/ to use padding. That's up to the programmer.
Therefore the savings in the example above are about 25% RAM, and
possibly reduced d-cache pollution.
Agreed. So why attempt to pad on their behalf within the allocator?

It comes down to having a sensible default. As alignment is an issue
for some platforms, and since it is impossible to know beforehand the
platform the allocator will run on, the safe way aligns the array
slots to minimize the artifacts of misalignment. Aligning to VM page
boundaries is about the only way I can think of to avoid all possible
unwanted interactions, but then we are off into ludicrous territory.
I didn't mean to suggest any consideration for negative 'ptrdiff_t'
values. It's a signed integer type just as 'int' is a signed integer
type. But because a pointer to the top of your arena minus a pointer to
the bottom of it yields a 'ptrdiff_t', perhaps it could be enjoyable to
use 'ptrdiff_t' for the addition, just for the sake of symmetry. :)

I'm not sure how that will save me from the risks inherent in the use
of ints, but I'll look into it.



Regards,

Uncle Steve
 
Seebs

I am very happy to have the slowness of something "masked" by the
compiler making it fast.

Beautiful phrasing.

I think this gets to the heart of why good programmers measure before they
optimize; our guesses as to what's fast and what's slow are often useless.
I spent a while trying to improve the performance of the computational
inner loop of a program, then thought to benchmark it a bit. It turns
out the "computational inner loop" was under 10% of the run time*.

.... oops.

-s
[*] Did you know that rendering antialiased lines on a 32-bit display is
actually more computationally expensive than calculating their end points?
Apparently, I didn't.
 
Uncle Steve

There you go again. I won't say why not another time. Ian's also said
why the distinction is changing but certainly not vanishing. Presumably
you don't accept the arguments, presumably because you never make a
mistake with your macros. Sorry, I just spotted "some". Who? You seem
to suggest you don't hold this view anymore.

Look. There are coding idioms that I use habitually; macros figure
prominently in specific roles. That's my coding standard, FWIW, and
as I've no supervisor dictating some other arbitrary standard, I am
perfectly happy with my style more or less as-is. My brain is not
composed of granite, so I am free to change my coding standards at
any time if I see a need to do so.
Ah. Your code is for your eyes only. Not only should it not be commented
on because you've edited for brevity, it should not be commented on for
readability or clarity because only you'll read it. I get it. Nothing
in your posted code should be commented on.

See below.
I'm not "dead against" function calls for simple things. I'm merely
predisposed against them.

It no longer matters. Your code is for your eyes only and it does not
have any of the errors that plague other people when they use macros.
Any evidence to the contrary here is due to your having posted
versions edited for brevity.
....
Sorry, I should say 'moot' in consideration of the now historic
performance advantage of macros over function calls. In reality,
function calls /are/ slower than macro instances but the compiler
masks this when it inlines a function by omitting function preamble
and jinking the basic blocks of code to flow together as if they were
part of the immediate text.

I am very happy to have the slowness of something "masked" by the
compiler making it fast. In fact, I'd go so far as to say that it's not
slow anymore, but that's just my way of looking at it.
Right.
<snip>
#define ARENA_TYPE struct foo

struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE *arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.

No worries here. I've edited for brevity in posting, but I always
protect macro arguments from side-effects these days. It doesn't look
as clean and concise, which is why I omitted those statements from
example code. No biggie.

OK. I'll try to remember to assume any errors in future code were added
for brevity and I'll save time by not commenting on it.

Please do. I find it's the cut-and-paste-modify step that introduces
errors into program code more so than writing it in the first place.
In this newsgroup is it really preferable to see

#define arena_object_addr(x, n) ({ \
struct arena_s X = x; \
int N = n; \
*X->arena[N]; }) \

for every dumb example that is posted?

The correct version of the macro is simpler than the one you seem to
think you'd write to make it safe. Why would you write the above (it's
wrong, so it's obviously not what you really use) instead of

*(x)->arena[n]

That's only correct if the macro is a simple one-liner. If (x) is
used more than once and has side-effects, such as in (x++), you're
toast.

You posted an incorrect one-liner. You then said it's OK because you
really use 4 lines and a non-C construct to do these things. That seems
daft to me. If the code is short (and this code was short) just write
the short version correctly. Errors happen, so more lines are often
worse than fewer. I know, errors don't happen in your real code so you
can be as verbose as you like, but for other people reading this shorter
macros are almost always better.

I didn't say it was ok that I made an error; I merely explained the
cause. Then you jumped on me for using a gccism. I've already spent
about four hours in the last twenty-four writing messages to this
newsgroup. How slowly should I write to Usenet?
gcc -std=c99 -pedantic if you want to avoid extensions. gcc -ansi
-pedantic if you want to avoid more non-portable features. I am sure
you know this, and are using it in your real builds, but I comment in case
anyone else could use a reminder of how to make gcc limit itself to
standard C.

Actually, I'm not using that for my 'real' builds since I'm hardly
building anything at all right now. Instead I am writing and
designing. Trust me, I have enough details up in the air right now.
I don't even care if my code is compilable right at this moment as
designing it properly is a higher priority. If I had learned standard
C previously instead of gcc C, I wouldn't have this problem. So at
some point when the code base is stable, /then/ (and only then) will
I go through it to remove gccisms in the older stuff.
Maybe, but it's real code in the sense that people want comments on it.

True enough, but that also implies that people are doing just as I am,
and shortening their code to make their queries and examples more
concise.
Maybe just prefix the posting with "This is not real code. If there
are mistakes they are probably not there in the original so don't spend
time on the details. Stick to the gist of it."


Hmmm.... I wonder if that affects the performance. Maybe not.

Quite a bit of it is documentation.
Don't worry too much about it. No one else seems to mind, and I've got a
solution that works for me.

Ok, I won't.



Regards,

Uncle Steve
 
Keith Thompson

Uncle Steve said:
The question is whether the padding is added at the structure
definition or if it is accounted for when the bar[2] array is used.
No idea what the standard says about that. As gcc supplies a 'packed'
attribute to use, I can assume that it is safe in this regard.
[...]

I would have thought so as well, but it turns out that it *isn't* safe.

Here's a test program I just wrote:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    struct foo {
        char c;
        int x;
    } __attribute__((packed));
    struct foo arr[2] = { { 'a', 10 }, { 'b', 20 } };
    int *p0 = &arr[0].x;
    int *p1 = &arr[1].x;
    printf("sizeof(struct foo) = %d\n",
           (int)sizeof(struct foo));
    printf("offsetof(struct foo, c) = %d\n",
           (int)offsetof(struct foo, c));
    printf("offsetof(struct foo, x) = %d\n",
           (int)offsetof(struct foo, x));
    printf("arr[0].x = %d\n", arr[0].x);
    printf("arr[1].x = %d\n", arr[1].x);
    printf("p0 = %p\n", (void *)p0);
    printf("p1 = %p\n", (void *)p1);
    printf("*p0 = %d\n", *p0);
    printf("*p1 = %d\n", *p1);
    return 0;
}

On my Linux x86 system, with gcc 4.5.2, the output is:

sizeof(struct foo) = 5
offsetof(struct foo, c) = 0
offsetof(struct foo, x) = 1
arr[0].x = 10
arr[1].x = 20
p0 = 0xbfaa71ff
p1 = 0xbfaa7204
*p0 = 10
*p1 = 20

As you can see, p0 points to arr[0].x, which is at an odd address.
Since the x86 tolerates misaligned accesses, the program doesn't
misbehave.

Now here's the output of the same program on a Solaris SPARC system,
compiled with gcc 4.2.1:

sizeof(struct foo) = 5
offsetof(struct foo, c) = 0
offsetof(struct foo, x) = 1
arr[0].x = 10
arr[1].x = 20
p0 = ffbff327
p1 = ffbff32c
Bus error

If you refer to arr[0].x directly, the compiler knows that it's
a member of a packed structure, and can generate whatever extra
code is needed to access it. If you take its address, though,
you risk creating a misaligned pointer, and when you dereference
it the compiler has no way of knowing that it's misaligned. Kaboom.

(Oddly, I don't see a warning about this in the gcc documentation.)
 

Uncle Steve

The only real use for macros is passing __FILE__ and __LINE__ to debug logs!

No, not really.
Why not? Whether the function is called or inlined, the conditional
compiles are still there.

A function is only inlined if it is defined in the same source file.
If you define a function in an external source file, it cannot be
inlined anywhere else unless the function is also defined in an
included header file. Doesn't everyone know this?
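For what it's worth, defining the inline function in a shared header puts it
on the same footing as a macro: every translation unit that includes the
header sees the body and can inline it. A minimal sketch (clamp.h and the
clamp function are made-up examples, not from the thread):

```c
/* clamp.h -- hypothetical shared header.  Because the full definition
 * lives here, every .c file that includes it has the body available for
 * inlining, exactly as it would have the text of a macro. */
#ifndef CLAMP_H
#define CLAMP_H

static inline int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

#endif /* CLAMP_H */
```

Any file compiled with #include "clamp.h" can call clamp(), and the compiler
is free to expand it in place at each call site.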
Inlining is a very simple optimisation.

No it isn't.

You also overlook the opposite optimisation - not inlining. There are
situations where a smaller executable performs better (or even fits the
target device!).

Whether or not the code is inlined is going to depend more on
whether you are interested in performance or code size.

Fitting code in some machines' L1 cache is harder with a lot of
gratuitous inlining, and you also pollute the L2 cache. Sometimes
this matters, and sometimes not. All highly dependent on the
specifics of the code and platform.

The compiler should be able to judge where it is appropriate to inline.
Compilers targeted at embedded devices often have options to tune these
heuristics. Decent compilers for hosted environments offer profile
feedback to further tune optimisations.

In short, the more choice you give the compiler, the better job it can do.

Of course.
I guess working with C++ for many years has biased me the other way.
Aggressive inlining has long been a feature of C++ compilers. It used
to be a unique feature, but those days are long past.

I guess so. In the last twenty years there's been a lot of work on
compilers, that's for sure.



Regards,

Uncle Steve
 

Ian Collins

A function is only inlined if it is defined in the same source file.

Macros also have to be defined in the same compilation unit.
If you define a function in an external source file, it cannot be
inlined anywhere else unless the function is also defined in an
included header file. Doesn't everyone know this?

I'm sure they do, what's your point?
 

Uncle Steve

Macros also have to be defined in the same compilation unit.


I'm sure they do, what's your point?

That function inlining, far from being the unfettered panacea some
make it out to be, has inherent limitations owing to the way C
compilers are designed to take their input. If your software is
written as one monolithic source file, this may not be much of an
issue, but I doubt that is the case for any non-trivial program.

Given that reality, function-like macros can be included to inline
code where it will do the most good. If the programmer is lacking in
clue[tm] there's no telling what might happen, but smart people will
learn not to abuse the facility.



Regards,

Uncle Steve
 

Ian Collins

That function inlining, far from being the unfettered panacea some
make it out to be, has inherent limitations owing to the way C
compilers are designed to take their input.

Which it shares with macros, so again, what is your point?

The single compilation unit limitation will eventually be a thing of the
past as cross-module optimisation becomes more common. For example
see http://gcc.gnu.org/wiki/LightweightIpo
If your software is
written as one monolithic source file, this may not be much of an
issue, but I doubt that is the case for any non-trivial program.

Given that reality, function-like macros can be included to inline
code where it will do the most good.

So can functions, so again, what is your point?
 

Uncle Steve

Oh you'll have to try a lot harder than that to annoy me. Amuse would
be a better description thus far.

I'll see what I can do about that, heh heh heh.



Regards,

Uncle Steve
 

robertwessel2

Macros also have to be defined in the same compilation unit.


I'm sure they do, what's your point?


Compilers that support link time code generation manage to inline
functions defined in different translation units all the time.
 

Ian Collins

Compilers that support link time code generation manage to inline
functions defined in different translation units all the time.

I know, see one of my later posts.
 

James Kuyper

A function is only inlined if it is defined in the same source file.

Not quite - it only has to be in the same translation unit. A
translation unit consists of a given source file plus all of the other
files merged into it by #include statements. The use of the function
must also occur within the scope of the inline declaration, which
basically means that the declaration should occur prior to the use.
These requirements also apply to macros, though the scope rules are a
bit different for them.
If you define a function in an external source file, it cannot be
inlined anywhere else unless the function is also defined in an
included header file. Doesn't everyone know this?

If you replace your function-like macro, wherever it is that you have it
defined, with an inline function definition in that same location, that
function will be usable pretty much wherever the macro was usable.
It's probably feasible to come up with pathological contexts where a
simple in-place replacement won't work, but such contexts are not the norm.
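A sketch of that in-place replacement, borrowing the arena_s and
arena_obj_addr names from earlier in the thread (the struct foo payload
here is a placeholder, and the _m suffix on the macro is mine):

```c
#include <stddef.h>

struct foo { int payload; };            /* stand-in element type */
#define ARENA_TYPE struct foo           /* as in the arena example upthread */

struct arena_s {
    size_t      arena_size;
    int         free_list;
    size_t      free;
    ARENA_TYPE *arena;
};

/* Macro version, with the arguments fully parenthesised: */
#define arena_obj_addr_m(x, n) (&(x)->arena[(n)])

/* In-place inline replacement: same result at every call site, but the
 * arguments are now type-checked and evaluated exactly once. */
static inline ARENA_TYPE *arena_obj_addr(struct arena_s *x, size_t n)
{
    return &x->arena[n];
}
```

Call sites need no changes at all, since the function takes the same
arguments in the same order as the macro did.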

....
No it isn't.

Perhaps not, but it's one that most popular modern compilers do
routinely and very well; that's part of what makes them popular.
Whether or not the code is inlined is going to depend more on
whether you are interested in performance or code size.

Fitting code in some machines' L1 cache is harder with a lot of
gratuitous inlining, and you also pollute the l2 cache. Sometimes
this matters, and sometimes not. All highly dependent on the
specifics of the code and platform.

Any decent compiler can be relied upon to take those issues into
consideration when deciding whether or not to inline a function. This is
something it can decide, independently of whether or not the function is
declared inline. As far as actual inlining is concerned, the 'inline'
keyword is only a hint, which a compiler is free to ignore; and it's
perfectly legal for a compiler to decide to inline a function that is
not declared 'inline', as long as doing so doesn't change the observable
behavior. In fact, that was one of the arguments given against
introducing the 'inline' keyword. Any static function written to meet
the same special requirements that currently apply to 'inline' functions
could already have been inlined, even if not so declared, so long as the
compiler thought that doing so would be a good idea.

Unless you know a lot more about the target platform than your compiler
does, it's probably best to rely upon it to make inlining decisions.
Defining a function (whether or not you use 'inline') gives it that
option. Defining it as a function-like macro does not - it only allows
inlining. Well, technically, I suppose a sufficiently sophisticated
compiler could perform anti-inlining: recognizing a common code pattern,
and replacing it wherever used with a call to a compiler-generated
function definition. The fact that the common code was the result of a
macro expansion would make it easier to recognize the feasibility of
such an optimization. However, it seems to me to be a harder
optimization to perform than inlining, and I doubt that it is a common
feature even of the most sophisticated compilers.
 
