Malloc Query

Uncle Steve

You're right - I should have expressed that differently: if you have
enough spare time to write your allocator, you're going to receive a
serious lack of sympathy when you complain about not having enough time
to bother testing it. You are, of course, free to ignore the ridicule
you deserve for prioritizing your time that way.

The main reason I didn't time the thing previously is that the
allocator will usually be a small fraction of the total 'k' for any
given application. A binary tree using the allocator is going to
spend more time doing comparisons on an insert than the cost of
allocating the node. I've known this for a while.

Measuring the performance of a binary tree algorithm is another kettle
of fish altogether, and if I were making similar claims about one, I
think there'd be better grounds for criticism in that case.
Personally, I think that a demand for test data in a trivial case such
as this is a little harsh. If a programmer is making claims about
algorithm performance in a real-world application, that is another
thing altogether, and in that case you would be perfectly justified in
demanding test results. But on Usenet, and about something that is
available to inspection by competent individuals... Not so much.
What I said has nothing to do with the fact that this is a public
newsgroup. Even if you're just doing it for yourself, it's a serious
waste of your time (which you are, of course, completely free to decide
to waste) to go to the trouble of building a custom allocator without
bothering to test whether you gained any benefit from doing so.

I have no qualms about testing and verification, I just did not think
it was quite so important in this particular instance.
I don't see a distinction. I don't see how you could discuss code in any
meaningful sense without first reviewing it - otherwise, how would you
know what it was that you were discussing? Perhaps you thought I was
using the word "review" strictly in some more formal sense?

I thought you may have been implying association with a 'peer review'
process, which is necessarily much more formal. The discussion of a
topic would generally imply familiarity with its material,
necessitating /a/ review of it, but that says nothing about the degree
of analytical engagement. In a public newsgroup, I would anticipate a
more casual approach to the discussion of an algorithm or fragments of
same.

I have posted a very shaky test-harness (written in less than one
hour) that is exclusively intended to show the gross performance
improvement over the 'standard' malloc(), whatever that may be on your
platform. That is all, and I 'proved' the validity of my hypothesis,
so far as that goes. If we were discussing real computer science,
post latency would be on the order of days or weeks, not the minutes
or hours turnaround generally expected in a Usenet thread.
C is notorious for the ubiquity of pointers in contexts where other
languages would use integer indices.

Which is why I get the impression that some of you are a little
puzzled by my use of integer-multiplied offsets from the array base?
I would have thought it a more common idiom, if only because it is
such a useful arrangement in circumstances where the data is of a
uniform size. C makes this easy. In another language, I might have
had to maintain the free list with a bitmap, or some other
contrivance. For most people this will be a non-issue because they
will use the language-supplied libraries for everything, and they'll
never have to worry about the implementation details of their binary
tree structures, or whatever.

C is different in that respect, and so naturally I've thought about
methods for reducing algorithm complexity and memory requirements. If
I use this algorithm consistently in my new programs I lose no coding
time in doing so, as opposed to trying to jam it in to an already
existing application that uses malloc. Even if it only improves a
real-world algorithm by five percent, I have probably lost nothing by
using it, and the educational value derived from writing and using it
is 100% free.

I've added comp.programming to the Newsgroups: line. I think this
discussion is not so much about C, the language as it is about C
programming technique or algorithm design.



Regards,

Uncle Steve
 

Uncle Steve

Uncle Steve said:
On Thu, May 19, 2011 at 03:22:18AM +0100, Ben Bacarisse wrote:
[snip]


Ok, here's a quick and dirty hack that measures a highly contrived
scenario. This code first allocates two arenas of different size, and
then ping-pongs the allocations between the two arenas for a defined
number of iterations. Then, I do more or less the same thing with
malloc.

The preliminary results show that my special-purpose allocator is 2-3
times faster than glibc malloc. Not quite an order-of-magnitude
difference in performance (unless you think in base-2), but very
acceptable. In some programs there may be a larger difference in
performance because of reduced memory fragmentation or reduced
overall memory use. I do not plan to isolate those factors at this
time to measure their effect.

You don't include the set-up time for the arenas. That's fair if the
pattern you expect to see is a very large number of allocations and a
lot of re-use. If, however, new arenas are needed in a long running
program (because not all the memory is re-used) you'd have to include
this cost.

True enough, but it's not wrong to charge the set-up cost to program
initialization, mainly because the arena allocator is suited to those
situations where there will be a lot of re-use. Malloc may also have
set-up costs that are hidden in program initialization, but that will
depend on the implementation.
It makes a huge difference with the numbers you've picked: it halves the
benefit on my system (3x vs 1.5x speed). I can reduce the effect of
course just by increasing the number of timed iterations. For very
small objects, with lots of optimisation and excluding the start-up
costs I can get just over your order of magnitude speedup. That's a
rather contrived scenario though. In particular, it is likely that the
allocation code will not be able to be inlined (as I think it can be
here).

Whether or not the code is inlined is going to depend more on whether
you are interested in performance or code size. Of course it is
trivial to force the allocation routine inline -- just make it a
macro. Whether that makes any sort of difference at all is highly
application specific, and is bound to be of less importance than the
difference in algorithm design.
Thank you for providing something that can give real data, albeit for
rather specific situations. FWIW I am using an Intel(R) Core(TM)2
Duo CPU P8700 @ 2.53GHz.

I'm mildly curious as to the numbers you got, would you mind posting
them?
If I have time (looks unlikely) I'll plug it into a small binary tree
program to see what happens when there is a little work being done too.

Coincidentally, a binary tree algorithm is just the application that I
envisioned as suitable for use with this allocator. At some point I
probably will be measuring its performance, so I'll have a better idea
of what's going on in that case when I get to it.



Regards,

Uncle Steve
 

Ben Bacarisse

I'm mildly curious as to the numbers you got, would you mind posting
them?

Not at all. Exactly as per your compile but I added more iterations
just in case I was not getting good times:

Arena: Iterations: 1000000; elapsed CPU time (msec): 36109
Malloc: Iterations: 1000000; elapsed CPU time (msec): 107952

If I add in the set-up costs:

Arena: Iterations: 1000000; elapsed CPU time (msec): 70330
Malloc: Iterations: 1000000; elapsed CPU time (msec): 107935

Base case again but with maximum optimisation turned on (-O3):

Arena: Iterations: 1000000; elapsed CPU time (msec): 12303
Malloc: Iterations: 1000000; elapsed CPU time (msec): 104033

and the same but, again, with the set-up time included:

Arena: Iterations: 1000000; elapsed CPU time (msec): 39187
Malloc: Iterations: 1000000; elapsed CPU time (msec): 102874

(gcc version 4.4.3)

<snip>
 

Uncle Steve

Not at all. Exactly as per your compile but I added more iterations
just in case I was not getting good times:

Arena: Iterations: 1000000; elapsed CPU time (msec): 36109
Malloc: Iterations: 1000000; elapsed CPU time (msec): 107952

If I add in the set-up costs:

Arena: Iterations: 1000000; elapsed CPU time (msec): 70330
Malloc: Iterations: 1000000; elapsed CPU time (msec): 107935

Base case again but with maximum optimisation turned on (-O3):

Arena: Iterations: 1000000; elapsed CPU time (msec): 12303
Malloc: Iterations: 1000000; elapsed CPU time (msec): 104033

and the same but, again, with the set-up time included:

Arena: Iterations: 1000000; elapsed CPU time (msec): 39187
Malloc: Iterations: 1000000; elapsed CPU time (msec): 102874

Funny. Your numbers are almost exactly 10x better than mine on the
netbook. But at -O3, I get:

Arena: Iterations: 100000; elapsed CPU time (msec): 28540
Malloc: Iterations: 100000; elapsed CPU time (msec): 97913

Which is 2.5 (*10) poorer than your results. I don't know what that
actually signifies, but it's kind of interesting.



Regards,

Uncle Steve
 

Ian Collins

On 05/20/11 10:34 AM, Uncle Steve wrote:

A couple of comments on the code:

arena_head * arena_creat(size_t s, int n)
{
    arena_head * h;
    int i;

    h = malloc(sizeof(arena_head));
    if(h == NULL) {
        perror("malloc()");
        exit(1);
    }

    h->obj_size = s;
    h->arena_size = n;
    h->free = n;

    h->arena = malloc(s * n);
    if(h->arena == NULL) {
        perror("malloc()");
        exit(1);
    }

    h->free_list = 0;

    for(i = 0; i < n; i++)
        *((int *) arena_obj_addr(h, i)) = i + 1;

    *((int *) arena_obj_addr(h, i)) = -1;

This writes past the end of the allocated block.

    return(h);
}


typedef unsigned long long int T64;

#define processcputime(p) \

Why go to the trouble of declaring this as a messy macro?
 

Uncle Steve

On 05/20/11 10:34 AM, Uncle Steve wrote:

A couple of comments on the code:



This writes past the end of the allocated block.

Ha, you're right. The original code this is based on doesn't have
that problem. The for(;;) comparison should be i < (n - 1). Since I
wrote it in less than an hour, there ought to be several good bugs in
there.
Why go to the trouble of declaring this as a messy macro?

I'm not all that interested in nanosecond resolution for timing
things, so I scaled the resolution down to microseconds and turned it
into a 64-bit quantity so arithmetic calculations would be more
convenient. I suppose it could be a function call, but inline it
probably generates ten or so instructions. Insulating the actual
system function call also makes porting a little easier.



Regards,

Uncle Steve
 

Ian Collins

I'm not all that interested in nanosecond resolution for timing
things, so I scaled the resolution down to microseconds and turned it
into a 64-bit quantity so arithmetic calculations would be more
convenient. I suppose it could be a function call, but inline it
probably generates ten or so instructions.

No more or less than an inlined function call.
Insulating the actual
system function call also makes porting a little easier.

But doing so in a macro makes it harder!
 

Uncle Steve

No more or less than an inlined function call.

Right, so what's the problem?
But doing so in a macro makes it harder!

Not really. Maybe what I don't get is what you mean by 'harder'.
Is 'harder' some ineffable quality that in a more robust age would
serve to differentiate the men from the boys? I like to think so, but
I don't think it's a popular view.



Regards,

Uncle Steve
 

Ian Collins

Right, so what's the problem?

Using a macro where a function would do.
Not really. Maybe what I don't get is what you mean by 'harder'.
Is 'harder' some ineffable quality that in a more robust age would
serve to differentiate the men from the boys? I like to think so, but
I don't think it's a popular view.

Well there's more to type (all those pesky line continuation characters
to line up), no way to step in if you have a bug you can't easily
diagnose and stylistic limits imposed by a macro. The function would be
more concise if it could return its result.
 

luser- -droog

You edited in some bugs too:
| #define arena_obj_addr(x, n) ((void *) *(&p->arena[n * x->obj_size]))
'p' can't be right here.  It must be 'x', yes?  The result of the
expression is a char converted to void *.  That can't be right either..
presumably the '*' in front of the '(&' should not be there.  (And I'd
add parentheses round all the macro arguments in the body.)

You're correct.  The original function definitions have extra
convenience macros, and the macro variables are protected from
side-effects, etc.  Of course it should be:

| #define arena_obj_addr(x, n) ((void *) *(&x->arena[n * x->obj_size]))


I don't think it's a common idiom at all and I've seen a lot of C over
the years.  The last time I saw it was in Gregory Chaitin's tiny Lisp
interpreter that he's used in his books and courses.  It's not exactly
idiomatic C, but then he has bigger fish to fry.

I take it you're referring to hiding the integer in the body of the
arena object for the free list via a slightly clever cast?  One of the
strengths of C, although it is a weakness as well since the conversion
of pointer types introduces complexities that can easily trip you up.

I've been using integer offsets instead of pointers a lot lately and
I think you might be better off using the array formula explicitly:

#define arena_obj_addr(x, n) ((void *) (x->arena + n * x->obj_size))

It's harder to get wrong.
 

Uncle Steve

Using a macro where a function would do.

IMO, this is a stylistic preference and has no meaningful impact on
code quality.
Well there's more to type (all those pesky line continuation characters
to line up), no way to step in if you have a bug you can't easily
diagnose and stylistic limits imposed by a macro. The function would be
more concise if it could return its result.

One simply writes a replacement macro if the target platform uses
something other than clock_gettime(). If you have to use clock() or
gettime() because your system is older, changes are straightforward,
and it doesn't matter much whether you're adjusting a macro or
function.

I agree that line-continuation is a hassle to maintain, but then so is
everything. If by 'harder' you mean more robust, I fail to see how.



Regards,

Uncle Steve
 

Uncle Steve

I've been using integer offsets instead of pointers a lot lately and
I think you might be better off using the array formula explicitly:

#define arena_obj_addr(x, n) ((void *) (x->arena + n * x->obj_size))

It's harder to get wrong.

Still, you're at the mercy of transcription errors. Philosophically,
the arena is an array even though the program using it does not care.
But it would be better still to define the x->arena structure element
with the proper type of object so accesses would be type constrained
by the compiler and you would therefore get a warning if you tried to
assign the address to the wrong pointer. (The following is modified
more from the real version, and less from the quick hack I posted.)

So,

#define ARENA_TYPE struct foo


struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

since the pointer will be of the expected type.

Even simpler. But if your ARENA_TYPE is an odd number of bytes, this
can create alignment issues on some platforms. On x86 there is merely
a small penalty presumably because the processor has to access two
32-bit words to resolve a structure element that crosses alignment
boundaries. I've read that some architectures trap with a bus error
when you do this, so the easy solution is unworkable in practice
unless you carefully pad out the structure to satisfy alignment
requirements. All I've done here is move the alignment calculations
into the arena allocator logic, looking something like this:

op->obj_pad_size = (op->obj_size % pad_factor) ?
    (op->obj_size + pad_factor - (op->obj_size % pad_factor)) :
    op->obj_size;

And so we go back to something much more like what is in the
previously posted benchmarking code:

struct arena_s {
    size_t obj_size;       // Size in bytes of the object
    size_t obj_pad_size;   // Object size inclusive of padding
    size_t arena_size;     // Number of obj_pad_size slots in pool
    int free_list;         // Index of the head of the free list
    int free;              // Number of slots remaining
    unsigned char * arena; // Pool storage
};

typedef struct arena_s arena;


I do in fact have a function to calculate arena_obj_addr(), and it
necessarily contains

&p->arena[n * p->obj_pad_size]

which is of course necessary since we can't calculate obj_pad_size
before we know what the target object is and the run-time platform
constraints. No real reason it couldn't be "&p->arena + n *
p->obj_pad_size", but I didn't do it that way. I guess the pointer
conversions don't cause my brain to lock-up the way they used to.



Regards,

Uncle Steve
 

Shao Miller

Still, you're at the mercy of transcription errors. Philosophically,
the arena is an array even though the program using it does not care.

I agree, but would also like to point out that it _is_ an array object.
Your 'arena' member points to its first element, if I understand your
code correctly.
But it would be better still to define the x->arena structure element
with the proper type of object so accesses would be type constrained
by the compiler and you would therefore get a warning if you tried to
assign the address to the wrong pointer. (The following is modified
more from the real version, and less from the quick hack I posted.)

Heheh. Yes but that departs further from 'malloc()', which returns a
'void *'. It seems that you are suggesting that "better," in this case,
would be an allocator for every type.
So,

#define ARENA_TYPE struct foo


struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

since the pointer will be of the expected type.

Even simpler. But if your ARENA_TYPE is an odd number of bytes, this
can create alignment issues on some platforms.

If 'ARENA_TYPE' is a structure, oughtn't the implementation to pad
_every_ struct such that it can be the element type of an array? I
really think so, so I don't understand your concern, here. My
impression is that the 'sizeof' any type must be an integer multiple of
the alignment requirement, to allow for arrays... Even unions ought to
contain such trailing padding, when needed.
On x86 there is merely
a small penalty presumably because the processor has to access two
32-bit words to resolve a structure element that crosses alignment
boundaries. I've read that some architectures trap with a bus error
when you do this, so the easy solution is unworkable in practice
unless you carefully pad out the structure to satisfy alignment
requirements.

Who does "you" refer to in this last sentence? Isn't it the
implementation's job to worry about padding?
All I've done here is move the alignment calculations
into the arena allocator logic, looking something like this:

op->obj_pad_size = (op->obj_size % pad_factor) ?
    (op->obj_size + pad_factor - (op->obj_size % pad_factor)) :
    op->obj_size;

And so we go back to something much more like what is in the
previously posted benchmarking code:

struct arena_s {
    size_t obj_size;       // Size in bytes of the object
    size_t obj_pad_size;   // Object size inclusive of padding
    size_t arena_size;     // Number of obj_pad_size slots in pool
    int free_list;         // Index of the head of the free list
    int free;              // Number of slots remaining
    unsigned char * arena; // Pool storage
};

If you are using 'malloc()' to allocate the arena, and 'malloc()'
returns a pointer to memory suitably aligned for _any_ object, then that
would seem to include an _array_ object, so again, I don't understand
the padding concerns.

If someone tries to use your allocator with a size of "3", they had
better not be lying and requiring an alignment of "4", of which "3" is
not an integer multiple.
typedef struct arena_s arena;


I do in fact have a function to calculate arena_obj_addr(), and it
necessarily contains

&p->arena[n * p->obj_pad_size]

Or, equivalently:

p->arena + n * p->obj_pad_size
which is of course necessary since we can't calculate obj_pad_size
before we know what the target object is and the run-time platform
constraints. No real reason it couldn't be "&p->arena + n *
p->obj_pad_size", but I didn't do it that way.

(Without the '&', else your element size is 'sizeof (ARENA_TYPE *)')
I guess the pointer
conversions don't cause my brain to lock-up the way they used to.

Using an 'int' and macro/function? Seems reasonable to me, though
perhaps a 'ptrdiff_t' could be pleasant, too. :)
 

Uncle Steve

I agree, but would also like to point out that it _is_ an array object.
Your 'arena' member points to its first element, if I understand your
code correctly.

True in spirit, but the 'array' aspect is an implementation detail.
Heheh. Yes but that departs further from 'malloc()', which returns a
'void *'. It seems that you are suggesting that "better," in this case,
would be an allocator for every type.

Can do with some macro magic, although there is room for debate as to
the value of forcing type constraints when your program logic already
carries with it the assumption that it will be working with objects
of a certain size as specified when you build the arena.
So,

#define ARENA_TYPE struct foo


struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

since the pointer will be of the expected type.

Even simpler. But if your ARENA_TYPE is an odd number of bytes, this
can create alignment issues on some platforms.

If 'ARENA_TYPE' is a structure, oughtn't the implementation to pad
_every_ struct such that it can be the element type of an array? I
really think so, so I don't understand your concern, here. My
impression is that the 'sizeof' any type must be an integer multiple of
the alignment requirement, to allow for arrays... Even unions ought to
contain such trailing padding, when needed.

Well, if C implementations pad out structures this way automatically
that would be a big win, but I have no way of knowing ahead of time
whether this is true for any given C compiler. And for a platform
like x86, where unaligned access works with a small cycle penalty, you
can choose to pack your nine-byte structures so they don't require an
extra three bytes of alignment. That saves twenty percent or
thereabouts for each array slot, which adds up bigtime if you have
thousands or millions of data elements.
Who does "you" refer to in this last sentence? Isn't it the
implementation's job to worry about padding?

'You' is whomever is hypothetically padding out structures to satisfy
alignment concerns.
If you are using 'malloc()' to allocate the arena, and 'malloc()'
returns a pointer to memory suitably aligned for _any_ object, then that
would seem to include an _array_ object, so again, I don't understand
the padding concerns.

Well, if one is using a platform that requires four-byte alignment to
access ints (for instance), but you know the algorithm only needs 24
bits of data, you /can/ pack it in to three bytes and save 25% of the
nominal storage requirements of the array.
If someone tries to use your allocator with a size of "3", they had
better not be lying and requiring an alignment of "4", of which "3" is
not an integer multiple.

Except on x86, which does not demand strict alignment. I'm leaving
alignment constraints up to the application programmer to define. If
someone wants to shoot themselves in the foot on a platform that
requires strict alignment, who am I to argue against it? Bloody toes
might be part of the application spec., and would therefore represent
an administrative policy beyond the scope of the allocator proper.
typedef struct arena_s arena;

I do in fact have a function to calculate arena_obj_addr(), and it
necessarily contains

&p->arena[n * p->obj_pad_size]

Or, equivalently:

p->arena + n * p->obj_pad_size
which is of course necessary since we can't calculate obj_pad_size
before we know what the target object is and the run-time platform
constraints. No real reason it couldn't be "&p->arena + n *
p->obj_pad_size", but I didn't do it that way.

(Without the '&', else your element size is 'sizeof (ARENA_TYPE *)')

Yes, sorry. The ampersand is superfluous in that usage.
Using an 'int' and macro/function? Seems reasonable to me, though
perhaps a 'ptrdiff_t' could be pleasant, too. :)

I doubt it. If you're careful not to do pointer arithmetic where you
might get a negative offset, life is good. While I would not
disparage the utility of negative pointer offsets in certain
situations, it isn't completely necessary. "ptr - unsigned int"
should do the right thing. So far, I have not come across a situation
where I would need negative pointer offsets, making the use of
ptrdiff_t a moot issue, IMO. YMMV.



Regards,

Uncle Steve
 

Ben Bacarisse

Everyone is taking chances here. The x and n should be in parentheses
so that the macro expands as expected. This is one reason why functions
are usually preferred.

#define ARENA_TYPE struct foo

struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.
since the pointer will be of the expected type.

<snip>
 

Uncle Steve

Everyone is taking chances here. The x and n should be in parentheses
so that the macro expands as expected. This is one reason why functions
are usually preferred.

For a long time I've thought macros were preferable to functions for
the reason that a function call has to save registers, stuff its
arguments on the stack, and so on. Macros should just fold into the
program text, but today inline functions are just as fast as macros.
It has become a matter of preference. gcc even has the ability to use
register calling conventions for function calls in some circumstances.
The distinction between a macro and a function call is becoming moot.
#define ARENA_TYPE struct foo
struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.

No worries here. I've edited for brevity in posting, but I always
protect macro arguments from side-effects these days. It doesn't look
as clean and concise, which is why I omitted those statements from
example code. No biggie.

In this newsgroup is it really preferable to see

#define arena_object_addr(x, n) ({ \
struct arena_s X = x; \
int N = n; \
*X->arena[N]; }) \

for every dumb example that is posted? Possibly not if people can be
reminded that all code posted in comp.lang.c is generally intended as
an example.



Regards,

Uncle Steve
 

Ian Collins

For a long time I've thought macros were preferable to functions for
the reason that a function call has to save registers, stuff its
arguments on the stack, and so on. Macros should just fold into the
program text, but today inline functions are just as fast as macros.
It has become a matter of preference. gcc even has the ability to use
register calling conventions for function calls in some circumstances.
The distinction between a macro and a function call is becoming moot.

No it isn't. All the disadvantages of function like macros still exist,
but the single (possible) disadvantage of function calls has gone.
 

Ben Bacarisse

I should also have said especially if the macro does not have an
ALL_CAPS name flagging its macrosity.
For a long time I've thought macros were preferable to functions for
the reason that a function call has to save registers, stuff its
arguments on the stack, and so on. Macros should just fold into the
program text,

They should, but the onus is on the programmer to make that reasonable
(and where there are side effects in the expressions used in the macro
"call" this may not be possible).
but
today inline functions are just as fast as macros.

Yes, using an inline function is better yet. You seemed dead against
them which is why I commented on the macro. Use a function and the
issue disappears.
It has become a
matter of preference. gcc even has the ability to use register
calling conventions for function calls in some circumstances. The
distinction between a macro and a function call is becoming moot.

I don't see how you can say that. The speed advantage that used to be
one of the main reasons to take the risk with macros has gone, but all
the potential bugs that function-like macros can hide are still there.
#define ARENA_TYPE struct foo
struct arena_s {
    size_t arena_size;
    int free_list;
    size_t free;
    ARENA_TYPE * arena;
};

And your access macro becomes:

#define arena_obj_addr(x, n) &x->arena[n]

Here the [n] is OK, but why risk it with x? The (x) costs nothing and,
I think, saves time reading the macro since the reader does not have to
start worrying about what might be passed.

No worries here. I've edited for brevity in posting, but I always
protect macro arguments from side-effects these days. It doesn't look
as clean and concise, which is why I omitted those statements from
example code. No biggie.

OK. I'll try to remember to assume any errors in future code were added
for brevity and I'll save time by not commenting on it.
In this newsgroup is it really preferable to see

#define arena_object_addr(x, n) ({ \
struct arena_s X = x; \
int N = n; \
*X->arena[N]; }) \

for every dumb example that is posted?

The correct version of the macro is simpler than the one you seem to
think you'd write to make it safe. Why would you write the above (it's
wrong, so it's obviously not what you really use) instead of

*(x)->arena[n]

? Also, why would you use a non-standard feature when it's not needed? There
might be a reason to use ({...}) when a macro parameter is used more
than once that's not the case here. I suppose if you know you'll never
use another compiler, tying code to language extensions is not a problem.
Possibly not if people can be
reminded that all code posted in comp.lang.c is generally intended as
an example.

One of the bug-bears of Usenet is "oh that's not the real code". This
translates to "you've just wasted your time". I'd vote for real code
(at least code that's gone through a compiler) every time.
 

Uncle Steve

No it isn't. All the disadvantages of function like macros still exist,
but the single (possible) disadvantage of function calls has gone.

Not so fast, Mr. Collins. Macros are useful in numerous situations
where function calls would be inefficient. For my purposes, it is
sometimes nice to write a macro which uses conditionals to select a
particular code path through the macro. If the conditional can be
resolved at compile-time, the dead code branches in the macro are
pruned, saving space. A similarly structured function call won't be
optimized in this fashion unless (maybe) it is inlined.

Not enough? Is it smart to rely on the compiler to optimize the crap
out of everything, or is it better to write efficient code at the
outset? There are a number of C compilers out there that are not gcc,
and which may or may not generate equivalent code for otherwise
similar macros/functions. I'm O.K. with peppering my code with
well-designed macros and ignoring functions for simple common
procedures, but then I'm biased that way from habit.



Regards,

Uncle Steve
 
