Defined and undefined C pointer manipulation

glen herrmannsfeldt

(snip, someone wrote)
(snip)
ISTM that most malloc implementations place their memory management nodes
between the allocated memory spaces, which would cause the alignment creep
you mention. It is also potentially fragile because 1) those spaces cannot
be protected, and 2) a pointer going just beyond where it should could lead
to corruption of a node.

One reason not to is the poor performance of virtual memory systems.
I came up with an idea for a memory allocator which stores all of its
metadata elsewhere (though I am not looking at implementing that just now).
It is a little more complex and wouldn't have such good free() performance.
free() normally just has to offset the pointer it is passed in order to find
the node; if the node is not stored relative to the start of the allocated
memory space, it has to be searched for. Storing the node-type data
elsewhere, though, does buy you quite a bit: greater security, the
alignments you want, and often faster scanning for malloc to find space.

I know some put at least some of the data elsewhere for better
virtual-memory behaviour. If you follow a linked list through the
beginning of each allocated block, you have to page in each of those blocks.

I know some do it differently, but I don't remember how much.
You could put a linked list somewhere else, and a pointer to a
link in the list before each block, to make free fast.
A hash table for free should also be pretty fast.
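
As a rough illustration, here is a minimal sketch of that out-of-band
scheme in C, keeping block sizes in a separate chained hash table keyed
on the address (ob_alloc, ob_free and the sizes are invented names, it
simply wraps malloc rather than managing its own arena, and it assumes
C99's <stdint.h>):

#include <stdlib.h>
#include <stdint.h>

#define NBUCKETS 1024u

struct meta {
    void *ptr;                /* the block this node describes */
    size_t size;
    struct meta *next;
};

static struct meta *bucket[NBUCKETS];

static unsigned hash_ptr(void *p)
{
    /* multiplicative hash of the address */
    return (unsigned)((((uintptr_t)p) >> 4) * 2654435761u % NBUCKETS);
}

void *ob_alloc(size_t size)
{
    void *p = malloc(size);
    struct meta *m;

    if (p == NULL)
        return NULL;
    m = malloc(sizeof *m);    /* the metadata node lives elsewhere */
    if (m == NULL) {
        free(p);
        return NULL;
    }
    m->ptr = p;
    m->size = size;
    m->next = bucket[hash_ptr(p)];
    bucket[hash_ptr(p)] = m;
    return p;
}

void ob_free(void *p)
{
    struct meta **mp = &bucket[hash_ptr(p)];

    while (*mp != NULL && (*mp)->ptr != p)
        mp = &(*mp)->next;    /* walk the chain to find the node */
    if (*mp != NULL) {
        struct meta *m = *mp;
        *mp = m->next;        /* unlink it */
        free(m);
        free(p);
    }
}

Nothing is stored next to the user's data, so a pointer running just off
the end of a block can't corrupt the allocator's nodes.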
Don't forget that alignment creep can be a good thing if it ends up
offsetting cache lines that are used together, so perfect alignment can
sometimes be a bad thing. It's less of a problem now with CPUs which
have many-way caches.

I suppose, but you could do that even without the data before
each block.

One that I have wondered about is web browsers and the data that
they keep for each tab or window. Many perform poorly, doing a lot
of paging, as their memory use gets big.

-- glen
 
glen herrmannsfeldt

(snip)
Often, the size of an object is known statically and does not have
to be stored in memory at all. On the Amiga, one can use operating
system functions like:

The OS/360 (and successor) GETMAIN/FREEMAIN allows for freeing
part of a previous allocation. You can allocate a big block,
then free the middle of it, leaving two smaller blocks.

There is also a call with a minimum and maximum, such that any
sized block in the range can be returned, along with its length.
It might be nice for a hash table, where bigger is better but smaller
can also work. It can also reduce fragmentation.
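
Something similar could be sketched in C, say as a hypothetical
alloc_range() that asks for up to max bytes but settles for as little as
min, reporting what it actually got:

#include <stdlib.h>

void *alloc_range(size_t min, size_t max, size_t *got)
{
    size_t n = max;

    for (;;) {
        void *p;

        if (n < min)
            n = min;          /* always try the minimum before giving up */
        p = malloc(n);
        if (p != NULL) {
            *got = n;         /* report the size actually obtained */
            return p;
        }
        if (n == min)
            return NULL;
        n /= 2;               /* back off geometrically */
    }
}

A hash table could then size itself to whatever came back, and the
allocator gets a chance to hand out an awkwardly sized free block
instead of fragmenting.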

-- glen
 
glen herrmannsfeldt

(snip)
For a byte pointer, setting the low-order bit to zero doesn't mean much;
it can still be misaligned. But if you're limiting yourself to word
pointers, then yes, you can do that.
(snip)

Parameterizing which bit you use for a flag could *in principle* let
your code work correctly on some implementations where it might fail if
you always use the low-order bit, at the expense of having to change a
one-line macro definition for some targets. I don't know enough about
your goals and requirements to tell you whether that's worthwhile.
Using a bit outside the pointer could let you write 100% portable C
code, but at the expense of allocating extra memory. The tradeoffs are
up to you.
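
A sketch of what that one-line, per-target parameterization might look
like, assuming C99's optional uintptr_t and relying on the
implementation-defined guarantee that the chosen bit is unused in real
pointers on each target:

#include <stdint.h>

/* low bit on byte-addressed targets; might be a high-order bit
   on, say, a word-addressed Cray */
#define PTR_FLAG_BIT ((uintptr_t)1)

#define SET_FLAG(p)   ((void *)((uintptr_t)(p) |  PTR_FLAG_BIT))
#define CLEAR_FLAG(p) ((void *)((uintptr_t)(p) & ~PTR_FLAG_BIT))
#define HAS_FLAG(p)   (((uintptr_t)(p) & PTR_FLAG_BIT) != 0)

Porting to a machine with a different spare bit then means editing only
the PTR_FLAG_BIT definition.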

The BLAST program for DNA and protein sequence searching does that.
It generates finite state automata in an array of pointers, but also
needs a bit to indicate when a match is made. On byte-addressed
machines, where aligned pointers have low-order zero bits, it uses the
low bit. On word-addressed Cray machines, a high-order bit.
(More specifically, it generates a DAWG.)

I believe the comments specifically indicate Cray.

In normal use, most characters are not matches, so the pointer is
used directly, for a very fast inner loop. In the case of a match,
it exits the loop and processes the hit.
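
The kind of inner loop this enables might look something like the
following sketch (names invented, reusing the HAS_FLAG macro from the
sketch above; it assumes each non-match state is an array of successor
pointers indexed by the next character, and that every input eventually
reaches a flagged state):

typedef void *STATE;

STATE step_until_match(STATE state, const unsigned char *s)
{
    while (!HAS_FLAG(state))              /* flag bit set means a hit */
        state = ((STATE *)state)[*s++];
    return state;                         /* caller processes the match */
}

One test and one indexed load per character, with the flag check folded
into the same word as the pointer.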

(It was some years ago that I was working with BLAST source.
It could have changed by now.)

-- glen
 
BartC

Stefan Ram said:
Often, the size of an object is known statically and does not have
to be stored in memory at all. On the Amiga, one can use operating
system functions like:

if( m = AllocMem( 100, MEMF_ANY )){ use( m ); FreeMem( m, 100); }

, and as you can see, one has to pass »100« to »FreeMem« again.

That's exactly what I've used for many years. It works extremely well.

And you will know the size of a block more often than you might think. So if
you are allocating space for a struct, you will obviously know the size of
that struct. If you have a dynamic array, it's of little use unless you know
the bounds.

Only for things such as zero-terminated strings, where you do not store the
length, would you need to calculate it to free it.

Furthermore, if malloc() and free() already worked like that, then it would
be easier to build size-retaining versions on top of them than to do it the
other way around.
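
To illustrate, here is a minimal sketch of that layering, assuming
hypothetical sized primitives sized_alloc(n)/sized_free(p, n)
underneath, in the spirit of AllocMem/FreeMem:

#include <stddef.h>

extern void *sized_alloc(size_t n);        /* assumed primitives */
extern void sized_free(void *p, size_t n);

union header {
    size_t size;
    long double pad;      /* force worst-case alignment for the header */
};

void *my_malloc(size_t n)
{
    union header *h = sized_alloc(sizeof *h + n);

    if (h == NULL)
        return NULL;
    h->size = n;          /* remember the size for my_free */
    return h + 1;         /* user memory starts just past the header */
}

void my_free(void *p)
{
    union header *h = (union header *)p - 1;

    sized_free(h, sizeof *h + h->size);
}

Only a size_t per block is paid for, and only by callers who want the
malloc-style interface.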

(I haven't considered the usefulness of block-size information when
malloc/free need to manage memory blocks. Last time I implemented something
like that and needed to know this, I used a separate bit-map; for allocations
based on 16-byte chunks, the overhead would be only about 0.8% (one bit per
16 bytes is 1/128). But for small power-of-two allocations, it hasn't been
necessary and fragmentation hasn't been a problem.)
 
Stefan Ram

Malcolm McLean said:
Allocating fixed blocks is much more efficient than calling a general-purpose
allocator, both in space and speed terms, especially for small blocks.

One does not have to abandon the standard malloc. Instead, one can allocate
a large chunk for, say, 100 fixed blocks at once using malloc and then manage
the internal memory of this chunk oneself, as sketched below.
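
A minimal sketch of that chunk-plus-free-list idea in C (block size,
count and the pool_* names are invented):

#include <stdlib.h>

#define BLOCK_SIZE  32    /* must be >= sizeof(void *) and keep alignment */
#define BLOCK_COUNT 100

static void *free_list;

int pool_init(void)
{
    unsigned char *chunk = malloc((size_t)BLOCK_SIZE * BLOCK_COUNT);
    int i;

    if (chunk == NULL)
        return -1;
    for (i = 0; i < BLOCK_COUNT; i++) {
        /* thread every block onto the free list */
        void **block = (void **)(chunk + i * BLOCK_SIZE);
        *block = free_list;
        free_list = block;
    }
    return 0;
}

void *pool_alloc(void)
{
    void **block = free_list;

    if (block != NULL)
        free_list = *block;       /* pop the head: one read, one write */
    return block;
}

void pool_free(void *p)
{
    *(void **)p = free_list;      /* push back: one write */
    free_list = p;
}

When the chunk runs dry, pool_init can simply be extended to malloc
another chunk and thread it on.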

One can save some of the effort of managing this with container libraries.
For example, C++ has an ::std::deque that can help to manage the block
allocations transparently. C++ will by itself allocate larger chunks to
store the entries of an ::std::deque. One just has to add a record of the
free entries (which can be reused before allocating new entries) to have
full memory management, based on malloc but possibly more efficient than
one malloc/free per entry.
 
Ian Collins

BartC said:
I tend to use my own allocator for small objects, because it's more
efficient than malloc. It uses large blocks obtained by malloc (but could
come direct from the OS). On one actual application, using malloc instead
made the program run at half the speed.

I find it's better to stick to the standard interface and use the best
memory allocator library (at least in the Unix world, there tend to be
several allocator libraries available on any one platform) for a
particular application. What works best in a single threaded
application may be hopeless in a multi-threaded one.
 
Ben Bacarisse

Barry Schwarz said:
You could also try to make the code self adjusting in case you change
compilers.

#if sizeof(void*) <= sizeof(unsigned char)
typedef unsigned char myintptr_t;
#elif sizeof(void*) <= sizeof(unsigned short)
typedef unsigned short myintptr_t;
#elif sizeof(void*) <= sizeof(unsigned int)
typedef unsigned int myintptr_t;
#elif sizeof(void*) <= sizeof(unsigned long)
typedef unsigned long myintptr_t;
#elif sizeof(void*) <= sizeof(unsigned long long)
typedef unsigned long long myintptr_t;
#else
#error No integer type large enough to hold pointer value
#endif

This won't work because, to put it rather informally, # lines are
processed before sizeof means anything: after macro expansion, any
identifier left in a #if expression (including sizeof and void) is
replaced by 0, so the test is not even a valid expression. Sorry I
can't offer an alternative.
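
(For what it's worth, a C99 implementation that provides the optional
uintptr_t has already done the selection for you:

#include <stdint.h>

typedef uintptr_t myintptr_t;   /* any void * converts to this and back */

But uintptr_t itself is optional, so this only moves the problem on
implementations that lack it.)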
 
James Kuyper

That's an interesting point. Since C allows

The C standard never disallows anything - it just tells you what behavior
is mandatory, prohibited, or neither, when code containing certain
features is processed by a conforming implementation of C. That having
been said, 6.5.12 is a constraint section, so code which doesn't conform
to that "shall" is a constraint violation - which is as close as C ever
comes to disallowing something.
... (or at least implementations of
C allow) such operations on a pointer and, as you point out, the operands
are required by the standard to be of integer type, does that imply that the
pointer gets 'converted' to an integer for the bitwise operation to take
place? In other words, does the compiler perform an implementation-specific
conversion to an integer so that the bitwise OR can be done? And if it does,
wouldn't it 'convert' it to an intptr_t? Maybe it's going too far to say
that would be safe.


Way too far. The standard imposes no requirements on how an
implementation deals with such code, except for the mandatory diagnostic.
I know what you mean - when programming in C we nearly always use malloc for
memory allocation - though it could be argued that malloc is not a C
function.

It's defined in the C standard, as being part of the C standard library.
It's an optional part; only hosted implementations of C need to support
it - but I'd have expected you to use different wording if that had been
the distinction you were making.
... AIUI malloc is not part of C at all but merely part of the
standard library, and that some other allocator could be used just as well
and the program would still be wholly C.

There are other allocators defined as being part of the C standard
library (several were added in C2011). Any of those could be used
without making the program any less of a C program. The same would not
be true of any allocator that is not defined by the C standard, and not
implemented using strictly conforming C code.
 
Malcolm McLean

I thought more along the lines that not every average
programmer is capable of writing an implementation of
malloc that is better than the malloc provided by the
library in a /hosted/ implementation of C, so in a hosted
environment it is common to use the malloc provided.
There are two common patterns for heap memory use.

The first is that calls to the allocator are matched by calls to the
deallocator, either in the same function, or in nested constructor/
destructor calls.
The second is that a persistent graph is created, consisting of a large
number of relatively small fixed-size nodes, linked by pointers.

In both these cases it's easy to use extremely efficient algorithms to
replace malloc(). Malloc() can then be kept in reserve for the few cases
that don't fit (e.g. an expandable buffer not in a leaf routine).
However, it's not normally worth it, because whilst the performance gain
will be decent, it won't usually transform the program, and it involves
putting extra complexity and dependency into the code.
 
Ian Collins

Malcolm said:
There are two common patterns for heap memory use.

The first is that calls to the allocator are matched by calls to the
deallocator, either in the same function, or in nested constructor/
destructor calls.
The second is that a persistent graph is created, consisting of a large
number of relatively small fixed-size nodes, linked by pointers.

You forgot 3: allocation and de-allocation in different threads.
In both these cases it's easy to use extremely efficient algorithms to
replace malloc().

Really? If so, why do system designers spend so much effort
implementing (usually more than one) efficient memory allocators?
 
BartC

Malcolm McLean said:
There are two common patterns for heap memory use.

The first is that calls to the allocator are matched by calls to the
deallocator, either in the same function, or in nested constructor/
destructor calls.
The second is that a persistent graph is created, consisting of a large
number of relatively small fixed-size nodes, linked by pointers.

There are lots of patterns. For example, large numbers of blocks of the same
size, but not necessarily linked to each other. Or large numbers of blocks
of different sizes, which are never going to be freed (until the program
terminates). Or a combination of these. Or a collection of blocks which will
eventually be freed en masse. Then there are blocks that can grow in size,
and those that will be fixed (and won't need any over-allocation).

But they will still depend on some underlying allocator of large,
arbitrary-sized blocks.
 
Malcolm McLean

You forgot 3: allocation and de-allocation in different threads.
There are other patterns. Those two are the common patterns that are easy
to optimise; I gave an example of another pattern that is less easy to optimise.
Really? If so, why do system designers spend so much effort
implementing (usually more than one) efficient memory allocators?
Because they're trying to keep the same interface as malloc(), rather
than doing a higher-level analysis of what patterns are actually going
to be needed first? Because they are doing a higher-level analysis, but
not doing it very well, because they are maybe less skilled, or less
highly qualified than I am? Because they know that very simple efficient
algorithms are available, but there's a case for using a very complex,
over-engineered one, and they get paid to do engineering, not as a
share of the venture's profits? Because they're writing for small specialised
systems which have rather different requirements to those running
general-purpose programs?

There are lots of possible explanations.

A stack allocator and a fixed block allocator are trivial to write, and will
cover maybe 80% of memory allocations if a typical program is written
to take advantage of them. I haven't actually done any statistics, but I
know from long experience of programming that it's going to be something
like that. But it does add an extra burden to the programmer. The stack
allocator has to be very carefully called with matching allocates / frees,
and the fixed block allocators need setting up with the block size and the
expected total allocation. But you can get a significant increase in
performance. You can't alter the big-O complexity of the program, however,
so you're never going to totally transform it.
 
Ian Collins

Malcolm said:
Because they're trying to keep the same interface as malloc(), rather
than doing a higher-level analysis of what patterns are actually going
to be needed first? Because they are doing a higher-level analysis, but
not doing it very well, because they are maybe less skilled, or less
highly qualified than I am? Because they know that very simple efficient
algorithms are available, but there's a case for using a very complex,
over-engineered one, and they get paid to do engineering, not as a
share of the venture's profits? Because they're writing for small specialised
systems which have rather different requirements to those running
general-purpose programs?

Normally because they want to get the best performance from a range of
applications and hardware. The better their allocator behaves, the
better the synthetic benchmark results they can brag about.
A stack allocator and a fixed block allocator are trivial to write, and will
cover maybe 80% of memory allocations if a typical program is written
to take advantage of them.

So you end up reinventing a slab allocator, which any decent system will
already have.
 
Malcolm McLean

So you end up reinventing a slab allocator, which any decent system will
already have.
Implementing, not reinventing. I did actually realise that fixed blocks could
be allocated very easily and very quickly independently, but that was a long
time ago, when I first started programming. Many people have used them since,
and I'm sure I wasn't the first person to write one - I don't actually know
if anyone lays claim to the title of first inventor. In games, where
performance is at a premium, it's a standard technique.

You can use one already existing, or knock one up in five minutes, or take
the one from my book "Basic Algorithms". It hardly matters. The important
point is that a lot of persistent structures consist of large numbers of
small blocks of equal size, linked together with pointers to form a graph.
In this common situation, you can use a fixed block allocator. You might
shave off an extra cycle or two by writing it specially for the hardware,
but basically you can't beat a single pointer dereference and write to
allocate and to free.

However, they are a bit fiddly. You have to set them up with the block size, and you
have to decide how you will set the maximum allocation limit, or if you'll make it soft.
So it's easier just to call malloc() for most purposes.
 
Malcolm McLean

There are lots of patterns. For example, large numbers of blocks of the same
size, but not necessarily linked to each other. Or large numbers of blocks
of different sizes, which are never going to be freed (until the program
terminates). Or a combination of these. Or a collection of blocks which will
eventually be freed en masse. Then there are blocks that can grow in size,
and those that will be fixed (and won't need any over-allocation).

But they will still depend on some underlying allocator of large,
arbitrary-sized blocks.
You're absolutely right. But look at any code you happen to have written. Almost
certainly you'll find that the calls to malloc() mostly look like this:

void foo()
{
    char *buff1 = malloc(N);
    char *buff2 = malloc(M);

    callsubroutines();
    free(buff1);
    free(buff2);
}

or like this

void foo()
{
    OPAQUE *object = constructobject();

    callsubroutines();

    destroyobject(object);
}

So they can be replaced with a stack allocator, but you have to be careful.
For example, the free calls in the first example need to be reversed, as the
sketch below illustrates.
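
A sketch of such a stack allocator, assuming one global arena (names
invented; a real implementation would also guarantee the arena's
initial alignment):

#include <stddef.h>

#define ARENA_SIZE 65536

static unsigned char arena[ARENA_SIZE];
static size_t top;                /* index of the next free byte */

void *stack_alloc(size_t n)
{
    void *p;

    n = (n + 15) & ~(size_t)15;   /* round up to keep 16-byte spacing */
    if (ARENA_SIZE - top < n)
        return NULL;
    p = arena + top;
    top += n;
    return p;
}

void stack_free(void *p)
{
    /* pops p *and everything allocated after it* - hence the
       requirement that frees come in reverse order */
    top = (size_t)((unsigned char *)p - arena);
}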

The other common situation is

void foo()
{
    NODE *root = rootnode;

    while(complex_condition)
    {
        NODE *sub = findnode(root, complex_criterion);
        addordeletenode(sub);
        callsubroutines();
    }

    freealltheremainingnodes(root);
}

So you can use the fixed block allocator.

Now not absolutely everything can be reworked with a little effort
into these two patterns. As Ian said, if for some reason you need to
allocate memory in one thread and free it in another, you're not
going to be able to use a stack allocator, for example.

But most can.
 
