malloc and alignment

F

Francis Moreau

Hello,

I was wondering what was the alignment rule when allocating memories
by using malloc, and the spec says that the alignment should be so
that the pointer returned can be assigned to a pointer to any type of
objects.

So I thought that 8 bytes of alignment should be enough for the
purpose on a 64 bits cpu, but the following simple test program makes
me wondering if it's really true:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
int i;

for (i = 0; i < 18; i++)
printf("%02d: %p\n", i, malloc(1));
return 0;
}

And the ouput is:

00: 0xca7010
01: 0xca7030
02: 0xca7050
03: 0xca7070
04: 0xca7090
05: 0xca70b0
06: 0xca70d0
07: 0xca70f0
08: 0xca7110
09: 0xca7130
10: 0xca7150
11: 0xca7170
12: 0xca7190
13: 0xca71b0
14: 0xca71d0
15: 0xca71f0
16: 0xca7210
17: 0xca7230

So every returned pointers are 32 bytes aligned which looks like a
very waste of memory for small malloced memory chunk.

Could anybody enlight me ?

Thanks
 
N

Nate Eldredge

Francis Moreau said:
Hello,

I was wondering what was the alignment rule when allocating memories
by using malloc, and the spec says that the alignment should be so
that the pointer returned can be assigned to a pointer to any type of
objects.

So I thought that 8 bytes of alignment should be enough for the
purpose on a 64 bits cpu, but the following simple test program makes
me wondering if it's really true:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
int i;

for (i = 0; i < 18; i++)
printf("%02d: %p\n", i, malloc(1));
return 0;
}

And the ouput is:

00: 0xca7010
01: 0xca7030
02: 0xca7050
03: 0xca7070
04: 0xca7090
05: 0xca70b0
06: 0xca70d0
07: 0xca70f0
08: 0xca7110
09: 0xca7130
10: 0xca7150
11: 0xca7170
12: 0xca7190
13: 0xca71b0
14: 0xca71d0
15: 0xca71f0
16: 0xca7210
17: 0xca7230

So every returned pointers are 32 bytes aligned which looks like a
very waste of memory for small malloced memory chunk.

Note those are 16 bytes aligned, not 32. There are 32 bytes in between them.

There are many reasons why the implementors might have chosen to do it
this way. Here's one possible scenario. It may be that

- There are some situations where 16-byte alignment is better than
8-byte alignment. For example, on amd64, although 8-byte alignment is
sufficient for operations involving 64-bit integers, which are the
most common, there is a set of 128-bit vector operations (called SSE)
which require their operands to be aligned on a 16-byte boundary.

- malloc stores some additional data with each chunk. It's easy to
imagine there might be 16 bytes of this: a 64-bit integer containing
the size of the chunk, and a 64-bit pointer to the next chunk.

If that were the case, then an allocation of one byte would actually
require 17 bytes, and 16-byte alignment would force you to round that up
to 32 bytes.

Given this sort of overhead, it is usually a good idea to avoid
allocating a huge number of tiny chunks if you can.
 
F

Francis Moreau

Nate Eldredge said:
Note those are 16 bytes aligned, not 32. There are 32 bytes in between them.
True.


There are many reasons why the implementors might have chosen to do it
this way. Here's one possible scenario. It may be that

- There are some situations where 16-byte alignment is better than
8-byte alignment. For example, on amd64, although 8-byte alignment is
sufficient for operations involving 64-bit integers, which are the
most common, there is a set of 128-bit vector operations (called SSE)
which require their operands to be aligned on a 16-byte boundary.

But are these operands part of the C objects ?

BTW, what is the type of C object that needs the biggest alignment ?
- malloc stores some additional data with each chunk. It's easy to
imagine there might be 16 bytes of this: a 64-bit integer containing
the size of the chunk, and a 64-bit pointer to the next chunk.
Agreed.



If that were the case, then an allocation of one byte would actually
require 17 bytes, and 16-byte alignment would force you to round that up
to 32 bytes.

Are there any ways to get the actual memory size used when mallocing ?
Given this sort of overhead, it is usually a good idea to avoid
allocating a huge number of tiny chunks if you can.

Not that tiny: if you allocate (with malloc) 1000 objects of 16 bytes
you actually use 32K...
 
I

Ian Collins

Francis said:
But are these operands part of the C objects ?

That doesn't really matter, there is no restriction on the maximum
alignment.
BTW, what is the type of C object that needs the biggest alignment ?

On some but not all systems, long double.
Are there any ways to get the actual memory size used when mallocing ?

Not unless your implementation provides one.
Not that tiny: if you allocate (with malloc) 1000 objects of 16 bytes
you actually use 32K...

On a desktop system, that's still noise. On a memory constrained
system, a customised small allocator would be used.

It's a fairly straight forward and very educational exercise to write
one's own memory allocator. Give it a go!
 
F

Francis Moreau

Ian Collins said:
That doesn't really matter, there is no restriction on the maximum
alignment.

I would say it really matters !

If you're a malloc implementer and wants to optimize the mermory space
used by your allocator...
 
K

Kaz Kylheku

Hello,

I was wondering what was the alignment rule when allocating memories
by using malloc, and the spec says that the alignment should be so
that the pointer returned can be assigned to a pointer to any type of
objects.

So I thought that 8 bytes of alignment should be enough for the
purpose on a 64 bits cpu, but the following simple test program makes
me wondering if it's really true:

Your program has no way of determining what alignment is being
enforced by malloc. The property of the the addresses coming out of malloc
is a consequence of not only alignment, but chunking.
And the ouput is:

00: 0xca7010
01: 0xca7030

[ snip ]
So every returned pointers are 32 bytes aligned which looks like a

That is false: 0xca7010 is not divisible by 32, but only by 16!
In hex, numbers divisible by 32 (0x20) end in some even digit folowed
by a zero: 0x....[02468ACE]0. If it ends with 0x...[13579BDF]0, it's
not divisible by 32.

So the pointers appear to be (no more strictly than) 16 byte aligned.
But they are spaced 32 bytes apart.

The alignment could simply be a coincidence. Suppose the allocator is actually
enforcing an 8 byte allocation strategy, but the minimum chunk it works with is
32 bytes wide, and it happens to start taking chunks starting at and address
that is divisible by 16. In your test case, it will never hit an address that
is divisible by 8, but not by 16.

You cannot know the true alignment from blackbox observations like this.

Also, your specific program is nto a particularly good investigation tool
because it triggers a predictable, repetitive behavior from the allocator. A
better program would give the allocator a varied mixture of allocation and
freeing requests, using a variety of sizes. I.e. try to randomize its behavior,
to try to create many different situations and cases within the blackbox. By
doing this you might just get your malloc to produce an address that's
divisible by eight.
 
I

Ian Collins

Francis said:
I would say it really matters !

If you're a malloc implementer and wants to optimize the mermory space
used by your allocator...
Or make it useful, or get the best performance on your platform. Life
and programming is full of chaises and compromise.
 
N

Nate Eldredge

Francis Moreau said:
But are these operands part of the C objects ?

On my amd64 machine, there's no ISO C type corresponding to these vector
objects. But they can be used via compiler extensions or inline
assembly, and therefore malloc provides the necessary alignment in order
to support this.
BTW, what is the type of C object that needs the biggest alignment ?

On my machine, none of the standard C types actually require any
particular alignment, but there is a speed penalty if they are not
aligned. The largest standard C types are 64 bits (long int and
pointers) and recommend 8-byte alignment, and the compiler normally
obeys this.

Of course, none of this is guaranteed by the C standard, and could be
completely different on other machines/implementations.
Are there any ways to get the actual memory size used when mallocing ?

Not portably. It's a difficult thing to measure, anyway, because one
can also imagine malloc needing additional space for other bookkeeping,
which doesn't correspond exactly to specific chunks.

My implementation does provide a `malloc_usable_size' function which
returns the amount of space actually provided in a chunk. In my example
above, malloc_usable_size(p=malloc(1)) would probably be 16. The
documentation cautions against using this for any other purpose than
"introspection", and it's totally non-portable.
Not that tiny: if you allocate (with malloc) 1000 objects of 16 bytes
you actually use 32K...

16 bytes is pretty tiny, IMHO. But you get the idea.
 
V

vippstar

Not that tiny: if you allocate (with malloc) 1000 objects of 16 bytes
you actually use 32K...

You claimed 1000 * 16 * 2 to be 32K. That's a claim only hard drive
manufacturers would make ;-).
It's actually 31.25KB. If you're going to care about the 16 bytes
'wasted' in each allocation, you should also consider the .75KB left
out of your calculations.
 
K

Keith Thompson

You claimed 1000 * 16 * 2 to be 32K. That's a claim only hard drive
manufacturers would make ;-).
It's actually 31.25KB. If you're going to care about the 16 bytes
'wasted' in each allocation, you should also consider the .75KB left
out of your calculations.

I think we can assume 32K to be an approximation. And strictly
speaking, 32KB means 32000; 32KiB means 32768 (though when talking
about memory K does usually mean 1024).
 
J

J. J. Farrell

Francis said:
...


Are there any ways to get the actual memory size used when mallocing ?


Not that tiny: if you allocate (with malloc) 1000 objects of 16 bytes
you actually use 32K...

It's a question of definition of terms, but I think most people would
agree that 16 bytes is a tiny (or even extremely tiny) chunk to malloc.

If you're saying that the amount of space you waste is not tiny when
this implementation of malloc is used to allocate 1000 16 byte chunks,
that's exactly the point Nate was making - don't do that! Most
implementations of malloc will waste a lot of memory of you use them to
allocate tiny chunks like 16 bytes. If you want space for 1000 16 byte
things, do a single malloc for an array of 1000 of them - or some other
way of allocating a substantial number at a time.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

malloc and alignment question 8
Alternative to Malloc in C 0
malloc and maximum size 56
malloc() and alignment 10
Regarding malloc memory alignment 10
malloc() and alignment 12
alignment shame (?) 14
malloc 40

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,057
Latest member
KetoBeezACVGummies

Latest Threads

Top