How much memory does malloc(0) allocate?

Eric Sosman

> [...]
> allocator statistics have generally shown that small objects (< 4kB)
> represent a good portion of the total memory use, *1, but currently with
> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
> arrays).
>
> *1: roughly forming a Gaussian distribution centered on 0, with millions
> of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?
 
BGB

>> [...]
>> allocator statistics have generally shown that small objects (< 4kB)
>> represent a good portion of the total memory use, *1, but currently with
>> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
>> arrays).
>>
>> *1: roughly forming a Gaussian distribution centered on 0, with millions
>> of small objects.
>
> Centered on zero? So zero-byte allocations are the commonest
> of all? And allocations of 10 to 20 bytes are about as common as
> those of -20 to -10?

well, no. allocations of 0 are not common (0 is "generally invalid", and <0
is invalid and can't be allocated).

but the highest point is (was) 1-15 bytes, and it rapidly drops off from
there.

but I am not sure what name there would be for this exact
distribution (Gaussian was the closest I could find).


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.
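
For concreteness, a minimal sketch of how per-size statistics like these can
be gathered: a wrapper that buckets malloc() request sizes into 16-byte bins.
The wrapper name dbg_malloc and the bin layout are illustrative assumptions,
not the heap-statistics tool described above.

/* Minimal allocation-size histogram sketch: bucket request sizes
 * into 16-byte bins; everything >= 4080 bytes is lumped together. */
#include <stdio.h>
#include <stdlib.h>

#define NBINS 256  /* bins: 0-15, 16-31, ..., 4064-4079, 4080+ */

static unsigned long size_hist[NBINS];

void *dbg_malloc(size_t n)
{
    size_t bin = n / 16;
    if (bin >= NBINS)
        bin = NBINS - 1;          /* overflow bin for large requests */
    size_hist[bin]++;
    return malloc(n);
}

void dump_hist(void)
{
    size_t i;
    for (i = 0; i < NBINS; i++)
        if (size_hist[i] != 0)
            printf("%4lu-%4lu bytes: %lu\n",
                   (unsigned long)(i * 16), (unsigned long)(i * 16 + 15),
                   size_hist[i]);
}

Calling dump_hist() at shutdown prints one line per non-empty bin, which is
enough to see whether the counts peak at 16-31 bytes and tail off the way the
post describes.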
 
Eric Sosman

On 7/28/2013 4:59 PM, BGB wrote:
>>> [...]
>>> allocator statistics have generally shown that small objects (< 4kB)
>>> represent a good portion of the total memory use, *1, but currently with
>>> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
>>> arrays).
>>>
>>> *1: roughly forming a Gaussian distribution centered on 0, with millions
>>> of small objects.
>>
>> Centered on zero? So zero-byte allocations are the commonest
>> of all? And allocations of 10 to 20 bytes are about as common as
>> those of -20 to -10?
>
> well, no. allocations of 0 are not common (0 is "generally invalid", and <0
> is invalid and can't be allocated).
>
> but the highest point is (was) 1-15 bytes, and it rapidly drops off from
> there.
>
> but I am not sure what name there would be for this exact
> distribution (Gaussian was the closest I could find).
>
>
> correction: re-ran the heap-statistics tool, currently the highest point
> seems to be 16-31 bytes (followed by 32-47 bytes, ...).
>
> the most common object types are currently "metadata leaves" and
> "metadata nodes" (basically, structures related to a hierarchical
> database), followed mostly by various other small-structure types.
>
> in any case though, small allocations seem to be pretty common.

Sounds perhaps Geometric/Exponential (that wouldn't have the spike at 32k).

... and wouldn't be "centered on 0."
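
For reference, both of those candidates have support only on non-negative
sizes and fall off monotonically from a peak at the smallest value, which fits
"highest at 1-15 bytes, rapidly drops off" far better than anything "centered
on 0":

    f(x) = \lambda e^{-\lambda x}, \quad x \ge 0                 (exponential)
    P(K = k) = (1 - p)^k \, p, \quad k = 0, 1, 2, \ldots          (geometric)

Neither is symmetric about its peak, and neither assigns any weight to
negative sizes.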
 
BGB

> On 7/28/2013 4:59 PM, BGB wrote:
>>>> [...]
>>>> allocator statistics have generally shown that small objects (< 4kB)
>>>> represent a good portion of the total memory use, *1, but currently with
>>>> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
>>>> arrays).
>>>>
>>>> *1: roughly forming a Gaussian distribution centered on 0, with millions
>>>> of small objects.
>>>
>>> Centered on zero? So zero-byte allocations are the commonest
>>> of all? And allocations of 10 to 20 bytes are about as common as
>>> those of -20 to -10?
>>
>> well, no. allocations of 0 are not common (0 is "generally invalid", and <0
>> is invalid and can't be allocated).
>>
>> but the highest point is (was) 1-15 bytes, and it rapidly drops off from
>> there.
>>
>> but I am not sure what name there would be for this exact
>> distribution (Gaussian was the closest I could find).
>>
>>
>> correction: re-ran the heap-statistics tool, currently the highest point
>> seems to be 16-31 bytes (followed by 32-47 bytes, ...).
>>
>> the most common object types are currently "metadata leaves" and
>> "metadata nodes" (basically, structures related to a hierarchical
>> database), followed mostly by various other small-structure types.
>>
>> in any case though, small allocations seem to be pretty common.
>
> Sounds perhaps Geometric/Exponential (that wouldn't have the spike at 32k).

yeah, graphs look about right...

the spike is a break from the pattern, but alas...
 
Malcolm McLean

> On 7/28/2013 5:09 PM, Eric Sosman wrote:
>
>
> correction: re-ran the heap-statistics tool, currently the highest point
> seems to be 16-31 bytes (followed by 32-47 bytes, ...).
>
> the most common object types are currently "metadata leaves" and
> "metadata nodes" (basically, structures related to a hierarchical
> database), followed mostly by various other small-structure types.
>
> in any case though, small allocations seem to be pretty common.

Skewed sample problem.

One program doesn't necessarily represent the typical situation. If you have
a tree-like structure which dominates the total number of objects in the
system, you either have lots of allocations of sizeof(NODE), or a few
large allocations of N * sizeof(NODE). (See my book Basic Algorithms for
how to write a fast fixed-block allocator.) It depends on whether allocation
performance is a concern or not.

Some programs have mainly dynamic strings, others mainly fixed fields.
If you're ultimately storing data in a database like SQL, you might as well
write char str[64], because SQL can't handle arbitrarily long strings.
However, if you're not, generally mallocing strings is neater and more robust.
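
For illustration, here is a minimal fixed-block (pool) allocator of the
general kind referred to above: carve a caller-supplied buffer into
equal-sized blocks and keep a singly linked free list, so each sizeof(NODE)
allocation is an O(1) pointer pop. This is only a sketch of the technique,
not the version from the book; it assumes the buffer is suitably aligned for
the objects stored in it and that the block size is at least sizeof(void *).

#include <stddef.h>

typedef struct pool {
    void  *free_list;   /* head of the free-block list          */
    size_t block_size;  /* assumed to be >= sizeof(void *)      */
} pool;

int pool_init(pool *p, void *buffer, size_t block_size, size_t nblocks)
{
    size_t i;
    char *base = buffer;

    if (block_size < sizeof(void *))
        return -1;
    p->block_size = block_size;
    p->free_list = NULL;
    for (i = 0; i < nblocks; i++) {       /* thread every block onto the list */
        void *block = base + i * block_size;
        *(void **)block = p->free_list;
        p->free_list = block;
    }
    return 0;
}

void *pool_alloc(pool *p)                 /* O(1): pop the list head */
{
    void *block = p->free_list;
    if (block != NULL)
        p->free_list = *(void **)block;
    return block;
}

void pool_free(pool *p, void *block)      /* O(1): push back onto the list */
{
    *(void **)block = p->free_list;
    p->free_list = block;
}

A tree of NODE objects can then live in one such pool, e.g.
pool_init(&p, buffer, sizeof(NODE), nnodes), with pool_alloc/pool_free
standing in for malloc/free for that one object type.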
 
BGB

>> On 7/28/2013 5:09 PM, Eric Sosman wrote:
>>
>>
>> correction: re-ran the heap-statistics tool, currently the highest point
>> seems to be 16-31 bytes (followed by 32-47 bytes, ...).
>>
>> the most common object types are currently "metadata leaves" and
>> "metadata nodes" (basically, structures related to a hierarchical
>> database), followed mostly by various other small-structure types.
>>
>> in any case though, small allocations seem to be pretty common.
>
> Skewed sample problem.
>
> One program doesn't necessarily represent the typical situation. If you have
> a tree-like structure which dominates the total number of objects in the
> system, you either have lots of allocations of sizeof(NODE), or a few
> large allocations of N * sizeof(NODE). (See my book Basic Algorithms for
> how to write a fast fixed-block allocator.) It depends on whether allocation
> performance is a concern or not.
>
> Some programs have mainly dynamic strings, others mainly fixed fields.
> If you're ultimately storing data in a database like SQL, you might as well
> write char str[64], because SQL can't handle arbitrarily long strings.
> However, if you're not, generally mallocing strings is neater and more robust.

this is not to say that they represent the bulk of the memory usage,
only that they held the top spot as the most-allocated object type.

they represent around 0.87% of the total memory usage (5MB / 576MB),
with an allocation count of around 1.93M.

they are followed by heap-allocated triangles for skeletal models (~ 21k
allocs), terrain-chunk headers (6k allocs), and around 116 other object
types.

I don't have a percentage for object counts; I would have to add these up
and calculate it manually.


yeah, there are heap-allocated strings and symbols in the mix as well,
but they don't hold as high a position.

there were previously lots of individually wrapped int/float/double
values as well, but these have since been moved over to using slab
allocators.


to explain the 32kB spike:
this has to do with the voxel terrain logic, which has "chunks": 16x16x16
arrays of 8-byte values (voxels). the chunks represent the locally active
area in terms of 1-meter cubes, and each voxel is basically a collection of
bit-fields.

there are only about 5826 of them, but in the dump data, these represent
32% of the total memory usage (186MB / 576MB).

there are also serialized voxel regions which, while only having 8
allocations (in the dump), represent 7% of the memory use (41MB / 576MB).
regions store the voxels in an RLE-compressed format, for parts of the
terrain that are not currently active.
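
As a rough illustration of the RLE idea, a run-length encoder over 8-byte
voxel values; the voxel_run layout and 16-bit run counts here are assumptions
for the sketch, not the engine's actual region format.

#include <stddef.h>
#include <stdint.h>

typedef uint64_t voxel;        /* 8-byte packed bit-fields, per the post */

typedef struct {
    uint16_t count;            /* run length                     */
    voxel    value;            /* voxel repeated 'count' times   */
} voxel_run;

/* Encode 'n' voxels into 'out' (capacity 'max' runs); returns the number
 * of runs used, or (size_t)-1 if 'out' is too small. */
size_t rle_encode(const voxel *in, size_t n, voxel_run *out, size_t max)
{
    size_t i = 0, nruns = 0;
    while (i < n) {
        size_t j = i + 1;
        while (j < n && in[j] == in[i] && (j - i) < 0xFFFF)
            j++;                          /* extend the current run */
        if (nruns >= max)
            return (size_t)-1;
        out[nruns].count = (uint16_t)(j - i);
        out[nruns].value = in[i];
        nruns++;
        i = j;
    }
    return nruns;
}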

then there are occasional other large things, like 9 console buffers
which use 2MB (currently for a 160x90 console with 4-bytes for each
character and formatting).

....


note that some data is also stored in RAM in a "compressed" format, such
as audio data for the mixer.

originally, this data was stored in RAM as raw PCM audio, but this was
kind of bulky (audio data can use a lot of RAM at 16-bit 44.1kHz), so I
developed a custom audio codec which allows random-access decompression,
and stores the audio at 176kbps.

now audio is no longer a significant memory user.


work was also going on recently to allow an alternate in-memory format
for the voxel chunks, which basically would exploit a property:
typically, each chunk only has a small number of unique voxel types;
so, in many cases, eligible chunks could be represented in a form where
they use 8-bit (byte) indices into a table of voxel-values, which would
store an eligible chunk in 6kB rather than 32kB.

but, as-is, this is a fairly involved modification.

....
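
A sketch of what such an indexed chunk could look like; the names and layout
are assumptions, but the arithmetic is the point: 4096 one-byte indices plus a
table of up to 256 eight-byte voxel values is 4096 + 2048 = 6144 bytes, versus
4096 * 8 = 32768 bytes for the raw form.

#include <stddef.h>
#include <stdint.h>

#define CHUNK_VOXELS (16 * 16 * 16)   /* 4096 voxels per chunk */

typedef uint64_t voxel;

typedef struct {
    voxel   table[256];               /* unique voxel values        */
    uint8_t index[CHUNK_VOXELS];      /* one byte per voxel         */
    int     ntable;                   /* entries used in table[]    */
} indexed_chunk;

/* Returns 0 on success, -1 if the chunk has more than 256 unique values
 * (i.e. it is not eligible and must stay in the raw 32kB form). */
int chunk_to_indexed(const voxel *raw, indexed_chunk *out)
{
    size_t i;
    int k;

    out->ntable = 0;
    for (i = 0; i < CHUNK_VOXELS; i++) {
        for (k = 0; k < out->ntable; k++)   /* linear search is fine:  */
            if (out->table[k] == raw[i])    /* the table stays tiny    */
                break;
        if (k == out->ntable) {
            if (out->ntable == 256)
                return -1;                  /* too many unique voxels  */
            out->table[out->ntable++] = raw[i];
        }
        out->index[i] = (uint8_t)k;
    }
    return 0;
}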
 
Lynn McGuire

>> [...]
>> allocator statistics have generally shown that small objects (< 4kB)
>> represent a good portion of the total memory use, *1, but currently with
>> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
>> arrays).
>>
>> *1: roughly forming a Gaussian distribution centered on 0, with millions
>> of small objects.
>
> Centered on zero? So zero-byte allocations are the commonest
> of all? And allocations of 10 to 20 bytes are about as common as
> those of -20 to -10?


Obviously I've been going about handling out-of-memory conditions the
wrong way - I should just malloc a few objects of negative size!

Isn't size_t always an unsigned int?

Lynn
 
James Kuyper

>> On 7/28/2013 4:59 PM, BGB wrote:
>>> [...]
>>> allocator statistics have generally shown that small objects (< 4kB)
>>> represent a good portion of the total memory use, *1, but currently with
>>> a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
>>> arrays).
>>>
>>> *1: roughly forming a Gaussian distribution centered on 0, with millions
>>> of small objects.
>>
>> Centered on zero? So zero-byte allocations are the commonest
>> of all? And allocations of 10 to 20 bytes are about as common as
>> those of -20 to -10?
>
>
> Obviously I've been going about handling out-of-memory conditions the
> wrong way - I should just malloc a few objects of negative size!
>
> Isn't size_t always an unsigned int?

If the distribution of allocation sizes had in fact been centered on
zero, and included any positive allocation sizes, then it would also
necessarily have had to include some negative allocation sizes. That
would require a non-conforming implementation which used a signed type.

Of course, the description of the curve as being "centered on zero" was
incorrect. It has a peak at 0, but no part of the curve covers negative
values.
 
James Kuyper

On 07/29/2013 03:42 PM, Lynn McGuire wrote:
> ....
> Isn't size_t always an unsigned int?

No. It must be an unsigned integer type, but it doesn't have to be
unsigned int. SIZE_MAX must be at least 65535, but even "unsigned short"
is big enough to meet that requirement. On a system where CHAR_BIT==16,
size_t could even be "unsigned char". The only unsigned type that can't
be size_t is _Bool.
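
A quick way to see what any given implementation actually uses; the output
differs from platform to platform, which is the point:

/* Requires C99 for <stdint.h> and %zu. */
#include <stdio.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stddef.h>   /* size_t   */

int main(void)
{
    printf("sizeof(size_t)       = %zu\n", sizeof(size_t));
    printf("sizeof(unsigned int) = %zu\n", sizeof(unsigned int));
    printf("SIZE_MAX             = %zu\n", (size_t)SIZE_MAX);
    return 0;
}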
 
Keith Thompson

James Kuyper said:
> On 07/29/2013 03:42 PM, Lynn McGuire wrote:
>> ...
>
> No. It must be an unsigned integer type, but it doesn't have to be
> unsigned int. SIZE_MAX must be at least 65535, but even "unsigned short"
> is big enough to meet that requirement. On a system where CHAR_BIT==16,
> size_t could even be "unsigned char". The only unsigned type that can't
> be size_t is _Bool.

There's a common confusion between the terms "int" and "integer".

Even though the derivation of the C keyword "int" is obviously as
an abbreviation of the English word "integer", their meanings are
quite distinct.

In C, the word "integer" refers to any of a number of distinct types,
ranging from char to long long.

"int" is a type name that refers to just one of those types.
The keyword "int" can also be used as part of the names for several
other types, such as "short int", and "unsigned long long int", and
so forth, but when used by itself it refers only to that one type.

"const" and "constant" can cause similar confusion; "constant"
means, more or less, evaluable at compile time, but "const" means
"read-only".
 
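A small example of that last distinction; the commented-out lines are the
ones a conforming compiler must diagnose (or, for the array, treat as a VLA
in C99 rather than an ordinary array):

#include <stdio.h>

#define LEN 10            /* a constant expression */
const int len = 10;       /* const-qualified, but not a constant expression in C */

int main(void)
{
    int a[LEN];           /* fine: LEN is a constant expression */
    /* int b[len]; */     /* C89: error; C99: a variable-length array instead */

    switch (LEN) {
    case LEN:             /* fine: case labels require constant expressions */
        puts("matched the #define");
        break;
    /* case len: */       /* error: 'len' is const (read-only) but not "constant" */
    }

    printf("len = %d\n", len);   /* reading 'len' as a value is fine */
    (void)a;
    return 0;
}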
glen herrmannsfeldt

Keith Thompson said:
(snip)

> There's a common confusion between the terms "int" and "integer".

The word "an" above suggests this is appropriate.

> Even though the derivation of the C keyword "int" is obviously as
> an abbreviation of the English word "integer", their meanings are
> quite distinct.

(snip)

However, it does hint that int might be used as an abbreviation
for the word "integer". Following the usual English rules, it should
be followed by a period.

-- glen
 
Keith Thompson

glen herrmannsfeldt said:
> The word "an" above suggests this is appropriate.
>
> (snip)
>
> However, it does hint that int might be used as an abbreviation
> for the word "integer". Following the usual English rules, it should
> be followed by a period.

Using "int", with or without a period, as an abbreviation for "integer"
while discussing C strikes me as a Very Bad Idea. (No offense intended
to Lynn McGuire, who probably just made a minor and unintentional error,
as we all do from time to time.)
 
Geoff

> Using "int", with or without a period, as an abbreviation for "integer"
> while discussing C strikes me as a Very Bad Idea. (No offense intended
> to Lynn McGuire, who probably just made a minor and unintentional error,
> as we all do from time to time.)

Or he peeked at one of the header files for his implementation and
found it defined as unsigned int and assumed that is what the standard
specifies.
 
Malcolm McLean

> Using "int", with or without a period, as an abbreviation for "integer"
> while discussing C strikes me as a Very Bad Idea. (No offense intended
> to Lynn McGuire, who probably just made a minor and unintentional error,
> as we all do from time to time.)

Almost every integer should be int. Since integers usually end up indexing
arrays (even a char, when you think about it, will probably eventually
end up as an index into a glyph table of some sort), that means that int
needs to be able to index an arbitrary array. Then you don't need any other
types, except to save memory, or for a few algorithms that need huge integers.

We don't need twenty plus integer types in C.
 
Phil Carmody

Eric Sosman said:
> The C99 Rationale (I haven't seen a C11 version yet) explains
> the Committee's thinking; see section 7.20.3.
>
>
> Fine -- But in your usage, the assert should precede the
> call to malloc(), and not depend on the returned value.

What's a "solid" assert? assert() is one of the most ephemeral
bits of code it's possible to write in C.

Phil
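
For what it's worth, a sketch of the pattern being discussed: the assert
checks the request before the call and is compiled out entirely under NDEBUG,
so the NULL check has to stand on its own. xmalloc is a hypothetical helper,
not something from the thread.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

void *xmalloc(size_t n)
{
    void *p;

    assert(n > 0);                   /* precedes the call; gone under NDEBUG     */
    p = malloc(n);
    if (p == NULL && n > 0) {        /* real error handling, present in all builds */
        fprintf(stderr, "out of memory (%lu bytes)\n", (unsigned long)n);
        exit(EXIT_FAILURE);
    }
    return p;
}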
 
James Kuyper

On 07/30/2013 05:34 AM, Malcolm McLean wrote:
> ....
> Almost every integer should be int. Since integers usually end up indexing
> arrays (even a char, when you think about it, will probably eventually
> end up as an index into a glyph table of some sort), that means that int
> needs to be able to index an arbitrary array. Then you don't need any other
> types, except to save memory, or for a few algorithms that need huge integers.
>
> We don't need twenty plus integer types in C.

If you dismiss all the reasons for doing so as irrelevant, it can seem
pointless to have so many different integer types. Using the same
"logic", we only need one hammer design.
<http://en.wikipedia.org/wiki/Hammer#Gallery>.
 
James Harris (es)

James Kuyper said:
> On 07/30/2013 05:34 AM, Malcolm McLean wrote:
>> ...
>
> If you dismiss all the reasons for doing so as irrelevant, it can seem
> pointless to have so many different integer types. Using the same
> "logic", we only need one hammer design.
> <http://en.wikipedia.org/wiki/Hammer#Gallery>.

Similarly, lots of different types of wheels are needed. We wouldn't want to
run our cars and bicycles on Assyrian chariot wheels. If nothing else, the
iron scythes might get in the way of other road users. ;-)

Hence, despite the oft-quoted anti-proverbial, wheels do sometimes need to
be reinvented.

James
 
Lynn McGuire

> Using "int", with or without a period, as an abbreviation for "integer"
> while discussing C strikes me as a Very Bad Idea. (No offense intended
> to Lynn McGuire, who probably just made a minor and unintentional error,
> as we all do from time to time.)

I make minor and unintentional errors all the time!

My point was that size_t is unsigned. I was not
thinking about the actual size of size_t. I would
prefer all modern day usage of this kind of data
to be 64 bit. At least.

Lynn
 
