How much memory does malloc(0) allocate?

Discussion in 'C Programming' started by Lynn McGuire, Jul 26, 2013.

  1. Lynn McGuire

    Eric Sosman Guest

    Centered on zero? So zero-byte allocations are the commonest
    of all? And allocations of 10 to 20 bytes are about as common as
    those of -20 to -10?
    Eric Sosman, Jul 28, 2013

  2. Lynn McGuire

    BGB Guest

    well, no. allocations of 0 are not common (0 is "generally invalid", and <0
    is invalid and can't be allocated).
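For the standard library's malloc, a request for 0 bytes is not an error: the C standard leaves the result implementation-defined, so the call may return either a null pointer or a unique pointer that must not be dereferenced. A minimal sketch that reports which behaviour the current implementation chose (the function name is hypothetical, and a custom allocator such as the one discussed here may behave differently):

```c
#include <stdlib.h>

/* Returns 1 if malloc(0) yields a non-null pointer on this implementation,
   0 if it yields NULL. The C standard permits either result. */
int malloc_zero_returns_pointer(void)
{
    void *p = malloc(0);
    int non_null = (p != NULL);

    free(p);   /* free(NULL) is a no-op, so this is safe in both cases */
    return non_null;
}
```

Portable code should therefore not treat a null return from malloc(0) as an out-of-memory condition, and should not read or write through a non-null one.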

    but the highest point is (was) 1-15 bytes, and it rapidly drops off from

    but, I am not sure of what name there would be for this exact
    distribution (Gaussian was the closest I could find).

    correction: re-ran the heap-statistics tool, currently the highest point
    seems to be 16-31 bytes (followed by 32-47 bytes, ...).

    the most common object types are currently "metadata leaves" and
    "metadata nodes" (basically, structures related to a hierarchical
    database), followed mostly by various other small-structure types.

    in any case though, small allocations seem to be pretty common.
    BGB, Jul 29, 2013
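The 16-byte ranges quoted above (16-31, 32-47, ...) suggest a simple bucketed histogram of request sizes. The heap-statistics tool itself is not shown in the thread, so the following is only a sketch of how such a count might be kept; every name and the bucket cap are assumptions:

```c
#include <stddef.h>

#define BUCKET_WIDTH 16   /* bytes per bucket: 0-15, 16-31, 32-47, ... */
#define NUM_BUCKETS  64   /* hypothetical cap; larger sizes share the last bucket */

static unsigned long size_histogram[NUM_BUCKETS];

/* Record one allocation request of `size` bytes. */
void record_alloc_size(size_t size)
{
    size_t bucket = size / BUCKET_WIDTH;

    if (bucket >= NUM_BUCKETS)
        bucket = NUM_BUCKETS - 1;
    size_histogram[bucket]++;
}

/* Return the index of the fullest bucket (the lowest index wins ties). */
size_t busiest_bucket(void)
{
    size_t best = 0;

    for (size_t i = 1; i < NUM_BUCKETS; i++)
        if (size_histogram[i] > size_histogram[best])
            best = i;
    return best;
}
```

A malloc wrapper would call record_alloc_size() on every request; a result of 1 from busiest_bucket() would then correspond to the 16-31 byte range.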

  3. Lynn McGuire

    Eric Sosman Guest

    ... and wouldn't be "centered on 0."
    Eric Sosman, Jul 29, 2013
  4. Lynn McGuire

    BGB Guest

    yeah, graphs look about right...

    the spike is a break from the pattern, but alas...
    BGB, Jul 29, 2013
  5. Skewed sample problem.
    One program doesn't necessarily represent the typical situation. If you have
    a tree-like structure which dominates the total number of objects in the
    system, you either have lots of allocations of sizeof(NODE), or a few
    large allocations of N * sizeof(NODE). (See my book Basic Algorithms
    about how to write a fast fixed-block allocator). It depends if allocation
    performance is a concern or not.
    Some programs have mainly dynamic strings, others mainly fixed fields.
    If you're ultimately storing data in a database like SQL, you might as well
    write char str[64], because SQL can't handle arbitrarily long strings.
    However if you're not, generally mallocing strings is neater and more robust.
    Malcolm McLean, Jul 29, 2013
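To illustrate the "one large allocation of N * sizeof(NODE)" idea, here is a minimal fixed-block allocator sketch built on a free list. It is not the allocator from the book mentioned above; the NODE layout, the pool size, and all function names are assumptions made for the example:

```c
#include <stddef.h>

#define POOL_BLOCKS 8   /* hypothetical capacity for the sketch */

typedef struct node {
    int key;
    struct node *left, *right;
} NODE;

/* Each slot is either a live NODE or a link in the free list. */
typedef union slot {
    NODE node;
    union slot *next_free;
} SLOT;

static SLOT pool[POOL_BLOCKS];   /* the single large block */
static SLOT *free_list;
static int pool_ready;

/* Thread every slot onto the free list. */
static void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_BLOCKS; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_BLOCKS - 1].next_free = NULL;
    free_list = &pool[0];
    pool_ready = 1;
}

/* Pop a slot off the free list; returns NULL when the pool is exhausted. */
NODE *node_alloc(void)
{
    if (!pool_ready)
        pool_init();
    if (free_list == NULL)
        return NULL;

    SLOT *s = free_list;
    free_list = s->next_free;
    return &s->node;
}

/* Push a slot back; a pointer to a union member converts to the union. */
void node_free(NODE *n)
{
    if (n == NULL)
        return;

    SLOT *s = (SLOT *)n;
    s->next_free = free_list;
    free_list = s;
}
```

Both operations are constant-time pointer swaps with no per-block header, which is where the speed advantage over a general-purpose malloc would come from.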
  6. Lynn McGuire

    BGB Guest

    this is not to say that they represent the bulk of the memory usage,
    only that they held top-place (for the most allocated object type).

    they represent around 0.87% of the total memory usage (5MB / 576MB),
    with an allocation count of around 1.93M.

    they are followed by heap-allocated triangles for skeletal models (~ 21k
    allocs), terrain-chunk headers (6k allocs), and around 116 other object

    don't have a percentage for object-counts, I would have to add and
    calculate this manually.

    yeah, there are heap allocated strings and symbols in the mix as well,
    but they don't hold as high of a position.

    there were previously lots of individually wrapped int/float/double
    values as well, but these have since been moved over to using slab

    to explain the 32kB spike:
    this has to do with the voxel terrain logic, which has "chunks" which
    are 16x16x16 arrays of 8 byte values (voxels, each represents the
    locally active area in terms of 1 meter cubes, and are basically a
    collection of bit-fields).

    there are only about 5826 of them, but in the dump data, these represent
    32% of the total memory usage (186MB / 576MB).

    there are also serialized voxel regions, which, while only having 8
    allocations (in the dump), represent 7% of the memory use (41MB / 576MB). regions
    store the voxels in an RLE-compressed format, for parts of the terrain
    that are not currently active.

    then there are occasional other large things, like 9 console buffers
    which use 2MB (currently for a 160x90 console with 4-bytes for each
    character and formatting).


    note that some data is also stored in RAM in a "compressed" format, such
    as audio data for the mixer.

    originally, this data was stored in RAM as raw PCM audio, but this was
    kind of bulky (audio data can use a lot of RAM at 16-bit 44.1kHz), so I
    developed a custom audio codec which allows random-access decompression,
    and stores the audio at 176kbps.

    now audio is no longer a significant memory user.

    work was also going on recently to allow an alternate in-memory format
    for the voxel chunks, which basically would exploit a property:
    typically, each chunk only has a small number of unique voxel types;
    so, in many cases, eligible chunks could be represented in a form where
    they use 8-bit (byte) indices into a table of voxel-values, which would
    store an eligible chunk in 6kB rather than 32kB.

    but, as-is, this is a fairly involved modification.

    BGB, Jul 29, 2013
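The chunk sizes quoted above can be checked with a little arithmetic: 16 x 16 x 16 = 4096 voxels at 8 bytes each is 32768 bytes (32kB), while 4096 one-byte indices plus a 256-entry table of 8-byte values is 4096 + 2048 = 6144 bytes (6kB). A sketch of the two layouts, assuming a 64-bit voxel value and an x-fastest index order (the type and function names are hypothetical, not taken from the engine being described):

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_DIM    16
#define CHUNK_VOXELS (CHUNK_DIM * CHUNK_DIM * CHUNK_DIM)   /* 4096 */
#define PALETTE_MAX  256   /* the most values an 8-bit index can select */

typedef uint64_t voxel_t;  /* assumption: one 8-byte voxel value */

/* Plain layout: every voxel stored in full. */
typedef struct {
    voxel_t voxels[CHUNK_VOXELS];
} RawChunk;

/* Indexed layout: a table of unique values plus one byte per voxel. */
typedef struct {
    voxel_t palette[PALETTE_MAX];
    uint8_t index[CHUNK_VOXELS];
} PalettedChunk;

/* Look up a voxel in the indexed layout (index order is an assumption). */
voxel_t paletted_get(const PalettedChunk *c, int x, int y, int z)
{
    return c->palette[c->index[(z * CHUNK_DIM + y) * CHUNK_DIM + x]];
}
```

A chunk is only eligible for the indexed form while it contains at most 256 distinct voxel values, which matches the "small number of unique voxel types" property described in the post.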
  7. Lynn McGuire

    Lynn McGuire Guest

    Isn't size_t always an unsigned int?

    Lynn McGuire, Jul 29, 2013
  8. Lynn McGuire

    James Kuyper Guest

    If the distribution of allocation sizes had in fact been centered on
    zero, and included any positive allocation sizes, then it would also
    necessarily have had to include some negative allocation sizes.
    Therefore, it would have had to have been a non-conforming
    implementation which used a signed type.

    Of course, the description of the curve as being "centered on zero" was
    incorrect. It has a peak at 0, but no part of the curve covers negative
    James Kuyper, Jul 29, 2013
  9. Lynn McGuire

    James Kuyper Guest

    On 07/29/2013 03:42 PM, Lynn McGuire wrote:
    No. It must be an unsigned integer type, but it doesn't have to be
    unsigned int. SIZE_MAX must be at least 65535, but even "unsigned short"
    is big enough to meet that requirement. On a system where CHAR_BIT==16,
    size_t could even be "unsigned char". The only unsigned type that can't
    be size_t is _Bool.
    James Kuyper, Jul 29, 2013
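The properties described above can be checked on any given implementation. The helper names below are made up for the illustration; SIZE_MAX and CHAR_BIT are the standard macros from <stdint.h> and <limits.h>:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* size_t is unsigned, so converting -1 wraps to its largest value. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}

/* That wrapped value is exactly SIZE_MAX. */
int size_max_matches(void)
{
    return (size_t)-1 == SIZE_MAX;
}

/* The standard requires SIZE_MAX to be at least 65535. */
int size_max_meets_minimum(void)
{
    return SIZE_MAX >= 65535u;
}

/* Storage width of size_t in bits on this implementation. */
size_t size_t_bits(void)
{
    return sizeof(size_t) * CHAR_BIT;
}
```

None of these checks says which unsigned type size_t actually is; they only confirm the guarantees that hold for all of the candidates.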
  10. Lynn McGuire

    Ike Naar Guest

    It isn't. Counterexample: linux on amd64,
    with 32-bit int and 64-bit long.
    size_t is unsigned long.
    Ike Naar, Jul 29, 2013
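With a C11 compiler, _Generic can report which unsigned type size_t is an alias for on a given platform. This is only a sketch; the macro and function names are hypothetical:

```c
#include <stddef.h>

/* Map an expression's type to a printable name (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),                \
    unsigned char:      "unsigned char",          \
    unsigned short:     "unsigned short",         \
    unsigned int:       "unsigned int",           \
    unsigned long:      "unsigned long",          \
    unsigned long long: "unsigned long long",     \
    default:            "some other type")

/* Name of the standard unsigned type that size_t aliases here. */
const char *size_t_type_name(void)
{
    return TYPE_NAME((size_t)0);
}
```

On the Linux/amd64 configuration described above this would be expected to yield "unsigned long"; other platforms can legitimately give a different answer.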
  11. There's a common confusion between the terms "int" and "integer".

    Even though the derivation of the C keyword "int" is obviously as
    an abbreviation of the English word "integer", their meanings are
    quite distinct.

    In C, the word "integer" refers to any of a number of distinct types,
    ranging from char to long long.

    "int" is a type name that refers to just one of those types.
    The keyword "int" can also be used as part of the names for several
    other types, such as "short int", and "unsigned long long int", and
    so forth, but when used by itself it refers only to that one type.

    "const" and "constant" can cause similar confusion; "constant"
    means, more or less, evaluable at compile time, but "const" means
    Keith Thompson, Jul 29, 2013
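A small sketch of the "const" versus "constant" distinction: a const-qualified object is read-only after initialization, but its value may only be known at run time, so C does not accept it where a constant expression is required. The names below are invented for the example:

```c
#include <stddef.h>

enum { TABLE_SIZE = 4 };        /* an integer constant expression */
static int table[TABLE_SIZE];   /* a file-scope array size must be constant */

/* By contrast, this pair is rejected by a C compiler at file scope,
   because a const object is not a constant expression:
       static const int n = 4;
       static int bad[n];
*/

/* `doubled` is const (read-only) yet computed at run time. */
int twice(int x)
{
    const int doubled = x * 2;
    return doubled;
}

/* Number of elements in `table`, computed at compile time. */
size_t table_len(void)
{
    return sizeof table / sizeof table[0];
}
```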
  12. The word "an" above suggests this is appropriate.

    However, it does hint that int might be used as an abbreviation
    for the word "integer". Following the usual English rules, it should
    be followed by a period.

    -- glen
    glen herrmannsfeldt, Jul 30, 2013
  13. Using "int", with or without a period, as an abbreviation for "integer"
    while discussing C strikes me as a Very Bad Idea. (No offense intended
    to Lynn McGuire, who probably just made a minor and unintentional error,
    as we all do from time to time.)
    Keith Thompson, Jul 30, 2013
  14. Lynn McGuire

    Geoff Guest

    Or he peeked at one of the header files for his implementation and
    found it defined as unsigned int and assumed that is what the standard
    Geoff, Jul 30, 2013
  15. Almost every integer should be int. Since integers usually end up indexing
    arrays (even a char, when you think about it, will probably eventually
    end up as an index into a glyph table of some sort), that means that int
    needs to be able to index an arbitrary array. Then you don't need any other
    types, except to save memory, or for a few algorithms that need huge integers.

    We don't need twenty plus integer types in C.
    Malcolm McLean, Jul 30, 2013
  16. Lynn McGuire

    Phil Carmody Guest

    What's a "solid" assert? assert() is one of the most ephemeral
    bits of code it's possible to write in C.

    Phil Carmody, Jul 30, 2013
  17. Lynn McGuire

    James Kuyper Guest

    On 07/30/2013 05:34 AM, Malcolm McLean wrote:
    If you dismiss all the reasons for doing so as irrelevant, it can seem
    pointless to have so many different integer types. Using the same
    "logic", we only need one hammer design.
    James Kuyper, Jul 30, 2013
  18. Lynn McGuire

    Kleuske Guest

    Sigh... That's the level of debate you prefer?
    Kleuske, Jul 30, 2013
  19. Similarly, lots of different types of wheels are needed. We wouldn't want to
    run our cars and bicycles on Assyrian chariot wheels. If nothing else, the
    iron scythes might get in the way of other road users. ;-)

    Hence, despite the oft-quoted anti-proverbial, wheels do sometimes need to
    be reinvented.

    James Harris \(es\), Jul 30, 2013
  20. Lynn McGuire

    Lynn McGuire Guest

    I make minor and unintentional errors all the time!

    My point was that size_t is unsigned. I was not
    thinking about the actual size of size_t. I would
    prefer all modern day usage of this kind of data
    to be 64 bit. At least.

    Lynn McGuire, Jul 30, 2013