Can array allocation cause memory fragmentation?


Andreas Schmitt

Hi,

I recently worked on an open source project and tried to make one of the
arrays they use dynamically allocated, to get rid of the maximum size.

I used realloc instead of the usual C++ new and delete because they
asked me to stick to their "coding style".

Well whatever..

here's what I wrote:

Messages = (MMessage*)realloc( Messages, sizeof(MMessage) *
(m_num_messages + Num_builtin_messages) + 1 );

And here is one comment that came as a result of this:

"If the goal is to make it properly dynamic and he is actually just
adding one element, that is wrong. You always need to allocate in
batches to prevent memory fragmentation. Always increment by 5 or 10 at
the least, some const value, and keep track of the actual size of the
array so that you only have to go up in size when you really need it."



My question now.. from my understanding, an array is always allocated as
N consecutive blocks of memory to hold its items. How can this cause
memory fragmentation, no matter how small the increase in a reallocation
is...?
 

Victor Bazarov

Andreas said:
[..]
My question now.. from my understanding, an array is always allocated
as N consecutive blocks of memory to hold its items. How can this
cause memory fragmentation, no matter how small the increase in a
reallocation is...?

Memory fragmentation is going to happen. Period. Now, how much is too
much is the 64 thousand dollar question - it may mean that if it happens
too soon, you're going to run out of memory, or the performance of the
system is going to be affected or whatever. The more often your code
allocates and deallocates memory, the sooner you're going to run into
the problem caused by the heap fragmentation. That's all. Making your
array allocate in larger chunks makes reallocations somewhat _less_
_frequent_, which in turn may mean that during the run your program is
not going to drive the heap to the threshold of being problematically
fragmented.

Now, how is this a C++ language question, I am not sure.

V
 

crisgoogle

Hi,

I recently worked on an open source project and tried to make one of the
arrays they use dynamically allocated, to get rid of the maximum size.

I used realloc instead of the usual C++ new and delete because they
asked me to stick to their "coding style".

Well whatever..

here's what I wrote:

Messages = (MMessage*)realloc( Messages, sizeof(MMessage) *
(m_num_messages + Num_builtin_messages) + 1 );

First of all, this is a dangerous way to (mis)use realloc. If realloc
_can't_ allocate the new buffer, it returns NULL, which you then copy
into your old pointer. The old value of Messages, your only reference
to the previously allocated memory, is gone.

The correct way to do this is to store the result of realloc in a
temporary, and only assign the value to Messages once you know that
the old value of Messages is no longer needed.
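
A minimal sketch of that pattern, assuming the intent of the original "+ 1" was room for one extra element; MMessage, Messages, m_num_messages and Num_builtin_messages come from the posted snippet, and the surrounding declarations are only placeholders so the sketch is self-contained:

#include <cstdlib>

struct MMessage { /* placeholder for the project's real type */ };

// Stand-ins for the project's variables, purely for illustration.
static MMessage* Messages = NULL;
static int m_num_messages = 0;
static const int Num_builtin_messages = 0;

// Grow the array by one element without losing the old buffer on failure.
bool GrowMessagesByOne()
{
    MMessage* tmp = (MMessage*)std::realloc(
        Messages, sizeof(MMessage) * (m_num_messages + Num_builtin_messages + 1));
    if (tmp == NULL)        // realloc failed: Messages still points at the old data
        return false;
    Messages = tmp;         // only now is it safe to overwrite the pointer
    ++m_num_messages;
    return true;
}
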
And here is one comment that came as a result of this:

"If the goal is to make it properly dynamic and he is actually just
adding one element, that is wrong. You always need to allocate in
batches to prevent memory fragmentation. Always increment by 5 or 10 at
the least, some const value, and keep track of the actual size of the
array so that you only have to go up in size when you really need it."

My question now.. from my understanding, an array is always allocated as
N consecutive blocks of memory to hold its items. How can this cause
memory fragmentation, no matter how small the increase in a reallocation
is...?

The issue isn't really that your realloc by itself causes
fragmentation, it's that other allocations interspersed with it will.

Let's say that your realloc call really does need to move the memory.
Now you have a hole of size 'oldsize' in the memory pool. Now some
other part of the code allocates a few bytes, which may go immediately
after your newer block. Now you reallocate again, leaving a hole of
oldsize + x, and so on, and so on. You'll wind up leaving a string of
holes, all differing in size by a single object.

At the least, you should reallocate for several extra objects, as your
commenter mentioned. This simply reduces the number of reallocations
and therefore the probability of the above scenario being a problem.
(The typical way to do this is to increase the buffer size by some
proportion, by the way, not by some absolute value.)
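
A rough sketch of that idea, keeping the capacity separate from the element count and growing by about 50% when it runs out; the names and the growth factor are illustrative assumptions, not the project's:

#include <cstddef>
#include <cstdlib>

struct MMessage { /* placeholder */ };

static MMessage* Messages = NULL;
static size_t num_messages = 0;   // elements actually in use
static size_t capacity = 0;       // elements the current buffer can hold

// Make sure there is room for one more element; reallocate only when the
// capacity is exhausted, and then grow by ~50% instead of by one element.
bool ReserveForOneMore()
{
    if (num_messages < capacity)
        return true;                                  // still room, no allocation
    size_t new_capacity = capacity ? capacity + capacity / 2 : 8;
    MMessage* tmp = (MMessage*)std::realloc(Messages, new_capacity * sizeof(MMessage));
    if (tmp == NULL)
        return false;                                 // old buffer is still intact
    Messages = tmp;
    capacity = new_capacity;
    return true;
}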

If you're reallocating frequently, you should also consider some
(probably system-specific) way of allocating this particular array
always from a pool separate from other allocations.
 

Jim Langston

Andreas Schmitt said:
Hi,

I recently worked on an open source project and tried to make one of the
arrays they use dynamically allocated, to get rid of the maximum size.

I used realloc instead of the usual C++ new and delete because they
asked me to stick to their "coding style".

Well whatever..

here's what I wrote:

Messages = (MMessage*)realloc( Messages, sizeof(MMessage) *
(m_num_messages + Num_builtin_messages) + 1 );

And here is one comment that came as a result of this:

"If the goal is to make it properly dynamic and he is actually just adding
one element, that is wrong. You always need to allocate in batches to
prevent memory fragmentation. Always increment by 5 or 10 at the least,
some const value, and keep track of the actual size of the array so that
you only have to go up in size when you really need it."

My question now.. from my understanding, an array is always allocated as N
consecutive blocks of memory to hold its items. How can this cause memory
fragmentation, no matter how small the increase in a reallocation is...?

Cris answered the question rather well. The more you allocate/reallocate
memory, the more fragmentation can happen. The way a few STL
implementations do it, as I understand, is to double the amount of memory
each time. This allocates more memory than needed, which may not be used,
but reduces the number of allocations that will be needed. But what is
this fragmentation? Consider:

For messages you first allocate 100 bytes:
100 bytes = messages
Then you reallocate to 101 bytes:
100 bytes = nothing
101 bytes = messages
Then you reallocate to 102 bytes. It won't fit in the 100-byte hole, so it
goes after that block:
100 bytes = nothing
101 bytes = nothing
102 bytes = messages

You've now used up 303 bytes but are only using 102. Next time you go to
allocate 103 bytes, it would grab it from the beginning (hopefully), but then
you probably have other allocations taking place for other variables, and you
wind up with different-sized holes of unused memory in the free store.
For example, say some other variable needs 105 bytes now; it would grab that
space, making it:

105 bytes = someothervariable
96 bytes = unused
102 bytes = messages

If enough of this goes on in the program, you start running out of memory to
grab your increasingly large blocks of memory. Even though the memory is
there unused, it's all broken up into little chunks that can't be grabbed at
once. One way to prevent this is to grab more memory than you need,
reducing the number of reallocations that take place. Doubling is one way.
Instead of grabbing 101 blocks, grab 200. If you run out of that 200, grab
400 next time, etc. This helps decrease the number of allocations that take
place. Even though fragmentation will still occur, it might help a little.
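
To put a rough number on how much that helps, here is a small standalone sketch (nothing project-specific in it) that simply counts reallocations for a million insertions under the two strategies:

#include <cstddef>
#include <cstdio>

int main()
{
    const size_t n = 1000000;

    // Growing by one element: essentially every insertion triggers a reallocation.
    size_t grow_by_one_reallocs = n;

    // Doubling: count how many times the capacity must double to reach n.
    size_t capacity = 1;
    size_t doubling_reallocs = 0;
    while (capacity < n) {
        capacity *= 2;
        ++doubling_reallocs;
    }

    std::printf("grow by one: %lu reallocations\n", (unsigned long)grow_by_one_reallocs);
    std::printf("doubling:    %lu reallocations\n", (unsigned long)doubling_reallocs);  // 20
    return 0;
}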
 

James Kanze

I recently worked on an open source project and tried to make one of the
arrays they use dynamically allocated, to get rid of the maximum size.
I used realloc instead of the usual C++ new and delete because they
asked me to stick to their "coding style".
Well whatever..
here's what I wrote:
Messages = (MMessage*)realloc( Messages, sizeof(MMessage) *
(m_num_messages + Num_builtin_messages) + 1 );

If MMessage isn't a POD, this won't work.
And here is one comment that came as a result of this:
"If the goal is to make it properly dynamic and he is actually just
adding one element, that is wrong. You always need to allocate in
batches to prevent memory fragmentation. Always increment by 5 or 10 at
the least, some const value, and keep track of the actual size of the
array so that you only have to go up in size when you really need it."

Is this from the same person who requires realloc, rather than
new and delete? He obviously doesn't understand memory
management very well.
My question now.. from my understanding, an array is always
allocated as N consecutive blocks of memory to hold its items.
How can this cause memory fragmentation, no matter how small
the increase in a reallocation is...?

Indirectly, it could allocate a block of N, then allocate a
block of N+1 and free the block of N. If there are intervening
allocations, there could be an isolated block the size of N
elements. In practice, however:

-- Historically, you'd use realloc because you expect to
increase the size of the existing buffer, with no new
allocation. If there's no space, at least with the
algorithms which were frequent 20 years ago (when I was
working in C), the growing block would fairly quickly
migrate to the end of the free space arena, where it would
almost always be possible to grow it without reallocating.
Unless you can reasonably expect something like this,
there's no point in using realloc.

-- If realloc cannot grow something in place, it must copy it.
I suspect that with many modern implementations of
malloc/free, this would almost always be the case (but I've
not actually looked to see). Copying is, or can be, an
expensive operation; the usual solution when unlimited
growth is required is to use exponential allocation, always
multiplying the size by a fixed factor. (For various
reasons, 1.5 is generally to be preferred, although 2 is
very common as well.)

std::vector uses the second strategy, and that's really what you
should be using here.
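
If the project's style allowed it, the whole exercise could collapse to something along these lines; only MMessage comes from the original, the rest is a sketch:

#include <vector>

struct MMessage { /* placeholder */ };

std::vector<MMessage> messages;

void AddMessage(const MMessage& m)
{
    // push_back grows the buffer geometrically behind the scenes, so the
    // amortized cost per insertion is constant, and there is no manual
    // capacity bookkeeping and no pointer to lose if allocation fails
    // (push_back throws instead).
    messages.push_back(m);
}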
 

James Kanze

The way a few STL implementations do it, as I understand, is
to double the amount of memory each time. This allocates more
memory than needed which may not be used, but reduces the
number of allocations that will be needed.

The language specification requires that
std::vector<>::push_back have amortized constant time. This
requires some sort of exponential growth strategy. In older
implementations, 2 was a common factor, but Andy Koenig (I
think) pointed out that this has the serious drawback that even
if previously allocated blocks are contiguous, they will never
be large enough for the next allocation; the upper limit if you
want some memory reuse is, IIRC, (1 + sqrt(5))/2, roughly
1.6. 1.5 is pretty close, and easy to calculate without
floating point arithmetic, and I suspect that this is the factor
used in many more recent implementations.
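
A small standalone sketch of that argument follows. It is idealized: it assumes every freed block is adjacent and can be coalesced, and it uses the fact that the current block must stay alive (as the copy source) while the next one is allocated; the 60-step cutoff is arbitrary.

#include <cstdio>

// For growth factor f, find the first step at which the blocks freed so far
// could, taken together, hold the next request. The live block cannot be
// counted, since it must still exist while the new block is allocated.
static void first_reusable_step(double factor)
{
    double freed = 0.0;   // combined size of all blocks freed so far
    double live = 1.0;    // size of the block currently holding the data
    for (int step = 1; step <= 60; ++step) {
        double request = live * factor;   // size of the next, bigger block
        if (freed >= request) {
            std::printf("factor %.2f: the freed blocks could hold request %d\n",
                        factor, step);
            return;
        }
        freed += live;    // after the copy, the old live block joins the free pool
        live = request;
    }
    std::printf("factor %.2f: the freed blocks never catch up (in 60 steps)\n", factor);
}

int main()
{
    first_reusable_step(2.0);   // never catches up
    first_reusable_step(1.5);   // catches up after only a few steps
    return 0;
}
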
But, what is
this fragmentation? Consider

messages you first allocate 100 bytes.
100 bytes = messages
then you allocate 101
100 bytes = nothing
101 bytes = messages
then you allocate 102 bytes. It won't fit in the 100 bytes so it'll go
after that block
100 bytes = nothing
101 bytes = nothing
102 bytes = messages
You've now used up 303 bytes but are only using 102. Next
time you go to allocate 103 bytes, it would grab it from the
beginning (hopefully) but then you probably have other
allocations taking place for other variables, and you wind up
with different-sized holes of unused memory in the free store.

You can't make that sort of assumption. It all depends on
realloc, and how it works. And realloc doesn't work well if
there are intervening allocations.
(For example, say some other variable needed 105 bytes now, it
would grab it making it:
105 bytes = someothervariable
96 bytes = unused
102 bytes = messages
enough of this goes on in the program and you start running
out of memory to grab your increasingly large blocks of
memory. Even though the memory is there unused, it's all
broken up in little chunks that can't be grabbed at once.

What typically happens is that most of the allocated memory is
at the bottom of the free space arena. Fairly rapidly, realloc
can't find any room in the fragments, and migrates to the top,
where everything that follows is free, so the next reallocs
just change the size of the buffer.

At least, that was the way it worked 20 years ago, when realloc
was relevant. Today, I suspect that you're right, and that
there will often be enough intervening allocations to require
realloc to actually reallocate and copy. At which point, it
ceases to be relevant. (Of course, an implementation could also
note when a block was being realloc'ed, and when it did actually
allocate something new, place the new memory somewhere where it
would have a maximum of space behind it, and avoid allocating
other memory behind it, so as to increase the chances of being
able to grow the allocation in place.)
One way to prevent this is to grab more memory than you need,
reducing the amount of reallocations that take place.
Doubling is one way. Instead of grabbing 101 blocks, grab
200. If you run out of that 200, grab 400 next time, etc...
This helps a bit decrease the amount of allocations that take
place. Even though fragmentation will still occur, it might
help a little.

Doubling is a very poor solution for this, as it means that you
can never reuse the freed memory for a new block; see above.
 
