sizeof([ALLOCATED MEMORY])


ballpointpenthief

If I have malloc()'ed a pointer and want to read from it as if it were
an array, I need to know that I won't be reading past the last index.

If this is a pointer to a pointer, a common technique seems to be
setting a NULL pointer to the end of the list, and here we know that
the allocated memory has been exhausted. All good.
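
A minimal sketch of that sentinel technique, for illustration only (the
strings and names here are made up for the example):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Room for three string pointers plus a terminating NULL sentinel. */
    char **words = malloc(4 * sizeof *words);
    char **p;

    if (words == NULL)
        return EXIT_FAILURE;

    words[0] = "one";
    words[1] = "two";
    words[2] = "three";
    words[3] = NULL;                   /* marks the end of the allocation */

    for (p = words; *p != NULL; p++)   /* stops at the sentinel */
        puts(*p);

    free(words);
    return 0;
}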

When this is a pointer to another type, say int, I could have a
variable that records how much memory is being allocated and use that
to track the size of the 'array'.
Alternatively, we could set the end of the 'array' to some kind of
error-code, such as 99 or MAX_INT.
I don't like either of these techniques.

So, what is a good way to stop a loop reading or writing past the
memory allocated to a pointer?
Or if possible, what is a good way of determining the size of memory
allocated to a pointer?

Cheers,
Matt
 

Richard Heathfield

ballpointpenthief said:
If I have malloc()'ed a pointer and want to read from it as if it were
an array, I need to know that I won't be reading past the last index.

When you allocate the memory, you know precisely how much memory you are
requesting.

Don't Forget (tm).
 

Chris McDonald

ballpointpenthief said:
So, what is a good way to stop a loop reading or writing past the
memory allocated to a pointer?
Or if possible, what is a good way of determining the size of memory
allocated to a pointer?

Your code allocated the memory.
Your code should remember how much it allocated.
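
A minimal sketch of that advice; the struct and function names here are
invented for the example:

#include <stdlib.h>

struct int_buffer {
    int    *data;
    size_t  count;              /* how many elements were allocated */
};

struct int_buffer make_int_buffer(size_t n)
{
    struct int_buffer b;
    b.data  = malloc(n * sizeof *b.data);
    b.count = (b.data != NULL) ? n : 0;
    return b;
}

int main(void)
{
    struct int_buffer b = make_int_buffer(100);
    size_t i;

    for (i = 0; i < b.count; i++)   /* the stored count bounds the loop */
        b.data[i] = 0;

    free(b.data);
    return 0;
}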
 

Joe Smith

Chris McDonald said:
Your code allocated the memory.
Your code should remember how much it allocated.

If I understand what I've learned today: if the module where memory is a
burning question is different from the one in which
p = malloc(some_number);
appears, then one's source needs to declare the size and pass it along.
Joe
 

Chris McDonald

Joe Smith said:
If I understand what I've learned today: if the module where memory is a
burning question is different from the one in which
p = malloc(some_number);
appears, then one's source needs to declare the size and pass it along.
Joe

(Now, if I'm understanding you correctly:) Yes, if some other code (for
which you perhaps don't have the source) performs the allocation, and
your code needs to iterate over that memory, then your code needs to
know the allocated size.

You will typically need to indicate how much memory the other routine
should allocate, or it should inform your code how much it did allocate.

In short, you cannot determine the allocated size, from the allocated
memory itself.
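
In code, that usually just means the count travels with the pointer
across the interface; the names below are hypothetical:

#include <stddef.h>

/* other_module.c: this code did not perform the allocation, so the
   caller must say how many elements it is allowed to touch. */
long sum_shorts(const short *data, size_t count)
{
    long   total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += data[i];
    return total;
}

/* caller:  total = sum_shorts(p, n);  where n is the same count that
   was passed to malloc(n * sizeof *p) in the allocating module. */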
 

CBFalconer

Chris said:
.... snip ...

In short, you cannot determine the allocated size, from the
allocated memory itself.

However, you can determine how much memory is needed by some means
or other, and perform a realloc to that size. This can lead to
snarling dogs playing tug-of-war.

 

Howard Hinnant

Richard Heathfield said:
When you allocate the memory, you know precisely how much memory you are
requesting.

Don't Forget (tm).

However when you allocate the memory, you don't know precisely how much
you received. It could be more than you requested. And common
applications exist which could make use of that extra memory if only
they knew it existed (e.g. a dynamic array).

Major OS's provide this functionality in a non-portable fashion. Imho C
should provide a portable layer over this functionality.

-Howard
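
A non-portable sketch of what is being described, assuming a glibc system
where malloc_usable_size() is available (macOS has malloc_size(), MSVC
has _msize(); none of these is standard C):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>             /* glibc-specific, for malloc_usable_size() */

int main(void)
{
    short *p = malloc(100 * sizeof *p);     /* request 200 bytes */

    if (p == NULL)
        return EXIT_FAILURE;

    /* The allocator may have handed back more than was requested. */
    printf("requested %lu bytes, usable %lu bytes\n",
           (unsigned long)(100 * sizeof *p),
           (unsigned long)malloc_usable_size(p));

    free(p);
    return 0;
}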
 

Nils O. Selåsdal

Howard said:
However when you allocate the memory, you don't know precisely how much
you received. It could be more than you requested. And common
applications exist which could make use of that extra memory if only
they knew it existed (e.g. a dynamic array).

Why would they make use of more memory than they requested instead of
just requesting that much memory in the first place?

Howard said:
Major OS's provide this functionality in a non-portable fashion. Imho C
should provide a portable layer over this functionality.

No.
How horrible.
 

Howard Hinnant

Nils O. Selåsdal said:
Howard Hinnant said:
However when you allocate the memory, you don't know precisely how much
you received. It could be more than you requested. And common
applications exist which could make use of that extra memory if only
they knew it existed (e.g. a dynamic array).

Why would they make use of more memory than they requested instead of
just requesting that much memory in the first place?

Perhaps I should elucidate "dynamic array".

Consider a general purpose library which manages an array of short.
This array has a length to be determined at run time, and that length
can grow or shrink over the lifetime of the array via functions such as
"insert", "resize" or "append".

A typical use case might be:

struct array_of_short
{
    short* data;
    size_t size;
};

struct array_of_short my_array = create_array_short(N);
// ...
// Fill my_array with values and do some computation with it
// ...
if (some_condition)
{
    append_array_short(&my_array, M, value);
    // ...
    // continue computation with modified array
    // ...
}

The data structure as described above has an efficiency concern:
Repeated growth in size can lead to quadratic expense due to continually
reallocating for a slightly larger size. A well known fix is to have
the array keep track of both a logical size and a physical capacity
which it can cheaply grow into:

struct array_of_short
{
    short* data;
    size_t size;
    size_t capacity;
};

Use as before, but this new struct has an invariant: size <= capacity.

"append_array_short" and its brethren can now increase the size very
rapidly if the new size is still less than the capacity. When the new
size is greater than the capacity, these functions can reallocate the
capacity at a geometric rate (say by some growth factor between 1.5 and
2, (1+sqrt(5))/2 has been theorized as an optimum growth factor). Using
a geometric growth factor for the capacity bounds the number of
reallocations that a given array will see over its lifetime to a very
small number (compared to an incremental growth strategy).
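
A sketch of such an append under the size/capacity scheme just described
(illustrative only, not the library being described; it grows the
capacity by roughly 50% per reallocation):

#include <stdlib.h>

struct array_of_short
{
    short *data;
    size_t size;
    size_t capacity;
};

/* Append 'count' copies of 'value'.  Returns 0 on success, -1 on
   allocation failure (in which case the old array is left intact). */
int append_array_short(struct array_of_short *a, size_t count, short value)
{
    size_t i;

    if (a->size + count > a->capacity) {
        size_t newcap = a->capacity + a->capacity / 2;  /* grow by ~50% */
        short *tmp;

        if (newcap < a->size + count)
            newcap = a->size + count;
        tmp = realloc(a->data, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;                  /* old block is still valid */
        a->data = tmp;
        a->capacity = newcap;
    }
    for (i = 0; i < count; i++)
        a->data[a->size++] = value;
    return 0;
}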

No matter what the growth strategy for capacity, its existence is a
major motivation for finding out the amount of memory actually
allocated. create_array_short(N) may request only N*sizeof(short)
bytes, but may receive slightly more than that (for alignment purposes
or whatever). If create_array_short(N) had some way to find out how
much memory it actually received, then it makes perfect sense to set
capacity to size_received/sizeof(short) instead of to N. It makes for
higher performing code in the average case.

-Howard
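
And a companion sketch of a create function that sets capacity from the
memory actually received, assuming a non-standard query such as glibc's
malloc_usable_size() (again, not standard C):

#include <stdlib.h>
#include <malloc.h>             /* glibc-specific, for malloc_usable_size() */

struct array_of_short
{
    short *data;
    size_t size;
    size_t capacity;
};

struct array_of_short create_array_short(size_t n)
{
    struct array_of_short a = { NULL, 0, 0 };

    a.data = malloc(n * sizeof *a.data);
    if (a.data != NULL) {
        a.size = n;
        /* Capacity reflects what the allocator really handed back,
           which may be a little more than the n elements requested. */
        a.capacity = malloc_usable_size(a.data) / sizeof *a.data;
    }
    return a;
}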
 

Kelsey Bjarnason

[snips]

The data structure as described above has an efficiency concern:
Repeated growth in size can lead to quadratic expense due to continually
reallocating for a slightly larger size.

So don't do that. In simplest terms, don't add, multiply.

Assume you're adding, oh, 100 elements at each step, and starting with
1000. I'm not going to add, I'm going to scale, by 25%. By the time
you've _doubled_ your storage, mine's gone up by a factor of almost ten,
yet the _worst_ waste I will ever have is 25% of the total elements, minus
one element.

Other scaling factors work, too. Need to be more conservative? Scale by
5 or 10 per cent. Need memory to grow faster? Try 50%. Or even 100%.

Increasing by a static amount - adding 100 units at each step, etc -
strikes me as a tad silly.
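
Rough numbers behind that claim, starting from 1000 elements (a quick
check using integer arithmetic):

#include <stdio.h>

int main(void)
{
    unsigned long additive = 1000, geometric = 1000;
    int step;

    for (step = 0; step < 10; step++) {
        additive  += 100;               /* fixed increment */
        geometric += geometric / 4;     /* grow by 25% */
    }
    printf("after 10 steps: %lu vs %lu\n", additive, geometric);
    /* prints: after 10 steps: 2000 vs 9306 */
    return 0;
}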
 

Howard Hinnant

Kelsey Bjarnason said:
[snips]

The data structure as described above has an efficiency concern:
Repeated growth in size can lead to quadratic expense due to continually
reallocating for a slightly larger size.

So don't do that. In simplest terms, don't add, multiply.

Did you read my entire post or just stop here?

-Howard
 

Stephen Sprunk

Howard Hinnant said:
No matter what the growth strategy for capacity, its existence is a
major motivation for finding out the amount of memory actually
allocated. create_array_short(N) may request only N*sizeof(short)
bytes, but may receive slightly more than that (for alignment purposes
or whatever). If create_array_short(N) had some way to find out how
much memory it actually received, then it makes perfect sense to set
capacity to size_received/sizeof(short) instead of to N. It makes for
higher performing code in the average case.

You're over-optimizing here. If malloc() returns more memory than you asked
for, when you expand the array, you'll get access to it. On average, you'll
perform the same number of realloc()s with or without this optimization
unless you're expanding by a very small amount each time, in which case a
large fraction of your realloc()s will be no-ops for the implementation.

S

 

Stephen Sprunk

CBFalconer said:
However, you can determine how much memory is needed by some means
or other, and perform a realloc to that size. This can lead to
snarling dogs playing tug-of-war.

My first thought was "how clever", but there are serious problems with this:

1. it assumes the pointer the callee received is to the start of the object
2. realloc() may relocate the object

The latter is a serious problem if you can't communicate that fact back to
the caller; it is cleaner for the interface to provide a way to indicate the
object's size to the callee in the first place so that this hack isn't
needed.

S

 

Howard Hinnant

"Stephen Sprunk said:
You're over-optimizing here. If malloc() returns more memory than you asked
for, when you expand the array, you'll get access to it. On average, you'll
perform the same number of realloc()s with or without this optimization
unless you're expanding by a very small amount each time, in which case a
large fraction of your realloc()s will be no-ops for the implementation.

Let's go through a specific example. I'm going to assume the struct I
showed before:

struct array_of_short
{
    short* data;
    size_t size;
    size_t capacity;
};

Let's say that size == capacity == 100. Now for some reason my
algorithm decides it needs 2 more shorts. What do I do?

Today I call realloc similar to:

data = realloc(data, 3*capacity/2 * sizeof(short));

I.e. I up the capacity by 50% (or whatever growth factor you want to
use). Now under the hood let's say (and this is quite reasonable) that
the heap had already allocated 4 more bytes to data, and had another 64
bytes beyond that into which it could have expanded that chunk in place.
But because my code is ignorant of the current heap structure in the
neighborhood of my pointer, the best thing I know to do is blindly ask
for a realloc expanding by 50 more shorts (assuming a 2-byte short for
this example). That in turn will trigger the heap to relocate my 200-byte
array to another location.

If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

Continuing along with this example: let's say instead of 2 more shorts,
my algorithm actually needed 10 more shorts. Here are my choices:

1. realloc for 10 more shorts.
2. realloc for a 50% bigger capacity.

Choice 1 risks the poor efficiency of an incremental growth strategy
(a known performance problem).

Choice 2 completely wastes the 68 bytes that were sitting beyond my
200-byte buffer but that I had no way to find out about.

Wouldn't it be nice if I could:

A: Discover those extra 2 shorts that were allocated to me anyway.
B: Discover that I could expand in place by another 32 shorts without
the expense of relocating my array.

?

These seem like significant and obvious optimizations to me.

Indeed, I've coded them up and tested them. Given the right underlying
allocation interface one can achieve very decent performance increases
with very little effort. It won't make your code 10 times faster. But
it sure is an easy way to gain say 15% while minimizing heap
fragmentation and thrashing.

But it is only easy if you have the right allocation interface. Indeed
today the only way to achieve this type of optimization in C is to write
your own malloc/realloc/free functions (adding to the interface as
appropriate). Anyone who's written these functions (I have) knows that
this exercise can't be labeled easy.

-Howard
 

Eric Sosman

Howard Hinnant wrote on 05/04/06 17:26:
[...]

If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

IMHO, this isn't such a big deal. Or to put it another
way: if you've got an array that's (1) big enough to be a
problem and (2) expands and expands and expands and expands,
it's time to think about other data structures.

Are you familiar with any present-day file systems
that require each file to occupy contiguous disk sectors?
 

Keith Thompson

Howard Hinnant said:
Let's go through a specific example. I'm going to assume the struct I
showed before:

struct array_of_short
{
    short* data;
    size_t size;
    size_t capacity;
};

Let's say that size == capacity == 100. Now for some reason my
algorithm decides it needs 2 more shorts. What do I do?

Today I call realloc similar to:

data = realloc(data, 3*capacity/2 * sizeof(short));

I.e. I up the capacity by 50% (or whatever growth factor you want to
use). Now under the hood let's say (and this is quite reasonable) that
the heap had already allocated 4 more bytes to data, and had another 64
bytes beyond that into which it could have expanded that chunk in place.
But because my code is ignorant of the current heap structure in the
neighborhood of my pointer, the best thing I know to do is blindly ask
for a realloc expanding by 50 more shorts (assuming a 2-byte short for
this example). That in turn will trigger the heap to relocate my 200-byte
array to another location.

If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

No, you couldn't necessarily avoid the realloc altogether, unless the
extra memory is actually reserved. There might happen to be another
two shorts worth of storage available immediately after your allocated
space, but the system might later choose to use it for another
allocation.

More generally, if I call, say, malloc(300), the system might reserve
512 bytes for me, and might (internally) guarantee that it won't use
any of it for anything else until I call free() -- or it might
allocate 512 bytes, but remember that I only asked for 300, and later
carve out the last 128 bytes for another allocation. Or another call
to free() might result in my 300-byte allocation being at the
beginning of a contiguous chunk of 1024 bytes.

In other words, the number of bytes that are *really* reserved for a
given malloc() call isn't necessarily a meaningful question, and may
vary over time depending on things outside the program's control.

If I were going to suggest a new feature to address this (perhaps
straying off-topic a bit), I'd probably propose a new function similar
to realloc(), except that it's guaranteed not to relocate the object.
Let's call it xrealloc() (not a great name, but it will do for now).
If realloc() would have expanded or contracted the object in place,
xrealloc() is equivalent to realloc(). If realloc() would have
relocated the object, xrealloc() fails. (We wouldn't require
xrealloc() to succeed in *exactly* the same circumstances where
realloc() would expand or shrink the object in place, but it would
usually work out that way.)

If I've allocated space for 300 bytes, and I find that I need 302
bytes, I can call xrealloc(), asking for just 302 bytes. If this
succeeds, I'm done, and the call was probably very cheap. If it
fails, I can then do a more expensive realloc() call, requesting 400
or 500 bytes to leave room for future growth. And of course existing
code can ignore xrealloc() and continue to work as it always has.

This would allow an implementation to allocate more memory than you
ask for, while still permitting it to take back some of the extra
space if it needs it.
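
A sketch of how calling code might use such a function. xrealloc() is
purely hypothetical (the proposal above), so the stand-in below simply
always reports failure, which lets the example compile while preserving
the proposed semantics:

#include <stdlib.h>

/* Stand-in for the proposed non-relocating realloc: always "fails",
   i.e. reports that it could not resize in place.  A real version
   would need support from the allocator. */
static void *xrealloc(void *ptr, size_t size)
{
    (void)ptr;
    (void)size;
    return NULL;
}

/* Grow 'data' to hold at least 'needed' shorts, preferring a cheap
   in-place expansion and falling back to an ordinary realloc with
   extra headroom.  Returns NULL on failure; the old block survives. */
static short *grow(short *data, size_t needed, size_t *capacity)
{
    short *tmp = xrealloc(data, needed * sizeof *tmp);

    if (tmp != NULL) {                  /* expanded in place: done */
        *capacity = needed;
        return tmp;
    }
    tmp = realloc(data, 2 * needed * sizeof *tmp);  /* room to grow */
    if (tmp != NULL)
        *capacity = 2 * needed;
    return tmp;
}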
 

Howard Hinnant

Eric Sosman said:
Howard Hinnant wrote on 05/04/06 17:26:
[...]

If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

IMHO, this isn't such a big deal. Or to put it another
way: if you've got an array that's (1) big enough to be a
problem and (2) expands and expands and expands and expands,
it's time to think about other data structures.

Having multiple data structures in your tool box is always a good thing.
Sometimes you need a contiguous array, sometimes you need a linked list,
sometimes a hash table or tree, etc. That being said, the contiguous
array is the workhorse of data structures. It's the one I find myself
reaching for most often, even when I have no idea how much it will grow.

Eric Sosman said:
Are you familiar with any present-day file systems
that require each file to occupy contiguous disk sectors?

Nope. But I am intimately familiar with an in-RAM data structure which
stores a set of contiguous arrays (and makes it look contiguous). Every
once in a while it comes in very handy.

-Howard
 

Howard Hinnant

Keith Thompson said:
Howard Hinnant said:
If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

No, you couldn't necessarily avoid the realloc altogether, unless the
extra memory is actually reserved. There might happen to be another
two shorts worth of storage available immediately after your allocated
space, but the system might later choose to use it for another
allocation.

There are two issues here (I'm guilty of introducing the second one in
my last post).

1. The allocator may give you extra memory for free but currently has
no way to tell you that.

2. The allocator may have extra memory located directly after what it
has allocated to you and that amount may vary with time.

In my previous example I attempted to address both of these.

I heartily agree that insight into #2 is the bigger performance win.
However, insight into #1 is practically cost free (there's a dollar on
the sidewalk, do you pick it up, or is it too much trouble to bend
over?).
Keith Thompson said:
If I were going to suggest a new feature to address this (perhaps
straying off-topic a bit), I'd probably propose a new function similar
to realloc(), except that it's guaranteed not to relocate the object.
Let's call it xrealloc() (not a great name, but it will do for now).
If realloc() would have expanded or contracted the object in place,
xrealloc() is equivalent to realloc(). If realloc() would have
relocated the object, xrealloc() fails. (We wouldn't require
xrealloc() to succeed in *exactly* the same circumstances where
realloc() would expand or shrink the object in place, but it would
usually work out that way.)

If I've allocated space for 300 bytes, and I find that I need 302
bytes, I can call xrealloc(), asking for just 302 bytes. If this
succeeds, I'm done, and the call was probably very cheap. If it
fails, I can then do a more expensive realloc() call, requesting 400
or 500 bytes to leave room for future growth. And of course existing
code can ignore xrealloc() and continue to work as it always has.

This would allow an implementation to allocate more memory than you
ask for, while still permitting it to take back some of the extra
space if it needs it.

I think that's an excellent idea. Perhaps there is some refinement
still to be done here. But the basic idea is fundamentally a win-win.

-Howard
 

CBFalconer

Howard said:
Let's go through a specific example. I'm going to assume the
struct I showed before:

Why bother with all these gyrations? The malloc package for your
system probably understands your system better than you do, and can
take advantage of its peculiarities. As an example, here is an
excerpt from the realloc code in my nmalloc package for DJGPP,
which attempts to avoid moving data (among other things):

/* if decreasing simply reduce size and move excess to free */
if (szneed <= m->sz) {
   DBGPRTR(EOL " Realloc is reducing");
   if ((m->sz - szneed) >= MINSAVE) {
      m = split(&m1, szneed);
      mv2freelist(m1);
   }
   /* else just return old pointer, i.e. NOP */
}
else if (szneed > ((ulong)(INT_MAX - 65536))) {
   /* reject excessive size request */
   p = NULL; goto exeunt;
}
else if (ISFREE(m->next) &&
         (szneed <= (m->sz + m->next->sz)) ) {
   /* the 'next' block is free and adequate so use it */
   DBGPRTR(EOL " Realloc is combining, next is free");
   m = m1 = combinehi(m);
   /* now split off the excess, if any */
   if ((m->sz - szneed) >= MINSAVE) {
      m = split(&m1, szneed);
      mv2freelist(m1);
   }
   /* else m is the oversized return block */
}
else if ((lastsbrk == m->next) &&

 

CBFalconer

Stephen said:
My first thought was "how clever", but there are serious problems
with this:

1. it assumes the pointer the callee received is to the start of
the object
2. realloc() may relocate the object

No, it doesn't; and besides, I do not recommend the strategy. There
is never any guarantee that realloc will return the same pointer.
Some systems try to.

Stephen said:
The latter is a serious problem if you can't communicate that fact
back to the caller; it is cleaner for the interface to provide a
way to indicate the object's size to the callee in the first place
so that this hack isn't needed.

It was intended to point out the absurdity of such strategies.
Unless you are excessively amused by snarling dogs playing
tug-of-war.

 
