Why does ANSI not define a function to determine the size of (m)allocated memory? (like _msize)

  • Thread starter bilbothebagginsbab5 AT freenet DOT de

bilbothebagginsbab5 AT freenet DOT de

Hello, hello.

So. I've read what I could find on google(groups) for this, also the faq of
comp.lang.c.

But still I do not understand why there is no standard method to "(...)
query the malloc package to find out how big an allocated block is"
(Question 7.27).

Is it explained somewhere why? It would seem to me that free() and
realloc() would have to know the size of the allocated space, and I would
like to know the reason this information is not "disclosed" by a std-library.

Best regards,
Martin
 
Mike Wahler

"bilbothebagginsbab5 AT freenet DOT de" <"bilbothebagginsbab5 AT freenet DOT
de"> wrote in message news:[email protected]...
> Hello, hello.
>
> So. I've read what I could find on google(groups) for this, also the faq of
> comp.lang.c.
>
> But still I do not understand why there is not standard method to "(...)
> query the malloc package to find out how big an allocated block is". (
> Question 7.27)

First, most obvious question is: Why do you need to know?

You'll have to go visit the folks at comp.std.c to discuss
how and why the language is as it is. However, my stance is
that it's not necessary. When you allocate, you know how
much you're allocating. Simply 'remember' this value (store
it in a variable), and refer to it when needed.

> Is there somewhere explained why - because it would seem to me, that free()
> and realloc(..) would have to know the size of allocated space,

They do need to 'know' (but this 'knowledge' might be
implemented at a lower-level, i.e. in the OS itself; iow
'free()' might simply query the OS for this info[1]). The
mechanical details of allocation/deallocation are left up
to the implementation, and are not specified by the language.
You as the programmer don't need to know. 'free()' is required
to Do The Right Thing(tm).

> and I would
> like to know the reason this information is not "disclosed" by a
> std-library.

Again, you'll need to ask in comp.std.c

To me, it's simple. Not needed. The smaller the library, the
less stuff imposed on folks that don't want or need it. IMO
a Good Thing(tm).

[1] If your implementation indeed does work this way and documents
exactly what it does and how it works, you might want to check
your OS API to see if the same info can be obtained via an API
call, or perhaps a (nonstandard) library extension. But personally
I would not go to such lengths without a very compelling reason
to do so.

-Mike
 
mfhaigh

bilbothebagginsbab5 said:
> Hello, hello.
>
> So. I've read what I could find on google(groups) for this, also the faq of
> comp.lang.c.
>
> But still I do not understand why there is not standard method to "(...)
> query the malloc package to find out how big an allocated block is". (
> Question 7.27)
>
> Is there somewhere explained why - because it would seem to me, that free()
> and realloc(..) would have to know the size of allocated space, and I would
> like to know the reason this information is not "disclosed" by a
> std-library.

Why should it be disclosed? I see no compelling reason. If you need
the size of a block later, then keep track of it when you allocate it.
You could also easily wrap malloc and free to provide this
functionality.
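
A minimal sketch of the "keep track of it yourself" idea, assuming nothing beyond standard C. The names sized_buf, sized_alloc and sized_free are made up for illustration; they are not part of any library.

```c
#include <stdlib.h>

/* Hypothetical helper type: a pointer bundled with the size that was requested. */
struct sized_buf {
    void  *data;   /* block returned by malloc(), or NULL */
    size_t size;   /* number of bytes requested, 0 if the allocation failed */
};

/* Allocate n bytes and remember n alongside the pointer. */
static struct sized_buf sized_alloc(size_t n)
{
    struct sized_buf b;

    b.data = malloc(n);
    b.size = (b.data != NULL) ? n : 0;
    return b;
}

/* Release the block and reset the record so a stale size is not reused. */
static void sized_free(struct sized_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

The size recorded here is the size requested, which is the only amount the program may portably use, whatever the allocator reserved internally.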

Different implementations have different allocation strategies to fit
their needs. Adding an additional, rarely used requirement will
negatively impact some implementations in terms of memory usage and
execution time.


Mark F. Haigh
(e-mail address removed)
 
Malcolm

"bilbothebagginsbab5 AT freenet DOT de" <"bilbothebagginsbab5 AT freenet DOT
de"> wrote
> But still I do not understand why there is not standard method to "(...)
> query the malloc package to find out how big an allocated block is". (
> Question 7.27)

It's a decision by the original designers of the C language, and later by
ANSI.

The reason was probably that malloc() rounded the size of the allocated
block up to the nearest multiple of eight, and so an msize() function
couldn't return the exact requested size without rewriting the library,
which was more trouble than it was worth.

An msize() function for C is not necessarily a bad idea, though it
encourages functions of the form

/*
calculates the mean of an array of doubles
Notes: array must be allocated by malloc()
*/
double mean(double *x)

rather than

double mean(double *x, size_t N)

which is handier if the array happens to be on the stack.
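
A short sketch of the second form, assuming the caller always passes the element count. It works the same for an automatic array and for a malloc()ed one, since nothing inside the function depends on where the memory came from.

```c
#include <stddef.h>

/* Mean of the first n elements of x; returns 0.0 for an empty array. */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}
```

For an array declared as double a[4], the call would be mean(a, sizeof a / sizeof a[0]); for a malloc()ed block the caller passes whatever count it used when allocating.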

If you think that msize() should be part of C2005 then no one will object to
you arguing this, and maybe your ideas will be incorporated into the
language. However the present situation is that it is not part of the
standard library.
 
Peter Nilsson

> ... If you need
> the size of a block later, then keep track of it when you allocate it.
> You could also easily wrap malloc and free to provide this
> functionality.

You can do it if you know _all_ the types being allocated with malloc
by a given program, but there's no bullet proof general purpose way in
standard C alone.
 
Michael Mair

Peter said:
> You can do it if you know _all_ the types being allocated with malloc
> by a given program, but there's no bullet proof general purpose way in
> standard C alone.

Have I missed something? What speaks against
struct my_malloc_entry {
    void *data;
    size_t size;
};
or the linked list equivalent and keeping track of address
and size? After handling size==0 and checking whether malloc()
was successful, you store the data you need. If you are asked
for the size, you go through the array/list/whatever. At freeing,
you either "invalidate" the memory or actually free it.
If you want, you can implement a test mode which does not free
the memory and enables you to ask for potentially invalid
pointers or whatever.
All in standard C.
Or you allocate a large chunk of memory and manage "dealing out"
parts of it by yourself. In standard C.
Either way, the requested functionality _can_ be provided.

I do not claim that these are efficient ways of doing it, hence
probably not "general purpose" enough for some people.
Maybe you thought only of storing the struct immediately before
the malloc()ed memory; it would at least explain your remark
about the types.
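
A rough sketch of the linked-list variant described above, in standard C. The next member and the function names my_malloc, my_msize and my_free are assumptions added for illustration. Lookup is linear, so this is in no way efficient.

```c
#include <stdlib.h>

/* One record per live allocation, kept in a singly linked list.
   The 'next' member is an addition for the list variant. */
struct my_malloc_entry {
    void *data;
    size_t size;
    struct my_malloc_entry *next;
};

static struct my_malloc_entry *entries = NULL;

/* Allocate size bytes and record the request; returns NULL on failure. */
void *my_malloc(size_t size)
{
    struct my_malloc_entry *e;
    void *p;

    if (size == 0)
        return NULL;              /* policy choice: refuse zero-size requests */
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    p = malloc(size);
    if (p == NULL) {
        free(e);
        return NULL;
    }
    e->data = p;
    e->size = size;
    e->next = entries;
    entries = e;
    return p;
}

/* Return the size requested for p, or 0 if p is not a tracked block. */
size_t my_msize(const void *p)
{
    const struct my_malloc_entry *e;

    for (e = entries; e != NULL; e = e->next)
        if (e->data == p)
            return e->size;
    return 0;
}

/* Free p and drop its record; untracked pointers (including NULL) are ignored. */
void my_free(void *p)
{
    struct my_malloc_entry **link = &entries;

    while (*link != NULL) {
        if ((*link)->data == p) {
            struct my_malloc_entry *dead = *link;
            *link = dead->next;
            free(dead->data);
            free(dead);
            return;
        }
        link = &(*link)->next;
    }
}
```

Because every block comes straight from malloc(), alignment is malloc()'s problem and no knowledge of the stored types is needed.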

Cheers
Michael
 
Dik T. Winter

> bilbothebagginsbab5 AT freenet DOT de wrote: .... ....
> Why should it be disclosed? I see no compelling reason. If you need
> the size of a block later, then keep track of it when you allocate it.
> You could also easily wrap malloc and free to provide this
> functionality.

But there are allocators that will allocate more than requested in many
cases. If you are tight on memory and want to keep track of what you
are actually using, you might wish for such a capability.
 
E. Robert Tisdale

bilbothebagginsbab5 said:
> So. I've read what I could find on google(groups) for this,
> also the faq of comp.lang.c.
>
> But still I do not understand why there is not standard method to "(...)
> query the malloc package to find out how big an allocated block is".
> (Question 7.27)
>
> Is there somewhere explained why - because it would seem to me that
> free() and realloc(..) would have to know the size of allocated space,
> and I would like to know the reason [why]
> this information is not "disclosed" by a std-library.

> cat main.c
#include <stdio.h>
#include <stdlib.h>

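/* Nonportable: reads the size word that this particular allocator
   stores just before the block. The standard gives no such guarantee. */
size_t allocation(const void* p) {
    return ((const size_t*)p)[-1] - 1;
}

int main(int argc, char* argv[]) {
    if (1 < argc) {
        const size_t n = atoi(argv[1]);
        void* p = malloc(n);
        if (p != NULL) {
            /* %zu is the C99 conversion for size_t */
            fprintf(stdout, "size = %zu\n", allocation(p));
            free(p);
        }
    }
    return EXIT_SUCCESS;
}
> gcc -Wall -std=c99 -pedantic -o main main.c
> ./main 33
size = 40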
> gcc --version
gcc (GCC) 3.4.1

The ANSI/ISO C standards don't specify such a function
because it isn't necessary.
The standard doesn't even require the implementation
to keep track of the size of the memory allocated.
But all viable implementations *do* keep track
of the size information and you only need to find out where.
My implementation reserves an extra double word (8 bytes)
just before p to store the size information and sets

((size_t*)p)[-1] = ((size + 11)/8)*8 + 1
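
The output shown above (33 bytes requested, 40 reported) is consistent with adding 4 bytes of bookkeeping and rounding up to a multiple of 8, then setting the lowest bit of the stored word. The following is only an arithmetic model of that reading; the function names are hypothetical, the meaning of the low bit is an assumption, and nothing here inspects a real allocator.

```c
#include <stddef.h>

/* Hypothetical model: chunk size for a request of n bytes, assuming
   4 bytes of bookkeeping and rounding up to a multiple of 8. */
size_t model_chunk_size(size_t n)
{
    return ((n + 11) / 8) * 8;
}

/* Hypothetical model of the word stored just before the block:
   the chunk size with the lowest bit set (assumed to be a flag). */
size_t model_size_word(size_t n)
{
    return model_chunk_size(n) + 1;
}
```

For n = 33 the model gives a chunk size of 40 and a stored word of 41, which is why allocation() subtracts 1 before printing. Other allocators, or other versions of the same one, may lay things out quite differently.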
 
Martin T.

bilbothebagginsbab5 said:
> Hello, hello.
>
> But still I do not understand ...
> Best regards,
> Martin


Dear Folks.
May I thank everybody who shared some thoughts on the issue.

After a good night's sleep, I came up with the following conclusion:

What I didn't take into account was that, as some have mentioned,
the implementation only has to guarantee that _at_least_ the size of
memory requested is allocated.
So if the implementation reserved more memory, it would only have to keep
track of the memory it actually reserved, and not the size which the
programmer wants to use.
So if _msize() cannot tell me what I've requested, but only what is
actually reserved, it's pretty useless for the thing I wanted it for.
(Well, probably it would have worked anyway, since I have an allocated
array of a pretty big struct, so it seems *very* unlikely to me that
the system would reserve excess memory larger than this struct ... but I
don't think I will take the risk :) )

best regards,
Martin
 
Lawrence Kirby

On Fri, 17 Dec 2004 00:43:52 +0100, Michael Mair wrote:

> ....
> Have I missed something? What speaks against
> struct my_malloc_entry {
>     void *data;
>     size_t size;
> };
> or the linked list equivalent and keeping track of address
> and size? After handling size==0 and checking whether malloc()
> was successful, you store the data you need. If you are asked
> for the size, you go through the array/list/whatever. At freeing,
> you either "invalidate" the memory or actually free it.
> If you want, you can implement a test mode which does not free
> the memory and enables you to ask for potentially invalid
> pointers or whatever.
> All in standard C.

That's fine because you're leaving malloc() to handle all alignment
issues.

> Or you allocate a large chunk of memory and manage "dealing out" parts
> of it by yourself. In standard C. Either way, the requested
> functionality _can_ be provided.

This is problematic because to deal out parts of an allocated area you
must ensure that each part is correctly aligned. There is no portable way
of doing this.
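
For comparison, the later C11 standard (published well after this thread) added max_align_t and _Alignof, which make a conservative version of "dealing out" possible. The sketch below assumes a C11 compiler; the arena type and function names are invented for illustration, and each piece is simply aligned for the strictest fundamental type.

```c
#include <stddef.h>   /* size_t, max_align_t (C11) */
#include <stdlib.h>

/* Hypothetical arena: one malloc()ed chunk dealt out in aligned pieces. */
struct arena {
    unsigned char *base;  /* start of the chunk, aligned by malloc() */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total bytes in the chunk */
};

/* Returns 1 on success, 0 if the chunk could not be allocated. */
int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = (a->base != NULL) ? cap : 0;
    return a->base != NULL;
}

/* Hand out n bytes starting at the next offset that is a multiple of
   the strictest fundamental alignment; NULL if the chunk is exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;  /* round up */

    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Release the whole chunk at once; individual pieces are never freed. */
void arena_destroy(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}
```

Under C89 or C99 there is no equivalent of _Alignof(max_align_t), which is the gap being pointed out here.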

Lawrence
 
websnarf

> First, most obvious question is: Why do you need to know?

Performance reasons. Generally malloc actually allocates somewhat more
memory than was requested. If malloc is being used to back a resizable
array, and there is actually memory available that wasn't originally
asked for, then knowing the real size can let you postpone the realloc
as you, perhaps, add entries to your vector. There are definitely
quantifiable cases where my string library http://bstring.sf.net/ could
improve its performance with this knowledge if it were available.
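
Within standard C, the usual substitute is to over-allocate explicitly and track the capacity in the program itself. This is not the facility being asked for here (it cannot see what malloc really reserved), only a sketch of the common workaround; int_vec and int_vec_push are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical growable array of int that tracks its own capacity. */
struct int_vec {
    int   *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append value, doubling the capacity when full.
   Returns 1 on success, 0 on failure (the old contents stay valid). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p;

        if (new_cap > (size_t)-1 / sizeof *p)
            return 0;                       /* byte count would overflow */
        p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 1;
}
```

Starting from struct int_vec v = {NULL, 0, 0}, most pushes touch only len, and realloc() is called only when the tracked capacity runs out.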

> They do need to 'know' (but this 'knowledge' might be
> implemented at a lower-level, i.e. in the OS itself; iow
> 'free()' might simply query the OS for this info[1]).
> [...] The smaller the library, the
> less stuff imposed on folks that don't want or need it.

I think you are missing the OP's point. He is saying that there is
code linked into your application that accesses the allocated size
anyways (hidden in the code for free() and realloc()), whether you want
it to or not. You just don't have any direct access to it.

The only real advantage of the standard not exposing this might be if
this, in fact, was not exactly true. For example, the size might
*change* as a side effect of other allocations. I.e., the size might
increase because the sliver between it and an adjacent allocation is
too small and so it gets attached to a previous allocation to decrease
fragmentation or leaks. Or the allocation scheme may use a heuristic
to "predict" impending "reallocs", which may cause it to decrease the
size as a result of a later heuristic failure or something.

But, from my own research into memory allocation schemes, none of these
ideas are representative of good high-performance, memory-stingy,
or fragmentation-reducing solutions. I.e., I think it would be a good idea
to go ahead and add such a thing to the standard. However, there are a
lot more things I would like to add to the standard regarding memory
allocation as well.
 
Mike Wahler

> Performance reasons.
> Generally malloc actually allocates somewhat more
> memory than was requested.

It might, it might not. It's required to allocate
*at least* the requested size, and allowed to allocate
more. But your program is only allowed to legally access
the specific amount requested. Hands off the 'extra'
if you want your program's behavior to remain well-defined.

> If malloc is being used to back a resizable
> array, and there is actually memory available that wasn't originally
> asked for, then knowing the real size can let you postpone the realloc

No. See above.

> as you, perhaps, add entries to your vector. There are definitely
> quantifiable cases where my string library http://bstring.sf.net/ could
> improve its performance with this knowledge if it was available.

I don't see how, not in a standard manner.

>> They do need to 'know' (but this 'knowledge' might be
>> implemented at a lower-level, i.e. in the OS itself; iow
>> 'free()' might simply query the OS for this info[1]).
>> [...] The smaller the library, the
>> less stuff imposed on folks that don't want or need it.
>
> I think you are missing the OP's point. He is saying that there is
> code linked into your application that accesses the allocated size
> anyways (hidden in the code for free() and realloc()), whether you want
> it to or not. You just don't have any direct access to it.

Right. But accessing it is outside the realm of standard C.

> The only real advantage of the standard not exposing this might be if
> this, in fact, was not exactly true. For example, the size might
> *change* as a side effect of other allocations.

The implementation is free to perform whatever internal
machinations it likes in order to provide the required
behavior. But again, access and manipulation of such
'internals' is necessarily nonstandard, platform-specific.

> I.e., the size might
> increase because the sliver between it and an adjacent allocation is
> too small and so it gets attached to a previous allocation to decrease
> fragmentation or leaks. Or the allocation scheme may use a heuristic
> to "predict" impending "reallocs", which may cause it to decrease the
> size as a result of a later heuristic failure or something.
>
> But, from my own research into memory allocation schemes

BUT: C is intentionally designed to leave the selection of
such 'schemes' to the implementation, in the interest of
maximal portability.

> none of these
> ideas are representative of good high performance or memory stinginess
> or fragment reducing solutions. I.e., I think it would be a good idea
> to go ahead and add such a thing to the standard. However, there are a
> lot more things I would like to add to the standard regarding memory
> allocation as well.

Yes, I know, many people have their own favorite things they
want added to the language. But I doubt such things as this
'low level' memory stuff will be, in the interest of keeping
things as abstract as possible, to keep the language as portable
as possible.

I do realize that many times it can be necessary to work
more 'intimately' with a given platform in the interest of
e.g. performance. But such things are (imo properly) outside
the scope of the standard language.

-Mike
 
Keith Thompson

Mike Wahler said:
> It might, it might not. It's required to allocate
> *at least* the requested size, and allowed to allocate
> more. But your program is only allowed to legally access
> the specific amount requested. Hands off the 'extra'
> if you want your program's behavior to remain well-defined.
> [...]

Certainly, given the current language definition.

IMHO, it wouldn't be unreasonable to add something like the following
to <stdlib.h>:

size_t bytes_allocated(void *ptr);

It invokes undefined behavior in the same circumstances as free(ptr),
except that bytes_allocated(NULL) also invokes undefined behavior.
Otherwise, ptr is a pointer earlier returned by malloc(), calloc(), or
realloc(), and the function returns a number of bytes that the
implementation guarantees the program is able to access, which is at
least the number of bytes requested in the *alloc() call.

An implementation that just returns the number of bytes requested
would be conforming.
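
As an illustration of that minimal behavior, here is a sketch of a wrapper layer (not the library function itself) that reports exactly the number of bytes requested. It stores the size in a header in front of each block and assumes C11 for max_align_t, which did not exist when this was written; the tracked_ names are hypothetical.

```c
#include <stddef.h>   /* size_t, max_align_t (C11) */
#include <stdlib.h>

/* Header stored in front of each block. The union pads it so that the
   memory following the header stays suitably aligned for any object. */
union block_header {
    size_t size;
    max_align_t align;
};

/* Allocate n usable bytes plus a hidden header recording n. */
void *tracked_malloc(size_t n)
{
    union block_header *h;

    if (n > (size_t)-1 - sizeof *h)
        return NULL;                 /* total size would overflow */
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                    /* user memory starts after the header */
}

/* Report the requested size; ptr must come from tracked_malloc(). */
size_t tracked_bytes_allocated(void *ptr)
{
    return ((union block_header *)ptr - 1)->size;
}

/* Free a block obtained from tracked_malloc(); NULL is ignored. */
void tracked_free(void *ptr)
{
    if (ptr != NULL)
        free((union block_header *)ptr - 1);
}
```

Passing a pointer that did not come from tracked_malloc() is undefined, much as it is for free(). The cost is one header per block, which is the kind of overhead the thread is arguing about.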

If the value returned is (sometimes) greater than the number
requested, it can sometimes save the need for a call to realloc().

This should be easy to implement (though impossible to implement
portably). If there are implementations for which it isn't easy to
implement this function, that would be a good argument against adding
it to the standard.

On the other hand, it wouldn't be unreasonable *not* to add this
function to the standard. For most purposes, the existing interface
is good enough. Creeping featurism is always a risk, and the burden
of proof is on anyone advocating an addition. I don't pretend that
I've met that burden.
 
Jack Klein

Keith Thompson said:
> Mike Wahler said:
>> It might, it might not. It's required to allocate
>> *at least* the requested size, and allowed to allocate
>> more. But your program is only allowed to legally access
>> the specific amount requested. Hands off the 'extra'
>> if you want your program's behavior to remain well-defined.
>> [...]
>
> Certainly, given the current language definition.
>
> IMHO, it wouldn't be unreasonable to add something like the following
> to <stdlib.h>:
>
> size_t bytes_allocated(void *ptr);
>
> It invokes undefined behavior in the same circumstances as free(ptr),
> except that bytes_allocated(NULL) also invokes undefined behavior.
> Otherwise, ptr is a pointer earlier returned by malloc(), calloc(), or
> realloc(), and the function returns a number of bytes that the
> implementation guarantees the program is able to access, which is at
> least the number of bytes requested in the *alloc() call.
>
> An implementation that just returns the number of bytes requested
> would be conforming.
>
> If the value returned is (sometimes) greater than the number
> requested, it can sometimes save the need for a call to realloc().

Now you've fallen into a very nasty trap. You're assuming not only
that the implementation tells you that memory is there, but also lets
you use more than you asked for with defined results. And that leads
to performance losses in some situations.

Let's just say that such a function exists. And let's say that you
allocate some number of bytes, indicated by the macro SIZE.

my_ptr = malloc(SIZE);

Now as your program continues, you realize that you could use a few
more bytes, let's say exactly three more.

if (bytes_allocated(my_ptr) < SIZE + 3)
{
    /* use realloc() to resize the block larger */
}

/* add three more bytes to the block */

Then a little further on, you need more memory still, and your
bytes_allocated() function indicates there is not enough, so you must
call realloc(). If realloc() has to move the block to extend it to the
new size, it must copy the contents of the original block. Under
today's standard, that would be SIZE bytes.

But since you can actually store data in bytes_allocated() bytes,
without bothering to inform the library that you are doing so, it must
copy all of those bytes into the newly allocated block.

So every program pays a potentially heavy price on every call to
realloc(), just so you can avoid a realloc() once in a while. This is
quite the opposite of the spirit of C, where you don't pay for what
you don't use.

> On the other hand, it wouldn't be unreasonable *not* to add this
> function to the standard. For most purposes, the existing interface
> is good enough. Creeping featurism is always a risk, and the burden
> of proof is on anyone advocating an addition. I don't pretend that
> I've met that burden.

It would be very reasonable *not* to add this functionality to the
standard library. I don't want to pay the price for the extra copying
when I call realloc().

The only other case it solves is that of the lazy programmer, who
can't be bothered to remember the size he/she asked for and pass it
around as necessary.
 
websnarf

> Now you've fallen into a very nasty trap. You're assuming
> not only that the implementation tells you that memory is
> there, but also lets you use more than you asked for with
> defined results. And that leads to performance losses in
> some situations.

Not necessarily. In the scenario you describe, the implementation can
track whether bytes_allocated() has been called on a given block.
And remember, since the whole point is to reduce the *number* of
reallocs, we are gaining back that performance in ideal situations
anyhow. And even if the implementation situation I describe is not
common, you can back off to the current assumptions just by returning
the original memory size requested (which must be known to leverage the
realloc scenario you suggest).

But the value of such a function obviously includes debugging. So I
don't see the inclusion of such a function as either a trap or an
irrelevancy.
 
Keith Thompson

Jack Klein said in comp.lang.c:
>> [...]
>> IMHO, it wouldn't be unreasonable to add something like the following
>> to <stdlib.h>:
>>
>> size_t bytes_allocated(void *ptr);
>> [...]
> Now you've fallen into a very nasty trap. You're assuming not only
> that the implementation tells you that memory is there, but also lets
> you use more than you asked for with defined results.

Strictly speaking, I was suggesting giving the implementation an
opportunity to promise more useful memory than the user asked for.
The implementation isn't obligated to take advantage of this.

But ...

> [...]
> But since you can actually store data in bytes_allocated() bytes,
> without bothering to inform the library that you are doing so, it must
> copy all of those bytes into the newly allocated block.
>
> So every program pays a potentially heavy price on every call to
> realloc(), just so you can avoid a realloc() once in a while. This is
> quite the opposite of the spirit of C, where you don't pay for what
> you don't use.

That's a very good point; I hadn't thought of that.

I've thought of two or three workarounds, but the results are ugly, so
I think I'll give up on the whole idea.
 
Chris Croughton

> Not necessarily. In the scenario you describe, the implementation can
> track whether or not bytes_allocated() has been called on it or not.
> And remember since the whole point is to reduce the *number* of
> reallocs, we are gaining back that performance in ideal situations
> anyhow. And even if the implementation situation I describe is not
> common you can back off to the current assumptions just by returning
> the original memory size requested (which must be known to leverage the
> realloc scenario you suggest.)

The implementation may well have to use more memory to store the length,
thus wasting resources in the more common cases (on some machines quite
possibly wasting 16 bytes or more in order to get the alignment for the
worst case). Or it might have to spend time looking for the length in
some implementations (for instance ones using garbage collection where
separate lists of pointers and lengths are kept). There's also one
common one where the last block allocated on a 'heap' has an allocated
size of "the rest of the heap" until another block has to be allocated
after it.

> But the value of such a function obviously includes debugging. So I
> don't see the inclusion of such a function as either a trap or an
> irrelevancy.

If you want it for debugging you can implement it on top of the existing
functions (as debugging libraries such as dmalloc do), or your debugger
can interface with the allocation libraries at low level (since
debuggers are inherently system dependent).

How do you think C programming has survived for many decades without it?
It obviously isn't essential, it isn't even in C++ (where it could have
been added easily if they had wanted to do so). Does any language
actually allow you to allocate something and find out how big it is
later?

Chris C
 
Malcolm

Chris Croughton said:
> Does any language actually allow you to allocate something and find out how
> big it is later?

In Java you can allocate an array of objects

eg
int[] caught = new int[daysfishing];

when you want to use the array you can find the length

for (i = 0; i < caught.length; i++)
    total += caught[i];

this is handy since it means you don't have to bother keeping track of the
array size, and also means that the array and the size cannot get out of
synch.

The disadvantage is that you pay a price for carrying about bounds
information internally.
 
Chris Croughton

Malcolm said:
> Chris Croughton said:
>> Does any language actually allow you to allocate something and find out how
>> big it is later?
> In Java you can allocate an array of objects
>
> eg
> int[] caught = new int[daysfishing];
>
> when you want to use the array you can find the length
>
> for (i = 0; i < caught.length; i++)
>     total += caught[i];
>
> this is handy since it means you don't have to bother keeping track of the
> array size, and also means that the array and the size cannot get out of
> synch.

True, you can do it with vectors in C++ as well. C++ ones also allow
you to dynamically extend them (and get both the current length and the
current allocated size), but they are basically just structures with
dedicated functions to access them and are part of the library, not of
the syntax.

> The disadvantage is that you pay a price for carrying about bounds
> information internally.

And probably speed penalties as well, if it uses it for length checking
on access (which Java does, I believe, and C++ STL vectors generally
don't).

C was designed to be lean & mean, if you don't know what you're doing
use some other language with more protection...

Chris C
 
Michael Mair

Lawrence said:
> On Fri, 17 Dec 2004 00:43:52 +0100, Michael Mair wrote:
>
> ...
>
> That's fine because you're leaving malloc() to handle all alignment
> issues.
>
> This is problematic because to deal out parts of an allocated area you
> must ensure that each part is correctly aligned. There is no portable way
> of doing this.

You are of course right... I noticed it but was unable to amend it
myself (business trip).
Thanks for the correction :)

Cheers
Michael
 
