Length of C++ arrays allocated by operator new[]


Nick Keighley

AFAIK it is impossible to detect the original length of an array from
the object pointer.

class SomeClass
{ // ...

};

SomeClass* array = new SomeClass[10];

size_t length = get_length_of_array(array); // Impossible

But the runtime needs to keep track of the number of elements anyway, to
call the appropriate number of destructors at delete[]. Why does the
language not allow access to this internal field?

1. backwards compatibility with C
2. new may allocate more memory than specified (though to call the
right number of DTORs it needs to know how many objects are actually
allocated)
3. no way to distinguish new'ed from other allocations


void fred(T* a)
{
    for (size_t i = 0; i < get_length_of_array(a); i++)
        do_stuff(a[i]);
}

this won't work for

a = (T*)malloc(n * sizeof(*a));
T a[n];
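
(For completeness: the conventional workaround is to pass the element
count alongside the pointer, which works for all of the allocation styles
above. A minimal sketch, with T and do_stuff as illustrative stand-ins:)

//---------------------------------------------------------
#include <cstddef>
#include <cstdio>

typedef int T;                        // stand-in element type for the sketch
void do_stuff(T x) { std::printf("%d\n", x); }

void fred(T* a, std::size_t n)        // caller supplies the length explicitly
{
    for (std::size_t i = 0; i < n; i++)
        do_stuff(a[i]);
}

int main()
{
    T* heap = new T[10]();            // works for new[]'ed arrays...
    T  local[5] = { 1, 2, 3, 4, 5 };  // ...and for automatic arrays alike
    fred(heap, 10);
    fred(local, 5);
    delete[] heap;
}
//---------------------------------------------------------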
 

none

SomeClass* array = new SomeClass[10];

size_t length = get_length_of_array(array); // Impossible

any particular reason you don't want to use vector instead and then use
vect::size ?

- The old platform does not support STL well.

Really? What's that old platform? I was using the STL in
old compilers in the last century simply by adding STLPort to the
project.
- STL causes the executable size to explode. One disadvantage of
template meta programming over generics. (Of course, there are many
advantages on the other side.)

"Explode" is a bit strong. But yes, it will increase a bit but you
seem to totally ignore the "Of course, there are many advantages on
the other side." Don't just say it. Seriously consider it.
std::vector has the major disadvantage that any piece of code that has
write access to its elements can also change its size and, more
importantly, cause reallocations. That's sometimes not wanted and could
cause hard to find bugs with UB, especially if the array is shared
between threads.

Euh, how is it different from raw dynamic arrays? They have been a
source of hard to find bugs for as long as programming has existed.
Bad programming is bad programming. At least std::vector gives you
the tools to encourage or even enforce better programming.

The problems you list above can be handled simply by controlling
access (using const by default unless write access is needed) or as
proposed elsewhere by offering a more constrained interface via
composition.
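
To make that concrete, here is a minimal sketch (the function names and
element type are purely illustrative): give read-only code a const
reference, and give element-writing code only the element range, so
neither can resize the vector or trigger a reallocation.

//---------------------------------------------------------
#include <vector>
#include <cstddef>

// Read-only access: cannot modify elements, let alone resize.
double sum(const std::vector<double>& v)
{
    double s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Write access to the elements only: the callee never sees the vector
// object itself, so it cannot change its size or cause a reallocation.
void fill(double* first, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) first[i] = double(i);
}

int main()
{
    std::vector<double> samples(1024);
    fill(&samples[0], samples.size());
    return sum(samples) > 0 ? 0 : 1;
}
//---------------------------------------------------------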

If you really share raw arrays between threads directly, I suggest that
you have a much more fundamental problem to address than the possible
reallocation that a std::vector may do.

Yan
 

Goran

  Why is this urban legend so persistent? People repeat it without actually
trying it.
  I just tested with these two programs:
//---------------------------------------------------------
int main()
{
    int* array = new int[10000];
    for(int i = 0; i < 10000; ++i) array[i] = i;
    int value = 0;
    for(int i = 0; i < 10000; ++i) value += array[i];
    delete[] array;
    return value;
}
//---------------------------------------------------------

//---------------------------------------------------------
#include <vector>
int main()
{
    std::vector<int> array(10000);
    for(int i = 0; i < 10000; ++i) array[i] = i;
    int value = 0;
    for(int i = 0; i < 10000; ++i) value += array[i];
    return value;
}
//---------------------------------------------------------

  The size of the first executable? 5104 bytes.
  The size of the second executable? 5528 bytes.
  Yeah, the size really exploded.

Use it for some dozen different types, not just int.
Use many functions of the STL containers not just the easiest ones.

Templates have to be instantiated for each unique parameter set.


Depending on your build chain and build options, they won't be. All
vector<random_pointer_type>, and at least one integral type (e.g.
vector<int>), will all have __identical__ functions. Every single one
will match. Build chains do detect this and "fold" all these functions
into one. With such build chains, perhaps your build time will be
affected, but you will not have e.g. vector::insert(iterator, pt1),
vector::insert(iterator, pt2)... vector::insert(iterator, int). You
will have only one such insert, and that will be called for all
pointer types (pt1, pt2, ... and int).

What also happens is that you get functions with completely different
source, optimizations turn two or more of those into the same machine
code, and again, only one of them goes into the final executable.
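
(To make the folding argument concrete, here is a small illustration;
the merging itself is done by the toolchain, e.g. by identical-code
folding options in some linkers, so this code only shows why the
instantiations are candidates for it:)

//---------------------------------------------------------
#include <vector>

struct A {};
struct B {};

int main()
{
    // Two distinct template instantiations...
    std::vector<A*> va(100);
    std::vector<B*> vb(100);

    // ...but every member function generated for vector<A*> does exactly
    // the same pointer-sized work as its vector<B*> counterpart, so their
    // machine code can come out byte-for-byte identical and be merged by
    // a linker that performs identical-code folding.
    va.push_back(0);
    vb.push_back(0);
    return int(va.size() + vb.size());
}
//---------------------------------------------------------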

Goran.
 

Juha Nieminen

Marcel Müller said:
Use it for some dozen different types, not just int.
Use many functions of the STL containers not just the easiest ones.

Templates have to be instantiated for each unique parameter set. Even if
90% of the generated functions do not depend on all parameters or the
dependency ends once the compiler has done the type checks, the compiler
will generate distinct implementations. No programmer would repeat
himself that much without the templates (well, hopefully).

And exactly how would it be different from using non-templated code for
each used type? If you want an array of ints and an array of doubles,
you'll have to write twice as much code to handle those arrays. The only
difference with the templated version is that you will have to write the
code only once (it's the compiler that does the duplication rather than
yourself).

Your argument is as silly as saying "if I write two functions instead
of one, the size of the binary will increase". Of course it increases.
You are writing more code.

The question is, however, does the size of the code "explode" when you
use several types?

I understand "explosion" to mean that the size of the code increases
very significantly. Like if using one type would make the binary 5 kilobytes
but using two types would make it 100 kilobytes and using three would make
it 1 megabyte in size. That's an "explosion".

Well, rather than just theorizing, I went and tried the example program
with two different types to see if the binary size "explodes". Or even
doubles:

//--------------------------------------------------
#include <vector>

int main()
{
std::vector<int> array1(10000);
std::vector<short> array2(10000);
for(int i = 0; i < 10000; ++i) { array1[i] = i*3; array2[i] = i; }
int value = 0;
for(int i = 0; i < 10000; ++i) value += array1[i] + array2[i];
return value;
}
//--------------------------------------------------

The size of the executable? 5856 bytes.

Yeah, a true explosion in size.
 

Paul

On 22/08/2011 01:34, Paul wrote:
On 21/08/2011 23:01, Marcel Müller wrote:
<snip>
So vector<char> initializes the storage? To zero I guess. I was not
aware of that.
It initializes *elements* in that storage:
std::vector<char> v(100); // will have 100 chars of value 0
char* a = new char[100]; // will have 100 uninitialized chars
You have previously argued that 'a' points to only one single char.
Are you now suggesting that 'a' points to 100 chars?
"will have" does not mean "points to"; 'a' points to the first of 100
uninitialized chars; 'a' points into an array of 100 uninitialized chars.
Also in the old thread on this topic I did say that I found the relaxed
(i.e. not technically accurate) language of "points to an array"
acceptable but not "pointer to an array".
With a sequence you have a start point, an end point and all the rest
in between. The term "pointing into an array" suggests that it's
pointing to somewhere in between the start and end points (non
inclusive).

Nonsense; IMO "pointing into an array" includes the first and last elements.

This is not what "into" means. To point into something means you point
inside it, which is not the same as pointing to the beginning of it.
I think your terminology is confusing what seems to be a simple
contextual difference in the term "is a pointer to" where it can mean
either:
"is a pointer to X memory" or "is a pointer of type pointer to T".

IMO "is a pointer to" is a description of type; the key parts of this
term are the words "is a".


'a' points to the allocated memory. The allocated memory is freed by:
delete[] a;
I think it's technically inaccurate to suggest that 'a' doesn't point
to the allocated memory (which is an array of chars).

It is technically accurate to say that 'a' points to the start of the
allocated array.
You say here that it's technically accurate to say that it points to
the beginning of the array.
Yet you have previously said it's technically inaccurate to say that
it's pointing to an array.

If it points to an array how can it be inaccurate to say that it's
pointing to an array?
And, what you find not only technically inaccurate but unacceptable,
if it points to an array, how can it not be a pointer to an array?

I think your ideas are fundamentally flawed.
The one and only meaning you find acceptable is that of the pointer
type. This is simply a different context and it does not imply the
"what it points to" context is in any way technically inaccurate.
The "what it points to" context is the more commonly used context and
is probably more techincally accurate because to say:
"a is a pointer to a single char" is techincally inaccurate because it
points to a sequnce of chars not one single char.
It would however be technically accurate to say:
"a's type is pointer to char" or " a is a pointer to char type".
Notice that the accurate way to refer to the type context is to use
the word "type", omitting this word naturally implies the "what is
pointed to" context i.e:
"a points to a char" , which implies what is pointed to. The pointer
could be a void pointer or some other type in theory but would
generally be asusmed to be a char* .
 

Richard Damon

But the runtime needs to keep track of the number of elements anyway, to
call the appropriate number of destructors at delete[]. Why does the
language not allow access to this internal field?

Because array could also have been set by:

class SomeClass
{ //
};

SomeClass element;

SomeClass* array = &element;


or you could have even changed your example to have been

SomeClass* array = &(new SomeClass[10])[4];


In either of these cases there is no "length" to get.

Fine. But requesting the length of all these objects is no more
undefined behavior than applying delete[] to them. I only expect
reasonable length information if I could use operator delete[]. In all
other cases it would obviously be a serious overhead.

Also, the runtime isn't always required to keep track of how many
elements are in the array: if SomeClass has a trivial destructor (e.g.
all of its members are built-in types and no explicit destructor is
defined), then by the "as-if" rule the runtime doesn't need to know or
store the number of elements (new probably still needs to store the
allocation size somewhere for delete, but that is a different issue).

I doubt that any recent implementation uses anything other than the
allocation size for the invocation of the destructors.

Allowing this helps minimize the overhead in things like new char[10].

That's the point. I see no additional overhead.


Marcel

For no overhead: since the C++ specification currently does NOT require
that new char[10] save the number of elements allocated (since char has
a trivial destructor, which can thus be elided), adding an operator that
returns the number of elements from such an allocation would now add
that overhead to such an allocation.

Also, remember that new[] can be user defined, so adding an array size
operator like this would require that classes that use a user-defined
operator new[] also provide their own version of this new length
operator (admittedly, unless they knew their destructor was trivial and
thus able to be elided, they would have had to store the count somewhere
anyway).
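
To illustrate that last point, here is a hedged sketch (the class and
its output are purely illustrative): a class-specific operator new[] is
only handed a total byte count, which on typical implementations already
includes space for whatever bookkeeping the compiler adds for the element
count, so a hypothetical get_length_of_array() would indeed need
cooperation from every such user-defined allocation function.

//---------------------------------------------------------
#include <cstdlib>
#include <cstdio>
#include <new>

struct Widget
{
    // Class-specific allocation functions: they see only a byte count
    // (which may include implementation-defined bookkeeping space),
    // never the number of elements being created.
    static void* operator new[](std::size_t bytes)
    {
        std::printf("operator new[] asked for %zu bytes\n", bytes);
        if (void* p = std::malloc(bytes)) return p;
        throw std::bad_alloc();
    }
    static void operator delete[](void* p)
    {
        std::free(p);
    }
    ~Widget() {}   // non-trivial destructor, so the count must be tracked
};

int main()
{
    Widget* w = new Widget[10];   // the 10 is not visible to operator new[]
    delete[] w;                   // yet the right number of destructors runs
}
//---------------------------------------------------------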
 

Marcel Müller

none said:
Really? What's that old platform? I was using the STL in
old compilers in the last century simply by adding STLPort to the
project.

IBM Visual Age C++ 3.0. AFAIR around 1996. Some fixes are from 1999.
The template support of this old compiler is surprisingly good. But it
can't parse "using" nor "namespace".
"Explode" is a bit strong. But yes, it will increase a bit but you
seem to totally ignore the "Of course, there are many advantages on
the other side." Don't just say it. Seriously consider it.

It is not an option because of the above restriction.

Euh, how is it different from raw dynamic arrays? They have been a
source of hard to find bugs for as long as programming has existed.

Not if you keep track of their size with a small wrapper class. Boundary
checks could be restricted to debug builds.

It has been a long time since I last had problems with UB because of
out-of-bounds data access. Using reasonable string classes solved
many of the problems; consistently using smart pointers fixed the rest.
I prefer smart pointers for non-resizable array types that keep track of
the size. But they always need to allocate an additional word for the
size. That's why I asked the question. The smart array pointer can only
take objects that can be destroyed by delete[] anyway.
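
(For what it's worth, a minimal sketch of such a smart array pointer,
assuming pre-C++11 facilities as the platform above would require; the
class name is illustrative, and the extra size_ member is exactly the
additional word mentioned:)

//---------------------------------------------------------
#include <cstddef>

// Illustrative non-resizable owning array: stores its size itself,
// because there is no standard way to ask delete[] for it.
template <typename T>
class ArrayPtr
{
    T*          ptr_;
    std::size_t size_;
    ArrayPtr(const ArrayPtr&);              // non-copyable (pre-C++11 style)
    ArrayPtr& operator=(const ArrayPtr&);
public:
    explicit ArrayPtr(std::size_t n) : ptr_(new T[n]), size_(n) {}
    ~ArrayPtr() { delete[] ptr_; }

    std::size_t size() const                    { return size_; }
    T&          operator[](std::size_t i)       { return ptr_[i]; }
    const T&    operator[](std::size_t i) const { return ptr_[i]; }
};

int main()
{
    ArrayPtr<int> a(10);
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = int(i);
    return a[9];
}
//---------------------------------------------------------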

The problems you list above can be handled simply by controlling
access (using const by default unless write access is needed) or as

Of course I use const frequently. But I also like lock-free data
structures. They really dislike being moved around in memory.
Nevertheless, at first glance they happen to work, but with ugly
race conditions waiting to fire.
proposed elsewhere by offering a more constrained interface via
composition.

Yes. But this is a lot of work, and I am a bit lazy too.

If you really share raw arrays between threads directly, I suggest that
you have a much more fundamental problem to address than the possible
reallocation that a std::vector may do.

No serious problem, if the elements are of atomic size (pointer to
something, including intrusive smart pointers). With a few tricks you
will even get strong thread safety this way. And if your favorite string
class works this way too, you have a lock-free, fully thread-safe array
of strings. Well, as long as you do not move it around.

Similar things apply to array elements that are heavyweight or
non-copyable. I prefer not to move objects around in memory. In fact
many of them are non-copyable. This is a no-go for std::vector, AFAIK
even if you don't use the dynamic resizing.
OK, most of the time complex and non-copyable objects should be used as
reference objects.

Another point is temporary storage in an algorithm. In these cases the
array size usually will not change once it is constructed. Even though
vector will do the job, I prefer fixed-size array classes for this
purpose. They tell the reader more, namely that they are not intended to
grow (or shrink) somewhere in the backwaters of the code.


Marcel
 

Marcel Müller

Richard said:
For no overhead: since the C++ specification currently does NOT require
that new char[10] save the number of elements allocated (since char has
a trivial destructor, which can thus be elided), adding an operator that
returns the number of elements from such an allocation would now add
that overhead to such an allocation.

Well, I guess any implementation of memory management must keep
track of the total size for other reasons anyway.

Also, remember that new[] can be user defined,

But you are right. This is a show stopper. It will prohibit any internal
interaction between the C++ runtime library and the standard allocator.
Now you are back to the char[10] problem above.


Marcel
 

Paul

On 08/20/2011 01:38 PM, Paul wrote:
void foo(T(&t)[N] ){
what is &t?
You have no t object to address.
Ruben
This function takes one argument of type T(&)[N], a reference to an
array of N T's.
It doesn't take the address of t; t is the function parameter name.

I don't know how you're parsing that, but if you wrote that code, I'd
likely fire you.

Personally I think you should be fired for thinking a reference to an
array is taking the address of a non-existent object.
 

Paul Brettschneider

ruben said:
On 08/20/2011 01:38 PM, Paul wrote:

void foo(T(&t)[N] ){

what is &t?

You have no t object to address.

Ruben


This function takes one argument of type T(&)[N], a reference to an
array of N T's.
It doesn't take the address of t; t is the function parameter name.


I don't know how you're parsing that, but if you wrote that code, I'd
likely fire you.

If you can't parse that, maybe _you_ should be fired? Seriously, this
continuous bragging of putative "code reviewers" about their small position
of power is annoying.

I must say I'm glad that I'm not in the coding business and the opinion of
people like you costs me no more than a shrug.
 

none

And exactly how would it be different from using non-templated code for
each used type? If you want an array of ints and an array of doubles,
you'll have to write twice as much code to handle those arrays. The only
difference with the templated version is that you will have to write the
code only once (it's the compiler that does the duplication rather than
yourself).

Your argument is as silly as saying "if I write two functions instead
of one, the size of the binary will increase". Of course it increases.
You are writing more code.

<Irony on>
Ah, but there are two fantastic alternatives to templates:

1-
Actually type the code manually for each type that is being used
(well, maybe using copy-paste too). That way, you can line-count your
source files in a much more reliable way.

The fact that you have duplication in your code base, that this is
harder to maintain, and that you could even make an error in one of the
type-specific implementations is negligible compared to the
incredible benefits of avoiding templates.

2-
Implement a generic container using void *. That way, your code size
will be a few % points smaller.

The fact that you are losing type safety (introducing a whole lot
of bug opportunities) is negligible compared to the benefit of
avoiding templates and keeping your executable size from *exploding* by
a couple of % points.

<Irony off>

OK, more seriously:

for a trivial program that does nothing useful but only use some
templated container, you may very well be able to notice a significant
increase in size between:

#include <vector>
int main()
{
std::vector<int> v;
}

and

#include <vector>
int main()
{
std::vector<int> vi;
std::vector<char> vc;
std::vector<double> vd;
std::vector<bool> vb;
}

etc.

But for a real application that does real work and is composed of
thousands of lines of code, I really doubt that the template
instantiation would make such a significant difference in your
executable size. This all sounds like premature optimisation to me,
especially with the use of the biased "explode" for executable size
without any figures to back it up.


Yan
 

Juha Nieminen

Yannick Tremblay said:
Implement a generic container using void *. That way, your code size
will be a few % points smaller.

That's actually how generic containers are implemented in most other
OO languages (and even some non-OO ones). Of course it's much safer in
those because the language hides the details behind the safer syntax.

The problem is that yes, the executable size decreases a bit, but at
the cost of significantly increasing memory consumption (usually by
significantly more than the reduction in binary size). That's because
all the objects have to be allocated dynamically, adding memory
management overhead, and handled through references/pointers (which
are nonexistent when dealing with objects by value, as in the case
of a std::vector having objects as elements).

(As a side-effect of this the data container becomes polymorphic,
which is sometimes nice, but in practice this is actually needed much
less frequently than containers with uniform elements.)

When people complain about the executable size growing, they seldom
seem to offer a viable alternative that wouldn't trade executable size
with an order of magnitude or two of increased memory usage.
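
(To put a rough shape on that trade-off, here is an illustrative
comparison; the per-allocation overhead mentioned in the comments is
typical rather than guaranteed, since it is entirely
implementation-specific:)

//---------------------------------------------------------
#include <vector>
#include <cstddef>

int main()
{
    // By value: one contiguous block, roughly 10000 * sizeof(int) bytes
    // of payload plus a single allocation's worth of bookkeeping.
    std::vector<int> by_value(10000, 42);

    // void*-style genericity: every element lives in its own heap block,
    // so each int also pays for a pointer in the container plus the
    // allocator's per-block overhead, typically several times sizeof(int).
    std::vector<void*> by_pointer;
    by_pointer.reserve(10000);
    for (int i = 0; i < 10000; ++i)
        by_pointer.push_back(new int(42));

    // Cleanup is manual too: the element type must be known again
    // to delete correctly.
    for (std::size_t i = 0; i < by_pointer.size(); ++i)
        delete static_cast<int*>(by_pointer[i]);

    return by_value[0];
}
//---------------------------------------------------------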
 

James Kanze

On 08/22/2011 06:05 AM, Paul wrote:
On 08/20/2011 01:38 PM, Paul wrote:
void foo(T(&t)[N] ){
what is &t?
You have no t object to address.
This function takes one argument of type T(&)[N], a reference to an
array of N T's.
It doesn't take the address of t; t is the function parameter name.
I don't know how you're parsing that, but if you wrote that code, I'd
likely fire you.

That code is a standard idiom, and I can't imagine a C++
application which didn't contain at least one instance of it.
(The most frequent use is now part of the standard library, but
it was in everyone's tool kit before that.)
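
(For readers wondering what idiom is meant: a reference-to-array
parameter lets the compiler deduce the element count N, which is how the
classic compile-time "array size" helper is written; std::begin/std::end
in the standard library rely on the same trick. A minimal sketch:)

//---------------------------------------------------------
#include <cstddef>

// Classic compile-time array length: N is deduced from the argument.
template <typename T, std::size_t N>
std::size_t array_size(T (&)[N])
{
    return N;
}

int main()
{
    int values[] = { 1, 2, 3, 4, 5 };
    return int(array_size(values));   // 5; works only for real arrays,
                                      // never for a plain T* from new[]
}
//---------------------------------------------------------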
 

Noah Roberts

I hear you. I was recently playing with a recursive call metaprogram
that recreates a switch statement... because I had nothing better to
do. I implemented the switch version, a function address lookup table
version, and the type-based metaprogram version. I compiled all of
them to ASM and checked out the output. I'm not an expert on asm by any
measure, but I noticed some rather interesting things. The smallest of
all the versions was the metaprogram version. Although it and the
switch version ended up being functionally equivalent and the main
section of ASM was exactly the same, the switch version had a lot of
extra stuff going on outside of main whose purpose I couldn't figure
out, if it had any. It also had function definitions that were never
used, while in the templated output they were discarded.

Completely useless technique to be sure, unless you were in some
embedded environment looking for every ounce of memory space...which
is exactly the kind of people that are scared of C++ "bloat" the most.
 

Noah Roberts

On 08/20/2011 01:38 PM, Paul wrote:
void foo(T(&t)[N] ){
what is &t?
You have no t object to address.
This function takes one argument of type T(&)[N], a reference to an
array of N T's.
It doesn't take the address of t; t is the function parameter name.
I don't know how you're parsing that, but if you wrote that code, I'd
likely fire you.

That code is a standard idiom, and I can't imagine a C++
application which didn't contain at least one instance of it.
(The most frequent use is now part of the standard library, but
it was in everyone's tool kit before that.)

It's something that should be known for sure, but it's really not all
that common from what I've seen.
 

Nobody

Completely useless technique to be sure, unless you were in some
embedded environment looking for every ounce of memory space...which
is exactly the kind of people that are scared of C++ "bloat" the most.

I'm not so sure. The kind of programmer that likes to consider themselves
a cut above the CRUD-ers worries about C++ bloat. People who actually have
experience with low-level work know that you don't actually know until
you've looked at the assembly.
 

Noah Roberts

SomeClass* array = new SomeClass[10];
size_t length = get_length_of_array(array); // Impossible
any particular reason you don't want to use vector instead and then use
vect::size ?

Yes, well said.  std::vector should be preferred over dynamic arrays for
most cases; about the only case where one still needs to use dynamic
arrays is if you want to allocate but not initialize a buffer.

std::vector<AType> vect;
vect.reserve(100);

There you go. Space is allocated but not initialized.
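
(One caveat worth adding: after reserve() the vector's size() is still
zero, so elements must be constructed with push_back/resize before they
can be indexed; what reserve() guarantees is the up-front allocation and
that no reallocation happens until the capacity is exceeded. A small
sketch, with AType as a stand-in element type:)

//---------------------------------------------------------
#include <vector>
#include <cassert>

struct AType { int x; };

int main()
{
    std::vector<AType> vect;
    vect.reserve(100);              // storage for 100 elements, none constructed
    assert(vect.size() == 0);
    assert(vect.capacity() >= 100);

    for (int i = 0; i < 100; ++i)   // construct elements as they are needed;
    {                               // no reallocation happens in this loop
        AType a = { i };
        vect.push_back(a);
    }
    assert(vect.size() == 100);
    return 0;
}
//---------------------------------------------------------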
 

Gerald Breuer

The length of the allocated array must be stored somewhere
by new so that delete[] knows how many destructors to call.
But there's no standard way to get this length.
 
