A few questions on C++


Ian Collins

Phlip said:
(Not reading the whole thread here but) doesn't return value optimization
apply?
The copy is made before the increment, so RVO does not apply.
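For illustration, here is the shape of the conventional increment operators for a made-up Counter class (a minimal sketch, not from the thread). The copy Ian refers to is the one taken from *this before the increment; no return value optimization can remove that copy, although, as discussed further down, the as-if rule may still let the compiler drop it when the result is unused.

// Invented Counter class, just to show the usual pre/postfix idiom.
class Counter {
    int value_;
public:
    explicit Counter(int v = 0) : value_(v) {}

    Counter& operator++() {       // prefix: increment in place, no copy
        ++value_;
        return *this;
    }

    Counter operator++(int) {     // postfix
        Counter old(*this);       // copy made before the increment; RVO cannot
                                  // remove it, because *this lives on
        ++value_;
        return old;               // NRVO may elide this return copy, but not
                                  // the one above
    }
};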
 

Jack Klein

AFAIUI, the "2's complement" part is not the requirement. The actual
representation is implementation-defined.

Actually, the "2's complement" part is a requirement for the C99
<stdint.h> integer types, and I assume that it will also be a
requirement for the <cstdint> types in C++0x. Specifically:

7.18.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with width
N, no padding bits, and a two’s complement representation. Thus,
int8_t denotes a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with
width N. Thus, uint24_t denotes an unsigned integer type with a width
of exactly 24 bits.

3 These types are optional. However, if an implementation provides
integer types with widths of 8, 16, 32, or 64 bits, it shall define
the corresponding typedef names.

Yes, see paragraph 3 of my quotation, above.
I am not sure about this one. 'long' is required to be at least 32
bits, so if your implementation provides 'long' (as it should, to be
a standard C++ compliant implementation), 'int32_t' would be
typedef'ed to 'long', and provided no matter what. And 'int64_t'
could probably be emulated (using some class). C++ has none of the
limitations C99 has where type implementation is concerned.

Not that the OP is likely to run into it, but I remember from long ago
a Motorola DSP with a 24-bit word size. Typical of DSPs, it could not
address memory in smaller units than a word. So in its C
implementation (this was a long time ago, it had no C++ compiler),
char, short and int all had 24 bits, and long had 48. Since this was
long before C99 and there was no need for long long, this
implementation met the C and C++ standards for all the integer types.

But in C99, and presumably in C++0x as well, it would be invalid for
the compiler to contain a header applying a typedef of "long" as
"int32_t". It would be proper, and required, for this compiler to
typedef long as both int_least32_t and as int_fast32_t, both of which
are required.
We are talking C++ here, not C99.

You can write your own header for any conforming C++ implementation
that supplies the required (u)int_leastN_t and (u)int_fastN_t for at
least the 8, 16, and 32-bit versions. On most desktop C++
implementations today, there is also a type you can use for the
64-bit variants.
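As a rough illustration of such a hand-written header, here is a hedged sketch for a typical 32-bit desktop implementation; the file name and the chosen typedefs are assumptions that must be checked against <climits> for the actual compiler.

// my_stdint.h -- hand-rolled subset of <stdint.h> for a pre-C++0x compiler.
// Assumes 8-bit char, 16-bit short, 32-bit long; verify before use.
#ifndef MY_STDINT_H
#define MY_STDINT_H

typedef signed char    int_least8_t;
typedef unsigned char  uint_least8_t;
typedef short          int_least16_t;
typedef unsigned short uint_least16_t;
typedef long           int_least32_t;
typedef unsigned long  uint_least32_t;

typedef int            int_fast8_t;     // the "fast" choices are a judgment call
typedef int            int_fast16_t;
typedef long           int_fast32_t;

#endif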

And since 99.8 percent of C++ programmers never have and never will
program for a DSP or an antique architecture, they don't really have
to worry about C++ implementations that don't have exact width 8, 16,
and 32-bit 2's complement types.
Why not? Worked well for empty structs, the ternary operator...


--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 

Jerry Coffin

[ ... ]
1) Should one prefer short / long to int in favour of better
compability? Does the sizeof(int) still vary dramatically between
platforms?

With typical current, hosted implementations, sizeof(int) is either 4 or
8. For most of those, long varies exactly the same way int does (i.e.
long and int are the same size).
2) Should one check a pointer for NULL before deleting it? Although
deleting a NULL pointer is said to be safe in C++, I see a lot of code
doing the check before deletion. Is this because they still preserve
the C attitude?

There's no particularly good reason to check whether you have a null
pointer before using delete. OTOH, if the situation arises often enough
to notice, you're probably doing something you shouldn't -- then again,
widespread use of new and delete tends (IMO, anyway) to be an indication
of a problematic design.
3) When is a struct favourable over a class?

Clearly when you're dealing with POD. Even if you add a few things that
might prevent it from being a POD, if you have something that basically
just groups some data together, a struct is often preferable.
4) Should one favor '++iterator' over 'iterator++'? I did some
performance tests myself and did not see a big gap between their
execution times ( by using containers with size 10000000 which contain
objects )

If you're not using the resulting value, yes, you might as well use the
prefix form -- short of malice aforethought, it'll normally be at least
as efficient as the postfix form, and it might be more efficient.
Depending on the type, there may be no difference at all (e.g. for most
built-in types there's rarely any difference between the two).
 

James Kanze

AFAIUI, the "2's complement" part is not the requirement. The actual
representation is implementation-defined.

No. The C standard says: "The typedef name intN_t designates a
signed integer type with width N, no padding bits, and a
two's complement representation." And the current draft of
the C++ standard says (concerning <cstdint>): "The header defines
all functions, types, and macros the same as C99 subclause
7.18."
I am not sure about this one.

Again, from the C standard: "These types [intN_t] are optional.
However, if an implementation provides integer types with widths
of 8, 16, 32, or 64 bits, it shall define the corresponding
typedef names."
'long' is required to be at least 32
bits, so if your implementation provides 'long' (as it should, to be
a standard C++ compliant implementation), 'int32_t' would be
typedef'ed to 'long', and provided no matter what.

Only if long is exactly 32 bits (which it isn't on most of my
machines), and uses 2's complement representation (which isn't
the case on at least one machine currently being sold).
And 'int64_t' could probably be emulated (using some class).
C++ has none of the limitations C99 has where type implementation is
concerned.

What you mean is that it has operator overloading, so you can
emulate the behavior of existing types fairly closely.
Realistically, however, most modern compilers will already
provide long long (from C99 and the current draft), and quite
often <stdint.h>. So if you need 64 bit support, you might as
well use it, with the advantage that on a 64 bit machine, you'll
use the machine instructions, rather than emulating them.
We are talking C++ here, not C99.

Yes, but this part of C99 has already been integrated into the
current draft; it will be part of the next C++ standard, and has
already been implemented by most compilers. If you need a 64
bit type, and you target general purpose machines, you might as
well go ahead and use it.
 

James Kanze

sizeof(int) is implementation-defined and hence no conclusions
can be drawn (rather, it is incorrect to draw any conclusions)
regarding whether it varies across platforms and/or
implementations.

It depends. Formally, the standard guarantees that an int has
at least 16 bits. In practice, it depends on the type of
platform you're targeting; for general purpose machines, you
can pretty much count on 32 bits today.

And of course, short and long come with no stronger guarantees. From
the standard, a short is at least 16 bits, and a long 32; on general
purpose machines, however, only long really varies: 32 bits on older
systems (or when compiling in 32 bit mode), 64 on more recent ones.

There are, of course, exceptions. I expect that some embedded
processors still have 16 bit int's, and at least one mainframe
has 36 bit int's (and 1's complement).
According to the C++ standard, deleting a NULL pointer is
safe. So it is not mandatory to test the pointer for NULL
before deleting it.

And the C standard says that you can safely call free with a
null pointer as well. This may not have worked with some early,
pre-standard C compilers, but I can't imagine it being a problem
today. (I stopped testing for null some time around 1993 or
1994, and I've never encountered any problems.)

[...]
If you have all public members and you want to avoid writing
"public" access specifier in the definition:

It's a question of coding guidelines, really. My rule is
generally to use struct if the class has data members, and all
of them are public, and to use class otherwise. Other people
have other rules, however, and they aren't necessarily wrong
either.

[...]
Preincrements are generally faster than postincrements since
postincrements need to make a copy before doing an increment. No such
copy-construction is necessary for pre-increment.

Have you actually measured the difference? I did, and found no
measurable difference. Obviously, you can design an iterator so
that there will be a difference, but such iterators will
probably cause performance problems elsewhere anyway---the STL
expects iterators to be cheap to copy.

About the most you can say is that it is almost certain that
pre-increment won't be slower than post-increment. So if all
other things are equal, why not use pre-increment. Except, of
course, that all other things rarely are equal---most of us have
to deal with existing code, and in such cases, it is decidedly
better to follow whatever conventions it uses, rather than to
worry about non-existent performance problems.
 

James Kanze

All the ones I use do. At least, my measurements show
no difference in performance.
Post-increment always performs copying. So there is always a difference.
Sure, you won't see it for integers etc.
Always use pre-increment except in situations where you really need
post-increment.

That's the sort of pronouncement an inexperienced amateur might
make. A professional will consider all of the variables
involved. On a green fields project, with no existing code, you
choose one; why not pre-increment? If existing code is present,
you conform to its standards, unless there is a strong advantage
in not doing so. In this case, there is none.
 

James Kanze

The copy is made before the increment, so RVO does not apply.

But the as if rule applies. If the copy is never used, and the
copy constructor has no side effects, the compiler can suppress
it. All of the compilers I know do, at least when the operator
and the copy constructor are visible (i.e. inline or templates).

I've actually measured the performance of the four variants:

for ( Iter it = c.begin() ; it != c.end() ; ++ it ) ...
for ( Iter it = c.begin(), end = c.end() ; it != end ; ++ it ) ...
for ( Iter it = c.begin() ; it != c.end() ; it ++ ) ...
for ( Iter it = c.begin(), end = c.end() ; it != end ; it ++ ) ...

With g++, for *all* of the standard iterators (both forward and
reverse, for vector, deque, list and set), there was no
measurable difference between the first and the third, nor
between the second and the fourth. (There was a measurable
difference, albeit not enormous, between the first and second and
between the third and fourth.)

So the correct answer is: don't worry about it. If you already
have existing code, be consistent, and do what it does.
 

Philip Potter

Victor said:
We are talking C++ here, not C99.

As others have confirmed, they have the same semantics. And so I wouldn't
recommend using the C++ intNN_t types unless you know they are necessary, because
they aren't guaranteed to exist on all C++ implementations (though in practice
will exist more than 90% of the time). int_leastNN_t types will do the same job
and be portable to all C++ implementations. The only cases where you should use
intNN_t are where you *really* need an exact width, such as for binary file I/O,
or device driver code.
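A small, hedged example of that division of labour (the record layout is invented, and real binary I/O also has to worry about byte order and padding, which the sketch ignores): the exact-width type is reserved for an externally imposed format, while ordinary arithmetic uses the least-width type that every conforming implementation must provide.

#include <stdint.h>   // <cstdint> in C++0x; most current compilers ship one or both

// Invented on-disk record header: the file format dictates exactly 32 bits,
// so an exact-width type is justified here.
struct RecordHeader {
    uint32_t magic;
    uint32_t payload_size;
};

// Ordinary arithmetic only needs "at least 32 bits", so the portable
// least-width type does the job everywhere.
int_least32_t checksum(const unsigned char* data, int_least32_t n) {
    int_least32_t sum = 0;
    for (int_least32_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}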

There has been much discussion that because the int_leastNN_t and int_fastNN_t
types are more useful than intNN_t, one of them should have been given the
shorter name and the exact-width types been named int_exactNN_t.
Why not? Worked well for empty structs, the ternary operator...

Perhaps I should say instead that if C++ wishes to deviate from C, it should
have a good reason for doing so. Implicit conversions from void * are one example: in C,
the return value from malloc doesn't have to be cast to the pointer type, while
in C++ the return value from new already has the correct type, so the implicit
conversion would only be needed for legacy code which uses malloc, and otherwise can
introduce subtle bugs.

intNN_t types are not a good example. There is no sensible reason for C++ to
deviate from C here (and the standards committee seems to agree).
 

werasm

D. Susman said:

No comment.
2) Should one check a pointer for NULL before deleting it? Although
deleting a NULL pointer is said to be safe in C++, I see a lot of code
doing the check before deletion. Is this because they still preserve
the C attitude?

Yes, I used to do that prior to knowing that it is
safe to delete a NULL pointer. When I see this...

if( p != NULL ){ delete p; }

... I usually believe that the programmer simply
did not know. But to be honest, nowadays
I never have deletes in my code. I've written
entire projects without calling delete once (because
it's called automatically). I even do global searches
for delete to find the source of problems (as I've
spent many a late night because of pointers).
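The "no deletes" style comes down to giving every dynamically allocated object an owner whose destructor frees it. A minimal sketch, using std::auto_ptr (the standard smart pointer available at the time) and plain value containers; the class and names are invented.

#include <memory>   // std::auto_ptr
#include <string>
#include <vector>

class Widget {
public:
    explicit Widget(const std::string& name) : name_(name) {}
private:
    std::string name_;
};

void useWidgets() {
    std::auto_ptr<Widget> w(new Widget("scratch"));   // freed when w goes out of scope
    std::vector<std::string> names(3, "value");       // plain values: no new/delete at all
    // ... use *w and names ...
}   // no explicit delete anywhere; the destructors do the cleanup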
3) When is a struct favourable over a class?

I would guess for plain old data (no behaviour) or
when all the members can be public, for instance
in a traits or policy class, and you are too lazy to
type public...
4) Should one favor '++iterator' over 'iterator++'? I did some
performance tests myself and did not see a big gap between their
execution times ( by using containers with size 10000000 which contain
objects )

While the possibility exists for things to be optimized,
especially in the case of built-in types, I prefer using ++iterator
consistently. Why pessimize prematurely?

Regards,

Werner
 

James Kanze

Use int, and (per a recent thread here) if you are not
portable, don't waste time pretending you are portable. If
your boss requests portability, actually run and test on
multiple platforms as you code.

More to the point, define portable. Formally, the C++ standard
requires an int to be at least 16 bits, no more. Practically,
if you're targeting general purpose computers, you can count on
32 bits; I think the risk of seeing less today is negligible,
except in embedded systems.

And don't program assuming a specific size. If you need a
specific size for some reason, use the C types defined in <stdint.h>.
Your question contains the latent assumption that you can
never change your code once its 'int's are installed. If you
instead write lots of unit tests, they will check things like
the maximum values for your important numbers, and they will
help you change your variable types when the time comes.

I don't think it's that simple. You probably don't want to
change all of your int's; just those which are used for specific
values. Which means you have to identify what each int is used
for, to decide whether to change it or not.

Ideally, there would be a typedef for each different use of int,
and you'd only have to change the appropriate typedef.
Practically, I don't think I've ever seen that much rigor.
Your remaining int awareness invests in writing tests that
check your program's boundaries.

The first thing you have to do is determine the limits.
Typically, a user won't specify a specific limit; he'll just say
"as big as possible". If pushed, he'll come out with some large
round figure, like 1,000,000; that doesn't mean that he
wants you to reject input of 1,000,001, however. The usual
solution is to consider what you do with the input, determine
the actual limits, and verify those, rejecting input which you
cannot handle. Thus, for example, if you multiply some input by
2, you'll check that it is less than INT_MAX/2, and output an
error message if it isn't.
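A minimal sketch of such a check, assuming the program will double its input; the function name and error handling are invented. The limit comes from what the program does with the value, not from a round figure the user happens to mention.

#include <climits>
#include <iostream>

// Returns true and writes the doubled value if the input can be handled;
// otherwise reports an error instead of overflowing.
bool doubleInput(int input, int& result) {
    if (input > INT_MAX / 2 || input < INT_MIN / 2) {
        std::cerr << "input " << input << " is too large to process\n";
        return false;
    }
    result = 2 * input;
    return true;
}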

And you should also test this in your test suite: test the
maximum you can handle, to verify that you get correct results
with it, and test one more than the maximum, to verify that you
get the error message, and not wrong results. This is difficult
to do with black box testing, although some sort of binary
search could be used. (Test input larger than INT_MAX, to
ensure that it causes an error. Then test INT_MAX/2: it should
either return the correct results, or there should be an error
message. If it returns the correct results, try adding
INT_MAX/4; if it returns an error message, try subtracting
INT_MAX/4. At some point, you should end up at a place where
one value returns the correct results, and adding one to it
causes the error message. I don't know of any test framework
which has support for this sort of test, however.)
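A sketch of that binary search, assuming a hypothetical runProgram() probe that reports whether the program handled a given input correctly or rejected it with an error message; the framework glue and the starting bounds are assumptions.

#include <climits>

// Hypothetical probe, assumed to exist: true if the program produced a
// correct result for 'n', false if it (properly) reported an error.
bool runProgram(long n);

// Find the largest accepted input, so the test suite can then verify that
// runProgram(limit) succeeds and runProgram(limit + 1) is rejected rather
// than silently wrong. Assumes acceptance is monotonic in n.
long findAcceptedLimit() {
    long lo = 0;            // assumed known good
    long hi = LONG_MAX;     // assumed rejected (e.g. larger than INT_MAX)
    while (hi - lo > 1) {
        long mid = lo + (hi - lo) / 2;
        if (runProgram(mid))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}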
No, you should use a smart pointer that wraps all such checks
up for you.

Why? What does a smart pointer buy you, if all it does is an
unnecessary test?

Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.
Next, the line delete(int*)NULL is well-formed and
well-defined to do nothing.

Exactly. So there's no need for the test.
(And note you should use a C++ style cast, in the form
"elaborate_cast<int*>", instead of a simple (int*).)
It's because they don't fold duplication up into smart
pointers. _That_ is preserving the C attitude.

It's because they don't know the rules of the language. And it
has nothing to do with a C attitude, since the rules are exactly
the same for free(), in C.
When you need a data bucket that is itself private to an outer class.

When you need a data bucket, period. I agree that it will
usually be private to an outer class, but there are exceptions.

There are also special cases: I'd guess that most of my POD
classes are used to allow static initialization (and thus avoid
order of initialization issues).

And of course, the choice of the keyword (struct or class) when
defining a class is purely a matter of convention. My own rule
is that I use struct if the class contains at least one data
member, and all data members are public. I've seen or heard of
at least two other conventions, however: struct is used if the
class contains *only* public data, and nothing else, and struct
is used if the class contains only public members, and nothing
else. (Note that this last rule results in a lot of "interface"
classes being defined with the keyword struct.)

As with most conventions, the important thing is to choose one,
and be consistent.
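To make the three conventions concrete, here are invented examples of the sort of class each rule would spell with the struct keyword; none of this comes from the thread.

#include <string>

// Rule 1 (all data members public, functions allowed): struct.
struct Point {
    double x, y;
    double lengthSquared() const { return x * x + y * y; }
};

// Rule 2 (only public data, nothing else): struct.
struct Sample {
    int id;
    double value;
};

// Rule 3 (only public members of any kind): struct -- which is why pure
// "interface" classes come out as structs under this rule.
struct Logger {
    virtual ~Logger() {}
    virtual void log(const std::string& message) = 0;
};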
Favor ++it for almost exactly that reason. Specifically, if it
is a raw pointer the pre- and post-increments will be nearly
the same speed. But if you upgrade it into a big object, the
it++ form will create a new object, copy it, and throw it
away. The compiler might not be able to optimize this away.

This from you? I thought you believed in testing:). Have you
ever seen an actual case where the choice made a significant
difference in program runtime? ("Significant", here, means that
using one, the program did not meet its performance
requirements, and using the other, it did.) All of the tests
I've done indicate that it makes absolutely no difference, and
that the choice is purely arbitrary.

Just to shut people up, I'll use prefix on a green fields
project. But I won't bother to change if existing code uses
postfix.
Now use /el Goog/ and look up "premature optimization". When
you learn C++ you should start by learning clean code and good
design, and don't worry about performance until you find a
real need.

Exactly.

Note here that the original poster is doing exactly what he
should. He was probably told by someone that it made a
difference, but given the well established fact that programmers
are always wrong about what makes a difference, except when
they've actually measured, he tested the assumption. And found
it wrong. All I can say is that his experience corresponds to
mine. (What can make a difference, on the other hand, is saving
end() in a variable, rather than calling the function each time
in the loop. Apparently, even when inlined, compilers have
problems detecting that the function will return the same value
each time, so they reread it from memory, rather than using a
value cached in a register.)
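For reference, the hoisting being described looks like this for a std::vector (a sketch; whether it actually pays off depends on the container and compiler, so measure):

#include <vector>

void process(std::vector<int>& c) {
    // end() called every iteration: correct, but the compiler may reload it
    // from memory each time around.
    for (std::vector<int>::iterator it = c.begin(); it != c.end(); ++it)
        *it += 1;

    // end() hoisted into a local once, which is what the measurements above
    // found to make the (modest) difference.
    for (std::vector<int>::iterator it = c.begin(), end = c.end(); it != end; ++it)
        *it += 1;
}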
 

Kai-Uwe Bux

James said:
D. Susman wrote: [snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.

Why? What does a smart pointer buy you, if all it does is an
unnecessary test?

Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.

Are you serious?

I venture the conjecture that this heavily depends on your code base and on
the problem domain. I did a quick

find -name "*.cc" -exec cat {} \; | grep " delete" | wc
632 3350 23086

Of course, not all of these actually are calls of delete. Anyway, now for
delete this:

find -name "*.cc" -exec cat {} \; | grep " delete" | grep "this" | wc
19 98 774

Upon closer inspection, it turned out that these 19 cases used this only
because of dependent name issues in templates, so the code reads like:

delete [] this->data;

Of all the 632 lines containing delete, only one (sic) read:

delete this;


So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?

[snip]


Best

Kai-Uwe Bux
 

Phlip

Kai-Uwe Bux said:
Of all the 632 lines containing delete, only one (sic) read:

delete this;

How many called delete while 'this' was still above its deletor on the call
stack?
So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?

Another way to say all that is this guideline:

Q: After delete, should you always NULL the deleted pointer?

A: No.

Reason: Delete should generally be the last line of a terminal method of an
object which is itself undergoing destruction.

That's the root principle of RAII, and you can (generally) generate smart
pointers by following it.
 

Phlip

This from you? I thought you believed in testing:).

Testing to design, not to wander the fringes of your performance envelope.

(And I try not to believe "in" anything. Except evolution, global climate
change, supply-side economics, etc...;)
Have you ever seen an actual case where the choice made a significant
difference in program runtime?

Of course not! The style guideline remains - prefer ++x simply because the
programmer is aware it does less, and it interferes less with the order of
events in its expression.

So C++ should have been ++C!
 

James Kanze

James said:
D. Susman wrote: [snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it does is an
unnecessary test?
Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.
Are you serious?

Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated dynamically,
so the question doesn't occur. And entity objects usually (but
not always) manage their own lifetime.
I venture the conjecture that this heavily depends on your
code base and on the problem domain.

And your style. If you're trying to write Java in C++, and
dynamically allocating value objects, then it obviously won't be
true. If you're writing well designed, idiomatic C++, then a
large percentage of your deletes probably will be "delete this".

[...]
So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?

The only "rational" advice would be to use them when
appropriate:). Which depends a lot on context---if you're
using the Boehm collector, for example, you'll probably need
them less than if you aren't. But on the whole, even without
the Boehm collector, I doubt that they'd represent more than
about 10% of your pointers in a well designed application.

They obviously don't apply to entity objects, whose lifetime
must be explicitly managed. And how many other things would you
allocate dynamically? Most of the time I see a lot of smart
pointers, it's for things that shouldn't have been allocated
dynamically to begin with.
 

Kai-Uwe Bux

James said:
James said:
D. Susman wrote: [snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it does is an
unnecessary test?
Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.
Are you serious?

Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated dynamically,
so the question doesn't occur. And entity objects usually (but
not always) manage their own lifetime.

Most of my dynamically allocated objects are used to implement container
like classes (like a matrix class), wrappers like tr1::function, or other
classes providing value semantics on the outside, but where the value is
encoded in something like a decorated graph.

The internally allocated nodes do not manage their own lifetime: they are
owned by the ambient container/wrapper/graph.

And your style. If you're trying to write Java in C++, and
dynamically allocating value objects, then it obviously won't be
true.

I have no idea about Java. My code is heavily template based, uses value
semantics 95% of the time, and new/delete is rather rare (about one delete
in 500 lines of code). In my codebase, the lifetime of an object is managed
by the creator, not by the object itself. Ownership almost never is
transferred. The reason that the object is created dynamically is, e.g.,
that its size was unknown (in the case of an array) or that a client asked
for an entry to be added to a data structure.

If you're writing well designed, idiomatic C++, then a
large percentage of your deletes probably will be "delete this".

I disagree. Could it be that you are thinking of object oriented designs? I
can see that in the OO world most dynamically allocated objects may be
entity objects. But I don't think your claim has any chance of being true
(or even close to true) outside the OO scope. You are ignoring a whole lot
of what can be done beautifully and idiomatically in C++.

[...]
So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?

The only "rational" advice would be to use them when
appropriate:). Which depends a lot on context---if you're
using the Boehm collector, for example, you'll probably need
them less than if you aren't. But on the whole, even without
the Boehm collector, I doubt that they'd represent more than
about 10% of your pointers in a well designed application.

That depends on what you count as a smart pointer. E.g., tr1::function or
boost::any are very close to smart pointers with copy semantics. However,
they clearly do not compete with pointers.

However, by and large, I also found that (smart) pointers rarely ever make
it into client code. When I put a class in my library, it usually provides
value semantics, and in fact, most of my classes do not have virtual
functions or virtual destructors.[1] Thus, client code has no reason to use
dynamic allocation.

They obviously don't apply to entity objects, whose lifetime
must be explicitly managed. And how many other things would you
allocate dynamically?

A whole lot. E.g., very often in math programming, I find myself dealing
with _values_ that are best represented by trees, pairs of trees, trees
with some decoration, or graphs. Implementing those classes requires a
whole lot of dynamic allocation, but in the end that is just some means to
realize a class that has value semantics from the outside. The objects are
then in charge of destroying the internal nodes whose graph structure
encodes the mathematical value of the object. Leaving that to smart
pointers is very helpful in prototyping.

Most of the time I see a lot of smart
pointers, it's for things that shouldn't have been allocated
dynamically to begin with.

I cannot refute that observation. However, that is a function of the code
you are looking at.




[1] I was tempted to suggest "virtual" as a rarely used keyword in the other
thread started by Alf Steinbach :)



Best

Kai-Uwe Bux
 

James Kanze

Kai-Uwe Bux said:
James Kanze wrote:
James Kanze wrote:
D. Susman wrote:
[snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it does is an
unnecessary test?
Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.
Are you serious?
Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated dynamically,
so the question doesn't occur. And entity objects usually (but
not always) manage their own lifetime.
Most of my dynamically allocated objects are used to implement container
like classes (like a matrix class), wrappers like tr1::function, or other
classes providing value semantics on the outside, but where the value is
encoded in something like a decorated graph.
The internally allocated nodes do not manage their own lifetime: they are
owned by the ambient container/wrapper/graph.

That is one of the cases where "delete this" would not be used.
But it accounts for how many delete's, in all? (Of course, in a
numerics application, there might not be any "entity" objects,
in the classical sense, and these would be the only delete's,
even if they aren't very numerous.)
I have no idea about Java. My code is heavily template based,
uses value semantics 95% of the time, and new/delete is rather
rare (about one delete in 500 lines of code).

Curious. Not necessarily about the value semantics; if you're
working on numerical applications, that might be the rule. But
templates at the application level? Without export, it's
unmanageable for anything but the smallest project. (The
companies I work for tend to ban them, for a number of reasons.)
In my codebase, the lifetime of an object is managed
by the creator, not by the object itself.

There are cases where it is appropriate. There are also cases
where the lifetime will be managed by some external entity such
as the transaction manager.
Ownership almost never is transfered. The reason that the
object is created dynamically is, e.g., that its size was
unknown (in the case of an array) or that a client asked for
an entry to be added to a data structure.

O.K. That's a case that is almost always handled by a standard
container in my code. I have entity objects which react to
external events.
I disagree. Could it be that you are thinking of object oriented designs?

More to the point, I'm thinking of commercial applications. I
sort of think you may be right with regards to numerical
applications.
[...]
So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?
The only "rational" advice would be to use them when
appropriate:). Which depends a lot on context---if you're
using the Boehm collector, for example, you'll probably need
them less than if you aren't. But on the whole, even without
the Boehm collector, I doubt that they'd represent more than
about 10% of your pointers in a well designed application.
That depends on what you count as a smart pointer. E.g.,
tr1::function or boost::any are very close to smart pointers
with copy semantics. However, it clearly does not compete with
pointers.

I'm not sure I agree. I'd be tempted to say that if you can't
dereference it, it isn't a smart pointer. STL iterators are
smart pointers because they support dereferencing. Still, in a
commercial application, *most* pointers are used for navigation
between entity objects. You rarely iterate; you recover the real
pointer from the return value of std::map<>::find almost
immediately, etc.
However, by and large, I also found that (smart) pointers
rarely ever make it into client code. When I put a class in my
library, it usually provides value semantics, and in fact,
most of my classes do not have virtual functions or virtual
destructors.[1] Thus, client code has no reason to use dynamic
allocation.

Are you writing libraries? Obviously, something like
std::vector<> won't use delete this for the memory it manages.
Something that primitive probably won't use a classical smart
pointer, either, but I guess more complex containers might.

In the applications I work on, of course, such low level library
code represents something like 1% or 2% of the total code base.
And for the most part, we don't write it; the standard
containers are sufficient (with wrappers, in general, to provide
a more convenient interface).
A whole lot. E.g., very often in math programming, I find myself dealing
with _values_ that are best represented by trees, pairs of trees, trees
with some decoration, or graphs. Implementing those classes requires a
whole lot of dynamic allocation, but in the end that is just some means to
realize a class that has value semantics from the outside. The objects are
then in charge of destroying the internal nodes whose graph structure
encodes the mathematical value of the object. Leaving that to smart
pointers is very helpful in prototyping.

I think that's the difference. I guess you could say that my
code also contains a lot of trees or graphs, but we don't think
of them as such; we consider it navigating between entity
objects---the objects have actual behavior in the business
logic. And the variable sized objects (tables, etc.) are all
handled by standard containers.
I cannot refute that observation. However, that is a function
of the code you are looking at.

Certainly. I also see a lot of code in which there is only one
or two deletes in a million lines of code; the value types are
all copied (and either have fixed size, or contain something
like std::string), and the entity types are managed by a single
object manager. In many cases, the architecture was designed
like this just to avoid "delete this", but the delete request to
the object manager is invoked from the object that is to be
deleted---which means that it's really a delete this as well.
And in my last job, the application worked with a fixed set of
entity objects, and I don't think that there was any new/delete
outside of initialization and shutdown code; given the behavior
of the objects, we really could have skipped the delete in the
shutdown, and had code without a single delete. (In our code;
the application did use std::set a lot for secondary indexes,
with a lot of insert/erase, since the secondary indexes were
mutable values. So there was actually a lot of use of dynamic
memory. Just not in our code.)
 

Ian Collins

James said:
Curious. Not necessarily about the value semantics; if you're
working on numerical applications, that might be the rule. But
templates at the application level? Without export, it's
unmanageable for anything but the smallest project. (The
companies I work for tend to ban them, for a number of reasons.)
That's a strange thing to say; in most code bases I've worked with, and
in my own, templates are widespread. I don't see where unmanageable
comes from.

Then again, I can't remember the last time I used "delete this", the
current fairly large project I'm working with doesn't use it at all.
 

James Kanze

Ian said:
James Kanze wrote:
That's a strange thing to say, most code bases I've worked with and in
my own, templates are widespread. I don't see where unmanageable comes
from.

Using templates (e.g. std::vector) is no problem. Defining a
template at the application level, on the other hand, results in
serious source code coupling; without export, the slightest
change in the *implementation* requires recompiling all client
code. So in general, templates are only allowed in code that
"won't be changed"; ultra stable low level components.
Then again, I can't remember the last time I used "delete
this", the current fairly large project I'm working with
doesn't use it at all.

Maybe you're just hiding it behind an object manager, or some
such. If the dynamic memory is used for dynamic sizing, I don't
know---in my code, all of the dynamically sized objects are
based on standard containers, and the new/delete is in the
standard containers, not in my code. If the dynamic memory is
used for explicit lifetime management, then managing the
lifetime of the object is normally part of that object's
behavior, in which case, "delete this" is the clearest and most
easily understood way of specifying this.
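For readers unfamiliar with the idiom, a minimal sketch of an entity object that manages its own lifetime; the class and the triggering event are invented. The private destructor keeps clients from creating instances whose lifetime they would then have to manage themselves.

class Connection {
public:
    static Connection* create() { return new Connection; }

    void onPeerClosed() {        // external event that ends this object's life
        // ... notify interested parties, release resources ...
        delete this;             // lifetime management is part of the behaviour
    }

private:
    Connection() {}
    ~Connection() {}             // private: no stack or member instances, so
                                 // delete this is the only way out
};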
 

Kai-Uwe Bux

James said:
Kai-Uwe Bux said:
James Kanze wrote:
James Kanze wrote:
D. Susman wrote:
[snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it does is an
unnecessary test?
Don't forget, too, that most delete's are in fact "delete this".
And "this" cannot be a smart pointer.
Are you serious?
Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated dynamically,
so the question doesn't occur. And entity objects usually (but
not always) manage their own lifetime.
Most of my dynamically allocated objects are used to implement container
like classes (like a matrix class), wrappers like tr1::function, or other
classes providing value semantics on the outside, but where the value is
encoded in something like a decorated graph.
The internally allocated nodes do not manage their own lifetime: they are
owned by the ambient container/wrapper/graph.

That is one of the cases where "delete this" would not be used.
But it accounts for how many delete's, in all? (Of course, in a
numerics application, there might not be any "entity" objects,
in the classical sense, and these would be the only delete's,
even if they aren't very numerous.)

It's not just numerics. But numerics applications are definitely a very good
example of what I had in mind. I think that a lot of scientific computing
looks like this.

Curious. Not necessarily about the value semantics; if you're
working on numerical applications, that might be the rule. But
templates at the application level? Without export, it's
unmanageable for anything but the smallest project. (The
companies I work for tend to ban them, for a number of reasons.)

Indeed, my programs are _very_ small, something between 50000 and 100000
lines after preprocessing. Moreover, all an application usually does is
read data from stdin, perform some (highly involved and complicated)
computation, and write the results to stdout. Here is an example of a
complete application:

// scx_homology.cc (C) Kai-Uwe Bux [2006]
// ======================================

#include "kubux/sequenceio"
#include "kubux/set_of_set"
#include "kubux/matrix"
#include "kubux/homology"
// #include "kubux/integer"

#include <iostream>
#include <iterator>
#include <vector>

// typedef kubux::integer Integer;
typedef long Integer;
typedef kubux::matrix< Integer > IntMatrix;
typedef std::vector< IntMatrix > ChainComplex;
typedef kubux::set_of_set< int > SimplicialComplex;
typedef std::vector< kubux::AbelianGroup< Integer > > Homology;

int main ( void ) {
    SimplicialComplex cx;
    while ( std::cin >> cx ) {
        ChainComplex ch =
            kubux::chain_complex< ChainComplex >
                ( kubux::homotopy_simplify( cx ) );
        Homology hom;
        kubux::copy_homology( ch.begin(), ch.end(),
                              std::inserter( hom, hom.begin() ) );
        while ( ( ! hom.empty() )
                && ( hom.back().first == 0 )
                && ( hom.back().second.empty() ) ) {
            hom.pop_back();
        }
        std::cout << hom << '\n';
    }
}

// end of file

As you can see, it's just a trivial filter; and all the real code is in the
library. That, in turn, is templated for flexibility. E.g., the matrix
class is supposed to work just as nicely with infinite precision integers,
and an algorithm picking out the maximal elements (with respect to some
partial order) from a sequence should be generic.

As you have figured, it is somewhat like number crunching (except that I am
dealing more with topological and combinatorial algorithms, so enumerating
all objects of a given size and type is a typical thing that happens in my
code).


Now, with respect to huge applications, I see that templates are an issue.
On the other hand, I thought, that is what nightly builds are for: You have
a bug to fix, you locate it, you add a unit test for the failing component
that displays the bug without using all the unrelated crap from the huge
ambient application; and then you work on that component until it passes
all tests. After a commit to the code base, the huge application is rebuilt
over night and all automatic tests are run. Working on your component in
isolation, you still have short edit-compile-test cycles.

There are cases where it is appropriate. There are also cases
where the lifetime will be managed by some external entity such
as the transaction manager.


O.K. That's a case that is almost always handled by a standard
container in my code. I have entity objects which react to
external events.



More to the point, I'm thinking of commercial applications. I
sort of think you may be right with regards to numerical
applications.

By "commercial", do you mean "software for sale" or "software used in the
sales department" :)

I agree that programs that act in complex environments and have to respond
to thousand different kind of events will use objects to model the world
they operate in (I am thinking of transactions between banks, simulations,
GUI, games, etc). On the other hand, programs that perform highly
complicated transformations in batch mode are likely to be different. That
would go for most of number crunching, scientific programming, compilers,
symbolic computation, combinatorial optimization, and so on. I expect the
code for those to be more similar to mine than to yours. There are programs
for sale in all these categories (but I would not expect a typical sales
department to make heavy use of a PDE solver).

I think the difference is not commercial versus non-commercial, but more
whether your application is event-driven or has the classical (ancient?)
parse_input....write_output format.

[...]
So here is a question: given that use case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?
The only "rational" advice would be to use them when
appropriate:). Which depends a lot on context---if you're
using the Boehm collector, for example, you'll probably need
them less than if you aren't. But on the whole, even without
the Boehm collector, I doubt that they'd represent more than
about 10% of your pointers in a well designed application.
That depends on what you count as a smart pointer. E.g.,
tr1::function or boost::any are very close to smart pointers
with copy semantics. However, they clearly do not compete with
pointers.

I'm not sure I agree. I'd be tempted to say that if you can't
dereference it, it isn't a smart pointer.

That's why I said "close". I agree that the term smart pointer should be
reserved for something you can dereference.
STL iterators are
smart pointers because they support dereferencing. Still, in a
commercial application, *most* pointers are used for navigation
between entity objects. You rarely iterate; you recover the
real pointer from the return value of std::map<>::find almost
immediately, etc.

That just says that you need different smart pointers :)

Think of a smart pointer that does not interfere with lifetime but helps
with the typical problems when pointers are used for navigation. E.g., you
can wrap the observer pattern into a smart pointer so that all those
objects that have a handle to a potentially suicidal one get notified just
before it jumps off the cliff.
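A hedged sketch of that kind of non-owning, notification-aware handle: the pointee keeps a registry of trackers and nulls them out from its destructor. All names are invented, and copying, threading and the observer-style callbacks of a real implementation are left out.

#include <cstddef>
#include <set>

class TrackedPtr;

// Base class for "potentially suicidal" objects: when one dies, every
// registered TrackedPtr is told that its target is gone.
class Trackable {
    friend class TrackedPtr;
    std::set<TrackedPtr*> trackers_;
protected:
    ~Trackable();
};

// Non-owning pointer used for navigation; it never deletes, it only
// forgets the target when the target jumps off the cliff.
class TrackedPtr {
    friend class Trackable;
    Trackable* target_;
    void invalidate() { target_ = NULL; }
public:
    explicit TrackedPtr(Trackable* t) : target_(t) {
        if (target_) target_->trackers_.insert(this);
    }
    ~TrackedPtr() {
        if (target_) target_->trackers_.erase(this);
    }
    Trackable* get() const { return target_; }
private:
    TrackedPtr(const TrackedPtr&);             // copying omitted in this sketch
    TrackedPtr& operator=(const TrackedPtr&);
};

inline Trackable::~Trackable() {
    for (std::set<TrackedPtr*>::iterator it = trackers_.begin();
         it != trackers_.end(); ++it)
        (*it)->invalidate();
}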

However, by and large, I also found that (smart) pointers
rarely ever make it into client code. When I put a class in my
library, it usually provides value semantics, and in fact,
most of my classes do not have virtual functions or virtual
destructors.[1] Thus, client code has no reason to use dynamic
allocation.

Are you writing libraries?

Well, that is the way the code organizes naturally. Most of it goes into
libraries that provide an abstract (usually templated) data type or some
transformation from one type to another. The actual applications are
trivial filters.

Obviously, something like
std::vector<> won't use delete this for the memory it manages.
Something that primitive probably won't use a classical smart
pointer, either, but I guess more complex containers might.

I don't really like smart pointers there either. However, they are really
handy in getting a prototype up and running, which is a good thing during
the design phase when you are experimenting with the interface and whip up
the initial test cases. When the design is stabilizing, I tend to first
replace smart pointers (and raw pointers) by pointer wrappers that support
hunting double deletion and memory leaks, and finally by pointer wrappers
that wrap new and delete and provide hooks for an allocator to be specified
by the client code.
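A minimal sketch of the first kind of wrapper mentioned above, one that hunts double deletion and leaks; it is illustrative only, and a real one would also record allocation sites and handle copying.

#include <cassert>

// Owning wrapper used while stabilizing a design: flags double deletion,
// and the destructor flags pointers that were never freed.
template <typename T>
class CheckedPtr {
    T*   p_;
    bool deleted_;
public:
    explicit CheckedPtr(T* p = 0) : p_(p), deleted_(p == 0) {}
    ~CheckedPtr() { assert(deleted_ && "leak: destroy() never called"); }

    T& operator*() const  { assert(!deleted_); return *p_; }
    T* operator->() const { assert(!deleted_); return p_; }

    void destroy() {
        assert(!deleted_ && "double deletion");
        delete p_;
        p_ = 0;
        deleted_ = true;
    }
private:
    CheckedPtr(const CheckedPtr&);             // copying not handled here
    CheckedPtr& operator=(const CheckedPtr&);
};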

In the applications I work on, of course, such low level library
code represents something like 1% or 2% of the total code base.
And for the most part, we don't write it; the standard
containers are sufficient (with wrappers, in general, to provide
a more convenient interface).

I forgot to mention one other reason to use T* instead of T: in template
programming, the first is available for incomplete T. For instance, there
are two obvious implementations of the box container (a box can be empty or
contain a single item; I think such a container is sometimes called
fallible or optional). One implementation has a T* data field and the other
has a T data field. The first will work with incomplete T; the second won't.
When doing template programming, one has to be aware of the conceptual
requirements created by an implementation approach. Sometimes, that forces
or suggests dynamic allocation.
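A sketch of the two box implementations being contrasted; only the pointer-based one can be instantiated while T is still incomplete, which is what matters for, say, recursive data structures. Names are invented and copying is omitted.

// Pointer-based box: T may be incomplete when the template is instantiated,
// because only a T* is stored; T must be complete only where it is used.
template <typename T>
class BoxByPointer {
    T* item_;
public:
    BoxByPointer() : item_(0) {}
    explicit BoxByPointer(const T& t) : item_(new T(t)) {}
    ~BoxByPointer() { delete item_; }
    bool empty() const { return item_ == 0; }
private:
    BoxByPointer(const BoxByPointer&);         // copying omitted for brevity
    BoxByPointer& operator=(const BoxByPointer&);
};

// Value-based box: no dynamic allocation, but T must be complete (and, in
// this naive version, default-constructible) as soon as the class is used.
template <typename T>
class BoxByValue {
    T    item_;
    bool full_;
public:
    BoxByValue() : item_(), full_(false) {}
    explicit BoxByValue(const T& t) : item_(t), full_(true) {}
    bool empty() const { return !full_; }
};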

I think that's the difference. I guess you could say that my
code also contains a lot of trees or graphs, but we don't think
of them as such; we consider it navigating between entity
objects---the objects have actual behavior in the business
logic. And the variable sized objects (tables, etc.) are all
handled by standard containers.

Spot on. That is the difference. My objects usually do not have any behavior
at all. They just have values, which can change.

Certainly. I also see a lot of code in which there is only one
or two deletes in a million lines of code; the value types are
all copied (and either have fixed size, or contain something
like std::string), and the entity types are managed by a single
object manager. In many cases, the architecture was designed
like this just to avoid "delete this", but the delete request to
the object manager is invoked from the object that is to be
deleted---which means that it's really a delete this as well.

I did not want to argue for or against "delete this". I can see how this
idiom is useful. I was just flabbergasted by your claim that most deletes
are of this form. But now, I can see where you were coming from.

However, it is somewhat funny that "delete this" looks scary enough that
people invent roundabout ways to avoid it.

[snip]


Best

Kai-Uwe Bux
 

James Kanze

Kai-Uwe Bux said:
James said:
Kai-Uwe Bux said:
James Kanze wrote:
James Kanze wrote:
D. Susman wrote:
[snip]
2)Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it
does is an unnecessary test?
Don't forget, too, that most delete's are in fact
"delete this". And "this" cannot be a smart pointer.
Are you serious?
Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated
dynamically, so the question doesn't occur. And entity
objects usually (but not always) manage their own
lifetime.
Most of my dynamically allocated objects are used to
implement container like classes (like a matrix class),
wrappers like tr1::function, or other classes providing
value semantics on the outside, but where the value is
encoded in something like a decorated graph.
The internally allocated nodes do not manage their own
lifetime: they are owned by the ambient
container/wrapper/graph.
That is one of the cases where "delete this" would not be used.
But it accounts for how many delete's, in all? (Of course, in a
numerics application, there might not be any "entity" objects,
in the classical sense, and these would be the only delete's,
even if they aren't very numerous.)
It's not just numerics. But numerics applications are
definitely a very good example of what I had in mind. I think
that a lot of scientific computing looks like this.

Yes. I think the difference is that you're using the computer
to calculate. I've worked in a number of different domains,
but in all cases, while there was some calculation, the computer
was mainly being used to process large data sets in some
systematic and logical way. The calculations were only a very
small part of the application.

[...]
As you can see, it's just a trivial filter; and all the real
code is in the library. That, in turn, is templated for
flexibility. E.g., the matrix class is supposed to work just
as nicely with infinite precision integers, and an algorithm
picking out the maximal elements (with respect to some partial
order) from a sequence should be generic.

Presumably, too, the matrix class is very, very stable.

In at least some such use, maintenance isn't that important
either; once you've gotten the results from a program, you don't
use it any more. (That has been the case in the few somewhat
distant contacts I've had with such software; the basic library
is a constant---maintained, but highly stable---and the
individual applications usually run just once or twice. But my
contacts with this type of software are few enough that I doubt
they have any statistical significance.)
As you have figured, it is somewhat like number crunching
(except that I am dealing more with topological and
combinatorial algorithms, so enumerating all objects of a
given size and type is a typical thing that happens in my
code).
Now, with respect to huge applications, I see that templates
are an issue. On the other hand, I thought, that is what
nightly builds are for: You have a bug to fix, you locate it,
you add a unit test for the failing component that displays
the bug without using all the unrelated crap from the huge
ambient application; and then you work on that component until
it passes all tests. After a commit to the code base, the huge
application is rebuilt over night and all automatic tests are
run. Working on your component in isolation, you still have
short edit-compile-test cycles.

Nightly builds suppose that you can compile the entire
application, on all target platforms, overnight. That's not
necessarily the case.

Most of the places I've worked at do try and do a weekly build,
over the week-end, but I've worked on projects large enough that
even that required some optimization (linking on the servers,
compiling in parallel on the hundreds of workstations connected
to the network, etc.).

Where the problem really hits is the individual developer. Who
needs to run systematic unit tests for every small modification.
Touching a header which contains a template which he uses (in a
library) may trigger a recompilation of an hour or so, rather
than just a couple of minutes.

[...]
By "commercial", do you mean "software for sale" or "software
used in the sales department" :)

Software used for commercial applications. Not just the sales
department, but yes, software dealing with external entities
such as customers, products or employees.
I agree that programs that act in complex environments and
have to respond to thousand different kind of events will use
objects to model the world they operate in (I am thinking of
transactions between banks, simulations, GUI, games, etc). On
the other hand, programs that perform highly complicated
transformations in batch mode are likely to be different.

Exactly. Except that in the commercial world, the programs
which perform transformations in batch mode are still usually
written in Cobol, not in C++:). And even in batch mode, it's
often relevant to think in terms of behavior of specific
entities.

This is, of course, what I mean when I speak of an object having
an explicit lifetime. A "CustomerOrder" doesn't belong to any
other entity in the application; it has an explicit lifetime,
based on external events.
That would go for most of number crunching, scientific
programming, compilers, symbolic computation, combinatorial
optimization, and so on. I expect the code for those to be
more similar to mine than to yours. There are programs for
sale in all these categories (but I would not expect a typical
sales department to make heavy use of a PDE solver).

The closest I've gotten to that is when I wrote a compiler.
Thinking back on it... It was long enough ago to be in C, but
even in C++, I think that yes, things like a parse tree would
have a lifetime which was managed by some external object or
condition; logically, perhaps, the parse tree might even have a
"automatic" lifetime, but since it's size and structure are
very, very dynamic, of course, dynamic allocation would have to
be used.
I think the difference is not commercial versus
non-commercial, but more whether your application is
event-driven or has the classical (ancient?)
parse_input....write_output format.

I think you've hit on it. There are doubtlessly exceptions;
classical batch applications which do some sort of event
simulation in their processing, or commercial batch applications
which implement business logic over business entities, for
example. But by and large, your characterization probably
holds.
Think of a smart pointer that does not interfere with life
time but helps with the typical problems when pointers are
used for navigation. E.g., you can wrap the observer pattern
into a smart pointer so that all those objects that have a
handle to a potentially suicidal one get notified just before
it jumps off the cliff.

In theory. In practice, it tends to be more complicated; the
smart pointer isn't sufficient, and once you've implemented the
additional stuff, it isn't necessary. Thus, for example, in the
observer pattern, the observable normally doesn't have a
"pointer" to the observer, but a container of pointers. And
when one of the observables commits suicide, not removing the
pointer from the container will result in a memory leak.

Some fifteen years ago, when I started C++, there was a lot of
discussion (at least where I was working) about relationship
management, and a lot of effort was expended trying to find a
good generic solution, so that you didn't have to write so much
code by hand, each time around. As far as I know, however, no
good generic solution was ever found.

[...]
I don't really like smart pointers there either.

It's not that I don't like them; when they are appropriate, I
don't hesitate using them. I don't like them being presented as
a silver bullet, as they so often are. Nor do I like the fact
that many people are suggesting that you should never use raw
pointers, or that there is one magical smart pointer
(boost::shared_ptr) that will solve all (or even most) of your
problems.
However, they are really handy in getting a prototype up and
running, which is a good thing during the design phase when
you are experimenting with the interface and whip up the
initial test cases. When the design is stabilizing, I tend to
first replace smart pointers (and raw pointers) by pointer
wrappers that support hunting double deletion and memory
leaks, and finally by pointer wrappers that wrap new and
delete and provide hooks for an allocator to be specified by
the client code.

Interesting. I use external tools for much of this. For memory
management within the process, I usually use the Boehm
collector; why should I have to worry about which smart pointer
to use, or how to break a cycle, *IF* the object doesn't have an
explicit lifetime (i.e. if there is no explicit behavior
associated with its ceasing to exist). For the rest, I'll use
Purify or valgrind, or even my own debugging new/delete.
I forgot to mention one other reason to use T* instead of T:
in template programming, the first is available for incomplete
T. For instance, there are two obvious implementation of the
box container (a box can be empty or contain a single item; I
think such a container is sometimes called fallible or
optional). One implementation has a T* data field and the
other has a T data field. The first will work with incomplete
T the second won't. When doing template programming, one has
to be aware of the conceptual requirements created by an
implementation approach. Sometimes, that forces or suggests
dynamic allocation.

Very good point. Writing good, generic templates is difficult,
and typically, you do end up violating a number of rules that
would apply elsewhere. The results are often much more
difficult to understand, as well. Yet another reason why a lot
of companies don't like templates at the application level. (In
most of my applications, the majority of the application
programmers are domain specialists, and not C++ or software
engineering specialists. My role in such projects is usually to
handle such low level or generic stuff in such a way that the
application programmers don't have to worry about it.)

[...]
I did not want to argue for or against "delete this". I can
see how this idiom is useful. I was just flabbergasted by your
claim that most deletes are of this form. But now, I can see
where you were coming from.

Yes. There was also some intentional hyperbole in my statement.
There are definitely cases where "delete this" isn't the rule.
I've worked on business systems, for example, where all of the
actual deletes of entity objects were in the Transaction
object---an on-stack object which acted as a "temporary" owner
for objects involved in the transaction. The object was
"logically" destructed during the transaction, but the actual
delete only occurred at commit. It's very difficult to roll back an
object that has really been deleted. (Of course, this might be
handled by a delete this in the commit function of the object.)
However, it is somewhat funny that "delete this" looks scary
enough that people invent roundabout ways to avoid it.

In a very real sense, there's something scary about any delete;
you have to be very sure that no one else is using the object,
and that all concerned parties are notified. Delete this is
really no different in this respect. And most of the time I've
seen people try to avoid it, per se, they end up just hiding
it---obfuscating the potential problems; there is no fundamental
difference between "delete this" and
"ObjectManager::instance().removeObject( this )", except that it
is far more explicit in the first case that the object won't
exist after the statement.
 
