Pass-by-reference instead of pass-by-pointer = a bad idea?

Paul Groke

Steven T. Hatton wrote:
Most C++ books recommend references whenever possible, according to the
general perception that references are "safer and nicer" than pointers. In
contrast, at Trolltech, we tend to prefer pointers because they make the
user code more readable. Compare:

color.getHsv(&h, &s, &v);
color.getHsv(h, s, v);


Only the first line makes it clear that there's a high probability that h,
s, and v will be modified by the function call.

Hm. I think if the semantics is "return value" then it should be a
return value grammatically too - i.e. return some HSV struct or
similar. If there is some strong reason to pass in a pointer/ref
and store the return value there, then I tend to agree that it's
easier to read if it's a pointer.
However in this case the name "getHsv" makes it pretty clear too
what's going to happen...
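
A minimal sketch of the "return it as a value" alternative, assuming a hypothetical Hsv struct and a hypothetical free function toHsv (neither is Qt's actual API; the integer conversion is only illustrative):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical value type used only for this illustration.
struct Hsv {
    int h;  // hue in degrees, 0..359 (0 when the colour is grey)
    int s;  // saturation, 0..255
    int v;  // value, 0..255
};

// Converts 8-bit RGB to HSV and returns the result by value,
// so the call site has no out-parameters at all.
Hsv toHsv(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    const int maxc = std::max(r, std::max(g, b));
    const int minc = std::min(r, std::min(g, b));
    const int delta = maxc - minc;

    Hsv out = {0, 0, maxc};
    if (delta == 0) return out;  // black, white or grey: hue is arbitrary, use 0

    out.s = 255 * delta / maxc;
    int h;
    if (maxc == r)      h = 60 * (g - b) / delta;
    else if (maxc == g) h = 120 + 60 * (b - r) / delta;
    else                h = 240 + 60 * (r - g) / delta;
    out.h = (h + 360) % 360;  // fold negative hues into 0..359
    return out;
}
```

A caller would then write something like Hsv c = toHsv(255, 0, 0); and read c.h, c.s and c.v, which makes the data flow visible without relying on '&' at the call site.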

And when it's const, I'd never use a pointer if I don't have to.
If the function needs an object and cannot deal with a null
pointer/ref, then why should it check for null? If the client
passes in some value by dereferencing a pointer, then it's clearly
the client's responsibility to ensure that this pointer can never
be null...
I know that dealing with non-perfect but real-life programmers
isn't easy, but I think one has to draw a line somewhere. If
a programmer isn't able to understand the simple fact that he/she
can't dereference a null pointer then he/she really shouldn't be
a programmer.

And: how do your functions handle situations where the client
passes in a null pointer for something that may not be null?
assert, abort, throw (std::invalid_argument, ...?)?
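
For concreteness, here is a rough sketch of two of the options named in that question, assert and throw. The function names are made up for illustration, and it uses nullptr, so it assumes C++11 or later:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper; precondition: text is non-null and NUL-terminated.
static std::size_t countSpacesUnchecked(const char* text) {
    std::size_t n = 0;
    for (; *text != '\0'; ++text) {
        if (*text == ' ') ++n;
    }
    return n;
}

// Option 1: treat null as a caller bug. Checked in debug builds only;
// with NDEBUG defined the assert vanishes and a null argument is
// undefined behaviour.
std::size_t countSpacesAsserting(const char* text) {
    assert(text != nullptr);
    return countSpacesUnchecked(text);
}

// Option 2: treat null as a reportable error in every build.
std::size_t countSpacesThrowing(const char* text) {
    if (text == nullptr) {
        throw std::invalid_argument("countSpacesThrowing: text must not be null");
    }
    return countSpacesUnchecked(text);
}
```

Which option fits depends on whether a null argument is considered a programming error or a condition the caller is expected to recover from.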
 
Steven T. Hatton

David said:
There's no mention of segfaulting in the standard. The behaviour is
undefined. That means that anything can happen, including the function
executing without any apparent problem.

DW

So what happens when you run it?

BTW, just as an aside, 'NULL' is the name of a macro #defined as '0'. A
pointer T* ptr; such that NULL == ptr; is an object of type 'pointer to T'.
'NULL' is, IMO, not appropriate for use as an adjective.

http://www.research.att.com/~bs/bs_faq2.html#null
 
Stuart MacMartin

Sheesh.

The caller could just as easily do:

int i;
int* pI = &i + 1;
f(i);

or

int* i = new int;
delete i;
f(i);

C++ gives you all sorts of ways of doing something bad, and there are
lots of bad pointers other than 0.

I claim passing by reference helps keep the noise level down and makes
it clear where the valid pointer test should be made. Passing by
pointer muddies the water: whose responsibility is it?

Stuart
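
One way to read that claim, sketched with hypothetical names (findCounter, bump and increment are invented for this illustration, and the code assumes C++11): the pointer is tested once, at the boundary where it can actually be null, and everything downstream takes a reference.

```cpp
#include <map>
#include <string>

// Inner routine takes a reference: it has no null check to make.
void increment(int& counter) { ++counter; }

// A lookup that can fail, so it reports failure with a null pointer.
int* findCounter(std::map<std::string, int>& counters, const std::string& name) {
    auto it = counters.find(name);
    return it == counters.end() ? nullptr : &it->second;
}

// The single place where the pointer is tested before being dereferenced.
bool bump(std::map<std::string, int>& counters, const std::string& name) {
    int* p = findCounter(counters, name);
    if (p == nullptr) return false;  // nothing to increment
    increment(*p);                   // safe: p was just checked
    return true;
}
```

Under this convention the responsibility question has one answer: whoever holds the pointer checks it before turning it into a reference.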
 
David White

Steven said:
So what happens when you run it?

It doesn't matter what happens when I run it. All that matters is what the
standard allows to happen, and that is anything at all. You can't assume
that because it segfaults on a particular platform using a particular
compiler that it will do so for every platform and compiler, or even that
the program that segfaulted for you will do so the next time you run it.
Some computers (e.g., Intel 8085, 8086) are not even capable of segfaulting.
Instead, they are likely to just grab the value at address 0 (or whatever
address is used as the null pointer) and happily execute the function.

DW
 
Steven T. Hatton

Stuart said:
Well I suppose most of this comes down to local convention. I prefer to
work with references, and allow a semantic difference between a
parameter passed by reference rather than passed by pointer.

I suspect that could lead to problems if you apply that to code from outside
your local community. It would also seem wrong to criticize others who do
not share your conventional semantics.
Your comment:
Because code that does use pointers, and proper error checking won't
crash due to a null pointer, but code that uses references to pass
variables *can* crash because of a null pointer being passed to a
reference. So far as I know, there is no way to protect yourself from
that, other than checking for null before dereferencing.

And that's precisely the point.

I figure you should check for null before dereferencing (or otherwise
using what a pointer points to), and once you've safely dereferenced
continue to use that reference. You feel you should not trust your
caller, so insist on checking the pointer over and over again - and
instead of trusting your callers like I do, you trust the people who
write the routines you call.

Well, I have the advantage that I currently use only open source, so I can
verify, as well as trust. But, the fact of the matter is, I don't really
need to trust the function I call, other than to worry about subsequent
changes. If the function has a default null pointer parameter, I am pretty
safe passing a null pointer to it.
Since you'd check that the address of a
reference isn't 0, you see no semantic difference between passing a
pointer and a reference.

Actually, I can't check that the address of a reference is 0, but I can
assume it is non-zero in view of the fact that it makes no sense to have a
reference to '\0'.
Just out of curiosity: do you do dynamic_cast of all your variables to
make sure the types are the same as the parameters you specified? This
might sound snide, but I've had people downcast or force a cast
incorrectly, with problems only showing up when the data model changes
significantly or subtly on rare data. If you can't believe they are
passing a reference to an object when you ask for a reference, how can
you believe that they are passing the correct kind of object?

Actually, there are some circumstances when the caller wouldn't really be
"wrong" to pass a parameter different than the one I handle.

virtual void operator()(osg::Node* node, osg::NodeVisitor* nv) {
    if (_doRefresh) {
        if (PAT_T* pat = dynamic_cast<PAT_T*>(node)) {
            _refresh(pat);
            _doRefresh = false;
        }
    }
    traverse(node, nv);
}

Note that the code above is rather standard for 3D animation where
performance is critical. But this is called once per traversal. OTOH, if
the if() condition is satisfied, the call to _refresh() will result in
thousands of operations.
It's
easy to fool the compiler into allowing this. The compiler can help
us, but only to a point. Eventually someone has to understand what a
routine is supposed to do before they call it.

Another thing that puzzles me in all of this: doesn't anyone use a
development environment? How can you not know the parameters and
whether they are passed by value or pointer or reference, const or
non-const?

Well, I use Emacs and KDevelop. Typically the technique I use to know what
the function declaration looks like is called RTFS. Don't get me wrong, I
believe IDEs are very valuable. It's one of the greatest weaknesses of C++
that creating good IDEs is extremely difficult. There are several nice
ones for Java, and monodevelop seems to be shaping up nicely. I'm hoping
to figure out how to provide some additional features to KDevelop's C++
part.

Sadly, it would seem Borland is not going to be able to keep pace with their
primary competitor. This is mostly due to the fact that Borland doesn't
control the platform that holds 90% or more of the desktop OS market share,
while their competitor does.
 
niklasb

Steven said:
Larry said:
Steven T. Hatton wrote: [...]
AFAIK, you *can* pass a null to func() in your example, and the compiler
will accept it. Your code will segfault when you do so. If you pass a
pointer, you can check for null, before accessing it. I, therefore,
suggest using a pointer instead of a reference.

There is no such thing as a null reference. However, see below.
If a function takes a ref, then NULL can not be passed.
Attempting to pass NULL causes a compile error.

For example:
[snip]
func(0); // causes a compile error

In your example, you are not trying to pass null, you are trying to pass a
literal. That results in the attempt to create a temporary object of type
int and assign it to the non-const reference.

True, 0 or NULL is the null pointer constant, which is not the same
thing as a null pointer.
This, however, will compile:

#include <string>
std::string func(std::string& s) { return s; }
int main() {
    std::string* s(0);
    func(*s);
}

True, this will compile, and will result in undefined behavior.
However, the undefined behavior is not in func for trying to use a "null
reference" (there being, again, no such thing) but in main for
dereferencing a null pointer.

"But," you may say, "main doesn't dereference the pointer. The
seg fault [or whatever] occurs in func!"

First, it is not unusual for a run-time error (if one occurs
at all) to happen some time after the undefined behavior that
caused it.

Second, people tend to mistakenly think that "dereferencing" a
pointer means accessing the memory at the machine address stored
in the pointer. This is confusing a possible implementation of a
language concept with the concept itself. In C and C++, any time
you form an expression by applying the '*' operator to a pointer,
you have dereferenced that pointer.

One might imagine it's a special case if the expression is used
to initialize a reference. After all, references and pointers are
both "just addresses" so "under the hood" you're not doing anything
at all. Again, this is confusing a possible implementation with the
language itself. In C++, if an expression dereferences a null
pointer, it is undefined behavior, regardless of how that
expression is then used.

In general, although it is nice to have some idea of what goes
on at the assembly code level, one cannot reductively understand
C++ in terms of machine code. To see how doing so might lead
one astray, consider an example which does not involve references:

int foo()
{
    int n = 5;
    int* p = &n;
    return *p;
}

You don't have to be a language lawyer to see that the expression *p
in this function dereferences the pointer p. Yet here, stripped
of comments, is the assembly listing for the above function
(from VC++ 7.1 with "optimize for size" specified):

?foo@@YAHXZ PROC NEAR
push 5
pop eax
ret 0
?foo@@YAHXZ ENDP

So what Steven's example shows is not that it is possible to
create a null reference, but rather that one must be careful
not to dereference a null pointer. That is certainly good
advice. :)
 
Steven T. Hatton

Steven said:
Larry said:
Steven T. Hatton wrote: [...]
AFAIK, you *can* pass a null to func() in your example, and the
compiler
will accept it. Your code will segfault when you do so. If you pass
a
pointer, you can check for null, before accessing it. I, therefore,
suggest using a pointer instead of a reference.

There is no such thing as a null reference. However, see below.
If a function takes a ref, then NULL can not be passed.
Attempting to pass NULL causes a compile error.

For example:
[snip]
func(0); // causes a compile error

In your example, you are not trying to pass null, you are trying to pass
a
literal. That results in the attempt to create a temporary object of
type int and assign it to the non-const reference.

True, 0 or NULL is the null pointer constant, which is not the same
thing as a null pointer.

Actually it is an integer literal which when assigned to a pointer to type T
is converted to a null pointer constant to T. The distinction may seem
trivial, but the error is not due to the fact that 0 is being used. It is
due to the fact that a literal is being used to initialize a temporary of
type int.
True, this will compile, and will result in undefined behavior.
However, the undefined behavior is not in func for trying to use a "null
reference" (there being, again, no such thing) but in main for
dereferencing a null pointer.

The act resulting in the undefined behavior is in main where the attempt to
dereference a null pointer is made. Where the actual /behavior/ occurs is
a different story. Basically, accessing the null pointer gives
'permission' to produce undefined behavior.
"But," you may say, "main doesn't dereference the pointer. The
seg fault [or whatever] occurs in func!"

First, it is not unusual for a run-time error (if one occurs
at all) to happen some time after the undefined behavior that
caused it.

The segfault *is* the undefined behavior that results from the attempt to
dereference a null pointer.
Second, people tend to mistakenly think that "dereferencing" a
pointer means accessing the memory at the machine address stored
in the pointer. This is confusing a possible implementation of a
language concept with the concept itself. In C and C++, any time
you form an expression by applying the '*' operator to a pointer,
you have dereferenced that pointer.

The C++ Standard does not define what is meant by dereferencing a pointer.

§5.3.1/1
"The unary * operator performs /indirection/: the expression to which it is
applied shall be a pointer to an object type, or a pointer to a function
type and the result is an lvalue referring to the object or function to
which the expression points. If the type of the expression is “pointer to
T,” the type of the result is “T.”"

An lvalue is said to refer to specific storage, so I would say that pretty
much means returning an address.
One might imagine it's a special case if the expression is used
to initialize a reference. After all, references and pointers are
both "just addresses" so "under the hood" you're not doing anything
at all. Again, this is confusing a possible implementation with the
language itself. In C++, if an expression dereferences a null
pointer, it is undefined behavior, regardless of how that
expression is then used.

No, applying the indirection operator to a variable of type pointer to type
T is the definition of a behavior. The _result_ is undefined.
In general, although it is nice to have some idea of what goes
on at the assembly code level, one cannot reductively understand
C++ in terms of machine code. To see how doing so might lead
one astray, consider an example which does not involve references:

int foo()
{
    int n = 5;
    int* p = &n;
    return *p;
}

You don't have to be a language lawyer to see that the expression *p
in this function dereferences the pointer p. Yet here, stripped
of comments, is the assembly listing for the above function
(from VC++ 7.1 with "optimize for size" specified):

?foo@@YAHXZ PROC NEAR
push 5
pop eax
ret 0
?foo@@YAHXZ ENDP

Well, yes, you can invoke the "as if" clause.
So what Steven's example shows is not that it is possible to
create a null reference, but rather that one must be careful
not to dereference a null pointer. That is certainly good
advice. :)

For the record, I don't believe I actually said the result was a null
reference.
 
Steven T. Hatton

Steven said:
Actually, there are some circumstances when the caller wouldn't really be
"wrong" to pass a parameter different than the one I handle.

virtual void operator()(osg::Node* node, osg::NodeVisitor* nv) {
    if (_doRefresh) {
        if (PAT_T* pat = dynamic_cast<PAT_T*>(node)) {
            _refresh(pat);
            _doRefresh = false;
        }
    }
    traverse(node, nv);
}

BTW, my reading of the Standard suggests there is no need to check that node
is not null before I try to dynamic_cast it. Does anybody disagree with
that understanding?
 
benben

Stuart MacMartin said:
Sheesh.

The caller could just as easily do:

int i;
int* pI = &i + 1;
f(i);

What is wrong with the above lines?
or

int* i = new int;
delete i;
f(i);

The above won't even compile.
C++ gives you all sorts of ways of doing something bad, and there are
lots of bad pointers other than 0.
Agree!


I claim passing by reference helps keep the noise level down and makes
it clear where the valid pointer test should be made. Passing by
pointer muddies the water: whose responsibility is it?

Yes and no. It really depends on the semantics of the function. For example:
- If a referential parameter is needed for read operations only, I will
use a const reference, or neither of the two (that is, pass by value);
- If it is needed for write operation, then I will have it as a pointer,
so the user is aware of the write operation;
- In some cases where the write operation is conceptually trivial,
references are also preferred;
- If the function is to be used by C programs, then pointer is a must;
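
The first two rules in that list might look like this in practice. This is only a sketch with made-up function names, assuming C++11 for nullptr:

```cpp
#include <cstddef>
#include <string>

// Read-only parameter: const reference, the caller passes the object directly.
std::size_t lengthOf(const std::string& s) { return s.size(); }

// Written-to parameter: pointer, so the call site shows '&' as a hint
// that the argument may be modified. A null pointer is ignored here;
// that choice is part of this sketch, not a general rule.
void appendBang(std::string* s) {
    if (s != nullptr) s->push_back('!');
}
```

A call then reads appendBang(&text); followed by lengthOf(text);, and only the first carries the visual cue at the call site.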
 
John Carson

Ian said:
As has been shown elsewhere on this thread, references can be null.

No they can't. The C++ standard, section 8.3.2/4 says:

"a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the "object" obtained
by dereferencing a null pointer, which causes undefined behavior."

Of course, nothing actually stops programmers from dereferencing null
pointers. The fundamental point here is that if you work with references all
(or almost all) the time, then you generally don't need to dereference
pointers, null or otherwise, and thus you avoid doing something that is
dangerous.

If the client code uses pointers, then it doesn't make a lot of difference
if the function accepts pointers or references; the pointer is going to have
to be dereferenced by either the client (if the function takes references)
or the function (if the function takes pointers). The gain is when *neither*
client *nor* function uses pointers.
 
Steven T. Hatton

Steven said:
(e-mail address removed) wrote:


Actually it is an integer literal which when assigned to a pointer to type
T
is converted to a null pointer constant to T. The distinction may seem
trivial, but the error is not due to the fact that 0 is being used. It is
due to the fact that a literal is being used to initialize a temporary of
type int.

I said that wrong. A temporary of type int would be created and used to
initialize the non-const reference. That's what won't work. And for
obvious reasons.
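
The binding rule behind that correction can be sketched in a few lines. The function names are invented for the example; the commented-out call is the one a conforming compiler rejects:

```cpp
// Accepts lvalues and temporaries: a temporary int may bind to const int&.
int byConstRef(const int& n) { return n + 1; }

// Accepts only modifiable lvalues: a temporary cannot bind to int&.
void byRef(int& n) { ++n; }

int bindingDemo() {
    int x = 0;
    byRef(x);                  // OK: x is an lvalue, x becomes 1
    // byRef(0);               // error: cannot bind a temporary int to int&
    return byConstRef(0) + x;  // OK: the literal initializes a temporary int
}
```

So the compile error in the quoted func(0) example comes from the non-const reference parameter, not from the value 0 having anything to do with null pointers.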
 
Andre Kostur

BTW, my reading of the Standard suggests there is no need to check
that node is not null before I try to dynamic_cast it. Does anybody
disagree with that understanding?

Well... dynamic_casting a null pointer is defined. Section 5.2.7, clause
4.
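
A small sketch of that guarantee, using a hypothetical class hierarchy (Node, Transform and Leaf here are stand-ins, not the OSG types) and assuming C++11 for nullptr:

```cpp
// Minimal polymorphic hierarchy; dynamic_cast needs a virtual function.
struct Node { virtual ~Node() {} };
struct Transform : Node {};
struct Leaf : Node {};

// Returns true only if node is non-null and really points to a Transform.
// No separate null check is needed: dynamic_cast on a null pointer
// yields a null pointer of the target type.
bool isTransform(Node* node) {
    return dynamic_cast<Transform*>(node) != nullptr;
}
```

This is why the if (PAT_T* pat = dynamic_cast<PAT_T*>(node)) idiom covers both the wrong-type case and the null case in one test.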
 
Andre Kostur

Well you CAN pass a reference to nothing though:

void f(int& i)
{
i ++;
}

int main()
{
int* i = new int;
int& ref = *i;
delete i;

You're setting yourself up for Undefined Behaviour. It's your
responsibility to ensure that the lifetime of the object that reference is
referring to is longer than the lifetime of the reference.
f(ref); // tell me, what does ref refer to?

Don't care... undefined behaviour.
 
John Carson

David White said:
Not necessarily. Try passing a null pointer to ::strlen(). Just
because a pointer is passed doesn't necessarily mean that the
function will permit it to be null.

Really? When I try this, the code compiles and the program crashes when run.
Of course, functions can check for null pointers (though this won't stop
compilation) but at least my copy of strlen apparently doesn't.
Yes, because you at least don't have to bother finding out if a null
pointer is acceptable.


One thing about the reference case is that you can't tell from the
call whether it is pass by value or pass by reference. In the pointer
case it is at least obvious from the call that an address is being
passed, so you are alerted that the function might change the object.


void fooptr(int * ptr);
void fooptr(const int * ptr);

vs

void fooref(int & ref);
void fooref(const int & ref);

int main()
{
int x;
fooptr(&x);
fooref(x);
}

What do you know from the fooptr(&x) call that you don't know from the
fooref(x) call or vice versa?

The only time you can know something from a function call is if you *never*
use references. In that case, when you pass by value you know that the value
won't be changed. If you pass by pointer, you still don't know either way.
The situation in which you never use references is of course when
programming in C. I think this preference for pointers is just a hangover
from C.

The real point is surely that when you call a function you are supposed to
know what it does. You figure out what you want done and you call a function
that is documented to do it. You don't call functions because the function
call "looks" like it will do what you want.
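
To make the fooptr point concrete, here is a sketch with two hypothetical functions whose call sites look identical even though only one of them writes through the pointer:

```cpp
// Reads through the pointer; cannot modify the pointee.
int peek(const int* p) { return *p; }

// Writes through the pointer.
void poke(int* p) { *p = 0; }

int callSiteDemo() {
    int x = 7;
    int before = peek(&x);  // x is unchanged after this call
    poke(&x);               // x is now 0, yet the call looks the same
    return before + x;
}
```

The '&' at the call site says that an address is passed, but only the declaration (const int* versus int*) says whether the object can change.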
 
Steven T. Hatton

John said:
Really? When I try this, the code compiles and the program crashes when
run. Of course, functions can check for null pointers (though this won't
stop compilation) but at least my copy of strlen apparently doesn't.

On my system it throws an exception.
void fooptr(int * ptr);
void fooptr(const int * ptr);

vs

void fooref(int & ref);
void fooref(const int & ref);

int main()
{
int x;
fooptr(&x);
fooref(x);
}
What do you know from the fooptr(&x) call that you don't know from the
fooref(x) call or vice versa?

In the case of fooptr() I can see that the address is being taken, and
therefore _know_ it is not passed by value. In the case of fooref(), I
can't tell if it's a pass by reference, or a pass by value.
The only time you can know something from a function call is if you
*never* use references. In that case, when you pass by value you know that
the value won't be changed. If you pass by pointer, you still don't know
either way.

I know it is not pass by value.
The situation in which you never use references is of course
when programming in C. I think this preference for pointers is just a
hangover from C.

The head guys at Trolltech have been programming in C++ since the late
1980s. I don't believe their recommendations are due to being stuck in C.
The real point is surely that when you call a function you are supposed to
know what it does. You figure out what you want done and you call a
function that is documented to do it. You don't call functions because the
function call "looks" like it will do what you want.

Well, you can read the documentation, or the source. Nonetheless, if you
can provide useful information at the call point with minimal cost, it
makes little sense not to do so. There are many times when a person
reading the code may not have read all the documentation, or all the
source. For example, if you are trying to debug someone else's code. It's
also the case that people forget things.
 
John Carson

Steven T. Hatton said:
In the case of fooptr() I can see that the address is being taken, and
therefore _know_ it is not passed by value. In the case of fooref(), I
can't tell if it's a pass by reference, or a pass by value.

The point is that you can't tell if the function changes x.
Well, you can read the documentation, or the source. Nonetheless, if
you can provide useful information at the call point with minimal
cost, it makes little sense not to do so.

The cost is never using non-const references, which I consider to be a high
cost. It is actually a prohibitive cost, because it means you can't use
libraries that use non-const references. In the real world, we have to work
with other people's code and other people's decisions, which means that
self-documenting pass-by-value schemes can't work.
There are many times when
a person reading the code may not have read all the documentation, or
all the source. For example, if you are trying to debug someone
else's code. It's also the case that people forget things.

I find it rather extraordinary that a person could call a function without
knowing whether or not it changes values within the calling scope. It really
seems to me to be such a fundamental difference in function behaviour that a
programmer should know about it as a matter of course. In any event, if you
pass a pointer to any function, then you have to independently ascertain
whether values within the calling scope are changed even with C.
 
