Best practice: return by value versus const ref

M

Mikewhy

Apologies. I know this has to have been discussed, but I'm unable to google
my way past the shallower, newbie discussions of temporary's lifetime, etc.

A class Lookup contains a collection of strings referenced by some key, the
values representing, perhaps, a database lookup.. The key is representative
of its field name. Here, we're interested only in the accessor methods,
get_val() and get_value(). The first returns a copy of the string by value.
The other returns a const reference to the value.

class Lookup
{
public:
enum FieldId {
...
};

std::string get_val(FieldId) const;

const std::string & get_value(FieldId) const;
...
};

The values already exist as std::string in the Lookup object. Returning by
const reference then is direct and has the advantage of efficiency.
Returning by value, even with return value optimization, incurs at minimum a
string copy.

The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference return
interface? What usage patterns (that have a chance of surviving the low
threshold of a peer review) might prove problematic?
 
R

Rui Maciel

Mikewhy said:
The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference
return interface? What usage patterns (that have a chance of surviving the
low threshold of a peer review) might prove problematic?

Returning references may cause some significant problems in some
circumstances, if you can't guarantee that the object will only be accessed
during its lifetime. If you can't guarantee that then it's quite possible
that you might end up trying to access an object which doesn't really exist.
For example, the following code will compile just fine with g++:

<code>

#include <iostream>

struct Foo
{
int some_object;

Foo(int some_value = 0);
~Foo();

int const &getter() const {return some_object; }
};

Foo::Foo(int some_value)
: some_object(some_value)
{
std::cout << "Object constructed" << std::endl;
}

Foo::~Foo()
{
std::cout << "Object destructed. He's dead, Jim." << std::endl;
}

int main(void)
{
const int *bar = NULL;

{
Foo foo;
bar = &foo.getter();
std::cout << "accessing object at " << bar << ". value is: "
<< *bar << std::endl;
}

std::cout << "accessing object at " << bar << ". value is: " << *bar
<< std::endl;
return 0;
}

</code>


Rui Maciel
 
R

Rui Maciel

Rui said:
Returning references may cause some significant problems in some
circumstances, if you can't guarantee that the object will only be
accessed during its lifetime. If you can't guarantee that then it's quite
possible that you might end up trying to access an object which doesn't
really exist.

Here is another example which better demonstrates what I was referring to:

<code>
#include <iostream>

struct Foo
{
int some_object;

Foo(int some_value = 0);
~Foo();

int const &getter() const {return some_object; }
};

Foo::Foo(int some_value)
: some_object(some_value)
{
std::cout << "Object constructed" << std::endl;
}

Foo::~Foo()
{
std::cout << "Object destructed. He's dead, Jim." << std::endl;
}


struct Bar
{
float some_other_object;

Bar(float some_value = 5.0f);
~Bar();

float const &getter() const {return some_other_object; }
};

Bar::Bar(float some_value)
: some_other_object(some_value)
{
std::cout << "Bar object constructed" << std::endl;
}

Bar::~Bar()
{
std::cout << "Bar object destructed" << std::endl;
}


int main(void)
{
char buffer[10] = {0};

const int *value = NULL;
const float *other_value = NULL;

// object is created in a specific address
Foo *foo = new(buffer) Foo(1);
value = &foo->getter();
std::cout << "accessing foo's object at " << value << ". value is: "
<< *value << std::endl;

// foo's lifetime ends
foo->~Foo();

// a new object has been allocated on top of the old one
Bar *bar = new(buffer) Bar;
other_value = &bar->getter();
std::cout << "accessing foo's object at " << value << ". value is: "
<< *value << std::endl;
std::cout << "accessing bar's object at " << other_value << ". value
is: " << *other_value << std::endl;
bar->~Bar();
return 0;
}
</code>


Rui Maciel
 
M

Marcel Müller

Hi,

The values already exist as std::string in the Lookup object. Returning
by const reference then is direct and has the advantage of efficiency.
Returning by value, even with return value optimization, incurs at
minimum a string copy.

taking into account that many (almost all?) std::string implementations
provide a copy constructor with O(1) together with copy on write, you
should not care too much about performance here. You will not copy the
string either. You will copy a reference to the string.

The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference
return interface? What usage patterns (that have a chance of surviving
the low threshold of a peer review) might prove problematic?

If you use a reference you should not change the value of the string as
long as you access the reference. Maybe you also must not make any
changes to the dictionary, because changing the dictionary content may
cause reallocations of nodes and destroy the original string.


Marcel
 
N

Nobody

taking into account that many (almost all?) std::string implementations
provide a copy constructor with O(1) together with copy on write,

Not any more. That used to be common until multi-threading started
being widely used, at which point the overhead of locking every access
outweighed the advantages of CoW. It might still be used on platforms
which have separate single-threaded and multiple-threaded versions of the
standard library. But in general, you should assume that string copies are
O(n).
 
M

Mikewhy

"Rui Maciel" wrote in message
The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference
return interface? What usage patterns (that have a chance of surviving the
low threshold of a peer review) might prove problematic?


int main(void)
{
const int *bar = NULL;

{
Foo foo;
bar = &foo.getter();
std::cout << "accessing object at " << bar << ". value is: "
<< *bar << std::endl;
}

std::cout << "accessing object at " << bar << ". value is: " << *bar
<< std::endl;
return 0;
}

</code>
======================

That's basically the gist of it. One expects even the most cursory review to
catch something so blatant, even a few levels deep in function calls or
object ownership.

As a short sidebar, before moving on, in your example above, return by value
would also be equally noteworthy with some older compilers. Just how bad
depends on which compiler (some being already quite old and will continue to
be in widespread use for some long time to come). To wit:

const std::string & aval = alookup.get_val(); // return by value,

This is (purportedly?) reasonably safe in C++11, but at least one compiler I
know personally will allow it to fail at runtime with only an ignorable
warning at compile time.

It isn't the dunderheaded error exemplified by both of the above that I'm
after. I'm after reasonable looking usage that despite best efforts and
intentions of experienced and competent engineers still can fail. When we
talk of safety in this context, we are talking specifically about relative
lifetimes of the reference and referencee. Rather than a revelation, this is
already I expect an ingrained and abiding paranoia in everyone still reading
this. It is, after all, the basis and reason that shared_ptr and friends
exist.

I think I'll conclude here, maybe prematurely, that it is NOT return by
const ref that is unsound or unsafe. It is careless or inappropriate usage
or storage that can be problematic. Return by const ref rather than value
does not of itself make usage errors any more or less likely. If you don't
know or can't control the lifetime of the referenced, either through
ownership, explicit guarantee, or reference counts, storing a copy is a
reasonable alternative. I believe this to apply both in general and in this
particular discussion.

Your further thoughts, please....
 
M

Marcel Müller

Not any more. That used to be common until multi-threading started
being widely used, at which point the overhead of locking every access
outweighed the advantages of CoW. It might still be used on platforms
which have separate single-threaded and multiple-threaded versions of the
standard library. But in general, you should assume that string copies are
O(n).

Hmm, that's a pity.
It confirms my behavior not to use std::string and to prefer other
immutable string implementations that can be copied by reference.
Unfortunately something like that is not part of he standard.

I do not care much about the memcpy for a usually small string content,
but over and over using allocators might be an impact and is subject to
some kind of synchronization too - at least at the delete operation.

In fact, CoW only needs a few synchronization points. Firstly at the
copy constructor, secondly when a non-unique reference is detached from
a string instance. Both can be implemented lock free with atomic
increment/decrement. And if you use an inner and an outer reference
count, most of the atomic increments at copy construction can also be
avoided at the cost of some more logic when detaching from references
with higher outer count.


Marcel
 
M

Marcel Müller

int main(void)
{
const int *bar = NULL;

{
Foo foo;
bar = &foo.getter();
std::cout << "accessing object at " << bar << ". value
is: "
<< *bar << std::endl;
}

std::cout << "accessing object at " << bar << ". value is: " <<
*bar
<< std::endl;
return 0;
}

</code>
======================

AFAIK this is broken code. bar takes a dangling reference. It doesn't
care whether getter returns by reference or by value.

As a short sidebar, before moving on, in your example above, return by
value would also be equally noteworthy with some older compilers. Just
how bad depends on which compiler (some being already quite old and will
continue to be in widespread use for some long time to come). To wit:

const std::string & aval = alookup.get_val(); // return by value,

This is (purportedly?) reasonably safe in C++11, but at least one
compiler I know personally will allow it to fail at runtime with only an
ignorable warning at compile time.

Extending the lifetime of temporaries with const references has been
included far before C++11. I think it was in the early 90s, but I maybe
wrong. So unless we are talking about a specific broken compiler, this
code is fine.

I think I'll conclude here, maybe prematurely, that it is NOT return by
const ref that is unsound or unsafe. It is careless or inappropriate
usage or storage that can be problematic. Return by const ref rather
than value does not of itself make usage errors any more or less likely.
If you don't know or can't control the lifetime of the referenced,
either through ownership, explicit guarantee, or reference counts,
storing a copy is a reasonable alternative. I believe this to apply both
in general and in this particular discussion.

Ack.


Marcel
 
M

Mikewhy

"Marcel Müller" wrote in message

As a short sidebar, before moving on, in your example above, return by
value would also be equally noteworthy with some older compilers. Just
how bad depends on which compiler (some being already quite old and will
continue to be in widespread use for some long time to come). To wit:

const std::string & aval = alookup.get_val(); // return by value,

This is (purportedly?) reasonably safe in C++11, but at least one
compiler I know personally will allow it to fail at runtime with only an
ignorable warning at compile time.

Extending the lifetime of temporaries with const references has been
included far before C++11. I think it was in the early 90s, but I maybe
wrong. So unless we are talking about a specific broken compiler, this
code is fine.

================
Yes, and I managed to learn something despite myself in these past few
hours. I attributed it to C++11 because I first came across it in that
context. Googling around finds that it was in as early as C++03. Lifetime of
the temporary is extended to that of the const ref that binds to it.

Which raises a second question, about just how deep that scope can be. For
the following (example of hazardous code in contest of OP discussion),
VS2008 issued a compiler warning, and subsequently dumped garbage at run
time, while g++ was quietly completely boring. (Which was the basis for my
earlier comment.)

class Bar
{
private:
const std::string & aval; // ctor from temporary value.
const std::string & aref; // ctor from reference return.

public:
Bar(const Lookup & a)
: aval(a.get_val()), aref(a.get_value())
{}
 
R

Rui Maciel

Marcel said:
AFAIK this is broken code. bar takes a dangling reference. It doesn't
care whether getter returns by reference or by value.

That's not true. The bar pointer takes a perfectly valid reference to a
perfectly valid object. A problem only arises if foo's lifetime somehow
ends. That was the whole point of the example.

Conversely, the same example would be broken if struct Foo's getter returned
by value, as that assignment requires an lvalue. This is caught by any
compiler.


Rui Maciel
 
P

Pavel

Nobody said:
Not any more. That used to be common until multi-threading started
being widely used, at which point the overhead of locking every access
outweighed the advantages of CoW. It might still be used on platforms
which have separate single-threaded and multiple-threaded versions of the
standard library. But in general, you should assume that string copies are
O(n).
Just FYI: GNU STL's std::string is COW, for both single- and multi- threaded
compilations; and have been for quite a while since multi-threading became common.

-Pavel
 
G

goran.pusic

Apologies. I know this has to have been discussed, but I'm unable to google

my way past the shallower, newbie discussions of temporary's lifetime, etc.



A class Lookup contains a collection of strings referenced by some key, the

values representing, perhaps, a database lookup.. The key is representative

of its field name. Here, we're interested only in the accessor methods,

get_val() and get_value(). The first returns a copy of the string by value.

The other returns a const reference to the value.



class Lookup

{

public:

enum FieldId {

...

};



std::string get_val(FieldId) const;



const std::string & get_value(FieldId) const;

...

};



The values already exist as std::string in the Lookup object. Returning by

const reference then is direct and has the advantage of efficiency.

Returning by value, even with return value optimization, incurs at minimum a

string copy.



The question is, assuming that multithread issues are not present, how

easily can naive code cause trouble for itself with a const reference return

interface? What usage patterns (that have a chance of surviving the low

threshold of a peer review) might prove problematic?

Except from what Rui Maciel had shown, I can only think of a subsequent change:

class v1
{
type field;
public:
const type& getfield() const { return field; }
};

class v2 // v1, but code changed in the meantime
{
type calculated() { return type(...); }
public:
const type& getfield() const { return calculated(); }
};

This will produce at least a diagnostic when compiled ("returning a reference to a temporary" or some such), so well-cranked up warning level is beneficial... ;-)

At any rate, the core issue is the same here and in the other example: one took a reference to something, and that something ceased to exist.

As for your question "how easily can naive code?"... Hard to say. It depends very much on people working on the code. The less experience, the more chance there is this will happen and vice verso.

Goran.

P.S. random rambler on the 'net once said: "OOP in C++ means "Object Ownership Protocols" ;-).
 
N

none

Apologies. I know this has to have been discussed, but I'm unable to google
my way past the shallower, newbie discussions of temporary's lifetime, etc.

A class Lookup contains a collection of strings referenced by some key, the
values representing, perhaps, a database lookup.. The key is representative
of its field name. Here, we're interested only in the accessor methods,
get_val() and get_value(). The first returns a copy of the string by value.
The other returns a const reference to the value.

class Lookup
{
public:
enum FieldId {
...
};

std::string get_val(FieldId) const;

const std::string & get_value(FieldId) const;
...
};

The values already exist as std::string in the Lookup object. Returning by
const reference then is direct and has the advantage of efficiency.
Returning by value, even with return value optimization, incurs at minimum a
string copy.

The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference return
interface? What usage patterns (that have a chance of surviving the low
threshold of a peer review) might prove problematic?

My concerns are similar to Goran. You are painting yourself in a
corner for future refactoring.

you are stating that the Lookup class does have a member variable for
every one of the FieldId that can be queried. That is now. However,
for something like a "Lookup", it would seem plausible that you may
not forever want to keep a cache of all data in the object. The
interface you present forces you to do it forever in order to possibly
save a few CPU cycles (which have not been profiled as being
critical).

Note also that a client that uses the interface as follow may gain a
small performance improvement:

std::string const & value = aLookup.get_value(FieldId);

However, a client that attempt to use it as follow:
(a very common way to write)

std::string value = aLookup.get_value(FieldId);

may very much be less performant that if you'd return by value.

Anyone that want to di into the details of what RVO and Copy Elision
my mean in optimized code should read through Dave Abrahams article:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

and this article is not even C++11 aware.

Yannick
 
M

Mikewhy

"none (Yannick Tremblay)" wrote in message

Apologies. I know this has to have been discussed, but I'm unable to google
my way past the shallower, newbie discussions of temporary's lifetime, etc.

A class Lookup contains a collection of strings referenced by some key, the
values representing, perhaps, a database lookup.. The key is representative
of its field name. Here, we're interested only in the accessor methods,
get_val() and get_value(). The first returns a copy of the string by value.
The other returns a const reference to the value.

class Lookup
{
public:
enum FieldId {
...
};

std::string get_val(FieldId) const;

const std::string & get_value(FieldId) const;
...
};

The values already exist as std::string in the Lookup object. Returning by
const reference then is direct and has the advantage of efficiency.
Returning by value, even with return value optimization, incurs at minimum
a
string copy.

The question is, assuming that multithread issues are not present, how
easily can naive code cause trouble for itself with a const reference
return
interface? What usage patterns (that have a chance of surviving the low
threshold of a peer review) might prove problematic?

My concerns are similar to Goran. You are painting yourself in a
corner for future refactoring.

you are stating that the Lookup class does have a member variable for
every one of the FieldId that can be queried. That is now. However,
for something like a "Lookup", it would seem plausible that you may
not forever want to keep a cache of all data in the object. The
interface you present forces you to do it forever in order to possibly
save a few CPU cycles (which have not been profiled as being
critical).

Note also that a client that uses the interface as follow may gain a
small performance improvement:

std::string const & value = aLookup.get_value(FieldId);

However, a client that attempt to use it as follow:
(a very common way to write)

std::string value = aLookup.get_value(FieldId);

may very much be less performant that if you'd return by value.
================

Sure. Point taken about copy construction versus. It's a slightly different
context, but certainly one of the differences that I had asked for.

The actual usage that led to my OP is illustrated by (but not actually):

if (foo[field1] == "VALUE1" ||
foo[field1] == "VALUE2 ||
(foo[field1] == "VALUE3 &&
foo[field4] == "VALUE4"))
{ ... }
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,527
Members
44,999
Latest member
MakersCBDGummiesReview

Latest Threads

Top