Getter returning strings by value -- good or bad?

M

Matthias Kaeppler

Hi,

I was wondering why library implementors often
make getter functions return strings by value
(copies). For example, in boost::filesystem the
leaf() function returns an std::string by value.
So does Gnome::Vfs::FileInfo::get_name().

Isn't that unnecessary overhead? I could as well
return by reference to const-string and avoid
copying the string, right? That's what I always
do, and it works quite well.
Okay, for strings it's maybe not that of a
performance hit, but with more complex objects I'd
reckon it's a bad idea to return copies.

I'm asking because I have this scenario:
I have a class called File, which defines an
interface for accessing information about a file.
Its state is mostly defined by a
Gnome::Vfs::FileInfo object.
Many functions in the File interface are just
wrappers; they delegate calls to the encapsulated
FileInfo object.

One such example is a function get_name(), which
returns the name of a file. It's implemented like
this:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

I previously wanted to implement it like this:

const Glib::ustring& File::get_name() const {
return m_finfo_ptr->get_name();
}

That however issues a warning, because I'm
referencing a temporary (FileInfo::get_name()
returns a copy of the name-string, that's actually
why this question came up).

But back to the first version:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

How many copies are created here?
FileInfo::get_name() creates a copy, and I'm then
returning a copy of that copy, right? That's terrible!

Please, someone shed some light on this.

Regards,
Matthias
 
V

Victor Bazarov

Matthias said:
I was wondering why library implementors often make getter functions
return strings by value (copies). For example, in boost::filesystem the
leaf() function returns an std::string by value. So does
Gnome::Vfs::FileInfo::get_name().

Isn't that unnecessary overhead? I could as well return by reference to
const-string and avoid copying the string, right? [...]

What would that be a reference to, and how is it going to fare in a multi-
threaded program? Returning a string by value is perfectly fine from
a member function called for a temporary, but returning a reference is
most likely isn't...

Compilers can optimize but they can't fix broken code for you. Keep that
in mind :)

V
 
M

Matthias Kaeppler

Victor said:
What would that be a reference to, and how is it going to fare in a multi-
threaded program? Returning a string by value is perfectly fine from
a member function called for a temporary, but returning a reference is
most likely isn't...

Well, it would be a reference to an object which
already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};

I don't see why this is a problem in multithreaded
environments. The reference returned is a
reference to const, so you can't write it anyway;
you don't need a lock on this object and several
threads may read it at will without breaking
integrity.
Compilers can optimize but they can't fix broken code for you. Keep that
in mind :)

I don't see where the code above is broken.
 
V

Victor Bazarov

Matthias said:
Well, it would be a reference to an object which already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};

I don't see why this is a problem in multithreaded environments. The
reference returned is a reference to const, so you can't write it
anyway; you don't need a lock on this object and several threads may
read it at will without breaking integrity.

Yes, but if one of the threads wants to write to it, the readers would
have to wait, no? But if they decided to use the reference without
locking it, what's going to happen if somebody changes the contents of
that value in some writer thread?
I don't see where the code above is broken.

The code above doesn't do anything hence it can't be broken. Now, imagine
somebody writes

std::string const& sr = someclass().somememberreturningaconstref();

a temporary is gone and 'sr' has become invalid as soon as it was defined.
It's more difficult if you return a string by value.

V
 
P

Panjandrum

Matthias said:
I was wondering why library implementors often
make getter functions return strings by value
(copies).

Sometimes because they assume a reference-counted string class.
For example, in boost::filesystem the
leaf() function returns an std::string by value.

Boost is 'off topic'. They just imitate the inefficient STL-style where
everything is 'by value'.
So does Gnome::Vfs::FileInfo::get_name().
Isn't that unnecessary overhead?

Yes, if the string is duplicated. No, if the string is ref-counted.
I could as well
return by reference to const-string and avoid
copying the string, right?

Yes, if it's possible (i.e. if you have an object for which you can
return a refernce).
That's what I always
do, and it works quite well.
Okay, for strings it's maybe not that of a
performance hit, but with more complex objects I'd
reckon it's a bad idea to return copies.

I'm asking because I have this scenario:
I have a class called File, which defines an
interface for accessing information about a file.
Its state is mostly defined by a
Gnome::Vfs::FileInfo object.
Many functions in the File interface are just
wrappers; they delegate calls to the encapsulated
FileInfo object.

One such example is a function get_name(), which
returns the name of a file. It's implemented like
this:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

I previously wanted to implement it like this:

const Glib::ustring& File::get_name() const {
return m_finfo_ptr->get_name();
}

That however issues a warning, because I'm
referencing a temporary (FileInfo::get_name()
returns a copy of the name-string, that's actually
why this question came up).

But back to the first version:

Glib::ustring File::get_name() const {
return m_finfo_ptr->get_name();
}

How many copies are created here?
FileInfo::get_name() creates a copy, and I'm then
returning a copy of that copy, right? That's terrible!

You are right. But there is a huge difference between what is described
in tutorials/books and what is used in the real world. Do you know if
Glib::ustring is ref-counted? If not then the implementation is
probably 'suboptimal'.
 
I

Ian

Matthias said:
Well, it would be a reference to an object which already exists. Like this:

class FileInfo {
private:
Glib::ustring name;
public:
const Glib::ustring& get_name() const {
return name;
}
};
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more efficient
to return by value.
I don't see why this is a problem in multithreaded environments. The
reference returned is a reference to const, so you can't write it
anyway; you don't need a lock on this object and several threads may
read it at will without breaking integrity.
In this trivial case, no. But in situations where the string is
constructed in the function, maybe using a member variable, you hit
problems.
I don't see where the code above is broken.
As a trivial case that can get inlined away, it isn't!

Ian
 
M

Matthias Kaeppler

Panjandrum said:
Sometimes because they assume a reference-counted string class.

That's a good point, I didn't think about that. Actually I don't know if
it's ref-counted. I'll look into that.
You are right. But there is a huge difference between what is described
in tutorials/books and what is used in the real world. Do you know if
Glib::ustring is ref-counted? If not then the implementation is
probably 'suboptimal'.

But the compiler will optimize away the second copy, right? (assuming
the worst case, that the string is NOT ref-counted)?
I mean, it doesn't lokk like if I had a choice here.
 
M

Matthias Kaeppler

Victor said:
Yes, but if one of the threads wants to write to it, the readers would
have to wait, no? But if they decided to use the reference without
locking it, what's going to happen if somebody changes the contents of
that value in some writer thread?

Okay, you have a point here.

But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.
The code above doesn't do anything hence it can't be broken. Now, imagine
somebody writes

std::string const& sr = someclass().somememberreturningaconstref();

a temporary is gone and 'sr' has become invalid as soon as it was defined.
It's more difficult if you return a string by value.

Not sure what you want to tell me here. Why would sr be invalid? sr is
initialized with the string returned by the member function which
returns a reference to it. After that line, working with sr is just like
working with the "real" string, just that you have read-only access.
Nothing is invalid here.
 
M

Matthias Kaeppler

Ian said:
Why would you want to return a reference in a more general case? The
example you quote is fine, but on some systems it may be more efficient
to return by value.

Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^
As a trivial case that can get inlined away, it isn't!

Hm. What would happen if the code can't get inlined?
 
I

Ian

Matthias said:
Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^
Consider a reference counted string, this could have a size of 8 bytes
(two pointers). Eight bytes is 64bits which is a single register on a
number of machines. Compilers often reserve one or more registers for
function returns. Some architectures use register wheels, so the sting
could be in a single out register.
Hm. What would happen if the code can't get inlined?
OK, inlined away and returns a constant - the last bit being the
important bit.

Ian
 
I

Ian

Matthias said:
Okay, you have a point here.

But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.
That's where copy-on-write comes in. The same representation will be
passed around until something changes it. At this time, the
representation will be cloned and then modified.
Not sure what you want to tell me here. Why would sr be invalid? sr is
initialized with the string returned by the member function which
returns a reference to it. After that line, working with sr is just like
working with the "real" string, just that you have read-only access.
Nothing is invalid here.
I think Victor was thinking of a member returning a local string variable.

Ian
 
B

Bart

Matthias said:
But: If the string should be indeed reference counted, doesn't that mean
all those problems are still there? The "copy" returned by the getter
would probably have a pointer to the string data which it shares with
another string object and this data could at the same time be accessed
by a second thread--because locking the object itself doesn't do
anything. Both objects are internally pointing to the same string data,
but they are still different objects!
That seems to be even worse than returning by reference.

Whether the string class is ref counted or not is irrelevant to the
client, and if it is ref counted then it's the classes responsibility
to ensure that the references are not messed up. As the client you
should read the documentation to see what guarantees the class makes
about thread safety and so forth.
 
P

Panjandrum

Matthias said:
But the compiler will optimize away the second copy, right?

Maybe, maybe not. Still one needless heap allocation.
(assuming
the worst case, that the string is NOT ref-counted)?
I mean, it doesn't lokk like if I had a choice here.

It also depends on how often you copy the string. I wouldn't care for a
few dozens superfluous copies (that's the abstraction penalty of C++).
OTOH, things are probably different if you want to create and sort a
large vector<Glib::ustring>.
 
G

Greg

Matthias said:
Under what circumstances would it be more efficient to copy-construct an
object than to return a reference to it? ^^


Hm. What would happen if the code can't get inlined?

Inlined or not, the getter function can be optimized with the Return
Value Optimization or the Named Return Value Optimization (RVO and
NRVO, repectively), to eliminate the temporary and the copy, and
construct the object returned, in place at the calling site.

In fact, returning by value can often be more efficient than the usual
alternative of having the client pass a reference to an object to use
to hold the result. The drawback with this approach:


void getString( std::string& outResult) const


is the inconvenient syntax that forces the client to declare a
variable, even in the case that the result is to be passed onto another
function. One workaround is for the function to return a reference to
the supplied object:


std::string& getString( std::string& outResult) const


and while an improvement in some ways, it makes the overall interface
more complicated than needed.

But once the RVO optimization is applied, neither of those two routines
is as efficient as the most natural declaration:


std::string getString() const;


Returning by value need not be inefficient, as many still think.

An class object should certainly never return a non-const reference to
one of its own data members. Doing so essentially allows clients to
change its state behind its back, and pretty much renders useless any
sort of encapsulation the class was meant to provide. One may as well
make the data member a global variable at that point.

Greg
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,062
Latest member
OrderKetozenseACV

Latest Threads

Top