Regarding copy constructors and mutators

J. Romano

Dear Perl community,

One thing that confused me about Perl (no matter how often I read
the explanations) was the issue of operator overloading, specifically
about overloading the '=' operator. The "perldoc overload" page and
the Camel book both explain that overloading '=' does NOT overload the
Perl assignment operator. I found this confusing, and from browsing
through archived Usenet posts, I found that I wasn't the only one.

Therefore, I decided to experiment around until this matter was
clear to me. I finally have it figured out, and I thought I would
share my findings here, so at the very least someone else who was
confused about this might now be able to understand. This write-up
assumes that the reader already knows something about Perl references
and objects. Some knowledge of operator overloading in C++ will also
come in handy here.

Along the way I'll offer clarifications, which are statements that,
once learned and understood, helped me understand more advanced issues
pertaining to operator overloading.

In fact, here's one now: :)

CLARIFICATION:
A "mutator" is an operator that changes an object ("Mutate" comes from
the Latin verb "mutare" which means "to change"). These are '++' and
'--', as well as '+=', '-=', '*=', '/=', '%=', '**=', '<<=', '>>=',
'x=', and '.='. (In my opinion, the "perldoc overload" page does not
make this clear, citing only '++' and '--' as mutators. It doesn't
say that the others aren't mutators; it just doesn't clearly list them
as such.)
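If you want to check the grouping on your own Perl, the overload module exposes its operator categories in the %overload::ops hash. This is only a small sketch, and the exact lists may differ between Perl versions; as far as I can tell, the module files '++' and '--' under "mutators" and the '+=' family under "assign".

```perl
use strict;
use warnings;
use overload ();    # load the module without overloading anything

# %overload::ops maps a category name to a space-separated list of keys.
my $mutators = $overload::ops{mutators};
my $assign   = $overload::ops{assign};

print "mutators: $mutators\n";
print "assign:   $assign\n";
```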

Now that we understand this, let's explore the following statement
from "Chapter 13: Overloading" of the Camel book:

The '=' handler lets you intercept the mutator and
copy the object yourself so that the copy alone
is mutated.

So what exactly does this mean? Well, let me offer the following two
scenarios:


Scenario 1:
Person A loves programming with references. In fact, his C/C++ code
is filled with pointers (maybe you know someone like this). When he
writes Perl code like:

$a = new SomeObject;
$b = $a;
$b->addToValue(5);

he expects an attribute of both $a and $b to be increased by 5.
He thinks this is logical because, when he types "$b = $a", he doesn't
consider $b to be a copy separate from $a, but rather that $b is now
an alias (or pointer, or reference) to $a, and that whatever change is
made to one must also be made automatically to the other.


Scenario 2:
Person B loves programming with objects and tries to avoid references
when they are not required. In fact, her C/C++ code is filled with
objects declared on the stack, and she rarely uses calls to malloc()
and new() (maybe you know someone like this). When she writes Perl
code like this:

$a = new SomeObject;
$b = $a;
$b += 5;

she expects only $b to change. She thinks this is logical because,
when she types "$b = $a", she considers $b to be a separate (though
identical) copy from $a. Therefore, if a change is made to $b, $a
should be left unchanged, because only $b was explicitly changed.


So given these two scenarios, which programmer is correct? Which
one will be happy that Perl supports his or her style of programming,
and which one will end up grumbling, wishing that Perl had a way of
handling his or her own style?

Here's the good news: Perl supports both scenarios!

Now, if you already knew that, then you might not gain anything
useful from reading the rest of this write-up (but if you find anything
wrong with what I write, by all means, please correct me). But if you
didn't know this and you're wondering how this can be so, read on!

So you might be wondering: how can this be possible? In one
scenario, the line "$b = $a" makes $b point to $a, but in the other,
it makes $b be a separate copy.

And here's an answer: Perl knows whether $b is intended to be used
as a reference to an object or as an actual copy of an object. And
how does it know that, exactly? Simple: Perl can read your mind.
Well, no... that's not quite correct. A better explanation is that
Perl invokes the copy constructor (that's reserved for the '='
operator) NOT when "$b = $a" is called, but WHEN THE NEXT MUTATOR IS
USED on $a or $b. In the above example, that would be when the line
"$b += 5" is evaluated.

In other words, the line "$b = $a" does NOT invoke the copy
constructor and set $b to its own copy; what it does is tell the Perl
interpreter to get ready to assign a new copy to $b as soon as it sees
$b using a mutator operator. So if the line "$b = $a" is used and $b
(or $a) never uses a mutator operator, then the copy constructor
doesn't get called.
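Here is a minimal sketch of that timing. The Tally class and its $copies counter are hypothetical (they are not part of the module shown later), and the handlers assume the right-hand operand is a plain number. The counter only moves when the mutator runs, not when the assignment does.

```perl
use strict;
use warnings;

package Tally;     # hypothetical class, used only for this illustration
our $copies = 0;   # number of times the '=' handler has run

use overload
    '='  => sub { $copies++; return bless({ value => $_[0]{value} }, ref $_[0]) },
    '+=' => sub { $_[0]{value} += $_[1]; return $_[0] };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my $x = Tally->new(1);
my $y = $x;                                  # two references, one object
my $copies_after_assign = $Tally::copies;

$y += 5;                                     # first mutator: copy, then add
my $copies_after_mutator = $Tally::copies;

print "copies after assignment: $copies_after_assign\n";     # 0
print "copies after mutator:    $copies_after_mutator\n";    # 1
print "x=", $x->value, " y=", $y->value, "\n";               # x=1 y=6
```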

It can be confusing to figure out what gets called, and when, since a
lot of these calls are done "under the hood." Therefore, I wrote up a
short module to help visualize what is going on. Take the following
code and save it to a file named "Groat.pm":

#!/usr/bin/perl -w

use strict;

package Groat;
use overload
    '='  => \&copyConstructor,
    '+'  => \&plus,
    '++' => \&plusPlus,
    '+=' => \&plusEquals,
    '""' => \&getValue,
    ;

# The printInfo() function is meant to be called like this:
#     printInfo(@_);
# It prints information about the calling function and
# its arguments:
sub printInfo
{
    print "In function ", (caller(1))[3], "\n";

    use Data::Dumper;
    # Get argument string:
    my $argString = Dumper @_;
    # Change multi-lines into one line:
    $argString =~ s/\n\s+/ /g;
    # Indent each line:
    $argString =~ s/^/ /gm;
    # Change all $VAR1's to $_[0] (and so on):
    $argString =~ s/(?<!\\)\$VAR(\d+)\b/"\$_[".($1-1)."]"/eg;
    # Now print modified argument string:
    print $argString;
}

sub new
{
    printInfo(@_);
    return bless {value => $_[1]}, ref($_[0]) || $_[0];
}

sub setValue
{
    printInfo(@_);
    $_[0]->{value} = $_[1];
}

sub getValue
{
    printInfo(@_);
    return $_[0]->{value};
}

sub copyConstructor
{
    printInfo(@_);
    my $newObject = bless { }, "Groat";
    $newObject->{value} = $_[0]->{value};
    return $newObject;
    # We could have said instead:
    #     use Data::Dumper;
    #     return eval Dumper $_[0];
}

sub plus
{
    printInfo(@_);
    my $newObject = bless { }, "Groat";

    if (ref($_[1])) {
        # We passed in an object (so we will assume it's a Groat):
        $newObject->{value} = $_[0]->{value} + $_[1]->{value};
    } else {
        # We passed in a non-object (so we will assume it's a number):
        $newObject->{value} = $_[0]->{value} + $_[1];
    }

    return $newObject;
}

sub plusPlus
{
    printInfo(@_);
    $_[0]->{value}++;
}

sub plusEquals
{
    printInfo(@_);

    if (ref($_[1])) {
        # We passed in an object (so we will assume it's a Groat):
        $_[0]->{value} += $_[1]->{value};
    } else {
        # We passed in a non-object (so we will assume it's a number):
        $_[0]->{value} += $_[1];
    }

    return $_[0];  # we return $_[0] so that it gets assigned
}

1;  # return a true value

__END__


Let me explain some things about this package. A "Groat" object is
really a blessed reference to an anonymous hash (if that makes no sense
to you, you should probably read up on Perl objects -- you might want
to try reading the perldocs on "perlboot", "perltoot", "perltooc", and
"perlbot"). A Groat object really does nothing special other than
just store a numeric value in the "value" entry of its hash. The only
operators that are overloaded are '=', '""' (the "string-ify"
operator), and the addition operators (that is, '+', '++', '+=').

In order to use this module for learning purposes, you can start up
the interactive Perl interpreter by typing:

perl -de 1

Then you can type "use Groat;" at the debugger prompt and hit ENTER to
have access to the Groat class. Since you are in the Perl debugger,
you will have to type "q" (and hit ENTER) to exit the interactive Perl
session (typing "exit" won't work here).

So if you haven't already done so, start the interactive Perl
debugger and type "use Groat;" (I probably shouldn't have to add that
you need to hit ENTER after every command... :) . It would also be a
good idea to have the Groat.pm file open in a text editor nearby so
you can reference it periodically.

Now, create a new Groat object with the line:

$a = new Groat(5);

You will see text telling you what method is being called and what its
arguments are. I put this in so that I could easily see what
arguments the overloaded operators were expecting. Basically, I
created a subroutine called printInfo that is called at the beginning
of every other subroutine and prints the name of its caller along with
the caller's arguments. That's why you see the lines:

In function Groat::new
$_[0] = 'Groat';
$_[1] = 5;

Now you can print the object you just created with the line:

print $a;

You will now see the lines:

In function Groat::getValue
$_[0] = bless( { 'value' => 5 }, 'Groat' );
$_[1] = undef;
$_[2] = '';

5

What just happened is this: by attempting to print $a, you invoke the
getValue() method, because of the line in the overload pragma that
says:

'""' => \&getValue,

The getValue() method returns the number 5 (which then gets printed
due to the print statement), which is why you see a "5" standing out
there on a line of its own.

You might notice that the first argument passed into the
Groat::getValue() method is a long line that includes the word
"bless". If that looks strange to you, don't worry about it; all that
means is that the actual object reference is passed in as the first
argument.

Now set $b equal to $a with this line:

$b = $a;

You'll notice that the copyConstructor() method was not called. You
may know by now that that won't get called until you use $a or $b with
its first mutator.

Now print $b. You'll see that it has the same value as $a.

Now add 2 to $b by typing:

$b->plusEquals(2);

You'll notice that the copy constructor still was not called. That's
because the change made to $b here was also made to $a. You can print
both $a and $b to verify that they both print out 7.

Now add 2 to $b again by typing:

$b += 2;

This time you'll notice that the copy constructor gets called.
Because the '+=' operator is a mutator and $a and $b share the same
object, Perl knows to break their "shared-object-ness" by calling the
copy constructor and assigning a new object to $b.

Now when you print out both $a and $b you'll notice that $a prints
out 7, whereas $b prints out 9. So while $a and $b started out as
references to the same object, that was no longer true as soon as
Perl started to evaluate the "$b += 2" line.

If you type the "$b += 2" line a second time, no copy constructor is
called. This is because $b is already separate from $a.

"But wait!" you might be thinking. If '+=' was overloaded to be
"plusEquals", shouldn't the following two lines be identical?:

$b->plusEquals(2);
$b += 2;

That's a good point. If "$b += 2" called the copy constructor,
shouldn't "$b->plusEquals(2)" have called it as well? Well, now it's
time for another clarification:

CLARIFICATION:
In C++, the lines:

b.operator+=(2);
b += 2;

are identical. But in Perl, the (seemingly equivalent) lines:

$b->plusEquals(2);
$b += 2;

are NOT identical! The two lines are indeed very similar, but there
is one important difference: If a line such as "$b = $a" (or "$a =
$b") occurred sometime before the above code, the line
"$b->plusEquals(2)" will affect both $a and $b, whereas the line "$b
+= 2" will cause the copy constructor to assign a copy of $a to $b,
making the "+= 2" change affect only $b.

But remember, "$b += 2" will only call the copy constructor if it
needs to. If no other object shares $b's reference, then no copy
constructor gets called.
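The difference can be sketched in a few lines. Tally and plus_equals are hypothetical names standing in for Groat and plusEquals (without the tracing output), $x and $y play the roles of $a and $b, and the handler assumes a plain number on the right.

```perl
use strict;
use warnings;

package Tally;    # hypothetical stand-in for Groat
use overload
    '='  => sub { return bless({ value => $_[0]{value} }, ref $_[0]) },
    '+=' => \&plus_equals;

sub new         { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value       { $_[0]{value} }
sub plus_equals { $_[0]{value} += $_[1]; return $_[0] }

package main;

my $x = Tally->new(5);
my $y = $x;                      # $x and $y refer to the same object

$y->plus_equals(2);              # ordinary method call: no copy is made
my @after_method = ($x->value, $y->value);

$y += 2;                         # overloaded operator: $y is copied first
my @after_operator = ($x->value, $y->value);

print "after method call: @after_method\n";      # 7 7
print "after operator:    @after_operator\n";    # 7 9
```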

Now try this:

$a = $b;
$b++;

You'll see that the copy constructor gets called again, as you'd
expect. But now try "$b++" again. And again. You'll see that the
copy constructor KEEPS getting called, even though $b no longer shares
its reference with $a. This seems to contradict my earlier statement
that the copy constructor will ONLY get called if another object
shares $b's reference. So what's happening?

What's happening is that there IS another object that shares $b's
reference! Recall what the difference is between these two lines of
code:

$a = ++$b;
$a = $b++;

The top line uses "pre-incrementation", whereas the bottom uses
"post-incrementation." Basically, the top line will increment $b, and
then assign its new value to $a. But the bottom line assigns $b's old
value to $a AND THEN increments $b. In order for this to happen, a
temporary copy is made from $b, which is exactly why the copy
constructor gets called when you type "$b++" repeatedly on a line by
itself. This temporary copy does not need to be made with "++$b"
(unless, of course, there is another object that shares its reference).
(This is why you see a lot of C++ loops that are declared with a
pre-increment operator like this:

for (i = 0; i < len; ++i)
{
...
}

it's because using the pre-increment operator is slightly faster and
more efficient than using the post-increment operator (since a
temporary copy does not have to be made).)
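A sketch that counts the copies made by each form of increment follows. Tally and $copies are hypothetical names. The old value is stored in $old on purpose: as far as I know, Perl may compile a post-increment whose result is thrown away as a pre-increment, so a bare "$n++;" statement in a script might not show the effect the way it does at the debugger prompt.

```perl
use strict;
use warnings;

package Tally;     # hypothetical class, used only for this illustration
our $copies = 0;   # number of times the '=' handler has run

use overload
    '='  => sub { $copies++; return bless({ value => $_[0]{value} }, ref $_[0]) },
    '++' => sub { $_[0]{value}++ };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my $n = Tally->new(0);            # the only reference to this object

++$n;                             # nothing else refers to the object
my $after_pre = $Tally::copies;

my $old = $n++;                   # the old object has to survive in $old
my $after_post = $Tally::copies;

print "copies after ++\$n: $after_pre\n";                    # 0
print "copies after \$n++: $after_post\n";                   # 1
print "old=", $old->value, " new=", $n->value, "\n";         # old=1 new=2
```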

Now try this: Create five variables and assign them all to the
same value:

$a = $b = $c = $d = $e = new Groat(4);

Now if you were to type:

$c += 1;

what would happen? We already know that only the value for $c would
change (the copy constructor takes care of that), but just how many
times will the copy constructor be called? Four times? You might
think the answer is four in order for the other variables to break
their reference from $c. However, the copy constructor only gets
called once.

This is because, when $c mutates, only $c has to break off its
reference from the others; the others still have their shared
reference to each other.

In fact, doing "+= 1" to all the other objects will result in the
copy constructor being called for all of them except for the last one
(since the last object no longer has a shared reference).
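The count can be checked with a small sketch. Tally and $copies are hypothetical names, and the variables are called $p through $t here instead of $a through $e. The loop applies "+= 1" to each variable in turn and records the running number of copies.

```perl
use strict;
use warnings;

package Tally;     # hypothetical stand-in for Groat
our $copies = 0;   # number of times the '=' handler has run

use overload
    '='  => sub { $copies++; return bless({ value => $_[0]{value} }, ref $_[0]) },
    '+=' => sub { $_[0]{value} += $_[1]; return $_[0] };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my ($p, $q, $r, $s, $t);
$p = $q = $r = $s = $t = Tally->new(4);     # five variables, one object

my @copies_so_far;
for my $var ($r, $p, $q, $s, $t) {          # $var aliases each variable in turn
    $var += 1;
    push @copies_so_far, $Tally::copies;
}
my @values = map { $_->value } ($p, $q, $r, $s, $t);

print "copies after each '+= 1': @copies_so_far\n";    # 1 2 3 4 4
print "values: @values\n";                             # 5 5 5 5 5
```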

Here's something to watch out for:

Let's say you have the following lines of code:

$a = new Groat(5);
$b = $a;
$a->{'comment'} = 'set in $a';

This looks a bit strange, but since a Groat object is just a reference
to an anonymous hash (a blessed reference, but still a reference), you
can continue to add new entries to that hash. All we did was add a
new hash entry that had 'comment' as the key and 'set in $a' as the
value.

Now, as you should know by now, this change affects both $a and $b.
You can verify this by using Data::Dumper, a standard Perl module:

use Data::Dumper;
print Dumper($a);
print Dumper($b);

In fact, you can see that they still share the reference by typing:

print Dumper($a, $b);

It says that $VAR2 equals $VAR1, meaning that $b still refers to the
same object as $a.

Now when you type:

$a++;

$a and $b no longer have shared objects. But who has that comment we
just added? $a or $b? Or both? Well, to find out, let us type:

print Dumper($a, $b);

The first thing you should notice is that $VAR2 no longer equals $VAR1
(which is not surprising, from what we know now). But what might
surprise you is that it's $VAR2 (which corresponds to $b) that has the
comment we added to $a. $a no longer has that comment.

How can this be? Well, this is because when we typed "$a++" the
copy constructor was invoked (which we expected), but it gave its new
copy to $a, not $b. The old object reference stayed at $b. Now it's
time for another clarification:

CLARIFICATION:
When the copy constructor is called due to a mutator on a shared
object reference, the variable that uses that mutator is the one that
gets the new copy created by the copy constructor. In other words:

$a += 3;

will give $a a new object regardless of whether the previous line was
"$a = $b" or "$b = $a".

At this point you might be thinking, "Okay, so $a got the copy of
$b. But shouldn't $a still get the comment that we added?". Well,
that depends on how we implemented the copy constructor. If you look
back at the copy constructor, you'll see that only the 'value' entry
of the hash was copied to the new object, which is why $a appeared to
lose it, while $b kept it.
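A sketch that makes this visible without Data::Dumper follows. Tally is a hypothetical class whose '=' handler copies only 'value', as Groat's does, and refaddr() from Scalar::Util reports which underlying object each variable refers to. The commented-out handler at the end is one possible way to carry every hash key over, if that is the behavior you want.

```perl
use strict;
use warnings;

package Tally;    # hypothetical stand-in for Groat
use overload
    # Like Groat's copy constructor: only 'value' is copied.
    '='  => sub { return bless({ value => $_[0]{value} }, ref $_[0]) },
    '++' => sub { $_[0]{value}++ };

sub new { my ($class, $v) = @_; return bless({ value => $v }, $class) }

package main;
use Scalar::Util qw(refaddr);

my $x = Tally->new(5);
my $y = $x;
$x->{comment} = 'set in $x';
my $original = refaddr($x);       # address of the shared object

++$x;                             # mutator on $x: $x receives the copy

my $y_kept_original = (refaddr($y) == $original) ? 1 : 0;
my $x_got_new       = (refaddr($x) != $original) ? 1 : 0;
my $x_has_comment   = exists $x->{comment} ? 1 : 0;
my $y_has_comment   = exists $y->{comment} ? 1 : 0;

print "y kept the original object: $y_kept_original\n";    # 1
print "x got a new object:         $x_got_new\n";          # 1
print "x has the comment:          $x_has_comment\n";      # 0
print "y has the comment:          $y_has_comment\n";      # 1

# To carry every key over, the '=' handler could instead be:
#     sub { return bless({ %{ $_[0] } }, ref $_[0]) }
```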

So now let's try a little experiment. Let's comment out the
overload line that says:

'=' => \&copyConstructor,

and then restart our interactive Perl interpreter. (If we restart it
with the command:

perl -MGroat -de 1

you won't have to type "use Groat;" to be able to access the Groat
package.)

Now let's type:

$a = new Groat(5);
$b = $a;
$a->{'comment'} = 'set in $a';
$a += 3;

Now who gets the comment? Actually, that's a trick question. Since
'=' no longer points to a copy constructor, you should have received
the error:

Operation `=': no method found

telling you that the line "$a += 3" is not valid.
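The same failure can be reproduced in a script by trapping the error with eval. In this sketch Tally is a hypothetical hash-based class that overloads '+=' but not '=' (and sets no 'fallback'). The exact wording and quoting of the message may differ between Perl versions, so the sketch only prints whatever $@ contains. An object that was never shared needs no copy, so the same operator works on it.

```perl
use strict;
use warnings;

package Tally;    # hypothetical class with '+=' but NO '=' handler
use overload
    '+=' => sub { $_[0]{value} += $_[1]; return $_[0] };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my $x = Tally->new(5);
my $y = $x;                                   # shared, so '+=' needs a copy
my $shared_ok = eval { $x += 3; 1 } ? 1 : 0;
my $error     = $shared_ok ? '' : $@;

my $solo    = Tally->new(5);                  # never shared
my $solo_ok = eval { $solo += 3; 1 } ? 1 : 0;

print "shared object: ", ($shared_ok ? "worked\n" : "died: $error");
print "unshared object: ", ($solo_ok ? "worked" : "died"),
      ", value is ", $solo->value, "\n";
```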

Now let's modify the Groat.pm file again. Leave the '=' line
commented out, and comment out the following line:

'+=' => \&plusEquals,

and restart the interactive Perl interpreter. Now type:

$a = new Groat(5);
$b = $a;
$a += 3;

For some reason that worked without complaining, even though the '+='
AND '=' operators were commented out in the overload pragma. We can
dump out their values to see what was affected with these lines:

use Data::Dumper;
print Dumper($a, $b);

This gives us:

$VAR1 = bless( {
'value' => 8
}, 'Groat' );
$VAR2 = bless( {
'value' => 5
}, 'Groat' );

which means that only $a was affected by the line "$a += 3", and that
$a and $b ended up with separate objects. But how did that happen
without a call to plusEquals() or to the copy constructor?

This happened because, since '+=' was not overloaded, Perl was
smart enough to "autogenerate" it from the same function that '+'
uses. In other words, the line:

$a += 3;

was treated as if it was:

$a = $a + 3;

meaning that the function used for '+', and not the one for '+=', was used. The
function used for '+' was the plus() method, which does NOT mutate (or
change) any parameter passed in. Instead (and you can refer back to
the code), I wrote the plus() method to create a new Groat object,
modify its value, and return that instance. This mutates the new
Groat object but it doesn't affect either of the values passed in.
Because of this, '+' isn't considered by Perl to be a mutator, so it
will not call the copy constructor (and will not give an error if a
copy constructor doesn't exist). It just simply creates a new object
from $a and 3 and assigns it back to $a, making $a lose its original
object reference.

CLARIFICATION:
If '+=' is used on a shared reference when '+=' is OVERLOADED, it WILL
call the copy constructor. But if it is AUTOGENERATED (from '+'), it
WILL NOT call the copy constructor, because '+' has no need for one.

The same goes for '++'. If it's not overloaded, Perl is smart enough
to autogenerate by converting "$a++" to "$a = $a + 1" (which doesn't
need the copy constructor) to give you exactly what you'd expect.
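A minimal sketch of the autogenerated case follows. Tally is a hypothetical class that overloads only '+', with no '+=' and no '=' handler at all, and its handler assumes the other operand is a plain number.

```perl
use strict;
use warnings;

package Tally;    # hypothetical class: only '+' is overloaded
use overload
    '+' => sub {
        my ($self, $n) = @_;
        # Build and return a new object; neither operand is changed.
        return bless({ value => $self->{value} + $n }, ref $self);
    };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my $x = Tally->new(5);
my $y = $x;        # shared object
$x += 3;           # autogenerated from '+': behaves like $x = $x + 3

print "x=", $x->value, " y=", $y->value, "\n";    # x=8 y=5
```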

Real quick, type the following code:

$a = new Groat(4);
$b = 5 + $a;

Everything should work as expected, but you should see that when the
method Groat::plus was called, $a was the first parameter and 5 was
the second parameter (instead of 5 being the first parameter, like you
called it). That's because when an overloaded function is called, it
always passes the object of its own type in first, and the other
object (or non-object) in second. Sometimes it doesn't make a
difference (like in addition where 10 + 1 == 1 + 10), but sometimes it
makes a big difference (like in subtraction where 10 - 1 does NOT
equal 1 - 10). In that case, you will be able to tell if the order of
objects was reversed by checking the third parameter passed into the
function. In other words, if $_[2] is a true value, then the
parameters were passed in reverse order of how they were called.
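Subtraction is where the third parameter earns its keep. Here is a sketch of a '-' handler that checks it; Tally and minus are hypothetical names, and the handler assumes the other operand is either a Tally or a plain number.

```perl
use strict;
use warnings;

package Tally;    # hypothetical class, used only for this illustration
use overload '-' => \&minus;

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

sub minus
{
    my ($self, $other, $swapped) = @_;
    # Assumption: the other operand is a Tally or a plain number.
    my $n = ref($other) ? $other->{value} : $other;
    # $swapped is true when the object was the right-hand operand.
    my $result = $swapped ? $n - $self->{value} : $self->{value} - $n;
    return bless({ value => $result }, ref($self));
}

package main;

my $one   = Tally->new(1);
my $left  = $one - 10;    # object on the left: not swapped
my $right = 10 - $one;    # object on the right: swapped

print "one - 10 = ", $left->value, "\n";     # -9
print "10 - one = ", $right->value, "\n";    # 9
```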

Now let's go back to the original two scenarios. Person A is
happy, because as long as he treats his objects like references, they
will act just as he would expect pointers to act. And Person B
is happy, because as long as she treats her objects like separate
object instantiations, they act like separate objects. Is there
anyone who isn't happy?

Well, that would be Person C in this next scenario:

Scenario 3:
Person C doesn't really prefer one style of programming over another.
In fact, its (Person C is an android) C/C++ code is a mixture of
pointers and object instantiations (maybe you know someone like this).
When it writes Perl code like:

$a = new SomeObject;
$b = $a;
$b->addToValue(5);
$b += 5;
$b->addToValue(7);
$b += 7;

it really doesn't know what to expect. Androids are really just
little more than computers with limbs, and since computers aren't very
intelligent, neither is Person C. (That's why androids aren't hired
to be programmers.) Because of its low intelligence, Person C gets
replaced by Person D (that's you), a very intelligent Perl programmer
who has a good grasp of Perl operator overloading. Your task is to
figure out what Person C's code did.

By looking over the code you understand that the line
"$b->addToValue(5)" changes both $b and $a, but all the other lines
change only $b, because the line "$b += 5" separates $b from $a. You
present your knowledge to the boss, who gives you a big raise, and you
live happily ever after.
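If you'd like to check Person D's answer, here is a sketch. The post never defines SomeObject, so this implementation (a 'value' starting at 0, an addToValue() method, and '=' and '+=' handlers) is purely an assumption, with $x and $y in the roles of $a and $b.

```perl
use strict;
use warnings;

package SomeObject;    # hypothetical implementation; the post never defines it
use overload
    '='  => sub { return bless({ value => $_[0]{value} }, ref $_[0]) },
    '+=' => sub { $_[0]{value} += $_[1]; return $_[0] };

sub new        { my ($class) = @_; return bless({ value => 0 }, $class) }
sub value      { $_[0]{value} }
sub addToValue { $_[0]{value} += $_[1]; return }

package main;

my $x = SomeObject->new;
my $y = $x;
$y->addToValue(5);    # still shared: both are 5
$y += 5;              # $y gets its own copy and becomes 10
$y->addToValue(7);    # only $y: 17
$y += 7;              # only $y: 24

print "x=", $x->value, " y=", $y->value, "\n";    # x=5 y=24
```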


So that's pretty much it! If you bothered to read this whole
write-up, I'll feel pretty good about writing it. And if you find
that I made a mistake, feel free to correct me.

But before I go, let me just offer a few more clarifications, ones
that would have made my task of understanding all this a little bit
easier:

CLARIFICATION:
The copy constructor used by '=' technically takes three parameters,
but only the first is used. The first parameter is the object to be
copied. The return value should be a new object (whose contents
should be identical to the object passed in as the first parameter,
but that's really up to the programmer).

CLARIFICATION:
If, for some reason, you DON'T want the line "$a++" to create a
separate object (when it was previously pointing to a shared object),
you can replace the '=' line in the overload pragma with this line:

'=' => sub { $_[0] },

This line will make a useless copy constructor -- that is, the copy
constructor still gets called, but all it does is return the reference
to the first parameter. But in order for this to work, the '++'
operator must be overloaded (and not autogenerated), or else the
function for '+' will be used, which does not call the copy
constructor. In this way, "$a++" will change itself and all the
references pointing to it, as well.
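A sketch of that identity copy constructor in action follows. Tally is a hypothetical class, and '++' is overloaded explicitly, as the clarification requires.

```perl
use strict;
use warnings;

package Tally;    # hypothetical class, used only for this illustration
use overload
    '='  => sub { $_[0] },            # "copy" by handing back the same object
    '++' => sub { $_[0]{value}++ };

sub new   { my ($class, $v) = @_; return bless({ value => $v }, $class) }
sub value { $_[0]{value} }

package main;

my $x = Tally->new(5);
my $y = $x;
++$x;              # the '=' handler runs, but $x and $y stay shared

print "x=", $x->value, " y=", $y->value, "\n";    # x=6 y=6
```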

CLARIFICATION:
The functions used by the assignment mutators (such as '+=', '-=',
'*=', etc.) should modify $_[0] (based on what $_[1] is) and then
should return $_[0]. Perl assigns whatever the function returns back
to the variable doing the mutating, so if $_[0] is not returned, that
variable ends up holding the wrong value. (As far as I can tell, these
functions are always called with the object first and with $_[2]
undefined, so you shouldn't need to handle reversed parameters here.)
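To see what "the wrong value" looks like, here is a sketch with a deliberately broken handler. Tally is a hypothetical class whose '+=' function forgets to return $_[0], so the variable ends up holding whatever the function's last expression produced.

```perl
use strict;
use warnings;

package Tally;    # hypothetical class with a deliberately buggy handler
use overload
    # Bug on purpose: the function falls off the end, so it returns the
    # number produced by '+=' rather than the object in $_[0].
    '+=' => sub { $_[0]{value} += $_[1] };

sub new { my ($class, $v) = @_; return bless({ value => $v }, $class) }

package main;

my $x = Tally->new(5);
$x += 3;           # $x is assigned the handler's return value

my $kind = ref($x) || 'plain scalar';
print "x is now a $kind holding $x\n";    # x is now a plain scalar holding 8
```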

CLARIFICATION:
The functions used by '++' and '--' should modify $_[0]. They do not
have to return anything (whereas they do for '+=' and the like). As a
result, by using this overload pragma:

use overload
'=' => \&copy,
'++' => sub { },
;

you can replace this line that calls the copy constructor:

$b = $a->copy();

with these lines:

$b = $a;
$a++; # no modification, but a copy is made

But please don't. If you do this, the maintainers that come after you
will think bad thoughts about you and call YOU an android (or worse!).
:)

CLARIFICATION:
Because of autogeneration, an operator such as '+=' doesn't have to be
overloaded, provided that the '+' operator already is. The main
reasons you would want to overload the '+=' operator (and '-=', '*=',
etc.) are if you want the following lines:

$a += 3;
$a = $a + 3;

to do something different (which is normally a bad idea) or if you are
looking for speed and efficiency. It's more efficient to overload
'+=' because then a new copy doesn't have to be created.

The same thing goes for '++' and '--'. They do not need to be
overloaded since they can be autogenerated. But if they are
overloaded, the copy constructor will be called implicitly if they are
used on an object that shares a reference.

Got all that? I'm done now (really). I hope this write-up cleared
up more confusion than it created.

(Everyone has my permission to distribute this write-up. If you
know someone who might benefit from reading this write-up, feel free
to make copies of it and redistribute it. Thank you for your time.)

Happy Perling,

Jean-Luc Romano
 
