Regarding copy constructors and mutators

Discussion in 'Perl Misc' started by J. Romano, Jun 26, 2004.

    Dear Perl community,

    One thing that confused me about Perl (no matter how often I read
    the explanations) was the issue of operator overloading, specifically
    about overloading the '=' operator. Both the "perldoc overload" page
    and the Camel book explain that overloading '=' does NOT overload the
    Perl assignment operator. I found this confusing, and from browsing
    through archived Usenet posts, I found that I wasn't the only one.

    Therefore, I decided to experiment around until this matter was
    clear to me. I finally have it figured out, and I thought I would
    share my findings here, so at the very least someone else who was
    confused about this might now be able to understand. This write-up
    assumes that the reader already knows something about Perl references
    and objects. The knowledge of operator overloading in C++ will come
    in handy here.

    Along the way I'll offer clarifications, which are statements that,
    once learned and understood, helped me understand more advanced issues
    pertaining to operator overloading.

    In fact, here's one now: :)

    CLARIFICATION:
    A "mutator" is an operator that changes an object ("mutate" comes from
    the Latin verb "mutare", which means "to change"). The mutators are
    '++' and '--', as well as the assignment variants '+=', '-=', '*=',
    '/=', '%=', '**=', '<<=', '>>=', 'x=', and '.='. (In my opinion, the
    "perldoc overload" page does not make this clear, citing only '++' and
    '--' as mutators. It doesn't say that the others aren't mutators; it
    just doesn't clearly list them as such.)

    Now that we understand this, let's explore the following statement
    from "Chapter 13: Overloading" of the Camel book:

    The '=' handler lets you intercept the mutator and
    copy the object yourself so that the copy alone
    is mutated.

    So what exactly does this mean? Well, let me offer the following two
    scenarios:


    Scenario 1:
    Person A loves programming with references. In fact, his C/C++ code
    is filled with pointers (maybe you know someone like this). When he
    writes Perl code like:

    $a = new SomeObject;
    $b = $a;
    $b->addToValue(5);

    he expects an attribute of both $a and $b to be increased by 5.
    He thinks this is logical because, when he types "$b = $a", he doesn't
    consider $b to be a copy separate from $a, but rather that $b is now
    an alias (or pointer, or reference) to $a, and that whatever change is
    made to one must also be made automatically to the other.


    Scenario 2:
    Person B loves programming with objects and tries to avoid references
    when they are not required. In fact, her C/C++ code is filled with
    objects declared on the stack, and she rarely uses malloc() or new
    (maybe you know someone like this). When she writes Perl
    code like this:

    $a = new SomeObject;
    $b = $a;
    $b += 5;

    she expects only $b to change. She thinks this is logical because,
    when she types "$b = $a", she considers $b to be a separate (though
    identical) copy from $a. Therefore, if a change is made to $b, $a
    should be left unchanged, because only $b was explicitly changed.


    So given these two scenarios, which programmer is correct? Which
    one will be happy that Perl supports his or her style of programming,
    and which one will end up grumbling, wishing that Perl had a way of
    handling his or her own style?

    Here's the good news: Perl supports both scenarios!

    Now, if you already knew that, then you might not gain anything
    useful from reading the rest of this write-up (but if you see anything
    wrong with what I write, by all means, please correct me). But if you
    didn't know this and you're wondering how this can be so, read on!

    So you might be wondering: how can this be possible? In one
    scenario, the line "$b = $a" makes $b point to $a, but in the other,
    it makes $b be a separate copy.

    And here's an answer: Perl knows whether $b is intended to be used
    as a reference to an object or as an actual copy of an object. And
    how does it know that, exactly? Simple: Perl can read your mind.
    Well, no... that's not quite correct. A better explanation is that
    Perl invokes the copy constructor (that's reserved for the '='
    operator) NOT when "$b = $a" is called, but WHEN THE NEXT MUTATOR IS
    USED on $a or $b. In the above example, that would be when the line
    "$b += 5" is evaluated.

    In other words, the line "$b = $a" does NOT invoke the copy
    constructor and set $b to its own copy; what it does is tell the Perl
    interpreter to get ready to assign a new copy to $b as soon as it sees
    $b using a mutator operator. So if the line "$b = $a" is used and $b
    (or $a) never uses a mutator operator, then the copy constructor
    doesn't get called.
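
    A minimal sketch of this lazy copying, assuming a small hypothetical
    Counter class (it is not part of the Groat module introduced below)
    whose '=' handler simply counts how often it runs:

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

our $copies = 0;    # how many times the '=' handler has run

use overload
    '='  => sub { $copies++; bless { value => $_[0]{value} }, ref $_[0] },
    '+=' => sub { $_[0]{value} += $_[1]; $_[0] },
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

package main;

my $x = Counter->new(5);
my $y = $x;                             # plain assignment: one shared object
my $after_assign = $Counter::copies;    # '=' handler has not run yet

$y += 5;                                # first mutator on the shared object
my $after_mutator = $Counter::copies;   # '=' handler has now run once

print "copies after assignment: $after_assign\n";
print "copies after mutator:    $after_mutator\n";
print "x = $x, y = $y\n";
```

    The counter lives in a package variable only so that the script can
    inspect it from outside the class.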

    It can be confusing to figure out what gets called, and when, because
    a lot of these calls are done "under the hood." Therefore, I wrote up
    a short module to help visualize what is going on. Take the following
    code and save it to a file named "Groat.pm":

    #!/usr/bin/perl -w

    use strict;

    package Groat;
    use overload
        '=' => \&copyConstructor,
        '+' => \&plus,
        '++' => \&plusPlus,
        '+=' => \&plusEquals,
        '""' => \&getValue,
        ;

    # The printInfo() function is meant to be called like this:
    #     printInfo(@_);
    # It prints information about the calling function and
    # its arguments:
    sub printInfo
    {
        print "In function ", (caller(1))[3], "\n";

        use Data::Dumper;
        # Get argument string:
        my $argString = Dumper @_;
        # Change multi-lines into one line:
        $argString =~ s/\n\s+/ /g;
        # Indent each line:
        $argString =~ s/^/ /gm;
        # Change all $VAR1's to $_[0] (and so on):
        $argString =~ s/(?<!\\)\$VAR(\d+)\b/"\$_[".($1-1)."]"/eg;
        # Now print modified argument string:
        print $argString;
    }

    sub new
    {
        printInfo(@_);
        return bless {value => $_[1]}, ref($_[0]) || $_[0];
    }

    sub setValue
    {
        printInfo(@_);
        $_[0]->{value} = $_[1];
    }

    sub getValue
    {
        printInfo(@_);
        return $_[0]->{value};
    }

    sub copyConstructor
    {
        printInfo(@_);
        my $newObject = bless { }, "Groat";
        $newObject->{value} = $_[0]->{value};
        return $newObject;
        # We could have said instead:
        #     use Data::Dumper;
        #     return eval Dumper $_[0];
    }

    sub plus
    {
        printInfo(@_);
        my $newObject = bless { }, "Groat";

        if (ref($_[1])) {
            # We passed in an object (so we will assume it's a Groat):
            $newObject->{value} = $_[0]->{value} + $_[1]->{value};
        } else {
            # We passed in a non-object (so we will assume it's a number):
            $newObject->{value} = $_[0]->{value} + $_[1];
        }

        return $newObject;
    }

    sub plusPlus
    {
        printInfo(@_);
        $_[0]->{value}++;
    }

    sub plusEquals
    {
        printInfo(@_);

        if (ref($_[1])) {
            # We passed in an object (so we will assume it's a Groat):
            $_[0]->{value} += $_[1]->{value};
        } else {
            # We passed in a non-object (so we will assume it's a number):
            $_[0]->{value} += $_[1];
        }

        return $_[0]; # we return $_[0] so that it gets assigned
    }

    1; # return a true value

    __END__


    Let me explain some things about this package. A "Groat" object is
    really a blessed reference to an anonymous hash (if that makes no sense
    to you, you should probably read up on Perl objects -- you might want
    to try reading the perldocs on "perlboot", "perltoot", "perltooc", and
    "perlbot"). A Groat object really does nothing special other than
    just store a numeric value in the "value" entry of its hash. The only
    operators that are overloaded are '=', '""' (the "string-ify"
    operator), and the addition operators (that is, '+', '++', '+=').

    In order to use this module for learning purposes, you can start up
    the interactive Perl interpreter by typing:

    perl -de 1

    Then you can type "use Groat;" at the debugger prompt and hit ENTER to
    have access to the Groat class. Since you are in the Perl debugger,
    you will have to type "q" (and hit ENTER) to exit the interactive Perl
    session (typing "exit" won't work here).

    So if you haven't already done so, start the interactive Perl
    debugger and type "use Groat;" (I probably shouldn't have to add that
    you need to hit ENTER after every command... :) ). It would also be a
    good idea to have the Groat.pm file open in a text editor nearby so
    you can refer to it periodically.

    Now, create a new Groat object with the line:

    $a = new Groat(5);

    You will see text telling you what method is being called and what its
    arguments are. I put this in so that I could easily see what
    arguments the overloaded operators were expecting. Basically, I
    created a subroutine called printInfo() that is called at the
    beginning of every other subroutine and prints the name of its caller
    along with the caller's arguments. That's why you see the lines:

    In function Groat::new
    $_[0] = 'Groat';
    $_[1] = 5;

    Now you can print the object you just created with the line:

    print $a;

    You will now see the lines:

    In function Groat::getValue
    $_[0] = bless( { 'value' => 5 }, 'Groat' );
    $_[1] = undef;
    $_[2] = '';

    5

    What just happened is this: by attempting to print $a, you invoke the
    getValue() method, because of the line in the overload pragma that
    says:

    '""' => \&getValue,

    The getValue() method returns the number 5 (which then gets printed
    due to the print statement), which is why you see a "5" standing out
    there on a line of its own.

    You might notice that the first argument passed into the
    Groat::getValue() method is a long line that includes the word
    "bless". If that looks strange to you, don't worry about it; all that
    means is that the actual object reference is passed in as the first
    argument.

    Now set $b equal to $a with this line:

    $b = $a;

    You'll notice that the copyConstructor() method was not called. As you
    may have guessed by now, it won't get called until a mutator is used
    on $a or $b.

    Now print $b. You'll see that it has the same value as $a.

    Now add 2 to $b by typing:

    $b->plusEquals(2);

    You'll notice that the copy constructor still was not called. That's
    because the change made to $b here was also made to $a. You can print
    both $a and $b to verify that they both print out 7.

    Now add 2 to $b again by typing:

    $b += 2;

    This time you'll notice that the copy constructor gets called.
    Because the '+=' operator is a mutator and $a and $b share the same
    object, Perl knows to break their "shared-object-ness" by calling the
    copy constructor and assigning a new object to $b.

    Now when you print out both $a and $b you'll notice that $a prints
    out 7, whereas $b prints out 9. So while $a and $b started out as
    references to the same object, that was no longer true as soon as
    Perl started to evaluate the "$b += 2" line.

    If you type the "$b += 2" line a second time, no copy constructor is
    called. This is because $b is already separate from $a.

    "But wait!" you might be thinking. "If '+=' was overloaded to be
    plusEquals(), shouldn't the following two lines be identical?"

    $b->plusEquals(2);
    $b += 2;

    That's a good point. If "$b += 2" called the copy constructor,
    shouldn't "$b->plusEquals(2)" have called it as well? Well, now it's
    time for another clarification:

    CLARIFICATION:
    In C++, the lines:

    b.operator+=(2);
    b += 2;

    are identical. But in Perl, the (seemingly equivalent) lines:

    $b->plusEquals(2);
    $b += 2;

    are NOT identical! The two lines are indeed very similar, but there
    is one important difference: If a line such as "$b = $a" (or "$a =
    $b") occurred sometime before the above code, the line
    "$b->plusEquals(2)" will affect both $a and $b, whereas the line "$b
    += 2" will cause the copy constructor to assign a copy of $a to $b,
    making the "+= 2" change affect only $b.

    But remember, "$b += 2" will only call the copy constructor if it
    needs to. If no other object shares $b's reference, then no copy
    constructor gets called.
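
    The difference can be sketched with one subroutine that serves both
    as an ordinary method and as the '+=' handler. The Tally class and
    its add_to() method are hypothetical names chosen for this
    illustration:

```perl
use strict;
use warnings;

package Tally;    # hypothetical class, for illustration only

use overload
    '='  => sub { bless { value => $_[0]{value} }, ref $_[0] },
    '+=' => \&add_to,
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

# Used both as an ordinary method and as the '+=' handler.
sub add_to { my ($self, $n) = @_; $self->{value} += $n; return $self }

package main;

my $x = Tally->new(5);
my $y = $x;                       # $x and $y share one object

$y->add_to(2);                    # ordinary method call: no copy is made
my $after_method = "$x $y";

$y += 2;                          # mutator: $y gets its own copy first
my $after_operator = "$x $y";

print "after method call: $after_method\n";
print "after operator:    $after_operator\n";
```

    The same code runs in both cases; only the operator form gives Perl
    the chance to call the copy constructor beforehand.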

    Now try this:

    $a = $b;
    $b++;

    You'll see that the copy constructor gets called again, as you'd
    expect. But now try "$b++" again. And again. You'll see that the
    copy constructor KEEPS getting called, even though $b no longer shares
    its reference with $a. This seems to contradict my earlier statement
    that the copy constructor will ONLY get called if another object
    shares $b's reference. So what's happening?

    What's happening is that there IS another object that shares $b's
    reference! Recall what the difference is between these two lines of
    code:

    $a = ++$b;
    $a = $b++;

    The top line uses "pre-incrementation", whereas the bottom uses
    "post-incrementation." Basically, the top line will increment $b, and
    then assign its new value to $a. But the bottom line assigns $b's old
    value to $a AND THEN increments $b. In order for this to happen, a
    temporary copy is made from $b, which is exactly why the copy
    constructor gets called when you type "$b++" repeatedly on a line by
    itself. This temporary copy does not need to be made with "++$b"
    (unless, of course, there is another object that shares its
    reference). (This is why you see a lot of C++ loops that are written
    with a pre-increment operator, like this:

    for (i = 0; i < len; ++i)
    {
        ...
    }

    For class types such as iterators, the pre-increment operator can be
    slightly faster than the post-increment operator, since a temporary
    copy does not have to be made.)
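
    A short sketch of the same effect outside the debugger, assuming a
    hypothetical Ticker class that counts calls to its '=' handler. The
    result of the post-increment is assigned to $old so that the old
    value is actually used:

```perl
use strict;
use warnings;

package Ticker;    # hypothetical class, for illustration only

our $copies = 0;   # how many times the '=' handler has run

use overload
    '='  => sub { $copies++; bless { value => $_[0]{value} }, ref $_[0] },
    '++' => sub { $_[0]{value}++; $_[0] },
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

package main;

my $t = Ticker->new(1);           # no other variable shares this object

++$t;                             # pre-increment: nothing to separate from
my $after_pre = $Ticker::copies;

my $old = $t++;                   # post-increment: the old value must survive
my $after_post = $Ticker::copies;

print "copies after ++\$t: $after_pre\n";
print "copies after \$t++: $after_post\n";
print "old = $old, t = $t\n";
```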

    Now try this: Create five variables and assign them all to the
    same value:

    $a = $b = $c = $d = $e = new Groat(4);

    Now if you were to type:

    $c += 1;

    what would happen? We already know that only the value for $c would
    change (the copy constructor takes care of that), but just how many
    times will the copy constructor be called? Four times? You might
    think the answer is four in order for the other variables to break
    their reference from $c. However, the copy constructor only gets
    called once.

    This is because, when $c mutates, only $c has to break off its
    reference from the others; the others still have their shared
    reference to each other.

    In fact, doing "+= 1" to all the other objects will result in the
    copy constructor being called for all of them except for the last one
    (since the last object no longer has a shared reference).
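
    The count can be checked with a sketch like the following, which
    assumes a hypothetical Coin class that records every call to its
    '=' handler:

```perl
use strict;
use warnings;

package Coin;    # hypothetical class, for illustration only

our $copies = 0; # how many times the '=' handler has run

use overload
    '='  => sub { $copies++; bless { value => $_[0]{value} }, ref $_[0] },
    '+=' => sub { $_[0]{value} += $_[1]; $_[0] },
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

package main;

my ($p, $q, $r, $s, $t);
$p = $q = $r = $s = $t = Coin->new(4);   # five variables, one object

$r += 1;                                 # only $r has to break away
my $after_first = $Coin::copies;

$p += 1;                                 # still shared with $q, $s, $t
$q += 1;                                 # still shared with $s, $t
$s += 1;                                 # still shared with $t
$t += 1;                                 # last holder: nothing to copy
my $after_all = $Coin::copies;

print "copies after the first mutator: $after_first\n";
print "copies after all five mutators: $after_all\n";
```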

    Here's something to watch out for:

    Let's say you have the following lines of code:

    $a = new Groat(5);
    $b = $a;
    $a->{'comment'} = 'set in $a';

    This looks a bit strange, but since a Groat object is just a reference
    to an anonymous hash (a blessed reference, but still a reference), you
    can continue to add new entries to that hash. All we did was add a
    new hash entry that had 'comment' as the key and 'set in $a' as the
    value.

    Now, as you should know by now, this change affects both $a and $b.
    You can verify this by using Data::Dumper, a standard Perl module:

    use Data::Dumper;
    print Dumper($a);
    print Dumper($b);

    In fact, you can see that they still share the reference by typing:

    print Dumper($a, $b);

    It says that $VAR2 equals $VAR1, meaning that $b still points to the
    same object as $a.

    Now when you type:

    $a++;

    $a and $b no longer have shared objects. But who has that comment we
    just added? $a or $b? Or both? Well, to find out, let us type:

    print Dumper($a, $b);

    The first thing you should notice is that $VAR2 no longer equals $VAR1
    (which is not surprising, from what we know now). But what might
    surprise you is that it's $VAR2 (which corresponds to $b) that has the
    comment we added to $a. $a no longer has that comment.

    How can this be? Well, this is because when we typed "$a++" the
    copy constructor was invoked (which we expected), but it gave its new
    copy to $a, not $b. The old object reference stayed at $b. Now it's
    time for another clarification:

    CLARIFICATION:
    When the copy constructor is called due to a mutator on a shared
    object reference, the variable that uses that mutator is the one that
    gets the new copy created by the copy constructor. In other words:

    $a += 3;

    will give $a a new object regardless of whether the previous line was
    "$a = $b" or "$b = $a".

    At this point you might be thinking, "Okay, so $a got the copy of
    $b. But shouldn't $a still get the comment that we added?" Well,
    that depends on how we implemented the copy constructor. If you look
    back at the copy constructor, you'll see that only the 'value' entry
    of the hash was copied to the new object, which is why $a appeared to
    lose it, while $b kept it.
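
    For comparison, here is a sketch of a copy constructor that copies
    every entry of the hash rather than just 'value'. The Note class and
    its clone() method are hypothetical names, and the copy is shallow
    (nested references would still be shared):

```perl
use strict;
use warnings;

package Note;    # hypothetical class, for illustration only

use overload
    '='  => \&clone,
    '++' => sub { $_[0]{value}++; $_[0] },
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

# Shallow copy: every key/value pair of the original hash is carried over.
sub clone { my ($self) = @_; return bless { %$self }, ref $self }

package main;

my $x = Note->new(5);
my $y = $x;
$x->{comment} = 'set in $x';      # added to the shared object

++$x;                             # $x receives the clone, then is incremented

print "x: value=$x->{value} comment=$x->{comment}\n";
print "y: value=$y->{value} comment=$y->{comment}\n";
```

    With this version both variables keep the comment, because the new
    object starts out with all of the old object's entries.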

    So now let's try a little experiment. Let's comment out the
    overload line that says:

    '=' => \&copyConstructor,

    and then restart our interactive Perl interpreter. (If we restart it
    with the command:

    perl -MGroat -de 1

    you won't have to type "use Groat;" to be able to access the Groat
    package.)

    Now let's type:

    $a = new Groat(5);
    $b = $a;
    $a->{'comment'} = 'set in $a';
    $a += 3;

    Now who gets the comment? Actually, that's a trick question. Since
    '=' no longer points to a copy constructor, you should have received
    the error:

    Operation `=': no method found

    telling you that the line "$a += 3" is not valid.
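
    The same failure can be reproduced in a script. This sketch assumes a
    hypothetical hash-based NoCopy class that overloads '+=' but not '=',
    and traps the exception with eval so that the message can be
    inspected:

```perl
use strict;
use warnings;

package NoCopy;    # hypothetical class: overloads '+=' but not '='

use overload
    '+=' => sub { $_[0]{value} += $_[1]; $_[0] },
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

package main;

my $x = NoCopy->new(5);
my $y = $x;                       # the object is now shared

# The mutator needs a copy constructor here, but none was declared.
my $ok    = eval { $x += 3; 1 };
my $error = $ok ? '' : $@;

print "error: $error" if $error;
```

    The exact wording and quoting of the message may differ between Perl
    versions.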

    Now let's modify the Groat.pm file again. Leave the '=' line
    commented out, and comment out the following line:

    '+=' => \&plusEquals,

    and restart the interactive Perl interpreter. Now type:

    $a = new Groat(5);
    $b = $a;
    $a += 3;

    For some reason that worked without complaining, even though the '+='
    AND '=' operators were commented out in the overload pragma. We can
    dump out their values to see what was affected with these lines:

    use Data::Dumper;
    print Dumper($a, $b);

    This gives us:

    $VAR1 = bless( {
                     'value' => 8
                   }, 'Groat' );
    $VAR2 = bless( {
                     'value' => 5
                   }, 'Groat' );

    which means that only $a was affected by the line "$a += 3", and that
    $a and $b no longer share an object. But how did that happen
    without a call to plusEquals() or to the copy constructor?

    This happened because, since '+=' was not overloaded, Perl was
    smart enough to "autogenerate" it from the same function that '+'
    uses. In other words, the line:

    $a += 3;

    was treated as if it was:

    $a = $a + 3;

    so the function used for '+', and not the one for '+=', was called.
    The function used for '+' was the plus() method, which does NOT mutate
    (or change) any parameter passed in. Instead (and you can refer back
    to the code), I wrote the plus() method to create a new Groat object,
    set its value, and return that instance. This doesn't affect either
    of the values passed in. Because of this, '+' isn't considered by
    Perl to be a mutator, so it will not call the copy constructor (and
    will not give an error if a copy constructor doesn't exist). It
    simply creates a new object from $a and 3 and assigns it back to $a,
    making $a lose its original object reference.

    CLARIFICATION:
    If '+=' is used on a shared reference when '+=' is OVERLOADED, it WILL
    call the copy constructor. But if it is AUTOGENERATED (from '+'), it
    WILL NOT call the copy constructor, because '+' has no need for one.

    The same goes for '++'. If it's not overloaded, Perl is smart enough
    to autogenerate it by converting "$a++" to "$a = $a + 1" (which doesn't
    need the copy constructor) to give you exactly what you'd expect.
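
    A sketch of autogeneration on its own, assuming a hypothetical Plain
    class that overloads '+' but neither '+=' nor '=':

```perl
use strict;
use warnings;

package Plain;    # hypothetical class: no '+=' handler, no '=' handler

our $plus_calls = 0;    # how many times the '+' handler has run

use overload
    '+'  => \&plus,
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

# Builds and returns a new object; neither operand is modified.
sub plus {
    my ($self, $other) = @_;
    $plus_calls++;
    my $n = ref($other) ? $other->{value} : $other;
    return bless { value => $self->{value} + $n }, ref $self;
}

package main;

my $x = Plain->new(5);
my $y = $x;                       # shared object

$x += 3;                          # no '+=' handler, so '+' is used instead

print "x = $x, y = $y, plus() calls: $Plain::plus_calls\n";
```

    No copy constructor is needed because the '+' handler never touches
    the shared object; $x simply ends up holding the object it returned.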

    Real quick, type the following code:

    $a = new Groat(4);
    $b = 5 + $a;

    Everything should work as expected, but you should see that when the
    method Groat::plus was called, $a was the first parameter and 5 was
    the second parameter (instead of 5 being the first parameter, as you
    wrote it). That's because when an overloaded function is called, Perl
    always passes the object of its own type in first, and the other
    object (or non-object) in second. Sometimes it doesn't make a
    difference (like in addition where 10 + 1 == 1 + 10), but sometimes it
    makes a big difference (like in subtraction where 10 - 1 does NOT
    equal 1 - 10). In that case, you will be able to tell if the order of
    objects was reversed by checking the third parameter passed into the
    function. In other words, if $_[2] is a true value, then the
    parameters were passed in reverse order of how they were called.
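
    Groat does not overload subtraction, so here is a sketch with a
    hypothetical Amount class whose '-' handler checks the third
    parameter:

```perl
use strict;
use warnings;

package Amount;    # hypothetical class, for illustration only

use overload
    '-'  => \&minus,
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

sub minus {
    my ($self, $other, $swapped) = @_;
    my $n = ref($other) ? $other->{value} : $other;

    # When $swapped is true, the object was the RIGHT operand in the source.
    my $result = $swapped ? $n - $self->{value} : $self->{value} - $n;
    return bless { value => $result }, ref $self;
}

package main;

my $x = Amount->new(1);

my $left  = $x - 10;    # object on the left:  1 - 10
my $right = 10 - $x;    # object on the right: 10 - 1

print "x - 10 = $left\n";
print "10 - x = $right\n";
```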

    Now let's go back to the original two scenarios. Person A is
    happy, because as long as he treats his objects like references, they
    will act just as he would expect pointers to act. And Person B
    is happy, because as long as she treats her objects like separate
    object instantiations, they act like separate objects. Is there
    anyone who isn't happy?

    Well, that would be Person C in this next scenario:

    Scenario 3:
    Person C doesn't really prefer one style of programming over another.
    In fact, its (Person C is an android) C/C++ code is a mixture of
    pointers and object instantiations (maybe you know someone like this).
    When it writes Perl code like:

    $a = new SomeObject;
    $b = $a;
    $b->addToValue(5);
    $b += 5;
    $b->addToValue(7);
    $b += 7;

    it really doesn't know what to expect. Androids are really just
    little more than computers with limbs, and since computers aren't very
    intelligent, neither is Person C. (That's why androids aren't hired
    to be programmers.) Because of its low intelligence, Person C gets
    replaced by Person D (that's you), a very intelligent Perl programmer
    who has a good grasp of Perl operator overloading. Your task is to
    figure out what Person C's code did.

    By looking over the code you understand that the line
    "$b->addToValue(5)" changes both $b and $a, but all the other lines
    change only $b, because the line "$b += 5" separates $b from $a. You
    present your knowledge to the boss, who gives you a big raise, and you
    live happily ever after.


    So that's pretty much it! If you bothered to read this whole
    write-up, I'll feel pretty good about writing it. And if you find
    that I made a mistake, feel free to correct me.

    But before I go, let me just offer a few more clarifications, ones
    that would have made my task of understanding all this a little bit
    easier:

    CLARIFICATION:
    The copy constructor used by '=' technically takes three parameters,
    but only the first is used. The first parameter is the object to be
    copied. The return value should be a new object (whose contents
    should be identical to the object passed in as the first parameter,
    but that's really up to the programmer).

    CLARIFICATION:
    If, for some reason, you DON'T want the line "$a++" to create a
    separate object (when it was previously pointing to a shared object),
    you can replace the '=' line in the overload pragma with this line:

    '=' => sub { $_[0] },

    This line will make a useless copy constructor -- that is, the copy
    constructor still gets called, but all it does is return the reference
    to the first parameter. But in order for this to work, the '++'
    operator must be overloaded (and not autogenerated), or else the
    function for '+' will be used, which does not call the copy
    constructor. In this way, "$a++" will change itself and all the
    references pointing to it, as well.
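
    A sketch of this pass-through handler, assuming a hypothetical Linked
    class with an overloaded '++':

```perl
use strict;
use warnings;

package Linked;    # hypothetical class, for illustration only

use overload
    '='  => sub { $_[0] },                    # hand back the same object
    '++' => sub { $_[0]{value}++; $_[0] },    # must be overloaded explicitly
    '""' => sub { $_[0]{value} };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

package main;

my $x = Linked->new(5);
my $y = $x;                       # shared object

++$x;                             # '=' runs, but no new object is created

print "x = $x, y = $y\n";
```

    Because the handler returns the object it was given, the two
    variables stay tied together and the increment shows up in both.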

    CLARIFICATION:
    The functions used by the assignment mutators (such as '+=', '-=',
    '*=', etc.) should modify $_[0] (based on what $_[1] is) and then
    should return $_[0]. If $_[0] is not returned, the wrong value will
    be assigned to the object doing the mutating. (You should not need to
    check $_[2] here: with an assignment mutator, the object being
    modified is always the left operand, so the parameters are never
    passed in reversed.)

    CLARIFICATION:
    The functions used by '++' and '--' should modify $_[0]. They do not
    have to return anything (whereas they do for '+=' and the like). As a
    result, by using this overload pragma:

    use overload
        '=' => \&copy,
        '++' => sub { },
        ;

    you can replace this line that calls the copy constructor:

    $b = $a->copy();

    with these lines:

    $b = $a;
    $a++; # no modification, but a copy is made

    But please don't. If you do this, the maintainers that come after you
    will think bad thoughts about you and call YOU an android (or worse!).
    :)

    CLARIFICATION:
    Because of autogeneration, an operator such as '+=' doesn't have to be
    overloaded, provided that the '+' operator already is. The main
    reasons you would want to overload the '+=' operator (and '-=', '*=',
    etc.) are if you want the following lines:

    $a += 3;
    $a = $a + 3;

    to do something different (which is normally a bad idea) or if you are
    looking for speed and efficiency. It's more efficient to overload
    '+=' because the object can then be modified in place instead of a new
    object being created by '+' on every call.

    The same thing goes for '++' and '--'. They do not need to be
    overloaded since they can be autogenerated. But if they are
    overloaded, the copy constructor will be called implicitly if they are
    used on an object that shares a reference.

    Got all that? I'm done now (really). I hope this write-up cleared
    up more confusion than it created.

    (Everyone has my permission to distribute this write-up. If you
    know someone who might benefit from reading this write-up, feel free
    to make copies of it and re-distribute it. Thank you for your time.)

    Happy Perling,

    Jean-Luc Romano
     