Deep versus Shallow Copy - Part 1

blangela

I have decided (see earlier post) to paste my Word doc here so that it
will be simpler for people to provide feedback (by directly inserting
their comments in the post). I will post it in 3 parts to make it more
manageable.

Below is a draft of a document that I plan to give to my introductory
C++ class.

Please note that I have purposely left out any mention of safety issues
in the ctors, which could be resolved through some combination of smart
pointers and exception handling. These topics will be introduced in the
follow-up course.

If anyone would like the complete document in MS Word (after I have
incorporated improvements you folks suggest), send me an e-mail with
"Deep versus Shallow Copy" as a title. The appearance in the Word doc
is much better (this tool is rather limited when it comes to formatting
choices and it seems to get the indentation wrong when translating from
Word).

PART 1:

Subject: Explanation of Deep versus Shallow Copy
Author: Bob Langelaan
Date: Nov. 25th, 2006

Note: The reader should already have studied the following C++
concepts in order to be able to understand this document: pointers and
references; the "new" and "delete" operators; the member
initialization list (MIL); the "this" pointer.

1.0 Introduction

Let us start with the following class:

class Example1
{
private:
    int ii;
    double dd;
    ABC abc;
};

Now let us instantiate several Example1 objects:

Example1 xyz1, xyz2;
..
.. // Assume xyz1's value modified here
..
xyz2 = xyz1; // Is this allowed?

Yes, the above program statement is allowed in C++. The assignment
operator is normally automatically overloaded (see section "3.0
Advanced Topic Addendum" below for details on some exceptions) for
any class in C++. If the programmer does not overload the assignment
operator, the compiler will supply one. Let us assume that in this
case the programmer has not explicitly overloaded the assignment
operator. What will the compiler-supplied assignment operator
(sometimes called the "implicit" or "synthesized" assignment
operator) do in the statement above?

The ii member of xyz1 will be assigned to the ii member of xyz2; the dd
member of xyz1 will be assigned to the dd member of xyz2; and the abc
member of xyz1 will be assigned to the abc member of xyz2. This
process of assigning the members of one object to the members of
another object is called a "memberwise" assignment. Programmers
may also refer to this as a "shallow copy".
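As a minimal sketch of the memberwise assignment just described: ABC is never defined in this document, so the one-member struct below is an assumption, and the members are made public only so that the result can be inspected.

```cpp
#include <cassert>

// Hypothetical stand-in for ABC, which the document does not define.
struct ABC
{
    int value;
};

class Example1
{
public:   // public here only to keep the sketch short
    int ii;
    double dd;
    ABC abc;
};

// Returns true if the compiler-supplied assignment copied every member.
bool memberwiseAssignmentCopiesAllMembers()
{
    Example1 xyz1 = {1, 2.5, {42}};
    Example1 xyz2 = {0, 0.0, {0}};

    xyz2 = xyz1;   // compiler-supplied (implicit) assignment operator
    assert(xyz2.abc.value == 42);

    // xyz2 holds its own ABC, so a later change to xyz1 does not reach it.
    xyz1.abc.value = 7;

    return xyz2.ii == 1 && xyz2.dd == 2.5 && xyz2.abc.value == 42;
}
```

Calling memberwiseAssignmentCopiesAllMembers() from main should return true, because each of the three members was assigned individually.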

What if we have the following C++ statement?

Example1 xyz3 = xyz1; // Does this invoke the overloaded
// assignment operator as well?

No. In this case the copy constructor of the Example1 class will be
invoked. Just as with the overloaded assignment operator, if the
programmer does not supply a copy constructor for their class, the
compiler will. The compiler-supplied copy constructor (the implicit or
synthesized copy constructor) does the equivalent of initializing all
of the members of the new object in its member initialization list
(MIL). This, in turn, will cause the copy constructor for each of the
members to be invoked. Therefore, in the previous C++ statement, the
compiler-supplied copy constructor will effectively invoke the copy
constructor for each of the members of xyz3, passing to each copy
constructor the corresponding member of xyz1.
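One way to see the difference between initialization and assignment is to count the calls. In the sketch below, Tracker is a hypothetical class standing in for ABC, and the public members and default constructor of Example1 are assumptions added so the sketch is complete.

```cpp
#include <cassert>

// Hypothetical member type that records which special member function ran.
class Tracker
{
public:
    static int copyConstructions;
    static int assignments;

    Tracker() {}
    Tracker(const Tracker&) { ++copyConstructions; }
    Tracker& operator=(const Tracker&) { ++assignments; return *this; }
};

int Tracker::copyConstructions = 0;
int Tracker::assignments = 0;

class Example1
{
public:
    Example1() : ii(0), dd(0.0) {}
    int ii;
    double dd;
    Tracker abc;   // plays the role of the ABC member
};

void copyThenAssign()
{
    Tracker::copyConstructions = 0;
    Tracker::assignments = 0;

    Example1 xyz1;
    Example1 xyz3 = xyz1;   // initialization: implicit copy constructor
    assert(Tracker::copyConstructions == 1 && Tracker::assignments == 0);

    Example1 xyz2;
    xyz2 = xyz1;            // assignment: implicit assignment operator
    assert(Tracker::copyConstructions == 1 && Tracker::assignments == 1);
}
```

After copyThenAssign() runs, each counter is 1: the "=" in a declaration reached the member's copy constructor, and the "=" in an ordinary statement reached the member's assignment operator.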

But what if we have the following class:

class Example2
{
public:
    Example2();
private:
    int ii;
    double dd;
    ABC * abcPtr; // will point to a dynamically created ABC object
};

And here is the implementation of the Example2 constructor:

Example2::Example2()
{
    abcPtr = new ABC; // dynamically create an ABC object
                      // and have the class member abcPtr point to it.
    .
    .
    .
}

Now let us instantiate several Example2 objects:

Example2 xyz1, xyz2;
..
.. // Assume xyz1's value modified here
..

xyz2 = xyz1; // Is a shallow copy good enough here?

The answer is very likely no. A shallow assignment will cause the
abcPtr member of both xyz1 and xyz2 to point to the same ABC object
(the ABC object that xyz1's abcPtr is pointing to). As well, with a
shallow assignment, the ABC object that xyz2's abcPtr was pointing to
is probably lost, thereby creating a memory leak! This is likely not
the result the programmer is trying to achieve.
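The sharing and the leak can be made visible with a small sketch. The ABC definition, the public members, and the explicit clean-up at the end are assumptions made for illustration; the class deliberately has no destructor, copy constructor or assignment operator of its own.

```cpp
#include <cassert>

// Hypothetical stand-in for ABC.
struct ABC
{
    int value;
    ABC() : value(0) {}
};

class Example2
{
public:
    Example2() : ii(0), dd(0.0), abcPtr(new ABC) {}
    // Nothing else is declared on purpose: the compiler-supplied
    // copy operations are what is being demonstrated.
    int ii;
    double dd;
    ABC* abcPtr;
};

bool shallowAssignmentSharesOneABC()
{
    Example2 xyz1, xyz2;
    ABC* originalOfXyz2 = xyz2.abcPtr;   // remembered so it can be freed below

    xyz2 = xyz1;               // memberwise: copies the pointer, not the ABC
    xyz1.abcPtr->value = 99;   // a change made through xyz1 ...
    bool shared = (xyz2.abcPtr == xyz1.abcPtr)
               && (xyz2.abcPtr->value == 99);   // ... is visible through xyz2
    assert(shared);

    delete originalOfXyz2;     // without this line the object would be leaked
    delete xyz1.abcPtr;        // the single shared object, deleted exactly once
    return shared;
}
```

If Example2 had a destructor that deleted abcPtr, the same shallow assignment would lead to the shared object being deleted twice, which is a further reason a memberwise copy is not enough here.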

In such a case we need to do a "non-memberwise" assignment.
Programmers may also refer to this as a "deep copy".

The assignment operator, instead of effectively doing this:

xyz2.abcPtr = xyz1.abcPtr; // this simply assigns one pointer
// to the other pointer

needs to effectively do this:

*(xyz2.abcPtr) = *(xyz1.abcPtr); // This will take the ABC object that
// xyz1.abcPtr is pointing to and assign it to the
// ABC object that xyz2.abcPtr is pointing to.

Therefore, in the case of class Example2, we need to overload the
assignment operator, and not rely on the compiler supplied assignment
operator. The programmer supplied assignment operator will need to do
a deep copy.

What if we have the following?

Example2 xyz3 = xyz1; // Can we use the compiler
// supplied copy constructor here?

No. The compiler supplied copy constructor will do a shallow copy. It
will effectively do this:

xyz3.abcPtr = xyz1.abcPtr; // this simply assigns one pointer
// to the other pointer

when what we need it to effectively do is:

xyz3.abcPtr = new ABC; // first dynamically create an ABC object
// (remember that xyz3 is constructed in this copy constructor)
*(xyz3.abcPtr) = *(xyz1.abcPtr); // then do the same as the
// programmer supplied assignment operator - a deep copy

Therefore, the programmer will need to supply a copy constructor for
the Example2 class as well. In a case such as this, the programmer
will also need to provide a destructor for their class. A C++ rule of
thumb, often referred to as the "Rule of Three", is that if your
class needs a destructor, then it likely needs a programmer defined
copy constructor and assignment operator as well. See the next section
below titled "2.0 Sample Code" for the implementation of all 3 of
these for the Example2 class.
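For readers of this part only, the three functions could look roughly like the sketch below. It is not the sample code from section 2.0; the ABC definition and the abc() accessor are assumptions added for illustration, and exception safety is left out, as in the rest of the document.

```cpp
#include <cassert>

// Hypothetical stand-in for ABC.
struct ABC
{
    int value;
    ABC() : value(0) {}
};

class Example2
{
public:
    Example2() : ii(0), dd(0.0), abcPtr(new ABC) {}

    // Copy constructor: create a new ABC, then copy the pointed-to object.
    Example2(const Example2& other) : ii(other.ii), dd(other.dd)
    {
        abcPtr = new ABC;
        *abcPtr = *(other.abcPtr);
    }

    // Assignment operator: both objects already own an ABC,
    // so only the contents need to be copied.
    Example2& operator=(const Example2& other)
    {
        if (this != &other)
        {
            ii = other.ii;
            dd = other.dd;
            *abcPtr = *(other.abcPtr);
        }
        return *this;
    }

    // Destructor: release the ABC this object owns.
    ~Example2() { delete abcPtr; }

    ABC& abc() { return *abcPtr; }   // hypothetical accessor for the demo

private:
    int ii;
    double dd;
    ABC* abcPtr;
};

bool copiesAreIndependent()
{
    Example2 xyz1;
    xyz1.abc().value = 5;

    Example2 xyz3 = xyz1;   // copy constructor
    Example2 xyz2;
    xyz2 = xyz1;            // assignment operator

    xyz1.abc().value = 99;  // must not show through in the copies
    assert(&xyz3.abc() != &xyz1.abc());
    return xyz3.abc().value == 5 && xyz2.abc().value == 5;
}
```

Each object deletes only the ABC it created, so the three objects in copiesAreIndependent() are destroyed without a leak or a double delete.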
 
Erik Wikström

blangela said:
[...] what we need it to effectively do is:

xyz3.abcPtr = new ABC; // first dynamically create an ABC object
// (remember that xyz3 is constructed in this copy constructor)
*(xyz3.abcPtr) = *(xyz1.abcPtr); // then do the same as the
// programmer supplied assignment operator - a deep copy


Unless you have some pedagogical reason not to, wouldn't it be better
to use something like this:

xyz3.abcPtr = new ABC(*(xyz1.abcPtr)); // Copy-create a new ABC object

Of course, since you expect the reader to be familiar with
initialization, it would be even better to use initialization in the
examples:

Example2::Example2(const Example2& e)
    : ii(e.ii), dd(e.dd),              // copy the plain members as well
      abcPtr(new ABC(*(e.abcPtr)))     // copy-create a new ABC object
{
}

It's a bit late here so there are probably some errors in the above but
I think you'll get the idea.
 
blangela

My students are still getting comfortable with pointers, MILs and the
new operator, so doing it the way I have shown makes the code easier to
understand. Otherwise, I agree with your suggestions.

Bob
 
Greg

blangela said:
Let us start with the following class:

class Example1
{
private:
    int ii;
    double dd;
    ABC abc;
};

I think the paper would be improved by using a more meaningful class
than "Example1" as the example. After all, if the example appears to be
contrived, then the student may question the relevancy of the material.
So the more "realistic" the example provided, the more likely that a
student will be able to apply it to C++ code they see and to the C++
code that they write. The specific class chosen does not matter all
that much: a "Shape" class, or a "Vehicle" class, or even a "String"
class are some of the usual choices.

Another important programming lesson is to choose the names of
identifiers carefully. Well-chosen names document the program and
minimize the likelihood of making a mistake. For that reason, this
program is not setting a good example by having identifiers with names
that differ only by one character at the very end of the name.
"Example1" and "Example2" (besides not communicating what either class
actually represents) may suggest that class names should resemble each
other as much as possible - when in fact the goal is the opposite,
because the last thing that a programmer wants is to have two classes
whose names are easy to mix up.

Greg
 
blangela

Thanks for your input - Bob.
 
Old Wolf

blangela said:
Below is a draft of a document that I plan to give to my introductory
C++ class.

Subject: Explanation of Deep versus Shallow Copy

Personally, I find the terms "shallow copy" and "deep copy"
confusing w.r.t. C++, and I'm glad my learning resources
never mentioned them.

The reality is that when you copy an object using the
default assignment operator or copy constructor, you get
an exact replica of the original. If the original had a handle
on a resource (e.g. a pointer to memory) then the copy will
have another handle on that same resource.

This is a very simple concept (to me, anyway) and does
not need to be obfuscated with discussions about "deep"
and "shallow". The term "shallow copy" can imply that
if an object contains another object, then the sub-object
won't be copied -- which is not true.

If it were me teaching, the point I would be making is
that if you want to allocate new resources when
copying an object, you are going to need a user-defined
assignment operator & copy constructor.

BTW I don't think it's correct to talk about the default
assignment operator as being an "overload" -- not sure
on that though.
 
Steve Pope

I totally agree. "Shallow copy" means the same thing as
simply "copy". "Deep copy" means you have an algorithm that
descends (at least some) indirect references and makes
duplicates. "Deep copies" involve procedural code that
is situation specific, and saying "this makes a deep copy"
is in general ambiguous.

Steve
 
blangela

A shallow copy is a member-wise copy. A deep copy is anything other
than a shallow copy. What could be simpler than that?
 
Steve Pope

It's simple, but it also makes the definition of "deep copy"
so broad as to be fairly useless.

Steve
 
Old Wolf

blangela said:
A shallow copy is a member-wise copy. A deep copy is anything other
than a shallow copy. What could be simpler than that?

Firstly, if that is your definition then why not just call them
what they are: "member-wise copy" and "non member-wise copy" ?

Second, that's not the commonly accepted definition of
'shallow copy' and 'deep copy' in computer science, so
it is confusing. Look at this example, where Foo is some class:

// C++:
class S { public: Foo foo; };

// Java:
class S { public Foo foo; };

In Java, assigning an object of type S to another object of type S
results in both objects having a handle on a single shared instance
of Foo. I think we would agree that this is a shallow copy.

In C++, the same assignment results in both objects having their
own Foo objects (so there are now two Foos in existence).

Now, you are trying to say that the C++ behaviour is a "shallow copy".
It is clearly different from the Java behaviour which is a shallow
copy.

I think this shows that it is mistaken to call the C++ member-wise
copy a "shallow copy".
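To make the C++ half of that comparison concrete, here is a minimal sketch. The contents of Foo and the SRef struct are assumptions made for illustration; SRef only approximates a Java reference field by holding a raw pointer.

```cpp
#include <cassert>

struct Foo { int n; };       // hypothetical contents; the post leaves Foo unspecified

struct S    { Foo  foo; };   // the C++ class from the post: holds a Foo by value
struct SRef { Foo* foo; };   // hypothetical: a pointer member, closer to the Java field

// Memberwise assignment of S duplicates the Foo itself.
bool valueMemberIsDuplicated()
{
    S a = {{1}};
    S b = {{0}};
    b = a;
    a.foo.n = 2;
    assert(&a.foo != &b.foo);   // two distinct Foo objects exist
    return b.foo.n == 1;        // b kept its own copy
}

// Memberwise assignment of SRef copies only the pointer.
bool pointerMemberIsShared()
{
    Foo shared = {1};
    SRef a = {&shared};
    SRef b = {0};
    b = a;
    a.foo->n = 2;
    return b.foo == a.foo && b.foo->n == 2;   // both refer to one Foo
}
```

Both functions use the same compiler-supplied memberwise assignment; what differs is whether the member is an object or a pointer to one.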
 
Alf P. Steinbach

* Old Wolf:
Firstly, if that is your definition then why not just call them
what they are: "member-wise copy" and "non member-wise copy" ?

Second, that's not the commonly accepted definition of
'shallow copy' and 'deep copy' in computer science, so
it is confusing. Look at this example, where Foo is some class:

// C++:
class S { public: Foo foo; };

// Java:
class S { public Foo foo; };

In Java, assigning an object of type S to another object of type S
results in both objects having a handle on a single shared instance
of Foo. I think we would agree that this is a shallow copy.

It is, but not of the object. It is a shallow copy of references.
Thinking of references as the referred to objects is a common conceptual
error among Java programmers.

In C++, the same assignment results in both objects having their
own Foo objects (so there are now two Foos in existence).

Now, you are trying to say that the C++ behaviour is a "shallow copy".
It is clearly different from the Java behaviour which is a shallow
copy.

You're comparing apples to politicians.
 
Kirit Sælensminde

Greg said:
I think the paper would be improved by using a more meaningful class
than "Example1" as the example. [...]

I have to agree. Abstract examples just don't do it for me. I've
written on my web site about the importance of good examples.

Yes, they're really hard to come up with, but that's what your students
are paying you for.

A good example will make the students *want* to get it right and
further, make the issue stand out. With a duff example though they
normally can't care less.


K
 
Frederick Gotham

blangela:
A shallow copy is a member-wise copy. A deep copy is anything other
than a shallow copy. What could be simpler than that?


There's also bit-wise copy:

memcpy(&obj1,&obj2,sizeof obj1);

This might differ from member-wise copy if:

(1) There's padding.
(2) Any members have an overloaded assignment operator.
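Both points can be illustrated with a short sketch. Plain, Counted and Holder are hypothetical types, and std::is_trivially_copyable requires C++11, which is newer than this thread; it is used here only to show when a memcpy of an object is well defined.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Plain
{
    char c;
    int  i;   // the compiler may insert padding between c and i
};

// Hypothetical member type with a user-defined assignment operator.
struct Counted
{
    int value;
    int timesAssigned;
    Counted() : value(0), timesAssigned(0) {}
    Counted& operator=(const Counted& other)
    {
        value = other.value;
        ++timesAssigned;   // side effect that a bit-wise copy would skip
        return *this;
    }
};

struct Holder { Counted member; };

// memcpy of an object is only well defined for trivially copyable types.
static_assert(std::is_trivially_copyable<Plain>::value, "Plain may be memcpy'd");
static_assert(!std::is_trivially_copyable<Holder>::value, "Holder must not be memcpy'd");

bool bitwiseCopyOfPlainMatches()
{
    Plain a = {'x', 7};
    Plain b = {'y', 0};
    std::memcpy(&b, &a, sizeof b);   // copies every byte, padding included
    return b.c == 'x' && b.i == 7;
}

int memberwiseAssignmentRunsOperator()
{
    Holder a, b;
    a.member.value = 3;
    b = a;                           // implicit operator= calls Counted::operator=
    assert(b.member.value == 3);
    return b.member.timesAssigned;   // incremented once by the assignment
}
```

For Plain the two kinds of copy give the same member values, while for Holder only the member-wise assignment runs Counted's operator, so the two are not interchangeable.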
 
blangela

Good point!
 
