Why not cloneable by default?

Thomas G. Marshall

John C. Bollinger coughed up:
CLJA removed as not relevant to this branch of the discussion.


Actually, IMO you are wrong here. This is your post, and you are of course
free to post it wherever you like, no problems there. But even /this/
branch of yours brings into view possible design notions for java, which is
inextricably entwined with comparison to other similar languages, since that
is where we all gain our non-java experience from. I can certainly see
people interested in java advocacy being a part of this discussion, as many
of them have views on C#'s design, and even C#'s cloning, for example.


....[rip]...
 
Thomas G. Marshall

Chris Smith coughed up:
Well, in hypothetical-land where this contract doesn't exist for
java.util.List, there would almost certainly be a standard API
Collections.compare(List,List) method to do the job for you.


Yes!

To pin it down precisely, my objection is not to the method name
"equals", which is vague enough to encompass a lot of different
meanings. I object to defining an incomplete concept of equality when
that same equality is going to be used by the core API as if it were
sufficient to recognize all of the important differences between two
objects.

It's precisely that fact -- that Object.equals is widely used to ask
about the *equivalence* of two objects -- that makes it wrong for
mutable objects to call themselves equal with that method. By being
mutable, those objects are *not* equivalent

Which IMO (strong opinion) is using a definition of equivalent that I do not
believe is consistent with OO methodologies or computer science in general.
But forget this for now, we've discussed this already, read below.

... and there are numerous
places where it's consequently assumed that they are.

So it seems that if equals() were removed from Object, then you would be happy.
In other words, if what the user wanted was ==, then use ==, and if they
/wanted/ value comparison, then that object should contain an equals().

I don't have a *huge* objection to this. I /do/ wish that you didn't raise
mutability in this discussion however, since I believe that it obscures your
point dramatically.
 
John C. Bollinger

Patricia said:
I quite often have a class where I know a good,
class-appropriate copy would cost some thought and
implementation effort, but I don't know of any reason to
copy its objects. I don't like the idea of leaving
inappropriate copy methods lying around. Currently, I just
don't mark them Cloneable. How should those cases be handled?

This is a key question that we must settle to come up with a design.
Should the object duplication mechanism require classes to opt in, as
Java cloning does, or not? If not, then should it provide an opt-out?
Chris has not yet persuaded me that opt-in is inappropriate.
Not directly related to Chris' article, but something that
should be considered whenever deep copy is discussed: What
happens when object A has two references, through different
structures, to object B. Should each be copied separately,
or should A have two references to a single copy of B?

That's an excellent point, and one for which I suspect there is no
universal answer. An additional aspect to it is the question of who
should decide the answer -- the object being copied, or the object
requesting the copy. I can imagine a mechanism where the copy facility
would support some kind of strategy object that influences the choices
in this area, but that may be getting too heavy.
 
Lee Fesperman

John said:
In our consideration of the appropriate specification for a better
cloning mechanism, then, are you suggesting that we drop the requirement
that a clone be distinct from the original object?

The contract for Object#clone() states that it is not an absolute requirement. This
would simply be a specific situation where the requirement is not (absolutely) applied.

Note that the standard immutable classes don't implement Cloneable.
 
Lee Fesperman

Chris said:
That is exactly what is NOT MEANT by the fundamental OO principle of
objects owning their identity. Identity is its own property of an
object, and is *not* a consequence of its state. Without understanding
this, the best that can be accomplished is development in some kind of
pseudo-OO paradigm; a kind of weird cross between OO and the relational
model.

Pseudo-OO is a good characterization. It could be applied to Caché. See the rather
expansive thread "ODB (Cache?) vs ORM" on comp.lang.java.databases, for more details.

Consider the earlier example of multiple objects describing a car at different
locations at different times. A relational design might have the VIN and
location/timestamp as the primary key for a table representing such objects.
Note: normalization would require that other (constant) properties associated with the
car be placed in another table; the first table would only have VIN. A relational system
is not tied to a single operation like equals(). It can easily search for all entries
with the same VIN. It also could search for a specific entry by using the entire primary
key. Of course, you could also search by location. This is difficult to support with
Java Objects without facilities way beyond equals(), and equals() probably shouldn't be
contorted to meet those requirements.
Certainly, you could circumvent this by introducing your own internal
employee ID. This is commonly done, in fact, in relational databases...
where it is called surrogate keys, and is used to work around the fact
that TUPLES DON'T HAVE IDENTITY. Any such technique in OO modeling,
though, is highly suspect because the OO model provides a mechanism for
preserving the identity of objects without relying on naturally
identifying state.

In the c.l.j.d thread I mentioned above, I stated " ... many SQL DBMSs support their own
version of artificial keys (auto-increment, identity, sequence, ...), but those should
be used very rarely. In fact, I have yet to see a single case where they are justified
versus a key that reflects the real world."

To create an employee table for your company, I would go ahead and assign employee
numbers external to the db. Do you have other cases? You might want to respond over in
c.l.j.d.
 
Lee Fesperman

John said:
Does the summary document I refer to in my last post adequately address
your request?

Yes, it does. Thanks for making a page for this discussion. Good idea, usenet
discussions can be rather scattered. Where appropriate, I'll use private email for
comments like below to reduce clutter if that is ok.

On that page in the section, "Distinct Duplicates", take note of my later posting that
this is not an absolute requirement in the contract for Object#clone().
 
Thomas G. Marshall

Patricia Shanahan coughed up:

....[rip]...

[...] but I don't know of any reason to
copy its objects. I don't like the idea of leaving
inappropriate copy methods lying around. Currently, I just
don't mark them Cloneable. How should those cases be handled?

Currently, you're stuck implementing a public final clone() that punts. One
of the reasons I was advocating the interface UnCloneable in the very
beginning.

Not directly related to Chris' article, but something that
should be considered whenever deep copy is discussed: What
happens when object A has two references, through different
structures, to object B. Should each be copied separately,
or should A have two references to a single copy of B?

Single copy. Not that big a deal there: it's the same issue with circular
references: A list of references is kept by the copy mechanism, so that they
can be re-used when appropriate.

But deep copy is a disaster for so many /other/ very similar issues: What
do you do if your class has a reference to an object B that must always be
the same object B as used by another object? In this case, you may have
created a copy that is fully broken at the start.

I suspect far more objects /cannot/ be deep cloned than can be.

....[rip]...
 
Chris Smith

Thomas said:
So it seems that if equals() were removed from Object, then you would be happy.
In other words, if what the user wanted was ==, then use ==, and if they
/wanted/ value comparison, then that object should contain an equals().

I don't have a *huge* objection to this. I /do/ wish that you didn't raise
mutability in this discussion however, since I believe that it obscures your
point dramatically.

I don't think we're understanding each other yet. In fact, I believe
that equals(Object) has a critically important role to play -- for
pseudo-objects. That is, Java provides objects as the *only* way of
creating user-defined data types. There are, however, plenty of non-
primitive types (e.g., Date) that do not need object semantics.

I have often in the past argued that, in fact, the '==' operator in Java
when applied to reference types should merely call Object.equals unless
one or both operands are null. Although this is impractical in a modern
Java where the purpose of equals(Object) is vague and confused, it's
still not a bad theoretical idea. IMO, it is a mistake to ever compare
references, unless you KNOW that .equals(Object) will give back the same
answer and you are only doing the reference comparison for performance.

So I think I'm quite far from arguing that equals(Object) should not
exist; I think that == should not exist, except as a shorthand for the
former, with its current behavior only available when inherited from
java.lang.Object.

Mutability comes about because it conflicts with certain important
characteristics of objects that are "equal" to each other; namely, that
they can't become unequal later on. I could put it differently, though.
Basically, equals(Object) should only be overridden in classes that
desire value semantics instead of object semantics. Since such classes
must be immutable anyway, it naturally follows that mutable classes
should not override equals(Object).
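
To make the "can't become unequal later on" point concrete, here is a minimal sketch. MutablePoint is a hypothetical class invented for this illustration, not something from the thread: it is mutable and overrides equals/hashCode by value, which is exactly the combination argued against above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable class with value-based equals/hashCode (illustration only).
class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutablePoint)) {
            return false;
        }
        MutablePoint p = (MutablePoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class MutableEqualsDemo {
    public static void main(String[] args) {
        MutablePoint a = new MutablePoint(1, 2);
        MutablePoint b = new MutablePoint(1, 2);
        System.out.println(a.equals(b));     // true: equal at this moment

        Set<MutablePoint> set = new HashSet<>();
        set.add(a);

        a.x = 99;                            // mutate after insertion
        System.out.println(a.equals(b));     // false: they became unequal
        System.out.println(set.contains(a)); // false: the set stored the old hash
        System.out.println(set.size());      // 1: the element is still in there
    }
}
```

The set still holds the object, yet can no longer find it, which is one of the "numerous places" where the core API assumes that equal objects stay equal.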

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Tor Iver Wilhelmsen

As far as I am aware, you can easily get this level of protection on
an arbitrary Java application by editing your security policy file.

Or write a custom SecurityManager.
 
John C. Bollinger

Thomas said:
Patricia Shanahan coughed up:


Single copy. Not that big a deal there: it's the same issue with circular
references: A list of references is kept by the copy mechanism, so that they
can be re-used when appropriate.

But deep copy is a disaster for so many /other/ very similar issues: What
do you do if your class has a reference to an object B that must always be
the same object B as used by another object? In this case, you may have
created a copy that is fully broken at the start.

I suspect far more objects /cannot/ be deep cloned than can be.

It should be possible to construct a deep copy of an arbitrary object A
such that each distinct object in A's graph is copied exactly once.
Each object would be mapped to its copy in, for instance, an
IdentityHashMap, which would be passed around to all the objects as they
are copied. A duplicate of any particular class would be made only if
there wasn't one already present in the map; otherwise a reference to
the previously-made duplicate would be used. There is a potential issue
with circular references, but I think that can be solved.
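
A minimal sketch of that map-based approach, under stated assumptions: Node is a hypothetical class invented for this illustration, and the two-method deepCopy shape is just one way to pass the map around. Registering each copy in the IdentityHashMap before descending into its references is what lets circular references terminate.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph node, used only for illustration.
class Node {
    String label;
    final List<Node> refs = new ArrayList<>();

    Node(String label) {
        this.label = label;
    }

    // Entry point: each copy operation gets its own fresh identity map.
    Node deepCopy() {
        return deepCopy(new IdentityHashMap<>());
    }

    // Copies this node at most once per operation; later requests reuse the copy.
    Node deepCopy(Map<Node, Node> copies) {
        Node existing = copies.get(this);
        if (existing != null) {
            return existing;
        }
        Node copy = new Node(label);
        // Register the copy BEFORE descending, so that cycles find it.
        copies.put(this, copy);
        for (Node ref : refs) {
            copy.refs.add(ref.deepCopy(copies));
        }
        return copy;
    }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refs.add(b);
        a.refs.add(b); // a second path from A to the same B
        b.refs.add(a); // a cycle back to A

        Node a2 = a.deepCopy();
        System.out.println(a2 != a);                            // true
        System.out.println(a2.refs.get(0) != b);                // true
        System.out.println(a2.refs.get(0) == a2.refs.get(1));   // true: B copied once
        System.out.println(a2.refs.get(0).refs.get(0) == a2);   // true: cycle preserved
    }
}
```

The recursion is fine for small graphs; a very deep graph would probably want an explicit work list instead. The sketch also says nothing about objects that must stay shared with something outside the graph.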
 
John C. Bollinger

Thomas said:
Aw c'mon.....I was at least *partly* the instigator in this thing. I was the
initial OP for crimeny sakes. Geez. {grumble} {grumble} ;)

Sorry about that. <vi index.html> Better now?
 
John C. Bollinger

Chris said:
John C. Bollinger wrote:




I think that this condition is too strong. If we weaken it to allow an
implementation of copy as:

public <whatever> copy() { return this; }

then the system will work more gracefully.

The 'copy' operation is not a precisely defined operation -- there is no
God-given (or Gosling-given) single idea of what a 'copy' is. So the method
called copy() can only ever represent a class designer's /best guess/ at what is
/likely/ to be most useful to users of that class. In particular, the amount
of state that is replicated (as opposed to shared) between the original and the
copy is not something that can be decided by some global policy. The degree of
'deepness' needed for a copy is determined (even ignoring context) by private
details of the object's implementation, and by the mapping between the object's
state and its semantics.

The operational details of the 'copy' operation are not precisely
defined, but we can and should define the purpose of copying and / or
the expected characteristics of the result (as they relate to our
efforts). I have added this to the web page: "The purpose of
duplication as chosen for this project is to produce from one object a
second object such that the two may be used independently without
concern for operations on one unexpectedly affecting the behavior of the
other, including when the two are used in any combination among multiple
threads without external synchronization." This statement is subject to
debate; it is based in part on one of Chris Smith's comments elsewhere
in this thread regarding whether or not a copy needs to be distinct from
the original. Are there other conceivable purposes for object copying
that are not covered by that statement (and that we should consider
supporting)?

The suggested purpose for copying is consistent with your observations
about the necessary depth of copying (or lack thereof). It also
implies, as do you in passing, that the necessary depth may be context
dependent.
For instance a Rectangle class that internally
maintains its definition/state as two fields of class Point should obviously
(for most normal purposes) implement copy as a fully deep operation. OTOH, a
ColoredRectangle that implemented its state as 4 floats and a Color would
probably implement copy as shallow.

Note that how much state is replicated depends (in part) on details that are
private, and hence the caller (in many circumstances) would be wrong to attempt
to dictate how much of the state is replicated vs. shared.

(Incidentally, that's why I object to, and have avoided using, the method name
clone() -- which I think strongly implies a particular strategy for copying,
and one that is not necessarily appropriate.)

I think that the limiting case of shallowness is a "copy" that replicates no
state, and shares all state. I.e.
return this;

That turns out to be a natural implementation of "copy" for genuine Singletons
(if there are any ;-). And -- much more importantly -- for any object which is
intended to act as the "sole representative" of some other entity. This
interpretation of copy() is that it should return an object that is as distinct
/as semantically possible/ from the original (up to some context and
implementation-dependent limit on deepness). The limiting case of "distinct as
semantically possible" is /no/ difference; if the only semantically valid
"equivalent" of some object is that object itself, then that's what the "copy"
should provide.

I think I am persuaded.
Returning to the ColoredRectangle. I said that it would probably implement
copy as a shallow copy (a pure clone()), but really that's a design error.
Rectangle /might/ know how Colors work, and whether the best way to get a
duplicate 'handle' on a colour is simply to copy the reference. It's not
entirely implausible that it /would/ know that, but I don't think it should
/have/ to know it. If Colors are expensive (not likely I admit) then it would
be an entirely plausible design that Color provided a number of pre-defined
instances that (purely for efficiency reasons) should remain unique --
Color.RED, Color.BLUE, -- but that less commonly used instances would be
created on-the-fly and discarded at need. But that's not possible if classes
like Rectangle are going to second-guess what it means to copy a colour. So
the implementation of ColoredRectangle.copy() should also copy() the Color.
The implementation of Color.copy() would just "return this;" (assuming that the
objects were immutable, and if the class designer had thought it worthwhile to
implement an optimised version of copy()).
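
The ColoredRectangle example above might look roughly like the following sketch. The fields, constructors and accessors are assumptions made for this illustration; only the names Color, ColoredRectangle and copy() come from the discussion. The point is that the rectangle asks the colour for a copy and does not need to know that the answer happens to be "return this".

```java
// Immutable colour with a few shared, pre-defined instances (illustrative only).
final class Color {
    static final Color RED = new Color(255, 0, 0);
    static final Color BLUE = new Color(0, 0, 255);

    private final int r;
    private final int g;
    private final int b;

    Color(int r, int g, int b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }

    // Immutable, so sharing is safe: the "copy" is the object itself.
    Color copy() {
        return this;
    }
}

// Mutable rectangle whose state is 4 floats and a Color.
final class ColoredRectangle {
    private float x;
    private float y;
    private float width;
    private float height;
    private Color color;

    ColoredRectangle(float x, float y, float width, float height, Color color) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
        this.color = color;
    }

    ColoredRectangle copy() {
        // Delegate to Color.copy() rather than assuming how Color works.
        return new ColoredRectangle(x, y, width, height, color.copy());
    }

    void moveTo(float newX, float newY) {
        this.x = newX;
        this.y = newY;
    }

    float x() {
        return x;
    }

    Color color() {
        return color;
    }
}

class ColoredRectangleDemo {
    public static void main(String[] args) {
        ColoredRectangle original = new ColoredRectangle(0f, 0f, 2f, 1f, Color.RED);
        ColoredRectangle duplicate = original.copy();
        duplicate.moveTo(5f, 5f);

        System.out.println(original.x());                   // 0.0: original unaffected
        System.out.println(duplicate.color() == Color.RED); // true: RED stays unique
    }
}
```

If Color later became mutable, only Color.copy() would need to change.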

I agree that it is bad for classes to be forced to know or assume
implementation details of other classes. On the other hand, there is a
major practicality issue with enabling (and relying on) copying
arbitrary objects: _every_ existing class would have to be examined to
ensure that it provided an appropriate implementation of copy(). Many
could be safely copied via an inherited mechanism that recursively
copied all members (taking into account multiple references to the same
objects) but some could not, and those would all suddenly be exposed to
breakage.

If we were designing this feature for a new language I would be
satisfied to make it usable on any object, by any object. For
retrofitting Java with this facility, however, I think we need to make
classes opt in. That leaves the question of what to do in the fairly
likely scenario of running into an object that cannot be copied inside
another object that you're trying to copy. I don't see a single correct
answer, so it may be that we need to provide multiple options.
Anyway, with all that out of the way. How about the following, as a
start-point ?

public class Object
{
    ....

    // the JVM-level clone operation
    private native Object __clone();

    protected final Object clone()
        throws CloneNotSupportedException
    {
        // classes carrying the proposed marker interface refuse to be cloned
        if (this instanceof NotClonableMarkerInterface)
            throw new CloneNotSupportedException();
        return this.__clone();
    }

With the hope that this may become more than just an academic exercise,
I think we need to keep compatibility in mind as a goal. To that end,
we cannot make Object.clone() final, and we should not change its
behavior with respect to the Cloneable interface. I think the best
alternative may simply be to deprecate Object.clone() and build a
parallel facility.
    public Object copy()
        throws CloneNotSupportedException
    {
        Object copy = this.clone();
        copy.postClone();
        return copy;
    }

    protected void postClone()
    {
        // default is no action
    }

    ...
}

Some notes:

1) I'm not completely convinced that it's ever appropriate for an object to be
marked with NotClonableMarkerInterface. There may /be/ a good reason, though,
so I've left the test-and-throw in for now. Unless a good use for it can be
found, though, (perhaps something to do with security) then I think it would be
much better to get rid of it and CloneNotSupportedException.

I dislike marker interfaces, so I would be quite satisfied to both
deprecate Cloneable and avoid introducing a new marker interface. I
don't know whether we can do the former, but I am confident that we can do
the latter.
2) I'm not completely convinced that the clone() method should be available to
subclasses at all. It might be better to make clone() private, and change the
copy implementation to something like:

    public final Object copy()
        throws CloneNotSupportedException
    {
        if (this instanceof SingularInstanceMarkerInterface)
            return this;
        Object copy = this.clone();
        copy.postClone();
        return copy;
    }

but that /reeks/ of over-engineered brittleness to me...

I agree, I'm not satisfied with that. And not only because it
introduces _another_ marker interface.
3) Subclasses have the choice of overriding copy(), which they would do if they
wanted to
return this;
or if -- for whatever reason -- they knew a better way to implement copying
than going via the system-level clone() operation.

Right. I think that's a good feature.
4) ... or they could override postClone() to do any massaging of the results of
clone() that were necessary to preserve sanity. Nearly all implementations
would take the form:

    protected void postClone()
    {
        m_field1 = m_field1.copy();
        m_field6 = m_field6.copy();
        //...
    }

That seems a little sugary to me. If copy() can be overridden then why
do we need postClone()? What is gained by providing two different
avenues for fixing up the state of the copy?
5) In the above I've assumed that the copy() operation should be universally
available. That way seems best to me (especially in a language which makes the
equally problematical equals() method public). The main reasons for
restricting it seem to be that (a) the semantics are /not/ obvious so users
should be forced to think before using it, (b) some objects should not be
copyable. The point (a) is valid, but I don't think that restricting copy()
actually helps. And, as I've said, with the extended interpretation of copy()
that I'm urging, I think that most (perhaps all) of the examples of
non-copyable object evaporate.

I'm leaning the other way, as I described above, but I'm willing to
suspend judgment until we work out some of the other design issues.
6) But, that said, I don't really see much harm in including an /unchecked/
CopyNotSupportedException that subclasses could override copy() to throw.

If we think that copy() should not be supported by default, then I'd just
remove copy() (and postClone()) from Object, add an interface
    interface Copyable
    {
        Object copy();
    }
that subclasses could implement at will.

I think that in the end we will probably need to offer such an
exception. I am satisfied with it being unchecked, especially if we
make classes declare themselves copyable instead of being copyable by
default.
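
As a rough sketch of what the opt-in variant could look like: Copyable and CopyNotSupportedException are the names floated above, while the Copies helper and the Counter class are hypothetical and exist only for this illustration.

```java
import java.util.Objects;

// Opt-in interface, as written in the proposal above.
interface Copyable {
    Object copy();
}

// Unchecked, so callers are not forced to catch it.
class CopyNotSupportedException extends RuntimeException {
    CopyNotSupportedException(String message) {
        super(message);
    }
}

// Hypothetical helper for copying an object of unknown type.
final class Copies {
    private Copies() {
    }

    static Object copyOf(Object o) {
        Objects.requireNonNull(o, "o");
        if (o instanceof Copyable) {
            return ((Copyable) o).copy();
        }
        throw new CopyNotSupportedException(
                o.getClass().getName() + " has not opted in to copying");
    }
}

// A class that opts in; the covariant return type spares callers a cast.
final class Counter implements Copyable {
    int value;

    Counter(int value) {
        this.value = value;
    }

    @Override
    public Counter copy() {
        return new Counter(value);
    }
}

class OptInCopyDemo {
    public static void main(String[] args) {
        Counter original = new Counter(3);
        Counter duplicate = original.copy();
        duplicate.value++;
        System.out.println(original.value);  // 3
        System.out.println(duplicate.value); // 4

        try {
            Copies.copyOf(new Object());     // Object never opted in
        } catch (CopyNotSupportedException e) {
            System.out.println("not copyable: " + e.getMessage());
        }
    }
}
```

What a containing object should do when one of its members throws here is still the open question; the sketch simply lets the exception propagate.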
 
Thomas G. Marshall

John C. Bollinger coughed up:
It should be possible to construct a deep copy of an arbitrary object
A such that each distinct object in A's graph is copied exactly once.

Right---within its own "graph".

Each object would be mapped to its copy in, for instance, an
IdentityHashMap, which would be passed around to all the objects as
they are copied. A duplicate of any particular class would be made
only if there wasn't one already present in the map; otherwise a
reference to the previously-made duplicate would be used. There is a
potential issue with circular references, but I think that can be
solved.


Yes, that is what I was talking about as not a big deal. And that is solved
the same way for circular references. Not the problem.

The problem comes in when you are deep copying something that contains deep
within it a reference to an object that /must/ be shared with a completely
different object deep within another object. There is no way to tell when
an object must be duplicated or shared.

Furthermore, deep cloning without regard to use would likely result in
objects that at first glance only "seem" to work.
 
Thomas G. Marshall

John C. Bollinger coughed up:
Sorry about that. <vi index.html> Better now?

Now I'm stuck. If I say "yes" then I appear as a shallow shmuck. If I say
"no", then I'm clearly a prick.

Ah well, I'll just assume the former is how I appear anyway, and say "yes".
:)
 
Chris Smith

John C. Bollinger said:
I dislike marker interfaces, so I would be quite satisfied to both
deprecate Cloneable and avoid introducing a new marker interface. I
don't know whether we can do the former, but I am confident that we can do
the latter.

I also dislike marker interfaces. Given that this is a forward-looking
proposal, it is probably a good idea to introduce an annotation, if
classes need some way to indicate their cloning behavior aside from
overriding a method.

That may or may not resolve your issue with marker interfaces, but it
does resolve *some* issue with marker interfaces.

 
Thomas G. Marshall

Chris Smith coughed up:
I also dislike marker interfaces.

That purist side of you is relentless. :) I personally like the ability to
mark certain classes for the sole purpose of inclusion in a particular group. I
much prefer, for example:

InMyPocket thingsInMyPocket = { new Keys(), new Frog(), new
WalletFilledWithCrap() };

than

Object thingsInMyPocket = { (etc.) ...
 
Thomas G. Marshall

Thomas G. Marshall coughed up:

....[rip]...
I much prefer, for example:

InMyPocket thingsInMyPocket = { new Keys(), new Frog(), new
WalletFilledWithCrap() };

than

Object thingsInMyPocket = { (etc.) ...


Whoops. Forgot a [ ] on each line...
....[rip]...
 
Chris Smith

Bent C Dalager said:
I would agree that we could do with a decent way of overriding the
notion of equality for special cases, much as we do for ordering. As
it stands, we are largely stuck with putting objects into wrappers or
other surrogate objects that define their own equals. Which is ugly.

I can't recall having ever done such a thing. Then again, we are
discussing wildly different approaches to the purpose of Object.equals,
so I suppose it follows that the problems with one don't exist in the
other.
This has nothing to do with any particular language. It is a feature
of the problem domain. If you have two or more people with the same
name, you need to have _some_ strategy for telling them apart.

The strategy is that, if there's ever confusion between the two, you
look at something else related to that entity to figure out which one it
is.

Since this hypothetical software system doesn't exist, we're reaching
the limits of theoretical discussion; so let me propose a real-world
example. As a "spare time" project right now, I'm working on software
to run a tournament in an educational activity. In this activity, there
are a couple dozen "events", and students compete in several of them.
One of the goals of the software is to schedule the events so that as few
students as possible want to compete in two events that are scheduled
for the same time period.

Each team, therefore, has a collection of students, and a student has a
first and last name. There's a separate data structure that keeps track
of which students wish to compete in which events. I don't care if two
students on the team have the same name. I only care that the right
events are associated with the right objects.

When I associate the students with an event, they end up acting as keys
in a HashMap, and it would be a disaster if I chose to override the
"equals" method to compare their names (which is the only field of the
Student class). I wisely decided that these objects should inherit the
concept of equality from Object. This means that I'm able to make use
of an object's identity; something which is a fundamental concept of OO
programming and is quite useful for solving this kind of problem.
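
A stripped-down sketch of that arrangement. The class shape, event names and the enter() helper are invented for this illustration and are not the actual tournament code. Student deliberately does not override equals or hashCode, so the HashMap tells two same-named students apart by identity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Student: names only, and NO equals/hashCode override.
class Student {
    final String firstName;
    final String lastName;

    Student(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

class IdentityKeyDemo {
    // Registers a student for an event, creating the list on first use.
    static void enter(Map<Student, List<String>> entries, Student s, String event) {
        entries.computeIfAbsent(s, k -> new ArrayList<>()).add(event);
    }

    public static void main(String[] args) {
        Student first = new Student("Pat", "Lee");
        Student second = new Student("Pat", "Lee"); // same name, different person

        Map<Student, List<String>> entries = new HashMap<>();
        enter(entries, first, "Chess");
        enter(entries, second, "Debate");

        System.out.println(entries.size());      // 2: two distinct keys
        System.out.println(entries.get(first));  // [Chess]
        System.out.println(entries.get(second)); // [Debate]
    }
}
```

Had Student overridden equals to compare names, both entries would have collapsed onto one key and each student would appear to be entered in the other's event.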
If your boss insists on having a computer system that cannot tell
people apart, I am sure that could be accommodated. It might not be
very useful, but it would be possible.

You insist on assuming that inherent identity of objects does not exist.
It does. The software system would be quite capable of telling the
difference between two objects, even if all of their fields are the
same. However, for it to do this in common Java data structures, you
have to not override equals.
What is the fundamental difference between a property of an object and
the object's state? I would have thought that one or more of the
former is what makes up the latter?

I didn't mean property as "field" or in the JavaBeans sense. I meant it
in the general sense. Perhaps "feature" would be better, since it
doesn't seem to be widely used for a programming language concept. So
read the above as "identity is its own feature of an object...".
If I have defined one natural identity and I find I need other
comparisons as well, then I would certainly have to implement them. I
don't see why this is a problem.

It's a problem because comparing the cars involved in the two traffic
camera readings is not equality. In order for a method to be
overridden, you need to provide an implementation that's meaningful in
the context of the class that defines the method. Your implementation
of equals is only useful in a situation where you know that both objects
are traffic camera readings, and you know that you only want to compare
a certain subset of their state. In that case, you may as well write
your own method, and it won't cause such problems if you later decide
that you need to store readings in a different data structure.
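
For instance, a sketch along these lines, where the CameraReading class, its fields and the method name sameCarAs are all assumptions made up for this illustration. The domain-specific comparison gets its own descriptive name, and equals stays as inherited from Object.

```java
// Hypothetical traffic camera reading (illustration only).
final class CameraReading {
    final String vin;
    final String location;
    final long timestampMillis;

    CameraReading(String vin, String location, long timestampMillis) {
        this.vin = vin;
        this.location = location;
        this.timestampMillis = timestampMillis;
    }

    // Compares only the subset of state the caller cares about here.
    boolean sameCarAs(CameraReading other) {
        return vin.equals(other.vin);
    }
}

class CameraReadingDemo {
    public static void main(String[] args) {
        CameraReading morning = new CameraReading("VIN-0001", "Main St", 1000L);
        CameraReading evening = new CameraReading("VIN-0001", "Oak Ave", 2000L);

        System.out.println(morning.sameCarAs(evening)); // true: same car
        System.out.println(morning.equals(evening));    // false: different readings
    }
}
```

Both readings can then sit in the same HashSet or act as separate HashMap keys without surprises.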

 
Chris Smith

Lee Fesperman said:
A relational system is not tied to a single operation like equals().
It can easily search for all entries with the same VIN. It also could
search for a specific entry by using the entire primary key. Of course,
you could also search by location.

Of course, with appropriate data structures, you could also do all of
these things in an OO application with its own data structures as well.
The equals(Object) method is a mechanism, in Java, to describe whether
something -- that's nominally an object at the language level -- ought
to have object semantics or value semantics. Relational data, on the
other hand, always uses value semantics... so of course the concept is
foreign there.
This is difficult to support with
Java Objects without facilities way beyond equals(), and equals()
probably shouldn't be contorted to meet those requirements.

I agree that equals should not provide search capability, nor should it
be used for general-purpose comparison of arbitrary subsets of state.
To create an employee table for your company, I would go ahead and
assign employee numbers external to the db. Do you have other cases?
You might want to respond over in c.l.j.d.

Since I don't follow .database regularly, I'll keep the conversation
here instead. In any case, MindIQ has no desire to change anything to
use identifying numbers of any kind, so any attempt to impose such
things onto our organization in the interest of one piece of software is
simply avoiding the fundamental question. Our company *does* operate,
and we *don't* assign identifying numbers, as much as people seem to be
claiming that such a thing is not possible.

In order to store MindIQ employees in a database, you would need to
create some kind of artificial key to ensure uniqueness in a relational
database. You don't necessarily need to use the auto-increment column
feature of the database; but if you chose instead to tell us, "assign
your own damned employee numbers", then we would just find a different
provider for our software.

The point, though, is that this is necessary for the relational model
only. Inventing surrogate keys for object-oriented data models is
fundamentally broken, because it's a very difficult way to reproduce
something that's *already there*.

 
