iterating the difference of two collections

  • Thread starter Andreas Leitgeb
  • Start date
A

Andreas Leitgeb

Given two Collection parameters a,b that I'm *not* supposed to modify,
what's the best way to iterate over the elements in a but not in b?

Create a new collection d with elements of a, then d.removeAll(b),
then iterate over d.
or
for(ElemType e:a) { if (b.contains(e)) continue; ... }

Actually it happens that Collection b is actually a Set,
so my preference would be for the second, except that it
feels hackish. Is this feeling right or wrong?
 
F

Flo 'Irian' Schaetz

And thus spoke Andreas Leitgeb...
Given two Collection parameters a,b that I'm *not* supposed to modify,
what's the best way to iterate over the elements in a but not in b?

I would say, that depends...

.... how many elements do the collections have?
.... how complex are the actions of get and contains?
Create a new collection d with elements of a, then d.removeAll(b),
then iterate over d.
or
for(ElemType e:a) { if (b.contains(e)) continue; ... }

Actually it happens that Collection b is actually a Set,
so my preference would be for the second, except that it
feels hackish. Is this feeling right or wrong?

Depends on the type of the set, imho: It it's a hashset, b.contains(e)
would be aprox. constant, so I would prefer it, too.

Flo
 
D

Daniel Pitts

Andreas said:
Given two Collection parameters a,b that I'm *not* supposed to modify,
what's the best way to iterate over the elements in a but not in b?

Create a new collection d with elements of a, then d.removeAll(b),
then iterate over d.
or
for(ElemType e:a) { if (b.contains(e)) continue; ... }

Actually it happens that Collection b is actually a Set,
so my preference would be for the second, except that it
feels hackish. Is this feeling right or wrong?

Either way works. If you have the answer, why ask the question?

If you wanted to get really tricky, you would create a custom
collections called "Difference" which defines a view into the
collection (a - b);
 
P

Patricia Shanahan

Flo said:
And thus spoke Andreas Leitgeb...


I would say, that depends...

... how many elements do the collections have?
... how complex are the actions of get and contains?


Depends on the type of the set, imho: It it's a hashset, b.contains(e)
would be aprox. constant, so I would prefer it, too.

Flo

You can also get total O(|a|+|b|) time, regardless of the original
collection implementations, the first way, using a HashSet as the
working collection.

Patricia
 
F

Flo 'Irian' Schaetz

And thus spoke Patricia Shanahan...
You can also get total O(|a|+|b|) time, regardless of the original
collection implementations, the first way, using a HashSet as the
working collection.

Good idea, didn't think of that. If a or b is a hashset, the 2nd way
would perhaps be better (O(a) or O(b), if I'm not mistaken), but
otherwise this sounds like a good idea.

Flo
 
A

Andreas Leitgeb

Flo 'Irian' Schaetz said:
And thus spoke Patricia Shanahan...

Good idea, didn't think of that. If a or b is a hashset, the 2nd way
would perhaps be better (O(a) or O(b), if I'm not mistaken), but
otherwise this sounds like a good idea.

The sizes are small enough that probably the overhead of creating
another HashSet outweighs the actual iteration.

Nevertheless I had to use a newly created temporary collection
anyway, because I found that the originally used collection "a"
could be replaced by some already maintained subset "a1" of "a",
which contained only the relevant elements, but is unfortunately
modified inside the loop (some of the iterated elements are being
removed).
Using an iterator (and iterator's remove) is a bit complicated,
because the remove actually happens inside another method...

Thanks however for all responses!
 
M

Mike Schilling

Daniel Pitts said:
Either way works. If you have the answer, why ask the question?

If you wanted to get really tricky, you would create a custom
collections called "Difference" which defines a view into the
collection (a - b);

Or create a Subset class, constructed from a Set and an expression
determining which members of another Set are in the subset: (none of this
compiled or tested)

public class Subset extends AbstractSet implements Set {
public Subset(Set superset, Criterion criterion){ ... }
...
public interface Criterion {
public boolean isInSubset(Object member);
}
}

and implement a Difference criterion

public class Difference implements Subset.Criterion {
private Set toExclude;
public Difference(Set toExclude) {
this.toExclude = toExclude;
}
public boolean isInSubset(Object member) {
return !toExclude.contains(member);
}
}
 
A

Andreas Leitgeb

My question was for the choice among those. While I had a
tendency, I didn't consider that an answer, myself.
Or create a Subset class, constructed from a Set and an expression
determining which members of another Set are in the subset: (none of this
compiled or tested)

I'd still need to implement an iterator to make it all work.
Depending on how I did that, it would be equivalent to either
of my own suggestions, but either way it would be much more "OO"
than mine :)
public class Subset extends AbstractSet implements Set {

Btw., is there a reason for "implements Set", when AbstractSet
already does it?
public class Difference implements Subset.Criterion {
[...]
}

A truely elegant wrapping of my "if (c.contains(e)) continue;" :)
 
J

John Ersatznom

Mike said:
Documentation.

The class name contains "set". It extends a class whose name contains
"Set". "All implemented interfaces" will include "Set". What more
documentation do you want? ;)

The really interesting thing here is whether to make Subset modifiable.
One supposes removal should remove the element from the backing Set, and
addition should add to the backing Set and throw something if the
criterion is violated (the new element would not actually now appear in
the Subset). Anyone who wants an unmodifiable one has
Collections.unmodifiableSet available to them, and anyone who wants a
Subset to diverge from its parent Set and just initially contain the
appropriate Subset, then be modifiable at will (and not reflect changes
in the backing Set), can use Set foo = new HashSet();
foo.addAll(mySubset); to copy the Subset into a normal-behaving HashSet.
 
P

Patricia Shanahan

John said:
The class name contains "set". It extends a class whose name contains
"Set". "All implemented interfaces" will include "Set". What more
documentation do you want? ;)

I agree with you about the "All implemented interfaces", but "Set" in a
name only means the class is something to do with the general concept of
a set, not that it implements java.util.Set. See java.util.BitSet.

Patricia
 
M

Mike Schilling

John Ersatznom said:
The class name contains "set". It extends a class whose name contains
"Set". "All implemented interfaces" will include "Set". What more
documentation do you want? ;)

An explicit statment that it implements Set. If you can show any
disadvantage to that, I'll reconsider :)
 
D

Daniel Dyer

An explicit statment that it implements Set. If you can show any
disadvantage to that, I'll reconsider :)

I don't have a real problem with it, but I'll argue against it anyway...

It's redundant. You are not restating any of the other things that are
being inherited from the base class, so why make an exception for this
interface? To be consistent you ought to also list Iterable and possibly
even Collection in the implements list.

Dan.
 
M

Mike Schilling

Daniel Dyer said:
I don't have a real problem with it, but I'll argue against it anyway...

It's redundant. You are not restating any of the other things that are
being inherited from the base class, so why make an exception for this
interface? To be consistent you ought to also list Iterable and possibly
even Collection in the implements list.

Set implies those two, of course, and the fact that Subset implements Set is
part of its interface. The fact that I'm implementing Subset by subclassing
AbstractSet is an implementation detail. Should I decide, during the
implementation, that I'd rather implement it a different way, I'd need to
change

Subset extends AbstractSet

to

Subset implements Set

It's simpler and safer to say

Subset extends AbstractSet implements Set

in the first place
 
C

Chris Uppal

Mike said:
An explicit statment that it implements Set. If you can show any
disadvantage to that, I'll reconsider :)

The only disadvantage I can think of is that unexpected redundancy can be a
little confusing ("Eh ? Why he telling us that ? Isn't it obvious ? What am
I missing ?").

In this specific case, emphasising that a class which is derived from
java.util.AbstractSet (which is well-known to be just /for/ creating
implementations of java.util.Set) does in fact implement java.util.Set, would
make me look twice to see if there was something odd going on (e.g. did the
name "Set" or "AbstractSet" refer to something other than java.util.<whatever>
in this context ?).

Not a big deal...

-- chris
 
J

John Ersatznom

Patricia said:
I agree with you about the "All implemented interfaces", but "Set" in a
name only means the class is something to do with the general concept of
a set, not that it implements java.util.Set. See java.util.BitSet.

Even when we have "Fooset extends AbstractSet"? :)
 
J

John Ersatznom

Mike said:
An explicit statment that it implements Set. If you can show any
disadvantage to that, I'll reconsider :)

Code bloat. Documentation bloat. Slippery slope (to be consistent, you
should put "implements Serializable" on everything that does, for
instance, and so forth for every other interface)...
 
J

John Ersatznom

Mike said:
Set implies those two, of course, and the fact that Subset implements Set is
part of its interface. The fact that I'm implementing Subset by subclassing
AbstractSet is an implementation detail.

This leads to an interesting question. Given that implementation details
should be hidden, perhaps the whole notion of inheritance and
subclassing needs an overhaul?

Basically, the fact that a class extends another class would become
"protected". It would no longer be legal for client code to do this:

A foo = new B();
A.someMethod();

given

public class A {
public void someMethod() { ... }
}

public class B extends A { ... }

If client code needed to polymorphically call someMethod you'd have to have

public interface A {
public void someMethod();
}

public abstract class AbstractA {
public void someMethod() { ... }
}

public class B extends AbstractA ...

A foo = new B();
A.someMethod(); // Legal again

Of course, the docs for B would note "implements A" rather than "extends
AbstractA". In fact, it would just note what's currently in "all
implemented interfaces". Unless you generated docs for "protected"
stuff, in which case its extending AbstractA would be noted. Actually,
I'd further suggest generating separate docs for the public interface of
a class and the protected one, with the former linking to the latter
where it exists, the latter not existing for final classes, and the
latter being focused on extenders and the former on users. So there'd be
say three new javadoc tags, @users, @implementers and @implementation;
the first would begin and the second would end the user-oriented
information, the latter also starting the implementer-oriented
information; the third would give implementation notes with a nice
IMPLEMENTATION NOTES: heading and be illegal on interfaces and their
members. The public interface doc for a class or interface would give
the public members and user-oriented information, while the protected
interface doc would give the public and protected members and
implementer-oriented information. Both would give the implementation
notes and the stuff prior to @user. Example:

/**
* This is the class summary and some info of interest to users and
* extenders.
* @users
* More information, geared towards users
* @implementers
* More information, geared towards extenders
*/
public class B extends AbstractA {
/**
* This is the method summary and some info of interest to
* users and extenders.
* @users
* Watch for the IOExceptions and possibly-null return value!
* @implementers
* If you override this, remember to fuzzle the foobar for
* the benefit of pre-version-2.0 clients, and to check baz
* for being out of range or null and throw an exception early.
* We don't want null being stored in the table and an exception
* only thrown when it's pulled back out; eager throwing ensures
* the stack trace points closer to the real error.
* @implementation
* Uses O(n log n) quicksort for performance
* @param baz whatever
* @return a Foo or null
* @throws IOException if something goes wrong
* @throws IllegalArgumentException if baz is out of range
* @throws NullPointerException if baz is null
*/
public Foo someMethod (Bar baz) throws IOException { ... }
/**
* Blah blah.
* @users
* Blah blah.
* // @implementers illegal here; final method!
*/
public final void someMethod ();
}

Output docs would look like

public class B implements A

This is the class summary and some info of interest to users and extenders.
More information, geared towards users.

<link to extender docs>

Method summary ...

public Foo someMethod (Bar baz) throws IOException

This is the method summary and some info of interest to users and extenders.
Watch for the IOExceptions and possibly-null return value!
IMPLEMENTATION NOTE:
Uses O(n log n) quicksort for performance
Parameters:
baz -- whatever
Returns: a Foo or null
Throws:
IOException if something goes wrong
IllegalArgumentException if baz is out of range
NullPointerException if baz is null

public void someMethod ()

Blah blah.
Blah blah.


Extender docs:

public class B extends AbstractA

This is the class summary and some info of interest to users and extenders.
More information, geared towards extenders.

public Foo someMethod (Bar baz) throws IOException

This is the method summary and some info of interest to users and extenders.
If you override this, remember to fuzzle the foobar for the benefit of
pre-version-2.0 clients, and to check baz for being out of range or null
and throw an exception early. We don't want null being stored in the
table and an exception only thrown when it's pulled back out; eager
throwing ensures the stack trace points closer to the real error.
IMPLEMENTATION NOTE:
Uses O(n log n) quicksort for performance
Parameters:
baz -- whatever
Returns: a Foo or null
Throws:
IOException if something goes wrong
IllegalArgumentException if baz is out of range
NullPointerException if baz is null

public final void someMethod ()

Blah blah.
Blah blah.

Note that the "final" on someMethod only appears in the extender doc for
the class B.

Actually there is one situation where it's sometimes desirable to have
an abstract class but no interface -- to force all implementations to
conform in some way, for instance by having a bunch of final public
methods and some protected non-final methods that some of the public
ones call. To support this I'd suggest allowing

public class X

public class Y extends X implements X

which shows as "implements X" on the user-docs, "extends X implements X"
on the extender docs, and makes the public interface of X part of that
of Y without exposing to users that X actually has some skeleton
implementation or using an XInterface that someone might implement
without extending X.

Note that the docs' "methods inherited from superclass" section would go
away in the user-oriented docs for a class, just showing all the public
methods of the class on an equal footing (even for the "Y extends X
implements X" case for methods of X not overridden by Y), but would show
them the traditional way in the extender docs and highlight the
finalness and protected/publicness of methods there.

The above would be better OO, I think, hiding more implementation
details from users and separating docs-for-clients and
docs-for-extenders. (Docs-just-for-maintenance-programmers being normal,
non-Javadoc comments in the source code.) The encapsulation is improved
and so is the documentation. Too bad it can't happen as Sun can't break
all that legacy code. But Sun could do the next best thing, by doing the
suggested javadoc changes and allowing "protected extends AbstractA". In
other words, the above but with "extends X implements X" -> "extends X"
and "extends X // but doesn't implement X" -> "protected extends X".
That won't break legacy code and overloads "protected" rather than
adding a new keyword.

(Another suggestion: allow declarations like

private Object obj !null;
public void foo (String str !null, int quuxCount) {...}

This overloads the existing keyword null and operator token ! and the
meaning is fairly clear: NPE should be thrown on attempting to assign a
null to the references, including via calls like foo(null, 3). It looks
fairly self-explanatory and saves programmers having to write explicit
tests for null when storing a value for use only later or else face NPEs
that don't point to where the problem happened.)
 
M

Mike Schilling

..
(Another suggestion: allow declarations like

private Object obj !null;
public void foo (String str !null, int quuxCount) {...}

This overloads the existing keyword null and operator token ! and the
meaning is fairly clear: NPE should be thrown on attempting to assign a
null to the references, including via calls like foo(null, 3). It looks
fairly self-explanatory and saves programmers having to write explicit
tests for null when storing a value for use only later or else face NPEs
that don't point to where the problem happened.)

I've had the same idea, and discussed it here not long ago. It get more
complex than you might think at first.

http://groups.google.com/group/comp..._frm/thread/7910c61c7dd2da68/181e5bf1cda3aecd
 
L

Lew

John said:
Code bloat. Documentation bloat. Slippery slope (to be consistent, you
should put "implements Serializable" on everything that does, for
instance, and so forth for every other interface)...

I agree with John and others on this. Redundancy is itself a disadvantage
unless it contributes safety or some other aspect. As Chris said,
java.util.AbstractSet ... is well-known to be just /for/ creating
implementations of java.util.Set

You'd have to change the class declaration if you change the class declaration
anyway.

- Lew
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top