# iterating the difference of two collections

Discussion in 'Java' started by Andreas Leitgeb, Jan 6, 2007.

1. ### Andreas LeitgebGuest

Given two Collection parameters a,b that I'm *not* supposed to modify,
what's the best way to iterate over the elements in a but not in b?

Create a new collection d with elements of a, then d.removeAll(b),
then iterate over d.
or
for(ElemType e:a) { if (b.contains(e)) continue; ... }

Actually it happens that Collection b is actually a Set,
so my preference would be for the second, except that it
feels hackish. Is this feeling right or wrong?

Andreas Leitgeb, Jan 6, 2007

2. ### Flo 'Irian' SchaetzGuest

And thus spoke Andreas Leitgeb...

> Given two Collection parameters a,b that I'm *not* supposed to modify,
> what's the best way to iterate over the elements in a but not in b?

I would say, that depends...

.... how many elements do the collections have?
.... how complex are the actions of get and contains?

> Create a new collection d with elements of a, then d.removeAll(b),
> then iterate over d.
> or
> for(ElemType e:a) { if (b.contains(e)) continue; ... }
>
> Actually it happens that Collection b is actually a Set,
> so my preference would be for the second, except that it
> feels hackish. Is this feeling right or wrong?

Depends on the type of the set, imho: It it's a hashset, b.contains(e)
would be aprox. constant, so I would prefer it, too.

Flo

Flo 'Irian' Schaetz, Jan 6, 2007

3. ### Daniel PittsGuest

Andreas Leitgeb wrote:
> Given two Collection parameters a,b that I'm *not* supposed to modify,
> what's the best way to iterate over the elements in a but not in b?
>
> Create a new collection d with elements of a, then d.removeAll(b),
> then iterate over d.
> or
> for(ElemType e:a) { if (b.contains(e)) continue; ... }
>
> Actually it happens that Collection b is actually a Set,
> so my preference would be for the second, except that it
> feels hackish. Is this feeling right or wrong?

Either way works. If you have the answer, why ask the question?

If you wanted to get really tricky, you would create a custom
collections called "Difference" which defines a view into the
collection (a - b);

Daniel Pitts, Jan 6, 2007
4. ### Patricia ShanahanGuest

Flo 'Irian' Schaetz wrote:
> And thus spoke Andreas Leitgeb...
>
>> Given two Collection parameters a,b that I'm *not* supposed to modify,
>> what's the best way to iterate over the elements in a but not in b?

>
> I would say, that depends...
>
> ... how many elements do the collections have?
> ... how complex are the actions of get and contains?
>
>> Create a new collection d with elements of a, then d.removeAll(b),
>> then iterate over d.
>> or
>> for(ElemType e:a) { if (b.contains(e)) continue; ... }
>>
>> Actually it happens that Collection b is actually a Set,
>> so my preference would be for the second, except that it
>> feels hackish. Is this feeling right or wrong?

>
> Depends on the type of the set, imho: It it's a hashset, b.contains(e)
> would be aprox. constant, so I would prefer it, too.
>
> Flo

You can also get total O(|a|+|b|) time, regardless of the original
collection implementations, the first way, using a HashSet as the
working collection.

Patricia

Patricia Shanahan, Jan 6, 2007
5. ### Flo 'Irian' SchaetzGuest

And thus spoke Patricia Shanahan...

> You can also get total O(|a|+|b|) time, regardless of the original
> collection implementations, the first way, using a HashSet as the
> working collection.

Good idea, didn't think of that. If a or b is a hashset, the 2nd way
would perhaps be better (O(a) or O(b), if I'm not mistaken), but
otherwise this sounds like a good idea.

Flo

Flo 'Irian' Schaetz, Jan 6, 2007
6. ### Andreas LeitgebGuest

Flo 'Irian' Schaetz <> wrote:
> And thus spoke Patricia Shanahan...
>> You can also get total O(|a|+|b|) time, regardless of the original
>> collection implementations, the first way, using a HashSet as the
>> working collection.

>
> Good idea, didn't think of that. If a or b is a hashset, the 2nd way
> would perhaps be better (O(a) or O(b), if I'm not mistaken), but
> otherwise this sounds like a good idea.

The sizes are small enough that probably the overhead of creating
another HashSet outweighs the actual iteration.

Nevertheless I had to use a newly created temporary collection
anyway, because I found that the originally used collection "a"
could be replaced by some already maintained subset "a1" of "a",
which contained only the relevant elements, but is unfortunately
modified inside the loop (some of the iterated elements are being
removed).
Using an iterator (and iterator's remove) is a bit complicated,
because the remove actually happens inside another method...

Thanks however for all responses!

Andreas Leitgeb, Jan 7, 2007
7. ### Mike SchillingGuest

"Daniel Pitts" <> wrote in message
news:...
>
> Andreas Leitgeb wrote:
>> Given two Collection parameters a,b that I'm *not* supposed to modify,
>> what's the best way to iterate over the elements in a but not in b?
>>
>> Create a new collection d with elements of a, then d.removeAll(b),
>> then iterate over d.
>> or
>> for(ElemType e:a) { if (b.contains(e)) continue; ... }
>>
>> Actually it happens that Collection b is actually a Set,
>> so my preference would be for the second, except that it
>> feels hackish. Is this feeling right or wrong?

>
> Either way works. If you have the answer, why ask the question?
>
> If you wanted to get really tricky, you would create a custom
> collections called "Difference" which defines a view into the
> collection (a - b);

Or create a Subset class, constructed from a Set and an expression
determining which members of another Set are in the subset: (none of this
compiled or tested)

public class Subset extends AbstractSet implements Set {
public Subset(Set superset, Criterion criterion){ ... }
...
public interface Criterion {
public boolean isInSubset(Object member);
}
}

and implement a Difference criterion

public class Difference implements Subset.Criterion {
private Set toExclude;
public Difference(Set toExclude) {
this.toExclude = toExclude;
}
public boolean isInSubset(Object member) {
return !toExclude.contains(member);
}
}

Mike Schilling, Jan 7, 2007
8. ### Andreas LeitgebGuest

Mike Schilling <> wrote:
> "Daniel Pitts" <> wrote:
>> Andreas Leitgeb wrote:
>>> Given two Collection parameters a,b that I'm *not* supposed to modify,
>>> what's the best way to iterate over the elements in a but not in b?

>> Either way works. If you have the answer, why ask the question?

My question was for the choice among those. While I had a
tendency, I didn't consider that an answer, myself.

> Or create a Subset class, constructed from a Set and an expression
> determining which members of another Set are in the subset: (none of this
> compiled or tested)

I'd still need to implement an iterator to make it all work.
Depending on how I did that, it would be equivalent to either
of my own suggestions, but either way it would be much more "OO"
than mine

> public class Subset extends AbstractSet implements Set {

Btw., is there a reason for "implements Set", when AbstractSet

> public class Difference implements Subset.Criterion {
> [...]
> }

A truely elegant wrapping of my "if (c.contains(e)) continue;"

Andreas Leitgeb, Jan 7, 2007
9. ### Mike SchillingGuest

"Andreas Leitgeb" <> wrote in message
news:...
> Mike Schilling <> wrote:

>
>> public class Subset extends AbstractSet implements Set {

>
> Btw., is there a reason for "implements Set", when AbstractSet

Documentation.

Mike Schilling, Jan 7, 2007
10. ### John ErsatznomGuest

Mike Schilling wrote:
> "Andreas Leitgeb" <> wrote in message
> news:...
>
>>Mike Schilling <> wrote:

>
>
>>> public class Subset extends AbstractSet implements Set {

>>
>>Btw., is there a reason for "implements Set", when AbstractSet

>
> Documentation.

The class name contains "set". It extends a class whose name contains
"Set". "All implemented interfaces" will include "Set". What more
documentation do you want?

The really interesting thing here is whether to make Subset modifiable.
One supposes removal should remove the element from the backing Set, and
addition should add to the backing Set and throw something if the
criterion is violated (the new element would not actually now appear in
the Subset). Anyone who wants an unmodifiable one has
Collections.unmodifiableSet available to them, and anyone who wants a
Subset to diverge from its parent Set and just initially contain the
appropriate Subset, then be modifiable at will (and not reflect changes
in the backing Set), can use Set foo = new HashSet();
foo.addAll(mySubset); to copy the Subset into a normal-behaving HashSet.

John Ersatznom, Jan 8, 2007
11. ### Patricia ShanahanGuest

John Ersatznom wrote:
> Mike Schilling wrote:
>> "Andreas Leitgeb" <> wrote in message
>> news:...
>>
>>> Mike Schilling <> wrote:

>>
>>
>>>> public class Subset extends AbstractSet implements Set {
>>>
>>> Btw., is there a reason for "implements Set", when AbstractSet

>>
>> Documentation.

>
> The class name contains "set". It extends a class whose name contains
> "Set". "All implemented interfaces" will include "Set". What more
> documentation do you want?

I agree with you about the "All implemented interfaces", but "Set" in a
name only means the class is something to do with the general concept of
a set, not that it implements java.util.Set. See java.util.BitSet.

Patricia

Patricia Shanahan, Jan 8, 2007
12. ### Mike SchillingGuest

"John Ersatznom" <> wrote in message
news:ent3hr\$slf\$...
> Mike Schilling wrote:
>> "Andreas Leitgeb" <> wrote in message
>> news:...
>>
>>>Mike Schilling <> wrote:

>>
>>
>>>> public class Subset extends AbstractSet implements Set {
>>>
>>>Btw., is there a reason for "implements Set", when AbstractSet

>>
>> Documentation.

>
> The class name contains "set". It extends a class whose name contains
> "Set". "All implemented interfaces" will include "Set". What more
> documentation do you want?

An explicit statment that it implements Set. If you can show any

Mike Schilling, Jan 8, 2007
13. ### Daniel DyerGuest

On Mon, 08 Jan 2007 16:49:53 -0000, Mike Schilling
<> wrote:

>
> "John Ersatznom" <> wrote in message
> news:ent3hr\$slf\$...
>> Mike Schilling wrote:
>>> "Andreas Leitgeb" <> wrote in message
>>> news:...
>>>
>>>> Mike Schilling <> wrote:
>>>
>>>
>>>>> public class Subset extends AbstractSet implements Set {
>>>>
>>>> Btw., is there a reason for "implements Set", when AbstractSet
>>>
>>> Documentation.

>>
>> The class name contains "set". It extends a class whose name contains
>> "Set". "All implemented interfaces" will include "Set". What more
>> documentation do you want?

>
> An explicit statment that it implements Set. If you can show any
> disadvantage to that, I'll reconsider
>

I don't have a real problem with it, but I'll argue against it anyway...

It's redundant. You are not restating any of the other things that are
being inherited from the base class, so why make an exception for this
interface? To be consistent you ought to also list Iterable and possibly
even Collection in the implements list.

Dan.

--
Daniel Dyer
http://www.uncommons.org

Daniel Dyer, Jan 8, 2007
14. ### Mike SchillingGuest

"Daniel Dyer" <"You don't need it"> wrote in message
news...
> On Mon, 08 Jan 2007 16:49:53 -0000, Mike Schilling
> <> wrote:
>
>>
>> "John Ersatznom" <> wrote in message
>> news:ent3hr\$slf\$...
>>> Mike Schilling wrote:
>>>> "Andreas Leitgeb" <> wrote in message
>>>> news:...
>>>>
>>>>> Mike Schilling <> wrote:
>>>>
>>>>
>>>>>> public class Subset extends AbstractSet implements Set {
>>>>>
>>>>> Btw., is there a reason for "implements Set", when AbstractSet
>>>>
>>>> Documentation.
>>>
>>> The class name contains "set". It extends a class whose name contains
>>> "Set". "All implemented interfaces" will include "Set". What more
>>> documentation do you want?

>>
>> An explicit statment that it implements Set. If you can show any
>> disadvantage to that, I'll reconsider
>>

>
> I don't have a real problem with it, but I'll argue against it anyway...
>
> It's redundant. You are not restating any of the other things that are
> being inherited from the base class, so why make an exception for this
> interface? To be consistent you ought to also list Iterable and possibly
> even Collection in the implements list.

Set implies those two, of course, and the fact that Subset implements Set is
part of its interface. The fact that I'm implementing Subset by subclassing
AbstractSet is an implementation detail. Should I decide, during the
implementation, that I'd rather implement it a different way, I'd need to
change

Subset extends AbstractSet

to

Subset implements Set

It's simpler and safer to say

Subset extends AbstractSet implements Set

in the first place

Mike Schilling, Jan 8, 2007
15. ### Chris UppalGuest

Mike Schilling wrote:

> An explicit statment that it implements Set. If you can show any
> disadvantage to that, I'll reconsider

The only disadvantage I can think of is that unexpected redundancy can be a
little confusing ("Eh ? Why he telling us that ? Isn't it obvious ? What am
I missing ?").

In this specific case, emphasising that a class which is derived from
java.util.AbstractSet (which is well-known to be just /for/ creating
implementations of java.util.Set) does in fact implement java.util.Set, would
make me look twice to see if there was something odd going on (e.g. did the
name "Set" or "AbstractSet" refer to something other than java.util.<whatever>
in this context ?).

Not a big deal...

-- chris

Chris Uppal, Jan 8, 2007
16. ### John ErsatznomGuest

Patricia Shanahan wrote:
> I agree with you about the "All implemented interfaces", but "Set" in a
> name only means the class is something to do with the general concept of
> a set, not that it implements java.util.Set. See java.util.BitSet.

Even when we have "Fooset extends AbstractSet"?

John Ersatznom, Jan 8, 2007
17. ### John ErsatznomGuest

Mike Schilling wrote:
> An explicit statment that it implements Set. If you can show any
> disadvantage to that, I'll reconsider

Code bloat. Documentation bloat. Slippery slope (to be consistent, you
should put "implements Serializable" on everything that does, for
instance, and so forth for every other interface)...

John Ersatznom, Jan 8, 2007
18. ### John ErsatznomGuest

Mike Schilling wrote:
> Set implies those two, of course, and the fact that Subset implements Set is
> part of its interface. The fact that I'm implementing Subset by subclassing
> AbstractSet is an implementation detail.

This leads to an interesting question. Given that implementation details
should be hidden, perhaps the whole notion of inheritance and
subclassing needs an overhaul?

Basically, the fact that a class extends another class would become
"protected". It would no longer be legal for client code to do this:

A foo = new B();
A.someMethod();

given

public class A {
public void someMethod() { ... }
}

public class B extends A { ... }

If client code needed to polymorphically call someMethod you'd have to have

public interface A {
public void someMethod();
}

public abstract class AbstractA {
public void someMethod() { ... }
}

public class B extends AbstractA ...

A foo = new B();
A.someMethod(); // Legal again

Of course, the docs for B would note "implements A" rather than "extends
AbstractA". In fact, it would just note what's currently in "all
implemented interfaces". Unless you generated docs for "protected"
stuff, in which case its extending AbstractA would be noted. Actually,
I'd further suggest generating separate docs for the public interface of
a class and the protected one, with the former linking to the latter
where it exists, the latter not existing for final classes, and the
latter being focused on extenders and the former on users. So there'd be
say three new javadoc tags, @users, @implementers and @implementation;
the first would begin and the second would end the user-oriented
information, the latter also starting the implementer-oriented
information; the third would give implementation notes with a nice
IMPLEMENTATION NOTES: heading and be illegal on interfaces and their
members. The public interface doc for a class or interface would give
the public members and user-oriented information, while the protected
interface doc would give the public and protected members and
implementer-oriented information. Both would give the implementation
notes and the stuff prior to @user. Example:

/**
* This is the class summary and some info of interest to users and
* extenders.
* @users
* @implementers
*/
public class B extends AbstractA {
/**
* This is the method summary and some info of interest to
* users and extenders.
* @users
* Watch for the IOExceptions and possibly-null return value!
* @implementers
* If you override this, remember to fuzzle the foobar for
* the benefit of pre-version-2.0 clients, and to check baz
* for being out of range or null and throw an exception early.
* We don't want null being stored in the table and an exception
* only thrown when it's pulled back out; eager throwing ensures
* the stack trace points closer to the real error.
* @implementation
* Uses O(n log n) quicksort for performance
* @param baz whatever
* @return a Foo or null
* @throws IOException if something goes wrong
* @throws IllegalArgumentException if baz is out of range
* @throws NullPointerException if baz is null
*/
public Foo someMethod (Bar baz) throws IOException { ... }
/**
* Blah blah.
* @users
* Blah blah.
* // @implementers illegal here; final method!
*/
public final void someMethod ();
}

Output docs would look like

public class B implements A

This is the class summary and some info of interest to users and extenders.

Method summary ...

public Foo someMethod (Bar baz) throws IOException

This is the method summary and some info of interest to users and extenders.
Watch for the IOExceptions and possibly-null return value!
IMPLEMENTATION NOTE:
Uses O(n log n) quicksort for performance
Parameters:
baz -- whatever
Returns: a Foo or null
Throws:
IOException if something goes wrong
IllegalArgumentException if baz is out of range
NullPointerException if baz is null

public void someMethod ()

Blah blah.
Blah blah.

Extender docs:

public class B extends AbstractA

This is the class summary and some info of interest to users and extenders.

public Foo someMethod (Bar baz) throws IOException

This is the method summary and some info of interest to users and extenders.
If you override this, remember to fuzzle the foobar for the benefit of
pre-version-2.0 clients, and to check baz for being out of range or null
and throw an exception early. We don't want null being stored in the
table and an exception only thrown when it's pulled back out; eager
throwing ensures the stack trace points closer to the real error.
IMPLEMENTATION NOTE:
Uses O(n log n) quicksort for performance
Parameters:
baz -- whatever
Returns: a Foo or null
Throws:
IOException if something goes wrong
IllegalArgumentException if baz is out of range
NullPointerException if baz is null

public final void someMethod ()

Blah blah.
Blah blah.

Note that the "final" on someMethod only appears in the extender doc for
the class B.

Actually there is one situation where it's sometimes desirable to have
an abstract class but no interface -- to force all implementations to
conform in some way, for instance by having a bunch of final public
methods and some protected non-final methods that some of the public
ones call. To support this I'd suggest allowing

public class X

public class Y extends X implements X

which shows as "implements X" on the user-docs, "extends X implements X"
on the extender docs, and makes the public interface of X part of that
of Y without exposing to users that X actually has some skeleton
implementation or using an XInterface that someone might implement
without extending X.

Note that the docs' "methods inherited from superclass" section would go
away in the user-oriented docs for a class, just showing all the public
methods of the class on an equal footing (even for the "Y extends X
implements X" case for methods of X not overridden by Y), but would show
them the traditional way in the extender docs and highlight the
finalness and protected/publicness of methods there.

The above would be better OO, I think, hiding more implementation
details from users and separating docs-for-clients and
docs-for-extenders. (Docs-just-for-maintenance-programmers being normal,
and so is the documentation. Too bad it can't happen as Sun can't break
all that legacy code. But Sun could do the next best thing, by doing the
suggested javadoc changes and allowing "protected extends AbstractA". In
other words, the above but with "extends X implements X" -> "extends X"
and "extends X // but doesn't implement X" -> "protected extends X".
That won't break legacy code and overloads "protected" rather than

(Another suggestion: allow declarations like

private Object obj !null;
public void foo (String str !null, int quuxCount) {...}

This overloads the existing keyword null and operator token ! and the
meaning is fairly clear: NPE should be thrown on attempting to assign a
null to the references, including via calls like foo(null, 3). It looks
fairly self-explanatory and saves programmers having to write explicit
tests for null when storing a value for use only later or else face NPEs
that don't point to where the problem happened.)

John Ersatznom, Jan 8, 2007
19. ### Mike SchillingGuest

"John Ersatznom" <> wrote in message
news:enufoe\$5pr\$...
..
>
> (Another suggestion: allow declarations like
>
> private Object obj !null;
> public void foo (String str !null, int quuxCount) {...}
>
> This overloads the existing keyword null and operator token ! and the
> meaning is fairly clear: NPE should be thrown on attempting to assign a
> null to the references, including via calls like foo(null, 3). It looks
> fairly self-explanatory and saves programmers having to write explicit
> tests for null when storing a value for use only later or else face NPEs
> that don't point to where the problem happened.)

I've had the same idea, and discussed it here not long ago. It get more
complex than you might think at first.

Mike Schilling, Jan 8, 2007
20. ### LewGuest

Mike Schilling wrote:
>> An explicit statment that it implements Set. If you can show any
>> disadvantage to that, I'll reconsider

John Ersatznom wrote:
> Code bloat. Documentation bloat. Slippery slope (to be consistent, you
> should put "implements Serializable" on everything that does, for
> instance, and so forth for every other interface)...

I agree with John and others on this. Redundancy is itself a disadvantage
unless it contributes safety or some other aspect. As Chris said,

> java.util.AbstractSet ... is well-known to be just /for/ creating
> implementations of java.util.Set

You'd have to change the class declaration if you change the class declaration
anyway.

- Lew

Lew, Jan 9, 2007