iterating the difference of two collections

Discussion in 'Java' started by Andreas Leitgeb, Jan 6, 2007.

  1. Given two Collection parameters a,b that I'm *not* supposed to modify,
    what's the best way to iterate over the elements in a but not in b?

    Create a new collection d with elements of a, then d.removeAll(b),
    then iterate over d.
    or
    for(ElemType e:a) { if (b.contains(e)) continue; ... }

    Actually it happens that Collection b is actually a Set,
    so my preference would be for the second, except that it
    feels hackish. Is this feeling right or wrong?
    Andreas Leitgeb, Jan 6, 2007
    #1
    1. Advertising

  2. And thus spoke Andreas Leitgeb...

    > Given two Collection parameters a,b that I'm *not* supposed to modify,
    > what's the best way to iterate over the elements in a but not in b?


    I would say, that depends...

    .... how many elements do the collections have?
    .... how complex are the actions of get and contains?

    > Create a new collection d with elements of a, then d.removeAll(b),
    > then iterate over d.
    > or
    > for(ElemType e:a) { if (b.contains(e)) continue; ... }
    >
    > Actually it happens that Collection b is actually a Set,
    > so my preference would be for the second, except that it
    > feels hackish. Is this feeling right or wrong?


    Depends on the type of the set, imho: It it's a hashset, b.contains(e)
    would be aprox. constant, so I would prefer it, too.

    Flo
    Flo 'Irian' Schaetz, Jan 6, 2007
    #2
    1. Advertising

  3. Andreas Leitgeb

    Daniel Pitts Guest

    Andreas Leitgeb wrote:
    > Given two Collection parameters a,b that I'm *not* supposed to modify,
    > what's the best way to iterate over the elements in a but not in b?
    >
    > Create a new collection d with elements of a, then d.removeAll(b),
    > then iterate over d.
    > or
    > for(ElemType e:a) { if (b.contains(e)) continue; ... }
    >
    > Actually it happens that Collection b is actually a Set,
    > so my preference would be for the second, except that it
    > feels hackish. Is this feeling right or wrong?


    Either way works. If you have the answer, why ask the question?

    If you wanted to get really tricky, you would create a custom
    collections called "Difference" which defines a view into the
    collection (a - b);
    Daniel Pitts, Jan 6, 2007
    #3
  4. Flo 'Irian' Schaetz wrote:
    > And thus spoke Andreas Leitgeb...
    >
    >> Given two Collection parameters a,b that I'm *not* supposed to modify,
    >> what's the best way to iterate over the elements in a but not in b?

    >
    > I would say, that depends...
    >
    > ... how many elements do the collections have?
    > ... how complex are the actions of get and contains?
    >
    >> Create a new collection d with elements of a, then d.removeAll(b),
    >> then iterate over d.
    >> or
    >> for(ElemType e:a) { if (b.contains(e)) continue; ... }
    >>
    >> Actually it happens that Collection b is actually a Set,
    >> so my preference would be for the second, except that it
    >> feels hackish. Is this feeling right or wrong?

    >
    > Depends on the type of the set, imho: It it's a hashset, b.contains(e)
    > would be aprox. constant, so I would prefer it, too.
    >
    > Flo


    You can also get total O(|a|+|b|) time, regardless of the original
    collection implementations, the first way, using a HashSet as the
    working collection.

    Patricia
    Patricia Shanahan, Jan 6, 2007
    #4
  5. And thus spoke Patricia Shanahan...

    > You can also get total O(|a|+|b|) time, regardless of the original
    > collection implementations, the first way, using a HashSet as the
    > working collection.


    Good idea, didn't think of that. If a or b is a hashset, the 2nd way
    would perhaps be better (O(a) or O(b), if I'm not mistaken), but
    otherwise this sounds like a good idea.

    Flo
    Flo 'Irian' Schaetz, Jan 6, 2007
    #5
  6. Flo 'Irian' Schaetz <> wrote:
    > And thus spoke Patricia Shanahan...
    >> You can also get total O(|a|+|b|) time, regardless of the original
    >> collection implementations, the first way, using a HashSet as the
    >> working collection.

    >
    > Good idea, didn't think of that. If a or b is a hashset, the 2nd way
    > would perhaps be better (O(a) or O(b), if I'm not mistaken), but
    > otherwise this sounds like a good idea.


    The sizes are small enough that probably the overhead of creating
    another HashSet outweighs the actual iteration.

    Nevertheless I had to use a newly created temporary collection
    anyway, because I found that the originally used collection "a"
    could be replaced by some already maintained subset "a1" of "a",
    which contained only the relevant elements, but is unfortunately
    modified inside the loop (some of the iterated elements are being
    removed).
    Using an iterator (and iterator's remove) is a bit complicated,
    because the remove actually happens inside another method...

    Thanks however for all responses!
    Andreas Leitgeb, Jan 7, 2007
    #6
  7. "Daniel Pitts" <> wrote in message
    news:...
    >
    > Andreas Leitgeb wrote:
    >> Given two Collection parameters a,b that I'm *not* supposed to modify,
    >> what's the best way to iterate over the elements in a but not in b?
    >>
    >> Create a new collection d with elements of a, then d.removeAll(b),
    >> then iterate over d.
    >> or
    >> for(ElemType e:a) { if (b.contains(e)) continue; ... }
    >>
    >> Actually it happens that Collection b is actually a Set,
    >> so my preference would be for the second, except that it
    >> feels hackish. Is this feeling right or wrong?

    >
    > Either way works. If you have the answer, why ask the question?
    >
    > If you wanted to get really tricky, you would create a custom
    > collections called "Difference" which defines a view into the
    > collection (a - b);


    Or create a Subset class, constructed from a Set and an expression
    determining which members of another Set are in the subset: (none of this
    compiled or tested)

    public class Subset extends AbstractSet implements Set {
    public Subset(Set superset, Criterion criterion){ ... }
    ...
    public interface Criterion {
    public boolean isInSubset(Object member);
    }
    }

    and implement a Difference criterion

    public class Difference implements Subset.Criterion {
    private Set toExclude;
    public Difference(Set toExclude) {
    this.toExclude = toExclude;
    }
    public boolean isInSubset(Object member) {
    return !toExclude.contains(member);
    }
    }
    Mike Schilling, Jan 7, 2007
    #7
  8. Mike Schilling <> wrote:
    > "Daniel Pitts" <> wrote:
    >> Andreas Leitgeb wrote:
    >>> Given two Collection parameters a,b that I'm *not* supposed to modify,
    >>> what's the best way to iterate over the elements in a but not in b?

    >> Either way works. If you have the answer, why ask the question?


    My question was for the choice among those. While I had a
    tendency, I didn't consider that an answer, myself.

    > Or create a Subset class, constructed from a Set and an expression
    > determining which members of another Set are in the subset: (none of this
    > compiled or tested)


    I'd still need to implement an iterator to make it all work.
    Depending on how I did that, it would be equivalent to either
    of my own suggestions, but either way it would be much more "OO"
    than mine :)

    > public class Subset extends AbstractSet implements Set {


    Btw., is there a reason for "implements Set", when AbstractSet
    already does it?

    > public class Difference implements Subset.Criterion {
    > [...]
    > }


    A truely elegant wrapping of my "if (c.contains(e)) continue;" :)
    Andreas Leitgeb, Jan 7, 2007
    #8
  9. "Andreas Leitgeb" <> wrote in message
    news:...
    > Mike Schilling <> wrote:


    >
    >> public class Subset extends AbstractSet implements Set {

    >
    > Btw., is there a reason for "implements Set", when AbstractSet
    > already does it?


    Documentation.
    Mike Schilling, Jan 7, 2007
    #9
  10. Mike Schilling wrote:
    > "Andreas Leitgeb" <> wrote in message
    > news:...
    >
    >>Mike Schilling <> wrote:

    >
    >
    >>> public class Subset extends AbstractSet implements Set {

    >>
    >>Btw., is there a reason for "implements Set", when AbstractSet
    >>already does it?

    >
    > Documentation.


    The class name contains "set". It extends a class whose name contains
    "Set". "All implemented interfaces" will include "Set". What more
    documentation do you want? ;)

    The really interesting thing here is whether to make Subset modifiable.
    One supposes removal should remove the element from the backing Set, and
    addition should add to the backing Set and throw something if the
    criterion is violated (the new element would not actually now appear in
    the Subset). Anyone who wants an unmodifiable one has
    Collections.unmodifiableSet available to them, and anyone who wants a
    Subset to diverge from its parent Set and just initially contain the
    appropriate Subset, then be modifiable at will (and not reflect changes
    in the backing Set), can use Set foo = new HashSet();
    foo.addAll(mySubset); to copy the Subset into a normal-behaving HashSet.
    John Ersatznom, Jan 8, 2007
    #10
  11. John Ersatznom wrote:
    > Mike Schilling wrote:
    >> "Andreas Leitgeb" <> wrote in message
    >> news:...
    >>
    >>> Mike Schilling <> wrote:

    >>
    >>
    >>>> public class Subset extends AbstractSet implements Set {
    >>>
    >>> Btw., is there a reason for "implements Set", when AbstractSet
    >>> already does it?

    >>
    >> Documentation.

    >
    > The class name contains "set". It extends a class whose name contains
    > "Set". "All implemented interfaces" will include "Set". What more
    > documentation do you want? ;)


    I agree with you about the "All implemented interfaces", but "Set" in a
    name only means the class is something to do with the general concept of
    a set, not that it implements java.util.Set. See java.util.BitSet.

    Patricia
    Patricia Shanahan, Jan 8, 2007
    #11
  12. "John Ersatznom" <> wrote in message
    news:ent3hr$slf$...
    > Mike Schilling wrote:
    >> "Andreas Leitgeb" <> wrote in message
    >> news:...
    >>
    >>>Mike Schilling <> wrote:

    >>
    >>
    >>>> public class Subset extends AbstractSet implements Set {
    >>>
    >>>Btw., is there a reason for "implements Set", when AbstractSet
    >>>already does it?

    >>
    >> Documentation.

    >
    > The class name contains "set". It extends a class whose name contains
    > "Set". "All implemented interfaces" will include "Set". What more
    > documentation do you want? ;)


    An explicit statment that it implements Set. If you can show any
    disadvantage to that, I'll reconsider :)
    Mike Schilling, Jan 8, 2007
    #12
  13. Andreas Leitgeb

    Daniel Dyer Guest

    On Mon, 08 Jan 2007 16:49:53 -0000, Mike Schilling
    <> wrote:

    >
    > "John Ersatznom" <> wrote in message
    > news:ent3hr$slf$...
    >> Mike Schilling wrote:
    >>> "Andreas Leitgeb" <> wrote in message
    >>> news:...
    >>>
    >>>> Mike Schilling <> wrote:
    >>>
    >>>
    >>>>> public class Subset extends AbstractSet implements Set {
    >>>>
    >>>> Btw., is there a reason for "implements Set", when AbstractSet
    >>>> already does it?
    >>>
    >>> Documentation.

    >>
    >> The class name contains "set". It extends a class whose name contains
    >> "Set". "All implemented interfaces" will include "Set". What more
    >> documentation do you want? ;)

    >
    > An explicit statment that it implements Set. If you can show any
    > disadvantage to that, I'll reconsider :)
    >


    I don't have a real problem with it, but I'll argue against it anyway...

    It's redundant. You are not restating any of the other things that are
    being inherited from the base class, so why make an exception for this
    interface? To be consistent you ought to also list Iterable and possibly
    even Collection in the implements list.

    Dan.

    --
    Daniel Dyer
    http://www.uncommons.org
    Daniel Dyer, Jan 8, 2007
    #13
  14. "Daniel Dyer" <"You don't need it"> wrote in message
    news:eek:...
    > On Mon, 08 Jan 2007 16:49:53 -0000, Mike Schilling
    > <> wrote:
    >
    >>
    >> "John Ersatznom" <> wrote in message
    >> news:ent3hr$slf$...
    >>> Mike Schilling wrote:
    >>>> "Andreas Leitgeb" <> wrote in message
    >>>> news:...
    >>>>
    >>>>> Mike Schilling <> wrote:
    >>>>
    >>>>
    >>>>>> public class Subset extends AbstractSet implements Set {
    >>>>>
    >>>>> Btw., is there a reason for "implements Set", when AbstractSet
    >>>>> already does it?
    >>>>
    >>>> Documentation.
    >>>
    >>> The class name contains "set". It extends a class whose name contains
    >>> "Set". "All implemented interfaces" will include "Set". What more
    >>> documentation do you want? ;)

    >>
    >> An explicit statment that it implements Set. If you can show any
    >> disadvantage to that, I'll reconsider :)
    >>

    >
    > I don't have a real problem with it, but I'll argue against it anyway...
    >
    > It's redundant. You are not restating any of the other things that are
    > being inherited from the base class, so why make an exception for this
    > interface? To be consistent you ought to also list Iterable and possibly
    > even Collection in the implements list.


    Set implies those two, of course, and the fact that Subset implements Set is
    part of its interface. The fact that I'm implementing Subset by subclassing
    AbstractSet is an implementation detail. Should I decide, during the
    implementation, that I'd rather implement it a different way, I'd need to
    change

    Subset extends AbstractSet

    to

    Subset implements Set

    It's simpler and safer to say

    Subset extends AbstractSet implements Set

    in the first place
    Mike Schilling, Jan 8, 2007
    #14
  15. Andreas Leitgeb

    Chris Uppal Guest

    Mike Schilling wrote:

    > An explicit statment that it implements Set. If you can show any
    > disadvantage to that, I'll reconsider :)


    The only disadvantage I can think of is that unexpected redundancy can be a
    little confusing ("Eh ? Why he telling us that ? Isn't it obvious ? What am
    I missing ?").

    In this specific case, emphasising that a class which is derived from
    java.util.AbstractSet (which is well-known to be just /for/ creating
    implementations of java.util.Set) does in fact implement java.util.Set, would
    make me look twice to see if there was something odd going on (e.g. did the
    name "Set" or "AbstractSet" refer to something other than java.util.<whatever>
    in this context ?).

    Not a big deal...

    -- chris
    Chris Uppal, Jan 8, 2007
    #15
  16. Patricia Shanahan wrote:
    > I agree with you about the "All implemented interfaces", but "Set" in a
    > name only means the class is something to do with the general concept of
    > a set, not that it implements java.util.Set. See java.util.BitSet.


    Even when we have "Fooset extends AbstractSet"? :)
    John Ersatznom, Jan 8, 2007
    #16
  17. Mike Schilling wrote:
    > An explicit statment that it implements Set. If you can show any
    > disadvantage to that, I'll reconsider :)


    Code bloat. Documentation bloat. Slippery slope (to be consistent, you
    should put "implements Serializable" on everything that does, for
    instance, and so forth for every other interface)...
    John Ersatznom, Jan 8, 2007
    #17
  18. Mike Schilling wrote:
    > Set implies those two, of course, and the fact that Subset implements Set is
    > part of its interface. The fact that I'm implementing Subset by subclassing
    > AbstractSet is an implementation detail.


    This leads to an interesting question. Given that implementation details
    should be hidden, perhaps the whole notion of inheritance and
    subclassing needs an overhaul?

    Basically, the fact that a class extends another class would become
    "protected". It would no longer be legal for client code to do this:

    A foo = new B();
    A.someMethod();

    given

    public class A {
    public void someMethod() { ... }
    }

    public class B extends A { ... }

    If client code needed to polymorphically call someMethod you'd have to have

    public interface A {
    public void someMethod();
    }

    public abstract class AbstractA {
    public void someMethod() { ... }
    }

    public class B extends AbstractA ...

    A foo = new B();
    A.someMethod(); // Legal again

    Of course, the docs for B would note "implements A" rather than "extends
    AbstractA". In fact, it would just note what's currently in "all
    implemented interfaces". Unless you generated docs for "protected"
    stuff, in which case its extending AbstractA would be noted. Actually,
    I'd further suggest generating separate docs for the public interface of
    a class and the protected one, with the former linking to the latter
    where it exists, the latter not existing for final classes, and the
    latter being focused on extenders and the former on users. So there'd be
    say three new javadoc tags, @users, @implementers and @implementation;
    the first would begin and the second would end the user-oriented
    information, the latter also starting the implementer-oriented
    information; the third would give implementation notes with a nice
    IMPLEMENTATION NOTES: heading and be illegal on interfaces and their
    members. The public interface doc for a class or interface would give
    the public members and user-oriented information, while the protected
    interface doc would give the public and protected members and
    implementer-oriented information. Both would give the implementation
    notes and the stuff prior to @user. Example:

    /**
    * This is the class summary and some info of interest to users and
    * extenders.
    * @users
    * More information, geared towards users
    * @implementers
    * More information, geared towards extenders
    */
    public class B extends AbstractA {
    /**
    * This is the method summary and some info of interest to
    * users and extenders.
    * @users
    * Watch for the IOExceptions and possibly-null return value!
    * @implementers
    * If you override this, remember to fuzzle the foobar for
    * the benefit of pre-version-2.0 clients, and to check baz
    * for being out of range or null and throw an exception early.
    * We don't want null being stored in the table and an exception
    * only thrown when it's pulled back out; eager throwing ensures
    * the stack trace points closer to the real error.
    * @implementation
    * Uses O(n log n) quicksort for performance
    * @param baz whatever
    * @return a Foo or null
    * @throws IOException if something goes wrong
    * @throws IllegalArgumentException if baz is out of range
    * @throws NullPointerException if baz is null
    */
    public Foo someMethod (Bar baz) throws IOException { ... }
    /**
    * Blah blah.
    * @users
    * Blah blah.
    * // @implementers illegal here; final method!
    */
    public final void someMethod ();
    }

    Output docs would look like

    public class B implements A

    This is the class summary and some info of interest to users and extenders.
    More information, geared towards users.

    <link to extender docs>

    Method summary ...

    public Foo someMethod (Bar baz) throws IOException

    This is the method summary and some info of interest to users and extenders.
    Watch for the IOExceptions and possibly-null return value!
    IMPLEMENTATION NOTE:
    Uses O(n log n) quicksort for performance
    Parameters:
    baz -- whatever
    Returns: a Foo or null
    Throws:
    IOException if something goes wrong
    IllegalArgumentException if baz is out of range
    NullPointerException if baz is null

    public void someMethod ()

    Blah blah.
    Blah blah.


    Extender docs:

    public class B extends AbstractA

    This is the class summary and some info of interest to users and extenders.
    More information, geared towards extenders.

    public Foo someMethod (Bar baz) throws IOException

    This is the method summary and some info of interest to users and extenders.
    If you override this, remember to fuzzle the foobar for the benefit of
    pre-version-2.0 clients, and to check baz for being out of range or null
    and throw an exception early. We don't want null being stored in the
    table and an exception only thrown when it's pulled back out; eager
    throwing ensures the stack trace points closer to the real error.
    IMPLEMENTATION NOTE:
    Uses O(n log n) quicksort for performance
    Parameters:
    baz -- whatever
    Returns: a Foo or null
    Throws:
    IOException if something goes wrong
    IllegalArgumentException if baz is out of range
    NullPointerException if baz is null

    public final void someMethod ()

    Blah blah.
    Blah blah.

    Note that the "final" on someMethod only appears in the extender doc for
    the class B.

    Actually there is one situation where it's sometimes desirable to have
    an abstract class but no interface -- to force all implementations to
    conform in some way, for instance by having a bunch of final public
    methods and some protected non-final methods that some of the public
    ones call. To support this I'd suggest allowing

    public class X

    public class Y extends X implements X

    which shows as "implements X" on the user-docs, "extends X implements X"
    on the extender docs, and makes the public interface of X part of that
    of Y without exposing to users that X actually has some skeleton
    implementation or using an XInterface that someone might implement
    without extending X.

    Note that the docs' "methods inherited from superclass" section would go
    away in the user-oriented docs for a class, just showing all the public
    methods of the class on an equal footing (even for the "Y extends X
    implements X" case for methods of X not overridden by Y), but would show
    them the traditional way in the extender docs and highlight the
    finalness and protected/publicness of methods there.

    The above would be better OO, I think, hiding more implementation
    details from users and separating docs-for-clients and
    docs-for-extenders. (Docs-just-for-maintenance-programmers being normal,
    non-Javadoc comments in the source code.) The encapsulation is improved
    and so is the documentation. Too bad it can't happen as Sun can't break
    all that legacy code. But Sun could do the next best thing, by doing the
    suggested javadoc changes and allowing "protected extends AbstractA". In
    other words, the above but with "extends X implements X" -> "extends X"
    and "extends X // but doesn't implement X" -> "protected extends X".
    That won't break legacy code and overloads "protected" rather than
    adding a new keyword.

    (Another suggestion: allow declarations like

    private Object obj !null;
    public void foo (String str !null, int quuxCount) {...}

    This overloads the existing keyword null and operator token ! and the
    meaning is fairly clear: NPE should be thrown on attempting to assign a
    null to the references, including via calls like foo(null, 3). It looks
    fairly self-explanatory and saves programmers having to write explicit
    tests for null when storing a value for use only later or else face NPEs
    that don't point to where the problem happened.)
    John Ersatznom, Jan 8, 2007
    #18
  19. "John Ersatznom" <> wrote in message
    news:enufoe$5pr$...
    ..
    >
    > (Another suggestion: allow declarations like
    >
    > private Object obj !null;
    > public void foo (String str !null, int quuxCount) {...}
    >
    > This overloads the existing keyword null and operator token ! and the
    > meaning is fairly clear: NPE should be thrown on attempting to assign a
    > null to the references, including via calls like foo(null, 3). It looks
    > fairly self-explanatory and saves programmers having to write explicit
    > tests for null when storing a value for use only later or else face NPEs
    > that don't point to where the problem happened.)


    I've had the same idea, and discussed it here not long ago. It get more
    complex than you might think at first.

    http://groups.google.com/group/comp..._frm/thread/7910c61c7dd2da68/181e5bf1cda3aecd
    Mike Schilling, Jan 8, 2007
    #19
  20. Andreas Leitgeb

    Lew Guest

    Mike Schilling wrote:
    >> An explicit statment that it implements Set. If you can show any
    >> disadvantage to that, I'll reconsider :)


    John Ersatznom wrote:
    > Code bloat. Documentation bloat. Slippery slope (to be consistent, you
    > should put "implements Serializable" on everything that does, for
    > instance, and so forth for every other interface)...


    I agree with John and others on this. Redundancy is itself a disadvantage
    unless it contributes safety or some other aspect. As Chris said,

    > java.util.AbstractSet ... is well-known to be just /for/ creating
    > implementations of java.util.Set


    You'd have to change the class declaration if you change the class declaration
    anyway.

    - Lew
    Lew, Jan 9, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Doug Poland
    Replies:
    9
    Views:
    708
    VisionSet
    Sep 27, 2003
  2. Richard
    Replies:
    5
    Views:
    312
    Hari Pulapaka
    Aug 13, 2004
  3. Replies:
    5
    Views:
    343
    Michael Rauscher
    Oct 31, 2006
  4. carl
    Replies:
    5
    Views:
    2,333
    James Kanze
    Nov 25, 2009
  5. mutex
    Replies:
    0
    Views:
    198
    mutex
    Jul 27, 2003
Loading...

Share This Page