bags? 2.5.x?

Discussion in 'Python' started by Dan Stromberg, Jan 14, 2008.

  1. Is there a particular reason why bags didn't go into 2.5.x or 3000?

    I keep wanting something like them - especially bags with something akin
    to set union, intersection and difference.
     
    Dan Stromberg, Jan 14, 2008
    #1
    1. Advertising

  2. Dan Stromberg wrote:
    > Is there a particular reason why bags didn't go into 2.5.x or 3000?
    >
    > I keep wanting something like them - especially bags with something akin
    > to set union, intersection and difference.
    >

    How about this recepie
    <URL:http://www.ubookcase.com/book/Oreilly/Python.Cookbook.2nd.edition/0596007973/pythoncook2-chp-18-sect-8.html>?

    /W
     
    Wildemar Wildenburger, Jan 14, 2008
    #2
    1. Advertising

  3. On Mon, 14 Jan 2008 20:41:27 +0100, Wildemar Wildenburger wrote:

    > Dan Stromberg wrote:
    >> Is there a particular reason why bags didn't go into 2.5.x or 3000?
    >>
    >> I keep wanting something like them - especially bags with something
    >> akin to set union, intersection and difference.
    >>

    > How about this recepie
    > <URL:http://www.ubookcase.com/book/Oreilly/

    Python.Cookbook.2nd.edition/0596007973/pythoncook2-chp-18-sect-8.html>?
    >
    > /W


    I'd found this with google, but thank you for making sure I was aware of
    it.

    The author of the bag class said that he was planning to submit bags for
    inclusion in 2.5 - is there a particular reason why they didn't go in?

    I keep finding a need for bags. In the past, I've done this sort of
    thing with dictionaries, but it's much nicer to have a bag class, and of
    course it's better to have it in the standard library than to slurp it
    into this, that and the other project.
     
    Dan Stromberg, Jan 17, 2008
    #3
  4. Dan Stromberg wrote:
    > The author of the bag class said that he was planning to submit bags for
    > inclusion in 2.5 - is there a particular reason why they didn't go in?
    >

    I wouldn't know. Not enough convincing use cases, I guess. Fools ;)


    > I keep finding a need for bags. In the past, I've done this sort of
    > thing with dictionaries, but it's much nicer to have a bag class, and of
    > course it's better to have it in the standard library than to slurp it
    > into this, that and the other project.
    >

    Then again, it is only one class. Also, if I may be so bold, why
    wouldn't a simple list fit your needs (other than performance, of course)?


    /W
     
    Wildemar Wildenburger, Jan 17, 2008
    #4
  5. Re: bags? 2.5.x?

    > >> I keep wanting something like them - especially bags with something
    > >> akin to set union, intersection and difference.

    >
    > > How about this recepie
    > > <URL:http://www.ubookcase.com/book/Oreilly/

    >
    > The author of the bag class said that he was planning to submit bags for
    > inclusion in 2.5 - is there a particular reason why they didn't go in?


    Three reasons:

    1. b=collections.defaultdict(int) went a long way
    towards meeting the need to for a fast counter.

    2. It's still not clear what the best API would be.
    What should list(b) return for b.dict = {'a':3, 'b':0, 'c':-3}?
    Perhaps, [('a', 3), ('b', 0), ('c', -3)]
    or ['a', 'a', 'a']
    or ['a']
    or ['a', 'b', 'c']
    or raise an Error for the negative entry.

    3. I'm still working on it and am not done yet.


    Raymond
     
    Raymond Hettinger, Jan 18, 2008
    #5
  6. Re: bags? 2.5.x?

    On Thu, 17 Jan 2008 18:18:53 -0800, Raymond Hettinger wrote:

    >> >> I keep wanting something like them - especially bags with something
    >> >> akin to set union, intersection and difference.

    >>
    >> > How about this recepie
    >> > <URL:http://www.ubookcase.com/book/Oreilly/

    >>
    >> The author of the bag class said that he was planning to submit bags
    >> for inclusion in 2.5 - is there a particular reason why they didn't go
    >> in?

    >
    > Three reasons:
    >
    > 1. b=collections.defaultdict(int) went a long way towards meeting the
    > need to for a fast counter.
    >
    > 2. It's still not clear what the best API would be. What should list(b)
    > return for b.dict = {'a':3, 'b':0, 'c':-3}? Perhaps, [('a', 3), ('b',
    > 0), ('c', -3)] or ['a', 'a', 'a']
    > or ['a']
    > or ['a', 'b', 'c']
    > or raise an Error for the negative entry.


    I'd suggest that .keys() return the unique list, and that list() return
    the list of tuples. Then people can use list comprehensions or map() to
    get what they really need.

    It might not be a bad thing to have an optional parameter on __init__
    that would allow the user to specify if they need negative counts or not;
    so far, I've not needed them though.

    > 3. I'm still working on it and am not done yet.
    >


    Decent reasons. :)

    Thanks!

    Here's a diff to bag.py that helped me. I'd like to think these meanings
    are common, but not necessarily!

    $ diff -b /tmp/bag.py.original /usr/local/lib/bag.py
    18a19,58
    > def _difference(lst):
    > left = lst[0]
    > right = lst[1]
    > return max(left - right, 0)
    > _difference = staticmethod(_difference)
    >
    > def _relationship(self, other, operator):
    > if isinstance(other, bag):
    > self_keys = set(self._data.keys())
    > other_keys = set(other._data.keys())
    > union_keys = self_keys | other_keys
    > #print 'union_keys is',union_keys
    > result = bag()
    > for element in list(union_keys):
    > temp = operator([ self[element], other

    [element] ])
    > #print 'operator is', operator
    > #print 'temp is', temp
    > result[element] += temp
    > return result
    > else:
    > raise NotImplemented
    >
    > def union(self, other):
    > return self._relationship(other, sum)
    >
    > __or__ = union
    >
    > def intersection(self, other):
    > return self._relationship(other, min)
    >
    > __and__ = intersection
    >
    > def maxunion(self, other):
    > return self._relationship(other, max)
    >
    > def difference(self, other):
    > return self._relationship(other, self._difference)
    >
    > __sub__ = difference
    >
     
    Dan Stromberg, Jan 21, 2008
    #6
  7. Dan Stromberg

    MRAB Guest

    Re: bags? 2.5.x?

    On Jan 21, 11:13 pm, Dan Stromberg <> wrote:
    > On Thu, 17 Jan 2008 18:18:53 -0800, Raymond Hettinger wrote:
    > >> >> I keep wanting something like them - especially bags with something
    > >> >> akin to set union, intersection and difference.

    >
    > >> > How about this recepie
    > >> > <URL:http://www.ubookcase.com/book/Oreilly/

    >
    > >> The author of the bag class said that he was planning to submit bags
    > >> for inclusion in 2.5 - is there a particular reason why they didn't go
    > >> in?

    >
    > > Three reasons:

    >
    > > 1. b=collections.defaultdict(int) went a long way towards meeting the
    > > need to for a fast counter.

    >
    > > 2. It's still not clear what the best API would be. What should list(b)
    > > return for b.dict = {'a':3, 'b':0, 'c':-3}? Perhaps, [('a', 3), ('b',
    > > 0), ('c', -3)] or ['a', 'a', 'a']
    > > or ['a']
    > > or ['a', 'b', 'c']
    > > or raise an Error for the negative entry.

    >
    > I'd suggest that .keys() return the unique list, and that list() return
    > the list of tuples. Then people can use list comprehensions or map() to
    > get what they really need.


    I think that a bag is a cross between a dict (but the values are
    always positive integers) and a set (but duplicates are permitted).

    I agree that .keys() should the unique list, but that .items() should
    return the tuples and list() should return the list of keys including
    duplicates. bag() should accept an iterable and count the duplicates.

    For example:

    >>> sentence = "the cat sat on the mat"
    >>> my_words = sentence.split()
    >>> print my_words

    ['the', 'cat', 'sat', 'on', 'the', 'mat']
    >>> my_bag = bag(my_words)
    >>> print my_bag

    bag({'on': 1, 'the': 2, 'sat': 1, 'mat': 1, 'cat': 1})
    my_list = list(my_bag)
    ['on', 'the', 'the', 'sat', 'mat', 'cat']

    It should be easy to convert a bag to a dict and also a dict to a bag,
    raising ValueError if it sees a value that's not a non-negative
    integer (a value of zero just means "there isn't one of these in the
    bag"!).

    >
    > It might not be a bad thing to have an optional parameter on __init__
    > that would allow the user to specify if they need negative counts or not;
    > so far, I've not needed them though.
    >
    > > 3. I'm still working on it and am not done yet.

    >
    > Decent reasons. :)
    >
    > Thanks!
    >
    > Here's a diff to bag.py that helped me. I'd like to think these meanings
    > are common, but not necessarily!
    >
    > $ diff -b /tmp/bag.py.original /usr/local/lib/bag.py
    > 18a19,58
    >
    > > def _difference(lst):
    > > left = lst[0]
    > > right = lst[1]
    > > return max(left - right, 0)
    > > _difference = staticmethod(_difference)

    >
    > > def _relationship(self, other, operator):
    > > if isinstance(other, bag):
    > > self_keys = set(self._data.keys())
    > > other_keys = set(other._data.keys())
    > > union_keys = self_keys | other_keys
    > > #print 'union_keys is',union_keys
    > > result = bag()
    > > for element in list(union_keys):
    > > temp = operator([ self[element], other

    > [element] ])
    > > #print 'operator is', operator
    > > #print 'temp is', temp
    > > result[element] += temp
    > > return result
    > > else:
    > > raise NotImplemented

    >
    > > def union(self, other):
    > > return self._relationship(other, sum)

    >
    > > __or__ = union

    >
    > > def intersection(self, other):
    > > return self._relationship(other, min)

    >
    > > __and__ = intersection

    >
    > > def maxunion(self, other):
    > > return self._relationship(other, max)

    >
    > > def difference(self, other):
    > > return self._relationship(other, self._difference)

    >
    > > __sub__ = difference
     
    MRAB, Jan 22, 2008
    #7
  8. Dan Stromberg wrote:
    > Is there a particular reason why bags didn't go into 2.5.x or 3000?
    >
    > I keep wanting something like them - especially bags with something akin
    > to set union, intersection and difference.


    Ask yourself the following questions:

    * Is the feature useful for the broad mass?
    * Has the feature been implemented and contributed for Python?
    * Is the code well written, tested and documented?
    * Is the code mature and used by lots of people?

    Can you answer every question with yes?

    Christian
     
    Christian Heimes, Jan 23, 2008
    #8
  9. On Wed, 23 Jan 2008 08:51:13 +0100, Christian Heimes wrote:

    > Dan Stromberg wrote:
    >> Is there a particular reason why bags didn't go into 2.5.x or 3000?
    >>
    >> I keep wanting something like them - especially bags with something
    >> akin to set union, intersection and difference.

    >
    > Ask yourself the following questions:
    >
    > * Is the feature useful for the broad mass?


    Yes, probably, at least if this kind of feature's inclusion in other
    languages and my two recent needs for it are any indication. In other
    languages, they are sometimes called bags or multisets.

    * Has the feature been
    > implemented and contributed for Python?


    Yes.

    * Is the code well written,
    > tested and documented?


    Yes, I believe so, though the patch I sent has only been used in
    production a little.

    * Is the code mature and used by lots of people?

    This is hard to get a handle on. It's been on the web for a while, so
    it's hard to say how many people have downloaded it and used it.

    > Can you answer every question with yes?


    3 yesses and a "hard to say for sure". But maybe we could check the web
    server log to get a download count for the maybe.


    > Christian
     
    Dan Stromberg, Feb 1, 2008
    #9
  10. Dan Stromberg

    Paul Rubin Guest

    Dan Stromberg <> writes:
    > > * Is the feature useful for the broad mass?

    >
    > Yes, probably, at least if this kind of feature's inclusion in other
    > languages and my two recent needs for it are any indication. In other
    > languages, they are sometimes called bags or multisets.


    I use defaultdict(int) all the time, but it's been fairly rare
    to want to do set operations (union, intersection, difference) on
    these. I'm not even sure what the intersection should mean.
     
    Paul Rubin, Feb 1, 2008
    #10
  11. Re: bags? 2.5.x?

    On Feb 1, 10:29 pm, Paul Rubin <http://> wrote:
    > Dan Stromberg <> writes:
    > > > * Is the feature useful for the broad mass?

    >
    > > Yes, probably, at least if this kind of feature's inclusion in other
    > > languages and my two recent needs for it are any indication.  In other
    > > languages, they are sometimes called bags or multisets.

    >
    > I use defaultdict(int) all the time, but it's been fairly rare
    > to want to do set operations (union, intersection, difference) on
    > these.  I'm not even sure what the intersection should mean.


    Let me write the multiset with 3 x's and 2y y' as {2x, 3y} (this is
    not a suggested Python syntax!).

    Note: in the following my multisets can have negative multiplicities,
    whereas the usual definition only accepts positive multiplicities.

    Let H be the collection of all python hashable objects.
    Let Z be the set of integers.

    So a multiset is a mapping from H to Z which maps cofinitely many
    objects
    to 0 (a sort of defaultdict(int))

    === For mathematically minded people ===

    Well, if one thinks about the mathematical meaning of multisets, then
    the collection of all multisets is the free Z-module over H.
    Therefore the meaningful operations for multisets are addition,
    subtraction and multiplication by a scalar.

    === For not so mathematically minded people ===

    One way of thinking about a multiset can be as a vector whose
    dimensions are explicitly named, instead of being implicit. For
    example in 3D the vector [2, 3, 4] could instead be written as {2x,
    3y, 4z}. Just like vectors, multisets can then be added, subtracted
    and multiplied by as scalar:


    {2x, 3y} + {4x, 7z} = {6x, 3y, 7z}
    {4x, 2y, 1z} - {1x, 2y, 3z} = {3x, -2z}
    {3x, 2y}*5 = {15x, 10y}

    Now if one wants to extend union and intersection to multisets:

    * For sets {x, y} union {y, z} = {x, y, z}. The natural way of
    extending this to multisets is having the union operator take the
    max of the multiplicities of each element, i.e.

    {2x, 7y, 1z} union {3y, 1z, 5t} = {2x, 7y, 1z, 5t}
    {3x, -5y} union {2z} = {3x, 2z} (!)

    * Similarly, for intersection one would take the min of
    multiplicities, i.e.

    {2x, 7y, 1z} intersection {3y, 1z, 5t} = {3y, 1z}
    {3x, -5y} intersection {2z} = {-5y} (!)

    * difference and symmetric_difference can be defined in terms of +, -,
    union, intersection:

    A difference B = A - (A intersection B)
    A symmetric_difference B = (A union B) - (A intersection B)

    Note that union, intersection, difference and symmetric_difference
    defined this way are fully compatible with the corresponding set
    operations in the following sense:

    If A and B are sets, then

    - multiset(A).union(multiset(B)) == multiset(A.union(B))

    - multiset(A).intersection(multiset(B)) == multiset(A.intersection(B))

    (same for difference, symmetric_difference)

    I think this is how I would like my multisets :)

    --
    Arnaud
     
    Arnaud Delobelle, Feb 2, 2008
    #11
  12. Dan Stromberg

    Paul Rubin Guest

    Re: bags? 2.5.x?

    Arnaud Delobelle <> writes:
    > * For sets {x, y} union {y, z} = {x, y, z}. The natural way of
    > extending this to multisets is having the union operator take the
    > max of the multiplicities of each element, i.e.


    That certainly doesn't fit the intuition of a bag of objects. I'd
    think of the union of two bags as the result of dumping the contents
    of both bags onto the table, i.e. you'd add the two vectors.

    > * Similarly, for intersection one would take the min of
    > multiplicities, i.e.


    This is a reasonable interpretation I guess.

    > A difference B = A - (A intersection B)


    I think difference would mean subtraction.
     
    Paul Rubin, Feb 2, 2008
    #12
  13. Dan Stromberg

    MRAB Guest

    Re: bags? 2.5.x?

    On Feb 2, 9:17 am, Paul Rubin <http://> wrote:
    > Arnaud Delobelle <> writes:
    > > * For sets {x, y} union {y, z} = {x, y, z}. The natural way of
    > > extending this to multisets is having the union operator take the
    > > max of the multiplicities of each element, i.e.

    >
    > That certainly doesn't fit the intuition of a bag of objects. I'd
    > think of the union of two bags as the result of dumping the contents
    > of both bags onto the table, i.e. you'd add the two vectors.
    >

    I could see uses for both types of union. You could have both A + B
    which adds the multiplicities (the smallest bag which contains both
    the bags) and A | B which takes the max of the multiplicities (the
    smallest bag which contains either of the bags).

    > > * Similarly, for intersection one would take the min of
    > > multiplicities, i.e.

    >
    > This is a reasonable interpretation I guess.
    >
    > > A difference B = A - (A intersection B)

    >
    > I think difference would mean subtraction.
     
    MRAB, Feb 2, 2008
    #13
  14. Re: bags? 2.5.x?

    On Sat, 2008-02-02 at 01:17 -0800, Paul Rubin wrote:
    > Arnaud Delobelle <> writes:
    > > * For sets {x, y} union {y, z} = {x, y, z}. The natural way of
    > > extending this to multisets is having the union operator take the
    > > max of the multiplicities of each element, i.e.

    >
    > That certainly doesn't fit the intuition of a bag of objects. I'd
    > think of the union of two bags as the result of dumping the contents
    > of both bags onto the table, i.e. you'd add the two vectors.


    In addition to possibly being more intuitive, the latter also agrees
    with a use case of bags in complex analysis. If you consider the zeroes
    of complex polynomials, let B1 be the zeroes of polynomial P1, and B2 be
    the zeroes of polynomial P2, then B1+B2 are the zeroes of the product
    P1*P2.

    Simple example: P1(z)=z-1, P2(z)=(z-1)^2. P1 has a simple zero at z=1,
    P2 has a double zero at z=1. The product is (z-1)^3, which has a triple
    zero at z=1.

    --
    Carsten Haese
    http://informixdb.sourceforge.net
     
    Carsten Haese, Feb 2, 2008
    #14
  15. Re: bags? 2.5.x?

    On Feb 2, 5:28 pm, Carsten Haese <> wrote:
    > On Sat, 2008-02-02 at 01:17 -0800, Paul Rubin wrote:
    > > Arnaud Delobelle <> writes:
    > > > * For sets {x, y} union {y, z} = {x, y, z}.  The natural way of
    > > >   extending this to multisets is having the union operator take the
    > > >   max of the multiplicities of each element, i.e.

    >
    > > That certainly doesn't fit the intuition of a bag of objects.  I'd
    > > think of the union of two bags as the result of dumping the contents
    > > of both bags onto the table, i.e. you'd add the two vectors.

    >
    > In addition to possibly being more intuitive, the latter also agrees
    > with a use case of bags in complex analysis. If you consider the zeroes
    > of complex polynomials, let B1 be the zeroes of polynomial P1, and B2 be
    > the zeroes of polynomial P2, then B1+B2 are the zeroes of the product
    > P1*P2.


    Note that I was proposing exactly this: B1+B2 would add
    multiplicities. On top of this, B1.union(B2) would take the max
    (maybe spelt B1|B2). IMHO this is more in the spirit of union:
    conceptually, the union of two things is the smallest thing that
    contains them both.

    --
    Arnaud
     
    Arnaud Delobelle, Feb 3, 2008
    #15
  16. Re: bags? 2.5.x?

    On Feb 2, 3:21 pm, MRAB <> wrote:
    > I could see uses for both types of union. You could have both A + B
    > which adds the multiplicities (the smallest bag which contains both
    > the bags) and A | B which takes the max of the multiplicities (the
    > smallest bag which contains either of the bags).


    I agree, and this is what I proposed in my post (except that I spelt A|
    B as A.union(B))

    --
    Arnaud
     
    Arnaud Delobelle, Feb 3, 2008
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?SHJpc2hpIFI=?=

    Error while using "State Bags".

    =?Utf-8?B?SHJpc2hpIFI=?=, Nov 16, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    365
    =?Utf-8?B?SHJpc2hpIFI=?=
    Nov 16, 2004
  2. Peter Dobcsanyi

    bags in collections - PEP 320

    Peter Dobcsanyi, Apr 29, 2004, in forum: Python
    Replies:
    2
    Views:
    313
    Peter Dobcsanyi
    May 1, 2004
  3. Mark E. Fenner

    From bags to default dicts

    Mark E. Fenner, Sep 12, 2006, in forum: Python
    Replies:
    0
    Views:
    277
    Mark E. Fenner
    Sep 12, 2006
  4. fgggggggggg

    DOLCE&GABBANA bags

    fgggggggggg, Mar 20, 2009, in forum: Java
    Replies:
    0
    Views:
    366
    fgggggggggg
    Mar 20, 2009
  5. rrrrgfffffffffffffff

    DOLCE&GABBANA bags

    rrrrgfffffffffffffff, Mar 21, 2009, in forum: Java
    Replies:
    0
    Views:
    314
    rrrrgfffffffffffffff
    Mar 21, 2009
Loading...

Share This Page