pre-PEP generic objects

Steven Bethard

I promised I'd put together a PEP for a 'generic object' data type for
Python 2.5 that allows one to replace __getitem__ style access with
dotted-attribute style access (without declaring another class). Any
comments would be appreciated!

Thanks!

Steve

----------------------------------------------------------------------
Title: Generic Object Data Type
Version: $Revision: 1.0 $
Last-Modified: $Date: 2004/11/29 16:00:00 $
Author: Steven Bethard <(e-mail address removed)>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Nov-2004
Python-Version: 2.5
Post-History: 29-Nov-2004


Abstract
========

This PEP proposes a standard library addition to support the simple
creation of 'generic' objects which can be given named attributes
without the need to declare a class. Such attribute-value mappings are
intended to complement the name-value mappings provided by Python's
builtin dict objects.


Motivation
==========

Python's dict objects provide a simple way of creating anonymous
name-value mappings. These mappings use the __getitem__ protocol to
access the value associated with a name, so that code generally appears
like::

    mapping['name']

Occasionally, a programmer may decide that dotted-attribute style access
is more appropriate to the domain than __getitem__ style access, and
that their mapping should be accessed like::

    mapping.name

Currently, if a Python programmer makes this design decision, they are
forced to declare a new class, and then build instances of this class.
When no methods are to be associated with the attribute-value mappings,
declaring a new class can be overkill. This PEP proposes adding a
simple type to the standard library that can be used to build such
attribute-value mappings.

Providing such a type allows the Python programmer to determine which
type of mapping is most appropriate to their domain and apply this
choice with minimal effort. Some of the suggested uses include:


Returning Named Results
-----------------------

It is often appropriate for a function that returns multiple items to
give names to the different items returned. The type suggested in this
PEP provides a simple means of doing this that allows the returned
values to be accessed with the usual attribute-style syntax::

    ...     return Bunch(double=2*x, squared=x**2)
    ...
    100
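
For readers who want to run the idea end to end, here is a minimal sketch. The stand-in `Bunch` class below only stores keyword arguments as attributes, and the function name `double_and_square` is hypothetical; neither is the reference implementation given later in this PEP.

```python
class Bunch(object):
    """Minimal stand-in: store keyword arguments as attributes."""
    def __init__(self, **kwds):
        self.__dict__.update(kwds)


def double_and_square(x):
    # Return both results under descriptive names instead of a bare tuple.
    return Bunch(double=2 * x, squared=x ** 2)


result = double_and_square(10)
print(result.double)   # 20
print(result.squared)  # 100
```

The caller never needs to remember which position in a tuple holds which value; the names travel with the result.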


Representing Hierarchical Data
------------------------------

The type suggested in this PEP also allows a simple means of
representing hierarchical data that allows attribute-style access::

    >>> x = Bunch(spam=Bunch(rabbit=1, badger=[2, 3, 4]), ham='neewom')
    >>> x.spam.badger
    [2, 3, 4]
    >>> x.ham
    'neewom'


Rationale
=========

As Bunch objects are intended primarily to replace simple classes,
simple Bunch construction was a primary concern. As such, the Bunch
constructor supports creation from keyword arguments, dicts, and
sequences of (attribute, value) pairs::

    >>> Bunch(eggs=1, spam=2, ham=3)
    Bunch(eggs=1, ham=3, spam=2)
    >>> Bunch({'eggs':1, 'spam':2, 'ham':3})
    Bunch(eggs=1, ham=3, spam=2)
    >>> Bunch([('eggs',1), ('spam',2), ('ham',3)])
    Bunch(eggs=1, ham=3, spam=2)

To allow attribute-value mappings to be easily combined, the update
method of Bunch objects supports similar arguments.

If Bunch objects are used to represent hierarchical data, comparison of
such objects becomes a concern. For this reason, Bunch objects support
object equality::
False
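
As a rough illustration of the intended equality semantics, the sketch below uses a cut-down stand-in for Bunch (keyword construction plus the `__eq__` from the reference implementation). The attribute names and values are made up for the example.

```python
class Bunch(object):
    """Cut-down stand-in with the comparison used in this PEP."""
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

    def __eq__(self, other):
        # Equal when other is a Bunch holding the same attribute values.
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)


x = Bunch(parrot=Bunch(spam=42), peng='shrub')
y = Bunch(peng='shrub', parrot=Bunch(spam=42))
z = Bunch(parrot=Bunch(spam=7), peng='shrub')

print(x == y)  # True: same attributes, keyword order is irrelevant
print(x == z)  # False: the nested Bunch differs
```

Because the comparison delegates to the instance dictionaries, nested Bunch values are compared recursively through the same `__eq__`.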

Additionally, to allow users of the Bunch type to convert other
hierarchical data into Bunch objects, a frommapping classmethod is
supported. This can be used, for example, to convert an XML DOM tree
into a tree of nested Bunch objects::

    >>> import xml.dom.minidom
    >>> def getitems(element):
    ...     if not isinstance(element, xml.dom.minidom.Element):
    ...         raise TypeError('items only retrievable from Elements')
    ...     if element.attributes:
    ...         for key, value in element.attributes.items():
    ...             yield key, value
    ...     children = {}
    ...     for child in element.childNodes:
    ...         if child.nodeType == xml.dom.minidom.Node.TEXT_NODE:
    ...             text_list = children.setdefault('text', [])
    ...             text_list.append(child.nodeValue)
    ...         else:
    ...             children.setdefault(child.nodeName, []).append(
    ...                 Bunch.frommapping(child, getitems=getitems))
    ...     for name, child_list in children.items():
    ...         yield name, child_list
    ...
    >>> doc = xml.dom.minidom.parseString("""\
    ... <xml>
    ...   <a attr_a="1">
    ...     a text 1
    ...     <b attr_b="2" />
    ...     <b attr_b="3"> b text </b>
    ...     a text 2
    ...   </a>
    ... </xml>""")
    >>> b = Bunch.frommapping(doc.documentElement, getitems=getitems)
    >>> b.a[0].b[1]
    Bunch(attr_b=u'3', text=[u' b text '])

Note that support for the various mapping methods, e.g.
__(get|set|del)item__, __len__, __iter__, __contains__, items, keys,
values, etc. was intentionally omitted as these methods did not seem to
be necessary for the core uses of an attribute-value mapping. If such
methods are truly necessary for a given use case, this may suggest that
a dict object is a more appropriate type for that use.


Reference Implementation
========================

(This will be replaced with a link to a SF patch when I think I've
made all the necessary corrections)::

    import operator as _operator

    class Bunch(object):
        """Bunch([bunch|dict|seq], **kwds) -> new bunch with specified
        attributes

        The new Bunch object's attributes are initialized from (if
        provided) either another Bunch object's attributes, a
        dictionary, or a sequence of (name, value) pairs, then from the
        name=value pairs in the keyword argument list.

        Example Usage:
        >>> Bunch(eggs=1, spam=2, ham=3)
        Bunch(eggs=1, ham=3, spam=2)
        >>> Bunch({'eggs':1, 'spam':2, 'ham':3})
        Bunch(eggs=1, ham=3, spam=2)
        >>> Bunch([('eggs',1), ('spam',2), ('ham',3)])
        Bunch(eggs=1, ham=3, spam=2)
        >>> Bunch(Bunch(eggs=1, spam=2), ham=3)
        Bunch(eggs=1, ham=3, spam=2)
        """

        def __init__(self, *args, **kwds):
            """Initializes a Bunch instance."""
            self.update(*args, **kwds)

        def __eq__(self, other):
            """x.__eq__(y) <==> x == y"""
            return (isinstance(other, self.__class__)
                    and self.__dict__ == other.__dict__)

        def __repr__(self):
            """x.__repr__() <==> repr(x)

            If all attribute values in this bunch (and any nested
            bunches) are reproducible with eval(repr(x)), then the Bunch
            object is also reproducible for eval(repr(x)).
            """
            return '%s(%s)' % (self.__class__.__name__,
                               ', '.join('%s=%r' % (k, v)
                                         for k, v
                                         in self.__dict__.items()))

        def update(self, *args, **kwds):
            """update([bunch|dict|seq], **kwds) -> None

            Updates a Bunch object's attributes from (if provided)
            either another Bunch object's attributes, a dictionary, or a
            sequence of (name, value) pairs, then from the name=value
            pairs in the keyword argument list.
            """
            if len(args) == 1:
                other, = args
                if isinstance(other, self.__class__):
                    other = other.__dict__
                try:
                    self.__dict__.update(other)
                except TypeError:
                    raise TypeError('cannot update Bunch with %s' %
                                    type(other).__name__)
            elif len(args) != 0:
                raise TypeError('expected 1 argument, got %i' %
                                len(args))
            self.__dict__.update(kwds)

        @classmethod
        def frommapping(cls, mapping, getitems=None):
            """Create a Bunch object from a (possibly nested) mapping.

            Note that, unlike the Bunch constructor, frommapping
            recursively converts all mappings to bunches.

            Example Usage:
            >>> Bunch.frommapping({'eggs':1,
            ...                    'spam':{'ham':2, 'badger':3}})
            Bunch(eggs=1, spam=Bunch(ham=2, badger=3))

            Keyword Arguments:
            mapping -- a mapping object
            getitems -- a function that takes the mapping as a parameter
                and returns an iterable of (key, value) pairs. If not
                provided, the items method on the mapping object will be
                used, or (key, mapping[key]) values will be generated if
                the mapping object does not provide an items method.

            Note that getitems will be applied recursively to each value
            in the mapping. It should raise a TypeError if it is
            applied to an object for which it cannot produce
            (key, value) pairs.
            """
            # determine which items() method to use
            if getitems is None:
                try:
                    getitems = type(mapping).items
                except AttributeError:
                    getitems = _items
            # build the Bunch from the mapping, recursively
            result = cls()
            for key, value in getitems(mapping):
                try:
                    value = cls.frommapping(value, getitems=getitems)
                except TypeError:
                    pass
                setattr(result, key, value)
            return result


    def _items(mapping):
        """Produces (key, value) pairs from a mapping object.

        Intended for use with mapping objects that do not supply an
        items method.
        """
        for key in mapping:
            yield key, mapping[key]


Open Issues
===========

What should the type be named? Some suggestions include 'Bunch',
'Record' and 'Struct'.

Where should the type be placed? The current suggestion is the
collections module.


References
==========



...
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
 
Fredrik Lundh

Steven said:
Currently, if a Python programmer makes this design decision, they are
forced to declare a new class, and then build instances of this class.

FORCED to create a new class, and FORCED to create instances of
their own class instead of your class? without this, Python must surely
be unusable. no wonder nobody's ever managed to use it for anything.

</F>
 
Peter Otten

Steven said:
def __eq__(self, other):
    """x.__eq__(y) <==> x == y"""
    return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

This results in an asymmetry:
True

With indirect use of __eq__() this puzzling behaviour disappears:
False

Whether this is intended, I don't know. If someone can enlighten me...

In any case I would prefer self.__class__ == other.__class__ over
isinstance().
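
A small self-contained sketch of this behaviour, assuming a hypothetical subclass `B` and a cut-down `Bunch` that keeps only the constructor and the `__eq__` quoted above. Calling `__eq__` directly gives different answers depending on which operand is the receiver. With `==`, when the right operand's class is a subclass of the left operand's class, Python tries the right operand's `__eq__` first, so the `B` instance answers in both directions.

```python
class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)


class B(Bunch):  # hypothetical subclass, for illustration only
    pass


direct_parent_first = Bunch().__eq__(B())   # isinstance(B(), Bunch) holds
direct_child_first = B().__eq__(Bunch())    # isinstance(Bunch(), B) fails
print(direct_parent_first, direct_child_first)  # True False

# With ==, the B instance's __eq__ ends up answering in both directions.
print(Bunch() == B(), B() == Bunch())  # False False
```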

Peter
 
Nick Craig-Wood

Steven Bethard said:
I promised I'd put together a PEP for a 'generic object' data type for
Python 2.5 that allows one to replace __getitem__ style access with
dotted-attribute style access (without declaring another class). Any
comments would be appreciated!

This sounds very much like this class, which I've used to convert Perl
programs to Python:

class Hash:
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    def __getitem__(self, x):
        return getattr(self, x)
    def __setitem__(self, x, y):
        setattr(self, x, y)

My experience from using this is that whenever I used Hash(), I found
that later on in the refinement of the conversion it became its own
class.

So my take on the matter is that this encourages perl style
programming (just ram it in a hash, and write lots of functions acting
on it) rather than creating a specific class for the job which is dead
easy in python anyway and to which you can attach methods etc.

YMMV ;-)
 
Nick Coghlan

The proposed use cases sound more appropriate for a "named tuple" than any sort
of dictionary. (This may have been mentioned in previous discussions. I wasn't
keeping track of those, though)

Notice that I've used 'fromPairs' rather than 'fromMapping', since consistent
order matters for a tuple. Comparison semantics are inherited directly from
tuple, and don't care about names (they're only interested in values).

Also, it seems like there has to be a better way to do the "opposite of zip()"
in fromPairs(), but I sure as hell can't think of it.

Cheers,
Nick.
>>> a = named_tuple(['x', 'y'], (3, 8))
>>> a
named_tuple(['x', 'y'], (3, 8))
>>> a.x
3
>>> a.y
8
>>> str(a)
'(3, 8)'
>>> b = named_tuple.fromPairs(sorted({'x':3, 'y':8}.items()))
>>> b
named_tuple(['x', 'y'], (3, 8))
>>> b.x
3
>>> b.y
8
>>> str(b)
'(3, 8)'
>>> a == b
True

And the code for the above:

class named_tuple(tuple):
    def __new__(cls, names, *args):
        self = tuple.__new__(cls, *args)
        # Map each name to its position in the tuple.
        self._names = dict(zip(names, range(len(names))))
        return self

    @staticmethod
    def fromPairs(items):
        names = [x[0] for x in items]
        values = [x[1] for x in items]
        return named_tuple(names, values)

    def __getattr__(self, attr):
        if attr in self._names:
            return self[self._names[attr]]
        else:
            # tuple defines no __getattr__ to delegate to, so report
            # the missing attribute directly.
            raise AttributeError(attr)

    def __repr__(self):
        return "named_tuple(%s, %s)" % (str(self.names()),
                                        str(tuple.__repr__(self)))

    def __str__(self):
        return tuple.__repr__(self)

    def names(self):
        return sorted(self._names.keys(), key=self._names.__getitem__)
 
Carlos Ribeiro

The proposed use cases sound more appropriate for a "named tuple" than any sort
of dictionary. (This may have been mentioned in previous discussions. I wasn't
keeping track of those, though)

I agree with it. I was involved in that discussion, and got to the
point of listing a few desired features. As I am currently involved
in another project, I left it as it was, but I'll resume working as
soon as I can. I really think that both (generic objects and named
tuples) are slightly different but still very similar approaches to the
same problem, so some sort of "unification" of the efforts may be
interesting.

But there's something more important: while reading this document, and
some of the replies, it became clear that the main point is to
understand whether this proposed feature (in any possible
implementation) is in fact useful enough to deserve a place in the
standard library, and also if it represents a good coding style. With
some risk of being way too simplistic, it's something like this:

-- The people who favor this implementation argue that one
should not be required to create a new class just to return a bunch of
results.

-- The people who are against it point out that, as soon as you start
returning multiple values, it's probable that you'll need to implement
a class anyway, so it's better to do it sooner and forget generics
(or named tuples) entirely.

I see some parallels between this discussion and another one about
polymorphism. It's considered good Python practice to rely on
interfaces, or protocols, when designing the call signature of a
function or method. So if you receive an arbitrary object, you should
not check if it's a descendant of some abstract parent type; that's
just too rigid, and forces people to deal with complex multiple
inheritance stuff, and that's really not needed in Python. There is a
better way: just check if it exposes the desired interface, or
protocol. The adaptation framework (as described by PEP 246, and
extended by the PyProtocols package) is a nice implementation of this
concept.

A "generic" return object is just like this, but for a different
scenario: an adaptable return value, that doesn't enforce a class
signature when assigning the return value of a function or method.
It's perfectly symmetrical to the usage of interfaces on call. I think
that's a better, and much more powerful argument for the
implementation of a generic class, and also, for some supporting
machinery for it.

Extending this reasoning, generic return objects (implemented either
as dictionary based, or as named tuples) could be seen as "adaptable"
return values. Upon return, one could get such a temporary structure
and assign its members into another, more complex class, that would
accept fields of the same name, but possibly include other fields and
extra functionality. For example: a function that returns a complex
time structure does not need to return a "time class". It may return a
generic, or named tuple, which in turn can be assigned to an object
that exposes a 'compatible' assignment interface. This assignment can
be done by a method of the generic class itself, according either to
the names of the members of the generic, or the order of the tuple,
depending on the scenario.

For now, that's all that I have to contribute to this discussion.
There's also a lot of stuff in the c.l.py archives regarding named
tuples and also generics that is surely worth checking.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 
Steven Bethard

Fredrik said:
FORCED to create a new class, and FORCED to create instances of
their own class instead of your class?

I don't see any way to produce the same behavior without *someone* (the
user or the stdlib) declaring a class. If you see a way to do this
without a class declared somewhere, please let me know how...

Fredrik said:
without this, Python must surely be unusable.

I definitely agree. Python without classes would be quite unpleasant to
use.

Fredrik said:
no wonder nobody's ever managed to use it for anything.

Yeah, I don't think anyone's managed to use Python without classes for
years, especially since things like int, str, etc. were converted to types.

Steve
 
Steven Bethard

Peter said:
Steven said:
def __eq__(self, other):
    """x.__eq__(y) <==> x == y"""
    return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

This results in an asymmetry:
[snip]

Whether this is intended, I don't know. If someone can enlighten me...

In any case I would prefer self.__class__ == other.__class__ over
isinstance().

Unintended. I'll switch to
    self.__class__ == other.__class__
or
    type(self) == type(other)
Any preference?
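
A sketch of how the exact-type check could look, under the assumption that nothing else in the class changes; `B` is a hypothetical subclass used only to exercise the comparison. With an exact type match, neither direct call reports a parent instance and a child instance as equal.

```python
class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

    def __eq__(self, other):
        # Exact type match instead of isinstance().
        return (type(self) == type(other)
                and self.__dict__ == other.__dict__)


class B(Bunch):  # hypothetical subclass, for illustration only
    pass


print(Bunch().__eq__(B()), B().__eq__(Bunch()))  # False False
print(Bunch(a=1) == Bunch(a=1))                  # True
```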

Steve
 
Steven Bethard

Nick said:
My experience from using this is that whenever I used Hash(), I found
that later on in the refinement of the conversion it became its own
class.

This has also generally been my experience, though I'm not sure it's as
true for the XML DOM to Bunch translation. Did you use Hash() in the
same way for hierarchical data?

Nick said:
So my take on the matter is that this encourages perl style
programming (just ram it in a hash, and write lots of functions acting
on it) rather than creating a specific class for the job which is dead
easy in python anyway and to which you can attach methods etc.

You'll note that the (pre-)PEP explicitly notes that this object is
intended only for use when no methods are associated with the attributes:

"When no methods are to be associated with the attribute-value mappings,
declaring a new class can be overkill."

I do understand your point though -- people might not use Bunch in the
way it's intended. Of course, those same people can already do the same
thing with a dict instead (e.g. write a bunch of functions to handle a
certain type of dict). If someone wants to write Perl in Python,
there's not much we can really do to stop them...

Steve
 
Steven Bethard

Nick said:
The proposed use cases sound more appropriate for a "named tuple" than
any sort of dictionary. (This may have been mentioned in previous
discussions. I wasn't keeping track of those, though)

For the return values, yeah, a "named tuple" is probably at least as
appropriate. I'm not sure a "named tuple" is necessary for the
hierarchical data. (It wasn't for me in my DOM to Bunch example.)

I saw the "named tuple" thread slowly die, mainly because the most ideal
solution:

(name1:val1, name2:val2)

requires a change in Python's syntax, which is a tough route to go.

This PEP isn't attempting to solve the "named tuple" problem, though if
that thread picks back up again and produces a solution that also solves
the problems here, I'm more than willing to merge the two PEPs.

Note that I'm not trying to solve all the problems that a "named tuple"
could solve -- just the problem of converting __getitem__ syntax to
dotted-attribute syntax without the need to declare a class.

Nick said:
Notice that I've used 'fromPairs' rather than 'fromMapping', since
consistent order matters for a tuple. Comparison semantics are inherited
directly from tuple, and don't care about names (they're only interested
in values).

frommapping was intended to handle the recursive (hierarchical data)
case. (Perhaps it needs a better name to clarify this...) The shallow
conversion was handled in the Bunch constructor. I don't see that your
named_tuple type handles the recursive case, does it?

Nick said:
Also, it seems like there has to be a better way to do the "opposite of
zip()" in fromPairs(), but I sure as hell can't think of it.

I think zip(*) is usually the inverse of zip():

>>> zip(*sorted({'x':3, 'y':8}.items()))
[('x', 'y'), (3, 8)]
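
The same idiom as a standalone snippet. One assumption to flag: on Python 3, `zip` returns an iterator, so the full result is wrapped in `list()` here; on the Python 2 of this thread the call already returns a list.

```python
pairs = sorted({'x': 3, 'y': 8}.items())   # [('x', 3), ('y', 8)]

# Unpacking the pairs into zip() transposes them back into two tuples.
names, values = zip(*pairs)
print(names)    # ('x', 'y')
print(values)   # (3, 8)
print(list(zip(*pairs)))  # [('x', 'y'), (3, 8)]
```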

Steve
 
Scott David Daniels

Nick said:
class Hash:
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    def __getitem__(self, x):
        return getattr(self, x)
    def __setitem__(self, x, y):
        setattr(self, x, y)

You can simplify this:
class Hash(object):
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    __getitem__ = getattr
    __setitem__ = setattr

--Scott David Daniels
(e-mail address removed)
 
Steven Bethard

Scott said:
You can simplify this:
class Hash(object):
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    __getitem__ = getattr
    __setitem__ = setattr

Oh, I guess I should mention that Hash actually does something Bunch is
not intended to -- it supports __getitem__ style access in addition to
dotted-attribute (__getattr__) style access. Bunch is intended only to
support dotted-attribute style access, though it does support the
one-way conversion of a mapping object to a Bunch.

Steve
 
François Pinard

[Scott David Daniels]
You can simplify this:
class Hash(object):
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)

Might it be:

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
 
Peter Otten

Steven said:
Peter said:
Steven said:
def __eq__(self, other):
    """x.__eq__(y) <==> x == y"""
    return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

This results in an asymmetry:
[snip]

Whether this is intended, I don't know. If someone can enlighten me...

Steven said:
Unintended.

Oops, I meant CPython's rich comparison, not your __eq__() implementation.

Steven said:
I'll switch to
    self.__class__ == other.__class__
or
    type(self) == type(other)
Any preference?

Normally none of them. The former if hard pressed because all old-style
classes have the same type(). But it doesn't really matter here.

Peter
 
Steven Bethard

Peter said:
This results in an asymmetry:

True

With indirect use of __eq__() this puzzling behaviour disappears:

False

Whether this is intended, I don't know. If someone can enlighten me...

It does look like it's at least documented:

http://docs.python.org/ref/comparisons.html
"The operators <, >, ==, >=, <=, and != compare the values of two
objects. The objects need not have the same type. If both are numbers,
they are converted to a common type. Otherwise, objects of different
types always compare unequal, and are ordered consistently but arbitrarily."

This sounds like using "==" makes a guarantee that objects of different
types will compare unequal, while my __eq__ method (using isinstance)
did not make this guarantee.

I tried to check the C code to verify this (that different classes are
guaranteed to be unequal) but rich comparisons make that code pretty
complicated.

Steve
 
Terry Reedy

Since an instance of a subclass is an instance of a parent class, but not
vice versa, I believe you introduce here the asymmetry you verify below.

Terry J. Reedy
 
Steven Bethard

Terry said:
Since an instance of a subclass is an instance of a parent class, but not
vice versa, I believe you introduce here the asymmetry you verify below.

Yes, the asymmetry is due to isinstance.

I believe what Peter Otten was pointing out is that calling __eq__ is
not the same as using ==, presumably because the code for == checks the
types of the two objects and returns False if they're different before
the __eq__ code ever gets called.

Steve
 
Nick Craig-Wood

Steven Bethard said:
This has also generally been my experience, though I'm not sure it's as
true for the XML DOM to Bunch translation. Did you use Hash() in the
same way for hierarchical data?

Hash() got nested yes, but not in a general purpose structure like
your XML example.

Steven Bethard said:
You'll note that the (pre-)PEP explicitly notes that this object is
intended only for use when no methods are associated with the attributes:

"When no methods are to be associated with the attribute-value mappings,
declaring a new class can be overkill."

I do understand your point though -- people might not use Bunch in the
way it's intended. Of course, those same people can already do the same
thing with a dict instead (e.g. write a bunch of functions to handle a
certain type of dict). If someone wants to write Perl in Python,
there's not much we can really do to stop them...

No there isn't ;-)

The above does make it a lot more convenient though: blob['foo'] is
rather difficult to type compared to blob.foo!
 
Nick Craig-Wood

Scott David Daniels said:
You can simplify this:
class Hash(object):
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    __getitem__ = getattr
    __setitem__ = setattr

That doesn't work unfortunately...
>>> class Hash(object):
...     def __init__(self, **kwargs):
...         for key,value in kwargs.items():
...             setattr(self, key, value)
...     __getitem__ = getattr
...     __setitem__ = setattr
...
>>> h=Hash(a=1,b=2)
>>> h.a
1
>>> h['a']
Traceback (most recent call last):

I'm not exactly sure why though!
 
Peter Otten

Nick said:
Scott David Daniels said:
You can simplify this:
class Hash(object):
    def __init__(self, **kwargs):
        for key,value in kwargs.items():
            setattr(self, key, value)
    __getitem__ = getattr
    __setitem__ = setattr

That doesn't work unfortunately...
>>> class Hash(object):
...     def __init__(self, **kwargs):
...         for key,value in kwargs.items():
...             setattr(self, key, value)
...     __getitem__ = getattr
...     __setitem__ = setattr
...
>>> h=Hash(a=1,b=2)
>>> h.a
1
>>> h['a']
Traceback (most recent call last):

I'm not exactly sure why though!

Functions written in Python have a __get__ attribute while builtin functions
(implemented in C) don't. Python-coded functions therefore automatically
act as descriptors while builtins are just another attribute. See

http://mail.python.org/pipermail/python-list/2004-May/219424.html

for a strange example.
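
A short sketch of the difference, written so it can be run as a script. The names `plain`, `BrokenHash` and `WorkingHash` are made up for this example. It checks for `__get__` on the built-in `getattr` and on an ordinary Python function, then shows that wrapping the built-ins in Python-level methods restores item access.

```python
def plain(obj, name):
    return getattr(obj, name)

print(hasattr(getattr, '__get__'))  # False: built-in, not a descriptor
print(hasattr(plain, '__get__'))    # True: Python functions bind as methods


class BrokenHash(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    __getitem__ = getattr   # never bound, so the instance is not passed


class WorkingHash(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __getitem__(self, name):
        return getattr(self, name)
    def __setitem__(self, name, value):
        setattr(self, name, value)


try:
    BrokenHash(a=1)['a']
    broken_raised = False
except TypeError as exc:
    broken_raised = True
    print('TypeError:', exc)

h = WorkingHash(a=1)
h['b'] = 2
print(h['a'], h.b)  # 1 2
```

Because `getattr` is stored as a plain class attribute, indexing calls it with the key alone, and it complains about the missing argument.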

Peter
 
