Re: Object Database (ODBMS) for Python

Discussion in 'Python' started by Patrick K. O'Brien, Aug 29, 2003.

  1. "Pettersen, Bjorn S" <> writes:

    > > From: Patrick K. O'Brien [mailto:p]
    > >
    > > I'm working on an ODBMS written in Python, for Python, and was
    > > wondering if anyone was interested. In particular, I'd like to know
    > > what features would be useful, and what types of use cases people
    > > would have for a simple, but feature-rich object database.
    > >
    > > The system that I'm developing is PyPerSyst, which began as a simple
    > > persistence mechanism, but is now becoming a complete ODBMS. Some
    > > details are available here:
    > >
    > > http://www.orbtech.com/wiki/PyPerSyst
    > >
    > > The code is available in CVS on SF:
    > >
    > > http://sourceforge.net/projects/pypersyst/

    >
    > I'd be interested, but can't seem to find docs, demos or tests through
    > sf's web interface.. any pointers?


    First of all, let me just make a caveat that this is still in the
    early stages of development. By that I mean that many features are
    coded, and there are a good many unit tests, but I haven't got much in
    the way of docs and demos. PyPerSyst is being used in a commercial
    application, so it does work quite well. But in no way am I
    advertising it as a finished product. I'm just looking for feedback
    from early adopters and developers with an interest.

    The main pypersyst package is here:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/pypersyst/pypersyst/pypersyst/

    The unit tests are here:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/pypersyst/pypersyst/pypersyst/test/

    I'm working on a simple demo (twistedcred), but haven't committed it
    to cvs yet. In the meantime, here is what the database portion of an
    application looks like:

    import os

    from pypersyst.database import Database
    from pypersyst.engine.engine import Engine
    from pypersyst.storage.storage import Storage

    from twistedcred import data
    from twistedcred.schema import cred

    def database():
        """Return a PyPerSyst database."""
        dir = os.path.dirname(data.__file__)
        app = '.twistedcred'
        storage = Storage(dir, app, binary=False, python=True)
        engine = Engine(storage, cred.Root)
        database = Database(engine)
        return database

    ---

    And here is the schema for the twistedcred database:

    from pypersyst import root
    from pypersyst.entity.entity import Entity


    class Avatar(Entity):
        """Avatar class."""

        _attrSpec = [
            'realm',
            'user',
            'name',
            ]

        _altkeySpec = [
            ('user', 'realm', 'name',),
            ]

        def __init__(self, user, realm, name='Avatar'):
            """Create Avatar instance."""
            self._prep(locals())
            Entity.__init__(self)


    class Realm(Entity):
        """Realm class."""

        _attrSpec = [
            'name',
            ]

        _altkeySpec = [
            ('name',),
            ]

        def __init__(self, name):
            """Create Realm instance."""
            self._prep(locals())
            Entity.__init__(self)


    class User(Entity):
        """User class."""

        _attrSpec = [
            'name',
            'hashedPassword',
            ]

        _altkeySpec = [
            ('name',),
            ]

        def __init__(self, name, hashedPassword=None):
            """Create User instance."""
            self._prep(locals())
            Entity.__init__(self)


    class Root(root.Root):
        """Root class."""

        _EntityClasses = [
            Avatar,
            Realm,
            User,
            ]

    You can create the database using PyCrust, for example, and interact
    with it like this:

    >>> from twistedcred.database import database
    >>> db = database.database()
    >>> from pypersyst.entity import transaction as tx
    >>> t = tx.Create('User', name='Bob')
    >>> u1 = db.execute(t)
    >>> u1.name
    'Bob'
    >>> t = tx.Create('Realm', name='Whatever')
    >>> r1 = db.execute(t)
    >>> t = tx.Create('Avatar', name='MyAvatar', user=u1, realm=r1)
    >>> a1 = db.execute(t)
    >>> a1.user.name
    'Bob'
    >>> t = tx.Create('User', name='Bob')
    >>> u = db.execute(t)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/database.py", line 27, in execute
        return self._engine.execute(transaction)
      File "/home/pobrien/Code/pypersyst/engine/engine.py", line 75, in execute
        return transaction.execute(self._root)
      File "/home/pobrien/Code/pypersyst/entity/transaction.py", line 31, in execute
        return self.EntityClass(**self.attrs)
      File "/home/pobrien/Code/twistedcred/schema/cred.py", line 65, in __init__
        Entity.__init__(self)
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 81, in __init__
        self.extent._insert(self)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 213, in _insert
        self._validate(instance, instance._attrs())
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 325, in _validate
        self._validatekeys(instance, attrs)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 335, in _validatekeys
        raise KeyError, msg
    KeyError: duplicate value ('Bob',) for altkey ('name',)
    >>> u1.links
    {('Avatar', 'user'): [<twistedcred.schema.cred.Avatar object at 0x88a6294>]}
    >>> r1.links
    {('Avatar', 'realm'): [<twistedcred.schema.cred.Avatar object at 0x88a6294>]}
    >>> db.root['Avatar'].match(name='Ava')
    []
    >>> db.root['Avatar'].search(name='Ava')
    [<twistedcred.schema.cred.Avatar object at 0x88a6294>]
    >>>
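
    The different results from match() and search() in that session
    suggest exact versus substring comparison. A minimal sketch of that
    behaviour, together with an alternate-key check, might look like the
    following. MiniExtent and Thing are made-up names, nothing here is
    taken from the PyPerSyst source, and it is written for Python 3
    rather than the Python 2 of the session.

```python
class MiniExtent:
    """Hypothetical stand-in for an extent: holds every instance of one class."""

    def __init__(self, altkeys):
        self.altkeys = altkeys          # e.g. [('name',)]
        self.instances = []

    def insert(self, obj):
        # Refuse the insert if it would duplicate any alternate key.
        for key in self.altkeys:
            value = tuple(getattr(obj, name) for name in key)
            for other in self.instances:
                if tuple(getattr(other, name) for name in key) == value:
                    raise KeyError('duplicate value %r for altkey %r'
                                   % (value, key))
        self.instances.append(obj)
        return obj

    def match(self, **criteria):
        # Assumed semantics: every named attribute must be equal.
        return [o for o in self.instances
                if all(getattr(o, k) == v for k, v in criteria.items())]

    def search(self, **criteria):
        # Assumed semantics: every value must occur as a substring.
        return [o for o in self.instances
                if all(v in getattr(o, k) for k, v in criteria.items())]


class Thing:
    """Hypothetical entity with a single attribute."""

    def __init__(self, name):
        self.name = name


extent = MiniExtent(altkeys=[('name',)])
avatar = extent.insert(Thing('MyAvatar'))
exact = extent.match(name='Ava')        # no instance is named exactly 'Ava'
partial = extent.search(name='Ava')     # 'Ava' occurs inside 'MyAvatar'
```

    Inserting a second Thing('MyAvatar') into this extent raises
    KeyError, in the same spirit as the duplicate 'Bob' above.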


    I hope that helps demonstrate some of what it can do.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #1

  2. (Patrick K. O'Brien) writes:

    > (Patrick K. O'Brien) writes:
    >
    > > I hope that helps demonstrate some of what it can do.

    >
    > I forgot to show a cool feature:


    Here's another:

    >>> u1.name
    'Bob'
    >>> u1.name = 'Joe'
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 94, in __setattr__
        raise AttributeError, 'Modifications can only be made by transactions'
    AttributeError: Modifications can only be made by transactions

    So, let's use a transaction:

    >>> t = tx.Update(u1, name='Joe')
    >>> db.execute(t)
    <twistedcred.schema.cred.User object at 0x8884634>
    >>> u1.name
    'Joe'
    >>>


    Of course, nobody is perfect. So what happens when we send a bad
    transaction:

    >>> t = tx.Update(u1, foo='Joe')
    >>> db.execute(t)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "/home/pobrien/Code/pypersyst/database.py", line 27, in execute
        return self._engine.execute(transaction)
      File "/home/pobrien/Code/pypersyst/engine/engine.py", line 75, in execute
        return transaction.execute(self._root)
      File "/home/pobrien/Code/pypersyst/entity/transaction.py", line 76, in execute
        return root[self.classname]._update(self.instance, **self.attrs)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 312, in _update
        self._validate(instance, combined)
      File "/home/pobrien/Code/pypersyst/entity/extent.py", line 324, in _validate
        instance._validate(attrs)
      File "/home/pobrien/Code/pypersyst/entity/entity.py", line 157, in _validate
        raise error.InvalidAttribute, '%r is not an attribute' % name
    InvalidAttribute: 'foo' is not an attribute
    >>>
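
    The two tracebacks point at a __setattr__ guard and an attribute
    check. A rough sketch of how such a guard could be built is below.
    GuardedEntity and its _update method are illustrative names only,
    not PyPerSyst's Entity, and the sketch is in Python 3.

```python
class InvalidAttribute(Exception):
    """Raised when a name is not listed in _attrSpec (hypothetical)."""


class GuardedEntity:
    _attrSpec = ['name']

    def __init__(self, **attrs):
        self._validate(attrs)
        # Bypass our own __setattr__ while the instance is being built.
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Every ordinary assignment is refused.
        raise AttributeError('Modifications can only be made by transactions')

    def _validate(self, attrs):
        for name in attrs:
            if name not in self._attrSpec:
                raise InvalidAttribute('%r is not an attribute' % name)

    def _update(self, **attrs):
        # Stand-in for the path a transaction would take: validate
        # everything first, then write, so a bad name changes nothing.
        self._validate(attrs)
        for name, value in attrs.items():
            object.__setattr__(self, name, value)


user = GuardedEntity(name='Bob')
try:
    user.name = 'Joe'               # refused by the guard
except AttributeError as exc:
    refused = str(exc)
user._update(name='Joe')            # accepted through the update path
```

    Validating before writing is what lets a bad update such as
    foo='Joe' fail without leaving the instance half-changed.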


    Can you tell I've been having some fun with this? ;-)

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #2

  3. On Thu, 28 Aug 2003 20:26:41 -0500, Patrick K. O'Brien wrote:
    > >>> u1.name
    > 'Bob'
    > >>> u1.name = 'Joe'
    > Traceback (most recent call last):
    >   File "<input>", line 1, in ?
    >   File "/home/pobrien/Code/pypersyst/entity/entity.py", line 94, in __setattr__
    >     raise AttributeError, 'Modifications can only be made by transactions'
    > AttributeError: Modifications can only be made by transactions
    >
    > So, let's use a transaction:


    So why *isn't* it a transaction? Unless you have a good reason not to, I'd
    suggest automatically "coercing" that into a transaction instead of
    throwing an error.

    Give an indication in the docs about the performance issues if you like,
    but make the trivially easy case easy.
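
    To make the suggestion concrete, here is one way the coercion could
    be sketched: __setattr__ builds a one-change transaction and hands
    it to the database. Update, Database and AutoEntity below are
    simplified stand-ins invented for this example (Python 3), not the
    PyPerSyst classes of the same names.

```python
class Update:
    """Hypothetical update transaction holding the instance and new values."""

    def __init__(self, instance, **attrs):
        self.instance = instance
        self.attrs = attrs

    def execute(self):
        # Write past the entity's __setattr__ so we do not recurse.
        for name, value in self.attrs.items():
            object.__setattr__(self.instance, name, value)
        return self.instance


class Database:
    """Hypothetical database that records every transaction it runs."""

    def __init__(self):
        self.log = []

    def execute(self, transaction):
        self.log.append(transaction)
        return transaction.execute()


class AutoEntity:
    def __init__(self, db, **attrs):
        object.__setattr__(self, '_db', db)
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Plain assignment is coerced into a one-change transaction.
        self._db.execute(Update(self, **{name: value}))


db = Database()
u1 = AutoEntity(db, name='Bob')
u1.name = 'Joe'                     # looks like ordinary Python
```

    The cost is the one mentioned above: every assignment becomes its
    own logged transaction, so several related changes are no longer
    grouped.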

    (I'm only really entering my maturity (IMHO) as a software engineer, but
    one of my rules of thumb for developing software for other people to use
    is that the API can ***never*** be too easy. Doing something hard may be a
    little tricky but if you can make the easy case still work, you're way
    ahead. And Python is one ass-kicking language in that regard; it's one of
    the reasons I love it so much, the APIs can be made so easy to use they
    sometimes fade into complete transparency, like "u1.name = 'joe'". (I've
    been focusing on how to write APIs for others to use, esp. in Open Source
    though it applies equally to any team effort, that will be successful,
    rather than ignored.))
    Jeremy Bowers, Aug 29, 2003
    #3
  4. "Paul D. Fernhout" <> writes:

    > In Smalltalk, typically persistent objects may get stored and
    > retrieved as proxies, which is made possible by overriding the basic
    > storage and retrieval methods which are all exposed etc. Maybe Python
    > the language could do with more hooks for persistence as a PEP? I
    > know there are some lower level hooks for access, I'm just wondering
    > if they are enough for what you may want to do with PyPerSyst to make
    > an elegant API for persistent objects (perhaps better unique ID
    > support?), where you could then just go:
    >
    > from persistanceSystem import *
    > foo = MyClass()
    > PersistanceSystem_Wrap(foo)
    > # the following defaults to a transaction
    > foo.x = 10
    > # this makes a three change transaction
    > PersistanceSystem_StartTransaction()
    > foo.y = 20
    > foo.z = 30
    > foo.info = "I am a 3D Point"
    > PersistanceSystem_EndTransaction()
    > # what happens to foo on garbage collection? It persists!
    > ...
    > # Other code in another program
    > from persistanceSystem import *
    > foo = PersistanceSystem_Query(x=10, y=20, z=30)
    > print foo.info # prints --> "I am a 3D Point"
    >
    > That MyClass instance called foo and the related variable changes gets
    > stored in an ODBMS in transactions somewhere... Then I could do the
    > same for the Pointrel System somehow using the same simple hooks.


    Adding hooks to Python itself has been discussed (look for the
    persistence SIG), and not gone anywhere, as far as I know. And I'm
    not sure it would be so good to add to the language. One reason is
    that it would either only be able to capture very simple transactions,
    or would require quite a framework to handle all the requirements for
    real use cases. This is one area where it would be hard to please
    everyone, and I think the Python language has to appeal to a broad set
    of uses.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #4
  5. "Paul D. Fernhout" <> writes:

    > By the way, if you add support for the sorts of associative tuples
    > that the Pointrel System is based on, efficiently managed, maybe
    > I'll consider switching to using your system, if the API is simple
    > enough. :) Or, perhaps there is a way the Pointrel System can be
    > extended to support what you might want to do (in the sense of
    > transparent interaction with Python). In its use of the pickler, the
    > Pointrel System does not keep a list of previously pickled objects,
    > so it can't transparently pickle objects that refer to previously
    > pickled objects in the repository, so that is one way that the
    > Pointrel system can't do what your system does at all. (I'm not sure
    > how to do that without like PyPerSyst keeping lots of previously
    > pickled objects in memory at once for the Pickler to work
    > with). Also, in the Pointrel System repositories are sort of on the
    > fly made up of an arbitrary collection of archives where archives
    > may be added and removed dynamically, so I don't quite begin to see
    > how to handle object persistence across a repository if subobjects are
    > stored in different archives which are dropped out of the
    > repository.


    Oy! There we go with the API thing again. ;-)

    PyPerSyst can manage anything that can be pickled. So it should be
    able to support your associative tuples. But to get the most bang for
    your buck, you'd want to subclass the Entity class that I recently
    added to PyPerSyst. I can't think of a reason it wouldn't work, but
    we'd have to give it a try and see.

    The root of a PyPerSyst database can be any Python object graph, with
    any kind of object referencing that Python supports. But transactions
    must be deterministic and independent, so they cannot contain
    references. If you saw my examples of the generic transactions you'll
    see that I passed in references. How can that be? The secret is the
    dereferencing that takes place in those transaction classes:

    """Generic transactions."""

    __author__ = "Patrick K. O'Brien <>"
    __cvsid__ = "$Id: transaction.py,v 1.8 2003/08/27 00:53:01 pobrien Exp $"
    __revision__ = "$Revision: 1.8 $"[11:-2]


    from pypersyst.entity.entity import Entity
    from pypersyst.transaction import Transaction


    class Create(Transaction):

        def __init__(self, classname, **attrs):
            Transaction.__init__(self)
            self.classname = classname
            self.attrs = attrs

        def __getstate__(self):
            self.refs = {}
            for name, value in self.attrs.items():
                if isinstance(value, Entity):
                    self.refs[name] = (value.__class__.__name__, value.oid)
                    self.attrs[name] = None
            return self.__dict__.copy()

        def execute(self, root):
            self.EntityClass = root._classes[self.classname]
            for name, (classname, oid) in self.refs.items():
                self.attrs[name] = root[classname][oid]
            return self.EntityClass(**self.attrs)


    class Delete(Transaction):

        def __init__(self, instance):
            Transaction.__init__(self)
            self.instance = instance

        def __getstate__(self):
            self.classname = self.instance.__class__.__name__
            self.oid = self.instance.oid
            d = self.__dict__.copy()
            del d['instance']
            return d

        def execute(self, root):
            return root[self.classname]._delete(self.oid)


    class Update(Transaction):

        def __init__(self, instance, **attrs):
            Transaction.__init__(self)
            self.instance = instance
            self.attrs = attrs

        def __getstate__(self):
            self.classname = self.instance.__class__.__name__
            self.oid = self.instance.oid
            self.refs = {}
            for name, value in self.attrs.items():
                if isinstance(value, Entity):
                    self.refs[name] = (value.__class__.__name__, value.oid)
                    self.attrs[name] = None
            d = self.__dict__.copy()
            del d['instance']
            return d

        def execute(self, root):
            self.instance = root[self.classname][self.oid]
            for name, (classname, oid) in self.refs.items():
                self.attrs[name] = root[classname][oid]
            return root[self.classname]._update(self.instance, **self.attrs)


    Try telling me that isn't one sweet API! ;-)
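
    Stripped of the Transaction machinery, the dereferencing idea can be
    tried on its own. The sketch below is a simplification under stated
    assumptions: dereference() and resolve() are invented helper names,
    the root is a plain dict of dicts keyed by class name and oid, and
    only the plain data is pickled. It is Python 3 and is not the
    PyPerSyst code.

```python
import pickle


class Entity:
    """Hypothetical entity carrying an oid plus arbitrary attributes."""

    def __init__(self, oid, **attrs):
        self.oid = oid
        self.__dict__.update(attrs)


class User(Entity):
    pass


def dereference(attrs):
    """Split attrs into plain values and (classname, oid) references."""
    plain, refs = {}, {}
    for name, value in attrs.items():
        if isinstance(value, Entity):
            refs[name] = (type(value).__name__, value.oid)
            plain[name] = None          # placeholder, filled in by resolve()
        else:
            plain[name] = value
    return plain, refs


def resolve(root, plain, refs):
    """Look each reference up again in the root the transaction runs against."""
    attrs = dict(plain)
    for name, (classname, oid) in refs.items():
        attrs[name] = root[classname][oid]
    return attrs


bob = User(1, name='Bob')
root = {'User': {1: bob}}

# What gets logged contains no Entity instances, only names and oids.
logged = pickle.dumps(dereference({'name': 'MyAvatar', 'user': bob}))
plain, refs = pickle.loads(logged)
attrs = resolve(root, plain, refs)
```

    After the round trip, attrs['user'] is the very object held in the
    root rather than a copy, which is the point of logging oids
    instead of the objects themselves.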

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #5
  6. "Paul D. Fernhout" <> writes:

    > By the way, I like your overview of various related ODBMS projects here:
    > http://www.orbtech.com/wiki/PythonPersistence
    > (maybe http://munkware.sourceforge.net/ might go there now?)


    Boy that's old material. I forgot about that page. Look at that!

    "The main idea behind Persistence in Python (or any other language) is
    deceptively simple -- to have the ability to save all your code and
    all your data between executions of your Python program. The goal of
    Persistence for an object-oriented language, such as Python, is to be
    as transparent as possible. In other words, there should be little
    difference between a program whose objects are persistent and one
    whose objects are not."

    See, I used to dream the transparent persistence dream! ;-)

    > and your article at:
    > http://www-106.ibm.com/developerworks/library/l-pypers.html


    Just ignore the end where I say certain schema evolution changes can't
    be handled elegantly. I was wrong. (If I had to pay a dollar for
    every wrong thing I've published I'd be broke.)

    > And I'm just starting to poke around with your PyCrust to see if it
    > can't be used to support more Smalltalk like development of Python
    > apps.


    Cool. I design all my APIs to be maximally useable in PyCrust, btw.
    Just to get a little more mileage out of the API debate. ;-)


    > As a hint as to what I'd like to do :) I'm hoping to get a lot of
    > mileage out of code like:
    > newMethodSource = self.editText.GetValue()
    > print newMethodSource
    > self.expr = compile(newMethodSource, '<string>', 'exec')
    > exec self.expr in self.__class__.__dict__


    Something like that might work.

    > If PyPerSyst was as transparent to use as outlined above, maybe it
    > could then be used to store and retrieve hand built GUI instances
    > with their hand built methods (sort of like in a Squeakish Smalltalk
    > image with Morphic, but maybe better).


    That would be interesting...

    > So anyway, yours in friendly coopetition. :)


    Likewise.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #6
  7. (Patrick K. O'Brien) writes:

    > "Paul D. Fernhout" <> writes:
    >
    > > By the way, I like your overview of various related ODBMS projects here:
    > > http://www.orbtech.com/wiki/PythonPersistence
    > > (maybe http://munkware.sourceforge.net/ might go there now?)

    >
    > Boy that's old material. I forgot about that page. Look at that!


    I also forgot to mention that it is a wiki page, and you have my
    blessing to add whatever material you like (not that you needed my
    blessing, if you know what I mean).

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #7
  8. Patrick K. O'Brien wrote:
    > Let me start by saying I'd love to cooperate, even if I am
    > competitive by nature. ;-)


    Nothing like a good controversy to get people paying attention. :)

    > This API looks rather verbose to me. I think mine would look like:
    > >>> t = tx.Create('User', name='Sir Galahad')
    > >>> user = db.execute(t)


    I think your notion of transactions is growing on me. :) I can see how
    you can generalize this to construct a transaction in a view of a
    database, querying on DB + T1 + T2 etc. while they are uncommitted and
    then commit them all (perhaps resolving multiuser multitransaction
    issues on commits). Kind of neat concept, I'll have to consider for some
    version of the Pointrel System.

    I think it is the special syntax of:
    tx.Update(u1, name='Joe')
    or:
    tx.Create('User', name='Sir Galahad')
    which I am recoiling some from.

    I think part of this comes from thinking of a transaction as something
    that encloses other changes, as opposed to something which is changed.
    Thus my discomfort at requesting services from a transaction other than
    commit or abandon. I'm not saying I couldn't grow to love
    tx.Update(), just that it seems awkward at first compared to what I am
    used to, as well as compared to making operations on a database itself
    after having told the database to begin a transaction. I'm also left
    wondering what the read value of the "name" field is when accessed
    directly as "u1.name" after doing the "tx.Update()" and before doing the
    "db.execute()". [By the way, picky, picky, and I fall down on it too,
    but you use different capitalizations for those two functions.]

    So is it that in PyPerSyst there appears to be one way to access
    information (directly through the object using Python object attribute
    access dot syntax) [not sure about database queries?] and another way to
    change objects -- using tx.XYZ()? This mixing of mindsets could be
    confusing (especially within an object that changes its own values
    internally).

    Using tx.Update also becomes an issue of how to convert existing code to
    persistent code. Mind you, the Pointrel System can't do this
    transparently either, but it doesn't try to do it at all. The Pointrel
    System requires both looking up a value and storing it to use a
    different syntax. Is it just a matter of aesthetics about whether it is
    better to have the whole approach be unfamiliar or whether it is better
    to have only half of it be unfamiliar? Or is there something more here,
    some violation of programmer expectations? [See below.]

    > And unique ids (immutable, btw) are assigned by PyPerSyst:
    > >>> user.oid
    > 42


    Being competitive here :) I would love to know if you have a good
    approach for making them globally unique across all possible users of
    all PyPerSyst repositories for all time. The Pointrel System has an approach to
    handle this (I don't say it will always work, or is efficient, but it
    tries). :) Feel free to raid that code (BSDish license, see
    license.txt), but that issue may have other deeper implications for your
    system.

    > And you can still access attributes directly, you just can't change
    > them outside of a transaction:
    >
    > >>> user.name
    > 'Sir Galahad'
    >
    > And the generic Update transaction is equally simple:
    >
    > >>> t = tx.Update(user, name='Brian')
    > >>> db.execute(t)
    > >>> user.name
    > 'Brian'


    I know one rule of user interface design (not necessarily API of
    course) is that familiar elements should act familiar (i.e. a drop down
    list should not launch a dialog window on drop down) and that if you are
    going to experiment it should look very different so expectations are
    not violated.

    The issue here is in part that when you can reference "u1.name" and then
    "u1.name = 'Joe'" generates an exception (instead of automatically
    making an implicit transaction), some user expectation of API symmetry
    may be violated...

    Also, on another issue, it seems like the persistent classes need to
    derive from a special class and define their persistent features in a
    special way, i.e. class Realm(Entity): _attrSpec = [ 'name', ] etc.
    Again, this is going somewhat towards Python language integration yet
    not all the way.

    While I'd certainly agree your version is more concise than what I
    posted first (just an example of a system that does not attempt to use
    Python language features), later in the email (perhaps you'll get to it
    in your next reply) was the simpler:

    from persistanceSystem import *
    foo = MyClass()
    PersistanceSystem_Wrap(foo)
    # the following defaults to a transaction
    foo.x = 10
    # this makes a three change transaction
    PersistanceSystem_StartTransaction()
    foo.y = 20
    foo.z = 30
    foo.info = "I am a 3D Point"
    PersistanceSystem_EndTransaction()

    That approach does not violate any symmetry expectations by users -- you
    can assign and retrieve values just like always.
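
    As a rough illustration of how that start/end grouping could be
    prototyped in plain Python, the sketch below records assignments
    either one at a time or as a group. Recorder and Wrapped are
    invented names, nothing is actually persisted, and it says nothing
    about how PyPerSyst or the Pointrel System work. It is written for
    Python 3.

```python
class Recorder:
    """Hypothetical change log standing in for a persistence layer."""

    def __init__(self):
        self.transactions = []      # each entry is a list of (attr, value)
        self._open = None           # changes collected since start()

    def start(self):
        self._open = []

    def end(self):
        self.transactions.append(self._open)
        self._open = None

    def record(self, name, value):
        if self._open is None:
            # No explicit transaction: the change is its own transaction.
            self.transactions.append([(name, value)])
        else:
            self._open.append((name, value))


class Wrapped:
    """Object whose assignments are reported to a Recorder."""

    def __init__(self, recorder):
        object.__setattr__(self, '_recorder', recorder)

    def __setattr__(self, name, value):
        self._recorder.record(name, value)
        object.__setattr__(self, name, value)


rec = Recorder()
foo = Wrapped(rec)
foo.x = 10                          # implicit one-change transaction
rec.start()
foo.y = 20
foo.z = 30
rec.end()                           # one transaction holding two changes
```

    Reads stay ordinary attribute access, so the symmetry argument
    holds; what the sketch leaves open is what should happen when an
    exception occurs between start() and end().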

    >> Granted, the Pointrel System is essentially a single user single
    >> transaction system at the core. It (in theory, subject to bugs)
    >> supports atomicity (transactions), isolation (locking) and
    >> durability (logging&recovery). It only supports consistency by how
    >> applications use transactions as opposed to explicit constraints or
    >> rules maintained by the database, so one could argue it fails the
    >> ACID test there. (Although would any typical ODBMS pass consistency
    >> without extra code support? Does PyPerSyst have this as the
    >> database level?)

    >
    >
    > PyPerSyst can persist *any* picklable object graph.


    Are the graphs stand-alone, or can they reference other previously persisted
    Python objects (not derived from "Root" or "Entity")?

    > But it also comes with an Entity class and a Root class (that
    > understands Entity classes) that provides additional functionality,
    > such as alternate indexes, referential integrity, instance
    > validation, etc.


    I guess I need to learn more about when these are better handled by the
    persistence system as opposed to the applications that use it.

    > I don't mind a friendly challenge. I'm just surprised that the bulk
    > of this thread is debating an API that has barely seen the light of
    > day, and that I consider to be drop-dead simple. I guess I need to
    > get a demo app created soon, just to put this to rest. Or at least
    > make sure we're all debating about the same thing. ;-)


    Good point.

    I think the issue is that with the other systems out there
    (MySQL, ZODB, etc.) it seems like a new system has to offer something
    really new (speed, footprint, simplicity, robustness, documentation :)
    etc.).

    Presumably a very transparent API for persistence is still needed for
    an ODBMS which is Python friendly? (Does ZODB do any of this?) If I need
    to write any extra code at all for an object to be persistent, or derive
    from a specialized class, I could just derive from a class that knows
    how to use SQL to store pickled fields. Obviously, PyPerSyst may have
    many wonderful features (not having used it yet) which make it worth it
    to do a special derivation or write special code, but it just seems like
    it should have language transparency too. But, I haven't tried to do that
    in Python, so maybe it's not possible.

    > Right now we're debating an API that nobody on this thread has really
    > seen or used, other than me. The other thing I can say is that,
    > imo, the way you interact with persistent class instances is not the
    > same way you interact with regular class instances. Not if you value
    > the integrity and reliability of your data. And trying to make it
    > appear so is a disservice. I know everyone seems to think
    > transparent persistence is the holy grail, but I've come to think
    > otherwise.


    I think this is the core of the question of this part of the thread.
    You wrote "I've come to think otherwise". I'd be curious to hear more on
    any use cases or examples on why transparency is not so compatible with
    reliability etc. I frankly don't know. I just don't see them being
    mutually exclusive, especially based on what I have read of Smalltalk
    systems that do persistence using proxies. But again, Smalltalk has
    "become:" which can essentially swap any arbitrary instance and a proxy,
    thus making it easy to suddenly start using a proxy for a previously
    used instance and have all previous references point to the proxy. Maybe
    Python needs become? I could use it elsewhere. Maybe it has it and I
    never noticed?

    > Unfortunately, I don't have time to fully elaborate my position. But
    > you don't have to agree with me on this point. PyPerSyst is very
    > modular, and there are implementations of transparent proxies in the
    > PyPerSyst CVS sandbox that some other developers on the team have
    > written. So it can be done.


    OK.

    Thanks for the reply.

    --Paul Fernhout
    http://www.pointrel.org



    Paul D. Fernhout, Aug 29, 2003
    #8
  9. "Paul D. Fernhout" <> writes:

    > Patrick K. O'Brien wrote:
    > > Let me start by saying I'd love to cooperate, even if I am
    > > competitive by nature. ;-)

    >
    > Nothing like a good controversy to get people paying attention. :)


    And never let the facts get in the way of a good story. ;-)

    > > This API looks rather verbose to me. I think mine would look like:
    > > >>> t = tx.Create('User', name='Sir Galahad')
    > > >>> user = db.execute(t)

    >
    > I think your notion of transactions is growing on me. :) I can see how
    > you can generalize this to construct a transaction in a view of a
    > database, querying on DB + T1 + T2 etc. while they are uncommitted and
    > then commit them all (perhaps resolving multiuser multitransaction
    > issues on commits). Kind of neat concept, I'll have to consider for some
    > version of the Pointrel System.
    >
    > I think it is the special syntax of:
    > tx.Update(u1, name='Joe')
    > or:
    > tx.Create('User', name='Sir Galahad')
    > which I am recoiling some from.
    >
    > I think part of this comes from thinking of a transaction as something
    > that encloses other changes, as opposed to something which is changed.
    > Thus my discomfort at requesting services from a transaction other than
    > commit or abandon. I'm not saying I couldn't grow to love
    > tx.Update(), just that it seems awkward at first compared to what I am
    > used to, as well as compared to making operations on a database itself
    > after having told the database to begin a transaction.


    My use of the term "transaction" has certain subtleties that deserve
    clarification. First, a transaction is an instance of a Transaction
    class (or subclass). This instance must have an execute method that
    will get called by the database (after the transaction instance gets
    tested for picklability, and gets logged as a pickle). That execute
    method will be passed the root of the database. It is then free to do
    whatever it wants, as long as the sum total of what it does leaves the
    database in a consistent state. All transactions are executed
    sequentially. All changes made by a transaction must be
    deterministic, in case the transaction gets reapplied from the
    transaction log during a recovery, or restarting a database that
    wasn't dumped just prior to stopping.
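
    The log-then-execute cycle and the replay on recovery can be
    sketched in a few lines. This is a simplified illustration in
    Python 3, not the PyPerSyst engine: MiniEngine, AddUser, REGISTRY
    and replay() are hypothetical, and the log stores each
    transaction's class name and state rather than a pickle of the
    instance itself.

```python
import pickle


class AddUser:
    """Hypothetical transaction: all it needs is plain, picklable data."""

    def __init__(self, name):
        self.name = name

    def execute(self, root):
        root.setdefault('users', []).append(self.name)
        return self.name


# Maps logged class names back to classes during replay.
REGISTRY = {'AddUser': AddUser}


class MiniEngine:
    def __init__(self, root):
        self.root = root
        self.log = []

    def execute(self, transaction):
        # Log first (which also proves the state pickles), then run.
        record = pickle.dumps((type(transaction).__name__, vars(transaction)))
        self.log.append(record)
        return transaction.execute(self.root)


def replay(log):
    """Rebuild a root by re-running every logged transaction in order."""
    root = {}
    for record in log:
        classname, state = pickle.loads(record)
        cls = REGISTRY[classname]
        transaction = cls.__new__(cls)      # skip __init__, restore state
        transaction.__dict__.update(state)
        transaction.execute(root)
    return root


engine = MiniEngine({})
engine.execute(AddUser('Bob'))
engine.execute(AddUser('Joe'))
recovered = replay(engine.log)
```

    Here recovered equals engine.root only because AddUser.execute is
    deterministic. A transaction that read the clock or a random
    number inside execute() would rebuild a different database on
    replay.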

    At this point, PyPerSyst does not have commit/rollback capability. So
    it is up to the transaction class instance to not leave the database
    in an inconsistent state. I'm looking into supporting
    commit/rollback, but the simple solution there would double RAM
    requirements, and other solutions are tricky, to say the least. So
    I'm still looking for something simple and elegant to fit in with the
    rest of the framework.
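
    One reading of the "simple solution" is to run each transaction
    against a deep copy of the root and swap the copy in only if
    nothing raises. That reading is an assumption, since the approach
    is not spelled out here, and execute_with_rollback() and bad() are
    made-up names. A Python 3 sketch:

```python
import copy


def execute_with_rollback(root, transaction):
    """Run transaction on a deep copy; return the copy only on success."""
    working = copy.deepcopy(root)   # a second full copy: the RAM cost
    transaction(working)            # may raise part-way through
    return working                  # the caller swaps this in as the new root


def bad(root):
    """Hypothetical transaction that fails after its first change."""
    root['users'].append('Joe')
    raise ValueError('second step failed')


root = {'users': ['Bob']}
try:
    root = execute_with_rollback(root, bad)
except ValueError:
    pass                            # root was never reassigned
```

    The original root is untouched after the failure, but every
    transaction now pays for a complete copy of the object graph.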

    The transactions I've shown, tx.Create, tx.Update, tx.Delete, are
    simply generic classes that come with PyPerSyst to make it easy to
    create, update and delete single instances of entities. Most real
    applications would define their own Transaction classes in addition to
    these.

    > I'm also left wondering what the read value of the "name" field is
    > when accessed directly as "u1.name" after doing the "tx.Update()"
    > and before doing the "db.execute()".


    t = tx.Update() merely creates a transaction instance, providing it
    with values that will be needed by its execute() method. (See the GOF
    Command pattern.) So nothing changes until the transaction is
    executed by the database, which happens when the transaction instance
    is passed to the database's execute method:

    db.execute(t)
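
    The deferred behaviour can be illustrated with a small stand-alone
    sketch. The Update and User classes below are hypothetical stand-ins
    written for this example, not the classes shipped in PyPerSyst's tx
    module, and the command holds a direct reference to its target purely
    to keep the example short.

```python
class Update:
    """Hypothetical update command: records what to change, changes nothing."""

    def __init__(self, instance, **attrs):
        self.instance = instance
        self.attrs = attrs

    def execute(self, root):
        # The change happens only here, when the database runs the command.
        for name, value in self.attrs.items():
            setattr(self.instance, name, value)


class User:
    def __init__(self, name):
        self.name = name


user = User('Sir Galahad')
t = Update(user, name='Brian')
before = user.name          # constructing t has changed nothing
t.execute(root=None)        # a database would call this from its execute()
after = user.name
print(before, '->', after)
```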

    > [By the way, picky, picky, and I fall down on it too, but you use
    > different capitalizations for those two functions.]


    There aren't two functions: tx.Update is a class, db.execute is a
    method. The capitalization is correct. ;-)

    > So is it that in PyPerSyst there appears to be one way to access
    > information (directly through the object using Python object
    > attribute access dot syntax) [not sure about database queries?] and
    > another way to change objects -- using tx.XYZ()? This mixing of
    > mindsets could be confusing (especially within an object that
    > changes its own values internally).


    You could define transactions that do queries as well. And some
    people prefer to do that. But I think for most reads it is easier to
    traverse the db.root object.

    If you use entities, and an instance of the Root class for your
    db.root, then your db.root is a dictionary-like object that gets you
    to the extent for each Entity subclass in your schema. The entity
    extent is an instance of the Extent class, which manages the set of all
    instances of one Entity subclass. The Extent class is how I'm
    able to provide Relational-like features.
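
    As a rough picture of that arrangement (a sketch under assumptions,
    not the real Root, Extent or Entity classes), the root can be thought
    of as a dictionary keyed by entity class name, with one extent per
    class. The add() and find() methods are invented for this example.

```python
class Entity:
    """Minimal stand-in for an entity; its oid is assigned by the extent."""

    def __init__(self, **attrs):
        self.oid = None
        self.__dict__.update(attrs)


class User(Entity):
    pass


class Extent:
    """Manages all instances of one Entity subclass."""

    def __init__(self, entity_class):
        self.entity_class = entity_class
        self._instances = {}      # oid -> instance
        self._next_oid = 1

    def add(self, instance):
        if not isinstance(instance, self.entity_class):
            raise TypeError('wrong entity type for this extent')
        instance.oid = self._next_oid
        self._next_oid += 1
        self._instances[instance.oid] = instance
        return instance

    def find(self, **criteria):
        """Return instances whose attributes equal all given criteria."""
        return [obj for obj in self._instances.values()
                if all(getattr(obj, k, None) == v for k, v in criteria.items())]

    def __len__(self):
        return len(self._instances)


class Root(dict):
    """Dictionary-like root: maps an entity class name to its extent."""

    def __init__(self, *entity_classes):
        super().__init__((cls.__name__, Extent(cls)) for cls in entity_classes)


root = Root(User)
root['User'].add(User(name='Sir Galahad'))
root['User'].add(User(name='Brian'))
matches = root['User'].find(name='Brian')
```

    Reads then become ordinary traversal, for example
    root['User'].find(name='Brian'), with no transaction involved.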

    Inside of Entity instances, your code looks just like regular Python
    code. It's just application code that must go through transactions.
    Sure, this mixing of mindsets is different than what people are used
    to, but we're talking about managing valuable data. If you simplify
    things too much, you lose the integrity of your data.

    > Using tx.Update also becomes an issue of how to convert existing
    > code to persistent code. Mind you, the Pointrel System can't do
    > this transparently either, but it doesn't try to do it at all. The
    > Pointrel System requires both looking up a value and storing it to
    > use a different syntax. Is it just a matter of aesthetics about
    > whether it is better to have the whole approach be unfamiliar or
    > whether it is better to have only half of it be unfamiliar? Or is
    > there something more here, some violation of programmer
    > expectations? [See below.]


    Existing code won't become magically persistent by adding PyPerSyst.

    > > And unique ids (immutable, btw) are assigned by PyPerSyst:
    > >
    > > >>> user.oid
    > > 42
    >
    > Being competitive here :) I would love to know if you have a good
    > approach for making them globally unique across all possible users
    > of all PyPerSyst repositories for all time. The Pointrel has an
    > approach to handle this (I don't say it will always work, or is
    > efficient, but it tries). :) Feel free to raid that code (BSDish
    > license, see license.txt), but that issue may have other deeper
    > implications for your system.


    Sorry, nothing special here. They are just incrementing ints unique
    within each extent. It would be easy to switch to a globally unique
    id if you have a good one, and as long as it was deterministic, and
    not random in any way.
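
    The reason for the "deterministic" requirement shows up as soon as a
    log is replayed. The following sketch (the replay and counter_ids
    helpers are hypothetical, written only for this illustration) rebuilds
    the same three users twice, once with a counter and once with random
    UUIDs generated during the replay.

```python
import itertools
import uuid


def replay(names, make_id):
    """Re-create users from a log of names, assigning ids with make_id()."""
    return {make_id(): name for name in names}


def counter_ids():
    """Return a function that hands out 1, 2, 3, ... in order."""
    counter = itertools.count(1)
    return lambda: next(counter)


log = ['Sir Galahad', 'Brian', 'Arthur']

# Incrementing ints: every replay of the log assigns the same ids.
first = replay(log, counter_ids())
second = replay(log, counter_ids())
counter_stable = first == second

# Random ids generated while replaying: each replay assigns different ids.
random_first = replay(log, uuid.uuid4)
random_second = replay(log, uuid.uuid4)
random_stable = random_first == random_second
```

    One way a globally unique id could still fit, offered as an
    assumption rather than anything PyPerSyst does, is to generate it
    before the transaction is logged and pass it in as an argument, so the
    value is read back from the log instead of being regenerated.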

    > > And you can still access attributes directly, you just can't
    > > change them outside of a transaction:
    > >
    > > >>> user.name
    > > 'Sir Galahad'
    > >
    > > And the generic Update transaction is equally simple:
    > >
    > > >>> t = tx.Update(user, name='Brian')
    > > >>> db.execute(t)
    > > >>> user.name
    > > 'Brian'
    >
    > I know one rule of user interface design (not necessarily API of
    > course) is that familiar elements should act familiar (i.e. a drop
    > down list should not launch a dialog window on drop down) and that
    > if you are going to experiment it should look very different so
    > expectations are not violated.
    >
    > The issue here is in part that when you can reference "u1.name" and
    > then "u1.name = 'Joe'" generates an exception (instead of
    > automatically making an implicit transaction), some user expectation
    > of API symmetry may be violated...


    While this is feasible, the problem I have with this is that I think
    implicit transactions on this minute level of granularity are evil.
    That's the main reason I haven't implemented this, even though others
    have done this for PyPerSyst. I think too many people would abuse the
    implicit transaction feature, resulting in inconsistent and unreliable
    objects. I'm targeting serious, multi-user applications. But
    PyPerSyst is completely modular, so you can use it to implement all
    kinds of persistence systems. Most of the capabilities I've been
    discussing are new, and completely optional.
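
    For readers wondering what "readable anywhere, writable only inside a
    transaction" might look like in plain Python, here is one possible
    sketch. It is an assumption about how such a guard could be built, not
    a description of PyPerSyst's Entity class. The class-wide _writable
    flag is deliberately simplistic and is not thread-safe.

```python
from contextlib import contextmanager


class ReadOnlyError(AttributeError):
    """Raised when an entity is modified outside a transaction."""


class Entity:
    _writable = False    # class-wide switch, flipped only by transactions

    def __init__(self, **attrs):
        # Bypass the guard while the instance is being constructed.
        self.__dict__.update(attrs)

    def __setattr__(self, name, value):
        if not Entity._writable:
            raise ReadOnlyError(f'cannot set {name!r} outside a transaction')
        super().__setattr__(name, value)


@contextmanager
def _unlocked():
    """Temporarily allow attribute assignment on entities."""
    Entity._writable = True
    try:
        yield
    finally:
        Entity._writable = False


class Update:
    """Hypothetical transaction that is allowed to modify an entity."""

    def __init__(self, instance, **attrs):
        self.instance = instance
        self.attrs = attrs

    def execute(self):
        with _unlocked():
            for name, value in self.attrs.items():
                setattr(self.instance, name, value)


user = Entity(name='Sir Galahad')
try:
    user.name = 'Joe'           # direct assignment is refused
    blocked = False
except ReadOnlyError:
    blocked = True

Update(user, name='Brian').execute()   # the same change via a transaction
```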

    > Also, on another issue, it seems like the persistent classes need to
    > derive from a special class and define their persistent features in
    > a special way, i.e. class Realm(Entity): _attrSpec = [ 'name', ] etc.
    > Again, this is going somewhat towards Python language integration
    > yet not all the way.


    You don't *have* to use the Entity class that comes with PyPerSyst,
    but if you do, it lets you define the attributes, alternate keys, and
    fields for your subclass in as simple a form as I could think of.

    If you don't use the Entity class, then you have to figure out how to
    support instance integrity, alternate keys, referential integrity,
    bi-directional references, etc. So I think they provide some benefit.

    > While I'd certainly agree your version is more concise than what I
    > posted first (just an example of a system that does not attempt to use
    > Python language features), later in the email (perhaps you'll get to it
    > in your next reply) was the simpler:
    >
    > from persistanceSystem import *
    > foo = MyClass()
    > PersistanceSystem_Wrap(foo)
    > # the following defaults to a transaction
    > foo.x = 10
    > # this makes a two change transaction
    > PersistanceSystem_StartTransaction()
    > foo.y = 20
    > foo.z = 20
    > foo.info = "I am a 3D Point"
    > PersistanceSystem_EndTransaction()
    >
    > That approach does not violate any symmetry expectations by users --
    > you can assign and retrieve values just like always.


    If users expect symmetry it is because they are used to writing single
    process programs that do not share objects. Does anyone expect this
    kind of symmetry and transparency when writing a multi-threaded
    application? Why not? Granted, having start/end transaction
    semantics might change some of the rules. But even if we had those in
    PyPerSyst, I would probably only use them inside of Transaction
    classes, not embedded in application code where they are harder to
    find and test. Explicit transaction objects have many benefits.

    It's sort of similar to the notion of separating your application
    logic from your gui code. Sure, it's easier to just put a bunch of code
    in the event handler for a button. But is that the best way to code?
    In my mind, implicit transactions, or commit/rollback in application
    code, is like putting all your business logic in the event handlers
    for your gui widgets. I'm trying to keep people from writing crappy
    persistent applications.

    > > PyPerSyst can persist *any* picklable object graph.

    >
    > Are the graphs stand-alone, or can they reference other previously
    > persisted Python objects (not derived from "Root" or "Entity")?


    A PyPerSyst database has a single entry point, named root, that can be
    any picklable Python object, and any objects reachable from that
    object. When the root gets pickled (for example when you do
    db.dump()), the whole thing gets pickled and all references are
    maintained. When the database starts, the entire thing gets
    unpickled. The entire thing is always in memory (real, or virtual).
    The snapshot and log are on disk. Each transaction is appended to the
    log. Did that answer your question?
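
    The snapshot-plus-log arrangement can be sketched in a few lines with
    the standard pickle module. This is an illustration under
    assumptions, not PyPerSyst's storage code: the ToyStore class and its
    file names are invented, and a "transaction" is reduced to a
    (key, value) assignment on a dictionary root so that the example stays
    short.

```python
import os
import pickle
import tempfile


class ToyStore:
    """Whole root lives in memory; a snapshot and an append-only log on disk."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, 'snapshot.pickle')
        self.log_path = os.path.join(directory, 'transactions.log')
        self.root = self._load()

    def _load(self):
        # Start from the last snapshot, if there is one...
        root = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, 'rb') as f:
                root = pickle.load(f)
        # ...then reapply every transaction logged since that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, 'rb') as f:
                while True:
                    try:
                        key, value = pickle.load(f)
                    except EOFError:
                        break
                    root[key] = value
        return root

    def execute(self, key, value):
        """Log the (key, value) transaction first, then apply it in memory."""
        with open(self.log_path, 'ab') as f:
            pickle.dump((key, value), f)
        self.root[key] = value

    def dump(self):
        """Pickle the whole root and start a fresh, empty log."""
        with open(self.snapshot_path, 'wb') as f:
            pickle.dump(self.root, f)
        open(self.log_path, 'wb').close()


with tempfile.TemporaryDirectory() as directory:
    store = ToyStore(directory)
    store.execute('king', 'Arthur')
    store.dump()                          # the snapshot now contains 'king'
    store.execute('knight', 'Galahad')    # recorded only in the log
    restarted = ToyStore(directory)       # simulates a restart without a dump
    restored_root = restarted.root
```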

    > > But it also comes with an Entity class and a Root class (that
    > > understands Entity classes) that provides additional functionality,
    > > such as alternate indexes, referential integrity, instance
    > > validation, etc.

    >
    > I guess I need to learn more about when these are better handled by
    > the persistence system as opposed to the applications that use it.


    In my mind, anything that is generic shouldn't have to be reinvented
    in application code. I feel like I've spent most of my career
    reinventing one database application after another. ;-)

    > Presumably a very transparent API for persistence is still needed
    > for an ODBMS which is Python friendly? (Does ZODB do any of this?)


    I started writing a wrapper for ZODB and gave up about a year ago.

    > If I need to write any extra code at all for an object to be
    > persistent, or derive from a specialized class, I could just derive
    > from a class that knows how to use SQL to store pickled fields.


    You don't think there is a benefit to not having to use a database,
    not having to map anything to relational tables, not being limited to
    the relational model, and not having to do joins, etc? I don't care
    how good an O-R mapper is, not having to use one at all is better.

    > I think this is the core of the question of this part of the thread.
    > You wrote "I've come to think otherwise". I'd be curious to hear
    > more on any use cases or examples on why transparency is not so
    > compatible with reliability etc.


    I just think implicit transparent transactions would lull users into a
    false sense of integrity and make them write sloppy applications that
    didn't actually maintain the integrity of their objects when used in a
    multi-user environment. I think the kind of applications I want to
    use PyPerSyst for demand that it be difficult for application
    programmers to do the wrong thing with regards to the integrity of the
    persisted data. I think having transactions as explicit objects
    provides more control over the integrity of the database. If users
    want transparency, it can be done using PyPerSyst; it just isn't the
    focus of my current efforts. And I don't think explicit transactions
    are that much of a burden. Transaction code is a small percentage of
    application code, compared to all the interface code you have to
    write. And you could easily write wrappers for transactions that make
    them less burdensome.

    --
    Patrick K. O'Brien
    Orbtech http://www.orbtech.com/web/pobrien
    -----------------------------------------------
    "Your source for Python programming expertise."
    -----------------------------------------------
    Patrick K. O'Brien, Aug 29, 2003
    #9
  10. Patrick K. O'Brien

    Jeremy Jones Guest

    Jeremy Jones, Aug 30, 2003
    #10
  11. Jeremy Jones <> writes:

    > On Fri, 29 Aug 2003 12:11:51 -0400
    > "Paul D. Fernhout" <> wrote:
    >
    > > By the way, I like your overview of various related ODBMS projects
    > > here: http://www.orbtech.com/wiki/PythonPersistence (maybe
    > > http://munkware.sourceforge.net/ might go there now?)

    >
    > I wouldn't be offended if Munkware found its way to the
    > PythonPersistence page ;-)


    Me either. It's a wiki, so please feel free to add anything you like.

    Patrick K. O'Brien, Aug 30, 2003
    #11
  12. Patrick-

    I think based on this and your other posts I now understand better where
    you are coming from. Thanks for the explanations and comments.

    To try to restate (and better justify) what I now think I see as your
    point of view on this transactional API issue, let me present this analogy.

    When one builds a modern GUI application that supports complete
    multicommand "Undo" and "Redo" such as built on the Macintosh MacApp
    framework
    http://developer.apple.com/documentation/mac/MacAppProgGuide/MacAppProgGuide-44.html
    or any other similar approach, the strategy generally is to have a stack
    of Command (subclassed) objects, where each such object supports "do",
    "undo" and "redo". We use a general purpose system like this for example
    in our Garden Simulator software (and other Delphi applications --
    hopefully someday to be ported to Python).
    http://www.gardenwithinsight.com/progmanlong.htm
    Rather than mess with the application's data domain directly, every user
    action in such an undoable application, from selecting an object in a
    drawing program, to making a change with a slider, to dragging an
    object, to deleting an item, to even setting multiple options in a
    dialog (if each change isn't itself a command), creates a command (i.e.
    a transaction), which changes the domain and then continues to modify
    the domain while it is active (say at the top of the command stack while
    a mouse is dragged) and then completely finishes modifying the domain
    and is left on the stack when all the related GUI activity is done.
    While the command (transaction) itself may fiddle with the domain, no
    button press, or mouse click, or drop down selection ever messes
    directly with the data domain (or what might in another context be sort
    of like the business logic and business data). By constraining changes
    to this approach, one can readily do, undo, and redo a stack of commands
    to one's heart's content -- and subject to available memory :) or other
    limits.
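
    A bare-bones version of such a command stack, written in Python as a
    sketch under assumptions (this is not MacApp's or the Garden
    Simulator's code, and the SetValue and CommandStack names are invented
    here), might look like this. The "domain" is just a dictionary.

```python
class SetValue:
    """Hypothetical command: sets one key in the domain dictionary."""

    _MISSING = object()

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.old = self._MISSING

    def do(self, domain):
        # Remember the previous value so the change can be undone later.
        self.old = domain.get(self.key, self._MISSING)
        domain[self.key] = self.value

    def undo(self, domain):
        if self.old is self._MISSING:
            del domain[self.key]
        else:
            domain[self.key] = self.old

    def redo(self, domain):
        self.do(domain)


class CommandStack:
    """Only commands touch the domain; the stack records them in order."""

    def __init__(self, domain):
        self.domain = domain
        self.done = []
        self.undone = []

    def do(self, command):
        command.do(self.domain)
        self.done.append(command)
        self.undone.clear()     # a new command invalidates the redo history

    def undo(self):
        command = self.done.pop()
        command.undo(self.domain)
        self.undone.append(command)

    def redo(self):
        command = self.undone.pop()
        command.redo(self.domain)
        self.done.append(command)


domain = {}
stack = CommandStack(domain)
stack.do(SetValue('plant', 'tomato'))
stack.do(SetValue('plant', 'carrot'))
stack.undo()
after_undo = dict(domain)
stack.redo()
after_redo = dict(domain)
stack.undo()
stack.undo()
after_all_undone = dict(domain)
```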

    Your transaction notion in PyPerSyst, now that I understand it better,
    seems to have something of this GUI command system flavor. And that
    emphasis is perhaps why you do not feel it is inconsistent to have one
    way to read values and another way to change values, since changing
    values is something in this model requiring significant forethought as
    an application level transaction. Implicitly, what you are getting at
    here is a development methodology where all data domain changes go
    through transactions (commands), and the transactions have been
    consciously considered and designed (rather than just resulting from
    randomly poking around in the data domain). And that is perhaps why you
    are so against the implicit transactions -- they violate this
    development methodology of being explicit about what chunks of changes
    are a transaction (as an atomic unit). The same sort of issues come up
    when people try to avoid Command-type frameworks, thinking it is easier
    to just fire off changes directly to the data domain from GUI events
    (and it is easier -- just not undoable or consistent). Adhering to a
    transactional (command-al?) development methodology makes it very
    straightforward to understand how the application is structured and what
    it can or cannot do (i.e., just look in the transaction (or command) class
    hierarchy). And so, from your perspective, it is quite reasonable to
    have a lot of work go into crafting transaction objects (or subclassing
    them from related ones etc.) in the same way that it is expected that
    GUI applications with undo/redo capabilities will have a lot of effort
    put into their analogous "Command" class hierarchy.

    To step back a minute, in general, a transactional development
    methodology is in a way a step up from the random flounderings of how
    many programs work, with code that changes the data domain potentially
    sprinkled throughout the application code base, rather than cleanly
    specified in a set of Command or Transaction subclasses. So you are sort
    of proposing generally a step up in people's understanding and practice
    of how to deal with applications and persistent data.

    Does this sort of capture an essential part of what you are getting at
    here with your PyPerSyst application architecture development strategy?
    If so, I like it. ;-)

    All the best.

    --Paul Fernhout
    http://www.pointrel.org

    P.S. The Pointrel System supports abandoning in-process transactions by
    storing all the data it changes along the way, and being able to roll
    back to this state. But, with an object database as you have outlined
    it, I think this would naturally be a lot more complicated -- although
    perhaps you could adopt the "undo" and "redo" aspect of Commands
    (including stashing the old objects somewhere in case of a redo...)

    Patrick K. O'Brien wrote:
    [Lots of good stuff snipped, and thanks for the interesting dialogue. :)]
    > If users expect symmetry it is because they are used to writing single
    > process programs that do not share objects. Does anyone expect this
    > kind of symmetry and transparency when writing a multi-threaded
    > application? Why not? Granted, having start/end transaction
    > semantics might change some of the rules. But even if we had those in
    > PyPerSyst, I would probably only use them inside of Transaction
    > classes, not embedded in application code where they are harder to
    > find and test. Explicit transaction objects have many benefits.
    >
    > It's sort of similar to the notion of separating your application
    > logic from your gui code. Sure, it's easier to just put a bunch of code
    > in the event handler for a button. But is that the best way to code?
    > In my mind, implicit transactions, or commit/rollback in application
    > code, is like putting all your business logic in the event handlers
    > for your gui widgets. I'm trying to keep people from writing crappy
    > persistent applications.


    >>I think this is the core of the question of this part of the thread.
    >>You wrote "I've come to think otherwise". I'd be curious to hear
    >>more on any use cases or examples on why transparency is not so
    >>compatible with reliability etc.

    >
    > I just think implicit transparent transactions would lull users into a
    > false sense of integrity and make them write sloppy applications that
    > didn't actually maintain the integrity of their objects when used in a
    > multi-user environment. I think the kind of applications I want to
    > use PyPerSyst for demand that it be difficult for application
    > programmers to do the wrong thing with regards to the integrity of the
    > persisted data. I think having transactions as explicit objects
    > provides more control over the integrity of the database. If users
    > want transparency, it can be done, using PyPerSyst, it just isn't the
    > focus of my current efforts. And I don't think explicit transactions
    > are that much of a burden. Transaction code is a small percentage of
    > application code, compared to all the interface code you have to
    > write. And you could easily write wrappers for transactions that make
    > them less burdensome.




    Paul D. Fernhout, Aug 31, 2003
    #12
  13. "Paul D. Fernhout" <> writes:

    > Patrick-
    >
    > I think based on this and your other posts I now understand better
    > where you are coming from. Thanks for the explanations and
    > comments.
    >
    > To try to restate (and better justify) what I now think I see as
    > your point of view on this transactional API issue, let me present
    > this analogy.
    >

    [Analogy snipped]
    >


    Your analogy is absolutely correct. A PyPerSyst transaction
    class/instance follows the Command pattern. In fact, they were
    initially called commands, and later renamed to transactions to
    emphasize the fact that the actions taking place inside the command
    needed to be atomic and leave the database in a consistent state (half
    of the ACID properties needed to be a reliable database). PyPerSyst
    itself provides the isolation (the engine provides this) and
    durability (storage provides this).

    > To step back a minute, in general, a transactional development
    > methodology is in a way a step up from the random flounderings of
    > how many programs work, with code that changes the data domain
    > potentially sprinkled throughout the application code base, rather
    > than cleanly specified in a set of Command or Transaction
    > subclasses. So you are sort of proposing generally a step up in
    > people's understanding and practice of how to deal with applications
    > and persistent data.


    Yes! And I love how you describe this - random flounderings.

    > Does this sort of capture an essential part of what you are getting
    > at here with your PyPerSyst application architecture development
    > strategy? If so, I like it. ;-)


    Absolutely. And I'm glad you like it. Now if only I had a perfect
    model for guaranteeing the integrity and consistency of the state of
    Python class instances, we'd be set. Well, actually, I've got some
    ways to do that, I just don't completely like them.

    > P.S. The Pointrel System supports abandoning in-process transactions
    > by storing all the data it changes along the way, and being able to
    > roll back to this state. But, with an object database as you have
    > outlined it, I think this would naturally be a lot more complicated
    > -- although perhaps you could adopt the "undo" and "redo" aspect of
    > Commands (including stashing the old objects somewhere in case of a
    > redo...)


    I'm not sure we'd be able to easily support undo and redo in a
    multi-user environment. For a single-user application, this should be
    easy. And I'll probably eventually build in mechanisms to support
    this. But multi-user adds a few complexities.

    Commit and rollback would be nice for complex transactions that change
    a lot of state where your ability to guarantee the success of those
    changes, or test the success as a pre-condition, is difficult. The
    easiest approach, which Prevayler is implementing, is to always have
    two copies of the database in memory - one to try out a transaction
    and see if it completes, the other to receive only transactions that
    successfully completed on the tester. But that approach doubles the
    memory requirements, and we already have high memory requirements
    since we keep the entire object graph in memory. But the beauty of
    this approach is that it is simple and foolproof.
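
    A sketch of the two-copy idea as described here, with invented names
    (ShadowedDatabase, AddKnight) and no claim to match Prevayler's
    implementation: each transaction runs first against a tester copy and
    reaches the real root only if it completed there.

```python
import copy


class AddKnight:
    """Hypothetical transaction that can fail halfway through."""

    def __init__(self, name):
        self.name = name

    def execute(self, root):
        root['knights'].append(self.name)       # first change
        if not self.name:
            raise ValueError('name required')   # fails after a partial change
        root['count'] += 1                      # second change


class ShadowedDatabase:
    def __init__(self, root):
        self.root = root
        self._tester = copy.deepcopy(root)   # second full copy: doubles RAM

    def execute(self, transaction):
        try:
            transaction.execute(self._tester)
        except Exception:
            # The tester is now inconsistent; rebuild it from the good copy.
            self._tester = copy.deepcopy(self.root)
            raise
        # Only transactions that completed on the tester reach the real root.
        transaction.execute(self.root)


db = ShadowedDatabase({'knights': [], 'count': 0})
db.execute(AddKnight('Galahad'))
try:
    db.execute(AddKnight(''))
    failed = False
except ValueError:
    failed = True
```

    The failed transaction leaves db.root untouched, at the cost of
    holding the whole object graph in memory twice and executing every
    transaction twice.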

    Another approach would be for each transaction to keep mementos of
    objects that get changed, so they can be restored if an exception is
    raised at some point during the transaction. But I think that will be
    too complex and too much to expect from transaction writers, unless I
    can come up with support for that in PyPerSyst that makes it easier.
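
    To make the memento idea concrete, here is one hypothetical shape the
    framework support could take. MementoTransaction, remember() and
    run() are names invented for this sketch. The saved state is a shallow
    copy of the object's attribute dictionary, so attributes that are
    mutated in place (a list that is appended to, for instance) would not
    be restored by this simple version.

```python
class MementoTransaction:
    """Base class that saves an object's state before it is changed."""

    def __init__(self):
        self._mementos = []

    def remember(self, obj):
        # Shallow copy: references to other objects are kept, not duplicated.
        self._mementos.append((obj, dict(obj.__dict__)))

    def rollback(self):
        # Restore in reverse order so the oldest saved state wins.
        for obj, state in reversed(self._mementos):
            obj.__dict__.clear()
            obj.__dict__.update(state)
        self._mementos.clear()

    def run(self, root):
        """Execute the transaction, restoring saved state on any exception."""
        self._mementos = []
        try:
            return self.execute(root)
        except Exception:
            self.rollback()
            raise

    def execute(self, root):
        raise NotImplementedError


class Account:
    def __init__(self, balance):
        self.balance = balance


class Transfer(MementoTransaction):
    def __init__(self, source, target, amount):
        super().__init__()
        self.source, self.target, self.amount = source, target, amount

    def execute(self, root):
        self.remember(self.target)
        self.target.balance += self.amount
        self.remember(self.source)
        if self.source.balance < self.amount:
            raise ValueError('insufficient funds')   # target already changed
        self.source.balance -= self.amount


a, b = Account(10), Account(0)
try:
    Transfer(a, b, 50).run(root=None)
    rolled_back = False
except ValueError:
    rolled_back = True
balances_after_failure = (a.balance, b.balance)

Transfer(a, b, 5).run(root=None)
balances_after_success = (a.balance, b.balance)
```

    The burden mentioned above is visible in execute(): the transaction
    writer has to call remember() before every change, and forgetting one
    call silently breaks the rollback.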

    Anyway, just some more thoughts. Good talking to you.

    Patrick K. O'Brien, Sep 1, 2003
    #13