Refactoring transactional support within Puppet (long)

Luke Kanies

Hi all,

This is one of those "complicated" problems I mentioned a bit ago.
I'm looking for help simplifying some of Puppet's internals,
specifically the parts related to transactions and idempotency.
Transactional support is important for the standard reasons, plus
good reporting. Idempotency lets me apply a configuration and change
only the bits that are out of compliance, so the same configuration
can be applied, say, every half hour and it will just fix anything
that has somehow gotten broken. That is both safer and faster.

Puppet's idempotency and transactions are currently closely related.
Puppet is organized around a lot of high-level types like 'user',
'group', 'package', and 'service'. Each of these types has
parameters that affect how they function (like 'loglevel' and
'recurse') and parameters that actually modify the system (like 'uid'
and 'gid' on files). The complicated ones are those that modify the
system.

Currently, each of these parameters is defined in a separate class,
and that class has to define a 'retrieve' and a 'sync' method.
'retrieve' sets '@is' to be the current value (e.g., for UID on a
file, it would do a stat and get the UID from the stat), and 'sync'
basically takes the desired configuration (in '@should') and modifies
the system so it matches. So, the types have hooks for idempotency.
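
To make the retrieve/sync contract concrete, here is a minimal sketch.
The class name UidParam, the insync? helper, and the FAKE_FS hash that
stands in for the filesystem are assumptions for illustration, not
Puppet's actual code; a real 'retrieve' would stat the file.

```ruby
# Hypothetical stand-in for the real system: path => attributes.
FAKE_FS = { "/etc/sudoers" => { :uid => 501 } }

# Sketch of a parameter class in the style described above (not Puppet's code).
class UidParam
  attr_reader :is
  attr_accessor :should

  def initialize(path, should)
    @path = path
    @should = should
    @is = nil
  end

  # Record the current value in @is (a real version would use File.stat).
  def retrieve
    @is = FAKE_FS[@path][:uid]
  end

  # True when the recorded state matches the desired state.
  def insync?
    @is == @should
  end

  # Change the system so that it matches @should.
  def sync
    FAKE_FS[@path][:uid] = @should
    @is = @should
  end
end

uid = UidParam.new("/etc/sudoers", 0)
uid.retrieve
uid.sync unless uid.insync?
```

Note how the caller has to run 'retrieve' before 'insync?' means
anything; that ordering requirement is part of what makes the
'@is'/'@should' pair awkward.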

It's the transactions that actually provide idempotency. I create a
transaction and pass it a list of type instances (e.g., a bunch of
files, services, whatever). It steps through each instance, checks
to see if the instance is out of sync, and syncs any parameters that
are out of sync. This gives me great reporting, because the
transaction can always log exactly what it's doing, and rollback is
pretty easy because I can just switch '@is' and '@should' and sync
again for most cases.
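
Roughly, the evaluate-and-rollback loop described above could be
sketched as follows. SketchTransaction, Param, and the SYSTEM hash are
hypothetical names used only for this example; the real Transaction
class does considerably more (events, error handling, and so on).

```ruby
# Hypothetical in-memory "system" so the sketch has no side effects.
SYSTEM = { :uid => 501, :gid => 20 }

# Minimal parameter with the @is/@should pair described above.
class Param
  attr_reader :name
  attr_accessor :is, :should

  def initialize(name, should)
    @name = name
    @should = should
    @is = nil
  end

  def retrieve
    @is = SYSTEM[@name]
  end

  def insync?
    @is == @should
  end

  def sync
    SYSTEM[@name] = @should
  end
end

# Sketch of a transaction: sync what is out of sync, remember what changed.
class SketchTransaction
  attr_reader :log

  def initialize(*params)
    @params = params
    @changed = []
    @log = []
  end

  def evaluate
    @params.each do |param|
      param.retrieve
      next if param.insync?
      @log << "#{param.name}: #{param.is.inspect} -> #{param.should.inspect}"
      param.sync
      @changed << param
    end
    @log
  end

  # Roll back by swapping @is and @should and syncing again.
  def rollback
    @changed.reverse_each do |param|
      param.is, param.should = param.should, param.is
      param.sync
      # Swap back so the long-lived object keeps its desired value.
      param.is, param.should = param.should, param.is
    end
    @changed.clear
  end
end

trans = SketchTransaction.new(Param.new(:uid, 0), Param.new(:gid, 20))
trans.evaluate   # changes only uid; gid is already in sync
trans.rollback   # puts uid back to its previous value
```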

There are two significant problems with this scenario: First, it's
pretty annoying to have to maintain @is and @should separately in the
parameters. It would be much, much better if 'retrieve' and 'sync'
could just work like getter and setter methods, returning or
accepting a value. Second, this system makes it pretty annoying for
someone else to use the library. I want to get to the point where
anyone can use the Puppet library to make changes to the system, but
for that to happen, the library interface needs to be simple. I want
something like this:

    sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
    sudoers.uid = 0 unless sudoers.uid == 0

Instead, you pretty much have to use a transaction to do any work
right now:

    sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
    trans = Puppet::Transaction.new(sudoers)
    trans.evaluate

It's totally unclear what's going on there, and it's not exactly easy
to use. I'd also like to make it simple for people to use
transactions if they want, but I want it to be a good bit simpler:

    report = Puppet.transaction do
      sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
      sudoers.uid = 0 unless sudoers.uid == 0
    end

That way people could still get the logging and rollback that always
come with transactions, but only if they wanted them and in a way
that they can see what's happening. By the way, the objects often
live much longer than the transactions -- I have a long-running
daemon that instantiates the objects once and applies them all in a
new transaction every half hour.

I think all of these problems (getting rid of '@is' and '@should',
simplifying transactional use, and simplifying use of the objects)
can have a single solution, but I don't know what it is. It could be
something like objects somehow knowing whether they're running under
a transaction, but I don't know how I'd do that without making
transactions either a singleton (which I can't afford, because I know
sometimes I'll need subtransactions) or very complex (e.g., creating
a 'transaction' instance variable for every object, and then nil'ing
that variable at the end of the transaction).
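
One possible shape for "objects knowing whether they're running under a
transaction", purely as a sketch: keep a stack of open transactions per
thread, so that the innermost one is current and subtransactions fall
out naturally. Everything here (SketchPuppet, FileResource, record,
current_transaction) is a hypothetical name, not an existing Puppet
interface.

```ruby
# Sketch only: SketchPuppet and its methods are hypothetical, not Puppet's API.
module SketchPuppet
  class Transaction
    attr_reader :parent, :changes

    def initialize(parent = nil)
      @parent = parent
      @changes = []
    end

    def record(description)
      @changes << description
    end
  end

  # Stack of open transactions for the running thread (Thread#[] storage
  # is fiber-local in Ruby); the innermost transaction is "current".
  def self.stack
    Thread.current[:sketch_transactions] ||= []
  end

  def self.current_transaction
    stack.last
  end

  # Run the block inside a new (possibly nested) transaction and return it.
  def self.transaction
    trans = Transaction.new(current_transaction)
    stack.push(trans)
    begin
      yield trans
    ensure
      stack.pop
    end
    trans
  end

  # A resource whose setter reports to the current transaction, if any.
  class FileResource
    attr_reader :uid

    def initialize(path, uid)
      @path = path
      @uid = uid
    end

    def uid=(value)
      trans = SketchPuppet.current_transaction
      trans.record("#{@path}: uid #{@uid} -> #{value}") if trans
      @uid = value
    end
  end
end

sudoers = SketchPuppet::FileResource.new("/etc/sudoers", 501)
report = SketchPuppet.transaction do
  sudoers.uid = 0 unless sudoers.uid == 0
end
```

Outside any transaction the setter simply changes the value, and the
objects hold no reference to a transaction, so nothing needs to be
nil'ed afterwards.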

Anyone have any ideas? Any recommendations for what you'd want this
library interface to look like, either using transactions or not?

If you want to look at the code more closely, you can get it from svn
at http://reductivelabs.com/svn/puppet/trunk, or in Trac at
https://reductivelabs.com/cgi-bin/puppet.cgi/browser/trunk . The
transaction class is relatively straightforward, and most of the types
are simple enough to understand, although the Type base class is a bit
long and messy for my tastes.
 
ara.t.howard

1) make transactions re-entrant AND singleton

2) make __all__ operations take place in a transaction

eg

    require 'monitor'

    def initialize
      ...
      # Monitor rather than Mutex: a Mutex is not re-entrant, so a nested
      # transaction on the same thread would raise ThreadError.
      @transaction_lock = Monitor.new
      @in_transaction = false
      ...
    end

    def transaction
      @transaction_lock.synchronize do
        if @in_transaction
          yield
        else
          it = @in_transaction
          begin
            @in_transaction = true
            yield
          ensure
            @in_transaction = it
          end
        end
      end
    end

    alias_method "t", "transaction"


    ...

    def foo() t{ @foo = 42 } end
    def bar() t{ @bar = 42 } end
    def foobar() t{ foo and bar } end


the state may have to be global/module-level for this to work, but you get
the idea. probably the easiest way is

    module Transaction
      # all transaction (global) state and methods
    end

    class C
      include Transaction
    end

etc.
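
filled in a little, the module version might look something like the
sketch below. the depth counter, the LOCK constant and in_transaction?
are made up for this example, and Monitor is used because it is
re-entrant, so nested calls from the same thread do not deadlock.

```ruby
require 'monitor'

# Sketch: all transaction state lives at module level, shared by includers.
module Transaction
  LOCK = Monitor.new   # re-entrant lock shared by every including class
  @depth = 0           # how many transaction blocks are currently open

  class << self
    attr_accessor :depth
  end

  def transaction
    LOCK.synchronize do
      Transaction.depth += 1
      begin
        yield
      ensure
        Transaction.depth -= 1
      end
    end
  end
  alias_method :t, :transaction

  def in_transaction?
    Transaction.depth > 0
  end
end

class C
  include Transaction

  def foo() t { @foo = 42 } end
  def bar() t { @bar = 42 } end
  def foobar() t { foo and bar } end
end

C.new.foobar   # foo and bar run inside foobar's transaction
```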


2 cts.

-a
 
Luke Kanies

Hmm. If I do this, then at the very least I want to organize things
so that the developer doesn't need to know about transactions; either
I use intermediate methods that handle the fact that it's in a
transaction, or I do some method-renaming so that the direct methods
get replaced with methods that go through a transaction.
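
As a rough sketch of the method-renaming idea: a class-level helper
could capture the original method and redefine it so that every call
goes through 'transaction'. The names Transactional, transactional, and
FileType are hypothetical, and the 'transaction' method here is only a
stand-in that tracks nesting depth.

```ruby
# Sketch: a hypothetical class-level helper that re-routes named methods
# through #transaction, so the method author never calls it directly.
module Transactional
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def transactional(*names)
      names.each do |name|
        plain = instance_method(name)   # keep the original implementation
        define_method(name) do |*args, &block|
          transaction { plain.bind(self).call(*args, &block) }
        end
      end
    end
  end

  # Stand-in transaction that only tracks nesting depth.
  def transaction
    @depth = (@depth || 0) + 1
    yield
  ensure
    @depth -= 1
  end

  def in_transaction?
    (@depth || 0) > 0
  end
end

class FileType
  include Transactional
  attr_reader :uid, :saw_transaction

  # Written as a plain setter, with no mention of transactions.
  def uid=(value)
    @saw_transaction = in_transaction?
    @uid = value
  end
  transactional(:uid=)
end

sudoers = FileType.new
sudoers.uid = 0   # runs inside a transaction without the caller asking
```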

That's somewhat immaterial, though, I guess. Your basic point is
that there should only ever be one transaction at a time, right?
There'd be no concept of sub-transactions, but anything that got done
in the middle of a transaction would automatically be included in the
transaction.

I'm not sure about always working within a transaction. It may not
be reasonable to assume that every user of Puppet's library will want
transactions, though I can't say it would harm anything either. I'd
probably want things set up so that if there is a transaction, all
work is done within that transaction, and if there is not one, then
no transaction is used.

So that gives me an idea of how to handle transactions essentially
transparently (with some modification necessary to make it
transparent to the developer, also), but I still need to figure out
how to transparently handle the three-phase collect, compare, commit
operations. It seems that some controlling process would need to do
that, and currently my transaction is the controlling process, but
with your recommendation the transaction moves completely into the
background (which is probably where it belongs, largely).
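
For the sake of discussion, the three phases could be sketched as a
small controlling routine over plain getters and setters. SketchFile
and apply are hypothetical names, and this leaves out the logging and
event handling entirely.

```ruby
# Hypothetical resource with plain getter/setter methods (no @is/@should).
class SketchFile
  attr_accessor :uid, :gid

  def initialize(uid, gid)
    @uid = uid
    @gid = gid
  end
end

# Sketch of a controlling routine for the three phases.
def apply(resource, desired)
  # collect: read the current value of each managed property
  current = {}
  desired.each_key { |name| current[name] = resource.send(name) }

  # compare: keep only the properties that are out of sync
  pending = desired.reject { |name, value| current[name] == value }

  # commit: call the setter for each out-of-sync property
  pending.each { |name, value| resource.send("#{name}=", value) }

  # report old and new values so the caller can log or roll back
  pending.map { |name, value| [name, current[name], value] }
end

file = SketchFile.new(501, 0)
changes = apply(file, :uid => 0, :gid => 0)
```

The returned list of [name, old, new] triples is what a background
transaction would need in order to report on the change or undo it.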

There's also a lot of logging, error handling, and event handling
that currently take place in the transaction, so I would need to
translate that into these behind-the-scenes transactions, but I
wouldn't guess that would be too difficult.

Thanks.
 
