Storing the state of a script between steps

Denis Usanov

Good evening.

First of all, I would like to apologize for the name of the topic. I really didn't know how to name it more accurately.

I mostly develop automation scripts in Python, such as deployment (it's not about Fabric, and maybe not about SSH at all), testing, etc. In these terms I have an abstraction called a "step".

Some code:

    class IStep(object):
        def run(self):
            raise NotImplementedError()

And the concrete steps:

    class DeployStep: ...
    class ValidateUSBFlash: ...
    class SwitchVersionS: ...

Where I implement the run method.
Then I use a "builder" class which can add steps to an internal list and has a "start" method that runs all the steps one by one.

And I like this. It's a loosely coupled system. It works fine in simple cases. But sometimes some steps have to use the results from previous steps. And now I have problems. Until now I had an internal dict in the "builder", named it "world", and passed it to each step's run() method. It worked, but I disliked this.
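
Roughly, it looks like this (a simplified sketch; the class and method names other than start() and run() are made up):

    class Builder(object):
        def __init__(self):
            self.steps = []
            self.world = {}    # shared state dict passed to every step

        def add_step(self, step):
            self.steps.append(step)

        def start(self):
            for step in self.steps:
                step.run(self.world)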

How would you solve this problem, and how would you do it? I understand that it's more of an architecture question than a Python one.

I bet I wouldn't have asked this if I had worked with a functional programming language.
 
Cameron Simpson

> I mostly develop automation scripts in Python, such as deployment (it's
> not about Fabric, and maybe not about SSH at all), testing, etc. In
> these terms I have an abstraction called a "step".
>
> Some code:
>
>     class IStep(object):
>         def run(self):
>             raise NotImplementedError()
>
> And the concrete steps:
>
>     class DeployStep: ...
>     class ValidateUSBFlash: ...
>     class SwitchVersionS: ...
>
> Where I implement the run method.
> Then I use a "builder" class which can add steps to an internal list
> and has a "start" method that runs all the steps one by one.
>
> And I like this. It's a loosely coupled system. It works fine in simple
> cases. But sometimes some steps have to use the results from previous
> steps. And now I have problems. Until now I had an internal dict in the
> "builder", named it "world", and passed it to each step's run() method.
> It worked, but I disliked this.

Can you qualify exactly what you dislike about it?

I have a similar system I'm working on which chains operational
steps, and each step can queue multiple following steps.
It is still somewhat alpha.

I think it has pretty much the same state issue that you describe:
you need to keep state around, but passing it to each step feels
clunky: you have this state parameter that you need to maintain and
pass around all the time.

My wishlist for state is twofold: I'd like it to be implicit, for
example held in the program scope, and I'd like to be able to ignore
it entirely in steps that don't care about the state.

My solution is threefold at present:

First up, the core algorithm/framework always passes the state
variable around. So every "step" function looks somewhat like this:

    def step(self, argument, state):

where "state" is an object instead of a dict; otherwise essentially
the same as your dict-based approach. "argument" is the item to
be worked on in this step; my framework looks like a UNIX shell
pipeline, where arguments are passed down the pipeline from step
to step.

Second, steps which do not care about the state are written like this:

    def step(self, argument):

and installed via a wrapper:

    def step_full(self, argument, state):
        return step(self, argument)

to make it easier to write the simple case.

In your setup, I'd be writing each Step class as a subclass of a
generic step class that incorporates the wrapper:

    class GenericStep:

        def step(self, argument, state):
            return self.stateless_step(argument)

and then classes which do not care about the state would look like
this:

    class SimpleOperation(GenericStep):

        def stateless_step(self, argument):
            ... do stuff with argument ...

and classes which do operate on the state look like this:

    class StepWithSideEffects(GenericStep):

        def step(self, argument, state):
            ... do stuff with argument and modify state ...

From the outside you call .step(...) with the full argument list,
but on the inside you define the method which is the simplest
mapping to what the step does.

That leaves you freer to choose the style for each step function,
using the less cluttered form when you don't care about the state.
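
For example, the caller's loop can then treat all steps uniformly
(a sketch; the driver loop and the example argument are invented):

    steps = [SimpleOperation(), StepWithSideEffects()]
    state = {}              # or any state object
    argument = "payload"    # whatever the pipeline works on
    for s in steps:
        argument = s.step(argument, state)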

Third, in my scheme the return from step() is a sequence of
(new_argument, new_state) tuples because each step can fire multiple
following steps. Depending on the operations in the step, new_argument
might just be the original argument, and new_state will usually be
the original state. But sometimes new_state will be a shallow copy
of the original, with one or more parts deep copied. This is to
accommodate the implicit branching of state you might imagine in a
pipeline: each of the following steps might want its own independent
state from that point onward.
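
Concretely, that branching might be done with a helper like this
(hypothetical; not the actual framework code):

    import copy

    def branch_state(state, deep_attrs):
        # Shallow-copy the state object, deep-copying only the named
        # attributes, so each branch can mutate those independently.
        new_state = copy.copy(state)
        for name in deep_attrs:
            setattr(new_state, name, copy.deepcopy(getattr(state, name)))
        return new_state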

Again, there's some wrapper logic in the GenericStep class to handle
the return value, much as with the step() and stateless_step() calls:
a simple step might just think in terms of the argument and ignore
the state entirely:

    def one_to_one_step(self, argument):
        ... do stuff with argument ...
        new_argument = ... blah ...
        return new_argument

    def one_to_many_step_with_state(self, argument, state):
        return [ (self.one_to_one_step(argument), state) ]

You can see the "full" step just calls the simple step and then
repackages the result for passing down the pipeline. This lets you
write the "one_to_one_step" method simply, without clutter.

Hopefully this will give you some ideas for keeping the simple steps
simple while still accommodating the complex cases with state.

> I bet I wouldn't have asked this if I had worked with a functional
> programming language.

Possibly, but you still need state.

Cheers,
 
Steven D'Aprano

> I mostly develop automation scripts in Python, such as deployment
> (it's not about Fabric, and maybe not about SSH at all), testing,
> etc. In these terms I have an abstraction called a "step".
>
> Some code:
>
>     class IStep(object):
>         def run(self):
>             raise NotImplementedError()
>
> And the concrete steps:
>
>     class DeployStep: ...
>     class ValidateUSBFlash: ...
>     class SwitchVersionS: ...
>
> Where I implement the run method.

Why are these classes? Unless you have multiple instances, or are using
inheritance, the use of classes is a Java-ism and a waste of time.

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

Since I see no reason to (say) have multiple ValidateUSBFlash instances,
with different state, at the same time, and no reason to have (say)
SwitchVersionS inherit from (say) DeployStep, I suggest a better approach
is to use functions. No more ValidateUSBFlash.run method calls, just call
the function validate_usb_flash.

> Then I use a "builder" class which can add steps to an internal list
> and has a "start" method that runs all the steps one by one.

That becomes:

    steps = [deploy_step, validate_usb_flash, switch_version_s, ...]

    for step in steps:
        step()

> And I like this. It's a loosely coupled system. It works fine in simple
> cases. But sometimes some steps have to use the results from previous
> steps. And now I have problems. Until now I had an internal dict in the
> "builder", named it "world", and passed it to each step's run() method.
> It worked, but I disliked this.
>
> How would you solve this problem, and how would you do it? I understand
> that it's more of an architecture question than a Python one.


I'd avoid over-generalisation, and just write a script. Let each step
function take whatever arguments it needs, and return whatever results
it produces, then write it like this:

    deploy_step()
    result = validate_usb_flash()
    if result == 1:
        validate_other_usb_flash()
    else:
        validate_something_else(result)

I mean, how many steps do you have? Twenty? A hundred? It's not that
hard to maintain a 100- or 200-line script. And are you likely to have
so many different versions of it that you need extra abstraction layers?

Of course, I might be misinterpreting your post. Perhaps you do have so
many steps, and so many different types of deployment, that you do need a
more heavily abstracted system. In that case, I think you need to
describe your system in more detail to get any useful answers. But don't
over-generalise, and don't become an Architecture Astronaut :)
 
F.R.

> Good evening.
>
> First of all, I would like to apologize for the name of the topic. I
> really didn't know how to name it more accurately.
>
> I mostly develop automation scripts in Python, such as deployment
> (it's not about Fabric, and maybe not about SSH at all), testing,
> etc. In these terms I have an abstraction called a "step".
>
> Some code:
>
>     class IStep(object):
>         def run(self):
>             raise NotImplementedError()
>
> And the concrete steps:
>
>     class DeployStep: ...
>     class ValidateUSBFlash: ...
>     class SwitchVersionS: ...
>
> Where I implement the run method.
> Then I use a "builder" class which can add steps to an internal list
> and has a "start" method that runs all the steps one by one.
>
> And I like this. It's a loosely coupled system. It works fine in simple
> cases. But sometimes some steps have to use the results from previous
> steps. And now I have problems. Until now I had an internal dict in the
> "builder", named it "world", and passed it to each step's run() method.
> It worked, but I disliked this.
>
> How would you solve this problem, and how would you do it? I understand
> that it's more of an architecture question than a Python one.
>
> I bet I wouldn't have asked this if I had worked with a functional
> programming language.

A few months ago I posted a summary of a data transformation framework
inviting commentary.
(https://mail.python.org/pipermail/python-list/2013-August/654226.html).
It didn't meet with much interest and I forgot about it. Now that
someone is looking for something along these lines, as I understand
his post, there might be some interest after all.


My module is called TX. A base class "Transformer" handles the flow of
data. A custom Transformer defines a method "T.transform (self)" which
transforms input to output. Transformers are callable, taking input as
an argument and returning the output:

    transformed_input = T (some_input)

A Transformer object retains both input and output after a run. If it is
called a second time without input, it simply returns its output,
without needlessly repeating its job:

    same_transformed_input = T ()
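
In outline, the base class behaves like this (a simplified sketch,
not the actual TX source):

    class Transformer(object):
        def __init__(self, **params):
            self.params = params
            self.input = None
            self.output = None

        def __call__(self, input=None):
            # With input: transform and cache. Without: return the cache.
            if input is not None:
                self.input = input
                self.output = self.transform()
            return self.output

        def transform(self):
            raise NotImplementedError()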

Because of this IO design, Transformers nest:

    csv_text = CSV_Maker (Data_Line_Picker (Line_Splitter (File_Reader ('1st-quarter-2013.statement'))))

A better alternative to nesting is to build a Chain:

    Statement_To_CSV = TX.Chain (File_Reader, Line_Splitter, Data_Line_Picker, CSV_Maker)

A Chain is functionally equivalent to a Transformer:

    csv_text = Statement_To_CSV ('1st-quarter-2013.statement')
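
A Chain can be sketched in the same simplified way:

    class Chain(Transformer):
        def __init__(self, *transformers):
            Transformer.__init__(self)
            self.transformers = transformers

        def transform(self):
            # Feed each element's output to the next; every element
            # keeps its own cached input and output.
            data = self.input
            for T in self.transformers:
                data = T(data)
            return data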

Since Transformers retain their data, developing or debugging a Chain is
a relatively simple affair. If a Chain fails, the method "show ()"
displays the innards of its elements one by one. The failing element is
the first one that has no output. It also displays any messages the
method "transform (self)" logged via "self.log (message)". While
fixing the failing element, the preceding element keeps providing the
original input for testing, until the repair is done.

Since a Chain is functionally equivalent to a Transformer, a Chain can
be placed into a containing Chain alongside Transformers:

    Table_Maker = TX.Chain (TX.File_Reader (), TX.Line_Splitter (), TX.Table_Maker ())
    Table_Writer = TX.Chain (Table_Maker, Table_Formatter, TX.File_Writer (file_name = '/home/xy/office/addresses-4214'))
    DB_Writer = TX.Chain (Table_Maker, DB_Formatter, TX.DB_Writer (table_name = 'contacts'))

Better:

    Splitter = TX.Splitter (TX.Table_Writer (), TX.DB_Writer ())
    Table_Handler = TX.Chain (Table_Maker, Splitter)

    Table_Handler ('/home/xy/Downloads/report-4214')  # Writes both to the file and to the DB


If a structure grows too complex to remember, the method "show_tree ()"
will display something like this:

    Chain
        Chain[0] - Chain
            Chain[0][0] - Quotes
            Chain[0][1] - Adjust Splits
        Chain[1] - Splitter
            Chain[1][0] - Chain
                Chain[1][0][0] - High_Low_Range
                Chain[1][0][1] - Splitter
                    Chain[1][0][1][0] - Trailing_High_Low_Ratio
                    Chain[1][0][1][1] - Standard Deviations
            Chain[1][1] - Chain
                Chain[1][1][0] - Trailing Trend
                Chain[1][1][1] - Pegs

Following a run, all intermediary formats are accessible:

    standard_deviations = C[1][0][1][1] ()

    TM = TX.Table_Maker ()
    TM (standard_deviations).write ()

         0 |      1 |     2 |

    116.49 | 132.93 | 11.53 |
    115.15 | 128.70 | 11.34 |
      1.01 |   0.00 |  0.01 |

A Transformer takes parameters, either at construction time or by means
of the method "T.set (key = parameter)". Whereas a File_Reader gets no
payload passed and may take a file name as its input argument as a
convenient alternative, a File_Writer does take payload, so its file
name must be set by keyword:

    File_Writer = TX.File_Writer (file_name = '/tmp/memos-with-dates-1')
    File_Writer (input)    # Writes the file
    File_Writer.set (file_name = '/tmp/memos-with-dates-2')
    File_Writer ()         # Writes the same thing to the second file



That's about it. I am very pleased with the design. I developed it to
wrap a growing jungle of existing modules and classes that had no
interconnectability and no common input-output specifications. The
improvement in terms of work time and resource management is enormous. I
would share the base class and a few custom classes that are reasonably
autonomous, not requiring surgical extraction from the jungle.

Writing a custom class requires no more than defining private keywords,
if any, and writing the method "transform (self)", or "process_record
(self, record)" if the input is a list of records, which it often is.
The modular design encourages having a Transformer do just one simple
thing, easy to write and easy to debug. Complexity comes from assembling
simple Transformers in a great variety of configurations.
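
For example, a custom class can be as small as this (a sketch, built on
the simplified Transformer base above):

    class Line_Splitter(Transformer):
        # Custom step: input is a block of text, output a list of lines.
        def transform(self):
            return self.input.splitlines()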


Frederic
 
