Dataflow programming in Python

Anh Hai Trinh

Hello all,

I just want to share something I've worked on recently: a library which
implements streams -- generalized iterators with a pipelining mechanism
and lazy evaluation to enable dataflow programming in Python.

The idea is to take the output of a function that turns an iterable
into another iterable and plug it in as the input of another such
function. While you can already do some of this using function
composition, this package provides an elegant notation for it by
overloading the '>>' operator.

To give a simple example of string processing, here we grep the lines
matching some regex, strip them, and accumulate them into a list:
import re
from stream import filter, mapmethod

result = (open('log').xreadlines()
          >> filter(re.compile('[Pp]attern').search)
          >> mapmethod('strip')
          >> list)
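
Under the hood, this kind of chaining can be built on Python's reflected
operator hook: when the left operand of '>>' does not define __rshift__,
Python falls back to the right operand's __rrshift__. Below is a minimal
sketch of the idea -- not stream.py's actual implementation -- where
Stage, Piped, sfilter and smap are hypothetical stand-ins for the
library's machinery:

import re

class Stage:
    """One pipeline step: wraps a function from iterable to iterable."""
    def __init__(self, fn):
        self.fn = fn
    def __rrshift__(self, source):
        # `source >> stage` applies the step and returns a chainable
        # wrapper around the resulting iterator.
        return Piped(self.fn(source))

class Piped:
    """A chainable iterable: supports `>> Stage` and `>> accumulator`."""
    def __init__(self, it):
        self.it = iter(it)
    def __iter__(self):
        return self.it
    def __rshift__(self, other):
        if isinstance(other, Stage):
            return Piped(other.fn(self.it))
        # Otherwise treat `other` as an accumulator: any callable that
        # takes a single iterable argument (list, sum, max, ...).
        return other(self.it)

def sfilter(pred):  # hypothetical analogue of stream.py's filter
    return Stage(lambda it: (x for x in it if pred(x)))

def smap(fn):       # hypothetical analogue of stream.py's mapmethod
    return Stage(lambda it: (fn(x) for x in it))

pat = re.compile('[Pp]attern')
result = open('log') >> sfilter(pat.search) >> smap(str.strip) >> list

Returning a Piped wrapper from each step is what lets the chain
continue; the final '>> list' works because Piped treats any non-Stage
right operand as an accumulator.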

This approach focuses the program on processing streams of data, step
by step. A pipeline usually starts with a generator, or anything
iterable, then passes through a number of processors. Multiple streams
can be branched and combined. Finally, the output is fed to an
accumulator, which can be any function of one iterable argument.
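
With the sketch above, that anatomy is easy to see in a small numeric
pipeline: a range as the source, two processors, and sum -- any
callable of one iterable argument -- as the accumulator (sfilter and
smap are the hypothetical stages from the sketch, not stream.py's API):

total = (range(10)
         >> sfilter(lambda n: n % 2 == 0)
         >> smap(lambda n: n * n)
         >> sum)
# total == 0 + 4 + 16 + 36 + 64 == 120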

Another advantage is that values are computed lazily, i.e. only when
the accumulator needs them.
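
The laziness is easy to demonstrate with the sketch: nothing runs until
the accumulator starts pulling, so even an infinite source works, as
long as some stage truncates the stream. Here stake is a hypothetical
truncating stage built on itertools.islice:

import itertools

def stake(n):
    return Stage(lambda it: itertools.islice(it, n))

first = itertools.count(1) >> smap(lambda n: n * n) >> stake(5) >> list
# first == [1, 4, 9, 16, 25]; count(1) is infinite, but only
# five items are ever computed.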

Homepage:
http://trinhhaianh.com/stream.py/
 
Lie Ryan

Anh said:
[...] To give a simple example of string processing, here we grep the
lines matching some regex, strip them, and accumulate them into a list:

import re
from stream import filter, mapmethod

result = (open('log').xreadlines()
          >> filter(re.compile('[Pp]attern').search)
          >> mapmethod('strip')
          >> list)

Does it have any advantage over generator expressions?

import re
mm = mapmethod('strip') # Is mapmethod something in the stdlib?
pat = re.compile('[Pp]attern')
result = (mm(line) for line in open('log') if pat.search(line))

which is also lazy.
 
Anh Hai Trinh

Lie Ryan said:
Does it have any advantage over generator expressions?

import re
mm = mapmethod('strip')  # Is mapmethod something in the stdlib?
pat = re.compile('[Pp]attern')
result = (mm(line) for line in open('log') if pat.search(line))

which is also lazy.

A generator expression accomplishes the same thing; the main advantage,
I think, is better notation. You can clearly see each of the processing
steps, which is usually lost in the syntax of a generator expression.
And this is the simplest example here.

And strip is a method of string, so you would say `(line.strip() for
line in open('log') if pat.search(line))`.
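
The difference shows once a pipeline grows past one or two steps: with
generator expressions the stages nest or need intermediate names, while
the '>>' chain reads in processing order. A rough comparison, reusing
the hypothetical stages from the sketch above:

# Generator expressions: stages nest, or need temporaries.
stripped = (line.strip() for line in open('log') if pat.search(line))
result = list(line for line in stripped if line)

# Pipeline: each stage appears once, left to right.
result = (open('log')
          >> sfilter(pat.search)
          >> smap(str.strip)
          >> sfilter(bool)
          >> list)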
 
