SoC project: Python-Haskell bridge - request for feedback

  • Thread starter Michał Janeczek

Michał Janeczek

Hi,

I am a student interested in participating in this year's SoC.
At http://tsk.ch.uj.edu.pl/~janeczek/socapp.html (and also below
in this email) you can find a draft of my project proposal.

I'd like to ask you to comment on it, especially the deliverables
part. Are you interested in such a project, and if yes, what features
would be most important to you? Is anything missing, or should
something get more priority or attention?

Regards,
Michal


Python-Haskell bridge
=====================

Description
-----------

This project will seek to provide a comprehensive, high-level (and thus
easy-to-use) binding between the Haskell and Python programming languages.
This will allow libraries written in either language to be used from the other.


Benefits for Python
-------------------

* Robust, high assurance components

It might be beneficial to implement safety-critical components
in a strongly, statically typed language, using Python to glue
them together. Cryptography and authentication modules are
examples.

* Performance improvements for speed-critical code

Haskell compiled to native code is typically an order of magnitude
faster than Python. Beyond that, advanced language features
(such as a multicore parallel runtime, very lightweight threads
and software transactional memory) can further improve
performance. Haskell could become a safe, high-level alternative
to the commonly used C extensions.

* Access to sophisticated libraries

While its set of libraries is not as comprehensive as that of
Python, Haskell can still offer some well-tested, efficient
libraries. Examples include rich parser combinator libraries
(like Parsec) and persistent, functional data structures.
The QuickCheck testing library could also be used to drive analysis
of Python code.


Benefits for Haskell
--------------------

The project would benefit Haskell by providing it with access to
Python's impressive suite of libraries. It also has the potential to help
Haskell adoption by mitigating the risk of using Haskell in a project.


Deliverables
------------

* A low level library to access Python objects from Haskell

* A set of low level functions to convert built-in data types
between Haskell and Python (strings, numbers, lists,
dictionaries, functions, generators etc.)

* A higher level library allowing easy (transparent) access to
Python functions from Haskell, and wrapping Haskell functions
for Python to access

* A way to easily derive conversion functions for user-defined
data types/objects. Functions derived in such a way should
work well with both low level and high level access libraries

* Documentation and a set of examples for all of the above
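
As a rough illustration of the derived-conversion deliverable, here is a
minimal Python-side sketch. It assumes values cross the bridge as plain
tagged dictionaries (one tag per constructor, similar to a Haskell record).
The names REGISTRY, bridged, to_wire and from_wire are hypothetical and do
not come from any existing library, and the Haskell side is not shown.

```python
from dataclasses import dataclass, fields, is_dataclass

REGISTRY = {}  # constructor name -> dataclass; hypothetical, for illustration


def bridged(cls):
    """Class decorator: register a dataclass so it can be rebuilt by name."""
    REGISTRY[cls.__name__] = cls
    return cls


def to_wire(value):
    """Convert a value to plain lists/dicts/scalars, tagging dataclasses."""
    if is_dataclass(value) and not isinstance(value, type):
        return {
            "tag": type(value).__name__,
            "fields": {f.name: to_wire(getattr(value, f.name)) for f in fields(value)},
        }
    if isinstance(value, (list, tuple)):
        return [to_wire(item) for item in value]
    return value  # numbers, strings, None pass through unchanged


def from_wire(data):
    """Inverse of to_wire for registered dataclasses and lists."""
    if isinstance(data, dict) and "tag" in data:
        cls = REGISTRY[data["tag"]]
        return cls(**{name: from_wire(item) for name, item in data["fields"].items()})
    if isinstance(data, list):
        return [from_wire(item) for item in data]
    return data


@bridged
@dataclass
class Point:
    x: float
    y: float


@bridged
@dataclass
class Segment:
    start: Point
    end: Point


segment = Segment(Point(0.0, 0.0), Point(3.0, 4.0))
wire = to_wire(segment)      # nested tagged dictionaries
restored = from_wire(wire)   # rebuilt Segment, equal to the original
```

Tuples are flattened to lists in this sketch, so they do not round-trip
exactly; a real bridge would need a richer wire representation.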


Optional goals
--------------

These are of lower priority, and might require a fair amount of work.
I would like to implement most of them, if technically feasible. If
they don't fit into the Summer of Code timeframe, I plan to finish
them afterwards.

* A Python module for accessing functions from Haskell modules without
manual wrapping (such wrapping should already be easy thanks to the
high-level library). It would be accomplished through the GHC API, if
the API allows it. The Haskell side of the high-level library will already
support such a mode of operation.

* Extend and refactor the code to make it support other similar
dynamic languages. This is a lot of work, and definitely outside
the scope of a Summer of Code project, but some design decisions
may be influenced by this.


Related projects
----------------

They (and quite possibly some others) will be referenced for ideas.

* MissingPy

Provides a one-way, low-level binding to Python. Some of the code
can possibly be reused, especially the data conversion functions. It
doesn't seem to export all features; in particular, function
callbacks are not supported.

* HaXR

XML-RPC binding for Haskell. It could provide inspiration for
reconciling Haskell and Python type systems, resulting in a
friendly interface

* rocaml

A binding between Ruby and OCaml
 

Paul Rubin

A few thoughts. The envisioned Python-Haskell bridge would have two
directions: 1) calling Haskell code from Python; 2) calling Python
code from Haskell. The proposal spends more space on #1 but I think
#1 is both more difficult and less interesting. By "Haskell" I
presume you mean GHC. I think that the GHC runtime doesn't embed very
well, despite the example on the Python wiki
(http://wiki.python.org/moin/PythonVsHaskell near the bottom). This
is especially so if you want to use the concurrency stuff. The GHC
runtime wants to trap the timer interrupt and do select based i/o
among other things. And I'm not sure that wanting to call large
Haskell components under a Python top-level is that compelling: why
not write the top level in Haskell too? The idea of making the
critical components statically typed for safety is less convincing if
the top level is untyped.

There is something to be said for porting some functional data
structures to Python, but I think that's mostly limited to the simpler
ones like Data.Map (which I've wanted several times). And I think
this porting is most easily done by simply reimplementing those
structures in a Python-friendly style using Python's C API. The type
signatures (among other things) on the Haskell libraries for this
stuff tend to be a little too weird for Python; for example,
Data.Map.lookup runs in an arbitrary monad which controls the error
handling for a missing key. The Python version should be like a dict,
where you give it a key and a default value to return if the key is
not found. Plus, do you really want Python pointers into Haskell data
structures to be enrolled with both systems' garbage collectors?
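
To illustrate the dict-like interface described above, here is a minimal
sketch of a persistent map in pure Python. It is an unbalanced binary search
tree with path copying, not a port of Data.Map (which is balanced), and the
function names insert and get are made up for this example.

```python
from typing import Any, Optional, Tuple

# A node is (key, value, left, right); None is the empty map.
Node = Optional[Tuple[Any, Any, Any, Any]]


def insert(node: Node, key, value) -> Node:
    """Return a new tree with key bound to value; the input tree is unchanged."""
    if node is None:
        return (key, value, None, None)
    k, v, left, right = node
    if key < k:
        return (k, v, insert(left, key, value), right)
    if key > k:
        return (k, v, left, insert(right, key, value))
    return (k, value, left, right)  # replace the binding for an equal key


def get(node: Node, key, default=None):
    """Dict-style lookup: return the bound value, or default if key is absent."""
    while node is not None:
        k, v, left, right = node
        if key == k:
            return v
        node = left if key < k else right
    return default


m1 = insert(insert(None, "b", 2), "a", 1)
m2 = insert(m1, "b", 20)  # m1 still sees the old binding for "b"
```

Only the nodes on the path to the changed key are copied, so m1 and m2 share
the untouched subtree holding "a".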

(Actually (sidetrack that I just thought of), a Cyclone API would be
pretty useful for writing safe Python extensions. Cyclone is a
type-safe C dialect, see cyclone.thelanguage.org ).

The Haskell to Python direction sounds more useful, given Haskell's
weirdness and difficulty. Python is easy to learn and well-packaged
for embedding, so it's a natural extension language for Haskell
applications. If you wrote a database in Haskell, you could let
people write stored procedures in Python if they didn't want to deal
with Haskell's learning curve. Haskell would call Python through its
"safe" FFI (which runs the extension in a separate OS thread) and not
have to worry much about the Python side doing IO or whatever. Of
course this would also let Python call back into the Haskell system,
perhaps passing Python values as Data.Dynamic, or else using something
like COM interface specifications.

Anyway I'm babbling now, I may think about this more later.
 

Michał Janeczek

Thanks for finding time to reply!

> A few thoughts. The envisioned Python-Haskell bridge would have two
> directions: 1) calling Haskell code from Python; 2) calling Python
> code from Haskell. The proposal spends more space on #1 but I think
> #1 is both more difficult and less interesting. By "Haskell" I
> presume you mean GHC. I think that the GHC runtime doesn't embed very
> well, despite the example on the Python wiki
> (http://wiki.python.org/moin/PythonVsHaskell near the bottom). This
> is especially so if you want to use the concurrency stuff. The GHC
> runtime wants to trap the timer interrupt and do select based i/o
> among other things. And I'm not sure that wanting to call large
> Haskell components under a Python top-level is that compelling: why
> not write the top level in Haskell too? The idea of making the
> critical components statically typed for safety is less convincing if
> the top level is untyped.

I wasn't aware of the runtime issues; these are things to watch out
for. However, the type of embedding that I imagined would be mostly
pure functions, since Python can deal with IO rather well. It'd also
be applicable in situations where we want to add some functionality
to an existing, large Python project, where a complete rewrite
would be infeasible.

> There is something to be said for porting some functional data
> structures to Python, but I think that's mostly limited to the simpler
> ones like Data.Map (which I've wanted several times). And I think
> this porting is most easily done by simply reimplementing those
> structures in a Python-friendly style using Python's C API. The type
> signatures (among other things) on the Haskell libraries for this
> stuff tend to be a little too weird for Python; for example,
> Data.Map.lookup runs in an arbitrary monad which controls the error
> handling for a missing key. The Python version should be like a dict,
> where you give it a key and a default value to return if the key is
> not found. Plus, do you really want Python pointers into Haskell data
> structures to be enrolled with both systems' garbage collectors?

I didn't mention this in this first draft, but I don't know (yet)
how to support those "fancy" types. The plan for now is to export
monomorphic functions only. As for GC, I think having the two systems
involved is unavoidable if I want to have first class functions on
both sides.

> The Haskell to Python direction sounds more useful, given Haskell's
> weirdness and difficulty. Python is easy to learn and well-packaged
> for embedding, so it's a natural extension language for Haskell
> applications. If you wrote a database in Haskell, you could let
> people write stored procedures in Python if they didn't want to deal
> with Haskell's learning curve. Haskell would call Python through its
> "safe" FFI (which runs the extension in a separate OS thread) and not
> have to worry much about the Python side doing IO or whatever. Of
> course this would also let Python call back into the Haskell system,
> perhaps passing Python values as Data.Dynamic, or else using something
> like COM interface specifications.

That is one of the use cases I have missed in the first draft.
Thanks for the idea!

> Anyway I'm babbling now, I may think about this more later.

By all means, please do go on :) This has helped a lot :)

Regards,
Michal
 

malkarouri

> A few thoughts. The envisioned Python-Haskell bridge would have two
> directions: 1) calling Haskell code from Python; 2) calling Python
> code from Haskell. The proposal spends more space on #1 but I think
> #1 is both more difficult and less interesting.

FWIW, I find #1 more interesting for me personally.
As a monad-challenged person, I find it much easier to develop
components using pure functional programming in a language like
Haskell and do all my I/O in Python than having it the other way
round.
Of course, if it is more difficult then I wouldn't expect it from a
SoC project, but that's that.

Muhammad Alkarouri
 

Paul Rubin

malkarouri said:
> FWIW, I find #1 more interesting for me personally.
> As a monad-challenged person, I find it much easier to develop
> components using pure functional programming in a language like
> Haskell and do all my I/O in Python than having it the other way
> round.

Haskell i/o is not that complicated, and monads are used in pure
computation as well as for i/o. Without getting technical, monads are
the bicycles that Haskell uses to get values from one place to another
in the right order. They are scary at first, but once you get a
little practice riding them, the understanding stays with you and
makes perfect sense.
 

Paul Rubin

Michał Janeczek said:
> I wasn't aware of the runtime issues; these are things to watch out
> for. However, the type of embedding that I imagined would be mostly
> pure functions, since Python can deal with IO rather well. It'd also
> be applicable in situations where we want to add some functionality
> to an existing, large Python project, where a complete rewrite
> would be infeasible.

Of course I can't say what functions someone else would want to use,
but I'm not seeing very convincing applications of this myself. There
aren't that many prewritten Haskell libraries (especially monomorphic
ones) that I could see using that way. And if I'm skilled enough with
Haskell to write the functions myself, I'd probably rather embed
Python in a Haskell app than the other way around. Haskell's i/o has
gotten a lot better recently (Data.ByteString) though there is
important stuff still in progress (bytestring unicode). For other
pure functions (crypto, math, etc.) there are generally libraries
written in C already interfaced to Python (numarray, etc.), maybe
through Swig. The case for Haskell isn't that compelling.

> I didn't mention this in this first draft, but I don't know (yet)
> how to support those "fancy" types. The plan for now is to export
> monomorphic functions only.

This probably loses most of the interesting stuff: parser combinators,
functional data structures like zippers, etc. Unless you mean to
use templates to make specialized versions?

> As for GC, I think having the two systems involved is unavoidable if
> I want to have first class functions on both sides.

This just seems worse and worse the more I think about it. Remember
that GHC uses a copying gc so there is no finalization and therefore
no way to notify python that a reference has been freed. And you'd
probably have to put Haskell pointers into Python's heap objects so
that the Haskell gc wouldn't have to scan the whole Python heap.
Also, any low level GHC gc stuff (not sure if there would be any)
might have to be redone for GHC 6.10(?) which is getting a new
(parallel) gc. Maybe I'm not thinking of this the right way though, I
haven't looked at the low level ghc code.

Keep in mind also that Python style tends to not use complex data
structures and fancy sharing of Haskell structures may not be in the
Python style. Python uses extensible lists and mutable dictionaries
for just about everything, relying on the speed of the underlying C
functions to do list operations very fast (C-coded O(n) operation
faster than interpreted O(log n) operation for realistic n). So maybe
this type of sharing won't be so useful.
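
The claim about C-coded linear operations can be tried out with a small
timing sketch. It compares the built-in, C-coded membership test with a
binary search written in pure Python. The list size and iteration count are
arbitrary choices for this example, and the timings depend on the machine
and the Python version, so no particular result is asserted here.

```python
import timeit


def binary_search(items, target):
    """Pure-Python O(log n) membership test on a sorted list."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(items) and items[lo] == target


items = list(range(16))  # a small, sorted list; try larger sizes as well
target = 11

# Time the C-coded linear scan against the interpreted binary search.
linear = timeit.timeit(lambda: target in items, number=100_000)
logarithmic = timeit.timeit(lambda: binary_search(items, target), number=100_000)
print(f"C-coded 'in': {linear:.4f}s, Python binary search: {logarithmic:.4f}s")
```

Increasing the list size shows where, on a given machine, the interpreted
O(log n) search starts to win.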

It may be simplest to just marshal data structures across a message
passing interface rather than really try to share values between the
two systems. For fancy functional structures, from a Python
programmer's point of view, it is probably most useful to just pick a
few important ones and code them in C from scratch for direct use in
Python. Hedgehog Lisp (google for it) has a nice C implementation of
functional maps that could probably port easily to the Python C API,
and I've been sort of wanting to do that. It would be great if
you beat me to it.
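
A minimal sketch of the message-passing idea, seen from the Python side. The
wire format (a 4-byte length prefix followed by JSON) is an assumption made
for this example; the post does not name a format. The function names
send_message and recv_message are hypothetical, and an in-memory buffer
stands in for the pipe or socket that would connect the two runtimes.

```python
import io
import json
import struct


def send_message(stream, value) -> None:
    """Write value as a 4-byte big-endian length followed by UTF-8 JSON."""
    payload = json.dumps(value).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def recv_message(stream):
    """Read one length-prefixed JSON message and decode it to Python values."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a full header arrived")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream closed in the middle of a message")
    return json.loads(payload.decode("utf-8"))


# An in-memory buffer stands in for the pipe or socket to the other runtime.
channel = io.BytesIO()
send_message(channel, {"op": "lookup", "key": "spam", "default": None})
channel.seek(0)
request = recv_message(channel)
```

Because only copies of plain values cross the boundary, neither garbage
collector needs to know about the other's heap.
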

> By all means, please do go on :) This has helped a lot :)

One thing I highly recommend is that you join the #haskell channel on
irc.freenode.net. There are a lot of real experts there (I'm just
a newbie) who can advise you better than I can, and you can talk to them
in real time.
 
