Building an HPC data assimilation system using Python?


Matthew Francis

I have a prototype data assimilation code (an ionospheric nowcast/forecast model driven by GPS data) written in IDL (Interactive Data Language). IDL is a horrible language choice for scaling the application up to large datasets, as it is serial and slow (interpreted).

I am embarking on a project to convert this prototype into an operational parallel HPC code. In the past I've used C++ for this kind of project and am comfortable using MPI. On the other hand, I've recently started using Python and appreciate its flexibility and speed of development compared with C++. I have read that there is a trend to use Python as the high-level 'glue' for these kinds of large number-crunching projects, so it would seem appropriate to go down that path. There are a number of C++ and FORTRAN(!) libraries I'd need to incorporate that handle things such as the processing of raw GPS data and computing ionospheric models, so I'd need to be able to make the appropriate interfaces for these into Python.

If anyone uses Python in this way, I'd appreciate any tips, hints, things to be careful about and, in general, any war stories you can relate that you wish you'd heard before making some mistake.

Here are the things I have investigated that it looks like I'd probably need to use:

* scipy/numpy/matplotlib
* Cython (or Pyrex?) for speeding up any bottlenecks that occur in Python code (as opposed to C++/FORTRAN libraries)
* MPI for Python (mpi4py). Does this play nice with Cython?
* Something to interface Python with other language libraries. ctypes, SWIG, Boost? Which would be best for this application?
* Profiling. profile/cProfile are straightforward to use, but how do they cope with a parallel (mpi4py) code?
* If a C++ library call has its own MPI calls, does that work smoothly with mpi4py operating in the Python part of the code?
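
On the profiling point, one common approach is to run cProfile independently in each MPI process and write one stats file per rank, so the ranks can be compared afterwards. The following is only a minimal sketch under stated assumptions: `get_rank`, `profile_per_rank` and `assimilate_step` are hypothetical names made up for illustration, the workload is a placeholder, and the code falls back to rank 0 when mpi4py is not installed.

```python
import cProfile
import io
import os
import pstats


def get_rank():
    """Return this process's MPI rank, or 0 when mpi4py is not installed."""
    try:
        from mpi4py import MPI  # optional dependency (assumption: may be absent)
    except ImportError:
        return 0
    return MPI.COMM_WORLD.Get_rank()


def profile_per_rank(func, *args, out_dir=".", **kwargs):
    """Run func under cProfile and dump the stats to a rank-specific file."""
    rank = get_rank()
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    # One file per rank, so processes never write to the same path.
    path = os.path.join(out_dir, f"profile_rank{rank:04d}.prof")
    profiler.dump_stats(path)
    return result, path


def assimilate_step(n):
    """Hypothetical stand-in for one unit of number crunching."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, stats_path = profile_per_rank(assimilate_step, 100_000)
    # Print the five most expensive entries by cumulative time for this rank.
    stream = io.StringIO()
    stats = pstats.Stats(stats_path, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    print(stream.getvalue())
```

Launched under an MPI launcher (for example `mpiexec -n 4 python script.py`, where `script.py` is a placeholder name), each process would write its own `.prof` file. Note that cProfile only sees Python-level calls, so time spent inside a C++/FORTRAN library shows up as a single call rather than being broken down further.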

Sorry if some of this is a little basic; I'm trying to get up to speed on this as quickly as I can.

Thanks in advance!



Carlos Nepomuceno

Hi Matthew! I'm on a similar quest!

I'm still learning the basics of Python so I may not be a good source of information.

I'm reading a lot about how to use Python for the parallelization of code and data, and found BSP[1] to be very interesting and perhaps worth the time to learn! ;)


