Python in parallel for pattern discovery in genome data

BalyanM

Hi,

I am new to Python. I am using it on Red Hat Linux 9.
I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
with 4 CPUs for a pattern discovery/search program on biological
sequences (genomic sequence). I want to write the Python code so that it
uses all 4 CPUs. Do I also need any other libraries?
Please advise.

Thanks

Sincerely,

Manoj

--

****************************************************************
Manoj Balyan
Scientist- Bioinformatics
Centre for Cellular and Molecular Biology(CCMB)
Uppal Road,
Hyderabad-500007
Andhra Pradesh,INDIA
TEl:+91-040-27192772,27160222,27192777
FAX:+91-040-27160591,27160311
EMAIL:[email protected],
manoj_balyan@hotmail
WWW:http://www.ccmb.res.in
***************************************************************
If you weep for the setting sun,you will miss the stars:Tagore
***************************************************************
 
Stephan Diehl

BalyanM said:
Hi,

I am new to Python. I am using it on Red Hat Linux 9.
I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
with 4 CPUs for a pattern discovery/search program on biological
sequences (genomic sequence). I want to write the Python code so that it
uses all 4 CPUs. Do I also need any other libraries?
Please advise.

Thanks

Sincerely,

Manoj

A plain Python interpreter on its own won't help, because of the GIL (Global
Interpreter Lock), which lets only one thread run Python code at a time.
From your description, the following module might be something for you:
http://poshmodule.sourceforge.net/
It allows object sharing between different Python processes.
As I have never worked with it, I can't say whether it's any good.

Stephan
 
Andrew Dalke

BalyanM:
I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
with 4 CPUs for a pattern discovery/search program on biological
sequences (genomic sequence). I want to write the Python code so that it
uses all 4 CPUs.

*oomphh*

There are a lot of details buried in your lines.

It looks like you will be writing your own pattern matching code.
Why? There are plenty of tools for that already. A quick web
search finds http://genome.imb-jena.de/seqanal.html and many
of those tools are freely available.

Okay, suppose you do have the tool or library for it. Do you
want to do high throughput searches? Then you can just break
your N jobs into 4 batches of about N/4 jobs each, one per CPU.
Easiest way in Python is to run 4 Python programs, each with a
little server going (see the xmlrpc module for an example) and
have your code call them (see Aahz's excellent example of
master/slave programming using threads). Other options for the
communications are Twisted and Pyro.
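
To make the batching concrete, here is a minimal sketch. The name `split_jobs` is a hypothetical helper, not something from a library. It deals the jobs out round-robin, so the 4 batches differ in size by at most one job.

```python
def split_jobs(jobs, n_workers=4):
    """Deal jobs out round-robin into n_workers batches.

    Batch k gets jobs k, k + n_workers, k + 2 * n_workers, ...
    so batch sizes differ by at most one.
    """
    return [jobs[k::n_workers] for k in range(n_workers)]
```

Each batch would then be handed to one of the 4 Python programs. Round-robin is only one choice; if some jobs are known to take much longer than others, a shared work queue may balance the load better.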

You will not be able to do this with one Python process because
Python has what's called the "global interpreter lock" that
prevents core Python from effectively using multiple processors.
You can write a C extension which does the search and gives
up the lock, but you seem to want to do this in raw Python.

(The suggestion to look at POSH won't work - it has some
Intel-specific assembly instructions in the C extension, and
the E420R is a SPARC machine.)

Depending on the type of pattern search, you instead can assign
1/4 of the genome to each process, with overlap if needed. This
will speed up a single search, which is good for interactivity.
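
A minimal sketch of the split-with-overlap idea, assuming an exact-match search for a fixed string. It uses the standard-library multiprocessing module, which was added to Python well after this thread was written and sidesteps the GIL by using separate processes. The names `chunk_bounds`, `search_chunk` and `find_all` are hypothetical, and a real pattern search would replace the body of `search_chunk`.

```python
import re
from multiprocessing import Pool


def chunk_bounds(length, n_chunks, overlap):
    """Return (start, stop) slices covering range(length).

    Each chunk is extended by `overlap` characters so that a hit
    straddling a chunk boundary is still seen by one worker.
    """
    size = max(1, -(-length // n_chunks))  # ceiling division
    return [(start, min(start + size + overlap, length))
            for start in range(0, length, size)]


def search_chunk(task):
    """Return genome-wide start positions of all hits in one chunk."""
    text, offset, pattern = task
    # A zero-width lookahead makes finditer report overlapping hits too.
    regex = re.compile("(?=" + re.escape(pattern) + ")")
    return [offset + m.start() for m in regex.finditer(text)]


def find_all(genome, pattern, processes=4):
    """Search `genome` for `pattern`, one chunk per worker process."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    # With an overlap of len(pattern) - 1, every hit starts in exactly
    # one chunk's own region, so no hit is lost or reported twice.
    overlap = len(pattern) - 1
    tasks = [(genome[a:b], a, pattern)
             for a, b in chunk_bounds(len(genome), processes, overlap)]
    if processes == 1:
        parts = map(search_chunk, tasks)  # in-process, handy for debugging
    else:
        with Pool(processes) as pool:
            parts = pool.map(search_chunk, tasks)
    return sorted(pos for part in parts for pos in part)
```

In a script, call `find_all(genome, pattern, processes=4)` from under an `if __name__ == "__main__":` guard, since some platforms re-import the main module in each worker. For patterns without a fixed length (most regular expressions, for example), the overlap has to be chosen from the longest hit you expect.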

These work for a single "user" of the code. Might you have
many people trying to do pattern searches? If so, you may
need some way to throttle how many searches are done per
machine. For in-house use this likely isn't a problem - besides,
you should get your code working first.

There are other approaches. You could use shared memory or
CORBA for the communications, or PVM or MPI. Still, given
your experience, you should:
1) get your algorithm working on one machine
2) get it working as a client/server using XML-RPC (see the
SimpleXMLRPCServer and xmlrpclib modules),
3) get your client to work with multiple servers,
using multiple threads in the client
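
Steps 2 and 3 can be sketched roughly as follows. This is written for Python 3, where SimpleXMLRPCServer and xmlrpclib live on as xmlrpc.server and xmlrpc.client. The function names (`count_hits`, `start_server`, `search_on_servers`) are hypothetical, and `count_hits` is only a stand-in for a real search routine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_hits(sequence, pattern):
    """Stand-in search: count non-overlapping occurrences of pattern."""
    return sequence.count(pattern)


def start_server(port=0):
    """Start one XML-RPC search server in a background thread.

    Port 0 asks the OS for any free port. In real use each server
    would be its own Python process, one per CPU.
    """
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    server.register_function(count_hits)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def search_on_servers(urls, sequences, pattern):
    """Send sequence i to server i % len(urls), one client thread per server."""
    results = [None] * len(sequences)

    def worker(k):
        proxy = ServerProxy(urls[k])  # one proxy per thread
        for i in range(k, len(sequences), len(urls)):
            results[i] = proxy.count_hits(sequences[i], pattern)

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(len(urls))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The client threads spend their time waiting on the network, so the GIL does not get in their way; the CPU work happens in the servers. The servers here run as threads only to keep the sketch self-contained. To actually use 4 CPUs, start 4 separate server processes and pass their URLs to `search_on_servers`.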

(It's a bit of my experience too - I really should try Pyro
for this sort of work. Well, I need a break so maybe I'll
try it out tonight ;)

There are a lot of skills to learn before it all works, so don't
get too discouraged too quickly.

Andrew