python in parallel for pattern discovery in genome data

Discussion in 'Python' started by BalyanM, Jul 30, 2003.

  1. BalyanM

    BalyanM Guest

    Hi,

    I am new to Python. I am using it on Red Hat Linux 9.
    I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
    with 4 CPUs for a pattern discovery/search program on biological
    sequences (genomic sequence). I want to write the Python code so that it
    uses all 4 CPUs. Also, do I need any other libraries?
    Kindly advise.

    Thanks

    Sincerely,

    Manoj

    --

    ****************************************************************
    Manoj Balyan
    Scientist- Bioinformatics
    Centre for Cellular and Molecular Biology(CCMB)
    Uppal Road,
    Hyderabad-500007
    Andhra Pradesh,INDIA
    TEL:+91-040-27192772,27160222,27192777
    FAX:+91-040-27160591,27160311
    EMAIL:,
    manoj_balyan@hotmail
    WWW:http://www.ccmb.res.in
    ***************************************************************
    If you weep for the setting sun,you will miss the stars:Tagore
    ***************************************************************
     
    BalyanM, Jul 30, 2003
    #1

  2. BalyanM wrote:

    > Hi,
    >
    > I am new to Python. I am using it on Red Hat Linux 9.
    > I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
    > with 4 CPUs for a pattern discovery/search program on biological
    > sequences (genomic sequence). I want to write the Python code so that it
    > uses all 4 CPUs. Also, do I need any other libraries?
    > Kindly advise.
    >
    > Thanks
    >
    > Sincerely,
    >
    > Manoj
    >


    A standard Python interpreter alone won't help, because the GIL (Global
    Interpreter Lock) keeps a single process from running Python code on
    more than one CPU at a time.
    From your description, the following module might be useful to you:
    http://poshmodule.sourceforge.net/
    It allows object sharing between different Python processes.
    I have never worked with it, so I can't say whether it's any good.

    Stephan
     
    Stephan Diehl, Jul 30, 2003
    #2

  3. Andrew Dalke

    Andrew Dalke Guest

    BalyanM:
    > I would like to run Python on a Sun machine (Sun E420R, OS = Solaris)
    > with 4 CPUs for a pattern discovery/search program on biological
    > sequences (genomic sequence). I want to write the Python code so that it
    > uses all 4 CPUs.


    *oomphh*

    There are a lot of details buried in your lines.

    It looks like you will be writing your own pattern matching code.
    Why? There are plenty of tools for that already. A quick web
    search finds http://genome.imb-jena.de/seqanal.html and many
    of those tools are freely available.

    Okay, suppose you do have the tool or library for it. Do you
    want to do high-throughput searches? Then you can just break
    your N jobs into 4 batches of about N/4 jobs each, one per CPU.
    The easiest way in Python is to run 4 Python programs, each with a
    little server going (see the XML-RPC modules for an example), and have
    your code call them (see Aahz's excellent example of master/slave
    programming using threads). Other options for the communications
    are Twisted and Pyro.
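
    As a rough sketch of the "break your jobs into batches" step, the
    snippet below deals a list of jobs round-robin into one batch per
    worker. The function name split_into_batches and the sample motif
    list are made up for illustration; they are not from any library.

```python
def split_into_batches(jobs, n_workers=4):
    """Deal jobs round-robin into n_workers batches of near-equal size."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # jobs[i::n_workers] takes every n_workers-th job, starting at index i.
    return [jobs[i::n_workers] for i in range(n_workers)]


# Hypothetical job list: each job is one motif to search for.
motifs = ["TATA", "GATTACA", "CCGG", "AATT", "GGCC", "ACGT"]

for worker_id, batch in enumerate(split_into_batches(motifs, 4)):
    print(worker_id, batch)
```

    Each batch would then be handed to one of the 4 server processes.
    Round-robin dealing is only one choice; if job run times vary a lot,
    handing out jobs one at a time as workers become free balances better.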

    You will not be able to do this with one Python process because
    Python has what's called the "global interpreter lock", which
    prevents core Python from effectively using multiple processors.
    You can write a C extension which does the search and gives
    up the lock, but you seem to want to do this in raw Python.

    (The suggestion to look at POSH won't work - it has some
    Intel-specific assembly instructions in the C extension.)

    Depending on the type of pattern search, you can instead assign
    1/4 of the genome to each process, with overlap if needed. This
    will speed up a single search, which is good for interactivity.
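
    To make the "with overlap" part concrete, here is a minimal sketch
    that assumes the pattern is a fixed-length motif, so an overlap of
    len(motif) - 1 characters is enough to catch matches that straddle a
    chunk boundary. The helper names (chunk_bounds, find_in_chunk,
    search) are invented for this example, and the chunks are searched
    one after another here; in practice each chunk would go to its own
    process.

```python
def chunk_bounds(seq_len, n_chunks, overlap):
    """Return (start, end) pairs; each chunk runs `overlap` characters
    into the next one so boundary-spanning matches are not lost."""
    size = max(1, -(-seq_len // n_chunks))  # ceiling division
    return [(start, min(start + size + overlap, seq_len))
            for start in range(0, seq_len, size)]


def find_in_chunk(seq, start, end, motif):
    """Match start positions, in whole-sequence coordinates, within seq[start:end]."""
    chunk = seq[start:end]
    hits = []
    pos = chunk.find(motif)
    while pos != -1:
        hits.append(start + pos)
        pos = chunk.find(motif, pos + 1)  # step by 1 to allow overlapping matches
    return hits


def search(seq, motif, n_chunks=4):
    """Search every chunk and merge the results."""
    hits = set()  # a set drops matches reported twice from overlap regions
    for start, end in chunk_bounds(len(seq), n_chunks, len(motif) - 1):
        hits.update(find_in_chunk(seq, start, end, motif))
    return sorted(hits)


print(search("ACGTACGTACGT", "GTAC"))
```

    For variable-length patterns (regular expressions with repeats, for
    example) the overlap would have to be at least the longest possible
    match minus one, which is why this depends on the type of search.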

    These work for a single "user" of the code. Might you have
    many people trying to do pattern searches? If so, you may
    need some way to throttle how many searches are done per
    machine. For in-house use this likely isn't a problem - besides,
    you should get your code working first.

    There are other approaches. You could use shared memory or
    CORBA for the communications, or PVM or MPI. Still, given
    your experience, you should:
    1) get your algorithm working on one machine,
    2) get it working as a client/server using XML-RPC (see the
       SimpleXMLRPCServer and xmlrpclib modules),
    3) get your client to work with multiple servers,
       using multiple threads in the client.
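
    Steps 2 and 3 might look roughly like the sketch below. It uses the
    Python 3 module names: SimpleXMLRPCServer now lives in xmlrpc.server
    and xmlrpclib became xmlrpc.client. The count_motif function is a
    stand-in for the real search, and start_server and search_on_servers
    are invented helper names. To keep the example self-contained the
    servers run as threads inside one process, which does NOT get around
    the global interpreter lock; for real use, each server would be
    started as its own Python process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_motif(sequence, motif):
    """Stand-in search: count (possibly overlapping) occurrences of motif."""
    count, pos = 0, sequence.find(motif)
    while pos != -1:
        count += 1
        pos = sequence.find(motif, pos + 1)
    return count


def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread (demo only).
    Port 0 asks the OS for any free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(count_motif)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def search_on_servers(urls, sequences, motif):
    """Send each sequence to a server (round-robin), one client thread per server."""
    def call(url, sequence):
        # A fresh proxy per call, so no proxy is shared between threads.
        with ServerProxy(url) as proxy:
            return proxy.count_motif(sequence, motif)

    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = [pool.submit(call, urls[i % len(urls)], seq)
                   for i, seq in enumerate(sequences)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    servers = [start_server() for _ in range(4)]
    urls = ["http://127.0.0.1:%d" % s.server_address[1] for s in servers]
    print(search_on_servers(urls, ["ACGTACGT", "TTTT", "GTGTGT"], "GT"))
    for s in servers:
        s.shutdown()
        s.server_close()
```

    The client threads spend their time waiting on the network, so the
    lock is not a bottleneck on the client side; the CPU-heavy work
    happens in the servers.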

    (It's a bit of my experience too - I really should try Pyro
    for this sort of work. Well, I need a break so maybe I'll
    try it out tonight ;)

    There are a lot of skills to learn before it all works, so don't
    get too discouraged too quickly.

    Andrew
     
    Andrew Dalke, Jul 31, 2003
    #3
