Sun Grid Engine / NFS and Python shell execution question

Discussion in 'Python' started by J.B. Brown, Jul 22, 2010.

  1. J.B. Brown

    J.B. Brown Guest

    Hello everyone, and thank you for taking the time to read this.

    For quite some time, I have had a problem using Python's shell
    execution facilities in combination with a cluster computing
    environment (such as Sun Grid Engine (SGE)).
    In particular, I wish to repeatedly execute a number of commands in
    sub-shells or pipes within a single function, and each execution
    depends on the result of the previous one, so simply writing a
    brute-force script file and executing its commands is not an option
    for me.

    To isolate and exemplify my problem, I have created three files:
    (1) one which exemplifies the spirit of the code I wish to execute in
    Python;
    (2) one which serves as the SGE execution script file and actually
    calls python to execute the code in (1);
    (3) a simple shell script which submits (2) enough times to fill all
    processors on my computing cluster and leave an additional number of
    jobs in the queue.

    Here is the spirit of the experiment/problem:
    generateTest.py:
    ----------------------------------------------
    import os

    # Constants
    numParallelJobs = 100
    testCommand = "continue"  # alternative: "os.popen( \"clear\" )"
    loopSize = "1000"

    # First, write file with test script.
    # Each nested loop is indented one level deeper in the generated file.
    pythonScript = open( "testScript.py", "w" )
    pythonScript.write(
    """
    import os
    for i in range( 0, """ + loopSize + """ ):
        for j in range( 0, """ + loopSize + """ ):
            for k in range( 0, """ + loopSize + """ ):
                for l in range( 0, """ + loopSize + """ ):
                    """ + testCommand + """
    """ )
    pythonScript.close()

    # Second, write SGE script file to execute the Python script.
    sgeScript = open( "testScript.sge", "w" )
    sgeScript.write(
    """
    #$ -cwd
    #$ -N pythonTest
    #$ -e /export/home/jbbrown/errorLog
    #$ -o /export/home/jbbrown/outputLog
    python testScript.py
    """ )
    sgeScript.close()

    # Finally, write script to run SGE script a specified number of times.
    launchScript = open( "testScript.sh", "w" )
    for i in range( 0, numParallelJobs ):
        launchScript.write( "qsub testScript.sge" + os.linesep )
    launchScript.close()

    ----------------------------------------------

    Now, let's assume that I have about 50 processors available across 8
    compute nodes, with one NFS-mounted disk.
    If I run the code as above, so that the jobs simply execute Python
    "continue" statements and do nothing else, the cluster head node
    reports no serious NFS daemon load.

    However, if I change the code to use the os.popen() call shown as a
    comment above, or use os.system(),
    the NFS daemon load on my system skyrockets within seconds of
    distributing the jobs to the compute nodes -- even though I'm doing
    nothing but executing the clear screen command, which technically
    doesn't pipe any output to the location for logging stdout.
    Even if I change the SGE script file to explicitly redirect standard
    output and error to /dev/null, I still have the same problem.

    I believe the source of this problem is that os.popen() or os.system()
    calls spawn subshells which then reference my shell resource files
    (.zshrc, .cshrc, .bashrc, etc.).
    But I don't see an alternative to os.popen{234} or os.system().
    os.exec*() cannot solve my problem, because it transfers execution to
    that program and stops executing the script which called os.exec*().
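
The first half of this hypothesis, that the command string passes through a shell, can be checked with a small experiment. This is only a sketch, assuming a POSIX system; the variable name `shell_name` is made up for illustration, and the check says nothing about which resource files that shell reads.

```python
import os

# "$0" is shell syntax: it expands to the name of the running shell.
# If os.popen() executed "echo" directly, the literal text "$0" would
# come back instead of a shell name.
with os.popen("echo $0") as pipe:
    shell_name = pipe.read().strip()

print("command string was interpreted by: " + shell_name)
```

If a shell name is printed rather than the literal `$0`, a shell process was started for the command.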

    I would like to avoid rewriting a considerable amount of code (which
    performs cross validation by repeatedly executing in a subshell) in
    a shell script language filled with a large number of conditional
    statements. Does anyone know of a way to execute external programs in
    the middle of a script without referencing the shell resource file
    located on an NFS-mounted directory?
    I have read through the help(os) documentation repeatedly, but just
    can't find a solution.

    Even a small lead or thought would be greatly appreciated.

    With thanks from humid Kyoto,
    J.B. Brown
     
    J.B. Brown, Jul 22, 2010

  2. Neil Hodgson

    Neil Hodgson Guest

    J.B. Brown:

    > I believe the source of this problem is that os.popen() or os.system()
    > calls spawn subshells which then reference my shell resource files
    > (.zshrc, .cshrc, .bashrc, etc.).
    > But I don't see an alternative to os.popen{234} or os.system().
    > os.exec*() cannot solve my problem, because it transfers execution to
    > that program and stops executing the script which called os.exec*().


    Call fork then call exec from the new process. Search the web for
    "fork exec" to find examples in C.

    Neil
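
The fork-then-exec suggestion above can also be written in Python with the os module. The following is a minimal sketch, assuming a POSIX system; the helper name `run_without_shell` is hypothetical, and whether avoiding the shell actually removes the NFS load described in the question has not been verified here.

```python
import os
import sys


def run_without_shell(argv):
    """Run argv (a list such as ["clear"]) in a child process, with no shell.

    Returns the child's exit code, or a negative signal number if the
    child was killed by a signal. (Hypothetical helper, for illustration.)
    """
    pid = os.fork()
    if pid == 0:
        # Child process: replace this process image with the program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (for example, program not found). Exit at once
            # so the child never continues running the parent's script.
            os._exit(127)
    # Parent process: wait for the child, then carry on with the script.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


# Usage: run a tiny external program and read back its exit code.
exit_code = run_without_shell([sys.executable, "-c", "import sys; sys.exit(3)"])
print("child exit code: " + str(exit_code))
```

Because the exec happens only in the forked child, the calling script keeps running after the external program finishes. The standard library's subprocess module wraps the same pattern: `subprocess.call(["clear"])`, with its default `shell=False`, starts the program directly rather than through a shell.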
     
    Neil Hodgson, Jul 23, 2010
