parallel computing in perl?

Discussion in 'Perl Misc' started by Jie, Sep 13, 2007.

  1. Jie

    Jie Guest


    I have to randomly select a sample set and run it 1,000 times. The
    following code that I am using now works fine, except it is taking a
    long time.

    for (1..1000) {
        ## get the random sample set and then run a command
    }

    Now I am thinking of splitting this program into 10 processes to
    reduce the processing time. Of course, I could just change the first
    line to "for (1..100)" and run the same program in 10 different
    locations, but that is really tedious, and I believe there is a better
    way to split a big job into multiple small jobs. Since I am repeatedly
    running a random sample set, there is no need to worry about where one
    process ends and another begins.

    Your insight is appreciated!!

    Jie, Sep 13, 2007

  2. Jie

    J. Gleixner Guest

    Check CPAN for: Parallel::ForkManager
    J. Gleixner, Sep 13, 2007

  3. You might want to look at Parallel::ForkManager. Your code would
    look something like this:

    use Parallel::ForkManager;
    my $pm = Parallel::ForkManager->new(10);

    for my $data (1 .. 1000) {
        my $pid = $pm->start and next;

        ## get the random sample and process it

        $pm->finish;
    }
    $pm->wait_all_children;

    Peter Makholm, Sep 13, 2007
  4. Jie

    Jie Guest

    However, a problem with parallel computing is potential file sharing
    and overwriting.
    For example, previously my code would generate a temporary file, and
    the next loop iteration would overwrite it with a newly generated
    file. That was not a problem, because the overwriting only happened
    after each iteration finished. Now, if I open 10 parallel processes,
    will those 10 temporary files or 10 temporary hashes/arrays/variables
    get messed up?


    Jie, Sep 13, 2007
  5. Use File::Temp when dealing with temporary files. Then no loop will
    overwrite another's file, not even when running in parallel.
    Also, Perl variables aren't shared between forked processes.

    Peter Makholm, Sep 13, 2007
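    A minimal sketch of the File::Temp point above, using only core Perl
    (the file contents here are placeholders): every call to tempfile()
    returns a distinct, uniquely named file, which is exactly what keeps
    parallel workers from clobbering one another.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Two independent calls yield two distinct files -- in the forked
    # version, each child would make its own call and get its own file.
    my ($fh1, $name1) = tempfile(UNLINK => 1);
    my ($fh2, $name2) = tempfile(UNLINK => 1);

    print $fh1 "worker one\n";
    print $fh2 "worker two\n";

    print $name1 ne $name2 ? "distinct\n" : "collision\n";
    ```
    
    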
  6. Given the specs,

    perldoc -f fork
    perldoc perlipc

    Michele Dondi, Sep 13, 2007
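    For completeness, a plain-fork sketch of the same 10-way split, with
    no CPAN modules (the empty child body stands in for the real sampling
    command):

    ```perl
    use strict;
    use warnings;

    my @pids;
    for my $chunk (1 .. 10) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: this is where 100 of the 1000 iterations would run.
            exit 0;
        }
        push @pids, $pid;    # parent records each child's pid
    }
    waitpid($_, 0) for @pids;    # block until every child has exited
    print "all children finished\n";
    ```
    
    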
  7. Variables belong each to their own process. As far as the files are
    concerned, just create ten *different* ones. File::Temp may be useful.

    Michele Dondi, Sep 14, 2007
  8. Jie

    Jie Guest

    Hi, thank you very much for the replies.

    I think below would be the code to do it.
    I don't know if I used the right syntax to open a temporary file...
    Also, I don't know if i need to use "$pm->wait_all_children;" as
    suggested by Peter

    use File::Temp;
    use Parallel::ForkManager;

    my $pm = new Parallel::ForkManager(10);

    for $data (1 .. 1000) {
    my $pid = $pm->start and next;
    open TEMP_FILE, tempfile();
    ## Do something with this temp_file

    Jie, Sep 14, 2007
  9. Jie

    J. Gleixner Guest

    Really??.. that works??..

    If you want to know the right syntax, or what a method does,
    you may get the answer by actually reading the documentation.

    perldoc File::Temp
    perldoc Parallel::ForkManager
    J. Gleixner, Sep 14, 2007
  10. Usual recommendations:

    1. use lexical filehandles;
    2. use the three-arg form of open();
    3. check for success.

    open my $tempfile, '+>', tempfile or die badly;

    I changed the open mode because I suppose that you want to create the
    tempfile for writing and then read stuff back out of it. If you don't
    need the file to have a name, or to know its name, then you can avoid
    File::Temp and let perl handle it for you:

    open my $tempfile, '+>', undef or die badly;

    Michele Dondi, Sep 15, 2007
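    A runnable sketch of that last suggestion (the written string is just
    a placeholder): open a read-write handle on an unnamed temp file,
    write to it, rewind, and read back.

    ```perl
    use strict;
    use warnings;

    # '+>' with undef gives a read-write handle on an anonymous temp
    # file that disappears when the handle is closed.
    open my $tmp, '+>', undef or die "open failed: $!";
    print {$tmp} "sample result\n";

    seek $tmp, 0, 0;    # rewind before reading back
    my $line = <$tmp>;
    print $line;
    close $tmp;
    ```
    
    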
