optimize log parsing

Discussion in 'Perl Misc' started by it_says_BALLS_on_your forehead, Oct 4, 2005.

  1. i have roughly 400 logs to parse, with the size of each log ranging
    from 200 bytes to 263,000,000 bytes.

    i have 20 processes that i can allocate to log parsing.

    i have a hash that has the log file name as the key, and its size in
    bytes as the value. (i can sort the entries by size, or use an array
    if that's better suited to the solution.)

    $file{'/var/log/log1'} = 200;
    $file{'/var/log/log2'} = 210;
    ...
    $file{'/var/log/log400'} = 262000000;

    i would like to feed each process roughly the same amount of data (i.e.
    give each process a queue of log files to parse, such that the sum of
    the bytes of the log files is roughly the same for each process).

    so let's just say there are 20 buckets, if that will make things more
    comprehensible.

    i calculated the sum of the sizes of all the logs, and it was about 7.6
    GB. divided among 20 buckets, that's roughly 380 MB per bucket.

    one caveat is that i can't break up files. these logs are gzipped, and
    i can't split them while zipped. i tried unzipping the 262 MB one (that
    took about 2 min 40 sec) and then splitting it into files of 600,000
    lines each. after running (in Unix) split -l 600000 file small_file and
    waiting over 10 min, i canceled, and only 3 files had been written so
    far! it probably would have taken 20 minutes or more to finish, and
    nothing was even being parsed yet. unacceptable.

    basically this problem can be reduced to figuring out a way to fit 400
    discrete numbers into 20 buckets where the count of the numbers does
    not matter, but the sum of the numbers should be roughly equal.

    one way of accomplishing this is to sort descending and loop through
    the 400 files, putting the largest one in the first bucket, then going
    down the line, putting the largest one that will fit into the current
    bucket until you get to the end. granted, this is not perfect--perhaps
    if you had waited, 2 smaller ones would have brought you closer to your
    bucket-size goal. then you go to the next bucket and do the same
    thing. this is an O(n^2) solution though. i was hoping there was a
    more elegant, less brute-force method. perhaps the most perfect
    solution would be to take combinations and permutations of all 400
    logs and find which grouping results in the most equal 20 buckets,
    but that would take way too much computation time.

    using the method i outlined (not the combinations one, the one before
    that), i would have to go through the list of 400 logs for the 1st
    bucket, then let's say 395 logs for the 2nd bucket, etc. until the 20th
    bucket. this really isn't that much, and it's unlikely that the number
    of logs and processes will expand by orders of magnitude, so is it even
    worth it to expend the effort finding this algorithm? any help would be
    appreciated :)
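
    for illustration, a rough sketch of the greedy idea in Perl -- this
    variant just always drops the next-largest file into the currently
    lightest bucket instead of filling one bucket at a time, so it needs
    no inner search (the file sizes below are made up):

    use strict;
    use warnings;

    # stand-in for the real name => size hash
    my %file = map { ( "/var/log/log$_" => 200 + int rand 262_000_000 ) } 1 .. 400;
    my $nbuckets = 20;

    my @buckets = map { { size => 0, files => [] } } 1 .. $nbuckets;

    for my $name ( sort { $file{$b} <=> $file{$a} } keys %file ) {
        # put the next-largest file into the lightest bucket so far
        my ($lightest) = sort { $a->{size} <=> $b->{size} } @buckets;
        $lightest->{size} += $file{$name};
        push @{ $lightest->{files} }, $name;
    }

    printf "bucket %2d: %11d bytes, %3d files\n",
        $_ + 1, $buckets[$_]{size}, scalar @{ $buckets[$_]{files} }
        for 0 .. $#buckets;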
     
    it_says_BALLS_on_your forehead, Oct 4, 2005
    #1

  2. "it_says_BALLS_on_your forehead" <> wrote:
    > i have roughly 400 logs to parse, with the size of each log ranging
    > from 200 bytes to 263,000,000 bytes.
    >
    > i have 20 processes that i can allocate to log parsing.
    >
    > i have a hash that has the log file name as the key, and its size in
    > bytes as the value. ( i can sort the entries by size, or use an array
    > if that's better suited to the solution. )
    >
    > $file{'/var/log/log1'} => 200
    > $file{'/var/log/log2'} => 210
    > ...
    > $file{'/var/log/log400'} => 262000000
    >
    > i would like to feed each process roughly the same amount of data (i.e.
    > a certain queue of log files to parse, such that the sum of the bytes
    > of the log files for each process are roughly equivalent).
    >
    > so let's just say there are 20 buckets, if that will make things more
    > comprehensible.
    >
    > i calculate the sum of the sizes of all the logs, and it was about 7.6
    > GB. divided among 20 buckets, that's roughly 380 MB per bucket.


    I wouldn't bother with this 'bucket' stuff at all. Just do it on the fly.
    By addressing the files in the appropriate order (from most work to least
    work) you ensure nearly optimal processing. In fact, because there is no
    guarantee that the actual time for a file to be processed is exactly
    proportional to the file size, balancing on the fly is almost surely going
    to be better than some precomputed balancing based on the assumption that
    size = time.

    use Parallel::ForkManager;

    my $pm = new Parallel::ForkManager(20);

    foreach my $file (sort { $files{$b} <=> $files{$a} } keys %files) {
        my $pid = $pm->start and next;
        ## Process the $file
        $pm->finish; # Terminates the child process
    }
    $pm->wait_all_children;

    ....

    > basically this problem can be reduced to figuring out a way to fit 400
    > discrete numbers into 20 buckets where the count of the numbers does
    > not matter, but the sum of the numbers should be roughly equal.
    >
    > one way of accomplishing this is to sort descending, and loop through
    > the 400 files, putting the largest one in the first bucket, and going
    > down the line, putting the largest one that will fit into the current
    > bucket until you get to the end. granted, this is not perfect--perhaps
    > if you had waited, 2 smaller ones would have brought you closer to your
    > bucket-size goal. then you go to the next bucket, and do the same
    > thing. this is a O = n^2 solution though.


    How so? N ln N to sort the files, which only has to be done once. Finding
    the biggest file that fits the current bucket is ln N using a binary
    search into the sorted files. You have one N ln N operation, followed by
    N operations that are ln N each. That comes out to N ln N overall.

    But anyway, why try to preordain the order in which the buckets become
    empty or full? Start processing the 20 biggest files. When one of them
    finishes (regardless of which one it is), start the next file.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 4, 2005
    #2

  3. > I wouldn't bother with this 'bucket' stuff at all. Just do it on the fly.
    > By addressing the files in the appropriate order (from most work to least
    > work) you ensure nearly optimal processing. In fact, because there is no
    > guarantee that the actual time for a file to be processed is exactly
    > proportional to the file size,


    you're right here, although i had the idea that i could weight certain
    parse methods by multiplying each log size by a coefficient. the
    coefficient would be derived by dividing the average speed of a parse
    method by the average speed of the slowest parse method.
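
    e.g. something like this sketch (the method names and throughput
    numbers are made up; the point is just estimated work = bytes /
    throughput, so logs handled by slower parse methods weigh more):

    # hypothetical bytes/sec throughput per parse method
    my %throughput = ( regex_heavy => 1_000_000, simple_split => 5_000_000 );
    # which (made-up) method each log uses, and its size in bytes
    my %log_method = ( '/var/log/log1' => 'regex_heavy',
                       '/var/log/log2' => 'simple_split' );
    my %log_size   = ( '/var/log/log1' => 262_000_000,
                       '/var/log/log2' => 200 );

    # weighted "size": estimated seconds of work per log
    my %est_work = map { $_ => $log_size{$_} / $throughput{ $log_method{$_} } }
                   keys %log_size;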

    > balancing on the fly is almost surely going
    > to be better than some precomputed balancing based on the assumption that
    > size = time.
    >
    > $pm = new Parallel::ForkManager(20);
    >
    > foreach $file (sort {$files{$b}<=>$files{$a}} keys %files) {
    > my $pid = $pm->start and next;
    > ##Process the $file
    > $pm->finish; # Terminates the child process
    > }
    > $pm->wait_all_children;
    >
    > ...


    i admit i'm not too familiar with Threads/Forks (the only fork i use is
    the one called from system() ). also, i've read that Perl threading
    isn't too stable. i've looked on the web a little, but have not found
    anything that describes how to do all of the following:

    1) instantiate N processes (or threads)
    2) start each process parsing a log file
    3) the first process that is done looks at a shared or global queue and
    pulls the next log file from that and processes until the queue is
    empty.


    ....the current architecture of my log processing is:
    1) set a number of processes (e.g. 20)
    2) in a loop for the number of processes:
    my @rc;
    for my $i (1..$num_processes) {
        my $command = "parseLog.pl $i";   # double quotes so $i interpolates
        $rc[$i] = system($command);
    }
    # there is a conf file that has an entry for each log, along with a
    number in the next field--the number represents the process_id (can be
    1 thru 20)
    3) in a loop of all the logs, push logs into arrays if the process_id
    == the $num_process that was passed along, so each process has an array
    of files to process/parse. each process parses each file in its array
    of files. problem with this is that maybe each process has a similar
    number of logs to process (the process_id just increments for each
    line, then wraps around once it reaches the max number of processes i
    defined), but some could be huge while others are small, so not very
    optimal. one process could have 20 files of 200 bytes each, while the
    other could have 20 files of 230 MB each.
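
    (for reference, step 3 boils down to something like this sketch -- the
    conf file name and "logname process_id" field layout are assumptions:)

    my $num_process = shift;    # 1..20, passed to parseLog.pl
    my @my_files;
    open my $conf, '<', 'logs.conf' or die "can't open logs.conf: $!";
    while (<$conf>) {
        my ($log, $process_id) = split;
        push @my_files, $log if $process_id == $num_process;
    }
    close $conf;
    # ...then parse each file in @my_files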

    since using the system() approach is all i know, the only scenarios i
    considered were those that dealt with providing each process with a
    balanced amount of data.

    if i can get the on-the-fly thing working, that would be preferable.
    then sorting would not even be helpful, would it?

    >
    > > basically this problem can be reduced to figuring out a way to fit 400
    > > discrete numbers into 20 buckets where the count of the numbers does
    > > not matter, but the sum of the numbers should be roughly equal.
    > >
    > > one way of accomplishing this is to sort descending, and loop through
    > > the 400 files, putting the largest one in the first bucket, and going
    > > down the line, putting the largest one that will fit into the current
    > > bucket until you get to the end. granted, this is not perfect--perhaps
    > > if you had waited, 2 smaller ones would have brought you closer to your
    > > bucket-size goal. then you go to the next bucket, and do the same
    > > thing. this is a O = n^2 solution though.

    >
    > How so? N ln N to sort the files, only has to be done once. Finding the
    > biggest file that fits the current bucket is ln N using a binary search
    > into the sorted files. You have 1 N ln N operation, followed by N
    > operations that are lnN. That comes out to N ln N overall.
    >


    you would be roughly doing m x n iterations, which i believe amounts to
    the same thing as O(n^2). (i AM a bit rusty at algorithms.) i didn't
    even count the sort, since that's only done once, and i'm not familiar
    with the under-the-hood mechanics of Perl's hash value sort.


    > But anyway, why try to preordain the order in which the buckets become
    > empty or full?


    that's the only way i know how for now.

    > Start processing the 20 biggest files. When one of them
    > finishes (regardless of which one it is), start the next file.
    >


    if i can do the fork thing, why start with the biggest?

    > Xho
    >
    > --
    > -------------------- http://NewsReader.Com/ --------------------
    > Usenet Newsgroup Service $9.95/Month 30GB
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #3
  4. Also sprach it_says_BALLS_on_your forehead:

    >> I wouldn't bother with this 'bucket' stuff at all. Just do it on the fly.
    >> By addressing the files in the appropriate order (from most work to least
    >> work) you ensure nearly optimal processing. In fact, because there is no
    >> guarantee that the actual time for a file to be processed is exactly
    >> proportional to the file size,

    >
    > you're right here, although i had the idea that i could weight certain
    > parse methods by multiplying each log size by a coefficient. the
    > coefficient would be derived by dividing the average speed of a parse
    > method by the average speed of the slowest parse method.
    >
    > balancing on the fly is almost surely going
    >> to be better than some precomputed balancing based on the assumption that
    >> size = time.
    >>
    >> $pm = new Parallel::ForkManager(20);
    >>
    >> foreach $file (sort {$files{$b}<=>$files{$a}} keys %files) {
    >> my $pid = $pm->start and next;
    >> ##Process the $file
    >> $pm->finish; # Terminates the child process
    >> }
    >> $pm->wait_all_children;
    >>
    >> ...

    >
    > i admit i'm not too familiar with Threads/Forks (the only fork i use is
    > the one called from system() ). also, i've read that Perl threading
    > isn't too stable.


    It arguably still has its flaws, but for a task as easy as yours they're
    perfectly usable.

    > i've looked on the web a little, but have not found
    > anything that describes how to do all of the following:
    >
    > 1) instantiate N processes (or threads)
    > 2) start each process parsing a log file
    > 3) the first process that is done looks at a shared or global queue and
    > pulls the next log file from that and processes until the queue is
    > empty.


    Extremely easy with threads. Here's a complete example of a program that
    spawns off a number of threads where each thread pulls data from a
    global queue until it is empty:

    #!/usr/bin/perl -w

    use strict;

    use threads;
    use threads::shared;

    use constant NUM_THREADS => 10;

    # shared queue visible to every thread
    my @queue : shared = 1 .. 30;

    # create threads
    my @threads;
    push @threads, threads->new("run") for 1 .. NUM_THREADS;

    # wait for all threads to finish
    $_->join for @threads;

    # code executed by each thread
    sub run {
        while (defined(my $element = pop @queue)) {
            printf "thread %i: working with %i\n", threads->tid, $element;
            # make runtime vary a little for
            # demonstration purposes
            select undef, undef, undef, rand;
        }
    }


    > ...the current architecture of my log processing is:
    > 1) set a number of processes (e.g. 20)
    > 2) in a loop for the number of processes:
    > my @rc;
    > for my $i (1..$num_processes) {
    > my $command = 'parseLog.pl $i';
    > $rc[$i] = system($command);
    > }
    > # there is a conf file that has an entry for each log, along with a
    > number in the next field--the number represents the process_id (can be
    > 1 thru 20)
    > 3) in a loop of all the logs, push logs into arrays if the process_id
    >== the $num_process that was passed along, so each process has an array
    > of files to process/parse. each process parses each file in its array
    > of files. problem with this is that maybe each process has a similar
    > number of logs to process (the process_id just increments for each
    > line, then wraps around once it reaches the max number of processes i
    > defined), but some could be huge while others are small, so not very
    > optimal. one process could have 20 files of 200 bytes each, while the
    > other could have 20 files of 230 MB each.


    Don't think in terms of processes. If you're using processes for that
    kind of thing you'll need to find a way for them to communicate
    (possibly pipes, or maybe shared memory). Threads take this work off
    your shoulders, as they can share data in a simple and secure manner.

    > since using the system() approach is all i know, the only scenarios i
    > considered were those that dealt with providing each process with a
    > balanced amount of data.


    Bad idea. Each piece of data may take a different amount of time. The
    real way is to store the work in one central repository from which each
    thread retrieves its working set. When it is done, it fetches the next
    unless the central pool is empty.

    > if i can get the on-the-fly thing working, that would be preferable.
    > then sorting would not even be helpful, would it?


    It is not helpful.

    > Start processing the 20 biggest files. When one of them
    >> finishes (regardless of which one it is), start the next file.
    >>

    >
    > if i can do the fork thing, why start with the biggest?


    Don't even worry about sorting. Use threads and have each thread do the
    parsing of the files in any order. It makes no difference since it's
    truly parallel and asynchronous.
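
    To make that concrete: the queue example above adapts directly; just
    put filenames into the shared array (the glob pattern and parse_log()
    are placeholders for your real file list and parsing code):

    use strict;
    use threads;
    use threads::shared;

    use constant NUM_THREADS => 20;

    # shared work queue of filenames
    my @queue : shared = glob '/var/log/log*';

    my @workers;
    push @workers, threads->new(\&worker) for 1 .. NUM_THREADS;
    $_->join for @workers;

    sub worker {
        while (defined(my $file = shift @queue)) {
            parse_log($file);
        }
    }

    sub parse_log {
        my ($file) = @_;
        # real parsing goes here
    }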

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Oct 5, 2005
    #4
  5. >
    > Extremely easy with threads. Here's a complete example of a program that
    > spawns off a number of threads where each thread pulls data from a
    > global queue until it is empty:
    >
    > #!/usr/bin/perl -w
    >
    > use strict;
    >
    > use threads;
    > use threads::shared;
    >
    > use constant NUM_THREADS => 10;
    >
    > # shared queue visible to every thread
    > my @queue : shared = 1 .. 30;
    >
    > # create threads
    > my @threads;
    > push @threads, threads->new("run") for 1 .. NUM_THREADS;
    >
    > # wait for all threads to finish
    > $_->join for @threads;
    >
    > # code executed by each thread
    > sub run {
    > while (defined(my $element = pop @queue)) {
    > printf "thread %i: working with %i\n", threads->tid, $element;
    > # make runtime vary a little for
    > # demonstration purpose
    > select undef, undef, undef, rand;
    > }
    > }
    >
    >
    >
    > Don't think in terms of processes. If you're using processes for that
    > kind of thing you'll need to find a way for them to communicate
    > (possibly pipes, or maybe shared memory). Threads takes this work off
    > your shoulders as they can share data in a simple and secure manner.
    >


    Tassilo, thank you very much for your help. If i could trouble you once
    more for your insight...

    What benefit does the thread model have that the following does not?
    What drawbacks?

    use Parallel::ForkManager;

    $pm = new Parallel::ForkManager($MAX_PROCESSES);
    foreach $data (@all_data) {
        # Forks and returns the pid for the child:
        my $pid = $pm->start and next;
        ... do some work with $data in the child process ...
        $pm->finish; # Terminates the child process
    }



    > use bigint;
    > $n=71423350343770280161397026330337371139054411854220053437565440;
    > $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #5
  6. >
    > Extremely easy with threads. Here's a complete example of a program that
    > spawns off a number of threads where each thread pulls data from a
    > global queue until it is empty:


    i'm running Perl 5.6, are threads and threads::shared available/stable
    in this version?


    >
    > #!/usr/bin/perl -w
    >
    > use strict;
    >
    > use threads;
    > use threads::shared;
    >
    > use constant NUM_THREADS => 10;
    >
    > # shared queue visible to every thread
    > my @queue : shared = 1 .. 30;
    >
    > # create threads
    > my @threads;
    > push @threads, threads->new("run") for 1 .. NUM_THREADS;
    >
    > # wait for all threads to finish
    > $_->join for @threads;
    >
    > # code executed by each thread
    > sub run {
    > while (defined(my $element = pop @queue)) {
    > printf "thread %i: working with %i\n", threads->tid, $element;


    ^ what is this?

    > # make runtime vary a little for
    > # demonstration purpose
    > select undef, undef, undef, rand;
    > }
    > }
    >
    >
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #6
  7. it_says_BALLS_on_your forehead wrote:
    > > printf "thread %i: working with %i\n", threads->tid, $element;

    > ^ what is this?


    When you don't understand what a function is doing, the best way to
    figure it out is to read the documentation. (And I admit, in this
    case, the documentation will lead you on a trail, so let's follow it:)

    perldoc -f printf

        printf FILEHANDLE FORMAT, LIST
        printf FORMAT, LIST
                Equivalent to "print FILEHANDLE sprintf(FORMAT, LIST)",

    Okay, so we need to figure out what sprintf does instead:

    perldoc -f sprintf

        sprintf FORMAT, LIST
                Returns a string formatted by the usual "printf"
                conventions of the C library function "sprintf".
                See below for more details.
                . . .
                Finally, for backward (and we do mean "backward")
                compatibility, Perl permits these unnecessary but
                widely-supported conversions:

                    %i    a synonym for %d

    which, if we look back a bit, we see:

                Perl's "sprintf" permits the following universally-
                known conversions:
                . . .
                    %d    a signed integer, in decimal



    So, the original code took the two arguments, threads->tid and
    $element, converted them to integers (if necessary), and placed them at
    the corresponding %i markers.
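
    A quick way to see it in action:

    printf "thread %i: working with %i\n", 3, 42;
    # prints: thread 3: working with 42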

    Paul Lalli
     
    Paul Lalli, Oct 5, 2005
    #7
  8. >
    >
    > So, the original code took the two arguments, threads->tid and
    > $element, converted them to integers (if necessary), and placed them at
    > the corresponding %i markers.
    >
    > Paul Lalli


    Ahhh...thanks Paul :)
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #8
  9. Also sprach it_says_BALLS_on_your forehead:

    >> Extremely easy with threads. Here's a complete example of a program that
    >> spawns off a number of threads where each thread pulls data from a
    >> global queue until it is empty:
    >>
    >> #!/usr/bin/perl -w
    >>
    >> use strict;
    >>
    >> use threads;
    >> use threads::shared;
    >>
    >> use constant NUM_THREADS => 10;
    >>
    >> # shared queue visible to every thread
    >> my @queue : shared = 1 .. 30;
    >>
    >> # create threads
    >> my @threads;
    >> push @threads, threads->new("run") for 1 .. NUM_THREADS;
    >>
    >> # wait for all threads to finish
    >> $_->join for @threads;
    >>
    >> # code executed by each thread
    >> sub run {
    >> while (defined(my $element = pop @queue)) {
    >> printf "thread %i: working with %i\n", threads->tid, $element;
    >> # make runtime vary a little for
    >> # demonstration purpose
    >> select undef, undef, undef, rand;
    >> }
    >> }
    >>
    >>
    >>
    >> Don't think in terms of processes. If you're using processes for that
    >> kind of thing you'll need to find a way for them to communicate
    >> (possibly pipes, or maybe shared memory). Threads takes this work off
    >> your shoulders as they can share data in a simple and secure manner.
    >>

    >
    > Tassilo, thank you very much for your help. If i could trouble you once
    > more for your insight...
    >
    > What benefit does the thread model have that the following does not?
    > What drawbacks?
    >
    > use Parallel::ForkManager;
    > $pm = new Parallel::ForkManager($MAX_PROCESSES);
    > foreach $data (@all_data) {
    > # Forks and returns the pid for the child:
    > my $pid = $pm->start and next;
    > ... do some work with $data in the child process ...
    > $pm->finish; # Terminates the child process
    > }


    Perl threads are said to be inefficient in that a Perl interpreter is
    cloned for each thread. However, the Parallel::ForkManager approach
    requires a new definition of efficiency. :)

    Consider what happens in code like this:

    foreach my $data (1 .. 100_000) {
        $pm->start and next;
        do_something($data);
        $pm->finish;
    }

    For each of the 100,000 items a new process is spawned off. So unlike
    with threads, where a thread handles more than one item, here each
    process is essentially a once-and-throw-away thing. This is because
    Parallel::ForkManager provides no infrastructure for sharing data
    between these processes.

    Just try this little program yourself and see how poorly it performs:

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(30);
    my @data = 1 .. shift;
    foreach (@data) {
        my $pid = $pm->start and next;
        print "$_\n";
        $pm->finish;
    }
    $pm->wait_all_children;

    and compare it to an equivalent threads-implementation where the items
    are stored in a shared array:

    use threads;
    use threads::shared;

    use constant NUM_THREADS => 30;

    my @queue : shared = 1 .. shift;
    my @threads;

    push @threads, threads->new("run") for 1 .. NUM_THREADS;
    $_->join for @threads;

    sub run {
        while (defined(my $element = shift @queue)) {
            print "$element\n";
        }
    }

    On my machine I get:

    ethan@ethan:~$ time perl procs.pl 2000
    [...]
    real 0m7.002s
    user 0m2.500s
    sys 0m4.500s

    ethan@ethan:~$ time perl thread.pl 2000
    [...]
    real 0m3.141s
    user 0m1.540s
    sys 0m1.590s

    If you increase the number further to, say, 10000, it's already

    [processes]
    real 0m45.605s
    user 0m24.320s
    sys 0m21.130s

    [threads]
    real 0m8.671s
    user 0m1.090s
    sys 0m7.580s

    So apparently threads scale much better than processes, which is no
    wonder: threads only require a one-time initialization when they are
    created, whereas Parallel::ForkManager constantly has to create and
    terminate new processes.

    Besides, I find the code with threads much easier to understand: each
    thread has a function that it executes. With fork it's more tricky.
    You first have to find the call to fork(), and then one of the code
    branches after it is for the child, the other one for the parent.
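
    A bare-bones fork, to illustrate the two branches (do_work() is a
    placeholder):

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        do_work();          # child branch
        exit 0;
    }
    else {
        waitpid($pid, 0);   # parent branch waits for the child
    }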

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Oct 5, 2005
    #9
  10. "it_says_BALLS_on_your forehead" <> wrote:


    > > balancing on the fly is almost surely going
    > > to be better than some precomputed balancing based on the assumption
    > > that size = time.
    > >
    > > $pm = new Parallel::ForkManager(20);
    > >
    > > foreach $file (sort {$files{$b}<=>$files{$a}} keys %files) {
    > > my $pid = $pm->start and next;
    > > ##Process the $file
    > > $pm->finish; # Terminates the child process
    > > }
    > > $pm->wait_all_children;
    > >
    > > ...

    >
    > i admit i'm not too familiar with Threads/Forks (the only fork i use is
    > the one called from system() ).


    One advantage of Parallel::ForkManager is that you don't need to be all
    that familiar with fork. It handles most of it for you, as long as you
    follow the example of having a "$pm->start and next" near the top of the
    loop and a "$pm->finish" at the end of the loop. ($pm->finish actually
    calls exit in the child process, so anything between the finish and the end
    of the loop is not executed.)
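
    You can see that with an extra line after the finish -- the second
    print below never runs, because $pm->finish has already called exit in
    the child (a sketch; @tasks and $pm as set up elsewhere):

    foreach my $task (@tasks) {
        $pm->start and next;
        print "child $$ handling $task\n";
        $pm->finish;
        print "never reached\n";
    }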


    > also, i've read that Perl threading
    > isn't too stable.


    Forking on linux is rock stable. Forking on Windows is emulated using
    threads, but I think it is stable enough for what you are doing.

    > i've looked on the web a little, but have not found
    > anything that describes how to do all of the following:
    >
    > 1) instantiate N processes (or threads)


    $pm = new Parallel::ForkManager($N);

    (Doesn't actually instantiate them, but declares how many you want
    instantiated, once you get around to instantiating them.)

    > 2) start each process parsing a log file


    That is what the "foreach...$pm->start and next" does. It starts a process
    on the next log file, unless there are already 20 (or $N) outstanding
    processes. In that case, it waits for one of those outstanding processes to
    end, then starts a process on the next log file.

    > 3) the first process that is done looks at a shared or global queue and
    > pulls the next log file from that and processes until the queue is
    > empty.


    ForkManager uses inversion of control (or at least something like it). The
    first slave process that is done finishes. As part of finishing, it
    notifies the master process. The master process keeps the queue, and uses
    it to start the next process, to replace the one that finished.

    ....
    >
    > if i can get the on-the-fly thing working, that would be preferable.
    > then sorting would not even be helpful, would it?


    I find that it is helpful, especially when the length of the various
    tasks vary by orders of magnitude.

    Let's say your largest task will take 20 minutes for 1 process/CPU to
    process, and all the rest of your tasks combined will take 20 minutes for
    the other 19 CPUs to process. If you start the largest task first, then in
    20 minutes you are done. If you start the largest task last, then say it
    takes 15 minutes before it gets started[1], and then 20 minutes for it to
    run, so the time to completion is 35 minutes.

    By starting the tasks in reverse order of run time, you let the shorter
    tasks pack around the longer ones in an efficient way.

    (I just did a test on a uniform distribution of run-lengths[2], and
    "processing" from long to short took 8:25 while short to long took 9:10. I
    think the difference can be larger if the dispersion in runtimes is
    greater)

    Xho



    [1] Since all-but-the-longest take 20 minutes to finish on 19 CPUs, they
    will take ~19 minutes to finish on 20 CPUs (since we haven't yet started
    the longest task, the shorter ones will have 20 CPUs to use, not 19).
    However, the longest one doesn't need to wait for all of the shorter ones
    to finish before it starts, it only needs to wait for 381 out of the 400 of
    the shorter ones to finish. So I just pulled 15 minutes out of my ass, as
    a guess of how long it will take for 381 of them to finish.

    [2]
    use strict;

    use Parallel::ForkManager;

    my $pm = new Parallel::ForkManager(10);

    ## do it with and without the "reverse"
    foreach my $file (reverse 1..100) {
        my $pid = $pm->start and next;
        sleep $file;
        $pm->finish; # Terminates the child process
    }
    $pm->wait_all_children;

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #10
  11. Also sprach it_says_BALLS_on_your forehead:

    >> Extremely easy with threads. Here's a complete example of a program that
    >> spawns off a number of threads where each thread pulls data from a
    >> global queue until it is empty:

    >
    > i'm running Perl 5.6, are threads and threads::shared available/stable
    > in this version?


    Oh, 5.6. In this case don't even bother using threads. The threads and
    threads::shared modules aren't available yet. What is available in 5.6
    is an older threads model that was later replaced by the so-called
    "interpreter threads" (ithreads for short), which is what I was talking
    about.

    If you're curious, see 'perldoc perlthrtut' for your version of Perl.
    These older threads were a pain, especially since you had to write your
    own code for safely sharing data between threads using lock(). It feels
    more like programming with System V semaphores and other obscenities.

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Oct 5, 2005
    #11
  12. "it_says_BALLS_on_your forehead" <> wrote:
    ....
    > >
    > >
    > > Don't think in terms of processes. If you're using processes for that
    > > kind of thing you'll need to find a way for them to communicate
    > > (possibly pipes, or maybe shared memory). Threads takes this work off
    > > your shoulders as they can share data in a simple and secure manner.
    > >

    >
    > Tassilo, thank you very much for your help. If i could trouble you once
    > more for your insight...
    >
    > What benefit does the thread model have that the following does not?
    > What drawbacks?
    >
    > use Parallel::ForkManager;
    > $pm = new Parallel::ForkManager($MAX_PROCESSES);
    > foreach $data (@all_data) {
    > # Forks and returns the pid for the child:
    > my $pid = $pm->start and next;
    > ... do some work with $data in the child process ...
    > $pm->finish; # Terminates the child process
    > }



    The ForkManager method does not allow the children to report back
    to the parent in a meaningful way (i.e. beyond exit status). Threads do
    readily allow that (although Tassilo's specific example doesn't).

    There is probably more overhead in forking a process for each task than
    there is in popping from a shared queue. Unless you have many thousands
    of very very short tasks, this difference is probably negligible.
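
    For example, with threads the parent can collect each worker's return
    value via join -- a minimal sketch:

    use threads;

    my @workers = map { threads->new(sub { my $n = shift; $n * $n }, $_) } 1 .. 4;
    my @results = map { $_->join } @workers;    # (1, 4, 9, 16)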

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #12
  13. Tassilo,
    I would like to upgrade to Perl 5.8 so I can use threads (as well as
    for other reasons), but for now I think I'm stuck with a fork.

    Xho,
    I actually rummaged around Google Groups and saw some postings you
    sent around 2002 about Parallel::ForkManager, so you must be an expert
    by now :). I'll try implementing this and see how it goes.

    Xho and Tassilo--
    Thank you both very much for your explanations :).
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #13
  14. Hey Xho, I tried this:
    ----
    #!/apps/webstats/bin/perl

    use File::Copy;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(5);

    $pm->run_on_start(
        sub {
            my ($pid, $ident) = @_;
            print "** $ident started, pid: $pid\n";
        }
    );

    my @data = 1 .. shift;
    for (@data) {
        my $pid = $pm->start and next;
        print "$pid: $_\n";
        $pm->finish;
    }

    $pm->wait_all_children;
    ------------
    and got this:
    #####
    [smro180 123] ~/simon/1-perl > tryFork.pl 10
    ** started, pid: 16208
    0: 1
    ** started, pid: 16209
    0: 2
    ** started, pid: 16210
    0: 3
    ** started, pid: 16211
    0: 4
    ** started, pid: 16212
    0: 5
    ** started, pid: 16213
    0: 6
    ** started, pid: 16214
    0: 7
    ** started, pid: 16215
    0: 8
    ** started, pid: 16216
    0: 9
    ** started, pid: 16217
    0: 10
    [smro180 124] ~/simon/1-perl >
    ####

    ....I read this:

        start [ $process_identifier ]
            This method does the fork. It returns the pid of the child
            process for the parent, and 0 for the child process. If the
            $processes parameter for the constructor is 0 then, assuming
            you're in the child process, $pm->start simply returns 0.

            An optional $process_identifier can be provided to this
            method... It is used by the "run_on_finish" callback (see
            CALLBACKS) for identifying the finished process.

    and this:

        run_on_start $code
            You can define a subroutine which is called when a child is
            started. It is called after the successful startup of a child
            in the parent process.

            The parameters of the $code are the following:

            - pid of the process which has been started
            - identification of the process (if provided in the "start" method)



    ....but I don't understand why, in my print "$pid: $_\n"; line, i'm
    getting 0 as the pid. I know the documentation said i should get 0 for
    the child process and the child pid for the parent, but aren't i
    calling start in the parent? and in the callback, why are there more
    than 5 different process ids?
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #14

  15. > and in the callback, why are
    > there more than 5 different process ids?


    ....i had initially assumed that i was creating 5 processes, and each
    one would finish a job and go back to the 'queue' (in this case, the
    array), so i would see the same 5 pids over and over again. is this not
    the case? does the number 5 here represent the number of processes that
    are active or in existence *at one time*? so that there will be a total
    of scalar(@data) processes created overall, and once one is done, it
    will die (not be re-used) and a new one will be created to replace it?
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #15
  16. "Tassilo v. Parseval" <> wrote:
    >
    > Perl threads are said to be inefficient in that a Perl interpreter is
    > cloned for each thread. However, the Parallel::ForkManager approach
    > requires a new definition of efficiency. :)
    >
    > Consider what happens in code like this:
    >
    > foreach my $data (1 .. 100_000) {
    > $pm->start and next;
    > do_something($data);
    > $pm->finish;
    > }
    >
    > For each of the 100.000 items a new process is spawned off. So unlike
    > with threads where a thread handles more than one item here each process
    > is essentially a once-and-throw-away thing. This is because
    > Parallel::ForkManager provides no infrastructure for sharing data
    > between these processes.
    >
    > Just try this little program yourself and see how poorly it performs:
    >
    > use Parallel::ForkManager;
    >
    > my $pm = Parallel::ForkManager->new(30);
    > my @data = 1 ... shift;
    > foreach(@data) {
    > my $pid = $pm->start and next;
    > print "$_\n";
    > $pm->finish;
    > }
    > $pm->wait_all_children;


    I agree that if you are going to do something incredibly silly, like
    parallelizing something that has no legitimate need to be parallelized,
    then using ForkManager is probably not the best choice. But then again,
    doing silly things is generally not the best choice in the first place,
    unless your goal is to be silly (which is a noble goal in itself,
    sometimes).


    > and compare it to an equivalent threads-implementation where the items
    > are stored in a shared array:


    Hardly equivalent. The equivalent thread implementation to the above fork
    code would be to spawn one thread per item. The equivalent fork
    implementation to the below threaded code would be to use forks::shared
    (which, BTW, is even worse than the ForkManager way).


    >
    > use threads;
    > use threads::shared;
    >
    > use constant NUM_THREADS => 30;
    >
    > my @queue : shared = 1 .. shift;
    > my @threads;
    >
    > push @threads, threads->new("run") for 1 .. NUM_THREADS;
    > $_->join for @threads;
    >
    > sub run {
    > while (defined(my $element = shift @queue)) {
    > print "$element\n";
    > }
    > }
    >
    > On my machine I get:

    ....
    > If you increase the number further to, say, 10000, it's already
    >
    > [processes]
    > real 0m45.605s
    > user 0m24.320s
    > sys 0m21.130s
    >
    > [threads]
    > real 0m8.671s
    > user 0m1.090s
    > sys 0m7.580s


    Parallelization is inherently an optimization step. As such, there is
    no general solution and the appropriate way to parallelize is highly
    dependent on the details of what is to be parallelized. If I wanted to
    parallelize something like your test case, consisting of a large number of
    very very fast operations, I would use the unpublished "Parallel_Proc"
    module, which divides into chunks up front.

    use Parallel_Proc;

    my $pm = Parallel_Proc->new();
    my @data = 1 .. shift;

    my ($l, $r) = $pm->spawn(30, scalar @data);
    foreach (@data[$l..$r]) {
        print "$_\n";
    }
    $pm->harvest(sub { print $_[0] });  # pass child's output out the parent's STDOUT
    $pm->Done();

    __END__

    On my machine:

    ~/perl_misc]$ time perl thread.pl 500000 > /dev/null
    5.740u 0.320s 0:06.16 98.3% 0+0k 0+0io 447pf+0w

    ~/perl_misc]$ time perl parallel_proc.pl 500000 > /dev/null
    2.320u 0.410s 0:02.75 99.2% 0+0k 0+0io 454pf+0w

    That was on a single CPU machine. On a four CPU machine:

    $ time /usr/bin/perl thread.pl 500000 > /dev/null
    11.212u 23.806s 0:18.00 194.5% 0+0k 0+0io 0pf+0w

    $ time /usr/bin/perl parallel_proc.pl 500000 > /dev/null
    1.631u 0.572s 0:01.65 133.3% 0+0k 0+0io 0pf+0w


    >
    > So apparently threads scale much better than processes which is no
    > wonder because threads only require a one-time initialization for
    > creating the threads whereas Parallel::ForkManager constantly has to
    > create and terminate new processes.


    While this is true, it is not particularly relevant to the poster's
    problem. There are cases where threading wins hands down. This original
    problem is not one of them.

    > Besides, I find the code with threads much easier to understand: Each
    > thread has a function that it executes. With fork it's more tricky.
    > You first have to find the call to fork() and then one of the code
    > branches after that is for the child, the other one for the parent.


    I somewhat agree. Parallel::ForkManager was (apparently) designed so that
    you can usually take code originally written to be serial and make it
    parallel by simply adding 2 carefully-placed lines (plus 3 house-keeping
    lines). The threaded code is pretty much written from the ground up to be
    threaded. The threaded code structure tends to be dominated by the
    threading, while the ForkManager code tends to be dominated by whatever
    you are fundamentally trying to do, with just a few lines making a nod to
    the parallelization. This makes it easier to thoughtlessly add code that
    breaks parallelization under ForkManager. So when I substantially
    refactor code that uses ForkManager, I simply remove the parallelization,
    refactor the code as serial code, then add ForkManager back in at the end.

    Oh, one more thing I discovered. Threaded code with a shared queue is
    tricky to do if the queue holds references or objects.

    If you change the queue to:

    my @queue : shared = map {[$_]} 1 .. shift;

    Then it dies with "Invalid value for shared scalar". Since the forking
    code doesn't use shared values, it doesn't have this particular problem.

    You can circumvent this with the rather ugly:

    my @queue : shared = map {my $x =[$_]; share $x; $x} 1 .. shift;

    With blessed references, this doesn't work.

    (Yes, I know that this is rather far afield of the original question, but
    while I'm dissertating on the subject I might as well go whole hog.)

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #16
  17. "it_says_BALLS_on_your forehead" <> wrote:
    > > and in the callback, why are
    > > there more than 5 different process ids?

    >
    > ...i had initially assumed that i was creating 5 processes, and each
    > one would finish a job, and go back to the 'queue' (in this case, the
    > array). so i would see the same 5 pids over and over again. is this not
    > the case? does the number 5 here represent the number of processes that
    > are active or in existence *at one time*? so that there will be a total
    > of
    > scalar(@data) processes created overall, and once one is done, it will
    > die (not be re-used) and a new one will be created to replace it?


    Correct.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #17
  18. "it_says_BALLS_on_your forehead" <> wrote:
    > Hey Xho, I tried this:
    > ----
    > #!/apps/webstats/bin/perl
    >
    > use File::Copy;
    > use Parallel::ForkManager;
    >
    > my $pm = Parallel::ForkManager->new(5);
    >
    > $pm->run_on_start(
    > sub { my ($pid,$ident)=@_;
    > print "** $ident started, pid: $pid\n";
    > }
    > );
    >
    > my @data = 1 ... shift;
    > for (@data) {
    > my $pid = $pm->start and next;
    > print "$pid: $_\n";
    > $pm->finish;
    > }
    >
    > $pm->wait_all_children;
    > ------------
    > and got this:
    > #####
    > [smro180 123] ~/simon/1-perl > tryFork.pl 10
    > ** started, pid: 16208
    > 0: 1
    > ** started, pid: 16209
    > 0: 2
    > ** started, pid: 16210

    ....
    >
    > ...I read this:
    > start [ $process_identifier ]
    > This method does the fork. It returns the pid of the child process for
    > the parent, and 0 for the child process. If the $processes parameter
    > for the constructor is 0 then, assuming you're in the child process,
    > $pm->start simply returns 0.
    >
    > An optional $process_identifier can be provided to this method... It is
    > used by the "run_on_finish" callback (see CALLBACKS) for identifying
    > the finished process.
    >
    > and this:
    > run_on_start $code
    > You can define a subroutine which is called when a child is started. It
    > called after the successful startup of a child in the parent process.
    >
    > The parameters of the $code are the following:
    >
    > - pid of the process which has been started
    > - identification of the process (if provided in the "start" method)
    >
    > ...but I don't understand why in my: print "$pid: $_\n";
    > line, i'm getting 0 as the pid. I know the documentation said i should
    > get 0 for the child process and the child pid for the parent, but
    > aren't i calling start on the parent?


    You are calling "start" *in* the parent, but it returns in both the
    parent and the child process. Inside, "start" does a fork, so when
    "start" ends there are two processes. The parent process gets the
    child's pid, which means the "and next" is activated. The child gets
    zero, so the "and next" is not activated. This means everything between
    the start and finish statements is done in one of the children, not in
    the parent.

    The example I posted was just copied and modified from the perldoc, and
    for some reason they do capture the pid. In practice I almost never
    capture it:

    $pm->start and next;

    If the child needs its own pid, it gets it from $$. Why do I need
    the parent to know the child's pid? Usually I don't, because the module
    itself takes care of all the waiting and stuff for me.

    I rarely use anything except new, start, finish, and wait_all_children,
    except to goof around with. Once your needs get more complicated than
    those simple methods, I find that things get hairy real quick.

    BTW, I'm curious about the bottleneck in your code. If your code is
    CPU-bound, then parallelization to 20 processes won't help much unless you
    have 20 CPUs. If it is disk-drive bound, then parallelization won't help
    unless your files are on different disks (and probably on different
    controllers.)

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #18
  19. wrote:
    > "it_says_BALLS_on_your forehead" <> wrote:
    > > Hey Xho, I tried this:
    > > ----
    > > #!/apps/webstats/bin/perl
    > >
    > > use File::Copy;
    > > use Parallel::ForkManager;
    > >
    > > my $pm = Parallel::ForkManager->new(5);
    > >
    > > $pm->run_on_start(
    > > sub { my ($pid,$ident)=@_;
    > > print "** $ident started, pid: $pid\n";
    > > }
    > > );
    > >
    > > my @data = 1 ... shift;
    > > for (@data) {
    > > my $pid = $pm->start and next;
    > > print "$pid: $_\n";
    > > $pm->finish;
    > > }
    > >
    > > $pm->wait_all_children;
    > > ------------
    > > and got this:
    > > #####
    > > [smro180 123] ~/simon/1-perl > tryFork.pl 10
    > > ** started, pid: 16208
    > > 0: 1
    > > ** started, pid: 16209
    > > 0: 2
    > > ** started, pid: 16210

    > ...
    > >
    > > ...I read this:
    > > start [ $process_identifier ]
    > > This method does the fork. It returns the pid of the child process for
    > > the parent, and 0 for the child process. If the $processes parameter
    > > for the constructor is 0 then, assuming you're in the child process,
    > > $pm->start simply returns 0.
    > >
    > > An optional $process_identifier can be provided to this method... It is
    > > used by the "run_on_finish" callback (see CALLBACKS) for identifying
    > > the finished process.
    > >
    > > and this:
    > > run_on_start $code
    > > You can define a subroutine which is called when a child is started. It
    > > called after the successful startup of a child in the parent process.
    > >
    > > The parameters of the $code are the following:
    > >
    > > - pid of the process which has been started
    > > - identification of the process (if provided in the "start" method)
    > >
    > > ...but I don't understand why in my: print "$pid: $_\n";
    > > line, i'm getting 0 as the pid. I know the documentation said i should
    > > get 0 for the child process and the child pid for the parent, but
    > > aren't i calling start on the parent?

    >
    > You are calling "start" *in* the parent, but is returning in both the
    > parent and child process. Inside, "start" does a fork, so when "start"
    > ends there are two processes. The parent process gets the child's pid,
    > which means the "and next" is activated. The child gets zero, so the "and
    > next" is not activated. This means everything between the start and the
    > finish statements are done in one of the children, not in the parent.
    >
    > The example I posted was just copied and modified from perldoc, and for
    > some reason they do capture the pid. In practise I almost never capture
    > it:
    >
    > $pm->start and next;
    >
    > If the child needs it's own pid, it gets it from $$. Why do I need
    > the parent to know the child's pid? Usually I don't, because the module
    > itself takes care of all the waiting and stuff for me.
    >
    > I rarely use anything except new, start, finish, and wait_all_children,
    > except to goof around with. Once your needs get more complicated than
    > those simple methods, I find that things get hairy real quick.
    >
    > BTW, I'm curious about the bottleneck in your code. If your code is
    > CPU-bound, then parallelization to 20 processes won't help much unless you
    > have 20 CPUs. If it is disk-drive bound, then parallelization won't help
    > unless your files are on different disks (and probably on different
    > controllers.)
    >
    > Xho
    >


    ahh, that makes sense, thanks!

    to answer your question, i'm working on a box with 16 CPUs. the number
    20 is from code that i inherited from a predecessor. there used to be
    10 processes, and he changed it to 20, and it went faster, so 20 it
    stayed. should i change it to 16?

    also, what's the difference between using Parallel::ForkManager to do
    20 tasks, and looping through system('script.pl &') 20 times? i mean, i
    see an advantage in that with ForkManager, when one process dies,
    another takes its place, so you don't need to pre-ordain which process
    does which work. but let's assume that each process has exactly the
    same amount of work, and the processes all work at the same speed.
    would ForkManager be faster? is there ever a case where multiple
    system() calls are the answer?


    > --
    > -------------------- http://NewsReader.Com/ --------------------
    > Usenet Newsgroup Service $9.95/Month 30GB
     
    it_says_BALLS_on_your forehead, Oct 5, 2005
    #19
  20. "it_says_BALLS_on_your forehead" <> wrote:
    > wrote:
    > >
    > > BTW, I'm curious about the bottleneck in your code. If your code is
    > > CPU-bound, then parallelization to 20 processes won't help much unless
    > > you have 20 CPUs. If it is disk-drive bound, then parallelization won't
    > > help unless your files are on different disks (and probably on
    > > different controllers.)
    > >
    > > Xho
    > >

    >
    > ahh, that makes sense, thanks!
    >
    > to answer your question, i'm working on a box with 16 CPUs. the number
    > 20 is from code that i inherited from a predecessor. there used to be
    > 10 processes, and he changed it to 20, and it went faster, so 20 it
    > stayed. should i change it to 16?


    There is no real reason to use a max of 20 (or anything else greater than
    16) if you only have 16 CPUs, but using 20 shouldn't hurt, either. There
    may be slightly more kernel overhead with 20 than 16, but on modern OSes
    the difference will be negligible.

    One way it could matter is if other processes are also using the machine.
    For example, if there are 8 other CPU-bound processes (all of the same
    nice level as yours), then if you use a max of 16 you will get ~16/24 of
    the CPU time and the other programs will get ~8/24, while if you use a
    max of 20, then you will get ~20/28 of the time and they will get ~8/28.


    > also, what's the difference between using Parallel::ForkManager to do
    > 20 tasks, and looping through system('script.pl &') 20 times?


    In that case (when the number of tasks to do is less than or equal to the
    maximum number of processes you want running at one time) the biggest
    difference is that ForkManager forks a copy of the currently running
    perl interpreter, which is very fast, while system("foo.pl&") will cause
    a brand new interpreter to start up, which is a lot slower than forking a
    copy.[1] (Although both of these are fast enough that it doesn't really
    matter if you are only doing it 20 times. And if you are doing it far
    more than 20 times, then the initial assumption of this paragraph is not
    met.)

    Of course, this also means that the new interpreter started with "system"
    is independent of the old one. This could be good (i.e. its name space
    is not contaminated, it doesn't corrupt the parent's database handles, its
    file locks are independent from the parent's etc.) but is at least as
    likely to be bad (it has to redo the "use"s and reparse any input that
    might already have been "use"d or parsed in the original interpreter, it
    doesn't readily inherit its parent's file locks, etc.)

    > i mean, i
    > see an advantage in that with ForkManager, when one processes dies,
    > another takes its place so you don't need to pre-ordain which process
    > does which work.


    But of course that only matters if there are more tasks to run than there
    are simultaneous processes you wish to allow. ForkManager is great if you
    want to run 400 tasks, but want to have only 20 going at any one time. If
    you only want to run 20 in the first place and will do them all at the same
    time, that removes a lot (but not all) of the benefit of using ForkManager.

    > but let's assume that each process has exactly the
    > same amount of work and processes that work with the same speed. would
    > ForkManager be faster?


    Probably, but only because simple forking is faster than "system".

    > Is there ever a case where multiple system()
    > calls is the answer?


    If the child processes spawned by ForkManager have to call "system" or
    the moral equivalent anyway (for example, because the task you need to
    run is an external executable, not Perl code, or it is Perl code but you
    need to run it in a standalone interpreter for some reason), then there
    is no reason to combine both ForkManager and system; just go with system.
    But again, this only applies if you don't need to throttle the number of
    simultaneous processes.

    If the tasks you need to do are very short and you need to do a very
    large number of them (400 is *not* a very large number), you should try
    to batch them up so that each process/thread does a bunch of them, as in
    the sketch below. This is a basic matter of overhead, and is largely true
    regardless of whether you use ForkManager, system, threads, or something
    else.
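
    A sketch of that batching with ForkManager, where do_one() stands in
    for the real per-task work:

    use POSIX qw(ceil);
    use Parallel::ForkManager;

    my @tasks   = 1 .. 10_000;
    my $workers = 20;
    my $per     = ceil(@tasks / $workers);

    my $pm = Parallel::ForkManager->new($workers);
    while (my @batch = splice @tasks, 0, $per) {
        $pm->start and next;
        do_one($_) for @batch;    # placeholder for the real task
        $pm->finish;
    }
    $pm->wait_all_children;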


    Xho


    [1] Actually, "system" will internally first fork a copy of the current
    interpreter, and then do an exec, which discards this forked copy and
    causes a new interpreter to be started. At least on Linux/Unix, which I
    suspect you are using, as I've never seen a 16-CPU Windows machine.
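
    In other words, roughly this, minus the shell (a simplified sketch;
    exec replaces the freshly forked copy with a brand new interpreter):

    my $pid = fork();
    if (defined $pid and $pid == 0) {
        exec 'perl', 'foo.pl';    # child: discard the copy, start a new perl
        die "exec failed: $!";
    }
    # parent: system() would waitpid here; with "&" the shell doesn't wait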

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Oct 5, 2005
    #20
