multiplexed IO suggestion?

Discussion in 'Perl Misc' started by Walter Roberson, Jan 5, 2004.

  1. I am hoping someone can point me to an existing module or give
    a good algorithm hint for this.

    I process large-ish text files, performing a bit of work on each line.
    The processing of each line can often be done mostly independently of
    the other lines, but it would be better if all the output for a given
    line were to precede the output for the next line. Some manner of
    internal buffering for that purpose would be fine.

    The per-line cpu time is short, by the way: but many lines call for DNS
    lookups, and I'd like to do some of those lookups in parallel while
    preferably maintaining the output order -as if- the lines were processed
    sequentially (assuming the processing of each is independent.)

    I started writing this up in terms of ordered queues of processing
    requests, and callbacks, but before I got very far on that approach, I
    realized that in theory I could instead do something akin to having a
    set of filehandles, barely change the existing code, and do some
    behind-the-scenes work so that what was printed to any one filehandle
    did not get sent to stdout until all the previous filehandles had
    terminated.


    Generalizing, I can see that this same mechanism could come up in other
    contexts -- that one might want to start threads that execute
    independently, with a thread-localized filehandle that could be written
    to by each thread, with the outputs automatically being demultiplexed
    into a single filehandle so that all output from threads started
    earlier appeared before the threads started later. Would someone have
    some ideas on good ways to accomplish this?


    While writing the previous paragraph, I realized that in the program
    I'm working on most immediately, that output is always the last thing
    the thread would do, after the "work" is done, so what I could do is,
    just before output, queue on a semaphore that will not be made
    available until the previous thread finishes. But I could imagine
    other programs with intermediate outputs.


    I will not be able to literally use perl threads; threaded
    perl fails one of its build tests on my IRIX system.
    ("known problem", "put here to stop you from putting into
    production.") I guess there's always forking.
    --
    Scintillate, scintillate, globule vivific
    Fain would I fathom thy nature specific.
    Loftily poised on ether capacious
    Strongly resembling a gem carbonaceous. -- Anon
    Walter Roberson, Jan 5, 2004
    #1

  2. Walter Roberson

    Sara Guest

    -cnrc.gc.ca (Walter Roberson) wrote in message news:<btbj5n$skb$>...
    > I am hoping someone can point me to an existing module or give
    > a good algorithm hint for this.
    >
    > I process large-ish text files, performing a bit of work on each line.
    > The processing of each line can often be done mostly independently of
    > the other lines, but it would be better if all the output for a given
    > line were to precede the output for the next line. Some manner of
    > internal buffering for that purpose would be fine.
    >
    > The per-line cpu time is short, by the way: but many lines call for DNS
    > lookups, and I'd like to do some of those lookups in parallel while
    > preferably maintaining the output order -as if- the lines were processed
    > sequentially (assuming the processing of each is independent.)
    >
    > I started writing this up in terms of ordered queues of processing
    > requests, and callbacks, but before I got very far on that approach, I
    > realized that in theory I could instead do something akin to having a
    > set of filehandles, barely change the existing code, and do some
    > behind-the-scenes work so that what was printed to any one filehandle
    > did not get sent to stdout until all the previous filehandles had
    > terminated.
    >
    >
    > Generalizing, I can see that this same mechanism could come up in other
    > contexts -- that one might want to start threads that execute
    > independently, with a thread-localized filehandle that could be written
    > to by each thread, with the outputs automatically being demultiplexed
    > into a single filehandle so that all output from threads started
    > earlier appeared before the threads started later. Would someone have
    > some ideas on good ways to accomplish this?
    >
    >
    > While writing the previous paragraph, I realized that in the program
    > I'm working on most immediately, that output is always the last thing
    > the thread would do, after the "work" is done, so what I could do is,
    > just before output, queue on a semaphore that will not be made
    > available until the previous thread finishes. But I could imagine
    > other programs with intermediate outputs.
    >
    >
    > I will not be able to literally use perl threads; threaded
    > perl fails one of its build tests on my IRIX system.
    > ("known problem", "put here to stop you from putting into
    > production.") I guess there's always forking.


    Hello Anon:

    Do you really mean "multiplexed" as in TDM or FDM or ?? Seems like an
    odd term to use - perhaps you mean multi-threaded or
    parallel-processed? "Multiplexing" is splitting up one task in either
    time or frequency domains. Sounds like you want to multitask?

    At any rate, you want to process lines in a file independently.
    Optimizing this type of requirement can be difficult as it depends on
    your hardware, swap space and overhead to do a swap, input size, etc.
    You might try some test cases, such as:

    split the file into quarters, then kick off 4 background tasks, and compare
    the clock time with your baseline (running the lines sequentially in one
    thread).

    If the clock time gets WORSE (which it could, given swapping and
    multiple-task overhead) then it's possible you might have nothing to
    gain by trying to "multiplex" this process. If it improves, then you'll
    need to start searching for an optimal number of threads, which would
    also depend on the size of your input file.

    Don't forget you'll create a lot of task-management overhead: some
    process will have to split up and monitor (and error-check/report) all
    of the threads, and another will have to reassemble the results from
    the individual tasks.

    Also, if you already have a multi-processor system with a contemporary
    O/S then these types of tasks may already be optimized for you at
    runtime.

    G
    Sara, Jan 5, 2004
    #2

  3. In article <>,
    Sara <> wrote:
    :Do you really mean "multiplexed" as in TDM or FDM or ?? Seems like an
    :odd term to use - perhaps you mean multi-threaded or
    :parallel-processed? "Multiplexing" is splitting up one task in either
    :time or frequency domains. Sounds like you want to multitask?

    TDM and FDM are two particular -kinds- of multiplexing. OED
    says:

    5. Telecomm. Designating or relating to a process or apparatus
    for simultaneously transmitting two or more signals or programmes
    (to be later separated and recovered) over a single wire or
    channel; designating a composite signal so formed.

    My task involves several "signals" that have to be
    "later separated and recovered", so "multiplex" was the proper term
    for the component I asked about. The several signals are generated
    through multitasking, but I have a sufficient handle on multitasking
    thanks.


    :Also, if you already have a multi-processor system with a contemporary
    :O/S then these types of tasks may already be optimized for you at
    :runtime.

    I have -several- multi-processor systems with contemporary O/S.
    On the fastest of them, the CPU time involved in my processing
    averages about 1 second per megabyte. CPU time is not the
    issue here: it's waiting for the results from remote DNS
    servers that is bogging the program down (e.g., over 8 hours
    clock time to process a 112 Mb file.) A single DNS lookup could take
    30 seconds clock time, so I will run several lookups in parallel
    (say, 100). As it is unpredictable which lookup will return first,
    it is very likely that the lookup results will come back
    out-of-order, so now I'm looking at mechanisms to re-order
    the output -- i.e., mechanisms to de-multiplex it. As long
    as the multiprocess management overhead is less than about 250
    times as expensive as the per-line CPU processing, I come out ahead.
    No point in having the system sitting idle waiting for
    remote systems to return DNS results.


    There are several options available to me. For example, I could
    just precede every output line with the input line number it
    corresponds to (plus a sequence number of relative output line
    number), and then in a post-processing pass sort based upon those
    sequence numbers, but I think there are better options available.
    --
    "There are three kinds of lies: lies, damn lies, and statistics."
    -- not Twain, perhaps Disraeli, first quoted by Leonard Courtney
    Walter Roberson, Jan 5, 2004
    #3
  4. Walter Roberson

    Guest

    -cnrc.gc.ca (Walter Roberson) wrote:
    > I am hoping someone can point me to an existing module or give
    > a good algorithm hint for this.
    >
    > I process large-ish text files, performing a bit of work on each line.
    > The processing of each line can often be done mostly independently of
    > the other lines, but it would be better if all the output for a given
    > line were to precede the output for the next line. Some manner of
    > internal buffering for that purpose would be fine.
    >
    > The per-line cpu time is short, by the way: but many lines call for DNS
    > lookups, and I'd like to do some of those lookups in parallel while
    > preferably maintaining the output order -as if- the lines were processed
    > sequentially (assuming the processing of each is independent.)


    Use the UNIX "split" util or a simple perl script to divide the input file
    into (say) four files. Run your existing script on the four files
    independently in parallel, then concatenate the output files (in the same
    order they were parcelled out).

    Or, have the master thread assign a serial number to each line, have the
    slave threads output this serial number in addition to the normal output,
    then sort on this number.


    > I started writing this up in terms of ordered queues of processing
    > requests, and callbacks, but before I got very far on that approach, I
    > realized that in theory I could instead do something akin to having a
    > set of filehandles, barely change the existing code, and do some
    > behind-the-scenes work so that what was printed to any one filehandle
    > did not get sent to stdout until all the previous filehandles had
    > terminated.
    >
    > Generalizing, I can see that this same mechanism could come up in other
    > contexts -- that one might want to start threads that execute
    > independently, with a thread-localized filehandle that could be written
    > to by each thread, with the outputs automatically being demultiplexed
    > into a single filehandle so that all output from threads started
    > earlier appeared before the threads started later. Would someone have
    > some ideas on good ways to accomplish this?


    I was going to suggest a heap with serial numbers might be the way to go,
    but I decided a simple hash (if made thread-safe) would work:

    while ($INPUT_STILL_GOING or $next_seq <= $last_seq) {
        # wait until the result for the next sequence number has arrived
        sleep 1 while !exists $hash{$next_seq};
        # emit it and move on, keeping the output in input order
        print delete $hash{$next_seq++}, "\n";
    }

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service New Rate! $9.95/Month 50GB
    , Jan 5, 2004
    #4
  5. Walter Roberson

    gnari Guest

    "Walter Roberson" <-cnrc.gc.ca> wrote in message
    news:btc936$98f$...
    >
    > I have -several- multi-processor systems with contemporary O/S.
    > On the fastest of them, the CPU time involved in my processing
    > averages about 1 second per megabyte. CPU time is not the
    > issue here: it's waiting for the results from remote DNS
    > servers that is bogging the program down (e.g., over 8 hours
    > clock time to process a 112 Mb file.) A single DNS lookup could take
    > 30 seconds clock time, so I will run several lookups in parallel
    > (say, 100). As it is unpredictable which lookup will return first,
    > it is very likely that the lookup results will come back
    > out-of-order, so now I'm looking at mechanisms to re-order
    > the output -- i.e., mechanisms to de-multiplex it. As long
    > as the multiprocess management overhead is less than about 250
    > times as expensive as the per-line CPU processing, I come out ahead.
    > No point in having the system sitting idle waiting for
    > remote systems to return DNS results.
    >
    >
    > There are several options available to me. For example, I could
    > just precede every output line with the input line number it
    > corresponds to (plus a sequence number of relative output line
    > number), and then in a post-processing pass sort based upon those
    > sequence numbers, but I think there are better options available.


    maybe it would be advantageous to separate the DNS lookups totally from
    your main process:
    a) perform a separate pass over the input, retrieving addresses to look up
    b) sort, uniq and split the list of addresses into, say, 100 lists
    c) feed those to 100 identical DNS-lookup processes, creating a local
    lookup database.
    d) now run your original process, but using the local database
    instead of DNS lookups (a rough sketch of a)-c) appears below)

    I am not saying this would be more efficient, but it should simplify
    your coding, and avoid unnecessarily complicating your main input
    processing to keep the output in order.

    also, the database could be kept between runs if there is a chance of
    repeated addresses. then you would filter out known addresses in b)
    before the split.
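
    A rough, untested sketch of steps a) through c); extract_addrs() and the
    per-child "lookup.N" files are illustrative assumptions, not part of the
    suggestion above:

    use strict;
    use warnings;
    use Socket qw(AF_INET inet_aton);

    # a) one pass to collect the unique addresses needing lookup
    my %addrs;
    while (my $line = <>) {
        $addrs{$_} = 1 for extract_addrs($line);
    }

    # b) + c) deal the sorted addresses out to N children, each appending
    # "address<TAB>name" lines to its own piece of the local lookup database
    my $n_kids = 100;
    my @addrs  = sort keys %addrs;
    for my $kid (0 .. $n_kids - 1) {
        defined(my $pid = fork) or die "fork: $!";
        next if $pid;                               # parent keeps spawning
        open my $db, '>>', "lookup.$kid" or die "open: $!";
        for my $i (grep { $_ % $n_kids == $kid } 0 .. $#addrs) {
            my $name = gethostbyaddr(inet_aton($addrs[$i]), AF_INET) || '';
            print {$db} "$addrs[$i]\t$name\n";
        }
        exit;
    }
    1 while wait != -1;    # step d) is then an ordinary run against the files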

    gnari
    gnari, Jan 5, 2004
    #5
  6. Walter Roberson

    Ben Morrow Guest

    -cnrc.gc.ca (Walter Roberson) wrote:
    > While writing the previous paragraph, I realized that in the program
    > I'm working on most immediately, that output is always the last thing
    > the thread would do, after the "work" is done, so what I could do is,
    > just before output, queue on a semaphore that will not be made
    > available until the previous thread finishes. But I could imagine
    > other programs with intermediate outputs.


    What about (untested):

    use IO::Select;

    my @chunks = split_input_into_chunks();
    my (@FHs, %bufs);

    for my $chunk (@chunks) {
        # fork one child per chunk; its STDOUT comes back to us on $FH
        defined(my $pid = open my $FH, "-|") or die "can't fork: $!";
        unless ($pid) {
            process_chunk($chunk);
            exit;
        }
        push @FHs, $FH;        # keep the handles in chunk order
        $bufs{$FH} = "";
    }

    my $sel = IO::Select->new(@FHs);

    while (my @to_read = $sel->can_read) {
        for my $fh (@to_read) {
            # append to this child's buffer; remove the handle at EOF
            sysread($fh, $bufs{$fh}, 2048, length $bufs{$fh})
                or $sel->remove($fh);

            # only the oldest still-pending child may write through to STDOUT
            if ($fh == $FHs[0]) {
                print STDOUT $bufs{$fh};
                $bufs{$fh} = "";
                $sel->exists($fh) or shift @FHs;
            }
        }
    }

    # flush whatever the later children had buffered, in start order
    print STDOUT $bufs{$_} for @FHs;

    ? Obviously it needs error checking added.

    Ben

    --
    Every twenty-four hours about 34k children die from the effects of poverty.
    Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
    that image, that ghastly, grey-billowing, double-barrelled fall, repeated
    twelve times every day. Full of children. [Iain Banks]
    Ben Morrow, Jan 5, 2004
    #6
  7. In article <btcejn$23g$>,
    Ben Morrow <> wrote:
    :What about (untested):

    :use IO::Select;
    [...]

    Thanks, Ben, I'll go through that in more detail when I'm a bit
    more awake.

    One possibility I was thinking of was to introduce a new object
    inheriting from IO, in which a "write" to any given handle appended
    ( e.g., .= ) to the end of a class variable, to be released to the
    master file-handle when all the previous objects had closed.

    Hmmm, guess that wouldn't really be a class variable in a pure
    sense: it should really be something parameterized by the master
    file-handle. Dual-level objects perhaps, create a multiplexer
    master and then create ordered slaves off of that master.
    (Cue the memories of SVR3's "clone" device mechanism!)
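
    A minimal, untested sketch of that idea as a tied handle class, kept to a
    single process; the class name, the package-level buffers and the
    release-on-close policy are illustrative assumptions:

    package OrderedMux;
    use strict;
    use warnings;

    my @order;     # tied handles in creation order, oldest first
    my %buf;       # pending output per handle
    my $target;    # the real handle everything eventually funnels into

    sub TIEHANDLE {
        my ($class, $real_fh) = @_;
        $target ||= $real_fh;
        my $self = bless { closed => 0 }, $class;
        push @order, $self;
        $buf{$self} = '';
        return $self;
    }

    sub PRINT {
        my $self = shift;
        if (@order and $order[0] == $self) {
            print {$target} @_;              # oldest handle writes straight through
        } else {
            $buf{$self} .= join '', @_;      # later handles buffer
        }
    }

    sub CLOSE {
        my $self = shift;
        $self->{closed} = 1;
        # release completed handles in order, then let the new oldest
        # handle flush whatever it had buffered so far
        while (@order and $order[0]{closed}) {
            my $done = shift @order;
            print {$target} delete $buf{$done};
        }
        if (@order) {
            print {$target} $buf{$order[0]};
            $buf{$order[0]} = '';
        }
        return 1;
    }

    package main;
    tie *H1, 'OrderedMux', \*STDOUT;
    tie *H2, 'OrderedMux', \*STDOUT;
    print H2 "second worker's output\n";     # held back
    print H1 "first worker's output\n";      # goes straight out
    close H1;                                # releases H2's buffered output
    close H2;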


    --
    What is "The Ultimate Meme"? Would it, like Monty Python's
    "The World's Funniest Joke", lead to the deaths of everyone who
    encountered it? Ideas *have* lead to the destruction of entire cultures.
    -- A Child's Garden Of Memes
    Walter Roberson, Jan 5, 2004
    #7
  8. Walter Roberson

    Uri Guttman Guest


    >>>>> "WR" == Walter Roberson <-cnrc.gc.ca> writes:


    WR> My task involves several "signals" that have to be
    WR> "later separated and recovered", so "multiplex" was the proper term
    WR> for the component I asked about. The several signals are generated
    WR> through multitasking, but I have a sufficient handle on multitasking
    WR> thanks.

    WR> I have -several- multi-processor systems with contemporary O/S.
    WR> On the fastest of them, the CPU time involved in my processing
    WR> averages about 1 second per megabyte. CPU time is not the
    WR> issue here: it's waiting for the results from remote DNS
    WR> servers that is bogging the program down (e.g., over 8 hours
    WR> clock time to process a 112 Mb file.) A single DNS lookup could take
    WR> 30 seconds clock time, so I will run several lookups in parallel
    WR> (say, 100). As it is unpredictable which lookup will return first,
    WR> it is very likely that the lookup results will come back
    WR> out-of-order, so now I'm looking at mechanisms to re-order
    WR> the output -- i.e., mechanisms to de-multiplex it. As long
    WR> as the multiprocess management overhead is less than about 250
    WR> times as expensive as the per-line CPU processing, I come out ahead.
    WR> No point in having the system sitting idle waiting for
    WR> remote systems to return DNS results.

    i will have to recommend you check out stem. it can do all you want and
    with much less effort than you are doing now. stem is a network
    application toolkit and framework and it supports message passing and
    events. it is easy to create connected processes (forked or otherwise)
    and some can do the blocking DNS lookups and others the main crunching
    of the file. creating the modules you need under stem for your
    application would be very simple. a single method that takes in a
    message and does a DNS lookup (gethostbyname?) would be all you need for
    that service. creating your complete application is also simple as it
    won't require any network programming, just configuring the desired
    components and running them under stem.

    you can check out stem at stemsystems.com. feel free to ask me any
    questions or if you need development or technical support, i am
    available.

    thanx,

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Jan 5, 2004
    #8
  9. Walter Roberson

    Rocco Caputo Guest

    On 5 Jan 2004 11:52:23 GMT, Walter Roberson wrote:
    > I am hoping someone can point me to an existing module or give
    > a good algorithm hint for this.
    >
    > I process large-ish text files, performing a bit of work on each line.
    > The processing of each line can often be done mostly independently of
    > the other lines, but it would be better if all the output for a given
    > line were to precede the output for the next line. Some manner of
    > internal buffering for that purpose would be fine.
    >
    > The per-line cpu time is short, by the way: but many lines call for DNS
    > lookups, and I'd like to do some of those lookups in parallel while
    > preferably maintaining the output order -as if- the lines were processed
    > sequentially (assuming the processing of each is independent.)


    You might want to look at POE, on the CPAN and at http://poe.perl.org/.
    Specifically, the cookbook at http://poe.perl.org/?POE_Cookbook has
    examples of asynchronous DNS lookups using POE::Component::Client::DNS.

    Matching the input order can be done simply by tagging each line of
    output with the input line number

    printf "%9d %s\n", $input_line_number, $output_text;

    then sorting the results and cutting off the first ten characters of
    each line.

    ./foo | sort -n | cut -c11- > output.text

    ... which isn't Perl, but it beats writing code. :)

    > I started writing this up in terms of ordered queues of processing
    > requests, and callbacks, but before I got very far on that approach, I
    > realized that in theory I could instead do something akin to having a
    > set of filehandles, barely change the existing code, and do some
    > behind-the-scenes work so that what was printed to any one filehandle
    > did not get sent to stdout until all the previous filehandles had
    > terminated.


    That seems like an awful lot of work. Consider this alternative:

    Maintain an output window, and its offset. Perl starts counting input
    and output records at 1, so we do too.

    my $output_offset = 1;
    my @window;

    Here's a callback to process a record asynchronously. We use the return
    value to determine when your input source is exhausted.

    sub process_one_more {
        my $next_record = <INPUT>;
        return unless defined $next_record;

        chomp $next_record;
        process_record($., $next_record);
        return 1;
    }

    Theoretically, you call process_one_more() until you have enough records
    being processed in parallel. At that point you stop calling it until
    one of the parallel records is done.

    Note that we're passing the input record number along with the record.
    It's used in output() to reconstitute the file's original order.

    When a record is done being processed, the results are sent to output,
    and we try to process a new record to replace the finished one.

    output($input_record_number, $output_record);
    process_one_more();

    The output function adds the output record to @window, then it decides
    whether some records can be flushed to disk.

    sub output {
        my ($input_record_number, $output_record) = @_;

        $window[$input_record_number - $output_offset] = $output_record;

        while (@window and defined $window[0]) {
            print OUTPUT shift @window;
            $output_offset++;
        }
    }

    Assume there are four records, which become ready in the order 4, 2, 1,
    and 3.

    Record 4 will be entered into position 3 of the window, because
    $output_offset (1) is subtracted from it. The window looks like

    $output_offset == 1

    0: undef
    1: undef
    2: undef
    3: record 4

    Likewise, records 2 and 1 are entered into positions 1 and 0,
    respectively.

    $output_offset == 1

    0: record 1
    1: record 2
    2: undef
    3: record 4

    Position 0 is now defined, so the while() loop shifts that record off
    the @window, writes it to OUTPUT, and adjusts the output offset.

    $output_offset == 2

    0: record 2
    1: undef
    2: record 4

    Position 0 is still defined, so record 2 is shifted off and flushed.

    $output_offset == 3

    0: undef
    1: record 4

    Record 3 will be entered into position 0 of the window, because
    $output_offset is subtracted from its record number.

    0: record 3
    1: record 4

    The while() loop iterates twice, writing out records 3 and 4, and
    leaving $output_offset at 5.

    etc.

    > Generalizing, I can see that this same mechanism could come up in other
    > contexts -- that one might want to start threads that execute
    > independently, with a thread-localized filehandle that could be written
    > to by each thread, with the outputs automatically being demultiplexed
    > into a single filehandle so that all output from threads started
    > earlier appeared before the threads started later. Would someone have
    > some ideas on good ways to accomplish this?
    >
    > While writing the previous paragraph, I realized that in the program
    > I'm working on most immediately, that output is always the last thing
    > the thread would do, after the "work" is done, so what I could do is,
    > just before output, queue on a semaphore that will not be made
    > available until the previous thread finishes. But I could imagine
    > other programs with intermediate outputs.


    I think you're thinking too hard. :)

    > I will not be able to literally use perl threads; threaded
    > perl fails one of its build tests on my IRIX system.
    > ("known problem", "put here to stop you from putting into
    > production.") I guess there's always forking.


    POE doesn't require threads or forking to handle asynchronous DNS
    requests. If you run into problems installing it on IRIX, please e-mail
    them (and any helpful output) to the output of

    perl -wle '$_=q();tr[a-z][n-za-m];print'

    Thanks!

    --
    Rocco Caputo - - http://poe.perl.org/
    Rocco Caputo, Jan 5, 2004
    #9
  10. In message <btbj5n$skb$>, Walter Roberson
    <-cnrc.gc.ca> writes
    >The per-line cpu time is short, by the way: but many lines call for DNS
    >lookups, and I'd like to do some of those lookups in parallel while
    >preferably maintaining the output order -as if- the lines were processed
    >sequentially (assuming the processing of each is independent.)
    >
    >I started writing this up in terms of ordered queues of processing
    >requests, and callbacks, but before I got very far on that approach, I
    >realized that in theory I could instead do something akin to having a
    >set of filehandles, barely change the existing code, and do some
    >behind-the-scenes work so that what was printed to any one filehandle
    >did not get sent to stdout until all the previous filehandles had
    >terminated.


    Are the text output lines of constant length or similar length? You
    could record the current write point in the output file (call it WP),
    write a line of n spaces as a placeholder and spawn the child process,
    giving it WP (and the filehandle) as a parameter.

    You then write another line of n spaces and spawn the next child and so
    on.

    Each child process does its stuff and seeks to WP before overwriting the
    n spaces with its n characters of real data.

    Even if the lines aren't exactly the same length, you could do the same
    thing but set n to allow for the maximum and then do a single
    post-process scan at the end to trim trailing blanks. (Or just leave the
    trailing blanks.)
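
    An untested sketch of that placeholder-and-seek idea; process_line(), the
    slot width and the output file name are illustrative assumptions, and it
    forks one child per line with no cap on concurrency to keep the sketch
    short. Each child reopens the file so its seek doesn't disturb the
    parent's write position:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);
    use IO::Handle;

    my $n = 256;                                   # maximum output line length
    open my $out, '>', 'output.txt' or die "open: $!";
    $out->autoflush(1);                            # placeholders must hit disk first

    while (my $line = <STDIN>) {
        my $wp = tell $out;                        # this line's slot (WP)
        print {$out} ' ' x $n, "\n";               # reserve it with blanks

        defined(my $pid = fork) or die "fork: $!";
        next if $pid;                              # parent: on to the next line

        # child: do the slow work, then overwrite the reserved slot in place
        my $result = process_line($line);
        open my $slot, '+<', 'output.txt' or die "reopen: $!";
        seek $slot, $wp, SEEK_SET or die "seek: $!";
        print {$slot} substr($result, 0, $n);      # never spill past the slot
        close $slot;
        exit;
    }

    1 while wait != -1;                            # wait for all the children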

    Regards,
    --
    Bruce Horrocks
    Surrey
    England
    <firstname>@<surname>.plus.com -- fix the obvious for email
    Bruce Horrocks, Jan 5, 2004
    #10
  11. Walter Roberson

    Anno Siegel Guest

    gnari <> wrote in comp.lang.perl.misc:

    [DNS lookup]

    > maybe it would be advantageous to separate the DNS lookups totally from
    > your main process:
    > a) perform a separate pass over the input, retrieving addresses to look up
    > b) sort, uniq and split the list of addresses into, say, 100 lists
    > c) feed those to 100 identical DNS-lookup processes, creating a local
    > lookup database.
    > d) now run your original process, but using the local database
    > instead of DNS-lookups


    The strategy seems sound, but I'm a little worried by the number of
    parallel lookups suggested, here and elsewhere in the thread. I'm
    no DNS expert, but there must be a limit to how many lookups the local
    system can deal with in a given time. Once you reach that limit,
    additional parallelization won't gain anything. The point of anti-social
    behavior may be reached before that. Just from a gut feeling, 5-10
    parallel processes seems more realistic than 100.

    Anno
    Anno Siegel, Jan 7, 2004
    #11
  12. On 5 Jan 2004 11:52:23 GMT, -cnrc.gc.ca (Walter
    Roberson) wrote:

    >The per-line cpu time is short, by the way: but many lines call for DNS
    >lookups, and I'd like to do some of those lookups in parallel while
    >preferably maintaining the output order -as if- the lines were processed
    >sequentially (assuming the processing of each is independent.)


    I haven't read carefully all the other posts in this thread, but it
    seems to me that no one has mentioned "open my $fh, '-|', command"
    yet: I am well aware that mine may be a naive suggestion to you, but
    it may well be worth giving it a try...

    More precisely, to preserve the order of the input lines, you could push
    onto an array, used as a FIFO/buffer, either the already-processed line
    (when there's no lengthy work to be done) or an open()ed filehandle
    running the lengthy process otherwise.
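
    A rough, untested sketch of that FIFO idea; needs_lookup(), resolve_line()
    and the external "resolve-one-line" command are illustrative assumptions:

    use strict;
    use warnings;

    my @fifo;                 # each entry: a finished line, or a filehandle
    my $max_pending = 100;    # cap on how many slow lines run at once

    while (my $line = <STDIN>) {
        if (needs_lookup($line)) {
            # slow path: a child works on the line; its answer comes back on $fh
            open my $fh, '-|', 'resolve-one-line', $line or die "open: $!";
            push @fifo, $fh;
        } else {
            push @fifo, resolve_line($line);           # fast path: already done
        }
        flush_front() while @fifo > $max_pending;
    }
    flush_front() while @fifo;

    sub flush_front {
        my $head = shift @fifo;
        if (ref $head) {      # a filehandle: block until this child's output arrives
            print <$head>;
            close $head;
        } else {              # a plain string, ready to go
            print $head;
        }
    }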


    Michele
    --
    # This prints: Just another Perl hacker,
    seek DATA,15,0 and print q... <DATA>;
    __END__
    Michele Dondi, Jan 8, 2004
    #12
  13. In article <btce6j$fce$>, gnari <> wrote:
    :"Walter Roberson" <-cnrc.gc.ca> wrote in message
    :news:btc936$98f$...

    :> I have -several- multi-processor systems with contemporary O/S.
    :> On the fastest of them, the CPU time involved in my processing
    :> averages about 1 second per megabyte. CPU time is not the
    :> issue here: it's waiting for the results from remote DNS
    :> servers that is bogging the program down (e.g., over 8 hours
    :> clock time to process a 112 Mb file.)

    :maybe it would be advantageous to separate the DNS lookups totally from
    :your main process:
    :a) perform a separate pass over the input, retrieving addresses to look up

    Thanks. What I've done in the meantime is use Storable to save
    and restore the lookup caches, so that later passes over the same
    (or similar) data will execute more quickly. It isn't perfect
    because it can still take a long time the first time around, and
    because I'm not recording any lookup expiry information. I would
    perhaps be better off using a nameserver that keeps a persistent
    cache, but my investigation of bind9 shows that bind9's dump and
    restore features are intended only for bind9 developers.
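
    A bare-bones, untested illustration of that save/restore cache (no expiry,
    as noted); the cache file name and dns_lookup() are assumptions:

    use strict;
    use warnings;
    use Storable qw(retrieve nstore);

    my $cache_file = 'dns-cache.stor';
    my %dns_cache  = -e $cache_file ? %{ retrieve($cache_file) } : ();

    sub cached_lookup {
        my ($addr) = @_;
        $dns_cache{$addr} = dns_lookup($addr)      # only hit the network on a miss
            unless exists $dns_cache{$addr};
        return $dns_cache{$addr};
    }

    END { nstore(\%dns_cache, $cache_file) }       # keep the cache for the next run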

    I am working on moving the lookups into threads, and am making
    some progress but there is still much to do on it.
    --
    100% of all human deaths occur within 100 miles of Earth.
    Walter Roberson, Jan 9, 2004
    #13
  14. In article <bth0tm$alg$-Berlin.DE>,
    Anno Siegel <-berlin.de> wrote:
    :The strategy seems sound, but I'm a little worried by the number of
    :parallel lookups suggested, here and elsewhere in the thread. I'm
    :no DNS expert, but there must be a limit to how many lookups the local
    :system can deal with in a given time. Once you reach that limit,
    :additional parallelization won't gain anything. The point of anti-social
    :behavior may be reached before that.

    True.

    :Just from a gut feeling, 5-10
    :parallel processes seems more realistic than 100.

    In this particular case, the kernels are engineered so that
    (say) 100 parallel lookups shouldn't be a significant problem.
    They were designed from the ground up as high-performance
    multiprocessing machines, complete with distributed signal handling
    and distributed system call handling. But you do raise a good
    point, so I will keep an eye on it during my testing.
    --
    Positrons can be described as electrons traveling backwards in time.
    Certainly many Usenet arguments about the past become clearer when they
    are re-interpreted as uncertainty about the future.
    -- Walter Roberson
    Walter Roberson, Jan 9, 2004
    #14
  15. In article <bth0tm$alg$-Berlin.DE>,
    Anno Siegel <-berlin.de> wrote:
    :The strategy seems sound, but I'm a little worried by the number of
    :parallel lookups suggested, here and elsewhere in the thread. I'm
    :no DNS expert, but there must be a limit to how many lookups the local
    :system can deal with in a given time. Once you reach that limit,
    :additional parallelization won't gain anything. The point of anti-social
    :behavior may be reached before that. Just from a gut feeling, 5-10
    :parallel processes seems more realistic than 100.

    I hit the limit rather sooner than that -- DNS lookups in threads
    were noticeably slower than sequential lookups, even with only
    one lookup thread active at a time.

    I found something in perlthrtut that I believe accounts for this.
    In the discussion of private vs shared variables, the documentation
    notes that at the time the thread is created, private copies are
    taken of all private variables, so that the thread can potentially
    modify them in isolation. If so, then the slowness I saw would
    be due to the time spent cloning my address space. My small
    threads test program is not particularly slow, but the data in
    my real program is non-trivial.

    I am revising the program to be more aware of when it needs to spin
    off threads and when it doesn't (particularly important when the
    DNS data is in the cache already.) At some point, though, I'll probably
    have to revise further to spin off some worker threads very early
    on (before I load my data) and then distribute work to them as needed.

    I guess I was expecting "copy on write" or other light-weight semantics.
    Or expecting a way to be able to spin off a thread with a very limited
    data context... sort of closure-like.
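
    A quick, untested way to see that cloning cost for yourself; the hash size
    is an arbitrary assumption:

    use strict;
    use warnings;
    use threads;
    use Time::HiRes qw(time);

    # unshared data that every new thread must clone at creation time
    my %big = map { $_ => 'x' x 100 } 1 .. 50_000;

    my $t0 = time;
    threads->create(sub { 1 })->join;
    printf "thread create+join with the big hash around: %.3fs\n", time - $t0;
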
    --
    Sub-millibarn resolution bio-hyperdimensional plasmatic space
    polyimaging is just around the corner. -- Corry Lee Smith
    Walter Roberson, Jan 11, 2004
    #15
  16. Walter Roberson

    Ben Morrow Guest

    -cnrc.gc.ca (Walter Roberson) wrote:
    > I guess I was expecting "copy on write" or other light-weight semantics.
    > Or expecting a way to be able to spin off a thread with a very limited
    > data context... sort of closure-like.


    perldoc threads
    | TODO
    | The current implementation of threads has been an attempt to get a cor-
    | rect threading system working that could be built on, and optimized, in
    | newer versions of perl.
    |
    | Currently the overhead of creating a thread is rather large, also the
    | cost of returning values can be large. These are areas were there most
    | likely will be work done to optimize what data that needs to be cloned.

    Ben

    --
    I've seen things you people wouldn't believe: attack ships on fire off the
    shoulder of Orion; I've watched C-beams glitter in the darkness near the
    Tannhauser Gate. All these moments will be lost, in time, like tears in rain.
    Time to die. |-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
    Ben Morrow, Jan 11, 2004
    #16
  17. In article <btrg4p$974$>,
    Walter Roberson <-cnrc.gc.ca> wrote:
    |I am revising the program to be more aware of when it needs to spin
    |off threads and when it doesn't (particularly important when the
    |DNS data is in the cache already.) At some point, though, I'll probably
    |have to revise further to spin off some worker threads very early
    |on (before I load my data) and then distribute work to them as needed.

    I have worked on these parallel DNS lookups [for log-file analysis] for
    quite a few hours now, and I have learned some lessons:

    1) perl 5.8.2 thread start-up time depends heavily upon the size of
    your unshared variables. This is because ithreads copy all unshared
    variables that are of potential use by the new thread -- there is,
    at present, no copy-on-write semantics.

    2) It seems it also can take a fair time to copy shared variables -- just
    not *as* much time.

    3) Because of the copying behaviour, you should start up your threads
    after all the variables they will need have been set, but before
    large variables they don't need have been created. For example, one
    of my large hashes was not needed by the threads, so I postponed
    initializing the hash.

    4) You can only use the :shared syntax for 'my' variables. This makes
    it harder to selectively create shared variables depending on whether
    threads are present or not (but you can use the share() function.)

    5) Using the share function on arrays or hashes wipes out the previous
    values stored. Using the share function on a scalar preserves the existing
    value.

    6) You can't lock a hash element. This is documented, but documented
    in terms of $hashref->{key} so it wasn't clear that locking $hash{key} would
    not work.

    7) Avoid creating new threads. This may mean introducing new functions
    that do the work if it can be done locally and return a special status
    if you really do need another thread. Depends how much you want your
    utility functions to know about your thread management strategy.

    8) If you need to have different thread results come out in a
    particular order, you can use a semaphore and at the end of each
    step 'up' by the [sequential] identifier of the next in line. It can
    be a bit tricky to get this working properly... and it turns out to
    be kind of slow.

    8a) Instead, another way that works much much faster is to drop the
    result into a shared hash keyed by the sequence identifier and have a
    mainline harvesting routine that pulls the results out in sequence.

    9) Avoid creating new threads, da.*it! Try a pool of threads and use
    Thread::Queue to enqueue work for them -- that is *much* faster than
    creating a new thread if you have large data structures sitting around
    [see point 1.] I held off trying this approach for rather some time
    thinking it was going to be hard to implement, but it turned out to be
    relatively easy. (A sketch of this pool-plus-harvest arrangement follows
    the list below.)

    10) Thread::Queue won't handle refs or anything more than simple variables.
    And it only allows a single item to be dequeued at a time -- not much
    fun if you want to pass a bunch of parameters. Thread::Queue::Any
    from CPAN handles complicated structures and is much more sensible
    about groups of parameters, but it uses Storable to move the data around,
    and sometimes it gets confused when it tries to thaw the data. If it
    does get confused, then you will probably get an error message about
    being unable to start up a thread -- I suspect it is referring to an
    internal thread to handle an 'eval'.

    11) perl -d gets confused and very messed up if you have multiple
    threads running. But maybe I just haven't learned how to debug threads
    yet.

    12) lock so that you don't have shared hashes being written to
    by multiple threads. Even if the different threads are writing to
    different elements, you will end up with a corrupt hash. When I
    accidentally turned my locking off, I ended up with strange results.

    13) Detached threads take their own sweet time exiting, and there
    is nothing you can do about it except sleep() until they are gone.

    14) If you have a lot of detached threads, your program will probably
    just die for no apparent reason. If you don't detach your threads, then
    they are going to sit around using resources until you harvest them.
    Consider having a routine that periodically join()'s to older threads
    so as to release the resources. This is tricky because there is no
    provided mechanism to tell whether a thread is active or waiting to
    be shutdown. Consider having the thread notify its completion by
    mechanisms such as in 8a).

    15) Cache the results of your threads if there is a realistic chance
    you will need the result again. (See point 7.) The speedup can be
    substantial.

    16) Unless you put a lot of work into optimizing thread usage, your
    threaded program might be noticeably slower than proceeding non-threaded.
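
    An untested sketch of points 8a) and 9) together: a fixed pool of worker
    threads fed via Thread::Queue (plain scalars only, per point 10), with
    results dropped into a locked shared hash and harvested in input order.
    The pool size, the tab-joined job format and dns_work() are illustrative
    assumptions, and queueing the whole input before harvesting is a
    simplification:

    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $work = Thread::Queue->new;
    my %done :shared;

    # start the workers before any large data structures exist (point 3)
    my @pool = map { threads->create(\&worker) } 1 .. 10;

    sub worker {
        while (defined(my $job = $work->dequeue)) {
            my ($seq, $addr) = split /\t/, $job, 2;    # simple scalars only (point 10)
            my $result = dns_work($addr);
            lock %done;                                # point 12: always lock
            $done{$seq} = $result;
        }
    }

    # mainline: tag each line with a sequence number and queue it ...
    my $seq = 0;
    while (my $line = <STDIN>) {
        chomp $line;
        $work->enqueue(join "\t", $seq++, $line);
    }
    $work->enqueue(undef) for @pool;                   # tell the workers to quit

    # ... then harvest the results back out in sequence order (point 8a)
    my $next = 0;
    while ($next < $seq) {
        my $r;
        {
            lock %done;
            $r = delete $done{$next};
        }
        if (defined $r) { print "$r\n"; $next++ }
        else            { threads->yield }             # crude wait; fine for a sketch
    }
    $_->join for @pool;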

    In the case of my log-file scanner, I am now at the point where
    the non-threaded version is -usually- faster, but sometimes the
    threaded version is faster, depending on system load, how much DNS
    has been cached by the system, and so on. The variations in runtimes
    of the non-threaded versions can exceed the runtime of the
    threaded version.

    Next step would probably be to try it on some really big input files;
    I'm no longer using Thread::Queue::Any, so I'm expecting the program
    now will not die after 400000 input lines or so.
    --
    "No one has the right to destroy another person's belief by
    demanding empirical evidence." -- Ann Landers
    Walter Roberson, Jan 20, 2004
    #17
  18. In article <buiu5l$33h$>,
    Walter Roberson <-cnrc.gc.ca> wrote:
    [lessons learned about perl threads]

    17) If you try to use Storable to save a shared variable, you will
    just get a string that makes reference to Tied, and the actual contents
    will not be saved. Workaround: copy the data into a temporary variable
    and save that temporary variable.
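
    A tiny, untested illustration of that workaround; the variable contents and
    file name are arbitrary examples:

    use strict;
    use warnings;
    use threads::shared;
    use Storable qw(nstore);

    my %shared_cache :shared = (example_host => 'example result');

    my %plain = %shared_cache;         # shallow copy into an ordinary, untied hash
    nstore(\%plain, 'cache.stor');     # freezing %shared_cache directly stores only the tie
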
    --
    Warning: potentially contains traces of nuts.
    Walter Roberson, Jan 21, 2004
    #18
