Perl multithreading performance

Discussion in 'Perl Misc' started by dniq00@gmail.com, Aug 27, 2008.

  1. Guest

    Hello, oh almighty perl gurus!

    I'm trying to implement multithreaded processing for the humongous
    amount of logs that I'm currently processing in 1 process on a 4-CPU
    server.

    What the script does is, for each line, check whether it contains a
    GET request, and if it does, go through a list of pre-compiled
    regular expressions, trying to find a matching one. Once a match is
    found, it uses another, somewhat more complex regexp associated with
    the found match to extract data from the line. I have split this
    into two separate matches because only about 30% of all lines will
    match, and I don't want to run that complex extraction regexp on all
    the lines I know won't match. The goal is to count how many lines
    match each specific regexp; the end result is built as a hash, with
    the data extracted by the second regexp used as hash keys and the
    number of matches as values.
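
    Schematically, the matching part looks something like this (the
    patterns here are simplified examples, not the real ones):

    Code
    # Each entry pairs a cheap filter regexp with the more complex
    # extraction regexp that runs only on filter hits.
    my @matchers = (
        { filter => qr{GET /search}, extract => qr{GET /search\?q=([^&\s]+)} },
        { filter => qr{GET /item},   extract => qr{GET /item/(\d+)} },
    );

    my %counts;
    while ( my $line = <> ) {
        next if index( $line, 'GET' ) < 0;      # quick reject
        foreach my $m ( @matchers ) {
            next unless $line =~ $m->{filter};  # cheap match first
            $counts{$1}++ if $line =~ $m->{extract};
            last;
        }
    }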

    Anyway, currently all this is done in a single process, which parses
    approx. 30000 lines per second. The CPU usage for this process is
    100%, so the bottleneck is in the parsing part.

    I have changed the script to use threads + threads::shared +
    Thread::Queue. I read data from logs like this:

    Code
    until( $no_more_data ) {
        my @buffer;
        foreach ( 1 .. $buffer_size ) {
            if( my $line = <> ) {
                push( @buffer, $line );
            } else {
                $no_more_data = 1;
                $q_in->enqueue( \@buffer );
                # one undef per worker, as an end-of-input marker
                foreach ( 1 .. $cpu_count ) {
                    $q_in->enqueue( undef );
                }
                last;
            }
        }
        $q_in->enqueue( \@buffer ) unless $no_more_data;
    }

    Then, I create $cpu_count threads, each of which does something like this:

    Code
    sub parser {
        my $counters = {};
        # dequeue() blocks until a buffer arrives; the undef markers
        # enqueued above end the loop
        while( my $buffer = $q_in->dequeue() ) {
            foreach my $line ( @{ $buffer } ) {
                # do its thing
            }
        }
        return $counters;
    }

    Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
    faster than the single-process script, consumes about 2-3 times more
    memory, and uses about as many times more CPU.

    I've also tried abandoning Thread::Queue and just using
    threads::shared with a lock/cond_wait/cond_signal combination,
    without much success.

    I've tried playing with $cpu_count and $buf_size, and found that
    raising $buf_size above 1000 doesn't make much difference, while
    $cpu_count > 2 actually makes things a lot worse.

    Any ideas why in the world it's so slow? I did some research and
    couldn't find much info, other than that the way I'm doing it is
    pretty much the way it should be done, unless I'm missing something...

    Hope anybody can enlighten me...

    THANKS!
     
    , Aug 27, 2008
    #1

  2. On Wed, 27 Aug 2008 12:59:36 -0700, dniq00 wrote:
    >
    > Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
    > faster than the single-process script, consumes about 2-3 times more
    > memory, and uses about as many times more CPU.
    >
    > I've also tried abandoning Thread::Queue and just using threads::shared
    > with a lock/cond_wait/cond_signal combination, without much success.
    >
    > I've tried playing with $cpu_count and $buf_size, and found that raising
    > $buf_size above 1000 doesn't make much difference, while $cpu_count > 2
    > actually makes things a lot worse.
    >
    > Any ideas why in the world it's so slow? I did some research and
    > couldn't find much info, other than that the way I'm doing it is pretty
    > much the way it should be done, unless I'm missing something...
    >
    > Hope anybody can enlighten me...
    >
    > THANKS!


    The speed of perl's threading depends on how much you share between
    threads. Sharing the lines before processing them can become a
    bottleneck, and I suspect that's the problem in your case. You probably
    want to divide the work first, and only use shared resources to report
    back the results. Making a program scale over multiple processors isn't
    easy. Sean O'Rourke's entry in the Wide Finder benchmark
    (http://www.cs.ucsd.edu/~sorourke/wf.pl) offers an interesting approach
    to this, though it isn't exactly optimized for readability.
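
    For example, a fork-based split along those lines might look like this
    (just a sketch; Storable is one possible way to ship each worker's
    counts back to the parent):

    Code
    use strict;
    use warnings;
    use Storable qw(freeze thaw);

    my @pipes;
    foreach my $file ( @ARGV ) {        # e.g. one worker per log file
        pipe( my $reader, my $writer ) or die "pipe: $!";
        defined( my $pid = fork() )    or die "fork: $!";
        if ( $pid == 0 ) {              # child: count privately, share nothing
            close $reader;
            my %counts;
            open my $fh, '<', $file or die "$file: $!";
            while ( <$fh> ) {
                # ... match and count into %counts ...
            }
            print {$writer} freeze( \%counts );
            exit 0;
        }
        close $writer;
        push @pipes, $reader;
    }

    my %total;                          # parent: merge the reports
    foreach my $reader ( @pipes ) {
        my $frozen = do { local $/; <$reader> };
        my $counts = thaw( $frozen );
        $total{$_} += $counts->{$_} for keys %$counts;
        close $reader;
    }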

    Regards,

    Leon Timmermans
     
    Leon Timmermans, Aug 27, 2008
    #2

  3. Ted Zlatanov Guest

    On Wed, 27 Aug 2008 12:59:36 -0700 (PDT) wrote:

    d> What the script does is, for each line, check whether it contains a
    d> GET request, and if it does, go through a list of pre-compiled
    d> regular expressions, trying to find a matching one. Once a match is
    d> found, it uses another, somewhat more complex regexp associated with
    d> the found match to extract data from the line. I have split this into
    d> two separate matches because only about 30% of all lines will match,
    d> and I don't want to run that complex extraction regexp on all the
    d> lines I know won't match. The goal is to count how many lines match
    d> each specific regexp; the end result is built as a hash, with the
    d> data extracted by the second regexp used as hash keys and the number
    d> of matches as values.

    d> Anyway, currently all this is done in a single process, which parses
    d> approx. 30000 lines per second. The CPU usage for this process is
    d> 100%, so the bottleneck is in the parsing part.
    ....
    d> Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
    d> faster than the single-process script, consumes about 2-3 times more
    d> memory, and uses about as many times more CPU.
    ....
    d> Any ideas why in the world it's so slow? I did some research and
    d> couldn't find much info, other than that the way I'm doing it is
    d> pretty much the way it should be done, unless I'm missing something...

    You may be hitting the limits of I/O. Try feeding your script
    pre-canned data from memory in a loop and see if that improves
    performance. It also depends on what kind of processing you are doing
    on input lines.
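
    For example (parse_line() here stands in for your real matching code):

    Code
    use Benchmark qw(timethis);

    # Slurp a sample into memory once, then parse it repeatedly,
    # so disk speed can't be the limiting factor.
    my @sample;
    while ( <> ) {
        push @sample, $_;
        last if @sample >= 100_000;
    }

    timethis( 10, sub { parse_line($_) foreach @sample } );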

    Also, check out the swatch log file monitor, it may do what you need
    already.

    Ted
     
    Ted Zlatanov, Aug 27, 2008
    #3
  4. Guest

    On Aug 27, 5:06 pm, Ted Zlatanov <> wrote:
    > You may be hitting the limits of I/O.  Try feeding your script
    > pre-canned data from memory in a loop and see if that improves
    > performance.


    No, the I/O is fine - $q_in->pending is pretty much always > 1, and
    as the script does its thing, the number of pending buffers sometimes
    goes beyond 10.

    > It also depends on what kind of processing you are doing
    > on input lines.


    Just trying to match multiple regexps against each line.

    > Also, check out the swatch log file monitor, it may do what you need
    > already.


    Nope, it doesn't :( I already have the single-threaded script, which
    has been working for years now, but the amount of logs it needs to
    process keeps growing, and I'm basically at the point where it can
    only keep up with the speed at which logs are being written. If
    there's a backlog for whatever reason, it might not catch up, so I'm
    looking into how I can improve its performance.
     
    , Aug 27, 2008
    #4
  5. Guest

    On Aug 27, 4:39 pm, Leon Timmermans <> wrote:
    > The speed of perl's threading depends on how much you share between
    > threads. [...] Sean O'Rourke's entry in the Wide Finder benchmark
    > (http://www.cs.ucsd.edu/~sorourke/wf.pl) offers an interesting
    > approach to this, though it isn't exactly optimized for readability.


    Thanks for the link - trying to figure out whattahellisgoingon
    there :) Looks like he basically mmaps the input and begins reading
    it starting at different points. Thing is, I'm using <> as input,
    which can contain hundreds of gigabytes of data, so I'm not sure how
    that's going to work out...
     
    , Aug 27, 2008
    #5
  6. On Wed, 27 Aug 2008 14:15:34 -0700, dniq00 wrote:

    > Nope, it doesn't :( I already have the single-threaded script, which
    > has been working for years now, but the amount of logs it needs to
    > process keeps growing, and I'm basically at the point where it can only
    > keep up with the speed at which logs are being written. If there's a
    > backlog for whatever reason, it might not catch up, so I'm looking into
    > how I can improve its performance.


    Perl threading, well frankly, sucks. You may want to switch to another
    language with re support that meets your needs. I would go for C++ (with
    boost), but then I know that language very well.

    M4
     
    Martijn Lievaart, Aug 27, 2008
    #6
  7. Guest

    wrote:
    > Hello, oh almighty perl gurus!
    >
    > I'm trying to implement multithreaded processing for the humongous
    > amount of logs that I'm currently processing in 1 process on a 4-CPU
    > server.


    Start 4 processes, telling each one to work on a different log file.
    Either do this from the command line, or implement it with fork or system,
    depending on how automatic it all has to be.
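
    In its simplest form (process_one_log.pl standing in for a
    single-file version of your script):

    Code
    use strict;
    use warnings;

    foreach my $log ( @ARGV ) {
        defined( my $pid = fork() ) or die "fork: $!";
        if ( $pid == 0 ) {
            exec( 'perl', 'process_one_log.pl', $log ) or die "exec: $!";
        }
    }
    wait() for 1 .. @ARGV;    # reap all the workers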

    > Anyway, currently all this is done in a single process, which parses
    > approx. 30000 lines per second.


    If you just check for GET (and then ignore the result), how many lines per
    second would it do?

    > The CPU usage for this process is
    > 100%, so the bottleneck is in the parsing part.
    >
    > I have changed the script to use threads + threads::shared +
    > Thread::Queue. I read data from logs like this:
    >
    > Code
    > until( $no_more_data ) {
    >     my @buffer;
    >     foreach ( 1 .. $buffer_size ) {
    >         if( my $line = <> ) {
    >             push( @buffer, $line );
    >         } else {
    >             $no_more_data = 1;
    >             $q_in->enqueue( \@buffer );
    >             # one undef per worker, as an end-of-input marker
    >             foreach ( 1 .. $cpu_count ) {
    >                 $q_in->enqueue( undef );
    >             }
    >             last;
    >         }
    >     }
    >     $q_in->enqueue( \@buffer ) unless $no_more_data;
    > }
    >
    > Then, I create $cpu_count threads, each of which does something like this:


    What do you mean "then"? If you wait until all lines are enqueued before
    you create the consumer threads, your entire log file will be in memory!

    >
    > Code
    > sub parser {
    >     my $counters = {};
    >     # dequeue() blocks until a buffer arrives; the undef markers
    >     # enqueued above end the loop
    >     while( my $buffer = $q_in->dequeue() ) {
    >         foreach my $line ( @{ $buffer } ) {
    >             # do its thing
    >         }
    >     }
    >     return $counters;
    > }


    When $counters is returned, what do you do with it? That could be
    another synchronization bottleneck.

    >
    > Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
    > faster than the single-process script, consumes about 2-3 times more
    > memory, and uses about as many times more CPU.


    That doesn't surprise me.

    > I've also tried abandoning Thread::Queue and just using
    > threads::shared with a lock/cond_wait/cond_signal combination,
    > without much success.


    This also doesn't surprise me. Synchronizing shared access is hard and
    often slow.


    >
    > I've tried playing with $cpu_count and $buf_size, and found that
    > raising $buf_size above 1000 doesn't make much difference, while
    > $cpu_count > 2 actually makes things a lot worse.
    >
    > Any ideas why in the world it's so slow? I did some research and
    > couldn't find much info, other than that the way I'm doing it is
    > pretty much the way it should be done, unless I'm missing something...
    >
    > Hope anybody can enlighten me...


    If you post fully runnable dummy code, and a simple program which
    generates log-file data to put through it, I probably couldn't resist
    the temptation to play around with it and find the bottlenecks.

    Xho

     
    , Aug 27, 2008
    #7
  8. cartercc Guest

    On Aug 27, 5:53 pm, Martijn Lievaart <> wrote:

    > Perl threading, well frankly, sucks. You may want to switch to another
    > language with re support that meets your needs. I would go for C++ (with
    > boost), but then I know that language very well.


    I've been playing with Erlang. In this case, you could probably spawn
    a separate Erlang process per line and have them all run concurrently.
    I haven't done a 'real' project (yet), but I've written some toy
    scripts that tear through large files in fractions of milliseconds.

    CC
     
    cartercc, Aug 28, 2008
    #8
  9. Ted Zlatanov Guest

    On Wed, 27 Aug 2008 23:53:09 +0200 Martijn Lievaart <> wrote:

    ML> On Wed, 27 Aug 2008 14:15:34 -0700, dniq00 wrote:
    >> Nope, it doesn't :( I already have the single-threaded script, which
    >> has been working for years now, but the amount of logs it needs to
    >> process keeps growing, and I'm basically at the point where it can only
    >> keep up with the speed at which logs are being written. If there's a
    >> backlog for whatever reason, it might not catch up, so I'm looking into
    >> how I can improve its performance.


    ML> Perl threading, well frankly, sucks. You may want to switch to another
    ML> language with re support that meets your needs. I would go for C++ (with
    ML> boost), but then I know that language very well.

    Hadoop is a nice non-Perl framework for this kind of work.

    Ted
     
    Ted Zlatanov, Aug 28, 2008
    #9
  10. J. Gleixner Guest

    wrote:
    > Hello, oh almighty perl gurus!
    >
    > I'm trying to implement multithreaded processing for the humongous
    > amount of logs that I'm currently processing in 1 process on a 4-CPU
    > server.
    >
    > What the script does is, for each line, check whether it contains a
    > GET request, and if it does, go through a list of pre-compiled
    > regular expressions, trying to find a matching one. [...]


    > Any ideas why in the world it's so slow? I did some research and
    > couldn't find much info, other than that the way I'm doing it is
    > pretty much the way it should be done, unless I'm missing something...


    Another, much easier/faster approach would be:

    grep ' GET ' file | your_script.pl

    The earlier you can filter out the lines that don't need work, the
    better, and you're not going to get much faster than grep. The more
    refined you can make that initial filtering, so that only the lines
    you're interested in reach your program, the better.
     
    J. Gleixner, Aug 28, 2008
    #10
  11. On Wed, 27 Aug 2008 14:25:32 -0700, dniq00 wrote:
    > Thanks for the link - trying to figure out whattahellisgoingon there :)
    > Looks like he basically mmaps the input and begins reading it starting
    > at different points. Thing is, I'm using <> as input, which can contain
    > hundreds of gigabytes of data, so I'm not sure how that's going to work
    > out...


    Is your computer 64 or 32 bits? In the former case mmap will work for
    such large files, but in the latter it won't. In that case it may not
    be a bad idea to split the log files into chunks that do fit into your
    memory space. An additional advantage of that would be that you may
    not need to use threads at all.

    Regards,

    Leon
     
    Leon Timmermans, Aug 28, 2008
    #11
  12. On Wed, 27 Aug 2008 23:53:09 +0200, Martijn Lievaart wrote:

    > Perl threading, well frankly, sucks. You may want to switch to another
    > language with re support that meets your needs.


    Some would say all threading sucks. All approaches are either hard to
    get proper performance from or hard to get correct. At least the queue
    approach Perl promotes gets one of them right.

    Also, let's not forget that Perl at least supports preemptive
    threading. Ruby doesn't at all, and Python has a global interpreter
    lock, making it useless for this kind of problem.

    Regards,

    Leon Timmermans
     
    Leon Timmermans, Aug 28, 2008
    #12
  13. On Thu, 28 Aug 2008 19:26:28 +0000, Leon Timmermans wrote:

    > On Wed, 27 Aug 2008 23:53:09 +0200, Martijn Lievaart wrote:
    >
    >> Perl threading, well frankly, sucks. You may want to switch to another
    >> language with re support that meets your needs.

    >
    > Some would say all threading sucks. All approaches are either hard to
    > get proper performance from or hard to get correct. At least the queue
    > approach Perl promotes gets one of them right.


    Well, Perl threading has its uses (and maybe this use case is one of
    them), but it has severe limitations. For instance, signals are out.
    That alone was the killer in each and every case I thought I could use
    threads in Perl.

    Threading in general doesn't suck. It's hard to get right until you
    get some basic understanding, but after that I find threading a
    valuable tool in the toolbox.

    Perl threading does suck in my opinion; I didn't know Python threading
    sucked harder.

    M4
     
    Martijn Lievaart, Aug 28, 2008
    #13
  14. Guest

    Leon Timmermans <> wrote:
    > On Wed, 27 Aug 2008 23:53:09 +0200, Martijn Lievaart wrote:
    >
    > > Perl threading, well frankly, sucks. You may want to switch to another
    > > language with re support that meets your needs.

    >
    > Some would say all threading sucks. All approaches are either hard to
    > get proper performance from or hard to get correct. At least the queue
    > approach Perl promotes gets one of them right.
    >
    > Also, let's not forget that Perl at least supports preemptive
    > threading. Ruby doesn't at all, and Python has a global interpreter
    > lock, making it useless for this kind of problem.


    I fleshed out the OP's example code to make it runnable, using a
    simple foreach (1..400) {} to simulate the processing of each line in
    the consumer threads (400 because that is what provided a throughput
    of 30_000 per second in a simple non-threaded model), and was
    pleasantly surprised. I got a substantial speed-up from threading,
    with a factor of 3 improvement in throughput at $cpu_count=4
    (4 consumer threads, plus the main thread).

    I still wouldn't use threads in my own code for something like this,
    though. I'd just start 4 processes, assigning each a different chunk
    of the data.


    Xho

     
    , Aug 29, 2008
    #14
  15. On 2008-08-28 17:49, Leon Timmermans <> wrote:
    > On Wed, 27 Aug 2008 14:25:32 -0700, dniq00 wrote:
    >> Thanks for the link - trying to figure out whattahellisgoingon there :)
    >> Looks like he basically mmaps the input and begins reading it starting
    >> at different points. Thing is, I'm using <> as input, which can contain
    >> hundreds of gigabytes of data, so I'm not sure how that's going to work
    >> out...

    >
    > Is your computer 64 or 32 bits? In the former case mmap will work for
    > such large files, but in the latter it won't.


    Assuming <> actually refers to a single file (if it doesn't, you can
    just process several files in parallel), the same approach can be used
    even without mmap:

    Fork $num_cpu worker processes. Let each process seek to position
    $i * $length / $num_cpu and search for the start of the next line,
    then process lines until it reaches position
    ($i+1) * $length / $num_cpu. Finally, report the result to the parent
    process and let it aggregate the results.
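
    A sketch of that scheme (boundary handling kept deliberately simple,
    and reporting elided):

    Code
    use strict;
    use warnings;

    my $num_cpu = 4;
    my $file    = shift @ARGV;
    my $length  = -s $file;

    foreach my $i ( 0 .. $num_cpu - 1 ) {
        defined( my $pid = fork() ) or die "fork: $!";
        next if $pid;                       # parent keeps spawning

        my $start = int( $i * $length / $num_cpu );
        my $end   = int( ( $i + 1 ) * $length / $num_cpu );

        open my $fh, '<', $file or die "$file: $!";
        if ( $start > 0 ) {
            seek $fh, $start - 1, 0;
            read $fh, my $c, 1;
            <$fh> if $c ne "\n";            # skip the straddling line;
                                            # the previous worker owns it
        }
        while ( tell( $fh ) < $end and defined( my $line = <$fh> ) ) {
            # ... match and count $line here ...
        }
        # ... report the counts to the parent (pipe, temp file, ...) ...
        exit 0;
    }
    wait() for 1 .. $num_cpu;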

    hp
     
    Peter J. Holzer, Aug 29, 2008
    #15
  16. [A complimentary Cc of this posting was sent to
    <>], who wrote in article <>:
    > I'm trying to implement multithreaded processing for the humongous
    > amount of logs that I'm currently processing in 1 process on a 4-CPU
    > server.


    Keep in mind that AFAIK, all real multithreading support has long been
    removed from Perl. Instead, the code which was designed to simulate
    fork()ing under Win* is used as a substitute for multithreading
    support...

    =========

    Sorry that I can't be more specific about your speed issues: when I
    discovered that under the "new doctrine" starting a new thread is
    about 100-300 times SLOWER than starting a new Perl process, I just
    gave up and did not do any other tests...

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Aug 31, 2008
    #16
  17. Guest

    On Aug 27, 3:59 pm, wrote:

    > Hope anybody can enlighten me...
    >
    > THANKS!


    Hello again, oh almighty All! :)

    The amount of useful information in reply to my post has been great,
    and I REALLY appreciate all the input so far! I've gotten some ideas
    from your responses on what I can do, and will try a few things once
    the holiday is over. I guess I will have to abandon the <> approach
    and parse files instead. I kinda love, though, the advantage that <>
    gives me: my script doesn't need to know what it is being given or how
    much. Be it a list of files (many small ones, or fewer large ones), a
    pipe, or whatever - it doesn't care.

    The first multithreaded version I made processed the data line by
    line, with the reader thread pushing each line onto the queue and the
    parser threads yanking lines out of it. The performance was absolutely
    horrible - it consumed 3 times more CPU and ran 3 times slower than
    the single-threaded process (about 10-11 thousand lines per second).
    That's why I started splitting the data into chunks and pushing
    references to the chunks onto the queue, which helped a bit, but not
    by much.

    Tomorrow I'm going to try taking a list of files and splitting it
    across the worker threads, to see if that gives me an improvement. I'm
    not sure yet whether I want to go the mmap way, but I will probably
    give it a try as well. I'm trying to make my script as independent of
    the way it's being fed data as possible, so I will have to find the
    best way to handle as many situations as I can.

    To answer a few questions asked in the thread: the $counters hashes
    produced by the worker threads are aggregated, serialized, and written
    to a file (this doesn't take much time or resources), which is then
    processed by another script that stores all the data in an Oracle
    database. I've done it that way so that there can be multiple servers
    processing data without adding more load on the database, which is, as
    you might imagine, already very busy as it is :)
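
    For reference, the aggregation itself stays simple - each worker
    returns its private hash, and the main thread merges them, roughly
    like this:

    Code
    use threads;

    my @workers = map { threads->create( \&parser ) } 1 .. $cpu_count;

    my %merged;
    foreach my $thr ( @workers ) {
        my $counters = $thr->join();    # the hashref parser() returns
        $merged{$_} += $counters->{$_} for keys %$counters;
    }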

    Again, thanks a million for all the great ideas! I will report back
    with my results, if anyone cares ;)

    With best regards - Dmitry.

     
    , Sep 1, 2008
    #17
