perl multithreading performance


dniq00

Hello, oh almighty perl gurus!

I'm trying to implement multithreaded processing for the humongous
amount of logs that I'm currently processing in 1 process on a 4-CPU
server.

What the script does: for each line it checks whether the line contains
a GET request, and if it does, it goes through a list of pre-compiled
regular expressions, trying to find a matching one. Once a match is
found, it uses another, somewhat more complex regexp associated with
that match to extract data from the line. I have split it into two
separate matches because only about 30% of all lines will match, and I
don't want to run the complex data-extraction regexp against lines I
know won't match. The goal is to count how many lines matched for each
specific regexp; the end result is built as a hash, with the data
extracted by the second regexp used as hash keys and the number of
matches as the values.
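
For concreteness, a minimal sketch of the two-stage matching being
described - the patterns and the rule table below are invented purely
for illustration, since the real ones aren't shown in the thread:

Code
use strict;
use warnings;

# Each rule pairs a cheap "does it match at all" probe with a more
# expensive capturing regexp; both patterns here are made up.
my @rules = (
    { probe => qr{GET /search\b},  extract => qr{GET /search\?q=([^&\s]+)} },
    { probe => qr{GET /product\b}, extract => qr{GET /product/(\d+)\b}     },
);

my %counters;
while ( my $line = <> ) {
    next unless index( $line, 'GET' ) >= 0;            # cheap pre-filter
    for my $rule (@rules) {
        next unless $line =~ $rule->{probe};           # cheap match first
        $counters{$1}++ if $line =~ $rule->{extract};  # extracted data as key
        last;                          # only the first matching rule counts
    }
}

printf "%s\t%d\n", $_, $counters{$_} for sort keys %counters;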

Anyway, currently all this is done in a single process, which parses
approx. 30000 lines per second. The CPU usage for this process is
100%, so the bottleneck is in the parsing part.

I have changed the script to use threads + threads::shared +
Thread::Queue. I read data from logs like this:

Code
until( $no_more_data ) {
    my @buffer;
    foreach( (1..$buffer_size) ) {
        if( my $line = <> ) {
            push( @buffer, $line );
        } else {
            $no_more_data = 1;
            $q_in->enqueue( \@buffer );
            foreach( (1..$cpu_count) ) {
                $q_in->enqueue( undef );
            }
            last;
        }
    }
    $q_in->enqueue( \@buffer ) unless $no_more_data;
}

Then I create $cpu_count threads, each of which does something like this:

Code
sub parser {
    my $counters = {};
    while( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{ $buffer } ) {
            # do its thing
        }
    }
    return $counters;
}

Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
faster than the single-process script, consumes about 2-3 times more
memory and about as many times more CPU.

I've also tried abandoning Thread::Queue and just using threads::shared
with a lock/cond_wait/cond_signal combination, without much success.

I've tried playing with $cpu_count and $buffer_size, and found that
$buffer_size > 1000 doesn't make much difference, and $cpu_count > 2
actually makes things a lot worse.

Any ideas why in the world it's so slow? I did some research and
couldn't find a lot of info, other than that the way I'm doing it is
pretty much the way it should be done, unless I'm missing something...

Hope anybody can enlighten me...

THANKS!
 

Leon Timmermans

> Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
> faster than the single-process script, consumes about 2-3 times more
> memory and about as many times more CPU.

> I've also tried abandoning Thread::Queue and just using threads::shared
> with a lock/cond_wait/cond_signal combination, without much success.

> I've tried playing with $cpu_count and $buffer_size, and found that
> $buffer_size > 1000 doesn't make much difference, and $cpu_count > 2
> actually makes things a lot worse.

> Any ideas why in the world it's so slow? I did some research and
> couldn't find a lot of info, other than that the way I'm doing it is
> pretty much the way it should be done, unless I'm missing something...

> Hope anybody can enlighten me...

> THANKS!

The speed of perl's threading is dependent on how much you share between
threads. Sharing the lines before processing them can become a
bottleneck; I suspect that's the problem in your case. You probably want
to divide the work first, and only use shared resources to report back
the results. Making a program scale over multiple processors isn't easy.
Sean O'Rourke's entry in the wide finder benchmark
(http://www.cs.ucsd.edu/~sorourke/wf.pl) offers an interesting approach
to this, though it isn't exactly optimized for readability.
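
A rough sketch of that shape, assuming the input arrives as a list of
files on the command line and eliding the actual matching: each worker
owns its files and its own %counters, and nothing crosses a thread
boundary until the hashrefs come back through join().

Code
use strict;
use warnings;
use threads;

my $cpu_count = 4;
my @files     = @ARGV;

# Deal the files out round-robin across the workers.
my @work;
push @{ $work[ $_ % $cpu_count ] }, $files[$_] for 0 .. $#files;

my @workers = map { threads->create( \&count_file_set, @{ $work[$_] || [] } ) }
              0 .. $cpu_count - 1;

# Merge the per-thread results once, after all the work is done.
my %total;
for my $thr (@workers) {
    my ($partial) = $thr->join;    # each thread returns a plain hashref
    $total{$_} += $partial->{$_} for keys %$partial;
}
printf "%s\t%d\n", $_, $total{$_} for sort keys %total;

sub count_file_set {
    my @my_files = @_;
    my %counters;                  # private to this thread, nothing shared
    for my $file (@my_files) {
        open my $fh, '<', $file or die "open $file: $!";
        while ( my $line = <$fh> ) {
            # ... the same regexp work as the single-process version ...
        }
        close $fh;
    }
    return \%counters;
}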

Regards,

Leon Timmermans
 

Ted Zlatanov

On Wed, 27 Aug 2008 12:59:36 -0700 (PDT) (e-mail address removed) wrote:

d> What the script does is for each line it checks if the line contains
d> GET request, and if it does - goes through a list of pre-compiled
d> regular expressions, trying to find a matching one. Once the match is
d> found - it uses another regexp, associated with the found match, which
d> is a bit more complex, to extract data from the line. I have split it
d> in two separate matches, because about 30% of all lines will match,
d> and I don't want to run that complex regexp to extract data for all
d> the lines I know won't match. The goal is to count how many lines
d> matched for every specific regexp, and the end result is built as a
d> hash, having data, extracted from the line with second regexp, used as
d> hash keys, and the value is the number of matches.

d> Anyway, currently all this is done in a single process, which parses
d> approx. 30000 lines per second. The CPU usage for this process is
d> 100%, so the bottleneck is in the parsing part.
....
d> Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
d> faster than single-process script, consumes about 2-3 times more
d> memory and about as much times more CPU.
....
d> Any ideas why in the world it's so slow? I did some research and
d> couldn't find a lot of info, other than the way I do it pretty much
d> the way it should be done, unless I'm missing something...

You may be hitting the limits of I/O. Try feeding your script
pre-canned data from memory in a loop and see if that improves
performance. It also depends on what kind of processing you are doing
on input lines.
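
A rough sketch of that test; parse_line() is a stand-in for whatever
the real script does per line:

Code
use strict;
use warnings;
use Time::HiRes qw(time);

# Read a sample of lines once, then parse the in-memory copy repeatedly
# so that disk I/O is out of the picture entirely.
my @sample;
while ( my $line = <> ) {
    push @sample, $line;
    last if @sample >= 100_000;
}

my $rounds = 10;
my $start  = time;
for ( 1 .. $rounds ) {
    parse_line($_) for @sample;
}
my $elapsed = time - $start;
printf "%.0f lines/sec with no I/O in the loop\n",
    $rounds * @sample / $elapsed;

sub parse_line {
    my ($line) = @_;
    # ... the regexp matching from the real script goes here ...
}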

Also, check out the swatch log-file monitor; it may already do what you
need.

Ted
 

dniq00

> You may be hitting the limits of I/O. Try feeding your script
> pre-canned data from memory in a loop and see if that improves
> performance.

No, the I/O is fine - $q_in->pending is pretty much always > 1, and as
the script does its thing, the number of pending buffers sometimes goes
beyond 10.

> It also depends on what kind of processing you are doing on input
> lines.

Just trying to match multiple regexps against each line.

> Also, check out the swatch log-file monitor; it may already do what
> you need.

Nope, it doesn't :( I already have the single-threaded script, which
has been working for years now, but the amount of logs it needs to
process keeps growing. I'm basically at the point where it can only
keep up with the speed at which the logs are being written, so if
there's a back-log for whatever reason it might not catch up. That's
why I'm looking into how I can improve its performance.
 

dniq00

> The speed of perl's threading is dependent on how much you share between
> threads. Sharing the lines before processing them can become a
> bottleneck; I suspect that's the problem in your case. You probably want
> to divide the work first, and only use shared resources to report back
> the results. Making a program scale over multiple processors isn't easy.
> Sean O'Rourke's entry in the wide finder benchmark
> (http://www.cs.ucsd.edu/~sorourke/wf.pl) offers an interesting approach
> to this, though it isn't exactly optimized for readability.

Thanks for the link - trying to figure out whattahellisgoingon
there :) Looks like he basically mmaps the input and begins reading
it starting at different points. Thing is, I'm using <> as input,
which can contain hundreds of gigabytes of data, so I'm not sure how
that's going to work out...
 

Martijn Lievaart

> Nope, it doesn't :( I already have the single-threaded script, which
> has been working for years now, but the amount of logs it needs to
> process keeps growing. I'm basically at the point where it can only
> keep up with the speed at which the logs are being written, so if
> there's a back-log for whatever reason it might not catch up. That's
> why I'm looking into how I can improve its performance.

Perl threading, frankly, sucks. You may want to switch to another
language whose regex support meets your needs. I would go for C++ (with
Boost), but then I know that language very well.

M4
 

xhoster

> Hello, oh almighty perl gurus!

> I'm trying to implement multithreaded processing for the humongous
> amount of logs that I'm currently processing in 1 process on a 4-CPU
> server.

Start 4 processes, telling each one to work on a different log file.
Either do this from the command line, or implement it with fork or system,
depending on how automatic it all has to be.
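
A minimal sketch of that, assuming the existing single-threaded parser
lives in a script called parse_logs.pl (a made-up name) and that there
are at least as many files as CPUs worth keeping busy:

Code
use strict;
use warnings;

# One child per log file; each child just runs the existing parser.
my @pids;
for my $file (@ARGV) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                              # child
        exec( 'perl', 'parse_logs.pl', $file )      # hypothetical script name
            or die "exec failed: $!";
    }
    push @pids, $pid;                               # parent keeps forking
}
waitpid $_, 0 for @pids;                            # wait for all children
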
> Anyway, currently all this is done in a single process, which parses
> approx. 30000 lines per second.

If you just check for GET (and then ignore the result), how many lines
per second would it do?

> The CPU usage for this process is 100%, so the bottleneck is in the
> parsing part.

> I have changed the script to use threads + threads::shared +
> Thread::Queue. I read data from logs like this:

> Code
> until( $no_more_data ) {
>     my @buffer;
>     foreach( (1..$buffer_size) ) {
>         if( my $line = <> ) {
>             push( @buffer, $line );
>         } else {
>             $no_more_data = 1;
>             $q_in->enqueue( \@buffer );
>             foreach( (1..$cpu_count) ) {
>                 $q_in->enqueue( undef );
>             }
>             last;
>         }
>     }
>     $q_in->enqueue( \@buffer ) unless $no_more_data;
> }

> Then I create $cpu_count threads, each of which does something like this:

What do you mean "then"? If you wait until all lines are enqueued before
you create the consumer threads, your entire log file will be in memory!
> Code
> sub parser {
>     my $counters = {};
>     while( my $buffer = $q_in->dequeue() ) {
>         foreach my $line ( @{ $buffer } ) {
>             # do its thing
>         }
>     }
>     return $counters;
> }

When $counters is returned, what do you do with it? That could be
another synchronization bottleneck.
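
A sketch of an ordering that avoids both problems - the consumers are
started before the reader loop runs, and the per-thread counters are
folded together only once, after join() (the per-line matching is
elided):

Code
use strict;
use warnings;
use threads;
use Thread::Queue;

my $cpu_count   = 4;
my $buffer_size = 1000;
my $q_in        = Thread::Queue->new;

# Start the consumers *before* filling the queue, so the whole file is
# never held in memory at once.
my @workers = map { threads->create( \&parser ) } 1 .. $cpu_count;

# Reader: enqueue chunks, then one undef terminator per worker.
my @buffer;
while ( my $line = <> ) {
    push @buffer, $line;
    if ( @buffer >= $buffer_size ) {
        $q_in->enqueue( [ @buffer ] );
        @buffer = ();
    }
}
$q_in->enqueue( [ @buffer ] ) if @buffer;
$q_in->enqueue(undef) for 1 .. $cpu_count;

# Merge the per-thread results exactly once, after all the work is done.
my %total;
for my $thr (@workers) {
    my ($partial) = $thr->join;
    $total{$_} += $partial->{$_} for keys %$partial;
}

sub parser {
    my %counters;                  # private to this thread
    while ( defined( my $buffer = $q_in->dequeue ) ) {
        for my $line (@$buffer) {
            # ... match and count into %counters ...
        }
    }
    return \%counters;
}
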
> Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
> faster than the single-process script, consumes about 2-3 times more
> memory and about as many times more CPU.

That doesn't surprise me.

> I've also tried abandoning Thread::Queue and just using threads::shared
> with a lock/cond_wait/cond_signal combination, without much success.

This also doesn't surprise me. Synchronizing shared access is hard and
often slow.

> I've tried playing with $cpu_count and $buffer_size, and found that
> $buffer_size > 1000 doesn't make much difference, and $cpu_count > 2
> actually makes things a lot worse.

> Any ideas why in the world it's so slow? I did some research and
> couldn't find a lot of info, other than that the way I'm doing it is
> pretty much the way it should be done, unless I'm missing something...

> Hope anybody can enlighten me...

If you post fully runnable dummy code, and a simple program which
generates log-file data to put through it, I probably couldn't resist
the temptation to play around with it and find the bottlenecks.

Xho

 

cartercc

> Perl threading, frankly, sucks. You may want to switch to another
> language whose regex support meets your needs. I would go for C++ (with
> Boost), but then I know that language very well.

I've been playing with Erlang. In this case, you could probably spawn
separate threads per line and have them all run concurrently. I
haven't done a 'real' project (yet) but I've written some toy scripts
that tear through large files in fractions of milliseconds.

CC
 

Ted Zlatanov

ML> Perl threading, well frankly, sucks. You may want to switch to another
ML> language with re support that meets your needs. I would go for C++ (with
ML> boost), but then I know that language very well.

Hadoop is a nice non-Perl framework for this kind of work.

Ted
 

J. Gleixner

> Hello, oh almighty perl gurus!

> I'm trying to implement multithreaded processing for the humongous
> amount of logs that I'm currently processing in 1 process on a 4-CPU
> server.

> What the script does: for each line it checks whether the line contains
> a GET request, and if it does, it goes through a list of pre-compiled
> regular expressions, trying to find a matching one. [...]

> Any ideas why in the world it's so slow? I did some research and
> couldn't find a lot of info, other than that the way I'm doing it is
> pretty much the way it should be done, unless I'm missing something...

Another, much easier/faster approach would be:

grep ' GET ' file | your_script.pl

The earlier you can filter the input down to just the lines that need
work, the better, and you're not going to get much faster than grep.
The more refined you can make that initial filtering - sending only the
lines you're interested in to your program - the better.
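
If the script has to keep accepting arbitrary input rather than a fixed
shell pipeline, the same pre-filter can be set up from inside Perl by
reading from a grep pipe - a sketch, with the regexp work elided:

Code
use strict;
use warnings;

# Let grep do the cheap ' GET ' filtering; -h suppresses file-name
# prefixes so the surviving lines look exactly like raw log lines.
open my $fh, '-|', 'grep', '-h', ' GET ', @ARGV
    or die "can't start grep: $!";

my %counters;
while ( my $line = <$fh> ) {
    # ... run the expensive regexps on roughly 30% of the original volume ...
}
close $fh;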
 

Leon Timmermans

> Thanks for the link - trying to figure out whattahellisgoingon
> there :) Looks like he basically mmaps the input and begins reading
> it starting at different points. Thing is, I'm using <> as input,
> which can contain hundreds of gigabytes of data, so I'm not sure how
> that's going to work out...

Is your computer 64 or 32 bits? In the former case mmap will work for
such large files, but in the latter it won't. In that case it may not be
a bad idea to split the log files into chunks that do fit into your
memory space. An additional advantage of that would be that you may not
need to use threads at all.

Regards,

Leon
 

Leon Timmermans

> Perl threading, frankly, sucks. You may want to switch to another
> language whose regex support meets your needs.

Some would say all threading sucks. All approaches are either hard to get
proper performance from or hard to get correct. At least the queue
approach perl promotes gets one of them right.

Also, let's not forget that Perl at least supports preemptive threading.
Ruby doesn't at all, and Python has a global interpreter lock, making it
useless for this kind of problem.

Regards,

Leon Timmermans
 

Martijn Lievaart

> Some would say all threading sucks. All approaches are either hard to
> get proper performance from or hard to get correct. At least the queue
> approach perl promotes gets one of them right.

Well, Perl threading has its uses (and maybe this use case is one of
them), but it has severe limitations. For instance, signals are out. That
alone was the killer in each and every case I thought I could use threads
in Perl.

Threading in general doesn't suck. It's hard to get right until you get
some basic understanding, but after that I find threading a valuable tool
in the toolbox.

Perl threading does suck in my opinion; I didn't know Python threading
sucked harder.

M4
 

xhoster

Leon Timmermans said:
> Some would say all threading sucks. All approaches are either hard to
> get proper performance from or hard to get correct. At least the queue
> approach perl promotes gets one of them right.

> Also, let's not forget that Perl at least supports preemptive threading.
> Ruby doesn't at all, and Python has a global interpreter lock, making it
> useless for this kind of problem.

I fleshed out the OP's example code to make it runnable, using a simple
foreach (1..400) {}; to simulate the processing of each line in the
consumer threads (400 because that is what provided a throughput of
30_000 per second in a simple non-threaded model), and was pleasantly
surprised. I got a substantial speed-up by using threading, with a
factor of 3 improvement in throughput at $cpu_count=4 (4 consumer
threads, plus the main thread).

I still wouldn't use threads on my own code for something like this,
though. I'd just start 4 processes, assigning each a different chunk of
the data.


Xho

 

Peter J. Holzer

> Is your computer 64 or 32 bits? In the former case mmap will work for
> such large files, but in the latter it won't.

Assuming <> is actually referring to a single file (if it isn't, you
can just process several files in parallel), the same approach can be
used even without mmap:

Fork $num_cpu worker processes. Let each process seek to position
$i * $length / $num_cpu and search for the start of the next line. Then
start processing lines until you get to position
($i+1) * $length / $num_cpu. Finally, report the result to the parent
process and let it aggregate the results.
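
A sketch of that scheme for a single file, using only core Perl; the
counting and the report-back channel are left as comments, and this is
just one way of writing what's described above:

Code
use strict;
use warnings;

my $file    = shift or die "usage: $0 logfile\n";
my $num_cpu = 4;
my $length  = -s $file or die "$file is empty or missing\n";

my @pids;
for my $i ( 0 .. $num_cpu - 1 ) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        my $start = int( $i * $length / $num_cpu );
        my $end   = int( ( $i + 1 ) * $length / $num_cpu );

        open my $fh, '<', $file or die "open $file: $!";
        seek $fh, $start, 0 or die "seek: $!";
        # Workers other than the first discard the line they land in;
        # the previous worker reads past its boundary to finish it.
        <$fh> if $start > 0;

        my %counters;
        while ( tell($fh) <= $end and defined( my $line = <$fh> ) ) {
            # ... match and count into %counters ...
        }
        # ... report %counters to the parent here (pipe, temp file, ...) ...
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;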

hp
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to (e-mail address removed)]

> I'm trying to implement multithreaded processing for the humongous
> amount of logs that I'm currently processing in 1 process on a 4-CPU
> server.

Keep in mind that AFAIK, all multithreading support has long been removed
from Perl. Instead, the code which was designed to simulate fork()ing
under Win* is used as a substitute for multithreading support...

=========

Sorry that I can't be more specific with your speed issues: when I
discovered that under the "new doctrine" starting a new thread is
about 100-300 times SLOWER than starting a new Perl process, I just
gave up and did not do any other test...

Hope this helps,
Ilya
 

dniq00

> Hope anybody can enlighten me...

> THANKS!

Hello again, oh almighty All! :)

The amount of useful information in response to my post has been great,
and I REALLY appreciate all the input so far! I've gotten some ideas
from your responses on what I can do, and will try a few things once
the holiday is over. I guess I will have to abandon the <> approach and
parse files instead. I kinda love, though, the advantage that <> gives
me: my script doesn't need to know what and how much it is being given.
Be it a list of files (many small ones, or fewer large ones), a pipe or
whatever - it doesn't care.

Initially, the first multithreaded version I made processed the data
line by line, with the reader thread pushing each line onto the queue
and the parser threads yanking lines out of it. The performance was
absolutely horrible - it consumed 3 times more CPU and ran 3 times
slower than the single-threaded process (about 10-11 thousand lines per
second). That's why I started splitting the data into chunks and
pushing references to the chunks onto the queue, which helped a bit,
but not by much.

Tomorrow I'm going to try taking a list of files and splitting it
across the worker threads, to see if that gives me an improvement. I'm
not sure yet if I want to go the mmap way, though I will probably give
it a try as well. I'm trying to make my script as independent of the
way it's being fed the data as possible, so I will have to find the
best way to handle as many situations as I can.

To answer a few questions asked in the thread: the $counters produced
by each worker thread are aggregated, serialized and written to a file
(this doesn't take much time or resources), which is then processed by
another script that stores all the data in an Oracle database. I've
done it that way so that there can be multiple servers processing data
without adding more load on the database, which, as you might imagine,
is already very busy as it is :)
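
A sketch of that hand-off, assuming Storable as the serializer (the
post doesn't say which format is actually used):

Code
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Producer side: merge the per-worker counter hashrefs and dump them.
sub dump_counters {
    my ( $file, @per_worker ) = @_;
    my %total;
    for my $partial (@per_worker) {
        $total{$_} += $partial->{$_} for keys %$partial;
    }
    nstore( \%total, $file );      # portable binary dump
}

# Loader side (the separate script that feeds Oracle):
sub load_counters {
    my ($file) = @_;
    return retrieve($file);        # hashref: key => count
}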

Again, thanks a million for all the great ideas! I will report back
with my results, if anyone cares ;)

With best regards - Dmitry.

^D
 
