I had assumed you were using Text::CSV_XS to parse lines rather than
printing them. You might want to try printing them in Perl, you never
know, it might be faster.
That's another data set, where the input and output data is in CSV. I
actually have several data sets that I convert into a single,
normalized, CSV which is then loaded into the DB with Text::CSV_XS.
Essentially, for most of the other data sets, I have set up translation
maps that convert the various input fields into the correct output
fields. Upon load, each of the fields either gets inserted into the DB
directly or translated into an integer key by the Perl before being
inserted.
For the other transforms, I have modularized and factored out much of
the translation code, but I think that that is part of the slowdown
... too many function calls. It's a bit too unwieldy to post here
(especially when I hand-copy it), but this thread has given me a bunch
of ideas, especially on the handling/conversions of arrays/hashes.
use strict; use warnings;
use IO::File; use Text::CSV_XS;

my @valid_columns = qw/ keya keyb keyc keyd keye keyf keyg keybig
    keyanother /;
my %valid_columns = map { $_ => 1 } @valid_columns;
my $output_csv = Text::CSV_XS->new({ eol => "\n", binary => 1 });
$output_csv->print(*STDOUT, process_line($_)) while (<>);

sub process_line {
    my ($line) = @_;
    my ($ip_range, $rest) = split /\s+/, $line, 2;
    chomp($rest);
    my %ip_details = (ip_range => $ip_range);
    # split on ':', then split each element on '=' and stick in hash
    map { my ($k, $v) = split /=/; $ip_details{$k} = $v } split(/:/, $rest);
    # fix up column with random bad bytes (note =~ binding, not =)
    $ip_details{keya} =~ s/[^\x20-\x7e]//g;
    my @cols = map { $ip_details{$_} } @valid_columns;
    return \@cols;
}

I made a few changes that sped it up, but not by much:
my ($ip_range, $rest) = split /\s+/, $line, 2;
chomp($rest);
## This will give the same answer as your map approach for "well-formed"
## data. For malformed data, it will give a different, but probably
## equally meaningless, answer
my %ip_details = (split /[=:]/, $rest);
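For anyone following along, here is a small sketch (with made-up field
values, not the real data) of why the one-step split gives the same hash
on well-formed input: a single split on both delimiters yields an
alternating key/value list, and assigning a list to a hash pairs it up.

```perl
use strict; use warnings;

# Made-up sample, just to show the two splits agree:
my $rest = "keya=foo:keyb=bar:keyc=baz";

# Two-step: split on ':', then split each pair on '='
my %two_step;
for my $pair (split /:/, $rest) {
    my ($k, $v) = split /=/, $pair;
    $two_step{$k} = $v;
}

# One-step: alternating key/value list, paired up by the hash assignment
my %one_step = (split /[=:]/, $rest);
```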
Thanks ... very simple and elegant. I sometimes tend to complicate
things needlessly.
# I don't see what the point of this is, as you never use the ip_range key.
$ip_details{ip_range} = $ip_range;
Sorry, guess I missed putting that at the beginning of the
@valid_columns
## This takes a surprising amount of time, but I don't know what you can
## do about it.
# fix up column with random bad bytes
$ip_details{keya} =~ s/[^\x20-\x7e]//g;
Is this from experience or profiling? Is there an easy way to profile
single lines like this, without factoring them out into a subroutine?
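One way to time a single statement without refactoring the real code is
the core Benchmark module; for per-line numbers on a real run,
Devel::NYTProf can produce a line-level profile. A sketch (sample data
made up), which also tries tr///, usually faster than s///g for plain
character-class deletion:

```perl
use strict; use warnings;
use Benchmark qw(cmpthese);

# made-up string with embedded non-printable bytes
my $dirty = "value\x01with\x02random\x03bad\x04bytes" x 20;

cmpthese(50_000, {
    # the original substitution
    subst => sub { my $s = $dirty; $s =~ s/[^\x20-\x7e]//g; },
    # tr/// with 'c' (complement) and 'd' (delete) does the same job
    trans => sub { my $s = $dirty; $s =~ tr/\x20-\x7e//cd; },
});
```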
## Use a hash slice rather than map:
return [@ip_details{@valid_columns}];
OK, this is a big question of mine, and what (I think) is the major
slowdown in some of my other code. My data flow usually looks like
this:
CSV -> arrayref -> hashref -> new hashref with transformed values and
names -> arrayref -> CSV
I did it this way so that I could factor out a lot of the common code
associated with transforming and outputting a new input format. In
order to do this, I usually pass around a line of input in various
forms. For example, one sub will read the CSV into an arrayref, pass
it to another that will convert it to a hashref, which will then pass
it to another to transform the hashref's values, etc. I realize that
this is not necessarily the fastest way of doing these things, but it
helps a lot when I have 10 different input types being translated into
the same output type.
Because I have a few subs being called *many* times, I have a couple
local optimization questions:
-What is the most efficient way of calling a module subroutine and/or
object's method? I'm assuming that, like C, it's cheaper to pass a
reference to a hash/array than to pass the actual hash/array, right?
Does this also hold true for the return value from a sub?
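Yes, broadly: a sub call flattens its arguments into @_, so passing
%hash expands it into a long key/value list, and the usual my %h = @_;
then rebuilds a whole new hash, while passing \%hash moves one scalar.
Return values flatten the same way, so returning a reference avoids the
copy too. A sketch with made-up data:

```perl
use strict; use warnings;

my %big = map { ("key$_" => $_) } 1 .. 1000;

sub by_copy {
    my %h = @_;        # rebuilds the whole hash from the flattened list
    return $h{key500};
}

sub by_ref {
    my ($h) = @_;      # one scalar; no per-element work
    return $h->{key500};
}

my $x = by_copy(%big);   # 2000-element argument list
my $y = by_ref(\%big);   # 1-element argument list
```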
-What is the most efficient way of converting back and forth between a
hash and an array, when the key->index mapping is known? Does the
answer change at all when dealing with references?
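Hash slices handle both directions when the key order is known, and they
work the same through references with the @{...} dereference syntax. A
sketch with made-up columns:

```perl
use strict; use warnings;

my @columns = qw/keya keyb keyc/;

# hash -> array: a slice pulls values out in column order
my %row  = (keya => 1, keyb => 2, keyc => 3);
my @vals = @row{@columns};

# array -> hash: a slice assignment pairs keys with values by position
my %back;
@back{@columns} = @vals;

# the same slice through a reference:
my $row_ref = \%row;
my @vals2   = @{$row_ref}{@columns};
```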
-As above, when returning a value, does it make a difference if you
create a new local variable to return, or just return the computation
directly? I.e. (a very simple example):
my $a = {a=>'1', b=>'2'}; return $a;
# vs
return {a=>'1', b=>'2'};
I'd like to think that perl's compiler might be able to figure out that
these are equivalent, but perhaps I am wrong.
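The anonymous-hash constructor does the same work either way; the
lexical version only adds one extra pad assignment, which should be
nearly free. Rather than guess, a quick cmpthese run (sketch below) will
show how close the two really are on your perl:

```perl
use strict; use warnings;
use Benchmark qw(cmpthese);

sub via_lexical { my $h = { a => '1', b => '2' }; return $h }
sub direct      { return { a => '1', b => '2' } }

cmpthese(200_000, {
    via_lexical => \&via_lexical,
    direct      => \&direct,
});
```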
-Sometimes I assign one of the hash/array elements to a local value so
that I can transform it, and eventually assign it back to the hash. Is
this a "win" vs just transforming the hash value directly? I.e.:
sub transform {
    my ($hashref) = @_;
    my $name = $hashref->{name_key};
    $name =~ s/A/a/g;
    # ... more $name transforms ...
    $hashref->{name_key} = $name;
    return $hashref;
}
# ... vs ...
sub transform {
    my ($hashref) = @_;
    $hashref->{name_key} =~ s/A/a/g;
    # ... more $hashref->{name_key} transforms ...
    return $hashref;
}
I'm sure that the answer is "it depends", but my next question would be
"On what?". My naive thoughts would be that it would depend on the
number of hash lookups that are needed, which relies on the (C)
assumption that perl would not be able to cache the hashref lookup into
a "register". Are there any other costs to the hash lookup/update
implementation?
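Your C intuition is right: perl does not hoist repeated
$hashref->{name_key} lookups into a register, so each mention is a fresh
hash lookup, and the lexical copy trades those lookups for one copy in
and one store out. A third option, sketched here, is to alias the
element with a one-element foreach, which gives a direct handle on the
value with no copying at all:

```perl
use strict; use warnings;

my $hashref = { name_key => 'AlphaBravo' };

# foreach aliases its loop variable to the element itself,
# so $v reads and writes $hashref->{name_key} directly
for my $v ($hashref->{name_key}) {
    $v =~ s/A/a/g;
    $v =~ s/B/b/g;   # more transforms: no repeated lookups, no copy-back
}
```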
-When transforming values, is it more efficient to use the same
variable to hold the new value or to create a new variable? I'm
thinking that this is one of those space vs. time questions, but since
I have a lot of memory, I'd like to optimize for time. I.e.:
sub transform {
    my ($manufacturer) = @_;
    my %translation_of = ( 'Mercedes' => 'Luxury', 'BMW' => 'Luxury',
        'Honda' => 'Normal' );
    $manufacturer = $translation_of{$manufacturer};
    # do more stuff
}
# ... vs ...
sub transform {
    my ($manufacturer) = @_;
    my %translation_of = ( 'Mercedes' => 'Luxury', 'BMW' => 'Luxury',
        'Honda' => 'Normal' );
    my $transformed_manufacturer = $translation_of{$manufacturer};
    # do more stuff
}
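Before worrying about reuse-vs-new-variable, note that both versions
rebuild %translation_of on every single call. Hoisting the constant map
out of the sub, as sketched below, should dwarf whichever variable
choice you make:

```perl
use strict; use warnings;

# built once at startup, not on every call
my %translation_of = (
    'Mercedes' => 'Luxury',
    'BMW'      => 'Luxury',
    'Honda'    => 'Normal',
);

sub transform {
    my ($manufacturer) = @_;
    return $translation_of{$manufacturer};
}
```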
I get about 44,000 lines per second. Anyway, I don't see any obvious
inefficiencies. Maybe parallelization is the better route after all.
Thanks, I'll see what I can do with that, and the previous examples.