Processing workload distribution


Ted

The machine is a single processor/quad core server running the latest
Windows Server.

The Perl we're using is the 64 bit build of 5.8.8 from Activestate.

I created a script that uses threads to launch a series of standalone
SQL scripts (I use system to invoke mysql to run my SQL scripts). I
had thought that system was spawning child processes, but it seems
that the script that launches them waits on each child before
launching the next. It looks like the threads launch them all without
waiting for each one to finish, which is closer to what I want.

However, regardless of what I have tried so far, all the hard work is
being done by one core, with the other three mostly sitting idle.

What can I do to distribute the workload evenly over all the cores in
the machine? The whole purpose of this machine was to serve as a
compute server for my colleagues and me, and in fact only two of us
have a need to access it, and this is so we can run heavy duty DB
analysis programs on it. The code works adequately, but it seems a
shame to have invested in a quad core processor based machine only to
have all the work done on only one core. I don't care if I have to
launch threads or child processes, as long as I can distribute the
work more evenly. Is what I am after possible?

Thanks

Ted

BTW: are there any 64 bit modules on CPAN? The repository for the 64
bit build that Activestate has provided seems to be empty.
 

Ben Morrow

Quoth Ted:
The machine is a single processor/quad core server running the latest
Windows Server.

The Perl we're using is the 64 bit build of 5.8.8 from Activestate.

I created a script that uses threads to launch a series of standalone
SQL scripts (I use system to invoke mysql to run my SQL scripts).

In general it is easier to talk to the database directly using DBI than
to launch external mysql processes. In this case, however, you may find
it easier to make things run in parallel with an external command.
I had thought that system was spawning child processes, but it seems
that the script that launches them waits on each child before
launching the next.

Yes, this is how system works. As a special case, on Windows only,
calling

my $pid = system 1, "mysql ...";

(with a literal 1 as the first argument) will spawn a new process and
*not* wait for it. You can use the returned pid as an argument to wait
or waitpid. Alternatively, for more control, you can use Win32::process
or IPC::Run.
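
For example, a rough sketch of that spawn-then-wait pattern (the mysql
options and the job names below are only placeholders):

    use strict;
    use warnings;

    my @jobs = qw/job1 job2 job3/;    # placeholder basenames for your SQL scripts
    my @pids;

    for my $job (@jobs) {
        # "system 1, LIST" (Windows only) starts the command and returns
        # immediately with the new process's pid
        my $pid = system 1, "mysql -t -u someuser somedb < $job.sql > $job.log";
        push @pids, $pid;
    }

    # now block until every spawned mysql client has finished
    waitpid $_, 0 for @pids;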
It looks like the threads launch them all without waiting for each one
to finish, which is closer to what I want.

You would have to post your code for us to see what is happening here. I
think probably each thread is launching one child and waiting for it,
but since you have several threads you have several children.
However, regardless of what I have tried so far, all the hard work is
being done by one core, with the other three mostly sitting idle.

Are you *certain* you end up with more than one mysql process running at
a time? You should be able to check easily with Task Manager. If you
don't, then you need to fix that (or switch to using DBI to talk to the
database and Perl threads for parallelism). If you do, yet they are all
running on the same core, then something odd is happening: you will need
to show us a (minimal) script that runs two processes at the same time
that still end up on the same core.
BTW: are there any 64 bit modules on CPAN? The repository for the 64
bit build that Activestate has provided seems to be empty.

Modules on CPAN are (generally speaking) architecture-neutral, given
that CPAN only holds C and Perl source. If ActiveState don't provide
64-bit ppms, then you can install pure-Perl modules directly from CPAN,
but XS modules will require a copy of the compiler used to build your
perl, probably a 64-bit version of MSVC. You may be able to get gcc
working (the 32-bit version of AS Perl has support for building modules
with gcc), but I don't know how well the 64-bit version is supported.

Ben
 

smallpond

See perlthrtut and check your perl build with: perl -V
Look for a line like:

usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define

I believe that the last part indicates whether perl tries to use
multiple CPUs.
 

xhoster

Ted said:
The machine is a single processor/quad core server running the latest
Windows Server.

The Perl we're using is the 64 bit build of 5.8.8 from Activestate.

I created a script that uses threads to launch a series of standalone
SQL scripts (I use system to invoke mysql to run my SQL scripts). I
had thought that system was spawning child processes, but it seems
that the script that launches them waits on each child before
launching the next.

That is what Perl's "system" does. It spawns a child process, and then
waits for it.
It looks like the threads launch them all without waiting for each one
to finish, which is closer to what I want.

However, regardless of what I have tried so far, all the hard work is
being done by one core, with the other three mostly sitting idle.

Generally (not always, but generally) the hard work of an SQL query
is done by the server, not the client. If your MySQL database server fails
to use all the cores in doing its job, that is not a Perl problem; that is
a MySQL problem.

If you don't think this describes your situation, then launch the four
standalone SQL scripts separately from Windows. If you launch them this
way, are all cores used? If not, then you don't have a Perl problem.

Xho

 

Ben Morrow

Quoth smallpond:
See perlthrtut and check your perl build with: perl -V
Look for a line like:

usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define

I believe that the last part indicates whether perl tries to use
multiple CPUs.

Please don't make things up and post them without verifying.
Multiplicity indicates whether it is possible to have several perl
interpreters in the same process: this is a requirement for ithreads (so
any perl with useithreads=define will also have usemultiplicity=define)
but is also used without ithreads for e.g. mod_perl. It has *nothing* to
do with use of multiple CPUs: this is determined by your OS's thread
scheduling policy.

Ben
 

Ted

See perlthrtut and check your perl build with: perl -V
Look for a line like:

usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define

I believe that the last part indicates whether perl tries to use
multiple CPUs.

Yes, on both my development machine (only a dual core) and the quad
core server I see that both threads and multiplicity are enabled. Near
the top of the output from perl -V, I see:

usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define

And lower down I see,

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS PERL_MALLOC_WRAP
                        PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES
                        USE_PERLIO USE_SITECUSTOMIZE

When I launch four mysql commands separately, I see four instances of
it in task manager, and the work load is distributed over all four
cores. So the problem must be in my code (See my reply to myself).

Thanks

Ted
 

Ted

That is what Perl's "system" does.  It spawns a child process, and then
waits for it.



Generally (not always, but generally) the hard work of an SQL query
is done by the server, not the client.  If your MySQL database server fails
to use all the cores in doing it's job, that is not a Perl problem, that is
a MySQL problem.

If you don't think this describes your situation, then launch the four
standalone SQL scripts separately from Windows.  If you launch them this
way, are all cores used?  If not, then you don't have a Perl problem.

Xho


Yes, so I have a perl problem (or, probably more accurately, my use of
perl is the problem).

Thanks

Ted
 

Ted

Here is my code:

use threads;

my $thr0 = threads->create(\&dbspawn,'e5pf16.sql','e5pf16.log');
$thr0->join;
my $thr1 = threads->create(\&dbspawn,'gpf45.sql','gpf45.log');
$thr1->join;
my $thr2 = threads->create(\&dbspawn,'gpf46.sql','gpf46.log');
$thr2->join;
my $thr3 = threads->create(\&dbspawn,'gpf47.sql','gpf47.log');
$thr3->join;
my $thr4 = threads->create(\&dbspawn,'gpf48.sql','gpf48.log');
$thr4->join;
my $thr5 = threads->create(\&dbspawn,'gpf49.sql','gpf49.log');
$thr5->join;
my $thr6 = threads->create(\&dbspawn,'gpf50.sql','gpf50.log');
$thr6->join;

sub dbspawn {
    local ($script, $log) = @_;
    system("mysql -t -u rejbyers --password=jesakos yohan < $script > $log") == 0
        or die "$script failed!";
}

Is the problem the way I use "system"?

When this script is running, I only ever see one instance of mysql
running in task manager.

Is there a Win64::process, or will Win32::process work on a 64 bit
version of windows using the 64 bit build of perl?

Thanks

Ted
 

Joost Diepenmaat

Ted said:
Here is my code:

use threads;

my $thr0 = threads->create(\&dbspawn,'e5pf16.sql','e5pf16.log');
$thr0->join;

You should join() all threads /after/ creating all of them. A join()
blocks until the thread is done. In other words, move the ->join()
calls to the end of the script.
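
For instance, keeping the dbspawn sub and the individual variables from
the script above, the reworked version might look roughly like this:

    use threads;

    # start every job first; nothing is waited for yet
    my $thr0 = threads->create(\&dbspawn, 'e5pf16.sql', 'e5pf16.log');
    my $thr1 = threads->create(\&dbspawn, 'gpf45.sql',  'gpf45.log');
    # ... $thr2 through $thr6 created the same way ...

    # only now wait for each thread to finish
    $thr0->join;
    $thr1->join;
    # ... and so on through $thr6->join ...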
 

xhoster

Ted said:
Here is my code:

use threads;

my $thr0 = threads->create(\&dbspawn,'e5pf16.sql','e5pf16.log');
$thr0->join;

Join means you wait for $thr0 to finish. Obviously this is going
to lead to serialization. The simple answer is to move all the joins
from where they are and put them after all of the creates have been done.
The *right* answer is probably to put the thread objects into an array and
then loop over that array doing the joins, rather than using a bunch
of different variables to hold the objects.

Xho

 

Ted

You should join() all threads /after/ creating all of them. A join()
blocks until the thread is done. In other words, move the ->join()
calls to the end of the script.

Ah, OK.

While it is not likely in this case, since the scripts normally take
several hours to run, what happens if you create half a dozen threads,
and one of them finishes before you get to join it?

Also, I recall reading somewhere that some system calls block
everything in the process until the system call returns. Is that
true? If so, which are involved, and what would be the workaround?

Thanks

Ted
 

Ben Morrow

Quoth Ted:
Here is my code:

use threads;

my $thr0 = threads->create(\&dbspawn,'e5pf16.sql','e5pf16.log');
$thr0->join;
(... six more create/join pairs, for gpf45 through gpf50 ...)

As others have said, join waits for the thread, so you need to do it
last. Also, this is *not* an efficient way to create many threads. I
would probably write something like

    $_->join for map {
        threads->create(\&dbspawn, "$_.sql", "$_.log");
    } qw/e5pf16 gpf45 gpf46 gpf47 gpf48 gpf49 gpf50/;

or probably drop the sub altogether and use async:

    $_->join for map {
        async {
            system "mysql -t -u rejbyers <$_.sql >$_.log"
                and warn "$_ failed";

            # there's little point in dying when the thread's about to
            # exit anyway...
        }
    } qw/.../;

but you may find something a little less compressed easier to
understand:

    my @jobs = qw/e5pf16 gpf45 .../;

    my @threads = map {
        async {
            system "...";
        }
    } @jobs;

    $_->join for @threads;

Of course, given that you only use the threads to launch processes, you
would be better off just launching processes:

    my @procs = map { system 1, "..." } @jobs;

    waitpid $_, 0 for @procs;
Is there a Win64::process, or will Win32::process work on a 64 bit
version of windows using the 64 bit build of perl?

I don't know. That is, there isn't a Win64::process, and I don't know
whether Win32::process works on Win64. That depends on whether the
underlying Win32 system calls still work the same way: I suspect they
do. There are no Win64 results from CPAN Testers (surprise), which is
usually the first place to look. In any case, since there aren't any
Win64 ppms, you'd need to compile libwin32 from source. Do you have an
appropriate compiler?

Ben
 

Ted

Join means you wait for $thr0 to finish.  Obviously this is going
to lead to serialization.  The simple answer is to move all the joins
from where they are and put them after all of the creates have been done.
The *right* answer is probably to put the thread objects into an array and
then loop over that array doing the joins, rather than using a bunch
of different variables to hold the objects.

Xho


Yup that was it.

This script was just a test to make sure I got it all right. I was
going to worry about putting it into arrays after I got it right.

That means one array for the script names, a second for the log names,
and a third for the thread objects.

I saw Joost's post before I saw yours, and set up a trivial test,
involving selecting everything from each of three tables, and
displaying them in the command line window. Yes, that window looks
like a mess, but it is getting all the data and it looks like the work
load is being distributed evenly, and I see an instance of mysql in
Task Manager for each thread. So now just a little rewriting to make
good use of arrays, and to simplify the code.

Thanks all.

Ted
 

Ben Morrow

Quoth Ted:
Ah, OK.

While it is not likely in this case, since the scripts normally take
several hours to run, what happens if you create half a dozen threads,
and one of them finishes before you get to join it?

It sits there, consuming a small amount of memory in your Perl process,
until you do. This is exactly the same as a 'zombie' process under Unix.
If you don't want this to happen, and don't care about the return value
of the thread, you can ->detach it, which will cause it to clean itself
up as soon as it finishes. However, if your main thread exits, any
detached threads will be silently destroyed, so if this is a problem you
will need to join them.
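
A minimal sketch of that, reusing the dbspawn sub from the earlier
script:

    # fire and forget: the thread cleans itself up when it finishes
    my $thr = threads->create(\&dbspawn, 'gpf45.sql', 'gpf45.log');
    $thr->detach;

    # ... but the main thread must keep running (or sleep) long enough,
    # because exiting it silently kills any detached threads.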
Also, I recall reading somewhere that some system calls block
everything in the process until the system call returns. Is that
true? If so, which are involved, and what would be the workaround?

This depends on your underlying threads implementation, and is rarely
true any more (most systems use some sort of 'real' kernel threads,
which don't behave like this).

Ben
 

Joost Diepenmaat

Ted said:
While it is not likely in this case, since the scripts normally take
several hours to run, what happens if you create half a dozen threads,
and one of them finishes before you get to join it?

If the thread has already finished, the join() should just return
immediately (after possibly doing some cleanup). Note that you should
always either join() or detach() each thread.
Also, I recall reading somewhere that some system calls block
everything in the process until the system call returns.


Besides thread-specific calls, like the ones used by threads::shared's
synchronization methods, I don't know of anything like that. From perl's
point of view, you should probably regard threads as completely separate
processes that have some nifty additional IPC mechanisms.

This also implies that unless you need those sharing mechanisms, you're
not gaining anything by using threads instead of fork() (and you're
quite likely to be losing efficiency and gaining some possible issues),
at least on OSs that support fork natively. Though I gather that on
Windows systems, the situation is different.
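
For what it's worth, a fork-based equivalent might look something like
the sketch below (the mysql options are placeholders; note that on
Windows perl emulates fork with the same ithreads machinery, so there
is little to choose between the two approaches there):

    use strict;
    use warnings;

    my @jobs = qw/e5pf16 gpf45 gpf46 gpf47 gpf48 gpf49 gpf50/;
    my @pids;

    for my $job (@jobs) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # child: hand over to one mysql client, then exit
            exec "mysql -t -u someuser somedb < $job.sql > $job.log"
                or die "exec failed: $!";
        }
        push @pids, $pid;    # parent: remember the child's pid
    }

    # parent: wait for all the children to finish
    waitpid $_, 0 for @pids;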
 

xhoster

Ted said:
Ah, OK.

While it is not likely in this case, since the scripts normally take
several hours to run, what happens if you create half a dozen threads,
and one of them finishes before you get to join it?

Then it waits very very patiently for you to "join" it. No problem.
(Unless you accumulate a large number of them).
Also, I recall reading somewhere that some system calls block
everything in the process until the system call returns.

There is the Perl function named "system", then there are calls that perl
makes into the kernel, referred to as system calls. They are different
but confusingly named. Is your concern about the first or the latter?
Is that
true?

I suspect that it is true of certain kernel-level system calls, but such
calls should be extremely fast (otherwise, the OS designers should have
reworked them not to block). There isn't much you can do about it in
a Perl program, or at least not without knowing exactly what the call
is.

If so, which are involved, and what would be the workaround?

Find a better OS to run your Perl program on? :)
Anyway, don't worry about it until it happens.

Xho

 

smallpond

Please don't make things up and post them without verifying.
Multiplicity indicates whether it is possible to have several perl
interpreters in the same process: this is a requirement for ithreads (so
any perl with useithreads=define will also have usemultiplicity=define)
but is also used without ithreads for e.g. mod_perl. It has *nothing* to
do with use of multiple CPUs: this is determined by your OS's thread
scheduling policy.

Ben

I didn't make it up; I looked at previous postings in this NG where
someone has ithreads=define and not usemultiplicity.
http://groups.google.com/group/comp...en&lnk=gst&q=usemultiplicity#18a233467aaa72d8
 

Ted

As others have said, join waits for the thread, so you need to do it
last. Also, this is *not* an efficient way to create many threads. I
would probably write something like

    $_->join for map {
        threads->create(\&dbspawn, "$_.sql", "$_.log");
    } qw/e5pf16 gpf45 gpf46 gpf47 gpf48 gpf49 gpf50/;

;-) I like the look of this.
or probably drop the sub altogether and use async:

    $_->join for map {
        async {
            system "mysql -t -u rejbyers <$_.sql >$_.log"
                and warn "$_ failed";

            # there's little point in dying when the thread's about to
            # exit anyway...
        }
    } qw/.../;

but you may find something a little less compressed easier to
understand:

    my @jobs = qw/e5pf16 gpf45 .../;

    my @threads = map {
        async {
            system "...";
        }
    } @jobs;

    $_->join for @threads;

Of course, given that you only use the threads to launch processes, you
would be better off just launching processes:

    my @procs = map { system 1, "..." } @jobs;

    waitpid $_, 0 for @procs;
Are there performance implications for the various forms you suggest,
apart from the obvious one that the last just has the overhead of creating
the needed processes, while the others create the same processes within
threads that carry their own overhead? I doubt that this overhead could
be significant when the scripts that are launched take hours to complete.
I don't know. That is, there isn't a Win64::process, and I don't know
whether Win32::process works on Win64. That depends on whether the
underlying Win32 system calls still work the same way: I suspect they
do. There are no Win64 results from CPAN Testers (surprise), which is
usually the first place to look. In any case, since there aren't any
Win64 ppms, you'd need to compile libwin32 from source. Do you have an
appropriate compiler?
Well, I have MS Visual Studio 2003, so I suppose I could build it here
(with a 64 bit target) and deploy it at the office. I have yet to get
cygwin to work on a 64 bit machine, so the 64 bit machine does not
have a native compiler installed. But wait a minute. On the 64 bit
machine, 'ppm list' includes libwin32 at the bottom of the list. And,
I see in the documentation Activestate provided with this build that
Win32::process is available, along with a long list of other Win32
modules. Mind you, I don't know if it works since I just found it
mere minutes ago.
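
If it does work, the usual Win32::Process pattern would presumably be
something like the sketch below (untested here; the cmd.exe path, user
name and database name are guesses, and cmd /c is used because the < and >
redirections are a shell feature):

    use strict;
    use warnings;
    use Win32::Process;

    my @jobs = qw/e5pf16 gpf45 gpf46/;
    my @procs;

    for my $job (@jobs) {
        my $proc;
        Win32::Process::Create(
            $proc,
            'C:\\Windows\\System32\\cmd.exe',
            qq{cmd /c "mysql -t -u someuser somedb < $job.sql > $job.log"},
            0,                        # don't inherit handles
            NORMAL_PRIORITY_CLASS,
            '.',                      # current directory
        ) or die "Could not start job $job\n";
        push @procs, $proc;
    }

    # wait (indefinitely) for every child process to finish
    $_->Wait(INFINITE) for @procs;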

Thanks

Ted
 

Ted

Then it waits very very patiently for you to "join" it.  No problem.
(Unless you accumulate a large number of them).

At present, we're talking a few dozen at a time. But, I wouldn't mind
having the problem of making it work for thousands, given what that
would imply about our commercial success!
There is the Perl function named "system", then there are calls that perl
makes into the kernel, referred to as system calls.  They are different
but confusingly named.  Is your concern about the first or the latter?
I assumed the former was just a special case of the latter. I was
concerned about the former, though, because at present I am merely
launching child processes using it. But the latter would always
leave a nagging doubt for when I need to make more interesting
threads.
I suspect that it is true of certain kernel-level system calls, but such
calls should be extremely fast (otherwise, they OS designers should have
reworked them not to block.)  There isn't much you can do about it in
a Perl program, or at least not without knowing exactly what the call
is.


Find a better OS to run your Perl program on? :)
Anyway, don't worry about it until it happens.
OK. In this case, though, I don't have the option of choosing the OS
since our administrator only knows Windows, and I have neither the
time nor the full suite of skills required to administer a unix box.
I develop application level code and analyze data, and even for that
there is insufficient time in the day. Now if Perl could make the
days last 48 hours ....

;-)

Thanks

Ted
 

Ted

If the thread has already finished, the join() should just return
immediately (after possibly doing some cleanup). Note that you should
always either join() or detach() each thread.


Besides thread-specific calls, like the ones used by threads::shared's
synchronization methods, I don't know of anything like that. From perl's
point of view, you should probably regard threads as completely separate
processes that have some nifty additional IPC mechanisms.

This also implies that unless you need those sharing mechanisms, you're
not gaining anything by using threads instead of fork() (and you're
quite likely to be losing efficiency and gaining some possible issues),
at least on OSs that support fork natively. Though I gather that on
Windows systems, the situation is different.

Good to know. Thanks. I didn't consider fork because I was under the
impression that it doesn't work right on Windows.

Thanks

Ted
 
