Discussion on converting a script to a multithreaded one


Domenico Discepola

I've recently been introduced to perldoc's perlthrtut and found this
fascinating because of the tasks I am frequently asked to script. For
example, I recently wrote a program that: 1. reads X lines of an ASCII file
into RAM, 2. performs some transformations on this data, 3. outputs the
results, 4. returns to step 1. Obviously, this continues until EOF.
Part of my code is:

######
#snip
use Tie::File;

tie my @arr_file, 'Tie::File', $file_input, recsep => $g_delimiter_record
    or die "Cannot tie '$file_input': $!";

my $r = 0;
my @buffer;
while ( defined $arr_file[$r] ) {
    for ( my $buffer_count = 0; $buffer_count < $load_buffer; $buffer_count++ ) {
        last unless defined $arr_file[$r];   # stop at EOF in mid-chunk
        # Step a: push the current row into the in-RAM buffer for processing below
        push @buffer, $arr_file[$r];
        $r++;
    }
    # Step b: now perform some operations on the data in RAM here
    # Step c: reset the buffer here
    @buffer = ();
}

I have access to a multi-CPU Windows server. My questions are: 1. I was
thinking of using the time that the script is performing step b to continue
with step a. This way, I can essentially load the next chunk of data in
RAM. I would somehow have to wait until the 1st iteration of step b is
finished before using the 'new' data... Is this correct? What are the
challenges with this method? 2. Will this recoding effort be worth the
performance gain? 3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

TIA
 

Bill

Domenico said:
I have access to a multi-CPU Windows server. My questions are: 1. I was
thinking of using the time that the script is performing step b to continue
with step a. This way, I can essentially load the next chunk of data in
RAM. I would somehow have to wait until the 1st iteration of step b is
finished before using the 'new' data... Is this correct? What are the
challenges with this method?

Can be done, but you will need to use a pipe or queue, with one thread
loading data in at one end and the other taking it out at the other. Check Thread::Queue.
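A rough producer/consumer sketch along those lines (untested; the file name,
chunk size, and the choice to pass each chunk as a single joined string are
just illustrative assumptions):

use strict;
use warnings;
use threads;
use Thread::Queue;

my $file_input  = 'data.txt';   # placeholder path
my $load_buffer = 1000;         # rows per chunk, as in your loop

my $queue = Thread::Queue->new();

# Producer thread: read the file and enqueue one chunk of rows at a time
my $reader = threads->create( sub {
    open my $fh, '<', $file_input or die "Cannot open '$file_input': $!";
    my @chunk;
    while ( my $row = <$fh> ) {
        chomp $row;
        push @chunk, $row;
        if ( @chunk >= $load_buffer ) {
            $queue->enqueue( join "\n", @chunk );   # one queue item per chunk
            @chunk = ();
        }
    }
    $queue->enqueue( join "\n", @chunk ) if @chunk;  # last partial chunk
    $queue->enqueue( undef );                        # signal end of data
    close $fh;
} );

# Consumer (main thread): take chunks off the queue and transform them
# while the reader is already loading the next one
while ( defined( my $chunk = $queue->dequeue() ) ) {
    my @rows = split /\n/, $chunk;
    # ... step b: perform the transformations on @rows here ...
}

$reader->join();

Passing each chunk as one joined string keeps the example simple; queueing
array references would mean sharing them with threads::shared first.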

2. Will this recoding effort be worth the
performance gain?

With a file on the hard drive, no. If you are reading the data over a
variable connection, like an internet connection with a slow server, it
would make more sense.

If you have the resources to create threads anyway, and the file is not
gigabytes in size, check whether you can just slurp the whole file and then
process it all in RAM. I think that would likely be the fastest way.
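For example, something like this (untested sketch; the path and separator are
placeholders standing in for the variables from your snippet):

my $file_input         = 'data.txt';   # placeholder path
my $g_delimiter_record = "\n";         # placeholder record separator

my @all_records;
{
    open my $fh, '<', $file_input or die "Cannot open '$file_input': $!";
    local $/ = $g_delimiter_record;   # read in units of the custom record separator
    @all_records = <$fh>;             # slurp every record at once
    chomp @all_records;               # strip the separator while $/ is still set
    close $fh;
}

# step b: perform the transformations on @all_records entirely in RAM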

3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

They are fairly good, I think. Also, read the docs on Thread::Queue and
threads.pm.
 

Ben Liddicott

Domenico Discepola said:
2. Will this recoding effort be worth the
performance gain? 3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

In my opinion you are unlikely to see worthwhile performance gains from this, as disk reads are probably cached ahead of time, and writes are almost certainly cached.

You can, if you wish, use Win32API::File's CreateFile to pass the FILE_FLAG_SEQUENTIAL_SCAN flag when you open the input file, which will help optimize this process slightly.
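If I remember the Win32API::File interface correctly, that looks roughly like
this (untested sketch; the path is a placeholder):

use Win32API::File qw( CreateFile OsFHandleOpen
                       GENERIC_READ FILE_SHARE_READ OPEN_EXISTING
                       FILE_FLAG_SEQUENTIAL_SCAN );

my $file_input = 'C:\\data\\input.txt';   # placeholder path

# Open with the sequential-scan hint so Windows reads ahead more aggressively
my $os_handle = CreateFile( $file_input, GENERIC_READ(), FILE_SHARE_READ(), [],
                            OPEN_EXISTING(), FILE_FLAG_SEQUENTIAL_SCAN(), [] )
    or die "CreateFile failed: $^E";

# Wrap the OS handle in an ordinary Perl filehandle and read as usual
OsFHandleOpen( \*FH, $os_handle, 'r' ) or die "OsFHandleOpen failed: $!";
while ( my $row = <FH> ) {
    # ... process $row ...
}
close FH;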

You will see faster performance on the dual-CPU server anyway, as other activity (Task Manager updates, for example, and other services such as disk I/O and database servers) can be handled by the other processor.
 
