Discussion on converting a script to a multithreaded one


Domenico Discepola

I've recently been introduced to perldoc's perlthrtut and found this
fascinating because of the tasks I am frequently asked to script. For
example, I recently wrote a program that: 1. reads X lines of an ASCII file
into RAM, 2. performs some transformations on this data, 3. outputs the
results, 4. returns to step 1. Obviously, this continues until EOF.
Part of my code is:

######
#snip
use Tie::File;

tie my @arr_file, 'Tie::File', $file_input, recsep => $g_delimiter_record
    or die "Cannot tie '$file_input': $!";

my $r = 0;
my @buffer;
while ( defined $arr_file[$r] ) {
    for ( my $buffer_count = 0; $buffer_count < $load_buffer; $buffer_count++ ) {
        last unless defined $arr_file[$r];   # stop at EOF in mid-chunk
        # Step a: push the current row into the in-RAM buffer for processing below
        push @buffer, $arr_file[$r];
        $r++;
    }
    # Step b: now perform some operations on the data in RAM here
    # Step c: reset the buffer here
    @buffer = ();
}

I have access to a multi-CPU Windows server. My questions are: 1. I was
thinking of using the time that the script is performing step b to continue
with step a. This way, I can essentially load the next chunk of data in
RAM. I would somehow have to wait until the 1st iteration of step b is
finished before using the 'new' data... Is this correct? What are the
challenges with this method? 2. Will this recoding effort be worth the
performance gain? 3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

TIA
 

Bill

Domenico said:
I have access to a multi-CPU Windows server. My questions are: 1. I was
thinking of using the time that the script is performing step b to continue
with step a. This way, I can essentially load the next chunk of data in
RAM. I would somehow have to wait until the 1st iteration of step b is
finished before using the 'new' data... Is this correct? What are the
challenges with this method?

Can be done, but you will need to use a pipe or queue, with one thread
loading data in at one end and the other taking it out at the other. Check Thread::Queue.
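A rough producer/consumer sketch along those lines (untested; the file name,
chunk size, and the choice to pass each chunk as a single joined string are
just illustrative assumptions):

use strict;
use warnings;
use threads;
use Thread::Queue;

my $file_input  = 'data.txt';   # placeholder path
my $load_buffer = 1000;         # rows per chunk, as in your loop

my $queue = Thread::Queue->new();

# Producer thread: read the file and enqueue one chunk of rows at a time
my $reader = threads->create( sub {
    open my $fh, '<', $file_input or die "Cannot open '$file_input': $!";
    my @chunk;
    while ( my $row = <$fh> ) {
        chomp $row;
        push @chunk, $row;
        if ( @chunk >= $load_buffer ) {
            $queue->enqueue( join "\n", @chunk );   # one queue item per chunk
            @chunk = ();
        }
    }
    $queue->enqueue( join "\n", @chunk ) if @chunk;  # last partial chunk
    $queue->enqueue( undef );                        # signal end of data
    close $fh;
} );

# Consumer (main thread): take chunks off the queue and transform them
# while the reader is already loading the next one
while ( defined( my $chunk = $queue->dequeue() ) ) {
    my @rows = split /\n/, $chunk;
    # ... step b: perform the transformations on @rows here ...
}

$reader->join();

Passing each chunk as one joined string keeps the example simple; queueing
array references would mean sharing them with threads::shared first.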

2. Will this recoding effort be worth the
performance gain?

With a file on the hard drive, no. If you are reading the data over a
variable connection, like an internet connection with a slow server, it
would make more sense.

If you have the resources to create threads anyway, and the file is not
gigabytes in size, check whether you can just slurp the whole file and then
process it all in RAM. I think that would likely be the fastest way.
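For example, something like this (untested sketch; the path and separator are
placeholders standing in for the variables from your snippet):

my $file_input         = 'data.txt';   # placeholder path
my $g_delimiter_record = "\n";         # placeholder record separator

my @all_records;
{
    open my $fh, '<', $file_input or die "Cannot open '$file_input': $!";
    local $/ = $g_delimiter_record;   # read in units of the custom record separator
    @all_records = <$fh>;             # slurp every record at once
    chomp @all_records;               # strip the separator while $/ is still set
    close $fh;
}

# step b: perform the transformations on @all_records entirely in RAM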

3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

They are fairly good, I think. Also, read the docs on Thread::Queue and
threads.pm.
 

Ben Liddicott

Domenico Discepola said:
2. Will this recoding effort be worth the
performance gain? 3. As I am new to multithreading concepts, if someone can
provide me with a concrete example of this, I would appreciate it.
Perlthrtut does provide some examples and I will continue to look into them
but they are a little hard for us newbies to understand at first...

In my opinion you are unlikely to see worthwhile performance gains from this, as disk reads are probably cached ahead of time, and writes are almost certainly cached.

You can, if you wish, use Win32API::File's CreateFile to pass the FILE_FLAG_SEQUENTIAL_SCAN flag when you open the input file, which will help optimize this process slightly.
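If I remember the Win32API::File interface correctly, that looks roughly like
this (untested sketch; the path is a placeholder):

use Win32API::File qw( CreateFile OsFHandleOpen
                       GENERIC_READ FILE_SHARE_READ OPEN_EXISTING
                       FILE_FLAG_SEQUENTIAL_SCAN );

my $file_input = 'C:\\data\\input.txt';   # placeholder path

# Open with the sequential-scan hint so Windows reads ahead more aggressively
my $os_handle = CreateFile( $file_input, GENERIC_READ(), FILE_SHARE_READ(), [],
                            OPEN_EXISTING(), FILE_FLAG_SEQUENTIAL_SCAN(), [] )
    or die "CreateFile failed: $^E";

# Wrap the OS handle in an ordinary Perl filehandle and read as usual
OsFHandleOpen( \*FH, $os_handle, 'r' ) or die "OsFHandleOpen failed: $!";
while ( my $row = <FH> ) {
    # ... process $row ...
}
close FH;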

You will see faster performance on the dual-CPU server anyway, as other activity (Task Manager updates, for example, and other services such as disk I/O and database servers) can be handled by the other processor.
 
