run rsync multithreaded within perl


Hongyi Zhao

Hi all,

Could someone here please give me an example of running rsync
multithreaded within Perl?

Regards
 

George Mpouras

Here is a quick parallel rsync.
You can pick up the core idea and adapt it as you see fit.
First you have to set up passwordless SSH login between your servers.
Also be very careful with the --delete rsync switch I use, because it can
wipe some of your remote data.

For testing, create the folders:

mkdir /tmp/{src1,src2,src3,dest1,dest2,dest3}

So, here it is:




#!/usr/bin/perl
# example of parallel rsync
use strict;
use warnings;
use feature 'say';

my $parallel_jobs = 2;
my $rsync_path    = '/usr/bin/rsync';
my %rsync_jobs = (

  'test rsync 1' => {
    'local folder'   => '/tmp/src1',
    'remote server'  => '127.0.1.1',
    'remote folder'  => '/tmp/dest1',
    'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

  'test rsync 2' => {
    'local folder'   => '/tmp/src2',
    'remote server'  => '127.0.1.2',
    'remote folder'  => '/tmp/dest2',
    'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

  'test rsync 3' => {
    'local folder'   => '/tmp/src3',
    'remote server'  => '127.0.1.3',
    'remote folder'  => '/tmp/dest3',
    'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' } );

# Some quick checks
die "Could not find executable \"$rsync_path\" for user \"". getpwuid($>) ."\"\n"
  unless -x $rsync_path;
-d $rsync_jobs{$_}->{'local folder'}
  || die "Could not find the readable local source directory \"$rsync_jobs{$_}->{'local folder'}\" for user \"". getpwuid($>) ."\", for the rsync job \"$_\"\n"
  for keys %rsync_jobs;

# Create job chunks with as many members as $parallel_jobs
my %chunk = (id => 1, data => []);
foreach (sort { $a cmp $b } keys %rsync_jobs)
{
  if ($parallel_jobs == scalar @{$chunk{data}})
  {
    Parallel_execution_of_chunk();
    $chunk{data} = [];
    $chunk{id}++
  }

  push @{$chunk{data}}, $_
}

Parallel_execution_of_chunk();


sub Parallel_execution_of_chunk
{
  say "Starting jobid $chunk{id} of ". scalar @{$chunk{data}} ." parallel jobs";
  my @Threads;
  $| = 1;

  for my $chunk (@{$chunk{data}})
  {
    my $answer = fork;
    die "Perl process $$ could not fork\n" unless defined $answer;

    if ($answer == 0)
    {
      print "rsync worker $$ started\n";
      my $command = "$rsync_path $rsync_jobs{$chunk}->{'rsync switches'} $rsync_jobs{$chunk}->{'local folder'}/ $rsync_jobs{$chunk}->{'remote server'}:$rsync_jobs{$chunk}->{'remote folder'}";
      open SHELL, '-|', "$command 2>&1" or die "Could not run rsync \"$command\": $!\n";
      while (<SHELL>) { print "jobid $chunk{id} , rsync \"$chunk\" $_" }
      close SHELL;
      exit 0
    }
    else
    {
      push @Threads, $answer
    }
  }

  print "Waiting for the pids: @Threads\n";
  foreach my $pid (@Threads) { waitpid($pid, 0); print "Child $pid finished\n" }
}
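
One caveat with the chunking above: the script waits for a whole chunk of $parallel_jobs children before starting the next chunk, so one slow rsync stalls the rest. A small worker pool in core Perl can refill as soon as any child exits. This is only a sketch of the pattern; the demo commands are short sleeps standing in for real rsync command lines such as ['/usr/bin/rsync', '-a', '-z', '/tmp/src1/', 'host:/tmp/dest1']:

```perl
#!/usr/bin/perl
# Worker pool: keep at most $max_jobs children running, and start a new
# job as soon as any child exits (rather than waiting for a whole chunk).
use strict;
use warnings;

sub run_pool {
    my ($max_jobs, @commands) = @_;
    my %running;                 # pid => argv arrayref
    my $finished = 0;
    while (@commands or %running) {
        # Top up the pool with new children while slots are free.
        while (@commands and keys(%running) < $max_jobs) {
            my $cmd = shift @commands;
            my $pid = fork;
            die "Could not fork: $!\n" unless defined $pid;
            if ($pid == 0) { exec @$cmd; die "exec failed: $!\n" }
            $running{$pid} = $cmd;
        }
        my $pid = wait;          # block until any child finishes
        next if $pid == -1;
        delete $running{$pid};
        $finished++;
    }
    return $finished;
}

# Demo: five short sleeps stand in for rsync invocations.
my $done = run_pool(2, map { [$^X, '-e', 'select undef, undef, undef, 0.1'] } 1 .. 5);
print "completed $done jobs\n";
```

With real jobs you would pass the rsync argv lists instead of the sleep commands; everything else stays the same.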
 

Hongyi Zhao

For testing, create the folders:

mkdir /tmp/{src1,src2,src3,dest1,dest2,dest3}

So, here it is:

Thanks, George. Based on your hints, I use the following one for my case:

-----------
#!/usr/bin/perl -w
# example of parallel rsync
use strict;
use warnings;
use feature 'say';

foreach (1..3) {
  mkdir('/tmp/src'.$_, 0777);
}

my $parallel_jobs = 2;
my $rsync_path    = '/usr/local/bin/rsync';
my %rsync_jobs = (

  'test rsync 1' => {
    'local folder'   => '/tmp/src1',
    'remote server'  => 'ftp.cn.debian.org',
    'remote folder'  => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

  'test rsync 2' => {
    'local folder'   => '/tmp/src2',
    'remote server'  => 'ftp.cn.debian.org',
    'remote folder'  => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

  'test rsync 3' => {
    'local folder'   => '/tmp/src3',
    'remote server'  => 'ftp.cn.debian.org',
    'remote folder'  => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' } );

# Some quick checks
die "Could not find executable \"$rsync_path\" for user \"". getpwuid($>) ."\"\n"
  unless -x $rsync_path;
-d $rsync_jobs{$_}->{'local folder'}
  || die "Could not find the readable local source directory \"$rsync_jobs{$_}->{'local folder'}\" for user \"". getpwuid($>) ."\", for the rsync job \"$_\"\n"
  for keys %rsync_jobs;

# Create job chunks with as many members as $parallel_jobs
my %chunk = (id => 1, data => []);
foreach (sort { $a cmp $b } keys %rsync_jobs)
{
  if ($parallel_jobs == scalar @{$chunk{data}})
  {
    Parallel_execution_of_chunk();
    $chunk{data} = [];
    $chunk{id}++
  }

  push @{$chunk{data}}, $_
}

Parallel_execution_of_chunk();


sub Parallel_execution_of_chunk
{
  say "Starting jodid $chunk{id} of ". scalar @{$chunk{data}} ." parallel jobs";
  my @Threads;
  $| = 1;

  for my $chunk (@{$chunk{data}})
  {
    my $answer = fork;
    die "Perl process $$ could not fork\n" unless defined $answer;

    if ($answer == 0)
    {
      print "rsync thread $$ started\n";
      my $command = "$rsync_path $rsync_jobs{$chunk}->{'rsync switches'} $rsync_jobs{$chunk}->{'local folder'}/ $rsync_jobs{$chunk}->{'remote server'}:$rsync_jobs{$chunk}->{'remote folder'}";
      open SHELL, '-|', "$command 2>&1" or die "Could not run rsync \"$command\": $!\n";
      while (<SHELL>) { print "jodid $chunk{id} , rsync \"$chunk\" $_" }
      close SHELL;
      exit 0
    }
    else
    {
      push @Threads, $answer
    }
  }

  print "Waiting the tids: @Threads\n";
  foreach my $tid (@Threads) { waitpid($tid, 0); print "Thread $tid finished\n" }
}
-----------

But when I run it with the following command:

werner@debian:~$ ./prsync.pl

I get the following errors:

....
jodid 1 , rsync "test rsync 2" sh: 2: /tmp/src2/: Permission denied
jodid 1 , rsync "test rsync 1" sh: 2: /tmp/src1/: Permission denied
Thread 5490 finished
Thread 5491 finished
Starting jodid 2 of 1 parallel jobs
Waiting the tids: 5498
rsync thread 5498 started
Use of uninitialized value in concatenation (.) or string at ./prsync.pl line 78.
.....
rsync error: syntax or usage error (code 1) at main.c(1622) [Receiver=3.1.1pre1]
jodid 2 , rsync "test rsync 3" sh: 2: /tmp/src3/: Permission denied
Thread 5498 finished

Regards
 

Hongyi Zhao

The local and remote arguments are the wrong way around. You are trying
to copy your local folder to the remote site, which (fortunately) you
are not allowed to do.

Dear Ben,

I'm a newbie with Perl. My goal is to do the following thing
multithreaded, with the power of Perl:

rsync [rsync_options] ftp.cn.debian.org::debian /destdir/

There can be one or more rsync servers, but /destdir/ is a single
directory; i.e., I want to rsync some huge data from one or more
remote rsync servers into my local folder, here /destdir/.

Another issue is as follows: if I have a list of files to be rsynced,
I can use the --files-from option of rsync. In that case, if the list is
huge, say more than 50,000 files to be rsynced, rsync will take a long
time. If I could multithread this job, the time needed should be
shortened a lot.
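
One hedged sketch of that --files-from idea in plain Perl: split the big list round-robin into per-worker pieces, write each piece to a temporary list file, and fork one rsync per piece. The file names below are made up, and the actual fork/exec step is left as a comment so the sketch runs anywhere; note also that many mirrors limit simultaneous connections per client, so a small worker count is safer:

```perl
#!/usr/bin/perl
# Sketch: split one big --files-from list into N pieces, one per worker.
# A forked child could then run its own rsync over each piece.
use strict;
use warnings;
use File::Temp qw(tempfile);

# Round-robin distribution keeps the pieces equal in size to within one file.
sub split_list {
    my ($n, @files) = @_;
    my @pieces = map { [] } 1 .. $n;
    push @{ $pieces[ $_ % $n ] }, $files[$_] for 0 .. $#files;
    return @pieces;
}

# Stand-in for the real 50,000-entry file list.
my @files  = map { "pool/main/file$_.deb" } 1 .. 10;
my @pieces = split_list(3, @files);

for my $i (0 .. $#pieces) {
    my ($fh, $list) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @{ $pieces[$i] };
    close $fh;
    # In a real run you would fork here and exec something like:
    #   rsync -a -z --files-from=$list ftp.cn.debian.org::debian /destdir/
    print "worker $i: ", scalar(@{ $pieces[$i] }), " files listed in $list\n";
}
```

The forking itself can reuse the chunk or pool logic from earlier in this thread; only the command line changes.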

Regards
 

George Mpouras

...
jodid 1 , rsync "test rsync 2" sh: 2: /tmp/src2/: Permission denied
jodid 1 , rsync "test rsync 1" sh: 2: /tmp/src1/: Permission denied



Honestly Hongyi, I think all you need is the following command:

nohup rsync ... 1> /tmp/rsync_activity.log 2> /tmp/rsync_errors.log &

From time to time, monitor the activity using:

tail -f /tmp/rsync_activity.log
tail -f /tmp/rsync_errors.log


Optionally, go to a friend who has already downloaded the mirror
and copy the whole directory from his computer. That will make the rsync
a lot faster!
 

Hongyi Zhao

Honestly Hongyi, I think all you need is the following command:

nohup rsync ... 1> /tmp/rsync_activity.log 2> /tmp/rsync_errors.log &

From time to time, monitor the activity using:

tail -f /tmp/rsync_activity.log
tail -f /tmp/rsync_errors.log


Optionally, go to a friend who has already downloaded the mirror
and copy the whole directory from his computer. That will make the rsync
a lot faster!

Thanks for your hints, but I've noticed a Perl script named apt-mirror;
see here for details:

https://github.com/apt-mirror/apt-mirror

It uses wget as the downloader and achieves the multithreading through
the power of Perl.

With the nohup method you mentioned above, I'd lose some of the control
that a Perl implementation such as apt-mirror provides.

So I just want to swap rsync into it as the transfer tool.

Regards
 

gamo

On 10/02/14 16:14, (e-mail address removed) wrote:

I agree, because rsync does things, like creating the list of
files to transfer, that are not parallelizable. Simply
interrupting rsync transfers and rerunning them could lead
to undesirable effects. I would not play with things
like rsync until its clever authors do it for me.
 

johannes falcone

Hi all,

Could someone here please give me an example of running rsync
multithreaded within Perl?

Regards

Your network will be the speed limit; multithreading won't help.
rsync -avPz /folder1 me@box33:/home/folder44 will put folder1 inside folder44.
Use a trailing slash on the source to copy its contents rather than the folder itself.
Use -e ssh if you need SSH.
 
