run rsync multithreaded within perl

Discussion in 'Perl Misc' started by Hongyi Zhao, Feb 10, 2014.

  1. Hongyi Zhao

    Hongyi Zhao Guest

    Hi all,

    Could someone here please give me an example of running rsync
    multithreaded within Perl?

    Regards
     
    Hongyi Zhao, Feb 10, 2014
    #1

  2. Here is a quick parallel rsync.
    You can pick up the core idea and adapt it as you see fit.
    First you have to configure passwordless ssh login between your servers.
    Also be very careful with the --delete rsync switch I use, because it can
    wipe some of your remote data.

    For testing, create the folders

    mkdir /tmp/{src1,src2,src3,dest1,dest2,dest3}

    So, here it is




    #!/usr/bin/perl
    # example of parallel rsync
    use strict;
    use warnings;
    use feature 'say';

    $| = 1;    # unbuffered output, set before any fork

    my $parallel_jobs = 2;
    my $rsync_path    = '/usr/bin/rsync';
    my %rsync_jobs    = (

        'test rsync 1' => {
            'local folder'   => '/tmp/src1',
            'remote server'  => '127.0.1.1',
            'remote folder'  => '/tmp/dest1',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

        'test rsync 2' => {
            'local folder'   => '/tmp/src2',
            'remote server'  => '127.0.1.2',
            'remote folder'  => '/tmp/dest2',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

        'test rsync 3' => {
            'local folder'   => '/tmp/src3',
            'remote server'  => '127.0.1.3',
            'remote folder'  => '/tmp/dest3',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' } );

    # Some quick checks
    my $user = getpwuid($>);
    die "Could not find executable \"$rsync_path\" for user \"$user\"\n"
        unless -x $rsync_path;
    for (keys %rsync_jobs) {
        -d $rsync_jobs{$_}->{'local folder'}
            or die "Could not find the local source directory "
                 . "\"$rsync_jobs{$_}->{'local folder'}\" for user \"$user\", "
                 . "for the rsync job \"$_\"\n";
    }

    # Create job chunks with as many members as $parallel_jobs
    my %chunk = (id => 1, data => []);
    foreach (sort keys %rsync_jobs) {
        if ($parallel_jobs == scalar @{$chunk{data}}) {
            Parallel_execution_of_chunk();
            $chunk{data} = [];
            $chunk{id}++;
        }

        push @{$chunk{data}}, $_;
    }

    Parallel_execution_of_chunk();

    sub Parallel_execution_of_chunk {
        say "Starting jobid $chunk{id} of " . scalar(@{$chunk{data}}) . " parallel jobs";
        my @Threads;    # PIDs of forked child processes (not real threads)

        for my $chunk (@{$chunk{data}}) {
            my $answer = fork;
            die "Perl process $$ could not fork: $!\n" unless defined $answer;

            if ($answer == 0) {
                # Child: run one rsync and prefix each output line with the job name.
                print "rsync thread $$ started\n";
                my $job     = $rsync_jobs{$chunk};
                my $command = "$rsync_path $job->{'rsync switches'} "
                            . "$job->{'local folder'}/ "
                            . "$job->{'remote server'}:$job->{'remote folder'}";
                open my $shell, '-|', "$command 2>&1"
                    or die "Could not run rsync: \"$command\" because $!\n";
                while (<$shell>) { print "jobid $chunk{id} , rsync \"$chunk\" $_" }
                close $shell;
                exit 0;
            }
            else {
                # Parent: remember the child's PID so it can be waited for.
                push @Threads, $answer;
            }
        }

        print "Waiting for the tids: @Threads\n";
        foreach my $tid (@Threads) {
            waitpid($tid, 0);
            print "Thread $tid finished\n";
        }
    }
     
    George Mpouras, Feb 10, 2014
    #2

  3. tom

    tom Guest

    No
     
    tom, Feb 10, 2014
    #3
  4. Hongyi Zhao

    Hongyi Zhao Guest

    Thanks, George. Based on your hints, I used the following for my case:

    -----------
    #!/usr/bin/perl -w
    # example of parallel rsync
    use strict;
    use warnings;
    use feature 'say';

    foreach (1..3) {
    mkdir('/tmp/src'.$_, 0777);
    }

    my $parallel_jobs = 2;
    my $rsync_path = '/usr/local/bin/rsync';
    my %rsync_jobs =(

    'test rsync 1' => {
    'local folder' => '/tmp/src1',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

    'test rsync 2' => {
    'local folder' => '/tmp/src2',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

    'test rsync 3' => {
    'local folder' => '/tmp/src3',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' } );

    # Some quick checks
    die "Could not found executable \"$rsync_path\" for user \"".
    getpwuid($>) ."\"\n" unless -x $rsync_path;
    -d $rsync_jobs{$_}->{'local folder'} || die "Could not found the local
    source readable directory \"$rsync_jobs{$_}->{'local folder'}\" for user
    \"". getpwuid($>) ."\", for the rsync job \"$_\"\n" for keys %rsync_jobs;




    # Create job chunks with members as many as the $parallel_jobs
    my %chunk = (id=>1, data=>[]);
    foreach (sort {$a cmp $b} keys %rsync_jobs)
    {
    if ($parallel_jobs == scalar @{$chunk{data}})
    {
    &Parallel_execution_of_chunk;
    $chunk{data}=[];
    $chunk{id}++
    }

    push @{$chunk{data}}, $_
    }

    &Parallel_execution_of_chunk;




    sub Parallel_execution_of_chunk
    {
    say "Starting jodid $chunk{id} of ". scalar @{$chunk{data}} ." parallel
    jobs";
    my @Threads;
    $|=1;


    for my $chunk (@{$chunk{data}})
    {
    my $answer = fork;
    die "Perl process $$ Could not fork\n" unless defined $answer;

    if ( $answer == 0 )
    {
    print "rsync thread $$ started\n";
    my $command = "$rsync_path $rsync_jobs{$chunk}->{'rsync
    switches'}
    $rsync_jobs{$chunk}->{'local folder'}/ $rsync_jobs{$chunk}->{'remote
    server'}:$rsync_jobs{$chunk}->{'remote folder'}";
    open SHELL, '-|', "$command 2>&1" or die "Could not
    run rsync :
    \"$command\" because $? , $^N\n";
    while (<SHELL>) { print "jodid $chunk{id} , rsync \"$chunk
    \" $_" }
    close SHELL;
    exit 0
    }
    else
    {
    push @Threads, $answer
    }
    }

    print "Waiting the tids: @Threads\n";
    foreach my $tid (@Threads) { waitpid($tid, 0); print "Thread $tid
    finished\n" }
    }
    -----------

    But when I run it with the following command:

    werner@debian:~$ ./prsync.pl

    I get the following errors:

    ....
    jodid 1 , rsync "test rsync 2" sh: 2: /tmp/src2/: Permission denied
    jodid 1 , rsync "test rsync 1" sh: 2: /tmp/src1/: Permission denied
    Thread 5490
    finished
    Thread 5491
    finished
    Starting jodid 2 of 1 parallel
    jobs
    Waiting the tids: 5498
    rsync thread 5498 started
    Use of uninitialized value in concatenation (.) or string at ./prsync.pl
    line 78.
    .....
    rsync error: syntax or usage error (code 1) at main.c(1622)
    [Receiver=3.1.1pre1]
    jodid 2 , rsync "test rsync 3" sh: 2: /tmp/src3/: Permission denied
    Thread 5498
    finished

    Regards
     
    Hongyi Zhao, Feb 11, 2014
    #4
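    A possible explanation for the output in #4, offered as a reading of the log rather than a confirmed diagnosis: the pasted script still contains the line breaks that were added inside quoted strings when the code was wrapped. A hash key written as 'rsync', newline, 'switches' is not the key 'rsync switches', so the lookup returns undef (the "uninitialized value" warning), and a newline inside $command makes sh run the text after it as a second command (the "sh: 2: /tmp/src2/: Permission denied" line). A minimal sketch of both effects; %job, $wrapped_key and the echo command are illustrative only:

```perl
use strict;
use warnings;

# One job, with keys spelled the way the script defines them.
my %job = (
    'local folder'   => '/tmp/src1',
    'rsync switches' => '-a -z -v -P --partial --delete',
);

# The key as it appears in the wrapped paste: a newline instead of a space.
my $wrapped_key = "rsync\nswitches";

printf "intended key found: %s\n", (exists $job{'rsync switches'}) ? 'yes' : 'no';
printf "wrapped key found:  %s\n", (exists $job{$wrapped_key})      ? 'yes' : 'no';

# A command string with an embedded newline is two shell commands, not one.
# (Illustrative command; nothing is executed here.)
my $command     = "echo first\n$job{'local folder'}/";
my @shell_lines = split /\n/, $command;
printf "shell sees %d command(s)\n", scalar @shell_lines;
```

    Rejoining each wrapped string onto a single line should make both messages go away. Separately, the script joins 'remote server' and 'remote folder' with ':', so 'ftp.cn.debian.org' and '::debian' would combine into 'ftp.cn.debian.org:::debian', which is probably not what was intended.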
  5. Hongyi Zhao

    Hongyi Zhao Guest

    Dear Ben,

    I'm a newbie with Perl. What I want is to do the following
    multithreaded, using Perl:

    rsync [rsync_options] ftp.cn.debian.org::debian /destdir/

    There can be one or more rsync servers, but /destdir/ is a single
    directory, i.e., I want to rsync some huge data from one or more
    remote rsync servers into my local folder, here /destdir/.

    Another issue of mine is as follows: if I have a list of files to be
    rsynced, I can use the --files-from option of rsync. In this case, if
    the list is huge, say more than 50,000 files, rsync will take a long
    time. If I can run this job multithreaded, the time used should be
    shortened a lot.

    Regards
     
    Hongyi Zhao, Feb 11, 2014
    #5
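    One way to approach the --files-from case from #5 is to split the list into a few smaller lists and start one rsync per list with fork. The following is a minimal sketch under assumptions, not a tested mirror tool: the helper names (split_file_list, rsync_command, run_parallel), the worker count and the sample file names are made up for illustration, and the only rsync options used are -a and --files-from. Whether it is actually faster depends on where the bottleneck is.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Deal the file names round-robin into $workers lists of similar length.
sub split_file_list {
    my ($workers, @files) = @_;
    die "need at least one worker\n" if $workers < 1;
    my @chunks = map { [] } 1 .. $workers;
    push @{ $chunks[ $_ % $workers ] }, $files[$_] for 0 .. $#files;
    return grep { @$_ } @chunks;    # drop empty chunks when files < workers
}

# Build one rsync command as a list, so no shell has to parse it.
sub rsync_command {
    my ($list_file, $source, $dest) = @_;
    return ('rsync', '-a', "--files-from=$list_file", $source, $dest);
}

# Write each chunk to a temporary list file, fork one rsync per chunk,
# wait for all of them, and return the number of children that failed.
sub run_parallel {
    my ($source, $dest, @chunks) = @_;
    my %list_for_pid;
    for my $chunk (@chunks) {
        my ($fh, $list_file) = tempfile('rsync-list-XXXXXX', TMPDIR => 1);
        print {$fh} "$_\n" for @$chunk;
        close $fh or die "Cannot write $list_file: $!\n";

        my $pid = fork;
        die "Cannot fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            my @cmd = rsync_command($list_file, $source, $dest);
            exec { $cmd[0] } @cmd or die "Cannot exec $cmd[0]: $!\n";
        }
        $list_for_pid{$pid} = $list_file;
    }

    my $failed = 0;
    while ((my $pid = wait) != -1) {
        $failed++ if $? != 0;
        unlink $list_for_pid{$pid} if exists $list_for_pid{$pid};
    }
    return $failed;
}

# Demo with made-up file names: only show how the list is divided.
my @files  = map { "pool/file$_.deb" } 1 .. 7;
my @chunks = split_file_list(3, @files);
printf "chunk %d: %d files\n", $_ + 1, scalar @{ $chunks[$_] } for 0 .. $#chunks;

# To start the transfers (this contacts the mirror), something like:
# my $failed = run_parallel('ftp.cn.debian.org::debian', '/destdir/', @chunks);
```

    The paths in each list are taken relative to the source given on the command line. Each rsync process still negotiates its own connection, so a server that limits connections per client may refuse some of them.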


  6. Honestly, Hongyi, I think all you need is the following command

    nohup rsync ... 1> /tmp/rsync_activity.log 2> /tmp/rsync_errors.log &

    From time to time monitor the activity using

    tail -f /tmp/rsync_activity.log
    tail -f /tmp/rsync_errors.log


    Alternatively, go to a friend who has already downloaded the mirror
    and copy the whole directory from his computer. That will make the rsync
    a lot faster!
     
    George Mpouras, Feb 11, 2014
    #6
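    For readers who want the effect of the nohup command in #6 but from inside a Perl script, so that the script keeps the child's PID, here is a minimal sketch. It is an illustration under assumptions: start_logged is a hypothetical helper, and the demo runs the perl interpreter itself ($^X) as a stand-in for rsync so that nothing is transferred.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start @command in the background with stdout and stderr appended to log
# files. Returns the child's PID so the caller can waitpid() or kill() it.
sub start_logged {
    my ($out_log, $err_log, @command) = @_;
    my $pid = fork;
    die "Cannot fork: $!\n" unless defined $pid;
    return $pid if $pid;    # parent: carry on immediately

    # Child: ignore hangups (as nohup does), redirect output, become the command.
    $SIG{HUP} = 'IGNORE';
    open STDOUT, '>>', $out_log or die "Cannot open $out_log: $!\n";
    open STDERR, '>>', $err_log or die "Cannot open $err_log: $!\n";
    exec { $command[0] } @command or die "Cannot exec $command[0]: $!\n";
}

# Demo: a harmless stand-in command instead of rsync.
my $pid = start_logged(
    '/tmp/rsync_activity.log', '/tmp/rsync_errors.log',
    $^X, '-e', 'print "demo output\n"',
);
waitpid $pid, 0;
print "child $pid exited with status ", $? >> 8, "\n";
```

    The log files can then be followed with tail -f exactly as in #6, while the script remains free to start further children or stop this one.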
  7. Hongyi Zhao

    Hongyi Zhao Guest

    Thanks for your hints, but I've noticed a Perl script named apt-mirror,
    see here for details:

    https://github.com/apt-mirror/apt-mirror

    It uses wget as the downloader and does the multithreading in Perl.

    With the nohup method you mentioned above, I would have less control
    than with a Perl implementation like the one in apt-mirror.

    So I just want to port rsync into it.

    Regards
     
    Hongyi Zhao, Feb 12, 2014
    #7
  8. gamo

    gamo Guest

    On 10/02/14 16:14, [...] wrote:
    I agree, because rsync does things, like creating the list of
    files to transfer, that are not parallelizable. Simply
    interrupting rsync transfers and rerunning them could lead
    to undesirable effects. I would not play with things
    like rsync until its clever authors do it for me.
     
    gamo, Feb 15, 2014
    #8
  9. Your network will be the speed limit;
    multithreading won't help.
    rsync -avPz /folder1 me@box33:/home/folder44 will put folder1 inside folder44.
    Use a trailing slash (/folder1/) to copy the contents rather than the folder itself.
    Use -e ssh if you need ssh.
     
    johannes falcone, Feb 20, 2014
    #9