run rsync multithreaded within perl

Discussion in 'Perl Misc' started by Hongyi Zhao, Feb 10, 2014.

  1. Hongyi Zhao

    Hongyi Zhao Guest

    Hi all,

    Could someone here please give me an example of running rsync multithreaded
    within Perl?

    Regards
    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
    Hongyi Zhao, Feb 10, 2014
    #1

  2. Here is a quick parallel rsync.
    You can grasp the core idea and adapt it as you see fit.
    First you have to configure passwordless ssh login between your servers.
    Also be very careful with the --delete rsync switch I use, because it can
    wipe some of your remote data.

    For testing, create the folders:

    mkdir /tmp/{src1,src2,src3,dest1,dest2,dest3}

    So, here it is:




    #!/usr/bin/perl
    # example of parallel rsync
    use strict;
    use warnings;
    use feature 'say';

    my $parallel_jobs = 2;
    my $rsync_path    = '/usr/bin/rsync';
    my %rsync_jobs    = (

        'test rsync 1' => {
            'local folder'   => '/tmp/src1',
            'remote server'  => '127.0.1.1',
            'remote folder'  => '/tmp/dest1',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

        'test rsync 2' => {
            'local folder'   => '/tmp/src2',
            'remote server'  => '127.0.1.2',
            'remote folder'  => '/tmp/dest2',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' },

        'test rsync 3' => {
            'local folder'   => '/tmp/src3',
            'remote server'  => '127.0.1.3',
            'remote folder'  => '/tmp/dest3',
            'rsync switches' => '-a -z -v -P --partial --delete --rsh=\'ssh\'' } );

    # Some quick checks
    my $user = getpwuid($>);
    die "Could not find executable \"$rsync_path\" for user \"$user\"\n"
        unless -x $rsync_path;
    for my $job (sort keys %rsync_jobs) {
        -d $rsync_jobs{$job}->{'local folder'}
            or die "Could not find the local source directory "
                 . "\"$rsync_jobs{$job}->{'local folder'}\" for user \"$user\", "
                 . "for the rsync job \"$job\"\n";
    }

    # Create job chunks with as many members as $parallel_jobs
    my %chunk = (id => 1, data => []);
    foreach (sort keys %rsync_jobs) {
        if ($parallel_jobs == scalar @{$chunk{data}}) {
            # The current chunk is full: run it, then start a new one
            Parallel_execution_of_chunk();
            $chunk{data} = [];
            $chunk{id}++;
        }

        push @{$chunk{data}}, $_;
    }

    # Run the last (possibly partial) chunk
    Parallel_execution_of_chunk();

    # Fork one child per job in the current chunk and wait for all of them.
    # Note: these are forked processes, not threads.
    sub Parallel_execution_of_chunk
    {
        say "Starting job id $chunk{id} of " . scalar(@{$chunk{data}}) . " parallel jobs";
        my @Threads;
        $| = 1;

        for my $job (@{$chunk{data}}) {
            my $answer = fork;
            die "Perl process $$ could not fork\n" unless defined $answer;

            if ($answer == 0) {
                # Child: run one rsync and relay its output line by line.
                # The command must stay on a single line; an embedded newline
                # would make the shell run the second line as its own command.
                print "rsync thread $$ started\n";
                my $command = "$rsync_path $rsync_jobs{$job}->{'rsync switches'} "
                            . "$rsync_jobs{$job}->{'local folder'}/ "
                            . "$rsync_jobs{$job}->{'remote server'}:$rsync_jobs{$job}->{'remote folder'}";
                open my $shell, '-|', "$command 2>&1"
                    or die "Could not run rsync: \"$command\" because $!\n";
                while (<$shell>) { print "job id $chunk{id} , rsync \"$job\" $_" }
                close $shell;
                exit 0;
            }
            else {
                # Parent: remember the child's process id
                push @Threads, $answer;
            }
        }

        print "Waiting for the tids: @Threads\n";
        foreach my $tid (@Threads) {
            waitpid($tid, 0);
            print "Thread $tid finished\n";
        }
    }
    George Mpouras, Feb 10, 2014
    George Mpouras, Feb 10, 2014
    #2

  3. Hongyi Zhao

    Guest

    No
    , Feb 10, 2014
    #3
  4. Hongyi Zhao

    Hongyi Zhao Guest

    On Mon, 10 Feb 2014 14:35:29 +0200, George Mpouras wrote:

    > for testing create the folders
    >
    > mkdir /tmp/{src1,src2,src3,dest1,dest2,dest3}
    >
    > So, here it is


    Thanks, George. Based on your hints, I used the following for my case:

    -----------
    #!/usr/bin/perl -w
    # example of parallel rsync
    use strict;
    use warnings;
    use feature 'say';

    foreach (1..3) {
    mkdir('/tmp/src'.$_, 0777);
    }

    my $parallel_jobs = 2;
    my $rsync_path = '/usr/local/bin/rsync';
    my %rsync_jobs =(

    'test rsync 1' => {
    'local folder' => '/tmp/src1',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

    'test rsync 2' => {
    'local folder' => '/tmp/src2',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' },

    'test rsync 3' => {
    'local folder' => '/tmp/src3',
    'remote server' => 'ftp.cn.debian.org',
    'remote folder' => '::debian',
    'rsync switches' => '-a -z -v -P --partial --delete' } );

    # Some quick checks
    die "Could not found executable \"$rsync_path\" for user \"".
    getpwuid($>) ."\"\n" unless -x $rsync_path;
    -d $rsync_jobs{$_}->{'local folder'} || die "Could not found the local
    source readable directory \"$rsync_jobs{$_}->{'local folder'}\" for user
    \"". getpwuid($>) ."\", for the rsync job \"$_\"\n" for keys %rsync_jobs;




    # Create job chunks with members as many as the $parallel_jobs
    my %chunk = (id=>1, data=>[]);
    foreach (sort {$a cmp $b} keys %rsync_jobs)
    {
    if ($parallel_jobs == scalar @{$chunk{data}})
    {
    &Parallel_execution_of_chunk;
    $chunk{data}=[];
    $chunk{id}++
    }

    push @{$chunk{data}}, $_
    }

    &Parallel_execution_of_chunk;




    sub Parallel_execution_of_chunk
    {
    say "Starting jodid $chunk{id} of ". scalar @{$chunk{data}} ." parallel
    jobs";
    my @Threads;
    $|=1;


    for my $chunk (@{$chunk{data}})
    {
    my $answer = fork;
    die "Perl process $$ Could not fork\n" unless defined $answer;

    if ( $answer == 0 )
    {
    print "rsync thread $$ started\n";
    my $command = "$rsync_path $rsync_jobs{$chunk}->{'rsync
    switches'}
    $rsync_jobs{$chunk}->{'local folder'}/ $rsync_jobs{$chunk}->{'remote
    server'}:$rsync_jobs{$chunk}->{'remote folder'}";
    open SHELL, '-|', "$command 2>&1" or die "Could not
    run rsync :
    \"$command\" because $? , $^N\n";
    while (<SHELL>) { print "jodid $chunk{id} , rsync \"$chunk
    \" $_" }
    close SHELL;
    exit 0
    }
    else
    {
    push @Threads, $answer
    }
    }

    print "Waiting the tids: @Threads\n";
    foreach my $tid (@Threads) { waitpid($tid, 0); print "Thread $tid
    finished\n" }
    }
    -----------

    But when I run it with the following command:

    werner@debian:~$ ./prsync.pl

    I get the following errors:

    ....
    jodid 1 , rsync "test rsync 2" sh: 2: /tmp/src2/: Permission denied
    jodid 1 , rsync "test rsync 1" sh: 2: /tmp/src1/: Permission denied
    Thread 5490
    finished
    Thread 5491
    finished
    Starting jodid 2 of 1 parallel
    jobs
    Waiting the tids: 5498
    rsync thread 5498 started
    Use of uninitialized value in concatenation (.) or string at ./prsync.pl
    line 78.
    .....
    rsync error: syntax or usage error (code 1) at main.c(1622)
    [Receiver=3.1.1pre1]
    jodid 2 , rsync "test rsync 3" sh: 2: /tmp/src3/: Permission denied
    Thread 5498
    finished

    Regards
    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
    Hongyi Zhao, Feb 11, 2014
    #4
  5. Hongyi Zhao

    Hongyi Zhao Guest

    On Tue, 11 Feb 2014 04:52:17 +0000, Ben Morrow wrote:

    > Quoth Hongyi Zhao <>:
    >> {
    >> print "rsync thread $$ started\n";
    >> my $command = "$rsync_path $rsync_jobs{$chunk}->{'rsync
    >> switches'}
    >> $rsync_jobs{$chunk}->{'local folder'}/ $rsync_jobs{$chunk}->{'remote
    >> server'}:$rsync_jobs{$chunk}->{'remote folder'}";

    >
    > The local and remote arguments are the wrong way around. You are trying
    > to copy your local folder to the remote site, which (fortunately) you
    > are not allowed to do.


    Dear Ben,

    I'm a newbie with Perl. My goal is to run the following in parallel,
    driven from Perl:

    rsync [rsync_options] ftp.cn.debian.org::debian /destdir/

    There can be one or more rsync servers, but /destdir/ is a single
    directory; i.e., I want to rsync a large amount of data from one or more
    remote rsync servers into my local folder, here /destdir/.
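
Following Ben Morrow's point that the source comes first and the destination last, a pull from an rsync daemon into a local directory might be sketched as below. This is only a sketch under assumptions: the helper names `build_pull_command` and `run_parallel` are hypothetical and do not come from the thread. Commands are passed to `exec` as lists, so no shell is involved and a stray newline or space in an argument cannot split the command.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: build the argument list for
#   rsync [switches] host::module /destdir/
# (source first, destination last).
sub build_pull_command {
    my ($rsync, $switches, $server, $module, $dest) = @_;
    return ($rsync, @$switches, "${server}::${module}", $dest);
}

# Hypothetical helper: run each command (an array ref of arguments) in its
# own child process, keeping at most $max_jobs children alive at once.
# Returns a hash ref mapping the job index to the child's exit code.
sub run_parallel {
    my ($max_jobs, @commands) = @_;
    my (%job_of_pid, %status_of_job);
    my $next = 0;

    while ($next < @commands or %job_of_pid) {
        # Start new children while there is a free slot
        while ($next < @commands and keys(%job_of_pid) < $max_jobs) {
            my $pid = fork;
            die "Could not fork: $!\n" unless defined $pid;
            if ($pid == 0) {
                # Child: list-form exec bypasses the shell entirely
                exec { $commands[$next][0] } @{ $commands[$next] }
                    or POSIX::_exit(127);
            }
            $job_of_pid{$pid} = $next++;
        }
        # Block until any child exits, then record its exit code
        my $pid = wait;
        last if $pid == -1;
        my $job = delete $job_of_pid{$pid};
        $status_of_job{$job} = $? >> 8 if defined $job;
    }
    return \%status_of_job;
}

# Example (not run here): three pulls, at most two at a time.
# my @jobs = map {
#     [ build_pull_command('/usr/bin/rsync', [qw(-a -z)],
#                          'ftp.cn.debian.org', 'debian', "/tmp/src$_/") ]
# } 1 .. 3;
# my $status = run_parallel(2, @jobs);
```

Unlike fixed batches, this keeps a slot busy as soon as any child finishes. Whether several pulls into the same /destdir/ are safe depends on the jobs touching disjoint files, which this sketch does not check.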

    My other issue is this: if I have a list of the files to be rsynced,
    I can use rsync's --files-from option. If the list is huge, say more
    than 50,000 files, rsync will take a long time. If I could split this
    job across several parallel transfers, the time taken should drop a lot.
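
One way to prepare such a split, sketched here under assumptions, is to deal the file names into several smaller lists and write each to its own list file for a separate `rsync --files-from=FILE` run. The helper names `split_file_list` and `write_list_file` are hypothetical. Whether the parallel runs actually finish sooner depends on where the bottleneck is; if it is network bandwidth, they may not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: deal the file names round-robin into at most
# $n_parts lists of near-equal size. Empty parts are dropped.
sub split_file_list {
    my ($files, $n_parts) = @_;
    die "n_parts must be at least 1\n" if $n_parts < 1;
    my @parts;
    push @parts, [] for 1 .. $n_parts;
    for my $i (0 .. $#$files) {
        push @{ $parts[$i % $n_parts] }, $files->[$i];
    }
    return grep { @$_ } @parts;
}

# Hypothetical helper: write one part to a temporary list file, one name
# per line, and return its path. The file is removed when the program exits.
sub write_list_file {
    my ($part) = @_;
    my ($fh, $path) = tempfile('rsync-list-XXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} "$_\n" for @$part;
    close $fh or die "Could not close $path: $!\n";
    return $path;
}

# Example (not run here): one rsync argument list per part.
# my @commands = map {
#     [ '/usr/bin/rsync', '-a', '--files-from=' . write_list_file($_),
#       'ftp.cn.debian.org::debian', '/destdir/' ]
# } split_file_list(\@all_files, 4);
```

Round-robin dealing balances the number of files per part, not their total size, so parts can still take very different times to transfer.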

    Regards
    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
    Hongyi Zhao, Feb 11, 2014
    #5

  6. > ...
    > jodid 1 , rsync "test rsync 2" sh: 2: /tmp/src2/: Permission denied
    > jodid 1 , rsync "test rsync 1" sh: 2: /tmp/src1/: Permission denied




    Honestly Hongyi, I think all you need is the following command:

    nohup rsync ... 1> /tmp/rsync_activity.log 2> /tmp/rsync_errors.log &

    From time to time, monitor the activity using:

    tail -f /tmp/rsync_activity.log
    tail -f /tmp/rsync_errors.log


    Alternatively, go to a friend who has already downloaded the mirror
    and copy the whole directory from his computer. That will make the rsync
    a lot faster!
    George Mpouras, Feb 11, 2014
    #6
  7. Hongyi Zhao

    Hongyi Zhao Guest

    On Tue, 11 Feb 2014 20:26:21 +0200, George Mpouras wrote:

    > Honestly Hongyi, I think all you need is the following command:
    >
    > nohup rsync ... 1> /tmp/rsync_activity.log 2> /tmp/rsync_errors.log &
    >
    > From time to time, monitor the activity using:
    >
    > tail -f /tmp/rsync_activity.log
    > tail -f /tmp/rsync_errors.log
    >
    > Alternatively, go to a friend who has already downloaded the mirror
    > and copy the whole directory from his computer. That will make the rsync
    > a lot faster!


    Thanks for your hints, but I've noticed a Perl script named apt-mirror;
    see here for details:

    https://github.com/apt-mirror/apt-mirror

    It uses wget as the downloader and runs the downloads in parallel from
    Perl.

    With the nohup method you mentioned above, I'd have less control than
    with a Perl implementation like the one in apt-mirror.

    So I just want to port rsync into it.
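
The log-file idea from the nohup suggestion can also be kept inside Perl, which leaves the parent script in control of each transfer. The following is a minimal sketch, not apt-mirror's actual code: `spawn_logged` is a hypothetical helper that starts one command in a child process with its output appended to two log files and returns the child's PID, so the caller can `waitpid` on it and inspect `$?`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: start @$cmd in a child process with STDOUT and
# STDERR appended to separate log files. Returns the child's PID without
# waiting for it to finish.
sub spawn_logged {
    my ($cmd, $out_log, $err_log) = @_;
    my $pid = fork;
    die "Could not fork: $!\n" unless defined $pid;
    return $pid if $pid;    # parent: hand the PID back to the caller

    # Child: redirect output, then replace this process with the command.
    # Exit codes 126 and 127 are arbitrary markers for a setup failure.
    open STDOUT, '>>', $out_log or POSIX::_exit(126);
    open STDERR, '>>', $err_log or POSIX::_exit(126);
    exec { $cmd->[0] } @$cmd or POSIX::_exit(127);
}

# Example (not run here):
# my $pid = spawn_logged(
#     [ '/usr/bin/rsync', '-a', 'ftp.cn.debian.org::debian', '/destdir/' ],
#     '/tmp/rsync_activity.log', '/tmp/rsync_errors.log');
# waitpid($pid, 0);
# print "rsync exited with code ", $? >> 8, "\n";
```

The logs can still be followed with `tail -f`, while the script decides when to start, wait for, or signal each child.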

    Regards
    --
    ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
    Hongyi Zhao, Feb 12, 2014
    #7
  8. gamo

    gamo Guest

    On 10/02/14 16:14, wrote:
    > No
    >


    I agree, because rsync does things, like building the list of
    files to transfer, that are not parallelizable. Simply
    interrupting rsync transfers and rerunning them could lead
    to undesirable effects. I would not play with things
    like rsync until its clever authors do it for me.

    --
    http://www.telecable.es/personales/gamo/
    gamo, Feb 15, 2014
    #8
  9. On Sunday, February 9, 2014 8:20:22 PM UTC-8, Hongyi Zhao wrote:
    > Hi all,
    >
    > Could someone here please give me a example to run rsync multithreaded
    > within perl?
    >
    > Regards
    > --
    > .: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.


    Your network will be the speed limit,
    so multithreading won't help.
    rsync -avPz /folder1 me@box33:/home/folder44 will put folder1 inside folder44.
    Use a trailing slash on the source to copy its contents rather than the folder itself.
    Use -e ssh if you need ssh.
    johannes falcone, Feb 20, 2014
    #9