program to copy files - problems - unix ksh to java

Discussion in 'Java' started by kaeli, Feb 9, 2004.

  1. kaeli

    kaeli Guest

    Hi all,

    I've got a shell script (ksh) that does some file copying and deleting.
    I'm running into some problems with it that I'm wondering if I can
    solve. Since I plan on re-writing it with java (1.4), I figure I might
    as well do it right this time.

    Here's the drill:
    Cron runs this code every 5 minutes.
    Program looks on one machine, uses ssh to copy a file to another
    machine, changes the filename, the owner and permissions (chmod), and
    then deletes the file from the source machine.
    Sounds simple enough...

    Problems:
    Large files (we're talking gigabytes) take more than 5 minutes to copy.
    Something is causing the file to be deleted before it has finished
    copying. We lose the whole file, as it doesn't show up on either
    machine.
    Program gets called again while an instance is running, so it tries to
    copy files that are currently being copied.

    I was going to solve this with the usual .running type fix, but we
    really need the program to actually run every 5 minutes (more than one
    instance will be needed). If an instance is already copying the file,
    the file should just be ignored. The file should not be deleted if the
    copy hasn't finished.

    Does anyone know of any system stuff I should be looking at for java?
    Specifically, Unix Solaris interface so I can tell if a file is in use?
    Also, how can I make sure that the copy was finished before deleting the
    source? I expected the script to wait for the copy to finish before
    deleting, but it appears that it is not doing that. Should I use threads
    for this?

    Thanks for any ideas, input, etc...

    --
    --
    ~kaeli~
    Bakers trade bread recipes on a knead-to-know basis.
    http://www.ipwebdesign.net/wildAtHeart
    http://www.ipwebdesign.net/kaelisSpace
     
    kaeli, Feb 9, 2004
    #1

  2. hiwa

    hiwa Guest

    kaeli <> wrote in message news:<>...
    > Hi all,
    >
    > I've got a shell script (ksh) that does some file copying and deleting.
    > I'm running into some problems with it that I'm wondering if I can
    > solve. Since I plan on re-writing it with java (1.4), I figure I might
    > as well do it right this time.
    >
    > Here's the drill:
    > Cron runs this code every 5 minutes.
    > Program looks on one machine, uses ssh to copy a file to another
    > machine, changes the filename, the owner and permissions (chmod), and
    > then deletes the file from the source machine.
    > Sounds simple enough...
    >
    > Problems:
    > Large files (we're talking gigabytes) take more than 5 minutes to copy.
    > Something is causing the file to be deleted before it has finished
    > copying. We lose the whole file, as it doesn't show up on either
    > machine.
    > Program gets called again while an instance is running, so it tries to
    > copy files that are currently being copied.
    >
    > I was going to solve this with the usual .running type fix, but we
    > really need the program to actually run every 5 minutes (more than one
    > instance will be needed). If an instance is already copying the file,
    > the file should just be ignored. The file should not be deleted if the
    > copy hasn't finished.
    >
    > Does anyone know of any system stuff I should be looking at for java?
    > Specifically, Unix Solaris interface so I can tell if a file is in use?
    > Also, how can I make sure that the copy was finished before deleting the
    > source? I expected the script to wait for the copy to finish before
    > deleting, but it appears that it is not doing that. Should I use threads
    > for this?
    >
    > Thanks for any ideas, input, etc...
    >
    > --

    Are you syncing or flushing with proper synchronization?
     
    hiwa, Feb 10, 2004
    #2

  3. kaeli <> wrote in message news:<>...
    > Cron runs this code every 5 minutes.
    > Program looks on one machine, uses ssh to copy a file to another
    > machine, changes the filename, the owner and permissions (chmod), and
    > then deletes the file from the source machine.
    > Sounds simple enough...
    >
    > Problems:
    > Large files (we're talking gigabytes) take more than 5 minutes to copy.
    > Something is causing the file to be deleted before it has finished
    > copying. We lose the whole file, as it doesn't show up on either
    > machine.


    Hard to believe on a *NIX machine, as the system allows files held
    open by other processes to be deleted without disturbing them. It
    may be a problem if the file is on NFS. In any case it
    sounds like the first process, when finished, deletes the file,
    while the 2nd copy process, started while the first was still
    running, then gets in trouble and messes things up.

    > Program gets called again while an instance is running, so it tries to
    > copy files that are currently being copied.


    A solution might be to rename the file locally *before*
    copying. This way the process starting 5 minutes later will
    not pick up the same file again. If this is not an option,
    create an empty file with a different extension from the
    big file as a mark that the file is being worked on.

    > Also, how can I make sure that the copy was finished before deleting the
    > source? I expected the script to wait for the copy to finish before
    > deleting, but it appears that it is not doing that.


    Assuming that you do *not* start the copy process in
    the background (&), the script does wait. You have to
    look for a different reason why the file is deleted
    too early, maybe as I suggested above.
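
A minimal sketch of the point about background jobs, assuming the copy really is started with `&`: the script then has to `wait` for it and look at its exit status before deleting anything. The function name `copy_then_delete` is made up for illustration, and plain `cp` stands in for the real ssh-based copy.

```shell
#!/bin/sh
# Hypothetical helper: copy_then_delete SRC DEST
# cp is a stand-in for the real ssh-based copy command.
copy_then_delete() {
    src=$1
    dest=$2

    # Started in the background only to show why waiting matters.
    cp "$src" "$dest" &
    copy_pid=$!

    # wait blocks until the job ends and returns its exit status.
    if wait "$copy_pid"; then
        rm "$src"
    else
        echo "copy of $src failed, keeping source" >&2
        return 1
    fi
}
```

If the copy runs in the foreground instead, the `wait` is unnecessary and only the exit status check remains.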

    There is no need to solve this task in Java.

    Harald.
     
    Harald Kirsch, Feb 10, 2004
    #3
  4. kaeli

    kaeli Guest

    kaeli, Feb 10, 2004
    #4
  5. kaeli

    kaeli Guest

    In article <>,
    enlightened us with...
    >
    > Hard to believe on a *NIX machine as the system allows to delete
    > files held open by other processes. It may be a possible problem
    > if the file is on NFS. In any case it
    > sounds like the first process, when finished, deletes the file,
    > while the 2nd copy process, started while the first was still
    > running, then gets in trouble and messes things up.
    >


    That's probably it.

    > > Program gets called again while an instance is running, so it tries to
    > > copy files that are currently being copied.

    >
    > A solution might be to rename the file locally *before*
    > copying. This way the process starting 5 minutes later will
    > not pick up the same file again. If this is not an option,
    > create an empty file with another extension than the
    > big file as a mark that the file is being worked on.
    >


    That won't help.
    The code copies any files in a directory on one machine to a directory
    on another. So it will still grab the file. The code would have to move
    the file, which is already the problem.

    >
    > There is no need to solve this task in Java.


    I need threading (I think) because right now, the solution is to not run
    two instances of the code at the same time. That is not a good solution.
    We need code that runs almost continuously, looking in directories and
    copying and deleting the files.
    (it's a DMZ, in case that helps you see why this needs to be done - it
    takes files people uploaded and moves them to a machine inside our
    firewall)

    So, as far as I see, I need C or Java, and I've not coded C in over a
    year. :)

    We want a process that runs pretty much all the time. I'm thinking a
    program that looks in directories over and over. When it finds a file,
    it starts a thread that copies it then deletes it. As part of the
    thread, it can put the name of the file in a vector. Any new threads
    check that vector before bothering a file...

    I dunno, am I way off on that?

    --
    --
    ~kaeli~
    Never say, "Oops!"; always say, "Ah, interesting!"
    http://www.ipwebdesign.net/wildAtHeart
    http://www.ipwebdesign.net/kaelisSpace
     
    kaeli, Feb 10, 2004
    #5
  6. kaeli wrote:
    > I've got a shell script (ksh) that does some file copying and deleting.
    > I'm running into some problems with it that I'm wondering if I can
    > solve. Since I plan on re-writing it with java (1.4), I figure I might
    > as well do it right this time.


    There are several ways to fix this (to a "good enough" level), without
    using Java. One example:

    > Here's the drill:
    > Cron runs this code every 5 minutes.
    > Program looks on one machine,


    Check for the particular file. If found, rename the file to something
    temporary - all in one operation. E.g. (Bourne-Shell syntax):

    if mv "$file" "$file.$$" ; then
    # mv is run directly (no [ ]); its exit status says whether the
    # rename worked.
    # Found file, renamed it.
    # can start copying
    # A second invocation will not find
    # this file any more, and leave it alone.
    > uses ssh to copy a file to another

    fi

    > machine, changes the filename, the owner and permissions (chmod), and
    > then deletes the file from the source machine.


    Delete the renamed file instead.

    > Sounds simple enough...


    It is. You might want to add a sanity check which e.g. runs once a day,
    looks for old renamed files lying around, and either collects them
    or deletes them.
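
The daily sanity check could look roughly like this. It is only a sketch: the function name `list_stale_claims` and the default age are assumptions, and the name pattern simply matches a dot followed by a digit, which fits the `"$file.$$"` naming but may also match unrelated files.

```shell
#!/bin/sh
# Hypothetical helper: list_stale_claims DIR [DAYS]
# Prints regular files under DIR whose names contain ".<digit>" and that
# were last modified more than DAYS whole days ago (default 1).
# find's -mtime +N counts whole 24-hour periods, so +1 means the file
# is at least two days old.
list_stale_claims() {
    dir=$1
    days=${2:-1}
    find "$dir" -type f -name '*.[0-9]*' -mtime +"$days" -print
}
```

Whether the listed files are then re-queued or deleted is a policy decision; the sketch only reports them.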

    Other solutions include setting empty files as markers to indicate if a
    file is already copied. But this can result in a race condition if you
    start the script multiple times at the same time:

    if [ ! -f "$file.mark" ] ; then
        # race condition can happen here

        # place a mark
        touch "$file.mark"
        # now copy

        # after copy, delete
        rm "$file" "$file.mark"
    fi

    Instead of setting the marker on the remote machine, you could also set
    the marker on the local machine, but you would have to add the remote
    host name in order to distinguish the markers.
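
One way to close the race in the marker approach is to make the test and the creation of the marker a single step. The sketch below uses a marker directory instead of an empty file, because `mkdir` fails if the directory already exists. This is a swapped-in technique, not what the script above does, and the function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers. The marker is a directory named FILE.mark.

# Returns 0 only for the one caller that manages to create the marker.
claim_with_marker() {
    mkdir "$1.mark" 2>/dev/null
}

# Removes the marker once the copy and delete are done.
release_marker() {
    rmdir "$1.mark"
}
```

A caller would run the copy only if `claim_with_marker "$file"` succeeds and call `release_marker "$file"` afterwards. Markers left behind by a crashed run still need cleaning up by hand or by a periodic check.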

    Another solution would be to separate the script into two scripts. One
    doing the copying, and another one checking if there is already a
    copying script running for a particular remote machine. Have fun with ps
    or pgrep.

    > Problems:
    > Large files (we're talking gigabytes) take more than 5 minutes to copy.
    > Something is causing the file to be deleted before it has finished
    > copying.


    There is something else wrong. Try to find this "something" first. Most
    likely it is the application writing the file, or there is something
    wrong in your script. Unix is robust when it comes to the deletion of
    files which are currently in use. A deletion during a copy should not
    affect the copy.

    > I was going to solve this with the usual .running type fix, but we
    > really need the program to actually run every 5 minutes (more than one
    > instance will be needed). If an instance is already copying the file,
    > the file should just be ignored. The file should not be deleted if the
    > copy hasn't finished.


    You do check the exit code of the copy command, don't you?

    > Does anyone know of any system stuff I should be looking at for java?


    There is absolutely no need for Java. In fact, you will find that you
    gain nothing by using Java, but that you will e.g. get problems in
    setting the file owner and mode. You would have to invoke the Unix
    commands from Java via exec(), or the system calls via JNI.

    > Specifically, Unix Solaris interface so I can tell if a file is in use?


    Java has no public platform interface, not even on Sun. You would have
    to use exec() or JNI.

    > Also, how can I make sure that the copy was finished before deleting the
    > source?


    Check the return code of the copy command.
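
As a rough sketch of that check: delete the source only when the copy command reports success. The function name `move_file` is hypothetical and `cp` stands in for the ssh-based copy. Writing to a `.part` name first and renaming at the end is an extra assumption, added so that a half-copied file never sits under the final name.

```shell
#!/bin/sh
# Hypothetical helper: move_file SRC DEST
# cp is a stand-in for the real ssh-based copy command.
move_file() {
    src=$1
    dest=$2

    # Copy under a temporary name, then rename. Both steps must succeed.
    if cp "$src" "$dest.part" && mv "$dest.part" "$dest"; then
        # Only now is it safe to remove the source.
        rm "$src"
    else
        # Clean up any partial copy and leave the source untouched.
        rm -f "$dest.part"
        echo "copy of $src failed, source kept" >&2
        return 1
    fi
}
```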

    May I suggest a good book for learning Unix scripting and a lot of other
    Unix command-line tricks? "Unix Power Tools" by Peek, O'Reilly, and
    Loukides.

    > I expected the script to wait for the copy to finish before
    > deleting, but it appears that it is not doing that. Should I use threads
    > for this?


    You already have concurrency problems, and you want to use threads to
    move your concurrency problems to another level? I would not do this.

    /Thomas
     
    Thomas Weidenfeller, Feb 11, 2004
    #6
  7. kaeli <>:
    > In article <>,
    > enlightened us with...
    >
    > > > Program gets called again while an instance is running, so it tries to
    > > > copy files that are currently being copied.

    > >
    > > A solution might be to rename the file locally *before*
    > > copying. This way the process starting 5 minutes later will
    > > not pick up the same file again. If this is not an option,
    > > create an empty file with another extension than the
    > > big file as a mark that the file is being worked on.
    > >

    >
    > That won't help.
    > The code copies any files in a directory on one machine to a directory
    > on another. So it will still grab the file. The code would have to move
    > the file, which is already the problem.


    I am still not convinced that renaming would not work. Isn't there
    a directory on the source machine which does not have to be copied?
    You 'mv' (rename) the files to be copied to this directory and
    then copy them to their destination from there in the background.

    > We want a process that runs pretty much all the time. I'm thinking a
    > program that looks in directories over and over. When it finds a file,
    > it starts a thread that copies it then deletes it. As part of the
    > thread, it can put the name of the file in a vector.


    Don't forget to delete the file name from the vector, once it is
    done. And a Set would actually be more appropriate than a Vector.
    And if you go for Java 1.5, you'll find BlockingQueue which is
    what you really want.

    Harald.
     
    Harald Kirsch, Feb 12, 2004
    #7
  8. kaeli

    kaeli Guest

    In article <>,
    enlightened us with...
    >
    > I am still not convinced that renaming would not work. Isn't there
    > a directory on the source machine which does not have to be copied.
    > You 'mv' (rename) the files to be copied to this directory and
    > then copy them to their destination from there in the background.
    >


    Same issue. What if, in the middle of the move to the other directory,
    cron calls the code again? It still sees the file in DIR_A, even
    though it's currently being copied to DIR_B. It starts to move it, but
    in the middle of that move, the first invocation finishes its move,
    deleting the file from DIR_A. The first invocation may then copy to the
    other machine, I suppose, but what happens when the second invocation
    tries to move a file that no longer exists? Or even worse, overwrites
    the destination on the new machine with an empty or half-empty file?

    Currently, this problem is being handled with lockfiles. We don't like
    that way if we can find another.

    --
    --
    ~kaeli~
    The secret of the universe is @*&^^^ NO CARRIER
    http://www.ipwebdesign.net/wildAtHeart
    http://www.ipwebdesign.net/kaelisSpace
     
    kaeli, Feb 12, 2004
    #8
  9. kaeli <> wrote in message news:<>...
    > In article <>,
    > enlightened us with...
    > >
    > > I am still not convinced that renaming would not work. Isn't there
    > > a directory on the source machine which does not have to be copied.
    > > You 'mv' (rename) the files to be copied to this directory and
    > > then copy them to their destination from there in the background.
    > >

    >
    > Same issue. What if in the middle of the move to the other directory,
    > the cron calls the code again. It still sees the file in DIR_A, even


    If the two directories are on the same file system, moving is just
    a rename and takes the same time regardless of file size. It may
    take a few milliseconds and I cannot imagine a scenario where it
    takes 5 minutes, except if the whole machine (OS/hardware) is in
    deep trouble anyway.
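
To make that concrete, here is a sketch of claiming a file by moving it into a staging directory, assuming the staging directory is on the same file system as the upload directory so that `mv` is a plain rename. If a second invocation tries the same file, its `mv` fails because the source is already gone, and it can skip the file. The function name `claim_into_staging` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: claim_into_staging FILE STAGING_DIR
# Prints the new path on success; returns non-zero if the file could not
# be moved (for example because another instance already claimed it).
# Assumes FILE and STAGING_DIR are on the same file system.
claim_into_staging() {
    src=$1
    staging=$2
    # The PID suffix keeps names from different invocations apart.
    target=$staging/$(basename "$src").$$

    if mv "$src" "$target" 2>/dev/null; then
        printf '%s\n' "$target"
    else
        return 1
    fi
}
```

The invocation that wins then copies from the staging path and deletes it afterwards; the loser never touches the destination machine.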

    Harald.
     
    Harald Kirsch, Feb 13, 2004
    #9
