George Adams
Hi, all. I'm trying to make a simple backup program for myself that
will check to see if certain files have been modified before backing them
up. (It's got to be portable to multiple OSes that have Perl, but
possibly not other handy tools like, say, rsync or some such.)
Anyway, I had originally planned to use the Windows archive bit, or else
file modification dates to determine if a file had been changed. That
turned out to be unreliable, though (and, in the case of the archive
bit, impossible on Linux). So I decided instead to create a checksum of
the original file, then compare future versions of the file with the
stored checksum to see if it's changed (and hence needs to be backed up).
This works... except it's really, really slow. I started with SHA-1,
but it was taking just too long. I switched to MD5 and then CRC32, but
even those were fairly slow. And when the backup directory contains
several gigs of files to check, it's just too much.
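For reference, here is a minimal sketch of the approach described above, not the actual program. It assumes the core Digest::MD5 module; the sub names file_digest and needs_backup and the in-memory %stored hash are made up for illustration (a real version would save the digests to disk between runs).

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file, reading it in binary mode
# so the result is the same on every OS.
sub file_digest {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: compare the file's current digest with the
# stored one. Returns 1 if the file is new or changed, 0 otherwise,
# and records the current digest for the next run.
sub needs_backup {
    my ($path, $stored) = @_;
    my $current = file_digest($path);
    my $changed = !defined $stored->{$path} || $stored->{$path} ne $current;
    $stored->{$path} = $current;
    return $changed ? 1 : 0;
}
```

Note that file_digest reads every byte of every file on every run, whichever algorithm is plugged in, so the time taken grows with the total size of the backup set.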
So, given the problem of "how can I tell if a file has been changed", am
I tackling it the wrong way? Should I be using some other method
besides simply hashing an entire file (using whatever algorithm) and
comparing it to a stored value? Obviously backup software makers have
solved this problem to make incremental backups pretty fast - what's the
trick to it?
Thanks to anyone who can help.