ccc31807
A normal task: sorting a large data file by some criterion, breaking
it into sub-files, and sending each sub-file to a particular client
based on the criterion.
Over the next several weeks, I've been tasked with taking three data
files and comparing their keys. If the keys are identical, I process
the file; if not, I print out a list of differences, which in effect
means printing out the keys that differ between the files. The keys
are all seven-digit integers. (Each file is to be generated by a
different query of the same database.)
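A minimal sketch of the key comparison using sets. It assumes each file has one record per line with the key in the first comma-separated field. That layout, and the names `read_keys` and `missing_keys`, are hypothetical. For each file, it reports the keys that appear in some other file but not in that one:

```python
import csv
from typing import Dict, Iterable, Set


def read_keys(lines: Iterable[str]) -> Set[int]:
    """Collect the key from each record (assumed: first comma-separated field)."""
    keys = set()
    for row in csv.reader(lines):
        if row:  # csv.reader yields [] for a blank line; skip it
            keys.add(int(row[0]))
    return keys


def missing_keys(key_sets: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each file, the keys present in at least one other file but not in this one."""
    all_keys = set().union(*key_sets.values())
    return {name: all_keys - keys for name, keys in key_sets.items()}


# Usage with in-memory stand-ins; for real files, pass open(path, newline="").
files = {
    "a.csv": ["1000001,x", "1000002,y", "1000003,z"],
    "b.csv": ["1000001,x", "1000003,z"],
    "c.csv": ["1000003,z", "1000002,y", "1000001,x"],
}
key_sets = {name: read_keys(lines) for name, lines in files.items()}
diffs = missing_keys(key_sets)
if any(diffs.values()):
    for name, keys in sorted(diffs.items()):
        for k in sorted(keys):
            print(f"{name}: missing {k:07d}")
else:
    print("keys identical; processing")
```

Because sets ignore record order, the files do not need to be sorted first. Note that a key duplicated within one file is counted only once.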
Okay, I could use diff for this, but I'd like to do it
programmatically. Using brute force, I could generate three files
containing just the keys and compare them line by line, but I'd rather
not, for several reasons, mostly because the data files are all but
guaranteed to be identical and we don't expect any differences.
I'm thinking about hashing the keys in the three files and comparing
the digests, on the assumption that identical digests mean identical
keys.
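A minimal sketch of the digest idea, using SHA-256 from Python's `hashlib`. The name `key_digest` is hypothetical. The keys are sorted before hashing so that record order does not change the digest. Each key is written as a fixed-width, newline-terminated string so that different key lists cannot run together into the same byte stream:

```python
import hashlib
from typing import Iterable


def key_digest(keys: Iterable[int]) -> str:
    """SHA-256 over the keys in sorted order, one zero-padded key per line."""
    h = hashlib.sha256()
    for k in sorted(keys):
        h.update(f"{k:07d}\n".encode("ascii"))
    return h.hexdigest()


# Usage: pass sets so that duplicate keys within a file do not affect the digest.
a = {1000003, 1000001, 1000002}
b = {1000001, 1000002, 1000003}
c = {1000001, 1000003}
digests = {key_digest(ks) for ks in (a, b, c)}
print("identical" if len(digests) == 1 else "differ; fall back to a set comparison")
```

Two caveats apply to this design. First, equal digests imply equal keys only with overwhelming probability, not with certainty, although a SHA-256 collision is not a practical concern here. Second, a mismatch tells you that the keys differ but not which keys differ. Listing the differences still needs a key-by-key comparison. Computing the digest also reads every key, so the saving lies in comparing three short strings rather than in avoiding I/O.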
Ideas?
Thanks, CC.