ndiff

Bryan

i tried using ndiff and Differ.compare today from the difflib module. i
have three questions.

1. both ndiff and Differ.compare return all the lines including lines that
are the same in both files, not just the diffs. is the convention to take
the output and then filter out lines that contain a space as the first
character to just get the diffs? it seems strange to me that the output is
not just the deltas and a lot of wasted filtering (especially if the file is
very large) to get the diff you wanted in the first place. isn't there a
better way?

2. i also tried passing IS_LINE_JUNK and IS_CHARACTER_JUNK, but there was
no difference in the output even though i changed some whitespace in the
file. i then wrote my own junk functions and again, there was no
difference in the output even though i returned 1 to filter out some lines.
can someone show an example where using IS_LINE_JUNK and IS_CHARACTER_JUNK
produces different output than not using them?

3. is there a simple method that just returns true or false whether two
files are different or not? i was hoping that ndiff/compare would return an
empty list if there was no difference, but that's not the case. i ended up
using a simple: if file1.read() == file2.read(): but there must be a smarter
faster way.

thanks,

bryan
 
Ian Bicking

> 3. is there a simple method that just returns true or false whether two
> files are different or not? i was hoping that ndiff/compare would return an
> empty list if there was no difference, but that's not the case. i ended up
> using a simple: if file1.read() == file2.read(): but there must be a smarter
> faster way.

Maybe something like:

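def areDifferent(file1, file2):
    # Read both files in fixed-size chunks and stop at the first mismatch.
    while True:
        data1, data2 = file1.read(1000), file2.read(1000)
        if not data1 and not data2:
            # Both files are exhausted and every chunk matched.
            return False
        if data1 != data2:
            return True


You still have to go through the entire file if you really want to be
sure that two files are the same. If you use filenames, of course, you
can take some shortcuts:

import os

def filesDiffer(filename1, filename2):
    # Files of different sizes cannot have the same contents.
    if os.stat(filename1).st_size != os.stat(filename2).st_size:
        return True
    else:
        # Binary mode, so the bytes are compared exactly as stored.
        return areDifferent(open(filename1, 'rb'), open(filename2, 'rb'))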

You could also try a quick comparison from somewhere not at the
beginning (using .seek(pos)), if you think it is likely that files will
have common headers. But you'd still have to scan the entire file to be
sure.
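
For comparison, the standard library's filecmp module offers a ready-made check along the same lines. The following is only a sketch in modern Python 3, not part of the code above; files_differ and write_temp are hypothetical helper names, and the temporary files exist only to make the example runnable.

```python
import filecmp
import os
import tempfile


def files_differ(path1, path2):
    """Return True if the two files have different contents (hypothetical helper)."""
    # shallow=False asks filecmp to compare contents rather than
    # accept matching os.stat() signatures as proof of equality.
    return not filecmp.cmp(path1, path2, shallow=False)


def write_temp(directory, name, data):
    """Write bytes to a file in `directory` and return its path (demo helper)."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


demo_dir = tempfile.mkdtemp()
path_a = write_temp(demo_dir, "a.txt", b"one\ntwo\nthree\n")
path_b = write_temp(demo_dir, "b.txt", b"one\nTWO\nthree\n")  # same size, different bytes
path_c = write_temp(demo_dir, "c.txt", b"one\ntwo\nthree\n")  # identical to a.txt

print(files_differ(path_a, path_b))
print(files_differ(path_a, path_c))
```

Like the size check above, filecmp.cmp can answer early when the sizes differ, but it still has to read both files to confirm that they are equal.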

Ian
 
Raymond Hettinger

> 1. both ndiff and Differ.compare return all the lines including lines that
> are the same in both files, not just the diffs. is the convention to take
> the output and then filter out lines that contain a space as the first
> character to just get the diffs? it seems strange to me that the output is
> not just the deltas and a lot of wasted filtering (especially if the file is
> very large) to get the diff you wanted in the first place. isn't there a
> better way?

The new difflib.py in Py2.3 has two new functions, context_diff()
and unified_diff(). These functions, along with a newly exposed
underlying method, strip away the commonalities, leaving only the
changes and, if desired, some surrounding context.
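
A minimal sketch of how unified_diff() might be used to get only the changed lines, written for modern Python 3. The sample lists and the changed_lines helper are made up for illustration; n=0 requests zero lines of context.

```python
import difflib

# Made-up sample data: only the second line differs.
old = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]


def changed_lines(a, b, context=0):
    """Return the unified diff of a -> b as a list (hypothetical helper)."""
    # n is the number of unchanged context lines kept around each change.
    return list(difflib.unified_diff(a, b, fromfile="old", tofile="new", n=context))


delta = changed_lines(old, new)
for line in delta:
    print(line, end="")

# Identical inputs produce no output at all, so an empty result
# can double as a "no differences" check.
print(changed_lines(old, old) == [])
```

Unlike ndiff, this needs no filtering of lines that begin with a space, although the result does include the "---", "+++" and "@@" header lines.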


Raymond Hettinger
 
