difflib.ndiff broken?

Humpdydum

Can anyone try the following in their python interpreter?

These give correct output:
print list(ndiff(['saving2 <<A'],['saving <<a>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<a>>', '?          ^^^\n']
print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
['- saving2 <<AA', '?       -   ^^\n', '+ saving <<a>>', '?          ^^^\n']
print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<aa>>', '?          ^^^^\n']
print list(ndiff(['saving <<A'],['saving <<aa>>']))
['- saving <<A', '?          ^\n', '+ saving <<aa>>', '?          ^^^^\n']

Now try the very slight variation:
print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
['- saving2 <<AA', '+ saving <<aa>>']

This can't be right... or is it? Where are the '? ...' lines? It does this
for both Python 2.3.2 on Windows 2000 and Python 2.3.3 on SGI. If it's
correct, how come???

Oliver
 
Tim Peters

ndiff produces intraline difference marking if and only if it thinks
the inputs are "reasonably close". The cutoff between "reasonably
close" and "not reasonably close" is necessarily heuristic. '?' lines
are more irritating than helpful when they have a lot of markup in
them, so it certainly wasn't intended that '?' lines *always* be
produced. The '+' and '-' lines contain all the information about how
to change one sequence into another; the '?' lines are fluff (albeit
sometimes helpful fluff -- that's why they're (sometimes) there).
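
A minimal sketch of that point, assuming Python 3 and the standard difflib.restore helper: the '?' lines can be thrown away and both inputs can still be rebuilt from what is left. The sample strings come from the question above; the name no_hints is made up for illustration.

```python
from difflib import ndiff, restore

a = ['saving <<A']
b = ['saving <<aa>>']

delta = list(ndiff(a, b))

# Drop the '?' hint lines; keep only the '- ', '+ ' and '  ' lines.
no_hints = [line for line in delta if not line.startswith('? ')]

# restore() rebuilds sequence 1 or sequence 2 from a delta.
first = list(restore(no_hints, 1))
second = list(restore(no_hints, 2))

print(first)
print(second)
```

If the '?' lines carried any information of their own, one of the two round trips would fail.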

Concretely, ndiff produces intraline marking iff two lines have a
similarity ratio of at least 0.75. In your first examples, the lines
do; 'saving <<A' against 'saving <<aa>>', for instance, scores:

0.782608695652

In your last example, the lines don't reach the cutoff.

Internally, 0.75 is the default value of FancyReplacer's optional
minimal_cutoff argument.
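
The ratios can be checked by hand. A small sketch, assuming Python 3 and difflib.SequenceMatcher with IS_CHARACTER_JUNK as the junk filter (the default character filter for ndiff); the helper name similarity is hypothetical, not something difflib provides.

```python
from difflib import IS_CHARACTER_JUNK, SequenceMatcher

CUTOFF = 0.75  # the threshold quoted in this thread


def similarity(a, b):
    # Blanks and tabs are treated as junk, as in ndiff's default setup.
    return SequenceMatcher(IS_CHARACTER_JUNK, a, b).ratio()


close = similarity('saving <<A', 'saving <<aa>>')
far = similarity('saving2 <<AA', 'saving <<aa>>')

print(close, close >= CUTOFF)   # 18/23, about 0.7826 -> True
print(far, far >= CUTOFF)       # 18/25 = 0.72 -> False
```

The ratio is 2*M/T, where M is the number of matched characters and T is the combined length of both strings. Both pairs match 9 characters, but the second pair is two characters longer in total, which is enough to fall under 0.75.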
 
Humpdydum

OK, forget it, sorry it was my mistake: it wasn't obvious from the difflib
docs, but it appears that ndiff points out the sub-line differences (lines
that start with ?) only if it was able to figure out operations that could
be applied to substrings on the line. Though often such operations are
obvious by looking at the strings being compared, ndiff doesn't always find
them, and so marks the whole line as + or -.

Anyone know of a web site that explains ndiff output? I couldn't figure out a
good set of search terms in Google, didn't get anything useful. Thanks,

Oliver

Humpdydum said:
Can anyone try the following in their python interpreter?

These give correct output:
print list(ndiff(['saving2 <<A'],['saving <<a>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<a>>', '?          ^^^\n']
print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
['- saving2 <<AA', '?       -   ^^\n', '+ saving <<a>>', '?          ^^^\n']
print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<aa>>', '?          ^^^^\n']
print list(ndiff(['saving <<A'],['saving <<aa>>']))
['- saving <<A', '?          ^\n', '+ saving <<aa>>', '?          ^^^^\n']

Now try the very slight variation:
print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
['- saving2 <<AA', '+ saving <<aa>>']

This can't be right... or is it? Where are the '? ...' lines?
 
Tim Peters

[Humpdydum]
> OK, forget it, sorry it was my mistake:

I didn't see a mistake, just a question.

> it wasn't obvious from the difflib docs, but it appears that ndiff points out the
> sub-line differences (lines that start with ?) only if it was able to figure out
> operations that could be applied to substrings on the line. Though often such
> operations are obvious by looking at the strings being compared,

They can be for a program but often aren't for people. That's why
ndiff produces '?' lines when it thinks they might help. This is a
heuristic -- a guess. Sometimes it's not the same guess you'd make.
There's always a sequence of operations that can be applied to change
any line into any other line, but *usually* they're uninteresting.
'?' lines attempt to point out "minor edits".

> ndiff doesn't always find them, and so marks the whole line as + or -.

It marks two input lines that differ with - and + regardless of
whether it produces two ? lines too.

> Anyone know of a web site that explains ndiff output? I couldn't figure out a
> good set of search terms in Google, didn't get anything useful. Thanks,

ndiff is unique to Python, and you have the source code for it.
Because '?' lines are fluff, precise docs for them would be
counterproductive. They're meant to guide the eye to minor intraline
differences, and that's all.

If a ? line appears, there are always two of them, interleaved between
a -+ pair, in this pattern:

-
?
+
?

Each ? line implicitly refers to the line immediately above it. Four
meaningful characters appear in ? lines. A caret (^) means the
character immediately above it was replaced, in going from the - to
the + line. "-" means the character immediately above it was deleted;
'+' means it was inserted; and a blank means the character immediately
above it is the same in both (- and +) lines. A '-' can appear only
in the ? line following a - line, and a '+' can appear only in the ?
line following a + line, because we're picturing the edits needed to
change the - line into the + line.
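
To see the markers line up, here is a short sketch, assuming Python 3 and one of the string pairs from the original question. Printing each delta line on its own row (rather than printing the list) keeps the columns aligned.

```python
from difflib import ndiff

delta = list(ndiff(['saving2 <<A'], ['saving <<a>>']))

# One delta line per row, so each '?' marker sits directly under
# the character it describes.
for line in delta:
    print(line.rstrip('\n'))

# The first character of every delta line is its tag.
tags = [line[0] for line in delta]
print(tags)
```

Read top to bottom, the first '?' line should put a '-' under the deleted '2' and a '^' under the replaced 'A'; the second '?' line puts carets under the 'a>>' that replaced it.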
 
