Using difflib to compare text ignoring whitespace differences

Neilen Marais

Hi

I'm trying to compare some text to find differences other than whitespace.
I seem to be misunderstanding something, since I can't even get a basic
example to work:

In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)

In [105]: list(d.compare([' a'], ['a']))
Out[105]: ['-  a', '+ a']

Surely if whitespace characters are being ignored those two strings should
be marked as identical? What am I doing wrong?

Thanks
Neilen
 
Gabriel Genellina

The docs for Differ are a bit terse and misleading.
compare() does a two-level matching: first, on a *line* level,
considering only the linejunk parameter. Then, for each pair of
similar lines found in the first stage, it does an intraline match
considering only the charjunk parameter.
Also note that junk != ignored: the algorithm tries to "find the longest
contiguous matching subsequence that contains no 'junk' elements".
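
To see why the basic example comes out as a plain delete/insert pair, it can help to measure how similar the two lines are. The sketch below assumes that Differ only attempts the intraline stage when the similarity ratio of two lines reaches 0.75; that figure comes from my reading of the CPython implementation and is not documented. The names similarity and ASSUMED_CUTOFF are made up for this illustration.

```python
import difflib

# Assumption: 0.75 is the cutoff Differ applies internally before it pairs two
# lines for an intraline comparison (read from the CPython source, not the docs).
ASSUMED_CUTOFF = 0.75

def similarity(a, b):
    """Return SequenceMatcher's ratio for two lines, with blanks and tabs as junk."""
    return difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b).ratio()

r = similarity(' a', 'a')
# One matching character out of three in total gives 2*1/3, below the assumed cutoff.
print(r, r >= ASSUMED_CUTOFF)
```

If that assumption holds, ' a' and 'a' are never paired as "similar lines", so charjunk is not consulted at all for them.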

Using a slightly longer text gets closer to what you want, I think:

import difflib

d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
for delta in d.compare([' a larger line'], ['a longer line']):
    print(delta)

- a larger line
? --- ^^

+ a longer line
? ^^
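
If the goal is really to treat lines that differ only in whitespace as identical, one possible approach (a swapped-in technique, not something Differ does for you) is to normalize the whitespace yourself before comparing. This is a minimal sketch; normalize and compare_ignoring_whitespace are hypothetical helper names, and collapsing every whitespace run to a single space is an assumption about what "ignoring whitespace" should mean.

```python
import difflib

def normalize(line):
    """Collapse every run of whitespace to one space and trim both ends."""
    return ' '.join(line.split())

def compare_ignoring_whitespace(a_lines, b_lines):
    """Diff two lists of lines after whitespace normalization."""
    d = difflib.Differ()
    return list(d.compare([normalize(s) for s in a_lines],
                          [normalize(s) for s in b_lines]))

print(compare_ignoring_whitespace([' a'], ['a']))
# prints ['  a'], i.e. the two lines are reported as equal
```

Note that the deltas then show the normalized lines rather than the original text, so this only suits cases where the exact original spacing does not need to appear in the report.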
 