parse unix-style difference reporting

Liang

Hi all,

I want to diff two files, or two versions of one file, and parse the output
to produce a summary of how many lines were replaced, added, and deleted
between the two.

As far as I know from diff/cleardiff, the output looks like:
15a16, 15,17d3, 18c19,21 etc.

Does anyone know how to parse this output to generate a summary?

Thanks in advance,
Liang
 
Barry Margolin

Liang said:
Hi all,

I want to diff two files, or two versions of one file, and parse the output
to produce a summary of how many lines were replaced, added, and deleted
between the two.

As far as I know from diff/cleardiff, the output looks like:
15a16, 15,17d3, 18c19,21 etc.

Does anyone know how to parse this output to generate a summary?

You can use "diff -c" and count the number of "+", "-", and "!" lines.
Or, if the files are sorted, use the "comm" command and count the number of lines.
 
Jonathan Leffler

Liang said:
I want to diff two files, or two versions of one file, and parse the output
to produce a summary of how many lines were replaced, added, and deleted
between the two.

As far as I know from diff/cleardiff, the output looks like:
15a16, 15,17d3, 18c19,21 etc.

Does anyone know how to parse this output to generate a summary?

It isn't very hard to work it out, is it?

Each item conceptually has four numbers and an operation code:

N1,N2 op N3,N4

When there is just one number on one side of the operation, the values
N1 and N2, or N3 and N4, are the same.
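
As a minimal sketch of that "four numbers and an operation code" view, the following Python pulls the fields out of one change-command line with a regular expression. The names CHANGE_RE and parse_change_command are invented for this example, and it assumes the default diff output format with the operation codes a, c and d.

```python
import re

# N1[,N2] op N3[,N4], where op is one of a (add), c (change), d (delete).
CHANGE_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

def parse_change_command(line):
    """Return (n1, n2, op, n3, n4), or None if the line is not a command."""
    m = CHANGE_RE.match(line.strip())
    if m is None:
        return None  # e.g. "< text", "> text" or "---" body lines
    n1, n2, op, n3, n4 = m.groups()
    n1 = int(n1)
    n2 = int(n2) if n2 is not None else n1  # single number: N2 == N1
    n3 = int(n3)
    n4 = int(n4) if n4 is not None else n3  # single number: N4 == N3
    return n1, n2, op, n3, n4
```

For instance, parse_change_command("18c19,21") yields (18, 18, "c", 19, 21), while the "<" and ">" body lines of the diff come back as None and can be skipped.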

Inserts are easy: there's always a single number on the LHS, and the
number of lines inserted is N4-N3+1.

Similarly, deletes are easy: there's always a single number on the RHS
of the operator, and the number of lines deleted is N2-N1+1.

The number of lines replaced has two parts: the number of lines
removed and the number of lines replacing them. Depending on your
viewpoint, you can either count the two values separately (number
removed NR = N2-N1+1, number inserted NI = N4-N3+1), or you can be
cleverer about the calculation and decide that when NR > NI, you have
NI changed lines and NR-NI deleted lines, and that when NR < NI, you
have NR changed lines and NI-NR inserted lines. When NR = NI, you
have NR (or NI) changed lines, of course.
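
The arithmetic above could be sketched in Python roughly as follows, using the "cleverer" accounting for replacements. It assumes each command has already been split into a tuple (n1, n2, op, n3, n4); the function name summarize and the dictionary keys are illustrative choices, not anything diff itself defines.

```python
def summarize(commands):
    """Total changed/inserted/deleted lines over parsed diff commands.

    Each command is a tuple (n1, n2, op, n3, n4) with op in "acd".
    """
    totals = {"changed": 0, "inserted": 0, "deleted": 0}
    for n1, n2, op, n3, n4 in commands:
        if op == "a":
            totals["inserted"] += n4 - n3 + 1
        elif op == "d":
            totals["deleted"] += n2 - n1 + 1
        elif op == "c":
            nr = n2 - n1 + 1  # lines removed from the old file
            ni = n4 - n3 + 1  # lines inserted in the new file
            totals["changed"] += min(nr, ni)
            totals["inserted"] += max(ni - nr, 0)
            totals["deleted"] += max(nr - ni, 0)
        else:
            raise ValueError(f"unknown operation code: {op!r}")
    return totals
```

With the three example commands from the question (15a16, 15,17d3 and 18c19,21), this counts one changed line, three inserted lines and three deleted lines. If you prefer to count removals and insertions separately for replacements, add nr to "deleted" and ni to "inserted" in the "c" branch instead.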

That took me five minutes to think and type - how long would it have
taken you to do it? (And cross-posted too?)
 
Liang

You can use "diff -c" and count the number of "+", "-", and "!" lines.
Or, if the files are sorted, use the "comm" command and count the number of lines.

Marvellous! This is the simplest solution.

Happy new year!
 
