Compare 2 files and put the matching part in a 3rd file

BerNaC

Hi all,

I need to compare two text files and put the matching result in another
file. Does anybody have an idea?

file1    file2    compare and match = file3
------   ------   -------------------------
1        3        3
2        4        4
3        5        5
4        6
5        7


Thank you
 
Arndt Jonasson

BerNaC said:
> I need to compare two text files and put the matching result in another
> file. Does anybody have an idea?

Is anything known about the format of the files and the ways in which
they can differ? Doing a general comparison and presenting the
differences as a minimal set of individual differences is quite complex.
In that case I would run the Unix 'diff' program on the files and
post-process its output.
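To illustrate what such a general comparison involves: classic diff algorithms are built on finding a longest common subsequence (LCS) of the two files' lines, that is, the lines that occur in both files in the same relative order. Below is a minimal dynamic-programming sketch in Perl. It assumes both files fit in memory, and `lcs_lines` is a hypothetical helper. It uses O(n*m) time and memory, whereas real diff programs use more efficient algorithms.

```perl
use strict;
use warnings;

# Hypothetical helper: return one longest common subsequence of two
# array refs of lines, using an O(n*m) dynamic-programming table.
sub lcs_lines {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);

    # $len[$i][$j] = LCS length of @$x[$i..$n-1] and @$y[$j..$m-1]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            $len[$i][$j] = $x->[$i] eq $y->[$j]
                ? $len[$i + 1][$j + 1] + 1
                : ($len[$i + 1][$j] >= $len[$i][$j + 1]
                   ? $len[$i + 1][$j] : $len[$i][$j + 1]);
        }
    }

    # Walk the table from the top-left corner to recover the lines.
    my ($i, $j, @common) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($x->[$i] eq $y->[$j]) {
            push @common, $x->[$i];
            $i++;
            $j++;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { $i++ }
        else                                         { $j++ }
    }
    return @common;
}

my @common = lcs_lines([1 .. 5], [3 .. 7]);
print "@common\n";    # prints "3 4 5"
```

An LCS only keeps matches that appear in the same order in both files. For unordered ID lists like the ones in this thread, it can miss matches, so the sort-based and hash-based answers below fit that case better.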

CPAN has only "compare and stop when finding a difference", it seems.
 
BerNaC

Arndt Jonasson wrote on Friday:
> Is anything known about the format of the files and the ways in which
> they can differ? Doing a general comparison and presenting the
> differences as a minimal set of individual differences is quite complex.
> In that case I would run the Unix 'diff' program on the files and
> post-process its output.
>
> CPAN has only "compare and stop when finding a difference", it seems.

Well, the two text files each contain one ID from the sendmail log per
line; they look like this:

1U34334Y34
1ZRTRG345
2SDFSDF17
and so on.

One file holds the IDs of mails sent from one person, the other the IDs
of mails sent to another person, so if an ID appears in both, a mail
with that ID has been sent from this guy to that guy :).
So, as you can see, I'm trying to write a script that parses the
sendmail log to find all mail from someone to somebody.
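For this task, the two intermediate ID files may not be needed at all. The following is a minimal Perl sketch. It assumes the usual sendmail syslog layout, in which each relevant line has the form `sendmail[PID]: QUEUEID: from=<addr>, ...` or `sendmail[PID]: QUEUEID: to=<addr>[,<addr>...], ...`. The helper names, the sample addresses and the regex are illustrative assumptions, so check them against your own log before relying on this.

```perl
use strict;
use warnings;

# Split a field such as "<a@x.org>,<b@y.org>" into bare, lowercased
# addresses (s///r needs Perl 5.14 or later).
sub addr_list {
    my ($field) = @_;
    return map { s/^<|>$//gr } split /,/, lc $field;
}

# Hypothetical helper: queue IDs of mails sent from $sender to $recipient.
# ASSUMPTION: log lines look like
#   ... sendmail[PID]: QUEUEID: from=<addr>, size=..., ...
#   ... sendmail[PID]: QUEUEID: to=<addr>[,<addr>...], delay=..., ...
sub ids_from_to {
    my ($sender, $recipient, @lines) = @_;
    my (%from, %to);
    for my $line (@lines) {
        # Queue ID, field name, and the field value up to ", " or end of line.
        next unless $line =~ /sendmail\[\d+\]:\s+(\w+):\s+(from|to)=(.*?)(?:, |$)/;
        my ($id, $kind, $field) = ($1, $2, $3);
        my $want = lc($kind eq 'from' ? $sender : $recipient);
        next unless grep { $_ eq $want } addr_list($field);
        ($kind eq 'from' ? \%from : \%to)->{$id} = 1;
    }
    # An ID present in both hashes is a mail from $sender to $recipient.
    my @ids = grep { $to{$_} } keys %from;
    return sort @ids;
}

# Usage (log path is only an example):
#   perl ids.pl alice@example.com bob@example.com /var/log/maillog
if (@ARGV == 3) {
    my ($from, $to, $log) = @ARGV;
    open my $in, '<', $log or die "Cannot open $log: $!";
    chomp(my @lines = <$in>);
    print "$_\n" for ids_from_to($from, $to, @lines);
}
```

Using hashes keyed by queue ID does the "match the two ID lists" step in memory, so no sorting or extra files are needed.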
 
Arndt Jonasson

BerNaC said:
> Arndt Jonasson wrote on Friday:
>
> Well, the two text files each contain one ID from the sendmail log per
> line; they look like this:
>
> 1U34334Y34
> 1ZRTRG345
> 2SDFSDF17
> and so on.
>
> One file holds the IDs of mails sent from one person, the other the IDs
> of mails sent to another person, so if an ID appears in both, a mail
> with that ID has been sent from this guy to that guy :).
> So, as you can see, I'm trying to write a script that parses the
> sendmail log to find all mail from someone to somebody.

That seems to mean that no valuable information is lost if you sort
the files first, which makes the job of comparing them much easier (I'd
say trivial, but maybe that's overstating it). Is that enough for an
idea, or is there some particular aspect of it which you don't know
how to do in Perl?

If the files are not very large, reading their contents into perl (*)
and sorting them there will be OK; otherwise it's better to sort them on disk.

(*) "perl", "Perl", what do I want here? I want a
"case-doesn't-matter-perl"...
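Once both lists are sorted the same way, the matches can be found in a single merge-style pass, like the merge step of merge sort. Below is a minimal in-memory sketch. It assumes one ID per line and that both lists fit in memory; `common_sorted` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: given two array refs sorted in ascending string
# order (Perl's default sort), return the lines present in both.
sub common_sorted {
    my ($x, $y) = @_;
    my ($i, $j, @common) = (0, 0);
    while ($i < @$x && $j < @$y) {
        my $c = $x->[$i] cmp $y->[$j];
        if    ($c < 0) { $i++ }    # x holds the smaller line: advance x
        elsif ($c > 0) { $j++ }    # y holds the smaller line: advance y
        else {                     # same line in both: a match
            push @common, $x->[$i];
            $i++;
            $j++;
        }
    }
    return @common;
}

my @file1 = sort qw(1U34334Y34 1ZRTRG345 2SDFSDF17);
my @file2 = sort qw(2SDFSDF17 1U34334Y34 9XYZ);
print "$_\n" for common_sorted(\@file1, \@file2);   # 1U34334Y34, 2SDFSDF17
```

Each list is walked once after sorting, so the sort dominates the cost. Both lists must be sorted with the same comparison (`cmp` here) for the pass to be correct.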
 
John W. Krahn

BerNaC said:
> I need to compare two text files and put the matching result in another
> file. Does anybody have an idea?
>
> file1    file2    compare and match = file3
> ------   ------   -------------------------
> 1        3        3
> 2        4        4
> 3        5        5
> 4        6
> 5        7

$ perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
3
4
5
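The one-liner works like this. While it reads file1 (until `eof` becomes true and `$a` is set), it counts each line in `%x`. While it reads file2, it prints each line that was already counted. Below is a sketch of the same logic written out as a script that also writes the result to a third file, as originally asked. `write_matches` is a hypothetical helper, and the three file names are taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: write to $out every line of $f2 that also
# appears in $f1 (the one-liner's logic, spelled out).
sub write_matches {
    my ($f1, $f2, $out) = @_;
    my %seen;

    open my $in1, '<', $f1 or die "Cannot open $f1: $!";
    while (my $line = <$in1>) {
        chomp $line;
        $seen{$line} = 1;              # remember every line of file 1
    }
    close $in1;

    open my $in2, '<', $f2  or die "Cannot open $f2: $!";
    open my $o,   '>', $out or die "Cannot write $out: $!";
    while (my $line = <$in2>) {
        chomp $line;
        print {$o} "$line\n" if $seen{$line};   # keep lines seen in file 1
    }
    close $in2;
    close $o or die "Cannot close $out: $!";
}

# Usage: perl match.pl file1 file2 file3
write_matches(@ARGV) if @ARGV == 3;
```

Unlike the one-liner, this version chomps each line. As a result, a last line without a trailing newline still matches. The output follows the order of file2.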


John
 
David Combs

> That seems to mean that no valuable information is lost if you sort
> the files first, which makes the job of comparing them much easier (I'd
> say trivial, but maybe that's overstating it). Is that enough for an
> idea, or is there some particular aspect of it which you don't know
> how to do in Perl?
>
> If the files are not very large, reading their contents into perl (*)
> and sorting them there will be OK; otherwise it's better to sort them on disk.

If you're allowed to sort them, do that, and run "comm"
on the two sorted files.

(It's *exactly* what comm was designed for.)
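Below is a sketch of this suggestion, driven from Perl. `comm -12` suppresses column 1 (lines only in the first file) and column 2 (lines only in the second), which leaves only the common lines. The sketch assumes the standard Unix `sort` and `comm` commands are on the PATH. It sets `LC_ALL=C` for both so that they agree on the collating order. The `comm_match` helper and the `.sorted` file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort both files, then let comm(1) pick the common lines.
sub comm_match {
    my ($f1, $f2, $out) = @_;
    # Give sort and comm the same (bytewise) collation.
    local $ENV{LC_ALL} = 'C';
    for my $f ($f1, $f2) {
        # List-form system() avoids the shell; sort -o names the output file.
        system('sort', '-o', "$f.sorted", $f) == 0
            or die "sort $f failed: $?";
    }
    # comm -12: suppress column 1 (only in f1) and column 2 (only in f2).
    open my $pipe, '-|', 'comm', '-12', "$f1.sorted", "$f2.sorted"
        or die "Cannot run comm: $!";
    open my $o, '>', $out or die "Cannot write $out: $!";
    print {$o} $_ while <$pipe>;
    close $pipe or die "comm failed: $?";
    close $o    or die "Cannot close $out: $!";
}

# Usage: perl comm_match.pl file1 file2 file3
comm_match(@ARGV) if @ARGV == 3;
```

The sorted copies are left next to the input files. On the command line, the same steps are `sort` on each file followed by `comm -12` on the two results.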

David


PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort using the same scheme that comm uses,
otherwise comm won't consider the input sorted. I.e., beware of -u, -r, etc.
 
Anno Siegel

[...]
> If you're allowed to sort them, do that, and run "comm"
> on the two sorted files.
>
> (It's *exactly* what comm was designed for.)
>
> David
>
>
> PS: Question: does the following conjecture make any sense?
>
> Oh, by the way, make sure you sort using the same scheme that comm uses,
> otherwise comm won't consider the input sorted. I.e., beware of -u, -r, etc.

Conjecture?

No, the remark doesn't make sense. All comm requires is that identical
lines be next to each other. Any sort that considers the whole line will
guarantee that.

My comm man page doesn't even specify the sort to be ascending or descending,
though it does (unnecessarily) specify "lexically".

Anno
 
xhoster

[...]
>> If you're allowed to sort them, do that, and run "comm"
>> on the two sorted files.
>>
>> (It's *exactly* what comm was designed for.)
>>
>> David
>>
>>
>> PS: Question: does the following conjecture make any sense?
>>
>> Oh, by the way, make sure you sort using the same scheme that comm uses,
>> otherwise comm won't consider the input sorted. I.e., beware of -u, -r, etc.

> Conjecture?
>
> No, the remark doesn't make sense. All comm requires is that identical
> lines be next to each other.

The only way you can ensure that identical lines are next to each other by
sorting the separate files is if the files are identical in the first
place. If you already know that, then you are already done.

In the non-trivial case, comm needs a way to re-align the files once it
encounters non-identical lines. To do that, the files must be sorted in
the same order that comm expects.

> My comm man page doesn't even specify the sort to be ascending or
> descending, though it does (unnecessarily) specify "lexically".

Apparently man isn't good enough anymore; if you want to know how a
command-line tool works, you have to read the "info" page too.


from info comm:<<EOF
Before `comm' can be used, the input files must be sorted using the
collating sequence specified by the `LC_COLLATE' locale. If an input
file ends in a non-newline character, a newline is silently appended.
The `sort' command with no options always outputs a file that is
suitable input to `comm'.
EOF
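As the info page says, the order has to be the one comm itself uses. One quick way to confirm that a file is in bytewise (C-locale) order, as produced by `LC_ALL=C sort`, is to scan it once. The sketch below assumes bytewise order is what you sort with. Perl's `cmp` compares strings character by character unless `use locale` is in effect. `first_unsorted_line` is a hypothetical helper. GNU comm has a `--check-order` option that performs a similar check itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based number of the first line that
# is out of ascending order under Perl's `cmp` (bytewise order, as from
# `LC_ALL=C sort`), or 0 if the whole file is sorted.
sub first_unsorted_line {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $prev;
    while (my $line = <$in>) {
        chomp $line;
        if (defined $prev && ($prev cmp $line) > 0) {
            my $n = $.;    # $. is the current input line number
            close $in;
            return $n;
        }
        $prev = $line;
    }
    close $in;
    return 0;
}

# Usage: perl check_sorted.pl file1.sorted file2.sorted
for my $f (@ARGV) {
    my $bad = first_unsorted_line($f);
    print $bad ? "$f: out of order at line $bad\n" : "$f: sorted\n";
}
```

A file sorted in a different locale (for example one where "a" sorts before "B") can fail this bytewise check, even though it looks sorted. The same mismatch can mislead comm when the two tools run with different `LC_COLLATE` settings.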

Xho
 
Anno Siegel

[...]
>>> If you're allowed to sort them, do that, and run "comm"
>>> on the two sorted files.
>>>
>>> (It's *exactly* what comm was designed for.)
>>>
>>> David
>>>
>>>
>>> PS: Question: does the following conjecture make any sense?
>>>
>>> Oh, by the way, make sure you sort using the same scheme that comm uses,
>>> otherwise comm won't consider the input sorted. I.e., beware of -u, -r, etc.

>> Conjecture?
>>
>> No, the remark doesn't make sense. All comm requires is that identical
>> lines be next to each other.
>
> The only way you can ensure that identical lines are next to each other by
> sorting the separate files is if the files are identical in the first
> place. If you already know that, then you are already done.

You are so very right. Both files must be sorted according to the same
sort specification, or comm can foul up. Sorry.

Anno
 
colin_lyse

> Hi all,
>
> I need to compare two text files and put the matching result in another
> file. Does anybody have an idea?
>
> file1    file2    compare and match = file3
> ------   ------   -------------------------
> 1        3        3
> 2        4        4
> 3        5        5
> 4        6
> 5        7
>
> Thank you


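{
    # @file1 and @file2 must already hold the (chomped) lines of each file.
    # Tag each line with one bit per file: 1 = file1, 2 = file2, 3 = both.
    # Bit flags, unlike appending "1"/"2", also work when a line repeats
    # within one file.
    my %t;
    $t{$_} |= 1 for @file1;
    $t{$_} |= 2 for @file2;
    @matching = grep { $t{$_} == 3 } keys %t;
}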


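No need to sort the files first, and it is very fast (one hash update per
line). Note that keys %t returns the matching lines in no particular order.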
 
