Compare 2 files and put the matching part in a 3rd file

BerNaC · Jan 21, 2005

Hi all,

I need to compare two text files and put the matching result in another
file. Does anybody have an idea?

file1   file2   compare and match = file3
-----   -----   -------------------------
1       3       3
2       4       4
3       5       5
4       6
5       7

Thank you

Arndt Jonasson · Jan 21, 2005

BerNaC said:
I need to compare two text files and put the matching result in another
file. Does anybody have an idea?

Is anything known about the format of the files and in what ways they
can differ? Doing a general comparison and presenting the differences as
a minimal set of individual differences is quite complex. In that case
I would run the Unix 'diff' program on the files and post-process
its output.

CPAN has only "compare and stop when finding a difference", it seems.

BerNaC · Jan 21, 2005

On Friday, Arndt Jonasson wrote:

Is anything known about the format of the files and in what ways they
can differ? Doing a general comparison and presenting the differences as
a minimal set of individual differences is quite complex. In that case
I would run the Unix 'diff' program on the files and post-process
its output.

CPAN has only "compare and stop when finding a difference", it seems.

Well, the two text files have one sendmail log ID per line; they look
like this:

1U34334Y34
1ZRTRG345
2SDFSDF17
and so on

One file holds the IDs of mails sent from one guy and the other holds the
IDs of mails sent to another guy, so if an ID is in both files, it means a
mail with that ID was sent from the first guy to the second.

As you can see, I'm trying to write a script that parses the sendmail log
to find all mail sent from someone to somebody.

Arndt Jonasson · Jan 21, 2005

BerNaC said:
On Friday, Arndt Jonasson wrote:

Well, the two text files have one sendmail log ID per line; they look
like this:

1U34334Y34
1ZRTRG345
2SDFSDF17
and so on

One file holds the IDs of mails sent from one guy and the other holds the
IDs of mails sent to another guy, so if an ID is in both files, it means a
mail with that ID was sent from the first guy to the second.
As you can see, I'm trying to write a script that parses the sendmail log
to find all mail sent from someone to somebody.

That seems to mean that no valuable information is lost if you sort
the files first, which makes the job of comparing them much easier (I'd
say trivial, but maybe that's overstating it). Is that enough for an
idea, or is there some particular aspect of it which you don't know
how to do in Perl?

If the files are not very large, reading in their contents into perl (*)
and sorting there will be OK, otherwise it's better to sort them on disk.

(*) "perl", "Perl", what do I want here? I want a
"case-doesn't-matter-perl"...
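A minimal sketch of the sort-then-compare idea in Perl, assuming both files fit in memory and that two lines match only if they are identical after chomp. The sub name common_sorted and the script name are hypothetical. Both lists are sorted with the same comparison (cmp, the default for Perl's sort), and then a single walk over both, always advancing the side whose current line sorts first, collects the lines they share.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines found in both array refs, in sorted
# order. Copies of both lists are sorted with the default string comparison
# (cmp), then walked in step: the side whose current line sorts first advances.
sub common_sorted {
    my ($x, $y) = @_;
    my @s1 = sort @$x;
    my @s2 = sort @$y;
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @s1 && $j < @s2) {
        my $c = $s1[$i] cmp $s2[$j];
        if    ($c < 0) { $i++ }                              # only in the first list
        elsif ($c > 0) { $j++ }                              # only in the second list
        else           { push @common, $s1[$i]; $i++; $j++ } # in both lists
    }
    return @common;
}

# Usage: perl common_sorted.pl file1 file2 > file3
if (@ARGV == 2) {
    my @lists;
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        chomp(my @lines = <$fh>);
        close $fh;
        push @lists, \@lines;
    }
    print "$_\n" for common_sorted(@lists);
}
```

A line that occurs twice in both files is reported twice, which is also how comm -12 pairs up repeated lines.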

John W. Krahn · Jan 21, 2005

BerNaC said:
I need to compare two text files and put the matching result in another
file. Does anybody have an idea?

file1   file2   compare and match = file3
-----   -----   -------------------------
1       3       3
2       4       4
3       5       5
4       6
5       7

$ perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
3
4
5

John
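Spelled out, the one-liner keeps every line of the first file in a hash and prints each line of the second file that is already there. The switch from "first file" to "second file" comes from eof without parentheses, which is true at the end of each input file in a -n loop, so $a becomes true once file1 is exhausted. The sketch below is a long-hand reading of that logic, not John's code; the sub name lines_in_both is hypothetical, and it takes two already-open filehandles instead of relying on eof.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Long-hand version of: perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
# Hypothetical helper: returns the lines of $second_fh that also occur in
# $first_fh, in the order (and with the repetitions) they have in $second_fh.
sub lines_in_both {
    my ($first_fh, $second_fh) = @_;
    my %seen;
    while (my $line = <$first_fh>) {
        $seen{$line}++;                      # remember every line of the first file
    }
    my @out;
    while (my $line = <$second_fh>) {
        push @out, $line if $seen{$line};    # keep lines already seen in the first file
    }
    return @out;
}

# Usage: perl lines_in_both.pl file1 file2 > file3
if (@ARGV == 2) {
    open my $f1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $f2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my @common = lines_in_both($f1, $f2);
    print @common;
}
```

As in the one-liner, lines are compared including their newline, so a last line without a trailing newline will not match its twin in the other file, and a line repeated in the second file is printed each time it appears.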

David Combs · Jan 25, 2005

That seems to mean that no valuable information is lost if you sort
the files first, which makes the job of comparing them much easier (I'd
say trivial, but maybe that's overstating it). Is that enough for an
idea, or is there some particular aspect of it which you don't know
how to do in Perl?

If the files are not very large, reading in their contents into perl (*)
and sorting there will be OK, otherwise it's better to sort them on disk.

If you're allowed to sort them, then do that, and do "comm"
on those two.

(It's *exactly* what comm was designed for.)

David

PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort using the same scheme that comm uses,
otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.

Anno Siegel · Jan 25, 2005

[...]

If you're allowed to sort them, then do that, and do "comm"
on those two.

(It's *exactly* what comm was designed for.)

David

PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort using the same scheme that comm uses,
otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.

Conjecture?

No, the remark doesn't make sense. All comm requires is that identical
lines be next to each other. Any sort that considers the whole line will
guarantee that.

My comm man page doesn't even specify the sort to be ascending or descending,
though it does (unnecessarily) specify "lexically".

Anno

xhoster · Jan 25, 2005

[...]

If you're allowed to sort them, then do that, and do "comm"
on those two.

(It's *exactly* what comm was designed for.)

David

PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort using the same scheme that comm uses,
otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.


Conjecture?

No, the remark doesn't make sense. All comm requires is that identical
lines be next to each other.

The only way you can ensure that identical lines are next to each other by
sorting the separate files is if the files are identical in the first
place. If you already know that, then you are already done.

In the non-trivial case, comm needs a way to re-align the files once it
encounters non-identical lines. To do that, the files must be sorted in
the same order that comm expects.

My comm man page doesn't even specify the sort to be ascending or
descending, though it does (unnecessarily) specify "lexically".

Apparently man isn't good enough anymore; now if you want to know how a
command-line tool works, you have to read the "info" page too.

from info comm:<<EOF
Before `comm' can be used, the input files must be sorted using the
collating sequence specified by the `LC_COLLATE' locale. If an input
file ends in a non-newline character, a newline is silently appended.
The `sort' command with no options always outputs a file that is
suitable input to `comm'.
EOF

Xho
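A simplified model of the merge step shows why the sort order matters. The sketch below is not comm's source, and comm_split is a hypothetical name: it decides which input to advance by comparing the current lines with cmp, so it can only re-align after a mismatch if both inputs are sorted by that same comparison. Perl's cmp (without use locale) orders by character code, which for ASCII input matches sort and comm in the C locale.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified model of comm's merge (hypothetical helper, not comm's source).
# Both inputs must already be sorted with plain string comparison (cmp).
# Returns three array refs: lines only in $x, lines only in $y, lines in both.
sub comm_split {
    my ($x, $y) = @_;
    my (@only_x, @only_y, @both);
    my ($i, $j) = (0, 0);
    while ($i < @$x && $j < @$y) {
        my $c = $x->[$i] cmp $y->[$j];
        if    ($c < 0) { push @only_x, $x->[$i++] }       # x line sorts first: not in y
        elsif ($c > 0) { push @only_y, $y->[$j++] }       # y line sorts first: not in x
        else           { push @both, $x->[$i++]; $j++ }   # aligned: common line
    }
    push @only_x, @$x[$i .. $#$x];    # leftovers can exist in only one input
    push @only_y, @$y[$j .. $#$y];
    return (\@only_x, \@only_y, \@both);
}

# Both inputs sorted ascending: the common lines are found.
my (undef, undef, $ok) = comm_split([qw(1 2 3 4 5)], [qw(3 4 5 6 7)]);
print "same order:  @$ok\n";

# Second input sorted descending: the walk cannot re-align and misses matches.
my (undef, undef, $bad) = comm_split([qw(1 2 3 4 5)], [qw(7 6 5 4 3)]);
print "mixed order: @$bad\n";
```

With both inputs ascending, the common column is 3 4 5. With the second input descending, every line of the first input is compared against 7, judged to sort first, and pushed to the "only in x" column, so the common column comes out empty.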

Anno Siegel · Jan 25, 2005

[...]

If you're allowed to sort them, then do that, and do "comm"
on those two.

(It's *exactly* what comm was designed for.)

David

PS: Question: does the following conjecture make any sense?

Oh, by the way, make sure you sort using the same scheme that comm uses,
otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.


Conjecture?

No, the remark doesn't make sense. All comm requires is that identical
lines be next to each other.


The only way you can ensure that identical lines are next to each other by
sorting the separate files is if the files are identical in the first
place. If you already know that, then you are already done.

You are so very right. Both files must be sorted according to the same
sort specification, or comm can foul up. Sorry.

Anno

colin_lyse · Feb 16, 2005

Hi all,

I need to compare two text files and put the matching result in another
file. Does anybody have an idea?

file1   file2   compare and match = file3
-----   -----   -------------------------
1       3       3
2       4       4
3       5       5
4       6
5       7

Thank you

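# Assumes @file1 and @file2 already hold the lines of each file,
# chomped (or not) the same way.
my @matching;
{
    my %t;
    $t{$_} |= 1 for @file1;    # bit 1: line occurs in file1
    $t{$_} |= 2 for @file2;    # bit 2: line occurs in file2
    @matching = grep { $t{$_} == 3 } keys %t;    # in both files; order is arbitrary
}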

No need to sort the files first, and it's very fast: one hash operation per line.
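The original question also asked for the result to go into a third file. Here is a sketch that wraps the hash idea in a complete script, assuming one ID per line as in the sendmail example; the sub names read_ids and common_ids and the script name are hypothetical. It trims surrounding whitespace (so CRLF line endings or a missing final newline do not break matches), keeps the order of the first file, and writes each common ID once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line from $path, trimming surrounding
# whitespace (including CRLF) and skipping blank lines.
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $fh;
    return @ids;
}

# Hypothetical helper: IDs present in both lists, each once,
# in the order they first appear in the first list.
sub common_ids {
    my ($first, $second) = @_;
    my %in_second;
    $in_second{$_} = 1 for @$second;
    my %seen;
    return grep { $in_second{$_} && !$seen{$_}++ } @$first;
}

# Usage: perl common_ids.pl file1 file2 file3
if (@ARGV == 3) {
    my ($in1, $in2, $out) = @ARGV;
    my @common = common_ids([read_ids($in1)], [read_ids($in2)]);
    open my $fh, '>', $out or die "Cannot write $out: $!";
    print {$fh} "$_\n" for @common;
    close $fh or die "Cannot close $out: $!";
}
```

Keeping the order of the first file, rather than hash order, makes the output easier to line up against the original log.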

Similar Threads

Select files based on text list of filenames(part of the name:date) with condition	0	May 4, 2022
To compare the content in two files..	4	Nov 17, 2010
Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
trim the last blank-line and compare files	6	Mar 2, 2010
Engineering a list container. Part 1.	71	Dec 7, 2013
C exercise	1	Feb 3, 2022
Engineering a List container Part 2: Implementations	20	Dec 8, 2013
A number everyday of the month "and" a different number depending on the day of the month´s day time	2	Mar 16, 2021
