Compare 2 files and put the matching part in a 3rd file

Discussion in 'Perl Misc' started by BerNaC, Jan 21, 2005.

  1. BerNaC

    BerNaC Guest

    Hi all,

    I need to compare two text files and put the matching results in another
    file. Does anybody have an idea?

    file1   file2   compare and match = file3
    -----   -----   -------------------------
    1       3       3
    2       4       4
    3       5       5
    4       6
    5       7


    Thank you

    --
    -----------------------
    BerNaC
    ___________
     
    BerNaC, Jan 21, 2005
    #1

  2. BerNaC <> writes:
    >
    > I need to compare two text files and put the matching results in another
    > file. Does anybody have an idea?


    Is anything known about the format of the files and in what ways they
    can differ? Doing a general comparison and presenting the result as
    a minimal set of individual differences is quite complex. In that case
    I would run the Unix 'diff' program on the files and post-process the
    output.

    CPAN has only "compare and stop when finding a difference", it seems.
     
    Arndt Jonasson, Jan 21, 2005
    #2

  3. BerNaC

    BerNaC Guest

    Arndt Jonasson wrote this Friday:
    > BerNaC <> writes:
    >>
    >> I need to compare two text files and put the matching results in another
    >> file. Does anybody have an idea?

    >
    > Is anything known about the format of the files and in what ways they
    > can differ? Doing a general comparison and presenting the differences as
    > a minimal set of individual differences is quite complex. In that case
    > I would choose running the Unix 'diff' program on the files and
    > post-process the output.
    >
    > CPAN has only "compare and stop when finding a difference", it seems.


    Well, the two text files each have one ID from the sendmail log per line.
    They look like this:

    1U34334Y34
    1ZRTRG345
    2SDFSDF17
    and so on

    One file holds the IDs of mail from a given sender, the other the IDs of
    mail to a given recipient. If an ID appears in both files, a mail with
    that ID has been sent from this guy to this guy :).
    So, as you can see, I'm trying to write a script that parses the sendmail
    log to find all email from someone to somebody.

    --
    -----------------------
    BerNaC
    ___________
     
    BerNaC, Jan 21, 2005
    #3
  4. BerNaC <> writes:
    > Arndt Jonasson wrote this Friday:
    > > BerNaC <> writes:
    > >> I need to compare two text files and put the matching results in
    > >> another file. Does anybody have an idea?

    > >
    > > Is anything known about the format of the files and in what ways they
    > > can differ? Doing a general comparison and presenting the differences as
    > > a minimal set of individual differences is quite complex. In that case
    > > I would choose running the Unix 'diff' program on the files and
    > > post-process the output.
    > >
    > > CPAN has only "compare and stop when finding a difference", it seems.

    >
    > Well, the two text files each have one ID from the sendmail log per line.
    > They look like this:
    >
    > 1U34334Y34
    > 1ZRTRG345
    > 2SDFSDF17
    > and so on
    >
    > One file holds the IDs of mail from a given sender, the other the IDs of
    > mail to a given recipient. If an ID appears in both files, a mail with
    > that ID has been sent from this guy to this guy :).
    > So, as you can see, I'm trying to write a script that parses the sendmail
    > log to find all email from someone to somebody.


    That seems to mean that no valuable information is lost if you sort
    the files first, which makes the job of comparing them much easier (I'd
    say trivial, but maybe that's overstating it). Is that enough for an
    idea, or is there some particular aspect of it which you don't know
    how to do in Perl?

    If the files are not very large, reading their contents into perl (*)
    and sorting there will be OK; otherwise it's better to sort them on disk.
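
    A minimal sketch of the sort-then-compare idea for the in-memory case.
    It assumes one ID per list element with the newline already removed;
    the sub name common_sorted and the sample data (taken from the original
    question) are illustrative, not anything posted in this thread.

```perl
use strict;
use warnings;

# Return the IDs common to two lists by sorting both the same way and
# walking them in step. Assumes one ID per element, newlines removed.
sub common_sorted {
    my ($list1, $list2) = @_;
    my @left  = sort @$list1;    # default sort: plain string comparison
    my @right = sort @$list2;
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @left && $j < @right) {
        my $cmp = $left[$i] cmp $right[$j];
        if    ($cmp < 0) { $i++ }    # left ID is not in the rest of right
        elsif ($cmp > 0) { $j++ }    # right ID is not in the rest of left
        else {
            # Same ID on both sides: record it once, then advance both.
            push @common, $left[$i] unless @common && $common[-1] eq $left[$i];
            $i++;
            $j++;
        }
    }
    return @common;
}

# The example data from the original question.
my @file1 = qw(1 2 3 4 5);
my @file2 = qw(3 4 5 6 7);
print "$_\n" for common_sorted(\@file1, \@file2);    # prints 3, 4, 5
```

    Because both lists go through the same sort, the walk never has to back
    up: whichever side holds the smaller ID simply advances.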

    (*) "perl", "Perl", what do I want here? I want a
    "case-doesn't-matter-perl"...
     
    Arndt Jonasson, Jan 21, 2005
    #4
  5. BerNaC wrote:
    >
    > I need to compare two text files and put the matching results in another
    > file. Does anybody have an idea?
    >
    > file1   file2   compare and match = file3
    > -----   -----   -------------------------
    > 1       3       3
    > 2       4       4
    > 3       5       5
    > 4       6
    > 5       7


    $ perl -ne'$a?$x{$_}&&print:$x{$_}++;$a||=eof' file1 file2
    3
    4
    5
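
    Spelled out, the one-liner above amounts to roughly the following. This
    is a sketch rather than John's code: the sub name print_matches and the
    in-memory stand-ins for file1 and file2 are assumptions made for
    illustration. The one-liner uses eof to notice where file1 ends; the
    sketch takes two separate filehandles instead.

```perl
use strict;
use warnings;

# Print to $out every line of $fh2 that also occurs in $fh1.
# Lines are compared with their newlines intact, as in the one-liner.
sub print_matches {
    my ($fh1, $fh2, $out) = @_;
    my %seen;
    while (my $line = <$fh1>) {      # first pass: remember file1's lines
        $seen{$line}++;
    }
    while (my $line = <$fh2>) {      # second pass: filter file2
        print {$out} $line if $seen{$line};
    }
}

# In-memory stand-ins for file1 and file2 from the original question.
my $file1 = join '', map { "$_\n" } 1 .. 5;
my $file2 = join '', map { "$_\n" } 3 .. 7;
open my $in1, '<', \$file1 or die "cannot open in-memory file1: $!";
open my $in2, '<', \$file2 or die "cannot open in-memory file2: $!";
print_matches($in1, $in2, \*STDOUT);    # prints 3, 4, 5
```

    The output follows the order of the second file, and a line that is
    repeated there is printed each time it appears.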


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jan 21, 2005
    #5
  6. BerNaC

    David Combs Guest

    In article <>,
    Arndt Jonasson <> wrote:
    >

    <SNIP>
    >
    >That seems to mean that no valuable information is lost if you sort
    >the files first, which makes the job of comparing them much easier (I'd
    >say trivial, but maybe that's overstating it). Is that enough for an
    >idea, or is there some particular aspect of it which you don't know
    >how to do in Perl?
    >
    >If the files are not very large, reading in their contents into perl (*)
    >and sorting there will be OK, otherwise it's better to sort them on disk.


    If you're allowed to sort them, then do that, and do "comm"
    on those two.

    (It's *exactly* what comm was designed for.)

    David


    PS: Question: does the following conjecture make any sense?:

    Oh, by the way, make sure you sort via the same scheme that comm uses,
    otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.
     
    David Combs, Jan 25, 2005
    #6
  7. BerNaC

    Anno Siegel Guest

    David Combs <> wrote in comp.lang.perl.misc:

    [...]

    > If you're allowed to sort them, then do that, and do "comm"
    > on those two.
    >
    > (It's *exactly* what comm was designed for.)
    >
    > David
    >
    >
    > PS: Question: does the following conjecture make any sense?:
    >
    > Oh, by the way, make sure you sort via the same scheme that comm uses,
    > otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.


    Conjecture?

    No, the remark doesn't make sense. All comm requires is that identical
    lines be next to each other. Any sort that considers the whole line will
    guarantee that.

    My comm man page doesn't even specify the sort to be ascending or descending,
    though it does (unnecessarily) specify "lexically".

    Anno
     
    Anno Siegel, Jan 25, 2005
    #7
  8. BerNaC

    Guest

    -berlin.de (Anno Siegel) wrote:
    > David Combs <> wrote in comp.lang.perl.misc:
    >
    > [...]
    >
    > > If you're allowed to sort them, then do that, and do "comm"
    > > on those two.
    > >
    > > (It's *exactly* what comm was designed for.)
    > >
    > > David
    > >
    > >
    > > PS: Question: does the following conjecture make any sense?:
    > >
    > > Oh, by the way, make sure you sort via the same scheme that comm uses,
    > > otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.

    >
    > Conjecture?
    >
    > No, the remark doesn't make sense. All comm requires is that identical
    > lines be next to each other.


    The only way you can ensure that identical lines are next to each other by
    sorting the separate files is if the files are identical in the first
    place. If you already know that, then you are already done.

    In the non-trivial case, comm needs a way to re-align the files once it
    encounters non-identical lines. For that to work, the files must be
    sorted in the order that comm expects.

    > My comm man page doesn't even specify the sort to be ascending or
    > descending, though it does (unnecessarily) specify "lexically".


    Apparently man wasn't good enough; now, if you want to know how a
    command-line tool works, you have to read the "info" page too.


    from info comm:<<EOF
    Before `comm' can be used, the input files must be sorted using the
    collating sequence specified by the `LC_COLLATE' locale. If an input
    file ends in a non-newline character, a newline is silently appended.
    The `sort' command with no options always outputs a file that is
    suitable input to `comm'.
    EOF
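
    To see why the ordering matters, here is a small Perl model of a
    merge-style walk over two pre-sorted lists. It is only a stand-in for
    the kind of re-alignment described above, not comm's actual
    implementation; the sub name merge_common and the two sample IDs are
    made up for the demonstration.

```perl
use strict;
use warnings;

# Merge walk over two lists that the caller claims are already sorted
# in ascending string order. No sorting happens in here.
sub merge_common {
    my ($left, $right) = @_;
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @$left && $j < @$right) {
        my $cmp = $left->[$i] cmp $right->[$j];
        if    ($cmp < 0) { $i++ }
        elsif ($cmp > 0) { $j++ }
        else             { push @common, $left->[$i]; $i++; $j++ }
    }
    return @common;
}

my @ids     = (9, 10);
my @numeric = sort { $a <=> $b } @ids;    # (9, 10)
my @string  = sort @ids;                  # (10, 9): "1" sorts before "9"

# Same ordering on both sides: both IDs are found.
print "consistent: @{[ merge_common(\@string, \@string) ]}\n";
# Mixed orderings: the walk steps past "10" and never finds it.
print "mixed:      @{[ merge_common(\@numeric, \@string) ]}\n";
```

    With the mixed orderings only 9 is reported, even though both lists
    contain the same two IDs.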

    Xho

     
    , Jan 25, 2005
    #8
  9. BerNaC

    Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) wrote:
    > > David Combs <> wrote in comp.lang.perl.misc:
    > >
    > > [...]
    > >
    > > > If you're allowed to sort them, then do that, and do "comm"
    > > > on those two.
    > > >
    > > > (It's *exactly* what comm was designed for.)
    > > >
    > > > David
    > > >
    > > >
    > > > PS: Question: does the following conjecture make any sense?:
    > > >
    > > > Oh, by the way, make sure you sort via the same scheme that comm uses,
    > > > otherwise comm won't think it's sorted. I.e., beware of -u, -r, etc.

    > >
    > > Conjecture?
    > >
    > > No, the remark doesn't make sense. All comm requires is that identical
    > > lines be next to each other.

    >
    > The only way you can ensure that identical lines are next to each other by
    > sorting the separate files is if the files are identical in the first
    > place. If you already know that, then you are already done.


    You are so very right. Both files must be sorted according to the same
    sort specification, or comm can foul up. Sorry.

    Anno
     
    Anno Siegel, Jan 25, 2005
    #9
  10. BerNaC

    colin_lyse Guest

    In article <>, BerNaC <> wrote:
    >Hi all,
    >
    >I need to compare two text files and put the matching results in another
    >file. Does anybody have an idea?
    >
    >file1   file2   compare and match = file3
    >-----   -----   -------------------------
    >1       3       3
    >2       4       4
    >3       5       5
    >4       6
    >5       7
    >
    >
    >Thank you
    >



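    {
    # @file1 and @file2 hold the lines of the two files.
    my %t;
    $t{$_} |= 1 for @file1;    # bit 1: line occurs in file1
    $t{$_} |= 2 for @file2;    # bit 2: line occurs in file2
    # Bit flags instead of appended "1"/"2", so repeated lines still match.
    @matching = grep $t{$_} == 3, keys %t;
    }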


    no need to sort files first. VERY fast.
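
    A fuller sketch of the same hash-tagging idea, including reading the
    two files and writing the third. The sub names read_lines and
    write_matches and the temporary directory are assumptions for the demo.
    It tags with numeric bit flags rather than appended strings so that an
    ID repeated within one file still counts as a match, and it sorts the
    result only to make the output order predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read a file into a list of lines with the newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Write the IDs present in both input files to $out_path, one per line.
sub write_matches {
    my ($path1, $path2, $out_path) = @_;
    my %tag;
    $tag{$_} |= 1 for read_lines($path1);    # bit 1: seen in first file
    $tag{$_} |= 2 for read_lines($path2);    # bit 2: seen in second file
    my @matching = grep { $tag{$_} == 3 } keys %tag;
    @matching = sort @matching;              # stable output order only
    open my $out, '>', $out_path or die "cannot write $out_path: $!";
    print {$out} "$_\n" for @matching;
    close $out or die "cannot close $out_path: $!";
    return @matching;
}

# Demo with throwaway copies of the question's file1 and file2.
my $dir  = tempdir(CLEANUP => 1);
my %demo = (file1 => [1 .. 5], file2 => [3 .. 7]);
for my $name (keys %demo) {
    open my $fh, '>', "$dir/$name" or die "cannot write $dir/$name: $!";
    print {$fh} "$_\n" for @{ $demo{$name} };
    close $fh;
}
print "$_\n" for write_matches("$dir/file1", "$dir/file2", "$dir/file3");
```

    The inputs are never sorted; each file is read once and the hash does
    the matching.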
     
    colin_lyse, Feb 16, 2005
    #10