Parsing two files and comparing the first fields

Discussion in 'Perl Misc' started by clearguy02@yahoo.com, Nov 28, 2007.

  1. Guest

    I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
    file has 4 fields and the second one has two fields, but both files
    have the "user_id" as the first field.

    Example:

    c:\test1.txt
    =================
    jcarter john mstella
    mstella mary bborders
    msmith martin mstella
    bborders bob rcasey
    swatson sush mstella
    rcasey rick rcasey


    c:\test2.txt
    ======================
    aaboss active
    jcarter active
    msmith non-active
    ssullivan non-active
    rcasey non-active
    usmiths active

    ===============================================

    Now I want to check whether each ID from the second file exists in
    the first one. I want output for both the matching and the
    non-matching IDs.

    Below is the script I am using. Can you kindly let me know where I
    am going wrong?

    ================================

    use strict;
    use warnings;

    open (IN1, "c:\test1.txt") || die "Can not open the file: $!";
    open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
    open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
    file: $!";
    open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
    file: $!";

    @array1 = <IN1>;
    @array2 = <IN2>;

    foreach $record1 (@array1)
    {
    chomp $record1;
    @fields1= split /\t/, $record1;
    $fist_id = $fields1[0];
    }

    foreach $record2 (@array2)
    {
    chomp $record2;
    @fields2= split /\t/, $record2;
    $second_id = $fields2[0];

    foreach (@fields1)
    {
    if ($second_id eq $fist_id)
    {
    print OUT1 "$record2\n" ; # matching
    }
    else
    {
    print OUT1 "$record2\n" ; # matching
    }
    }
    close (IN1);
    close (IN2);
    close (OUT1);
    close (OUT2);
    +++++++++++++++++++++++++++++++++++++


    Thanks in advance,
    JC
     
    , Nov 28, 2007
    #1

  2. Guest

    On Nov 28, 3:12 pm, wrote:
    > [original post quoted in full; snipped]


    I forgot to add "my" before the variables while typing. Sorry about
    that.

    --JC
     
    , Nov 28, 2007
    #2

  3. wrote in news:b20d8640-91c1-41d7-a46a-ab04bf405239@d21g2000prf.googlegroups.com:

    >
    > Now I want to check if each id from the second file exists in the
    > first one or not. I want the output of both matching and non-matching
    > id's.


    Read

    perldoc -q intersection

    Parse the files into hashes using the id field values as keys.
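
    As a minimal sketch of that idea, using only the IDs from the sample
    data in the question (the array and hash names are made up for
    illustration), a lookup hash built from the first file's IDs lets
    each ID from the second file be classified with a single lookup:

    ```perl
    use strict;
    use warnings;

    # IDs taken from the sample data in the question.
    my @file1_ids = qw(jcarter mstella msmith bborders swatson rcasey);
    my @file2_ids = qw(aaboss jcarter msmith ssullivan rcasey usmiths);

    # Build a lookup table keyed on the IDs of the first file.
    my %in_file1 = map { $_ => 1 } @file1_ids;

    # One pass over the second list, one hash lookup per ID.
    my @matching     = grep {  $in_file1{$_} } @file2_ids;
    my @non_matching = grep { !$in_file1{$_} } @file2_ids;

    print "matching: @matching\n";
    print "non-matching: @non_matching\n";
    ```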

    > use strict;
    > use warnings;
    >
    > open (IN1, "c:\test1.txt") || die "Can not open the file: $!";


    This will probably not succeed as it will look for a file named
    {TAB}est1.txt in c:\.
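
    To see the effect, here is a small sketch (the variable names are
    hypothetical) comparing the double-quoted path from the script with
    a single-quoted one and one that uses forward slashes:

    ```perl
    use strict;
    use warnings;

    # In a double-quoted string, "\t" is interpolated as a TAB character.
    my $double_quoted = "c:\test1.txt";

    # Single quotes leave the backslash alone; forward slashes avoid the issue.
    my $single_quoted = 'c:\test1.txt';
    my $forward_slash = 'c:/test1.txt';

    printf "double-quoted contains a TAB: %s\n",
        ( $double_quoted =~ /\t/ ? 'yes' : 'no' );
    printf "single-quoted contains a TAB: %s\n",
        ( $single_quoted =~ /\t/ ? 'yes' : 'no' );
    print "$forward_slash\n";
    ```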

    > open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
    > open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
    > file: $!";
    > open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
    > file: $!";


    I generally prefer to use lexical filehandles and the three argument
    form of open. Also, you can just use / as the directory separator in
    Windows. For increased portability, I prefer to use File::Spec->catfile.
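
    A rough sketch of those three suggestions together. The temporary
    directory and the variable names are assumptions made so that the
    example runs anywhere; they are not taken from the original script:

    ```perl
    use strict;
    use warnings;
    use File::Spec;
    use File::Temp qw(tempdir);

    # Hypothetical output directory: a temporary one, removed at exit.
    my $dir = tempdir( CLEANUP => 1 );

    # catfile joins path components with the current platform's separator.
    my $path = File::Spec->catfile( $dir, 'matching.txt' );

    # Three-argument open with a lexical filehandle: the mode is kept
    # separate from the file name.
    open my $out, '>', $path
        or die "Cannot open '$path' for writing: $!";
    print {$out} "jcarter\n";
    close $out
        or die "Cannot close '$path': $!";

    # Read the line back to confirm the write.
    open my $in, '<', $path
        or die "Cannot open '$path' for reading: $!";
    my $line = <$in>;
    close $in;
    print $line;
    ```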

    > @array1 = <IN1>;
    > @array2 = <IN2>;


    No need to slurp anything.
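
    For instance, a while loop reads one record at a time. This sketch
    reads two of the sample records through an in-memory filehandle (an
    assumption, so that no file on disk is needed) and assumes the
    fields are separated by tabs, as in the original split:

    ```perl
    use strict;
    use warnings;

    # Two sample records from the question, assumed to be tab-separated.
    my $data = "jcarter\tjohn\tmstella\nmstella\tmary\tbborders\n";
    open my $in, '<', \$data
        or die "Cannot open in-memory file: $!";

    my @ids;
    # Process each record as it is read instead of copying the whole
    # file into an array first.
    while ( my $line = <$in> ) {
        chomp $line;
        my $id = ( split /\t/, $line )[0];
        push @ids, $id;
    }
    close $in;

    print "$_\n" for @ids;
    ```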

    > foreach $record1 (@array1)
    > {
    > chomp $record1;
    > @fields1= split /\t/, $record1;
    > $fist_id = $fields1[0];


    my $first_id = (split /\t/, $record1)[0];

    > }
    >
    > foreach $record2 (@array2)
    > {
    > chomp $record2;
    > @fields2= split /\t/, $record2;
    > $second_id = $fields2[0];



    This nested loop approach will have extremely bad performance
    characteristics as the number of input lines increases. Use hashes.

    > foreach (@fields1)
    > {
    > if ($second_id eq $fist_id)
    > {
    > print OUT1 "$record2\n" ; # matching
    > }
    > else
    > {
    > print OUT1 "$record2\n" ; # matching
    > }
    > }


    So if $second_id eq $first_id, you write it to OUT1; otherwise, you
    also write it to OUT1. What's the point?

    The script below represents my best guess as to what you are trying to
    achieve.

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %myconfig = (
        input1       => 'input1.txt',
        input2       => 'input2.txt',
        matching     => 'matching.txt',
        non_matching => 'non_matching.txt',
    );

    my %fields1;

    {
        open my $input, '<', $myconfig{input1}
            or die "Cannot open '$myconfig{input1}': $!";

        while ( <$input> ) {
            if ( /^(\w+)/ ) {
                $fields1{ $1 } = 1;
            }
        }

        close $input
            or die "Cannot close '$myconfig{input1}': $!";
    }

    open my $input, '<', $myconfig{input2}
        or die "Cannot open '$myconfig{input2}': $!";

    open my $matching, '>', $myconfig{matching}
        or die "Cannot open '$myconfig{matching}': $!";

    open my $non_matching, '>', $myconfig{non_matching}
        or die "Cannot open '$myconfig{non_matching}': $!";

    while ( <$input> ) {
        if ( /^(\w+)/ ) {
            if ( exists $fields1{ $1 } ) {
                print $matching "$1\n";
            }
            else {
                print $non_matching "$1\n";
            }
        }
    }

    __END__

    C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input1.txt
    jcarter john mstella
    mstella mary bborders
    msmith martin mstella
    bborders bob rcasey
    swatson sush mstella
    rcasey rick rcasey


    C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input2.txt
    aaboss active
    jcarter active
    msmith non-active
    ssullivan non-active
    rcasey non-active
    usmiths active


    C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat matching.txt
    jcarter
    msmith
    rcasey

    C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat non_matching.txt
    aaboss
    ssullivan
    usmiths



    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)
    clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
     
    A. Sinan Unur, Nov 28, 2007
    #3
  4. wrote:
    >
    > I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
    > file has 4 fields and the second one has two fields, but both files
    > have the "user_id" as the first field.
    >
    > Example:
    >
    > c:\test1.txt
    > =================
    > jcarter john mstella
    > mstella mary bborders
    > msmith martin mstella
    > bborders bob rcasey
    > swatson sush mstella
    > rcasey rick rcasey
    >
    > c:\test2.txt
    > ======================
    > aaboss active
    > jcarter active
    > msmith non-active
    > ssullivan non-active
    > rcasey non-active
    > usmiths active
    >
    > ===============================================
    >
    > Now I want to check if each id from the second file exists in the
    > first one or not. I want the output of both matching and non-matching
    > id's.



    Something like this should work:


    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $fh2, '<', 'c:/test2.txt'
        or die "Cannot open 'c:/test2.txt' $!";

    my %ids;
    while ( <$fh2> ) {
        $ids{ ( split /\t/ )[ 0 ] }++;
    }

    close $fh2;

    # $dir1 comes from the original post, which never defines it; the
    # current directory is assumed here.
    my $dir1 = '.';

    open my $fh1, '<', 'c:/test1.txt'
        or die "Cannot open 'c:/test1.txt' $!";
    open my $match, '>', "$dir1/matching.txt"
        or die "Cannot open '$dir1/matching.txt' $!";
    open my $nonm, '>', "$dir1/not_matching.txt"
        or die "Cannot open '$dir1/not_matching.txt' $!";

    while ( <$fh1> ) {
        my $id = ( split /\t/ )[ 0 ];
        if ( exists $ids{ $id } ) {
            print $match $_;
        }
        else {
            print $nonm $_;
        }
    }

    close $nonm;
    close $match;
    close $fh1;

    __END__



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 28, 2007
    #4
