Comparison of two files..

Discussion in 'Perl Misc' started by clearguy02@yahoo.com, Oct 24, 2008.

  1. Guest

    Hi folks,

    I have two files:
    a.txt has 100 unique log_id's (one id per line);
    all.txt has 5000 entries (each line has six entries separated by a
    tab and the first entry on each line is the login ID and then full
    name, country etc).

    Now I want to match both files and get the output with all 100 full
    entries and ignore the rest.

    Here is the code I am working on.. for some reason, I see more 160
    entries instead of the exact 100 entries.

    ++++++++++++++++++++
    my %myconfig = (
        input1       => 'a.txt',
        input2       => 'all.txt',
        matching     => 'required.txt',
        non_matching => 'ignore.txt',
    );

    my %fields2;
    {
        open my $input, '<', $myconfig{input1}
            or die "Cannot open '$myconfig{input1}': $!";
        while ( <$input> )
        {
            if ( /^(\w+)/ )
            {
                $fields2{ $1 } = 1;
            }
        }
        close $input or die "Cannot close '$myconfig{input1}': $!";
    }
    open my $input, '<', $myconfig{input2}
        or die "Cannot open '$myconfig{input2}': $!";
    open my $matching, '>', $myconfig{matching}
        or die "Cannot open '$myconfig{matching}': $!";
    open my $non_matching, '>', $myconfig{non_matching}
        or die "Cannot open '$myconfig{non_matching}': $!";

    while ( <$input> )
    {
        if ( /^(\w+)/ )
        {
            if ( exists $fields2{ $1 } )
            {
                print $matching "$_\n";
            }
            else
            {
                print $non_matching "$_\n";
            }
        }
    }

    ++++++++++++++++++++++++++++++++++++

    What am I doing wrong here? Or is there an alternative way of doing
    it?

    Thanks,
    J
    clearguy02@yahoo.com, Oct 24, 2008
    #1

  2. Jim Gibson Guest

    In article
    <>,
    <> wrote:

    > Hi folks,
    >
    > I have two files:
    > a.txt has 100 unique log_id's (one id per line);
    > all.txt has 5000 entries (each line has six entries seperated by a
    > tab and the first entry on each line is the login ID and then full
    > name, country etc).
    >
    > Now I want to match both files and get the output with all 100 full
    > entries and ignore the rest.
    >
    > Here is the code I am working on.. for some reason, I see more 160
    > entries instead of the exact 100 entries.


    What does "I see more 160 entries ..." mean? Do you mean you see more
    than 160 lines output to required.txt when you only expected 100? What
    constitutes the excess lines? Are there duplicates in required.txt? Are
    there lines in required.txt that do not have corresponding entries in
    a.txt?
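
These questions can be answered mechanically. The sketch below is one possible way to do it and is not part of the original post: `summarize_output` and the sample records are hypothetical. It counts total lines, empty lines, repeated IDs, and IDs that are absent from the wanted list. One thing worth knowing before counting: the posted loop prints "$_\n" for lines that were never chomped, so each record written to required.txt is followed by an empty line, and a raw line count will be roughly twice the record count.

```perl
use strict;
use warnings;

# Summarize what ended up in an output file: total lines, empty lines,
# repeated IDs, and IDs that were never in the wanted list.
# Takes a reference to the output lines and a reference to a hash of wanted IDs.
sub summarize_output {
    my ($lines, $wanted) = @_;
    my %report = (total => 0, blank => 0, duplicate => 0, unexpected => 0);
    my %seen;
    for my $line (@$lines) {
        $report{total}++;
        (my $text = $line) =~ s/\r?\n\z//;    # drop the line ending only
        if ($text eq '') {
            $report{blank}++;
            next;
        }
        my ($id) = split /\t/, $text, 2;      # whole first tab-separated field
        $report{duplicate}++  if $seen{$id}++;
        $report{unexpected}++ if !exists $wanted->{$id};
    }
    return \%report;
}

# Hypothetical sample: two wanted IDs, output written with an extra "\n".
my %wanted = map { ($_ => 1) } qw(alice bob);
my @output = ("alice\tAlice A\tUS\n", "\n", "bob\tBob B\tUK\n", "\n");

my $report = summarize_output(\@output, \%wanted);
printf "%-10s %d\n", $_, $report->{$_} for qw(total blank duplicate unexpected);
```

With real data, the lines of required.txt would be read into the array and the IDs of a.txt into the hash. A non-zero "blank" count points at the extra newline, "duplicate" at repeated IDs in all.txt, and "unexpected" at a matching rule that is too loose.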

    >
    > ++++++++++++++++++++
    > my %myconfig = (
    > input1 => 'a.txt',
    > input2 => 'all.txt',
    > matching => 'required.txt',
    > non_matching => 'ignore.txt',
    > );
    >
    > my %fields2;
    > {
    > open my $input, '<', $myconfig{input1} or die "Cannot open
    > '$myconfig{input1}': $!";
    > while ( <$input> )
    > {
    > if ( /^(\w+)/ )
    > {
    > $fields2{ $1 } = 1;
    > }
    > }
    > close $input or die "Cannot close '$myconfig{input1}': $!";
    > }
    > open my $input, '<', $myconfig{input2} or die "Cannot open
    > '$myconfig{input2}': $!";
    > open my $matching, '>', $myconfig{matching} or die "Cannot open
    > '$myconfig{matching}': $!";
    > open my $non_matching, '>', $myconfig{non_matching} or die "Cannot
    > open '$myconfig{non_matching}': $!";
    >
    > while ( <$input> )
    > {
    > if ( /^(\w+)/ )
    > {
    > if ( exists $fields2{ $1 } )
    > {
    > print $matching "$_\n";
    > }
    > else
    > {
    > print $non_matching "$_\n";
    > }
    > }
    > }
    >
    > ++++++++++++++++++++++++++++++++++++
    >
    > What I am doing wrong here? Or is there any alternative way of doing
    > it?


    There doesn't appear to be anything wrong with your code (nothing
    obvious anyway). While there are certainly alternate ways of doing
    this, you seem to have stumbled upon a good solution that uses a hash.
    Without seeing your exact input and output data, it is difficult to do
    any further analysis of your problem.

    If you can answer the questions above, it might help. If you can
    isolate the problem to a few anomalous test cases, you can post those.
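
For comparison, here is a minimal sketch of the same hash approach with two changes. It is an illustration under assumptions, not the poster's code: the sub names and the sample records are hypothetical, and it assumes that a login ID is the entire first tab-separated field. First, it looks up the whole first field instead of the capture from /^(\w+)/, which stops at the first non-word character, so an ID such as "jsmith.old" would otherwise be looked up as "jsmith". Second, it strips the line ending on input and adds exactly one newline on output.

```perl
use strict;
use warnings;

# Read the wanted IDs (one per line) from a filehandle into a hash.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip an LF or CRLF line ending
        $line =~ s/^\s+|\s+$//g;       # tolerate stray surrounding whitespace
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Split records into matching and non-matching by their whole first field.
sub partition_records {
    my ($fh, $ids) = @_;
    my (@matching, @non_matching);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;
        my ($id) = split /\t/, $line, 2;
        if (exists $ids->{$id}) { push @matching, $line }
        else                    { push @non_matching, $line }
    }
    return (\@matching, \@non_matching);
}

# Hypothetical sample data standing in for a.txt and all.txt.
my $ids_txt = "jsmith\nmlee\n";
my $all_txt = "jsmith\tJohn Smith\tUS\n"
            . "jsmith.old\tJohn Smith\tUS\n"
            . "mlee\tMei Lee\tSG\n"
            . "kdoe\tKim Doe\tCA\n";

open my $ids_fh, '<', \$ids_txt or die "Cannot open in-memory ID list: $!";
open my $all_fh, '<', \$all_txt or die "Cannot open in-memory records: $!";

my $ids = read_ids($ids_fh);
my ($matching, $non_matching) = partition_records($all_fh, $ids);

print "$_\n" for @$matching;   # lines were stripped, so add exactly one newline
```

In the sample, "jsmith.old" lands in the non-matching list, whereas a /^(\w+)/ lookup would have treated it as "jsmith". Whether the real data contains such IDs is something only the input files can show.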

    --
    Jim Gibson
    Jim Gibson, Oct 24, 2008
    #2

  3. Jürgen Exner

    <> wrote:
    >Now I want to match both files and get the output with all 100 full
    >entries and ignore the rest.
    >
    >Here is the code I am working on.. for some reason, I see more 160
    >entries instead of the exact 100 entries.

    [...]
    >What I am doing wrong here? Or is there any alternative way of doing
    >it?


    Your code logic looks all right to me, and I can't spot any glaring
    issues with it.
    Did you consider that some IDs might appear more than once in the
    second file? Duplicates there would explain the mismatch.
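
A quick way to test the duplicate theory is to count how often each ID occurs in the second file. The following is a sketch only: `duplicate_ids` and the sample records are hypothetical, and the ID is assumed to be the whole first tab-separated field.

```perl
use strict;
use warnings;

# Count how often each first tab-separated field occurs and return
# only the IDs seen more than once, as a hash reference of ID => count.
sub duplicate_ids {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip an LF or CRLF line ending
        next unless length $line;
        my ($id) = split /\t/, $line, 2;
        $count{$id}++;
    }
    my %dups;
    for my $id (keys %count) {
        $dups{$id} = $count{$id} if $count{$id} > 1;
    }
    return \%dups;
}

# Hypothetical sample standing in for all.txt.
my $all_txt = "jsmith\tJohn Smith\tUS\n"
            . "mlee\tMei Lee\tSG\n"
            . "jsmith\tJ. Smith\tUK\n";

open my $fh, '<', \$all_txt or die "Cannot open in-memory sample: $!";
my $dups = duplicate_ids($fh);
print "$_ appears $dups->{$_} times\n" for sort keys %$dups;
```

If the real all.txt yields an empty result, duplicates can be ruled out and the cause lies elsewhere.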

    jue
    Jürgen Exner, Oct 24, 2008
    #3
  4. On 2008-10-24, Jim Gibson <> wrote:
    > In article
    ><>,
    ><> wrote:
    >

    *SKIP*
    >> while ( <$input> )
    >> {
    >> if ( /^(\w+)/ )
    >> {
    >> if ( exists $fields2{ $1 } )
    >> {
    >> print $matching "$_\n";
    >> }
    >> else
    >> {
    >> print $non_matching "$_\n";
    >> }
    >> }
    >> }
    >>
    >> ++++++++++++++++++++++++++++++++++++
    >>
    >> What I am doing wrong here? Or is there any alternative way of doing
    >> it?

    >
    > There doesn't appear to be anything wrong with your code (nothing
    > obvious anyway). While there are certainly alternate ways of doing


    Looking at that --

    perl -wle '
    q|x| =~ m/(x)/; print $1;
    q|y| =~ m/(x)/; print $1;'
    x
    x

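    After a failed match, $1 still holds the capture from the last
    successful one. The posted code only reads $1 after a successful
    match, so I suppose the OP isn't showing his actual code.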
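
The one-liner above can be expanded into a small script that contrasts an unchecked read of $1 with a checked one. This is a sketch for illustration: `leading_word` and the sample records are hypothetical. In the code as posted, $1 is read only inside `if ( /^(\w+)/ )`, so the stale capture would matter only if the real script uses $1 outside such a check.

```perl
use strict;
use warnings;

# Return the leading word of a line, or undef when there is none.
# The match is evaluated in list context, so a failed match yields
# an empty list instead of a stale $1.
sub leading_word {
    my ($line) = @_;
    my ($word) = $line =~ /^(\w+)/;
    return $word;
}

my @lines = ("jsmith\tJohn Smith", "\tmissing id");   # hypothetical records

# Unchecked: both matches run in the same block, as in the one-liner above.
$lines[0] =~ /^(\w+)/;
my $first  = $1;
$lines[1] =~ /^(\w+)/;      # fails: the line starts with a tab
my $second = $1;            # still holds the capture from the first match

print "unchecked: $first, $second\n";

# Checked: a failed match gives undef, which the caller can test for.
for my $line (@lines) {
    my $word = leading_word($line);
    print defined $word ? "checked: $word\n" : "checked: no ID on this line\n";
}
```

Capturing in list context, or testing the match result before touching $1, avoids counting a line under the previous line's ID.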

    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Eric Pozharski, Oct 24, 2008
    #4