Comparison of two files..

clearguy02 · Oct 24, 2008

Hi folks,

I have two files:
a.txt has 100 unique log_id's (one id per line);
all.txt has 5000 entries (each line has six fields separated by
tabs; the first field on each line is the login ID, followed by the
full name, country, etc.).

Now I want to match both files and get the output with all 100 full
entries and ignore the rest.

Here is the code I am working on.. for some reason, I see more 160
entries instead of the exact 100 entries.

++++++++++++++++++++
my %myconfig = (
    input1       => 'a.txt',
    input2       => 'all.txt',
    matching     => 'required.txt',
    non_matching => 'ignore.txt',
);

my %fields2;
{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";
    while ( <$input> )
    {
        if ( /^(\w+)/ )
        {
            $fields2{ $1 } = 1;
        }
    }
    close $input or die "Cannot close '$myconfig{input1}': $!";
}
open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";
open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";
open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

while ( <$input> )
{
    if ( /^(\w+)/ )
    {
        if ( exists $fields2{ $1 } )
        {
            print $matching "$_\n";
        }
        else
        {
            print $non_matching "$_\n";
        }
    }
}

++++++++++++++++++++++++++++++++++++

What am I doing wrong here? Or is there an alternative way of doing
it?

Thanks,
J

Jim Gibson · Oct 24, 2008

Here is the code I am working on.. for some reason, I see more 160
entries instead of the exact 100 entries.

What does "I see more 160 entries ..." mean? Do you mean you see more
than 160 lines output to required.txt when you only expected 100? What
constitutes the excess lines? Are there duplicates in required.txt? Are
there lines in required.txt that do not have corresponding entries in
a.txt?

What am I doing wrong here? Or is there an alternative way of doing
it?

There doesn't appear to be anything wrong with your code (nothing
obvious anyway). While there are certainly alternate ways of doing
this, you seem to have stumbled upon a good solution that uses a hash.
Without seeing your exact input and output data, it is difficult to do
any further analysis of your problem.

If you can answer the questions above, it might help. If you can
isolate the problem to a few anomalous test cases, you can post those.
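
The questions above can also be answered mechanically with a small audit pass over required.txt. What follows is only a sketch: the name audit_output and its four category names are made up for illustration, and it assumes the login ID is the leading \w+ token of each line, as in the posted code. It counts empty lines separately because the posted loop prints "$_\n" while $_ still carries its own newline, so every record written would be followed by an empty line.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the lines of an output file such as
# required.txt against the set of wanted IDs read from a.txt.
# $wanted is a hash ref keyed by login ID; $fh is an open read handle.
sub audit_output {
    my ($wanted, $fh) = @_;
    my %count = (no_id => 0, unexpected => 0, duplicate => 0, ok => 0);
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        # Capture in list context: a failed match yields undef here
        # instead of leaving a value from an earlier match behind.
        my ($id) = $line =~ /^(\w+)/;
        if (!defined $id) {
            $count{no_id}++;          # empty line or no leading ID
        }
        elsif (!exists $wanted->{$id}) {
            $count{unexpected}++;     # ID that is not in the wanted set
        }
        elsif ($seen{$id}++) {
            $count{duplicate}++;      # wanted ID seen before
        }
        else {
            $count{ok}++;             # first occurrence of a wanted ID
        }
    }
    return \%count;
}
```

With the posted script, a call would look something like audit_output(\%fields2, $fh), where $fh is a read handle opened on required.txt. If the four counts add up to the line count you are seeing, the largest category probably points at the cause.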

Jürgen Exner · Oct 24, 2008

Now I want to match both files and get the output with all 100 full
entries and ignore the rest.

Here is the code I am working on.. for some reason, I see more 160
entries instead of the exact 100 entries. [...]
What am I doing wrong here? Or is there an alternative way of doing
it?

Your code logic looks all right to me and I can't spot any glaring
issues with it.
Did you consider that some IDs might appear more than once in the
second file? Duplicates there would explain the mismatch.
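
A quick way to check for that is to count how often each ID starts a line in all.txt. This is a rough sketch, not tested against the real data: the name duplicate_ids is invented here, and it assumes the login ID is the first tab-separated field of each line.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref of login ID => occurrence count
# for every ID that starts more than one line of the input handle.
sub duplicate_ids {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        # Assumption: the login ID is the first tab-separated field.
        my ($id) = split /\t/, $line, 2;
        next unless defined $id && $id =~ /\S/;   # skip empty lines
        $count{$id}++;
    }
    my %dups;
    for my $id (keys %count) {
        $dups{$id} = $count{$id} if $count{$id} > 1;
    }
    return \%dups;
}
```

Open all.txt for reading, pass the handle in, and print the keys of the returned hash. An empty result would rule duplicates out as the explanation.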

jue

Eric Pozharski · Oct 24, 2008

*SKIP*

There doesn't appear to be anything wrong with your code (nothing
obvious anyway). While there are certainly alternate ways of doing

Looking at that --

perl -wle '
q|x| =~ m/(x)/; print $1;
q|y| =~ m/(x)/; print $1;'
x
x

I suppose the OP isn't showing his actual code.
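
The one-liner shows $1 surviving a failed match. The posted loop is safe because it tests the match before using $1, but a variant without that test could reuse an ID from a previous line. One common way to sidestep the issue is to capture in list context; a minimal sketch, with leading_id and ids_from_lines as made-up names:

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading \w+ token of a line,
# or undef when the line does not start with one.
sub leading_id {
    my ($line) = @_;
    # A match in list context returns its captures, or an empty list on
    # failure, so $id is never filled from an earlier successful match.
    my ($id) = $line =~ /^(\w+)/;
    return $id;
}

# Collect the IDs of all lines that actually start with one.
sub ids_from_lines {
    my @ids;
    for my $line (@_) {
        my $id = leading_id($line);
        push @ids, $id if defined $id;
    }
    return @ids;
}
```

A line that starts with a tab or is empty yields undef from leading_id and is skipped, rather than inheriting the ID of the line before it.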

*CUT*
