data file

friend.05 · Oct 9, 2008

I have a large file in the following format:

ID | Time | IP | Code

I want only the data lines which have a unique IP+Code.

If an IP+Code is repeated then I don't want the line.

Ben Morrow · Oct 9, 2008

friend.05 said:
I have a large file in the following format:

ID | Time | IP | Code

I want only the data lines which have a unique IP+Code.

If an IP+Code is repeated then I don't want the line.

perldoc -q unique

Ben

friend.05 · Oct 10, 2008

perldoc -q unique

Ben

Below is code which I have written to extract unique IP+Code from
large file. (File format is ID | Time | IP | code).

I am not sure which will be best way to do this.

#!/usr/local/bin/perl

print "Welcome\n";

$pri_file = "out_pri.txt";

$cnt = 0;
$flag = 0;

open(INFO_PRI,$pri_file)or die $!;
open(INFO,$pri_file)or die $!;

@pri_lines_ = <INFO>;

while($pri_line = <INFO_PRI>)
{
@primary = split('\|',$pri_line);
$pri_cli_ip = $primary[4];
$pri_id = $primary[7];
print "$pri_id\n";

foreach $p_line (@pri_lines_)
{
@pri = split('\|',$p_line);
$cli_ip = $pri[4];
$id = $pri[7];

if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))
{
$cnt++;
if($cnt == 2){
$cnt = 0;
$flag = 1;
last;
}
}
}
if($flag == 0){
open(FILE,'>>pri_unique.txt');
print FILE "$pri_line\n";
close(FILE);
}else{
$flag = 0;
}
}

close(INFO_PRI);
close(INFO);

Jürgen Exner · Oct 10, 2008

friend.05 said:
Below is code which I have written to extract unique IP+Code from
large file. (File format is ID | Time | IP | code).

I am not sure which will be best way to do this.

#!/usr/local/bin/perl
$pri_file = "out_pri.txt";

$cnt = 0;
$flag = 0;

open(INFO_PRI,$pri_file)or die $!;
open(INFO,$pri_file)or die $!;

@pri_lines_ = <INFO>;

while($pri_line = <INFO_PRI>)

[rest of code snipped]

Many things I don't understand in this code, among them why you are
using 2 file handles to the same file, why you are slurping in the whole
file on one file handle and then process the file line by line on the
other file handle, why you have a nested loop, etc, etc.

Your requirements seem to be straightforward and easy to translate into
a simple algorithm (warning, sketch only, not tested):

my %idtable;
open (my $F, '<', $myfile) or die "Cannot read $myfile because $!\n";
while (<$F>) { #loop through file and gather all IP | Code combinations
    my (undef, undef, $ip, $code) = split /\|/;
    $idtable{"$ip|$code"}++; #record this ip-code combination
}
seek $F, 0, 0; #reset file to start
while (<$F>) { #loop through file again and ....
    my (undef, undef, $ip, $code) = split /\|/;
    print if $idtable{"$ip|$code"} == 1;
    #... print that line if the ip-code combination
    #exists exactly once in the file
}
close $F;

jue

Tad J McClellan · Oct 10, 2008

$flag = 0;

You should choose meaningful variable names.

Ben Morrow · Oct 10, 2008

[don't quote .signatures]

friend.05 said:
Below is code which I have written to extract unique IP+Code from
large file. (File format is ID | Time | IP | code).

I am not sure which will be best way to do this.

#!/usr/local/bin/perl

Where is

use warnings;
use strict;

? You have already been told to include this.

print "Welcome\n";

$pri_file = "out_pri.txt";

$cnt = 0;
$flag = 0;

open(INFO_PRI,$pri_file)or die $!;
open(INFO,$pri_file)or die $!;

You have already been told to use lexical filehandles and 3-arg open.
You should make the error message actually useful:

open (my $INFO_PRI, "<", $pri_file)
or die "can't open '$pri_file': $!";

Why are you opening the same file twice? Just iterate over @pri_lines_
instead.

@pri_lines_ = <INFO>;

Why on earth are you using a variable name ending in _?

while($pri_line = <INFO_PRI>)
{
@primary = split('\|',$pri_line);
$pri_cli_ip = $primary[4];
$pri_id = $primary[7];
print "$pri_id\n";

foreach $p_line (@pri_lines_)
{
@pri = split('\|',$p_line);

You keep doing the same split over and over. Split the line first, and
keep the results in a datastructure till you need them.

$cli_ip = $pri[4];
$id = $pri[7];

if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))

Did you read perldoc -q unique? It says to use a hash for finding
uniqueness.

{
$cnt++;

You are not resetting $cnt between iterations of the outer loop, so
every other line will be considered duplicate.

if($cnt == 2){
$cnt = 0;
$flag = 1;
last;

If you give the outer loop a label, you can use next LABEL and avoid
$flag.

}
}
}
if($flag == 0){
open(FILE,'>>pri_unique.txt');
print FILE "$pri_line\n";
close(FILE);

Why do you keep opening and closing this file?
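
A minimal sketch of the suggestions above (split each line once, label the outer loop so that `next LINE` replaces $flag, write the output in one place), offered as an illustration rather than as Ben's own code. The sub name lines_without_twin, the sample data, and the field positions 2 and 3 (taken from the stated ID | Time | IP | Code layout, not from the indices in the posted script) are assumptions. The nested loop is kept only to show the label; a hash remains the better tool for a large file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose IP+Code pair occurs on no other line.
sub lines_without_twin {
    my @lines = @_;

    # Split every line exactly once and keep the fields for later.
    my @rows = map { [ split /\s*\|\s*/, $_ ] } @lines;

    my @kept;
    LINE: for my $i ( 0 .. $#rows ) {
        for my $j ( 0 .. $#rows ) {
            next if $i == $j;    # do not compare a line with itself
            # String comparison (eq), since an IP is not a plain number.
            next LINE
                if $rows[$i][2] eq $rows[$j][2]
                && $rows[$i][3] eq $rows[$j][3];
        }
        push @kept, $lines[$i];
    }
    return @kept;
}

my @sample = ( '1|09:00|10.0.0.1|200', '2|09:01|10.0.0.2|404', '3|09:02|10.0.0.1|200' );

# Output happens once, outside the loops (STDOUT stands in for the output file).
print "$_\n" for lines_without_twin(@sample);
```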

Ben

J. Gleixner · Oct 10, 2008

Below is code which I have written to extract unique IP+Code from
large file. (File format is ID | Time | IP | code).

I am not sure which will be best way to do this.

Well, it's not the way you posted.

Did you actually read the perldoc Ben mentioned above? You don't use a
hash at all, so I'm guessing not.

#!/usr/local/bin/perl

use strict;
use warnings;

my $pri_file = 'out_pri.txt';

open( my $INFO, '<', $pri_file ) or die "Can't open $pri_file: $!";
open( my $OUT, '>', 'unique.out' ) or die "Can't open unique.out: $!";

my %info;
while ( my $line = <$INFO> )
{
    chomp( $line );
    # split the data.. you can split directly into the variables..
    # my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    # print $line to $OUT if the hash key of $cli_ip and $id doesn't
    # already exist.
}

Jürgen Exner · Oct 10, 2008

J. Gleixner said:
Below is code which I have written to extract unique IP+Code from
large file. (File format is ID | Time | IP | code).

I am not sure which will be best way to do this.

Well, it's not the way you posted.

Did you actually read the perldoc Ben mentioned above? You don't use a
hash at all, so I'm guessing not.

ACK!

while ( my $line = <$INFO> )
{
chomp( $line );
# split the data.. you can split directly into the variables..
# my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
# print $line to $OUT if the hash key of $cli_ip and $id doesn't already exist.

That will print each IP+code exactly once. I think (but I may be
mistaken, the OP isn't clear on that) he wants only those lines that
_are_ unique wrt. the IP+code, i.e. where there is no second line with
the same IP+code.

jue

friend.05 · Oct 10, 2008

Thanks to all for help. That was helpful.

But.

I created the hash of (IP+Code) combinations.

But how do I check whether each combination occurs exactly once in
the file?

Jürgen Exner · Oct 10, 2008

friend.05 said:
I created the hash of (IP+Code) combinations.

But how do I check whether each combination occurs exactly once in
the file?

You could count the number of occurrences and then compare the count
against 1?

$IDTable{"$IP+$Code"}++;
[......]

if ($IDTable{"$IP+$Code"} == 1) {
print "Look ma, $IP+$Code occurs exactly once in the file\n";
}

J. Gleixner · Oct 10, 2008

Jürgen Exner said:
That will print each IP+code exactly once. I think (but I may be
mistaken, the OP isn't clear on that) he wants only those lines that
_are_ unique wrt. the IP+code, i.e. where there is no second line with
the same IP+code.

You're right, I mis-understood.

A fairly easy to follow solution would be to keep track of the data,
using two hashes.

my (%times, %line );

while(...)
{
# chomp,split,...
# times is the number of times the $cli_ip and $id were found
$times{ $cli_ip . $id }++;
# could 'next' if it is > 1
# and store the line itself, for the $cli_ip and $id
$line{ $cli_ip . $id } = $line;
}

Then, after the while, for each of the keys in %times, print the
value from %line where the value of $times{ $key } is 1, to the output file.

That should be enough to get the OP in the right direction, without
writing the whole darn thing for them.
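
For readers who want to see the outline above filled in, here is one possible sketch, not the poster's own code. The sub name unique_by_two_hashes, the @order array (added only to keep input order), the whitespace-tolerant split, the '|' separator in the key, and the sample lines are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: takes "ID | Time | IP | Code" lines and returns those
# whose IP+Code pair occurs exactly once, in order of first appearance.
sub unique_by_two_hashes {
    my @input = @_;
    my ( %times, %line, @order );

    for my $raw (@input) {
        chomp( my $copy = $raw );
        my ( $cli_ip, $id ) = ( split /\s*\|\s*/, $copy )[ 2, 3 ];
        my $key = "$cli_ip|$id";                    # separator keeps distinct pairs apart
        push @order, $key unless $times{$key}++;    # remember the first sighting only
        $line{$key} = $copy;                        # overwritten on repeats, which is harmless
    }

    # Keep only the keys that were counted exactly once.
    return map { $line{$_} } grep { $times{$_} == 1 } @order;
}

my @sample = (
    "1 | 09:00 | 10.0.0.1 | 200\n",
    "2 | 09:01 | 10.0.0.2 | 404\n",
    "3 | 09:02 | 10.0.0.1 | 200\n",
);
print "$_\n" for unique_by_two_hashes(@sample);
```

With a real file, the sample array would be replaced by reading from a filehandle, and the print would go to the output handle.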

friend.05 · Oct 10, 2008


Hi jue,

IF I use

$idtable{"$ip|$code"}++; #record this ip-code combination

will this not replace the previous value if the same key (ip-code) comes
again?

Jürgen Exner · Oct 10, 2008

What is this "Hide quoted text - Show quoted text" nonsense?

IF I use

$idtable{"$ip|$code"}++; #record this ip-code combination

will this not replace the previous value if the same key (ip-code) comes
again?

Of course it does, that is the whole purpose. Or how do you suggest to
count the number of occurrences if not by replacing the previous number
with the new number?
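
A tiny illustration of that point, using made-up keys: each ++ replaces the stored count with the count plus one, so the value ends up as the number of times the key was seen.

```perl
use strict;
use warnings;

my %idtable;
my @keys = ( '10.0.0.1|200', '10.0.0.2|404', '10.0.0.1|200' );    # sample data only

# Each ++ reads the stored count (undef counts as 0), adds one, and stores it back.
$idtable{$_}++ for @keys;

print "$_ seen $idtable{$_} time(s)\n" for sort keys %idtable;
```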

jue

friend.05 · Oct 10, 2008

What is this "Hide quoted text - Show quoted text" nonsense?

Of course it does, that is the whole purpose. Or how do you suggest to
count the number of occurrences if not by replacing the previous number
with the new number?

jue

Got it thanks.

Sorry abt hide quoted text. I also don't knw wht is tht by mistake I
must click it while replying

sln · Oct 10, 2008

friend.05 said:


Hi jue,

IF I use

$idtable{"$ip|$code"}++; #record this ip-code combination

will this not replace the previous value if the same key (ip-code) comes
again?

This may not have been clear....

"$idtable{"$ip|$code"}" in this case is just a variable used as
a counter. It's no different from incrementing any other counter,
like $cnt++.

In that respect, it just uses the IP and Code, concatenated into one
string, as the key into a hash. The key itself carries the
encoded data.

In my opinion, this is not the way to go. If there are only a few IPs
and many, many Codes, this could create an inordinately large hash,
resulting in long lookup times.

You could double your money by getting the unique IPs as well, and cut the
CPU overhead, if you do it this way:

$idtable{$ip}->{$code}++

There is a tradeoff. Don't know really. It depends on whether the number of unique
Codes is expected to outnumber the number of IPs ... or something like that.
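
To show what the two-level form builds, here is a small sketch. It makes no claim about speed either way; the sub name count_nested and the sample pairs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts (IP, Code) pairs in a two-level hash.
sub count_nested {
    my %idtable;
    for my $pair (@_) {
        my ( $ip, $code ) = @$pair;
        $idtable{$ip}{$code}++;    # the inner hash is autovivified on first use
    }
    return \%idtable;
}

my $t = count_nested( [ '10.0.0.1', 200 ], [ '10.0.0.1', 404 ], [ '10.0.0.1', 200 ], [ '10.0.0.2', 200 ] );

# The top-level keys are the distinct IPs, with no extra work.
print join( ', ', sort keys %$t ), "\n";

# Pairs that occur exactly once.
for my $ip ( sort keys %$t ) {
    for my $code ( sort keys %{ $t->{$ip} } ) {
        print "$ip|$code\n" if $t->{$ip}{$code} == 1;
    }
}
```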

sln

sln · Oct 10, 2008

Doesn't this overwrite what's already there? Not sure.
$line{ $cli_ip . $id } = $line;

sln

xhoster · Oct 10, 2008

Doesn't this overwrite what's already there? Not sure.
$line{ $cli_ip . $id } = $line;

Yes, of course. But since those lines won't get printed anyway (because
count > 1) then it doesn't matter if they get overwritten.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

xhoster · Oct 10, 2008

J. Gleixner said:
You're right, I mis-understood.

A fairly easy to follow solution would be to keep track of the data,
using two hashes.

my (%times, %line );

while(...)
{
# chomp,split,...
# times is the number of times the $cli_ip and $id were found
$times{ $cli_ip . $id }++;
# could 'next' if it is > 1
# and store the line itself, for the $cli_ip and $id
$line{ $cli_ip . $id } = $line;
}

I might go with just a single hash, using undef as a special value to
indicate we already have seen more than one.

my %line;

while(...)
{
# chomp,split,...
if (exists $line{ $cli_ip . $id }) {
$line{ $cli_ip . $id } = undef; #skunked
} else {
$line{ $cli_ip . $id } = $line;
};
}

Then, after the while, for each of the keys in %times, print the
value from %line where the value of $times{ $key } is 1, to the output
file.

Under my method, print the things from %line where the value is defined.
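
A runnable sketch of the single-hash idea, under a few assumptions: the sub name unique_by_one_hash, the @order array (added only so the output follows input order), the '|' separator in the key, and the sample lines are not from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash, where undef marks a key seen more than once.
sub unique_by_one_hash {
    my ( %line, @order );
    for my $raw (@_) {
        chomp( my $line = $raw );
        my ( $cli_ip, $id ) = ( split /\s*\|\s*/, $line )[ 2, 3 ];
        my $key = "$cli_ip|$id";
        if ( exists $line{$key} ) {
            $line{$key} = undef;    # seen again: no longer a candidate
        }
        else {
            $line{$key} = $line;
            push @order, $key;      # keep input order for the output
        }
    }
    # Only the still-defined values belong to keys seen exactly once.
    return grep { defined $_ } map { $line{$_} } @order;
}

my @sample = (
    "1 | 09:00 | 10.0.0.1 | 200\n",
    "2 | 09:01 | 10.0.0.2 | 404\n",
    "3 | 09:02 | 10.0.0.1 | 200\n",
);
print "$_\n" for unique_by_one_hash(@sample);
```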

Xho


Tad J McClellan · Oct 10, 2008

Sorry abt hide quoted text. I also don't knw wht is tht by mistake I

^^^ ^^^ ^^^ ^^^
^^^ ^^^ ^^^ ^^^

Vanna, I would like to buy a vowel!

Jürgen Exner · Oct 10, 2008

J. Gleixner said:
$times{ $cli_ip . $id }++;

Careful! This may give wrong results in odd circumstances.
Example:
$cli_ip='foobar', $id='buz';
and
$cli_ip='foo', $id='barbuz';

Better to use the same separator as in the original data set, regardless
of whether such a scenario can actually happen with the OP's data set:

$times{ $cli_ip . '|' . $id }++;
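
The collision is easy to reproduce with the two example pairs above; the hash names %plain and %separated are only for this demonstration.

```perl
use strict;
use warnings;

my @pairs = ( [ 'foobar', 'buz' ], [ 'foo', 'barbuz' ] );

my ( %plain, %separated );
for my $pair (@pairs) {
    my ( $cli_ip, $id ) = @$pair;
    $plain{ $cli_ip . $id }++;              # both pairs become "foobarbuz"
    $separated{ $cli_ip . '|' . $id }++;    # "foobar|buz" and "foo|barbuz"
}

print scalar( keys %plain ),     "\n";    # 1 key: the pairs collided
print scalar( keys %separated ), "\n";    # 2 keys: the pairs stay distinct
```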

jue
