data file

Discussion in 'Perl Misc' started by friend.05@gmail.com, Oct 9, 2008.

  1. Guest

    I have a large file in the following format:

    ID | Time | IP | Code


    I want only the data lines which have a unique IP+Code.

    If an IP+Code is repeated then I don't want that line.
     
    , Oct 9, 2008
    #1

  2. Ben Morrow Guest

    Quoth "" <>:
    > I have a large file in following format:
    >
    > ID | Time | IP | Code
    >
    >
    > I want only data lines which has unique IP+Code.
    >
    > If IP+Code is repeated then I don't want line.


    perldoc -q unique

    Ben

    --
    Musica Dei donum optimi, trahit homines, trahit deos. |
    Musica truces mollit animos, tristesque mentes erigit. |
    Musica vel ipsas arbores et horridas movet feras. |
     
    Ben Morrow, Oct 9, 2008
    #2

  3. Guest

    On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    > Quoth "" <>:
    >
    > > I have a large file in following format:

    >
    > > ID | Time | IP | Code

    >
    > > I want only data lines which has unique IP+Code.

    >
    > > If IP+Code is repeated then I don't want line.

    >
    > perldoc -q unique
    >
    > Ben
    >
    > --
    > Musica Dei donum optimi, trahit homines, trahit deos.    |
    > Musica truces mollit animos, tristesque mentes erigit.   |  
    > Musica vel ipsas arbores et horridas movet feras.        |



    Below is the code I have written to extract the unique IP+Code lines
    from the large file. (File format is ID | Time | IP | Code.)

    I am not sure whether this is the best way to do it.

    #!/usr/local/bin/perl

    print "Welcome\n";

    $pri_file = "out_pri.txt";

    $cnt = 0;
    $flag = 0;

    open(INFO_PRI,$pri_file)or die $!;
    open(INFO,$pri_file)or die $!;

    @pri_lines_ = <INFO>;

    while($pri_line = <INFO_PRI>)
    {
        @primary = split('\|',$pri_line);
        $pri_cli_ip = $primary[4];
        $pri_id = $primary[7];
        print "$pri_id\n";


        foreach $p_line (@pri_lines_)
        {
            @pri = split('\|',$p_line);
            $cli_ip = $pri[4];
            $id = $pri[7];

            if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))
            {
                $cnt++;
                if($cnt == 2){
                    $cnt = 0;
                    $flag = 1;
                    last;
                }
            }
        }
        if($flag == 0){
            open(FILE,'>>pri_unique.txt');
            print FILE "$pri_line\n";
            close(FILE);
        }else{
            $flag = 0;
        }
    }

    close(INFO_PRI);
    close(INFO);
     
    , Oct 10, 2008
    #3
  4. "" <> wrote:
    >On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >> Quoth "" <>:
    >>
    >> > I have a large file in following format:

    >>
    >> > ID | Time | IP | Code

    >>
    >> > I want only data lines which has unique IP+Code.

    >>
    >> > If IP+Code is repeated then I don't want line.

    >
    >Below is code which I have written to extract unique IP+Code from
    >large file. (File format is ID | Time | IP | code).
    >
    >I am not sure which will be best way to do this.
    >
    >#!/usr/local/bin/perl
    >$pri_file = "out_pri.txt";
    >
    >$cnt = 0;
    >$flag = 0;
    >
    >open(INFO_PRI,$pri_file)or die $!;
    >open(INFO,$pri_file)or die $!;
    >
    >@pri_lines_ = <INFO>;
    >
    >while($pri_line = <INFO_PRI>)

    [rest of code snipped]

    There are many things I don't understand in this code, among them why
    you are using 2 file handles to the same file, why you are slurping in
    the whole file on one file handle and then processing the file line by
    line on the other file handle, why you have a nested loop, etc, etc.

    Your requirements seem to be straightforward and easy to translate into
    a simple algorithm (warning, sketch only, not tested):

    my %idtable;
    open (my $F, '<', $myfile) or die "Cannot read $myfile because $!\n";
    while (<$F>) { # loop through file and gather all IP | Code combinations
        (undef, undef, $ip, $code) = split '\|';
        $idtable{"$ip|$code"}++; # record this ip-code combination
    }
    seek $F, 0, 0; # reset file to start
    while (<$F>) { # loop through file again and ....
        (undef, undef, $ip, $code) = split '\|';
        print if $idtable{"$ip|$code"} == 1;
            #... print that line if the ip-code combination
            # exists exactly once in the file
    }
    close $F;

    jue
     
    Jürgen Exner, Oct 10, 2008
    #4
  5. <> wrote:

    > $flag = 0;



    You should choose meaningful variable names.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Oct 10, 2008
    #5
  6. Ben Morrow Guest

    [don't quote .signatures]

    Quoth "" <>:
    > On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    > > Quoth "" <>:
    > >
    > > > I have a large file in following format:

    > >
    > > > ID | Time | IP | Code

    > >
    > > > I want only data lines which has unique IP+Code.

    > >
    > > > If IP+Code is repeated then I don't want line.

    > >
    > > perldoc -q unique

    >
    > Below is code which I have written to extract unique IP+Code from
    > large file. (File format is ID | Time | IP | code).
    >
    > I am not sure which will be best way to do this.
    >
    > #!/usr/local/bin/perl


    Where is

    use warnings;
    use strict;

    ? You have already been told to include this.

    > print "Welcome\n";
    >
    > $pri_file = "out_pri.txt";
    >
    > $cnt = 0;
    > $flag = 0;
    >
    > open(INFO_PRI,$pri_file)or die $!;
    > open(INFO,$pri_file)or die $!;


    You have already been told to use lexical filehandles and 3-arg open.
    You should make the error message actually useful:

    open (my $INFO_PRI, "<", $pri_file)
    or die "can't open '$pri_file': $!";

    Why are you opening the same file twice? Just iterate over @pri_lines_
    instead.
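
    For instance (rough sketch only, not tested; it reuses the OP's file
    name but drops the trailing underscore from the array name):

    my @pri_lines = do {
        open my $INFO, '<', 'out_pri.txt'
            or die "can't open 'out_pri.txt': $!";
        <$INFO>;
    };

    for my $pri_line (@pri_lines) {
        # outer loop over the slurped lines; no second filehandle needed
    }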

    > @pri_lines_ = <INFO>;


    Why on earth are you using a variable name ending in _?

    > while($pri_line = <INFO_PRI>)
    > {
    > @primary = split('\|',$pri_line);
    > $pri_cli_ip = $primary[4];
    > $pri_id = $primary[7];
    > print "$pri_id\n";
    >
    >
    > foreach $p_line (@pri_lines_)
    > {
    > @pri = split('\|',$p_line);


    You keep doing the same split over and over. Split the line first, and
    keep the results in a data structure till you need them.
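
    One way (sketch only, untested, keeping the OP's @pri_lines_ and field
    positions 4 and 7):

    my @rows = map { [ split /\|/ ] } @pri_lines_;
    # now $rows[$i][4] is the IP and $rows[$i][7] is the code for line $i,
    # so the inner loop never has to split the same line again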

    > $cli_ip = $pri[4];
    > $id = $pri[7];
    >
    > if(($pri_cli_ip == $cli_ip) && ($pri__id == $id))


    Did you read perldoc -q unique? It says to use a hash for finding
    uniqueness.

    > {
    > $cnt++;


    You are not resetting $cnt between iterations of the outer loop, so
    every other line will be considered duplicate.
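
    The minimal fix would be something like:

    while ($pri_line = <INFO_PRI>) {
        $cnt = 0;   # reset the counter for every new outer line
        # ... rest of the loop as before ...
    }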

    > if($cnt == 2){
    > $cnt = 0;
    > $flag = 1;
    > last;


    If you give the outer loop a label, you can use next LABEL and avoid
    $flag.
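
    Roughly (sketch only, not tested; $duplicate_found and $OUT are
    stand-ins for the real check and output handle):

    LINE: while (my $pri_line = <$INFO_PRI>) {
        for my $p_line (@pri_lines) {
            # ... compare IP and code here ...
            next LINE if $duplicate_found;   # skip the outer line entirely
        }
        print $OUT $pri_line;   # only reached if no duplicate was found
    }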

    > }
    > }
    > }
    > if($flag == 0){
    > open(FILE,'>>pri_unique.txt');
    > print FILE "$pri_line\n";
    > close(FILE);


    Why do you keep opening and closing this file?
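
    Open it once before the loop and close it once afterwards; something
    like (sketch, untested; $is_duplicate stands in for whatever check is
    used):

    open my $OUT, '>>', 'pri_unique.txt'
        or die "can't open 'pri_unique.txt': $!";

    while ($pri_line = <INFO_PRI>) {
        # ... uniqueness check as before ...
        print $OUT $pri_line unless $is_duplicate;
    }
    close $OUT;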

    Ben

    --
    Outside of a dog, a book is a man's best friend.
    Inside of a dog, it's too dark to read.
    Groucho Marx
     
    Ben Morrow, Oct 10, 2008
    #6
  7. J. Gleixner Guest

    wrote:
    > On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >> Quoth "" <>:
    >>
    >>> I have a large file in following format:
    >>> ID | Time | IP | Code
    >>> I want only data lines which has unique IP+Code.
    >>> If IP+Code is repeated then I don't want line.

    >> perldoc -q unique
    >>
    >> Ben


    > Below is code which I have written to extract unique IP+Code from
    > large file. (File format is ID | Time | IP | code).
    >
    > I am not sure which will be best way to do this.


    Well, it's not the way you posted.

    Did you actually read the perldoc Ben mentioned above? You don't use a
    hash at all, so I'm guessing not.

    >
    > #!/usr/local/bin/perl

    use strict;
    use warnings;

    my $pri_file = 'out_pri.txt';

    open( my $INFO, '<', $pri_file ) or die "Can't open $pri_file: $!";
    open( my $OUT, '>', 'unique.out' ) or die "Can't open unique.out: $!";

    my %info;
    while ( my $line = <$INFO> )
    {
        chomp( $line );
        # split the data.. you can split directly into the variables..
        # my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
        # print $line to $OUT if the hash key of $cli_ip and $id doesn't
        # already exist.
    }
     
    J. Gleixner, Oct 10, 2008
    #7
  8. "J. Gleixner" <> wrote:
    > wrote:
    >> On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >>> Quoth "" <>:
    >>>
    >>>> I have a large file in following format:
    >>>> ID | Time | IP | Code
    >>>> I want only data lines which has unique IP+Code.
    >>>> If IP+Code is repeated then I don't want line.
    >>> perldoc -q unique
    >>>
    >>> Ben

    >
    >> Below is code which I have written to extract unique IP+Code from
    >> large file. (File format is ID | Time | IP | code).
    >>
    >> I am not sure which will be best way to do this.

    >
    >Well, it's not the way you posted.
    >
    >Did you actually read the perldoc Ben mentioned above? You don't use a
    >hash at all, so I'm guessing not.


    ACK!

    >while ( my $line = <$INFO> )
    >{
    > chomp( $line );
    ># split the data.. you can split directly into the variables..
    ># my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    ># print $line to $OUT if the hash key of $cli_ip and $id doesn't already
    >exist.


    That will print each IP+code exactly once. I think (but I may be
    mistaken, the OP isn't clear on that) he wants only those lines that
    _are_ unique wrt. the IP+code, i.e. where there is no second line with
    the same IP+code.
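
    A made-up example of the difference, with three input lines:

        1 | 10:00 | 1.2.3.4 | A
        2 | 10:01 | 1.2.3.4 | A
        3 | 10:02 | 5.6.7.8 | B

    "Each IP+code printed once" keeps lines 1 and 3; "only lines whose
    IP+code is unique" keeps line 3 only.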

    jue
     
    Jürgen Exner, Oct 10, 2008
    #8
  9. Guest

    On Oct 10, 12:57 pm, Jürgen Exner <> wrote:
    > "J. Gleixner" <> wrote:
    > > wrote:
    > >> On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    > >>> Quoth "" <>:

    >
    > >>>> I have a large file in following format:
    > >>>> ID | Time | IP | Code
    > >>>> I want only data lines which has unique IP+Code.
    > >>>> If IP+Code is repeated then I don't want line.
    > >>> perldoc -q unique

    >
    > >>> Ben

    >
    > >> Below is code which I have written to extract unique IP+Code from
    > >> large file. (File format is ID | Time | IP | code).

    >
    > >> I am not sure which will be best way to do this.

    >
    > >Well, it's not the way you posted.

    >
    > >Did you actually read the perldoc Ben mentioned above?  You don't use a
    > >hash at all, so I'm guessing not.

    >
    > ACK!
    >
    > >while ( my $line = <$INFO> )
    > >{
    > >    chomp( $line );
    > ># split the data.. you can split directly into the variables..
    > ># my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    > ># print $line to $OUT if the hash key of $cli_ip and $id doesn't already
    > >exist.

    >
    > That will print each IP+code exactly once. I think (but I may be
    > mistaken, the OPs isn't clear on that) he wants only those lines, that
    > _are_ unique wrt. the IP+code, i.e. where there is no second line with
    > the same IP+code.
    >
    > jue- Hide quoted text -
    >
    > - Show quoted text -


    Thanks to all for the help. That was helpful.

    But.

    I created the hash of (IP+Code) combinations.

    But how do I check if each combination occurs exactly once in the
    file?
     
    , Oct 10, 2008
    #9
  10. "" <> wrote:
    >I created the hash of (IP+Code) combinations.
    >
    >But how do I check if each combination occurs exactly once in the
    >file?


    You could count the number of occurrences and then compare the count
    against 1?

    $IDTable{"$IP+$Code"}++;
    [......]

    if ($IDTable{"$IP+$Code"} == 1) {
        print "Look ma, $IP+$Code occurs exactly once in the file\n";
    }
     
    Jürgen Exner, Oct 10, 2008
    #10
  11. J. Gleixner Guest

    Jürgen Exner wrote:
    > "J. Gleixner" <> wrote:
    >> wrote:
    >>> On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >>>> Quoth "" <>:
    >>>>
    >>>>> I have a large file in following format:
    >>>>> ID | Time | IP | Code
    >>>>> I want only data lines which has unique IP+Code.
    >>>>> If IP+Code is repeated then I don't want line.
    >>>> perldoc -q unique
    >>>>
    >>>> Ben
    >>> Below is code which I have written to extract unique IP+Code from
    >>> large file. (File format is ID | Time | IP | code).
    >>>
    >>> I am not sure which will be best way to do this.

    >> Well, it's not the way you posted.
    >>
    >> Did you actually read the perldoc Ben mentioned above? You don't use a
    >> hash at all, so I'm guessing not.

    >
    > ACK!
    >
    >> while ( my $line = <$INFO> )
    >> {
    >> chomp( $line );
    >> # split the data.. you can split directly into the variables..
    >> # my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    >> # print $line to $OUT if the hash key of $cli_ip and $id doesn't already
    >> exist.

    >
    > That will print each IP+code exactly once. I think (but I may be
    > mistaken, the OPs isn't clear on that) he wants only those lines, that
    > _are_ unique wrt. the IP+code, i.e. where there is no second line with
    > the same IP+code.


    You're right, I mis-understood.

    A fairly easy to follow solution would be to keep track of the data,
    using two hashes.

    my (%times, %line );

    while(...)
    {
        # chomp,split,...
        # times is the number of times the $cli_ip and $id were found
        $times{ $cli_ip . $id }++;
        # could 'next' if it is > 1
        # and store the line itself, for the $cli_ip and $id
        $line{ $cli_ip . $id } = $line;
    }

    Then, after the while, for each of the keys in %times, print the
    value from %line where the value of $times{ $key } is 1, to the output file.
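
    In code, that last step might look roughly like this (untested; $OUT is
    the output handle opened earlier):

    for my $key ( keys %times )
    {
        print $OUT $line{ $key }, "\n" if $times{ $key } == 1;
    }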

    That should be enough to get the OP in the right direction, without
    writing the whole darn thing for them.
     
    J. Gleixner, Oct 10, 2008
    #11
  12. Guest

    On Oct 10, 12:18 pm, Jürgen Exner <> wrote:
    > "" <> wrote:
    > >On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    > >> Quoth "" <>:

    >
    > >> > I have a large file in following format:

    >
    > >> > ID | Time | IP | Code

    >
    > >> > I want only data lines which has unique IP+Code.

    >
    > >> > If IP+Code is repeated then I don't want line.

    >
    > >Below is code which I have written to extract unique IP+Code from
    > >large file. (File format is ID | Time | IP | code).

    >
    > >I am not sure which will be best way to do this.

    >
    > >#!/usr/local/bin/perl
    > >$pri_file = "out_pri.txt";

    >
    > >$cnt = 0;
    > >$flag = 0;

    >
    > >open(INFO_PRI,$pri_file)or die $!;
    > >open(INFO,$pri_file)or die $!;

    >
    > >@pri_lines_ = <INFO>;

    >
    > >while($pri_line = <INFO_PRI>)

    >
    > [rest of code snipped]
    >
    > Many things I don't understand in this code, among them why you are
    > using 2 file handles to the same file, why you are slurping in the whole
    > file on one file handle and then process the file line by line on the
    > other file handle, why you have a nested loop, etc, etc.
    >
    > Your requirements seem to be straight forward and easy to translate into
    > a simple algorithm (warning, sketch only, not tested):
    >
    > my %idtable;
    > open ($F, '<', $myfile) of die "Cannot read $myfile because $!\n";
    > while (<$F>) { #loop through file and gather all IP | Code combinations
    >         (undef, undef, $ip, $code) = split '\|';
    >         $idtable{"$ip|$code"}++; #record this ip-code combination}
    >
    > seek $F, 0; #reset file to start
    > while (<$F>) { #loop through file again and ....
    >         (undef, undef, $ip, $code) = split '\|';
    >         print if $idtable{"$ip|$code"} == 1;
    >                 #... print that line if the ip-code combination
    >                 #exists exactly once in the file
    > close $F;
    >
    > jue- Hide quoted text -
    >
    > - Show quoted text -


    Hi jue,

    If I use

    $idtable{"$ip|$code"}++; #record this ip-code combination

    will this not replace the previous value if the same key (ip-code)
    comes again?
     
    , Oct 10, 2008
    #12
  13. "" <> wrote:
    >> jue- Hide quoted text -
    >>
    >> - Show quoted text -


    What is this "Hide quoted text - Show quoted text" nonsense?

    >IF I use
    >
    >$idtable{"$ip|$code"}++; #record this ip-code combination
    >
    >will this not replace previous valuse if same key(ip-code) comes
    >again ?


    Of course it does, that is the whole purpose. Or how would you count
    the number of occurrences if not by replacing the previous number with
    the new number?
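
    A tiny illustration of what the ++ does (made-up keys):

    my %idtable;
    $idtable{'1.2.3.4|A'}++;   # value is now 1
    $idtable{'1.2.3.4|A'}++;   # value is now 2
    $idtable{'5.6.7.8|B'}++;   # value is now 1
    # nothing is "lost": each key ends up holding how often it was seen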

    jue
     
    Jürgen Exner, Oct 10, 2008
    #13
  14. Guest

    On Oct 10, 2:31 pm, Jürgen Exner <> wrote:
    > "" <> wrote:
    > >> jue- Hide quoted text -

    >
    > >> - Show quoted text -

    >
    > What is this "Hide quoted text - Show quoted text" nonsense?
    >
    > >IF I use

    >
    > >$idtable{"$ip|$code"}++; #record this ip-code combination

    >
    > >will this not replace previous valuse if same key(ip-code) comes
    > >again ?

    >
    > Of course it does, that is the whole purpose. Or how do you suggest to
    > count the number of occurences if not by replacing the previous number
    > with the new number?
    >
    > jue


    Got it thanks.

    Sorry abt hide quoted text. I also don't knw wht is tht by mistake I
    must click it while replying
     
    , Oct 10, 2008
    #14
  15. Guest

    On Fri, 10 Oct 2008 10:55:22 -0700 (PDT), "" <> wrote:

    >On Oct 10, 12:18 pm, Jürgen Exner <> wrote:
    >> "" <> wrote:
    >> >On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >> >> Quoth "" <>:

    >>
    >> >> > I have a large file in following format:

    >>
    >> >> > ID | Time | IP | Code

    >>
    >> >> > I want only data lines which has unique IP+Code.

    >>
    >> >> > If IP+Code is repeated then I don't want line.

    >>
    >> >Below is code which I have written to extract unique IP+Code from
    >> >large file. (File format is ID | Time | IP | code).

    >>
    >> >I am not sure which will be best way to do this.

    >>
    >> >#!/usr/local/bin/perl
    >> >$pri_file = "out_pri.txt";

    >>
    >> >$cnt = 0;
    >> >$flag = 0;

    >>
    >> >open(INFO_PRI,$pri_file)or die $!;
    >> >open(INFO,$pri_file)or die $!;

    >>
    >> >@pri_lines_ = <INFO>;

    >>
    >> >while($pri_line = <INFO_PRI>)

    >>
    >> [rest of code snipped]
    >>
    >> Many things I don't understand in this code, among them why you are
    >> using 2 file handles to the same file, why you are slurping in the whole
    >> file on one file handle and then process the file line by line on the
    >> other file handle, why you have a nested loop, etc, etc.
    >>
    >> Your requirements seem to be straight forward and easy to translate into
    >> a simple algorithm (warning, sketch only, not tested):
    >>
    >> my %idtable;
    >> open ($F, '<', $myfile) of die "Cannot read $myfile because $!\n";
    >> while (<$F>) { #loop through file and gather all IP | Code combinations
    >>         (undef, undef, $ip, $code) = split '\|';
    >>         $idtable{"$ip|$code"}++; #record this ip-code combination}
    >>
    >> seek $F, 0; #reset file to start
    >> while (<$F>) { #loop through file again and ....
    >>         (undef, undef, $ip, $code) = split '\|';
    >>         print if $idtable{"$ip|$code"} == 1;
    >>                 #... print that line if the ip-code combination
    >>                 #exists exactly once in the file
    >> close $F;
    >>
    >> jue- Hide quoted text -
    >>
    >> - Show quoted text -

    >
    >Hi jue,
    >
    >IF I use
    >
    >$idtable{"$ip|$code"}++; #record this ip-code combination
    >
    >will this not replace previous valuse if same key(ip-code) comes
    >again ?


    This may not have been clear....

    "$idtable{"$ip|$code"}", in this case, is just a variable used as
    a counter. It's no different from incrementing any other counter,
    like $cnt++.

    In that respect, it just uses the IP and Code, concatenated into one
    string, as a key into a hash, so the key itself contains the
    encoded data.

    In my opinion, this is not the way to go. If there are only a few IPs
    and very many Codes, this could create an inordinately large hash,
    resulting in long lookup times.

    You get the unique IPs for free, and may shorten the CPU overhead,
    if you do it this way:

    $idtable{$ip}->{$code}++

    There is a tradeoff; I don't really know. It depends on whether the
    number of unique Codes outnumbers the number of IPs ... or something
    like that.
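
    In sketch form (untested; $ip, $code and $line as in the earlier
    posts):

    $idtable{$ip}{$code}++;                       # count per IP, per code

    # later:
    print $line if $idtable{$ip}{$code} == 1;     # combination seen once

    # and the distinct IPs come for free as the keys of the outer hash:
    my @ips = keys %idtable;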

    sln
     
    , Oct 10, 2008
    #15
  16. Guest

    On Fri, 10 Oct 2008 12:49:22 -0500, "J. Gleixner" <> wrote:

    >Jürgen Exner wrote:
    >> "J. Gleixner" <> wrote:
    >>> wrote:
    >>>> On Oct 9, 6:08 pm, Ben Morrow <> wrote:
    >>>>> Quoth "" <>:
    >>>>>
    >>>>>> I have a large file in following format:
    >>>>>> ID | Time | IP | Code
    >>>>>> I want only data lines which has unique IP+Code.
    >>>>>> If IP+Code is repeated then I don't want line.
    >>>>> perldoc -q unique
    >>>>>
    >>>>> Ben
    >>>> Below is code which I have written to extract unique IP+Code from
    >>>> large file. (File format is ID | Time | IP | code).
    >>>>
    >>>> I am not sure which will be best way to do this.
    >>> Well, it's not the way you posted.
    >>>
    >>> Did you actually read the perldoc Ben mentioned above? You don't use a
    >>> hash at all, so I'm guessing not.

    >>
    >> ACK!
    >>
    >>> while ( my $line = <$INFO> )
    >>> {
    >>> chomp( $line );
    >>> # split the data.. you can split directly into the variables..
    >>> # my ( $v1, $v2 ) = ( split( /\|/, $line ) )[1,2];
    >>> # print $line to $OUT if the hash key of $cli_ip and $id doesn't already
    >>> exist.

    >>
    >> That will print each IP+code exactly once. I think (but I may be
    >> mistaken, the OPs isn't clear on that) he wants only those lines, that
    >> _are_ unique wrt. the IP+code, i.e. where there is no second line with
    >> the same IP+code.

    >
    >You're right, I mis-understood.
    >
    >A fairly easy to follow solution would be to keep track of the data,
    >using two hashes.
    >
    >my (%times, %line );
    >
    >while(...)
    >{
    > # chomp,split,...
    > # times is the number of times the $cli_ip and $id were found
    > $times{ $cli_ip . $id }++;
    > # could 'next' if it is > 1
    > # and store the line itself, for the $cli_ip and $id
    > $line{ $cli_ip . $id } = $line;
    >}
    >
    >Then, after the while, for each of the keys in %times, print the
    >value from %line where the value of $times{ $key } is 1, to the output file.
    >
    >That should be enough to get the OP in the right direction, without
    >writing the whole darn thing for them.


    Doesn't this overwrite what's already there? Not sure.
    $line{ $cli_ip . $id } = $line;

    sln
     
    , Oct 10, 2008
    #16
  17. Guest

    wrote:
    > On Fri, 10 Oct 2008 12:49:22 -0500, "J. Gleixner"
    > <> wrote:
    > >
    > >A fairly easy to follow solution would be to keep track of the data,
    > >using two hashes.
    > >
    > >my (%times, %line );
    > >
    > >while(...)
    > >{
    > > # chomp,split,...
    > > # times is the number of times the $cli_ip and $id were found
    > > $times{ $cli_ip . $id }++;
    > > # could 'next' if it is > 1
    > > # and store the line itself, for the $cli_ip and $id
    > > $line{ $cli_ip . $id } = $line;
    > >}
    > >
    > >Then, after the while, for each of the keys in %times, print the
    > >value from %line where the value of $times{ $key } is 1, to the output
    > >file.
    > >
    > >That should be enough to get the OP in the right direction, without
    > >writing the whole darn thing for them.

    >
    > Doesen't this overwrite whats already there? Not sure.
    > $line{ $cli_ip . $id } = $line;


    Yes, of course. But since those lines won't get printed anyway (because
    count > 1), it doesn't matter if they get overwritten.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Oct 10, 2008
    #17
  18. Guest

    "J. Gleixner" <> wrote:
    > >
    > > That will print each IP+code exactly once. I think (but I may be
    > > mistaken, the OPs isn't clear on that) he wants only those lines, that
    > > _are_ unique wrt. the IP+code, i.e. where there is no second line with
    > > the same IP+code.

    >
    > You're right, I mis-understood.
    >
    > A fairly easy to follow solution would be to keep track of the data,
    > using two hashes.
    >
    > my (%times, %line );
    >
    > while(...)
    > {
    > # chomp,split,...
    > # times is the number of times the $cli_ip and $id were found
    > $times{ $cli_ip . $id }++;
    > # could 'next' if it is > 1
    > # and store the line itself, for the $cli_ip and $id
    > $line{ $cli_ip . $id } = $line;
    > }


    I might go with just a single hash, using undef as a special value to
    indicate we already have seen more than one.

    my %line;

    while(...)
    {
        # chomp,split,...
        if (exists $line{ $cli_ip . $id }) {
            $line{ $cli_ip . $id } = undef; # skunked
        } else {
            $line{ $cli_ip . $id } = $line;
        }
    }



    >
    > Then, after the while, for each of the keys in %times, print the
    > value from %line where the value of $times{ $key } is 1, to the output
    > file.


    Under my method, print the things from %line where the value is defined.
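
    I.e. something like (sketch, untested; assuming chomped lines and an
    output handle $OUT):

    for my $key ( keys %line ) {
        print $OUT "$line{$key}\n" if defined $line{$key};
    }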

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Oct 10, 2008
    #18
  19. <> wrote:

    > Sorry abt hide quoted text. I also don't knw wht is tht by mistake I

    ^^^ ^^^ ^^^ ^^^
    ^^^ ^^^ ^^^ ^^^

    Vanna, I would like to buy a vowel!


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Oct 10, 2008
    #19
  20. "J. Gleixner" <> wrote:
    > $times{ $cli_ip . $id }++;


    Careful! This may give wrong results in odd circumstances.
    Example:
    $cli_ip='foobar', $id='buz';
    and
    $cli_ip='foo', $id='barbuz';

    Better to use the same separator as in the original data set, regardless
    of whether such a scenario may or may not happen with the OP's data set:

    $times{ $cli_ip . '|' . $id }++;

    jue
     
    Jürgen Exner, Oct 10, 2008
    #20