restrict a hash to 15 pairs and iterate over it

Discussion in 'Perl Misc' started by Marek, Feb 15, 2009.

  1. Marek

    Marek Guest

    Hello all!


    I am still a beginner, so please be patient with me.

    I have a big file with numbers and dates, like this:


    01.01.98
    31
    33
    14
    7
    35
    16
    20
    20
    13
    55
    1
    1
    7


    etc etc

    I need a complicated hash to count the occurrences of numbers within a
    scope of 15 lines:

    We skip the dates and count the lines. The structure of my %hash
    looks as follows:

    ($number{$line, $line, ...}) => $how_many_times

    In my example, 20 occurs on lines 7 and 8 -> two times:

    20{7,8} => 2

    Then we iterate over the file, keeping only 15 numbers in the hash at a
    time and counting the occurrences of each number at every step.

    Could somebody help me with this?


    Thank you in advance


    marek
    Marek, Feb 15, 2009
    #1

  2. Marek

    Marek Guest

    On Feb 15, 10:06 am, Ben Morrow <> wrote:

    >
    > What do you mean by 'the occurrences of numbers'? Do you mean the number
    > of times each number shows up, so for a list like
    >
    > 1 2 1 4 2
    >
    > you would get the results
    >
    > 1: 2 times
    > 2: 2 times
    > 4: 1 time
    >


    yes! :)

    >
    > If you need a data structure like this (I'm not yet sure whether you do)
    > you want something more like
    >
    > my %numbers = (
    > 20 => [7, 8]
    > );
    >
    > You don't need to record the count separately: Perl arrays know how long
    > they are.
    >


    Thank you! Good idea!

    > How do you choose which 15 numbers to keep?
    >


    >
    > Post the code you've got so far, in as close to working condition as you
    > can get it (that is, make sure there aren't any syntax errors, or bits
    > left over from things you've tried before).
    >
    > Ben


    Thank you Ben for your patience!

    1. You are asking for code, but I am ashamed to post mine here
    because it is too childish. I tried an array holding a reference
    to a hash.

    2. You have already guessed what I need:

    I have a large file with many numbers in it. Each line holds a number,
    or sometimes a date. I read in the first 15 numbers, shown here
    on one line:

    1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    ^                              ^


    So the first step is to read the numbers up to the 15th line and
    see whether any number occurs two or three times. In my example
    there are two 1s and two 2s ... Then I do something with this result
    and read the next 15 numbers, starting with 2 ...


    1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
      ^                               ^

    In the next step we read in 15 numbers starting with 3:


    1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
        ^                                ^

    But you already gave a valuable hint: we only need a hash entry for
    each number, like

    %number = (1 => [1,5]);

    and then check the length of the anonymous array. There is no need to
    add one more level for each number to count its occurrences.
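
    A minimal sketch of that hint, assuming the numbers are already in a
    list (the sub name `positions_of` is made up for illustration): each
    number maps to an anonymous array of the positions where it occurs, and
    the length of that array is the count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each number to the list of positions
# (1-based) at which it occurs.
sub positions_of {
    my @numbers = @_;
    my %number;
    my $line = 0;
    for my $n (@numbers) {
        $line++;
        push @{ $number{$n} }, $line;    # autovivifies the array ref
    }
    return \%number;
}

my $number = positions_of(1, 2, 3, 4, 1, 5);

for my $n (sort { $a <=> $b } keys %$number) {
    my @lines = @{ $number->{$n} };
    # an array in scalar context yields its length, i.e. the count
    printf "%d occurs %d time(s), on line(s) %s\n",
        $n, scalar @lines, join(',', @lines);
}
```

    For the number 1 this reports two occurrences, on lines 1 and 5, which
    is the `1 => [1,5]` entry from the hint.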

    Hope this is clearer now? Thank you again!



    marek
    Marek, Feb 15, 2009
    #2

  3. Marek <> wrote:
    > On Feb 15, 10:06 am, Ben Morrow <> wrote:
    >
    >>
    >> so for a list like
    >>
    >> 1 2 1 4 2
    >>
    >> you would get the results
    >>
    >> 1: 2 times
    >> 2: 2 times
    >> 4: 1 time
    >>

    >
    > yes! :)



    > So the first step would be to read in the numbers until 15th line and
    > see if there are double numbers or triple numbers. In my example here
    > there are two 1 and two 2 ... Then I do something with this result and
    > I read in the next 15 numbers starting with 2 ...
    >
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    >   ^                               ^
    >
    > next step we read in starting with 3
    >
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    >     ^                                ^
    >
    > But you gave already a valuable hint: we only need a hash of each
    >
    > %number = (1 => [1,5]);



    If you instead use a hash that buffers 15 elements at a time, then
    you can generate a hash with the counts directly, as you don't seem
    to want to know the actual line numbers...

    I think this does what you are asking for:


    -----------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $size = 5; # 5 instead of 15
    my $line = 0; # "line" counter
    my %lines; # buffer up $size lines

    while ( <DATA> ) {
        next if /\./;    # skip dates
        chomp;
        $line++;

        $lines{$line} = $_;

        if ( keys %lines == $size ) {

            print Dumper \%lines;    # for debugging

            # count what is in the buffer
            my %nums;
            $nums{$_}++ for values %lines;

            # display what is in the (counted) buffer
            foreach my $num ( sort { $a <=> $b } keys %nums ) {
                printf "%3d: %3d times\n", $num, $nums{$num};
            }
            print "---------\n";

            # maintain buffer size
            delete $lines{ $line - $size + 1};
        }
    }

    __DATA__
    01.01.98
    31
    33
    01.02.98
    01.03.98
    14
    7
    35
    16
    20
    20
    13
    55
    1
    1
    7
    -----------------------------


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Feb 15, 2009
    #3
  4. Marek wrote:
    > On Feb 15, 10:06 am, Ben Morrow <> wrote:
    >
    >> What do you mean by 'the occurrences of numbers'? Do you mean the number
    >> of times each number shows up, so for a list like
    >>
    >> 1 2 1 4 2
    >>
    >> you would get the results
    >>
    >> 1: 2 times
    >> 2: 2 times
    >> 4: 1 time
    >>

    >
    > yes! :)
    >
    >> If you need a data structure like this (I'm not yet sure whether you do)
    >> you want something more like
    >>
    >> my %numbers = (
    >> 20 => [7, 8]
    >> );
    >>
    >> You don't need to record the count separately: Perl arrays know how long
    >> they are.
    >>

    >
    > Thank you! Good idea!


    If you need to know both the counts of occurrences and the locations, then
    this is a good idea. The list of locations automatically includes the
    count of occurrences. But if you *only* need the count, then I wouldn't
    store the list of locations as well.


    >
    > I have a large file with many numbers in it. Each line a number and
    > sometimes some dates. I read in from the beginning 15 numbers, here
    > separated with a tab:
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    > ^                              ^
    >
    >
    > So the first step would be to read in the numbers until 15th line and
    > see if there are double numbers or triple numbers. In my example here
    > there are two 1 and two 2 ... Then I do something with this result and
    > I read in the next 15 numbers starting with 2 ...
    >
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    >   ^                               ^


    So it is a sliding window.

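    use strict;
    use warnings;

    my @window;
    my %count;
    while (<>) {
        chomp;
        next if looks_like_date_not_number($_);
        push @window, $_;
        $count{$_}++;
        if (@window > 15) {
            ## window is too big, get rid of the first one
            my $old = shift @window;
            # drop the key once its count reaches zero
            delete $count{$old} unless --$count{$old};
        }
        if (@window == 15) {
            # do whatever you want to do with %count
        }
    }

    # placeholder test (an assumption): dates contain a dot, e.g. 01.01.98
    sub looks_like_date_not_number { return $_[0] =~ /\./ }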

    Xho
    Xho Jingleheimerschmidt, Feb 15, 2009
    #4
  5. On 2009-02-15 12:21, Marek <> wrote:
    > I have a large file with many numbers in it. Each line a number and
    > sometimes some dates. I read in from the beginning 15 numbers, here
    > separated with a tab:
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    > ^                              ^
    >
    >
    > So the first step would be to read in the numbers until 15th line and
    > see if there are double numbers or triple numbers. In my example here
    > there are two 1 and two 2 ... Then I do something with this result and
    > I read in the next 15 numbers starting with 2 ...
    >
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    >   ^                               ^
    >
    > next step we read in starting with 3
    >
    >
    > 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    >     ^                                ^


    So you have a sliding window of 15 lines, and you always want to know
    how often each number occurs within that window? That is, you want
    output similar to this:

    line 1 - 15:
    1: 2
    2: 2
    3: 1
    4: 1
    5: 1
    6: 1
    7: 1
    8: 1
    9: 1
    10: 1
    11: 1
    12: 1
    13: 1
    line 2 - 16:
    1: 1
    2: 2
    3: 1
    4: 1
    5: 1
    6: 1
    7: 1
    8: 1
    9: 1
    10: 1
    11: 1
    12: 1
    13: 1
    14: 1
    line 3 - 17:
    1: 1
    2: 1
    3: 1
    4: 1
    5: 1
    6: 1
    7: 1
    8: 1
    9: 1
    10: 1
    11: 1
    12: 1
    13: 1
    14: 1
    15: 1

    One way to achieve this is to keep the window in an array. Add new lines
    with push, and remove old lines with shift. For each line added,
    increment the count of the corresponding number. For each line removed,
    decrement it. The counts can be kept in an array or a hash.
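
    The push/shift approach just described could be sketched roughly as
    follows. This is only an illustration: the sub name `window_counts` is
    made up, the counts are kept in a hash, and the numbers are passed in
    as a list rather than read from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one hash of counts per full window.
sub window_counts {
    my ($size, @numbers) = @_;
    my (@window, %count, @result);
    for my $n (@numbers) {
        push @window, $n;               # new line enters the window
        $count{$n}++;
        if (@window > $size) {
            my $old = shift @window;    # oldest line leaves the window
            delete $count{$old} unless --$count{$old};
        }
        # store a shallow copy of the counts for each full window
        push @result, +{ %count } if @window == $size;
    }
    return @result;
}

my @numbers = (1, 2, 3, 4, 1, 5, 6, 7, 8, 9, 10, 2, 11 .. 17);
my $first = 1;
for my $counts (window_counts(15, @numbers)) {
    printf "line %d - %d:\n", $first, $first + 14;
    printf "%d: %d\n", $_, $counts->{$_}
        for sort { $a <=> $b } keys %$counts;
    $first++;
}
```

    Each step costs one increment and one decrement, no matter how large
    the window is, because the counts are never rebuilt from scratch.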

    hp
    Peter J. Holzer, Feb 15, 2009
    #5
  6. Marek

    Marek Guest

    Wow! I am impressed! I would never have found these solutions on
    my own!

    Special thanks to Tad; so short and elegant! I love these lines:

    if ( keys %lines == $size )

    and

    $nums{$_}++ for values %lines;

    and this one is my favourite :)

    delete $lines{ $line - $size + 1};

    Also Xho's suggestion is really tricky! Thank you!

    Good evening to all!


    marek
    Marek, Feb 15, 2009
    #6
  7. Marek

    Guest

    On Sat, 14 Feb 2009 23:51:34 -0800 (PST), Marek <> wrote:


    A rolling frame that tracks the lines of occurrences is not as easy as you think.
    The concept is simple; the implementation is another thing altogether.
    This would not be a problem to present in a beginner Perl class.
    It's not actually Perl that would be the problem, it's the implementation of a rolling
    frame and the tracking of line numbers for a given criterion.

    The code below is just a rudimentary framework to demonstrate the constructs that
    would be necessary. You might need a hardened programmer with large application
    experience to deal with rolling frames and data tracking.

    Could this rough code be thinned out? Sure. It just demonstrates the concept; it's
    not production quality.

    Btw, the frame size was set to 5 for the example; change it to 15 or whatever
    you're doing.

    Well, good luck and have fun!
    -sln

    ------------------------

    # Frames.pl
    # -------------------------
    # Template:
    # We assume a valid frame of 5 (not based on line count) This could be 15 or any number
    # @Frame_Cache = (number, number, number, ...); ## 5 elements
    # %Items = (number => [line,line,line], number => [line,line,line],...);


    use strict;
    use warnings;


    my @Frame_Cache = ();
    my %Items = ();
    my ($cache_size, $lncount, $framesize) = (0, 0, 5);

    while (<DATA>)
    {
        ++$lncount;

        # Digits only, anything else is invalid
        /^\s*(\d+)\s*$/;
        next if (!$1);

        # Add item to frame cache
        push @Frame_Cache, $1;

        # Add line number onto item array stack (in hash)
        push @{$Items{$1}}, $lncount;

        print "\nAdding $1 (line $lncount)\n";

        # Continue until full frame
        ++$cache_size;
        next if ($cache_size < $framesize);

        # First full frame, the roll starts on next one
        # Show Frame, do something with %Items
        if ($cache_size == $framesize)
        {
            PrintItems();
            next;
        }

        # Frame is moving, take head off cache
        my $item_number = shift @Frame_Cache;

        # Adjust lines going out of frame (all arrays in hash).
        # Delete the item number line array if it is empty.

        print "Taking $item_number off (line ".${$Items{$item_number}}[0].")\n";

        my $line_going_out_of_frame = ${$Items{$item_number}}[0];
        for my $nbr (keys %Items)
        {
            shift @{$Items{$nbr}} if (${$Items{$nbr}}[0] <= $line_going_out_of_frame);
            delete $Items{$nbr} if (!@{$Items{$nbr}});
        }

        # Show Frame, do something with %Items
        PrintItems();
    }

    # You could print items down here if there is no full frame
    # ...

    # end of program ...


    # This prints the items hash (could use Data::Dumper), but more importantly
    # gives a template to access the data.
    # When you're through with debug printing, just comment the print part out.
    # Process the data here, refactor this sub when done.
    # No sub should access global data imho.
    # -----------------
    sub PrintItems
    {
        print "Frame ".($cache_size-$framesize+1)." - $cache_size\n";
        for my $nbr (sort {$a<=>$b} keys %Items) {
            print "number = $nbr, on lines = [ @{$Items{$nbr}} ]\n";
        }
    }

    __DATA__

    01.01.98
    99
    31
    33
    14
    7
    35
    16
    20
    20
    13
    55
    1
    1
    7
    0
    2
    3
    0
    2
    3
    0
    2
    3
    0
    2
    3
    0
    2
    3
    0


    ---------------
    Output:


    c:\temp>perl frames.pl

    Adding 99 (line 3)

    Adding 31 (line 4)

    Adding 33 (line 5)

    Adding 14 (line 6)

    Adding 7 (line 7)
    Frame 1 - 5
    number = 7, on lines = [ 7 ]
    number = 14, on lines = [ 6 ]
    number = 31, on lines = [ 4 ]
    number = 33, on lines = [ 5 ]
    number = 99, on lines = [ 3 ]

    Adding 35 (line 8)
    Taking 99 off (line 3)
    Frame 2 - 6
    number = 7, on lines = [ 7 ]
    number = 14, on lines = [ 6 ]
    number = 31, on lines = [ 4 ]
    number = 33, on lines = [ 5 ]
    number = 35, on lines = [ 8 ]

    Adding 16 (line 9)
    Taking 31 off (line 4)
    Frame 3 - 7
    number = 7, on lines = [ 7 ]
    number = 14, on lines = [ 6 ]
    number = 16, on lines = [ 9 ]
    number = 33, on lines = [ 5 ]
    number = 35, on lines = [ 8 ]

    Adding 20 (line 10)
    Taking 33 off (line 5)
    Frame 4 - 8
    number = 7, on lines = [ 7 ]
    number = 14, on lines = [ 6 ]
    number = 16, on lines = [ 9 ]
    number = 20, on lines = [ 10 ]
    number = 35, on lines = [ 8 ]

    Adding 20 (line 11)
    Taking 14 off (line 6)
    Frame 5 - 9
    number = 7, on lines = [ 7 ]
    number = 16, on lines = [ 9 ]
    number = 20, on lines = [ 10 11 ]
    number = 35, on lines = [ 8 ]

    Adding 13 (line 12)
    Taking 7 off (line 7)
    Frame 6 - 10
    number = 13, on lines = [ 12 ]
    number = 16, on lines = [ 9 ]
    number = 20, on lines = [ 10 11 ]
    number = 35, on lines = [ 8 ]

    Adding 55 (line 13)
    Taking 35 off (line 8)
    Frame 7 - 11
    number = 13, on lines = [ 12 ]
    number = 16, on lines = [ 9 ]
    number = 20, on lines = [ 10 11 ]
    number = 55, on lines = [ 13 ]

    Adding 1 (line 14)
    Taking 16 off (line 9)
    Frame 8 - 12
    number = 1, on lines = [ 14 ]
    number = 13, on lines = [ 12 ]
    number = 20, on lines = [ 10 11 ]
    number = 55, on lines = [ 13 ]

    Adding 1 (line 15)
    Taking 20 off (line 10)
    Frame 9 - 13
    number = 1, on lines = [ 14 15 ]
    number = 13, on lines = [ 12 ]
    number = 20, on lines = [ 11 ]
    number = 55, on lines = [ 13 ]

    Adding 7 (line 16)
    Taking 20 off (line 11)
    Frame 10 - 14
    number = 1, on lines = [ 14 15 ]
    number = 7, on lines = [ 16 ]
    number = 13, on lines = [ 12 ]
    number = 55, on lines = [ 13 ]

    Adding 2 (line 18)
    Taking 13 off (line 12)
    Frame 11 - 15
    number = 1, on lines = [ 14 15 ]
    number = 2, on lines = [ 18 ]
    number = 7, on lines = [ 16 ]
    number = 55, on lines = [ 13 ]

    Adding 3 (line 19)
    Taking 55 off (line 13)
    Frame 12 - 16
    number = 1, on lines = [ 14 15 ]
    number = 2, on lines = [ 18 ]
    number = 3, on lines = [ 19 ]
    number = 7, on lines = [ 16 ]

    Adding 2 (line 21)
    Taking 1 off (line 14)
    Frame 13 - 17
    number = 1, on lines = [ 15 ]
    number = 2, on lines = [ 18 21 ]
    number = 3, on lines = [ 19 ]
    number = 7, on lines = [ 16 ]

    Adding 3 (line 22)
    Taking 1 off (line 15)
    Frame 14 - 18
    number = 2, on lines = [ 18 21 ]
    number = 3, on lines = [ 19 22 ]
    number = 7, on lines = [ 16 ]

    Adding 2 (line 24)
    Taking 7 off (line 16)
    Frame 15 - 19
    number = 2, on lines = [ 18 21 24 ]
    number = 3, on lines = [ 19 22 ]

    Adding 3 (line 25)
    Taking 2 off (line 18)
    Frame 16 - 20
    number = 2, on lines = [ 21 24 ]
    number = 3, on lines = [ 19 22 25 ]

    Adding 2 (line 27)
    Taking 3 off (line 19)
    Frame 17 - 21
    number = 2, on lines = [ 21 24 27 ]
    number = 3, on lines = [ 22 25 ]

    Adding 3 (line 28)
    Taking 2 off (line 21)
    Frame 18 - 22
    number = 2, on lines = [ 24 27 ]
    number = 3, on lines = [ 22 25 28 ]

    Adding 2 (line 30)
    Taking 3 off (line 22)
    Frame 19 - 23
    number = 2, on lines = [ 24 27 30 ]
    number = 3, on lines = [ 25 28 ]

    Adding 3 (line 31)
    Taking 2 off (line 24)
    Frame 20 - 24
    number = 2, on lines = [ 27 30 ]
    number = 3, on lines = [ 25 28 31 ]

    c:\temp>
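
    Since the frame is first-in, first-out, the number leaving the frame is
    always the oldest one, so its oldest line is the first element of its
    own line array. A sketch built on that observation (the sub name
    `frames_with_lines` and the inline sample data are made up for
    illustration) only has to shift that one array instead of looping over
    every key. It also tests the match itself rather than `$1`, so a line
    containing 0 counts as valid.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each full frame, return a hash mapping
# number => [line numbers currently inside the frame].
sub frames_with_lines {
    my ($framesize, @lines) = @_;
    my (@frame, %items, @result);
    my $lncount = 0;
    for my $text (@lines) {
        $lncount++;
        next unless $text =~ /^\s*(\d+)\s*$/;    # digits only; 0 is valid
        my $n = $1;
        push @frame, $n;
        push @{ $items{$n} }, $lncount;
        if (@frame > $framesize) {
            my $old = shift @frame;
            shift @{ $items{$old} };             # its oldest line leaves
            delete $items{$old} unless @{ $items{$old} };
        }
        next if @frame < $framesize;
        # copy the line arrays so later shifts don't alter stored frames
        push @result, +{ map { ( $_ => [ @{ $items{$_} } ] ) } keys %items };
    }
    return @result;
}

my @data = ('', '01.01.98', 99, 31, 33, 14, 7, 35, 16, 20, 20, 13);
for my $frame (frames_with_lines(5, @data)) {
    for my $n (sort { $a <=> $b } keys %$frame) {
        print "number = $n, on lines = [ @{ $frame->{$n} } ]\n";
    }
    print "---\n";
}
```

    Line numbers here count every input line, including blank lines and
    dates, as in the listing above.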
    , Feb 16, 2009
    #7
  8. Marek

    Guest

    On Mon, 16 Feb 2009 03:25:00 GMT, wrote:

    [snip]
    > # Digits only, anything else is invalid
    > /^\s*(\d+)\s*$/;
    > next if (!$1);

              ^^^^^
      next if (!defined $1)

    Oops. I always say 'check your work'. Gotcha on me!
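
    The reason the `defined` test matters: the string "0" is false in Perl,
    so `!$1` also rejects a valid line that contains 0. That is visible in
    the output of the previous post, where the lines holding 0 never show
    up (the listing jumps from line 16 to line 18). A small sketch of the
    difference, with a made-up helper `parse_number` that returns undef for
    anything that is not digits-only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the number on a line, or undef if the
# line is not digits-only.
sub parse_number {
    my ($text) = @_;
    return $text =~ /^\s*(\d+)\s*$/ ? $1 : undef;
}

for my $text ('20', '0', '01.01.98', '') {
    my $n = parse_number($text);
    if    (!defined $n) { print "'$text': skipped (not a number)\n" }
    elsif (!$n)         { print "'$text': valid, but false in a boolean test\n" }
    else                { print "'$text': valid\n" }
}
```

    Testing the result of the match itself, as in `next unless /^\s*(\d+)\s*$/;`,
    has the same effect and avoids looking at `$1` after a failed match.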

    -sln
    , Feb 16, 2009
    #8
