restrict a hash to 15 pairs and iterate over it

Marek

Hello all!


I am still a beginner, so please be patient with me.

I have a big file with numbers and dates, like this:


01.01.98
31
33
14
7
35
16
20
20
13
55
1
1
7


etc etc

I need a complicated hash to track the occurrences of numbers within a
window of 15 lines:

We skip the dates, and we count the lines. The structure of my %hash
looks like this:

($number{$line, $line, ...}) => $how_many_times

In my example the 20 occurs in line 7 and 8 -> two times:

20{7,8} => 2

Then we iterate over the file, keeping only 15 numbers in the hash at a
time, and count the occurrences of each number at every step.

Could somebody help me with this?


Thank you in advance


marek
 
Marek

What do you mean by 'the occurrences of numbers'? Do you mean the number
of times each number shows up, so for a list like

1 2 1 4 2

you would get the results

1: 2 times
2: 2 times
4: 1 time

yes! :)

If you need a data structure like this (I'm not yet sure whether you do)
you want something more like

my %numbers = (
20 => [7, 8]
);

You don't need to record the count separately: Perl arrays know how long
they are.
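
A minimal sketch of that structure, assuming that dates are the only lines containing a dot and that line numbers count only the numeric lines. The sub name `index_lines` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: map each number to the list of (date-free) line
# numbers on which it occurs.
sub index_lines {
    my @input = @_;
    my %numbers;
    my $line = 0;
    for my $entry (@input) {
        next if $entry =~ /\./;    # assumption: only dates contain a dot
        $line++;
        push @{ $numbers{$entry} }, $line;
    }
    return \%numbers;
}

my $numbers = index_lines(qw(01.01.98 31 33 14 7 35 16 20 20 13 55 1 1 7));

for my $n (sort { $a <=> $b } keys %$numbers) {
    my @lines = @{ $numbers->{$n} };
    # an array in scalar context yields its length, i.e. the count
    printf "%d: %d time(s), on line(s) %s\n", $n, scalar @lines, join(',', @lines);
}
```

With the sample data from the first post, 20 ends up as [7, 8], so its count is simply the length of that array.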

Thank you! Good idea!

How do you choose which 15 numbers to keep?
Post the code you've got so far, in as close to working condition as you
can get it (that is, make sure there aren't any syntax errors, or bits
left over from things you've tried before).

Ben

Thank you Ben for your patience!

1. You are asking for code. But I am ashamed to post it here,
because it is too childish. I have tried with an array of a reference
to a hash.

2. You have already guessed what I need:

I have a large file with many numbers in it. Each line a number and
sometimes some dates. I read in from the beginning 15 numbers, here
separated with a tab:

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^


So the first step would be to read in the numbers until 15th line and
see if there are double numbers or triple numbers. In my example here
there are two 1 and two 2 ... Then I do something with this result and
I read in the next 15 numbers starting with 2 ...


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

next step we read in starting with 3


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

But you gave already a valuable hint: we only need a hash of each

%number = (1 => [1,5]);

And see how long the anonymous array is. No need to add one more level
to each number just to count its occurrences.

Hope this is clearer now? Thank you again!



marek
 
Tad J McClellan

Marek said:
so for a list like

1 2 1 4 2

you would get the results

1: 2 times
2: 2 times
4: 1 time

yes! :)

So the first step would be to read in the numbers until 15th line and
see if there are double numbers or triple numbers. In my example here
there are two 1 and two 2 ... Then I do something with this result and
I read in the next 15 numbers starting with 2 ...


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

next step we read in starting with 3


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

But you gave already a valuable hint: we only need a hash of each

%number = (1 => [1,5]);


If you instead use a hash that buffers 15 elements at a time, then
you can generate a hash with the counts directly, as you don't seem
to want to know the actual line numbers...

I think this does what you are asking for:


-----------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $size = 5;    # 5 instead of 15
my $line = 0;    # "line" counter
my %lines;       # buffer up $size lines

while ( <DATA> ) {
    next if /\./;    # skip dates
    chomp;
    $line++;

    $lines{$line} = $_;

    if ( keys %lines == $size ) {

        print Dumper \%lines;    # for debugging

        # count what is in the buffer
        my %nums;
        $nums{$_}++ for values %lines;

        # display what is in the (counted) buffer
        foreach my $num ( sort { $a <=> $b } keys %nums ) {
            printf "%3d: %3d times\n", $num, $nums{$num};
        }
        print "---------\n";

        # maintain buffer size
        delete $lines{ $line - $size + 1};
    }
}

__DATA__
01.01.98
31
33
01.02.98
01.03.98
14
7
35
16
20
20
13
55
1
1
7
 
Xho Jingleheimerschmidt

Marek said:
What do you mean by 'the occurrences of numbers'? Do you mean the number
of times each number shows up, so for a list like

1 2 1 4 2

you would get the results

1: 2 times
2: 2 times
4: 1 time

yes! :)
If you need a data structure like this (I'm not yet sure whether you do)
you want something more like

my %numbers = (
20 => [7, 8]
);

You don't need to record the count separately: Perl arrays know how long
they are.

Thank you! Good idea!

If you need to know both the counts of occurrences and the locations, then
this is a good idea. The list of locations automatically includes the
count of occurrences. But if you *only* need the count, then I wouldn't
store the list of locations as well.

I have a large file with many numbers in it. Each line a number and
sometimes some dates. I read in from the beginning 15 numbers, here
separated with a tab:

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^


So the first step would be to read in the numbers until 15th line and
see if there are double numbers or triple numbers. In my example here
there are two 1 and two 2 ... Then I do something with this result and
I read in the next 15 numbers starting with 2 ...


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

So it is a sliding window.

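use strict;
use warnings;

# Assumed helper (not part of the original post): treat any line that
# contains a dot, such as 01.01.98, as a date.
sub looks_like_date_not_number { return $_[0] =~ /\./ }

my @window;    # the last (up to) 15 numbers, oldest first
my %count;     # number => occurrences within the window
while (<>) {
    chomp;
    next if looks_like_date_not_number($_);
    push @window, $_;
    $count{$_}++;
    if (@window > 15) {
        # window is too big, get rid of the first one
        my $old = shift @window;
        # drop the key once its count reaches zero
        delete $count{$old} unless --$count{$old};
    }
    if (@window == 15) {
        # do whatever you want to do with %count
    }
}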

Xho
 
Peter J. Holzer

I have a large file with many numbers in it. Each line a number and
sometimes some dates. I read in from the beginning 15 numbers, here
separated with a tab:

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^


So the first step would be to read in the numbers until 15th line and
see if there are double numbers or triple numbers. In my example here
there are two 1 and two 2 ... Then I do something with this result and
I read in the next 15 numbers starting with 2 ...


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

next step we read in starting with 3


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^ ^

So you have a sliding window of 15 lines, and you always want to know
how often a number occurs within that window? That is you want an output
similar to this:

line 1 - 15:
1: 2
2: 2
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
line 2 - 16:
1: 1
2: 2
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
14: 1
line 3 - 17:
1: 1
2: 1
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
14: 1
15: 1

One way to achieve this is to keep the window in an array. Add new lines
with push, and remove old lines with shift. For each line added,
increment the count of the corresponding number. For each line removed,
decrement it. The counts can be kept in an array or a hash.
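
A rough sketch of that approach, keeping the counts in a hash. The sub name `window_counts` and the report format are only assumptions modelled on the listing above, and the input is taken to be already free of dates:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one report string per full window.
sub window_counts {
    my ($size, @numbers) = @_;
    my (@window, %count, @reports);
    my $line = 0;
    for my $n (@numbers) {
        $line++;
        push @window, $n;          # new line enters the window
        $count{$n}++;
        if (@window > $size) {
            my $old = shift @window;                    # oldest line leaves
            delete $count{$old} unless --$count{$old};  # drop zero counts
        }
        next if @window < $size;   # window not full yet
        my $report = sprintf "line %d - %d:\n", $line - $size + 1, $line;
        $report .= "$_: $count{$_}\n" for sort { $a <=> $b } keys %count;
        push @reports, $report;
    }
    return @reports;
}

print for window_counts(15, 1, 2, 3, 4, 1, 5, 6, 7, 8, 9, 10, 2, 11 .. 17);
```

For the 19 sample numbers this yields five windows; the first three correspond to the listing above. Deleting a key when its count drops to zero keeps numbers that have left the window out of the report.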

hp
 
Marek

Wow! I am impressed! I would never have found these solutions on
my own!

Special thanks to Tad; so short and elegant! I love these lines:

if ( keys %lines == $size )

and

$nums{$_}++ for values %lines;

and this one is my favourite :)

delete $lines{ $line - $size + 1};

Also Xho's suggestion is really tricky! Thank you!

Good evening to all!


marek
 
sln

A rolling frame that tracks the lines of occurrences is not as easy as you think.
The concept is simple; the implementation is another thing altogether.
This would not be a problem to present in a beginner Perl class.
It's not actually Perl that would be the problem, it's the implementation of a rolling
frame and the tracking of line numbers by a given criterion.

The code below is just a rudimentary framework to demonstrate the constructs that
would be necessary. You might need a hardened programmer with large application
experience to deal with rolling frames and data tracking.

Could this rough code be thinned out? Sure. It just demonstrates the concept; it's
not production quality.

Btw, the frame size was set to 5 for the example; change it to 15 or whatever it is
you're doing.

Well, good luck and have fun!
-sln

------------------------

# Frames.pl
# -------------------------
# Template:
# We assume a valid frame of 5 (not based on line count) This could be 15 or any number
# @Frame_Cache = (number, number, number, ...); ## 5 elements
# %Items = (number => [line,line,line], number => [line,line,line],...);


use strict;
use warnings;


my @Frame_Cache = ();
my %Items = ();
my ($cache_size, $lncount, $framesize) = (0, 0, 5);

while (<DATA>)
{
    ++$lncount;

    # Digits only, anything else is invalid
    /^\s*(\d+)\s*$/;
    next if (!$1);

    # Add item to frame cache
    push @Frame_Cache, $1;

    # Add line number onto item array stack (in hash)
    push @{$Items{$1}}, $lncount;

    print "\nAdding $1 (line $lncount)\n";

    # Continue until full frame
    ++$cache_size;
    next if ($cache_size < $framesize);

    # First full frame, the roll starts on next one
    # Show Frame, do something with %Items
    if ($cache_size == $framesize)
    {
        PrintItems();
        next;
    }

    # Frame is moving, take head off cache
    my $item_number = shift @Frame_Cache;

    # Adjust lines going out of frame (all array's in hash).
    # Delete the item number line array if it is empty.

    print "Taking $item_number off (line ".${$Items{$item_number}}[0].")\n";

    my $line_going_out_of_frame = ${$Items{$item_number}}[0];
    for my $nbr (keys %Items)
    {
        shift @{$Items{$nbr}} if (${$Items{$nbr}}[0] <= $line_going_out_of_frame);
        delete $Items{$nbr} if (!@{$Items{$nbr}});
    }

    # Show Frame, do something with %Items
    PrintItems();
}

# You could print items down here if there is no full frame
# ...

# end of program ...


# This prints the items hash (could use Data::Dumper), but more importantly
# gives a template to access the data.
# When you're through with debug printing, just comment the print part out.
# Process the data here, refactor this sub when done.
# No sub should access global data imho.
# -----------------
sub PrintItems
{
    print "Frame ".($cache_size-$framesize+1)." - $cache_size\n";
    for my $nbr (sort {$a<=>$b} keys %Items) {
        print "number = $nbr, on lines = [ @{$Items{$nbr}} ]\n";
    }
}

__DATA__

01.01.98
99
31
33
14
7
35
16
20
20
13
55
1
1
7
0
2
3
0
2
3
0
2
3
0
2
3
0
2
3
0


---------------
Output:


c:\temp>perl frames.pl

Adding 99 (line 3)

Adding 31 (line 4)

Adding 33 (line 5)

Adding 14 (line 6)

Adding 7 (line 7)
Frame 1 - 5
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 99, on lines = [ 3 ]

Adding 35 (line 8)
Taking 99 off (line 3)
Frame 2 - 6
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 16 (line 9)
Taking 31 off (line 4)
Frame 3 - 7
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 10)
Taking 33 off (line 5)
Frame 4 - 8
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 11)
Taking 14 off (line 6)
Frame 5 - 9
number = 7, on lines = [ 7 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 13 (line 12)
Taking 7 off (line 7)
Frame 6 - 10
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 55 (line 13)
Taking 35 off (line 8)
Frame 7 - 11
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 14)
Taking 16 off (line 9)
Frame 8 - 12
number = 1, on lines = [ 14 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 15)
Taking 20 off (line 10)
Frame 9 - 13
number = 1, on lines = [ 14 15 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 11 ]
number = 55, on lines = [ 13 ]

Adding 7 (line 16)
Taking 20 off (line 11)
Frame 10 - 14
number = 1, on lines = [ 14 15 ]
number = 7, on lines = [ 16 ]
number = 13, on lines = [ 12 ]
number = 55, on lines = [ 13 ]

Adding 2 (line 18)
Taking 13 off (line 12)
Frame 11 - 15
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 7, on lines = [ 16 ]
number = 55, on lines = [ 13 ]

Adding 3 (line 19)
Taking 55 off (line 13)
Frame 12 - 16
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 21)
Taking 1 off (line 14)
Frame 13 - 17
number = 1, on lines = [ 15 ]
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 3 (line 22)
Taking 1 off (line 15)
Frame 14 - 18
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 22 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 24)
Taking 7 off (line 16)
Frame 15 - 19
number = 2, on lines = [ 18 21 24 ]
number = 3, on lines = [ 19 22 ]

Adding 3 (line 25)
Taking 2 off (line 18)
Frame 16 - 20
number = 2, on lines = [ 21 24 ]
number = 3, on lines = [ 19 22 25 ]

Adding 2 (line 27)
Taking 3 off (line 19)
Frame 17 - 21
number = 2, on lines = [ 21 24 27 ]
number = 3, on lines = [ 22 25 ]

Adding 3 (line 28)
Taking 2 off (line 21)
Frame 18 - 22
number = 2, on lines = [ 24 27 ]
number = 3, on lines = [ 22 25 28 ]

Adding 2 (line 30)
Taking 3 off (line 22)
Frame 19 - 23
number = 2, on lines = [ 24 27 30 ]
number = 3, on lines = [ 25 28 ]

Adding 3 (line 31)
Taking 2 off (line 24)
Frame 20 - 24
number = 2, on lines = [ 27 30 ]
number = 3, on lines = [ 25 28 31 ]

c:\temp>
 
sln

[snip]
# Digits only, anything else is invalid
/^\s*(\d+)\s*$/;
next if (!$1);
        ^^^^^
That test also skips a line containing 0. It should be:
next if (!defined $1);

Oops. I always say 'check your work'. Gotcha on me!
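
A somewhat more robust way to write that check, sketched here with a made-up helper name `extract_number`, is to test the result of the match itself rather than `$1`. A failed match does not reset `$1`, and a plain truth test on `$1` rejects a legitimate 0:

```perl
use strict;
use warnings;

# Hypothetical helper: return the number on a digits-only line,
# or undef for anything else (dates, blank lines, text).
sub extract_number {
    my ($text) = @_;
    return $text =~ /^\s*(\d+)\s*$/ ? $1 : undef;
}

for my $text ("0\n", " 20 \n", "01.01.98\n", "\n") {
    my $number = extract_number($text);
    next unless defined $number;    # keeps 0, skips non-numbers
    print "number: $number\n";
}
```

Inside the read loop the same idea can be written as `next unless /^\s*(\d+)\s*$/;`, which never looks at a stale capture.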

-sln
 
