variable/subroutine visibility

Jürgen Exner

ela said:
Previously, George kindly designed and revised code for "voting" on the
most representative candidate in the __DATA__ below. His original design keeps
the subroutines query and ReadData separate, but since I need to read a file
with a million rows, I tried to embed query into ReadData and failed. There are
two problems (the relevant lines are marked with *************).

1) Although $a[0] prints as '02', I don't understand why it cannot be passed
to the subroutine successfully.
2) When query is called inside ReadData, the error message "Can't coerce
array into hash" appears at run time. While debugging I hard-coded the
argument to "01", but in the real program I want to pass $a[0] as the
argument. The strange thing is that if I pass $a[0], the error message does
not appear, but nothing is printed either.

You are missing
use strict; use warnings;
Had you added them, perl would have told you:

Global symbol "$firsttime" requires explicit package name at C:\tmp\t.pl line 5.
Global symbol "$pre_a" requires explicit package name at C:\tmp\t.pl line 6.
Global symbol "@a" requires explicit package name at C:\tmp\t.pl line 9.
Global symbol "$thr" requires explicit package name at C:\tmp\t.pl line 9.
Global symbol "$thr" requires explicit package name at C:\tmp\t.pl line 11.
Global symbol "$pre_a" requires explicit package name at C:\tmp\t.pl line 34.
Global symbol "$pre_a" requires explicit package name at C:\tmp\t.pl line 35.
Global symbol "$firsttime" requires explicit package name at C:\tmp\t.pl line 36.
Global symbol "$thr" requires explicit package name at C:\tmp\t.pl line 37.
Global symbol "$firsttime" requires explicit package name at C:\tmp\t.pl line 40.
Execution of C:\tmp\t.pl aborted due to compilation errors.

I would suggest fixing those first and then asking again.
Furthermore, the indentation of your code is, shall we say, imaginative,
which makes it very hard to read.
my @col;
my %data;
$firsttime = 1;
$pre_a = "dummy";

ReadData();
$_ = query($a[0],$thr); # *************

However, it appears to me that you neither declared nor defined @a
anywhere at this level. Of course I may be mistaken; it is difficult to
tell what is top-level code and what belongs to subs because of the poor
indentation.
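To illustrate the scoping point, here is a minimal sketch (the sub read_rows and the variable $last_id are hypothetical, not from the thread): an array declared with my inside a sub or loop exists only inside that block, so top-level code that needs the last ID seen has to use a variable declared at file scope and have the sub copy the value into it.

```perl
use strict;
use warnings;

# File-scoped lexical: visible to the top-level code and to the sub below.
my $last_id;

# Hypothetical reader: @a is declared with 'my' inside the loop body,
# so it does not exist outside that block.
sub read_rows {
    for my $line (@_) {
        my @a = split /\s*\|\s*/, $line, -1;
        $last_id = $a[0];    # copy the value out before @a goes out of scope
    }
    return;
}

read_rows('01|3|7', '02|4|7');

# print $a[0];   # would not compile under 'use strict': the top-level @a
#                # is an undeclared global ("Global symbol "@a" requires ...")
print "last ID seen: $last_id\n";    # prints "last ID seen: 02"
```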

jue

George Mpouras

It is not possible to start answering questions before all the data has been
read, because the last lines may change the processed values of IDs that
appeared earlier. So you cannot combine reading and answering into one
logical step.
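The condition behind this can be made concrete: answering for an ID early is only safe if no later row carries that ID again. A minimal sketch of such a check (the function name ids_are_contiguous is hypothetical, not from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if every ID occupies one contiguous run
# of rows. If an ID reappears after a different ID, its statistics are not
# final until the whole input has been read.
sub ids_are_contiguous {
    my %finished;    # IDs whose run of rows has already ended
    my $current;
    for my $id (@_) {
        next if defined $current && $id eq $current;
        return 0 if $finished{$id};              # ID seen again later: not grouped
        $finished{$current} = 1 if defined $current;
        $current = $id;
    }
    return 1;
}

print ids_are_contiguous(qw(01 01 02 02)) ? "grouped\n" : "not grouped\n";    # grouped
print ids_are_contiguous(qw(01 02 01))    ? "grouped\n" : "not grouped\n";    # not grouped
```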

I assume what you want is to avoid spending the time to read and process
all the input data again on every run. There are four answers to this.

1) Keep the program alive after reading, looping on STDIN.
2) Keep the program alive by implementing a multithreaded client/server
socket solution.
3) Write the data structure to a permanent file/directory structure on the
hard disk.
4) Load your parsed data into a Berkeley DB.
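Option 3 can be sketched with the core Storable module: parse once, write the structure with nstore, and retrieve it on later runs. This is a minimal sketch under stated assumptions: build_data() is a hypothetical stand-in for ReadData returning a tiny hand-written structure shaped like %data after post-processing, and the cache path is illustrative. A real version would also need to rebuild the cache whenever the input file changes, which is not handled here.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical cache location; in practice a fixed path next to the input.
my $cache = File::Spec->catfile(tempdir(CLEANUP => 1), 'lca_cache.stor');

# Stand-in for the expensive parse: ID => [ per-field { rank => [values] } ].
sub build_data {
    return { '01' => [ { 50 => ['3'], 25 => ['2', '6'] } ] };
}

# Load the parsed structure from disk if present; otherwise build it
# once and write it out (in portable network byte order) for the next run.
sub load_data {
    my ($file) = @_;
    return retrieve($file) if -e $file;
    my $data = build_data();
    nstore($data, $file) or die "Cannot store to $file: $!";
    return $data;
}

load_data($cache);                 # first run: builds and stores
my $second = load_data($cache);    # later run: reads the cached copy
print "cached ranks for 01: ",
    join(',', sort { $a <=> $b } keys %{ $second->{'01'}[0] }), "\n";
```

Run as is, it prints "cached ranks for 01: 25,50". Storable keeps nested hashes and arrays intact, which a plain tied DBM hash (option 4) would not do without an extra serialization layer.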

Below is an implementation of option 1.
---------------------------------------------------------






use strict;
use warnings;

my @col;
my %data;
ReadData();

select((select(STDOUT), $| = 1)[0]);    # unbuffer STDOUT so prompts appear at once
print <<stop_printing;
Welcome to the LCA shell. Ready to answer questions.
Type "exit" or "quit" at any moment to leave this shell.
Please type your question in the format

ID, THRESHOLD

to receive your answer.

stop_printing
print 'LCA> ';
while (<STDIN>)
{
    chomp;
    if (/(?i)(quit|exit)/) { print "Exit LCA.\n"; exit 0 }
    if (/^\s*$/) { print 'LCA> '; next }

    if ( /^\s*(.*?)\s*,\s*(\d+)\s*/ )
    {
        my ($id, $thr) = ($1, $2);
        if ( exists $data{$id} )
        {
            if ( $thr >= 0 && $thr <= 100 )
            {
                # answer with the ID and threshold the user typed
                $_ = query($id, $thr);
                print "ANSWER> Field=$_->[0],Value=@{$_->[1]}\n"
            }
            else
            {
                print "ERROR> THRESHOLD \"$thr\" should be an integer between 0 and 100\n"
            }
        }
        else
        {
            print "ERROR> The ID \"$id\" does not exist\n";
        }
    }
    else
    {
        print "ERROR> Bad query format, please use the syntax: ID, THRESHOLD\n"
    }

    print 'LCA> ';
}






sub ReadData
{
    print STDOUT "Please wait while the knowledge base is created.\n";
    while (<DATA>) {
        chomp;
        my @a = split /\s*\|\s*/, $_, -1;
        # the first line is the header: remember the field names (without ID)
        if (-1 == $#col) { push @col, @a[1..$#a]; next }
        unless (1+$#col == $#a) {
            warn "Skip line number $. \"$_\" because it has "
               . (1+$#a) . " fields, while it should have " . (2+$#col) . "\n";
            next;
        }
        # [0] counts the rows of this ID; [1][field][0]{value} counts each value
        $data{$a[0]}->[0]++;
        for (my $i=1; $i<=$#a; $i++) { $data{$a[0]}->[1]->[$i-1]->[0]->{$a[$i]}++ }
    }

    foreach my $id (keys %data)
    {
        foreach my $f ( @{$data{$id}->[1]} )
        {
            # group the values by their percentage (rank) within this ID
            foreach my $v ( keys %{$f->[0]} )
            {
                push @{ $f->[1]->{int 100*( $f->[0]->{$v}/$data{$id}->[0])} }, $v
            }

            # remove unnecessary structures
            $f = $f->[1]
        }

        # remove unnecessary structures
        $data{$id} = $data{$id}->[1]
    }

    #use Data::Dumper; print Dumper(\%data);exit;
    print STDOUT ((scalar keys %data) . " entries loaded and calculated.\n\n");
}


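sub query
{
    # $_[0] = ID, $_[1] = threshold (percent).
    # Scan the fields from last to first; within a field try the ranks from
    # highest to lowest and return [field name, [values]] for the first
    # rank that reaches the threshold.
    for (my $i = $#{$data{$_[0]}}; $i >= 0; $i--)
    {
        foreach my $RANK (sort {$b <=> $a} keys %{$data{$_[0]}->[$i]})
        {
            return [$col[$i], $data{$_[0]}->[$i]->{$RANK}] if $RANK >= $_[1]
        }
    }

    ['', []]    # no field reaches the threshold
}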


__DATA__
ID|B|C|D|E|F|G|H
01|3|7|9|3|4|2|3
01|3|7|9|3|4|2|2
01|3|7|9|5|8|6|6
01|3|7|9|3|4|2|3
02|4|7|9|3|4|2|1
02|4|7|9|3|4|2|2
02|4|7|9|3|4|2|3
02|4|7|9|3|4|2|3
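To see what ReadData stores, here is a hand-worked sketch of its rank computation for a single field, column H of ID 01 in the sample data; the variable names are illustrative. With a threshold of 10, query scans the fields from the last one backwards and returns the values at the highest rank that reaches the threshold, so for ID 01 it would report field H with value 3.

```perl
use strict;
use warnings;

# Worked example of the rank step in ReadData for one field:
# rank = int(100 * occurrences / rows_for_this_ID).
# The values below are column H of ID 01 in the sample __DATA__.
my @column_h = (3, 2, 6, 3);

my %count;
$count{$_}++ for @column_h;            # 3 => 2, 2 => 1, 6 => 1

my %by_rank;                           # rank => [values with that rank]
for my $v (keys %count) {
    push @{ $by_rank{ int(100 * $count{$v} / @column_h) } }, $v;
}
@$_ = sort { $a <=> $b } @$_ for values %by_rank;   # remove hash-order randomness

for my $rank (sort { $b <=> $a } keys %by_rank) {
    print "rank $rank: @{ $by_rank{$rank} }\n";
}
# rank 50: 3
# rank 25: 2 6
```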

ela

Previously, George kindly designed and revised code for "voting" on the
most representative candidate in the __DATA__ below. His original design keeps
the subroutines query and ReadData separate, but since I need to read a file
with a million rows, I tried to embed query into ReadData and failed. There are
two problems (the relevant lines are marked with *************).

1) Although $a[0] prints as '02', I don't understand why it cannot be passed
to the subroutine successfully.
2) When query is called inside ReadData, the error message "Can't coerce
array into hash" appears at run time. While debugging I hard-coded the
argument to "01", but in the real program I want to pass $a[0] as the
argument. The strange thing is that if I pass $a[0], the error message does
not appear, but nothing is printed either.

my @col;
my %data;
$firsttime = 1;
$pre_a = "dummy";

ReadData();
$_ = query($a[0],$thr); # *************
print "Field=$_->[0],Value=@{$_->[1]}\n";
$_ = query('02',$thr);
print "Field=$_->[0],Value=@{$_->[1]}\n";

sub query
{
for(my $i=$#{$data{$_[0]}}; $i>=0; $i--)
{
#foreach my $RANK (keys %{$data{$_[0]}->[$i]})
foreach my $RANK (sort {$b <=> $a} keys %{$data{$_[0]}->[$i]})

{
return [$col[$i], $data{$_[0]}->[$i]->{$RANK}] if $RANK >= $_[1]
}
}

['',[]]
}
sub ReadData
{
while (<DATA>) { chomp;
my @a = split /\s*\|\s*/, $_, -1;

if (-1 == $#col){push @col, @a[1..$#a] ;next}else {
if ($a[0] ne $pre_a) {
$pre_a = $a[0];
if ($firsttime != 1) {
$_ = query("01",$thr); # *************
print "Field=$_->[0],Value=@{$_->[1]}\n";
}
$firsttime = 0;
}
}

unless (1+$#col==$#a) {warn "Skip line number $. \"$_\" because it have ".(1+$#a)." fields, while it should have ".(1+$#col)."\n";next}
$data{$a[0]}->[0]++;
for(my $i=1;$i<=$#a;$i++){$data{$a[0]}->[1]->[$i-1]->[0]->{$a[$i]}++}}

foreach my $id (keys %data)
{
foreach my $f ( @{$data{$id}->[1]} )
{
foreach my $v ( keys %{$f->[0]} )
{
push @{ $f->[1]->{int 100*( $f->[0]->{$v}/$data{$id}->[0])} }, $v
}

# remove unnecessary structures
$f = $f->[1]
}

# remove unnecessary structures
$data{$id} = $data{$id}->[1]
}

#use Data::Dumper; print Dumper(\%data);exit;
}

__DATA__
ID|B|C|D|E|F|G|H
01|3|7|9|3|4|2|3
01|3|7|9|3|4|2|2
01|3|7|9|5|8|6|6
01|3|7|9|3|4|2|3
02|4|7|9|3|4|2|1
02|4|7|9|3|4|2|2
02|4|7|9|3|4|2|3
02|4|7|9|3|4|2|3
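A likely reason for both symptoms: query expects the post-processed %data (one rank table per field), but inside ReadData that post-processing has not run yet. Mid-read, $data{'01'} still holds the raw [row count, [tallies]] layout, so query ends up dereferencing an array ref as a hash, which is most likely what produces "Can't coerce array into hash" (an old pseudo-hash message; current perls say "Not a HASH reference"). With $a[0], the call happens just after the ID changed, so it presumably asks about the new ID, which has no rows yet, and gets back ['', []]. If, and only if, all rows of an ID are contiguous (as in the sample data), the tallies for an ID can be finished as soon as the next ID starts. The following is a minimal sketch of that idea, not the thread's code: finish_id, stream_answers and the threshold of 50 are hypothetical choices, and only one ID's counts are held in memory at a time.

```perl
use strict;
use warnings;

# Hypothetical sketch: answer the query for each ID as soon as its rows end.
# ASSUMES all rows of one ID are contiguous (as in the sample data).
my $THRESHOLD = 50;

# Same voting rule as query(): scan fields from last to first and return
# [field, [values]] for the first field whose top rank reaches the threshold.
sub finish_id {
    my ($cols, $rows, $tally) = @_;    # $tally->[$i]{$value} = occurrences
    for my $i (reverse 0 .. $#$tally) {
        my %by_rank;
        for my $v (keys %{ $tally->[$i] }) {
            push @{ $by_rank{ int(100 * $tally->[$i]{$v} / $rows) } }, $v;
        }
        my ($top) = sort { $b <=> $a } keys %by_rank;
        return [ $cols->[$i], [ sort @{ $by_rank{$top} } ] ] if $top >= $THRESHOLD;
    }
    return [ '', [] ];
}

# Read the input once; record [ID, answer] whenever the ID changes and at EOF.
sub stream_answers {
    my ($fh) = @_;
    my (@cols, $cur_id, $rows, @tally, @answers);
    my $flush = sub {
        push @answers, [ $cur_id, finish_id(\@cols, $rows, \@tally) ]
            if defined $cur_id;
    };
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\s*\|\s*/, $line, -1;
        if (!@cols) { @cols = @f[1 .. $#f]; next }    # header row
        next unless @f == @cols + 1;                   # skip malformed rows
        if (!defined $cur_id || $f[0] ne $cur_id) {    # a new ID starts
            $flush->();                                # previous ID is complete
            ($cur_id, $rows, @tally) = ($f[0], 0);
        }
        $rows++;
        $tally[ $_ - 1 ]{ $f[$_] }++ for 1 .. $#f;
    }
    $flush->();                                        # last ID at end of input
    return @answers;
}

my $sample = <<'END';
ID|B|C|D|E|F|G|H
01|3|7|9|3|4|2|3
01|3|7|9|3|4|2|2
01|3|7|9|5|8|6|6
01|3|7|9|3|4|2|3
02|4|7|9|3|4|2|1
02|4|7|9|3|4|2|2
02|4|7|9|3|4|2|3
02|4|7|9|3|4|2|3
END

open my $fh, '<', \$sample or die "Cannot open in-memory input: $!";
for my $answer (stream_answers($fh)) {
    my ($id, $res) = @$answer;
    print "ID=$id Field=$res->[0],Value=@{ $res->[1] }\n";
}
```

On the sample data this prints "ID=01 Field=H,Value=3" and "ID=02 Field=H,Value=3". If the input is not grouped by ID, George's objection applies and the whole file has to be read first.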