I am so lost... sort and writing a shell script in Perl

Estella

Hello, I just learnt Perl scripting, and I have been trying to do this
hw assignment, and I got so stuck in sorting a file with the key that
is calculated using the fields in the file. Here is what I have to do:

There is a file that contains county name, population size, water area
(in square miles), land area (in square miles).
Adams County 16428 4.73 1924.96
Asotin County 20551 5.34 635.34
Benton County 142475 57.03 1703.09
Chelan County 66616 72.25 2921.37
Clallam County 64525 930.89 1739.45
Clark County 345238 27.99 628.22
....

So we need to calculate the population density and water percentage,
and then print out the ascending order of the population density, and
also ascending order of the water percentage.

I did something like this, but I couldn't sort the list.
#!/net/local/bin/perl

while (<>) {
my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
(warn "bad format on line $.:$_"), next;

$popden = $bb/$dd;
$waterpec = ($cc/($cc+$dd))*100;

printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);

#open FH, ">> $tmp" or die $!;
}

foreach (sort keys %popden) {
printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
}

I tried to look at a lot of sort examples online, but I am still
lost... Is there something wrong with my logic? Or do I have to do
something more, like writing the list to a file first and then sorting it
again? I don't know.
Thanks for helping...
 
gnari

[assignment]

I am not going to do your assignment, but maybe a few hints
while (<>)
...

printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);

probably you do not want to print out at this stage,
but rather collect the data in some sortable structure,
like an array
push @list,[$aa, $popden, $waterpec];
#open FH, ">> $tmp" or die $!;
it scares me to see this comment inside the loop!

at this stage you need to figure out how to sort @list
by the correct value
....

and then
foreach (@list) {
printf("%s %.2f %.2f%%\n", $_->[0], $_->[1], $_->[2]);
}
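
For readers following along, the sort step left open above could look roughly like the sketch below. It assumes @list holds array references of the form [name, density, water percentage], as in the push shown earlier; the rows are computed from the sample data in the question, and the names @by_density and @by_water are only illustrative.

```perl
use strict;
use warnings;

# Each row is [ county name, population density, water percentage ],
# computed from sample rows in the question.
my @list = (
    [ 'Clark County',   345238 / 628.22,  27.99 / (27.99 + 628.22) * 100 ],
    [ 'Adams County',   16428 / 1924.96,  4.73 / (4.73 + 1924.96) * 100 ],
    [ 'Clallam County', 64525 / 1739.45,  930.89 / (930.89 + 1739.45) * 100 ],
);

# Numeric ascending sort: index 1 is the density, index 2 the water percentage.
my @by_density = sort { $a->[1] <=> $b->[1] } @list;
my @by_water   = sort { $a->[2] <=> $b->[2] } @list;

printf("%s %.2f %.2f%%\n", @$_) for @by_density;
printf("%s %.2f %.2f%%\n", @$_) for @by_water;
```

The comparison block receives two elements of @list as $a and $b, so $a->[1] picks the density out of each row.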
 
Tore Aursand

There is a file that contains county name, population size, water area
(in square miles), land area (in square miles).

Adams County 16428 4.73 1924.96
Asotin County 20551 5.34 635.34
Benton County 142475 57.03 1703.09
Chelan County 66616 72.25 2921.37
Clallam County 64525 930.89 1739.45
Clark County 345238 27.99 628.22

#!/net/local/bin/perl

Please add these:

use strict;
use warnings;
while (<>) {
my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
(warn "bad format on line $.:$_"), next;

$popden = $bb/$dd;
$waterpec = ($cc/($cc+$dd))*100;

printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);

#open FH, ">> $tmp" or die $!;
}

IMO, better written as:

my %counties = ();
while ( <> ) {
chomp;
if ( /^(.*?)\s+(\d+)\s+(.*?)\s+(.*)$/ ) {
$counties{$1} = {
'population' => $2,
'water' => $3,
'land' => $4,
'pop_density' => $2 / $4,
'water_perc' => ($3 / ($3 + $4)) * 100,
};
}
else {
# Error handling
}
}
foreach (sort keys %popden) {
printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
}

Sorting (the arrays will consist of the hash keys):

my @sorted_pop_density = sort {
$counties{$a}->{'pop_density'} <=> $counties{$b}->{'pop_density'}
} keys %counties;

my @sorted_water_perc = sort {
$counties{$a}->{'water_perc'} <=> $counties{$b}->{'water_perc'}
} keys %counties;

I tried to look at a lot of sort examples online [...]

The FAQ covers a bit of this subject (ie. how to sort a hash on key and/or
value).

All my code above is untested.
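
A minimal end-to-end sketch of the hash-of-hashes approach, for anyone who wants to try it. The inline @lines array stands in for the input file, and the field pattern is slightly stricter than the one above; both are assumptions for illustration, not part of the original suggestion.

```perl
use strict;
use warnings;

# Inline sample rows standing in for the input file (an assumption).
my @lines = (
    "Adams County 16428 4.73 1924.96",
    "Clallam County 64525 930.89 1739.45",
    "Clark County 345238 27.99 628.22",
);

my %counties;
for my $line (@lines) {
    if ( $line =~ /^(.*?)\s+(\d+)\s+(\S+)\s+(\S+)$/ ) {
        my ($name, $population, $water, $land) = ($1, $2, $3, $4);
        $counties{$name} = {
            pop_density => $population / $land,
            water_perc  => $water / ($water + $land) * 100,
        };
    }
    else {
        warn "bad format: $line\n";
    }
}

# Sort the county names by the value stored under each name.
my @by_density = sort { $counties{$a}{pop_density} <=> $counties{$b}{pop_density} } keys %counties;
my @by_water   = sort { $counties{$a}{water_perc}  <=> $counties{$b}{water_perc}  } keys %counties;

printf("%-16s %8.2f\n",   $_, $counties{$_}{pop_density}) for @by_density;
printf("%-16s %7.2f%%\n", $_, $counties{$_}{water_perc})  for @by_water;
```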


--
Tore Aursand
"Scientists are complaining that the new "Dinosaur" movie shows
dinosaurs with lemurs, who didn't evolve for another million years.
They're afraid the movie will give kids a mistaken impression. What
about the fact that the dinosaurs are singing and dancing?" (Jay Leno)
 
Estella

Christian Winter said:
Estella said:
Hello, I just learnt Perl scripting, and I have been trying to do this
hw assignment, and I got so stuck in sorting a file with the key that
is calculated using the fields in the file. Here is what I have to do:

There is a file that contains county name, population size, water area
(in square miles), land area (in square miles).
Adams County 16428 4.73 1924.96
Asotin County 20551 5.34 635.34
Benton County 142475 57.03 1703.09
Chelan County 66616 72.25 2921.37
Clallam County 64525 930.89 1739.45
Clark County 345238 27.99 628.22
....

So we need to calculate the population density and water percentage,
and then print out the ascending order of the population density, and
also ascending order of the water percentage.

I did something like this, but I couldn't sort the list.

Well, you seem to be a little confused with variable types
in perl. You are treating $popden as a scalar in the first
place, but then you try to access it as a hash.

Maybe you should look into "perldoc perlvar" and "perldoc perlref"
as well as "perldoc perldata" where you find a lot of information
on data types and nested data structures.

For your kind of problem, a good approach will be an array
of hashes, because every entry has more than one value assigned
to it (name, population density and water percentage) and you
need it in a sorted order.

You may, of course, also use a hash of hashes and only sort
it when printing your data, but that would make it even harder
to read and understand (IMHO).
#!/net/local/bin/perl

while (<>) {
my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
(warn "bad format on line $.:$_"), next;

Try using a little more explicit variable names here. Imagine
you access your code after a year without having touched it in
between. You will find it hard to understand what they mean,
and for all usenet folks looking at your example it isn't any better.
$popden = $bb/$dd;
$waterpec = ($cc/($cc+$dd))*100;

printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);

#open FH, ">> $tmp" or die $!;
}

foreach (sort keys %popden) {
printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
}

Of course this won't work, as there isn't anything like a
hash %popden. You should *really* start your scripts by
calling perl with "-w" in the shebang line or "use warnings;",
as well as "use strict;". This would have made this mistake
obvious.

Also your sort call needs a code block that tells it what to
sort by; this way it would just sort lexically on the hash
key itself.
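
To illustrate the difference between the default lexical sort and a numeric one, here is a small sketch. The values are densities rounded from the sample rows; the variable names are made up for the example.

```perl
use strict;
use warnings;

my @densities = (83.66, 8.53, 549.55, 32.35);

# With no block, sort compares the values as strings.
my @as_strings = sort @densities;
# With a block using <=>, sort compares them as numbers.
my @as_numbers = sort { $a <=> $b } @densities;

print "string order:  @as_strings\n";   # 32.35 549.55 8.53 83.66
print "numeric order: @as_numbers\n";   # 8.53 32.35 83.66 549.55
```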
I tried to look at a lot of sort examples online, but I am still
lost...is that something wrong with my logic? or I have to do
something more, like writing the list to a file first and then sort it
again..or..I dunno.

If you have built your data structures right, a look at
"perldoc -f sort" (actually, you can get help on any perl built-in
function by typing "perldoc -f FUNCTIONNAME" on the command line)
should be sufficient.

As you should do your homework yourself, I'm just putting in
the relevant lines like I would write them:

Create an array to hold your entries:
my @countydata;

To capture the needed values:
chomp;
my ($name, $population, $water, $land) = /([^\t]+)/g;
# Match any non-tab group of chars

Create a hash to hold the name and calculation results:
my %tempdata;
$tempdata{"name"} = $name;
$tempdata{"density"} = $population / $land;
$tempdata{"percent"} = $water / ( $land + $water ) * 100;

And add it to your array:
push @countydata, \%tempdata;

Sorting your array will work like this:
@countydata = sort { $a->{"density"} <=> $b->{"density"} ||
$a->{"percent"} <=> $b->{"percent"}
} @countydata;

# Notice the "||" (= OR-Operator), whose right hand side
# will only be interpreted if the left hand evaluates to
# zero, which means equal.

You can now iterate through the sorted array with
foreach my $entry ( @countydata ) {
... process entries...
}

Inside the loop you can access the element's values like
print $entry->{"name"}.": density: ".$entry->{"density"}."\n";
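
Put together, the fragments above might be assembled roughly as follows. This is only a sketch: the inline tab-separated @lines array replaces reading from <>, and the printf format is an arbitrary choice.

```perl
use strict;
use warnings;

# Tab-separated sample rows standing in for the real input (an assumption).
my @lines = (
    "Clark County\t345238\t27.99\t628.22",
    "Adams County\t16428\t4.73\t1924.96",
    "Clallam County\t64525\t930.89\t1739.45",
);

my @countydata;
for my $line (@lines) {
    my ($name, $population, $water, $land) = $line =~ /([^\t]+)/g;
    # Declared inside the loop, so every row gets its own fresh hash.
    my %tempdata = (
        name    => $name,
        density => $population / $land,
        percent => $water / ( $land + $water ) * 100,
    );
    push @countydata, \%tempdata;
}

# Sort by density; ties fall through to the water percentage.
@countydata = sort { $a->{density} <=> $b->{density} ||
                     $a->{percent} <=> $b->{percent} } @countydata;

for my $entry (@countydata) {
    printf("%s: density %.2f, water %.2f%%\n", @{$entry}{qw(name density percent)});
}
```

Note that %tempdata has to be declared inside the loop; with a single hash declared outside, every array element would point at the same data.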

HTH
-Christian

Thank you so much, I got it right now.
 
Tsu-na-mi

Since all the replies I see seem overcomplex, if efficient, I will
provide my suggestion. Personally, I value readability and ease of
maintenance over efficiency, so this should be easy for you to
follow. I'll start by saying the file format is in incredibly bad
form, but I'll show you how I would deal with it.

Since " County " ends every county name, split on it. That
way "Main County" (2 words) and "Prince George County" (3 words) both
work, as does "County west of here County" because there is no leading
" " in front of the first one. If something is named "X County
County" or similar, it will break, however. If someone wants to
provide you a weird regular expression to deal with that (find the
last instance of " County " in the string) I will leave that exercise
to them. My solution will work for all reasonably expected values.
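
For what it's worth, one possible way to split at the last " County " is a greedy match anchored on the three numeric fields. The helper name parse_row is hypothetical, and the sketch assumes the fields are separated by whitespace as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: split a row at the LAST " County " before the numbers.
sub parse_row {
    my ($row) = @_;
    # The greedy (.*) takes as much as it can, so the literal " County"
    # matched after it is the last one followed by the three fields.
    if ( $row =~ /^(.*) County\s+(\d+)\s+(\S+)\s+(\S+)\s*$/ ) {
        return ($1, $2, $3, $4);
    }
    return;    # empty list for rows that do not fit
}

my @row = parse_row("X County County 100 1.5 20.0");
print join('|', @row), "\n";    # X County|100|1.5|20.0
```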

==========================================

# filename is first argument passed to script
$filename = shift @ARGV;

open(IN,$filename);
while ($in = <IN>) {
# lose trailing newline
chomp $in;
# split county name and variables
($county,$other) = split(" County ",$in);
($pop,$water,$land) = split(" ",$other);
# assign values to two hashes, keyed by county name
# sprintf limits to n decimal places
$pop_density{$county} = sprintf("%.1f",$pop/$land);
$water_pct{$county} = sprintf("%.2f",$water/$land);
}

# sort in ascending order
foreach $county (sort {$a<=>$b} values %pop_density) {
print "$county County : $pop_density{$county}\n";
}
# sort in descending order
foreach $county (reverse sort {$a<=>$b} values %water_pct) {
print "$county County : $water_pct{$county}\n";
}
# sort by county name
foreach $county (sort keys %water_pct) {
print "$county County : $pop_density{$county} ,
$water_pct{$county}\n";
}

exit;

============================================

It would be better if you used a printf() statement when you printed
them out so you can have nice columns, etc.
 
Uri Guttman

T> Since all the replies I see seem overcomplex, if efficient, I will
T> provide my suggestion. Personally, I value readability and ease of
T> maintenance over efficiency, so this should be easy for you to
T> follow. I'll start by saying the file format is in incredibly bad
T> form, but I'll show you how I would deal with it.

and i value correctness over readability.

no strict
no warnings

you get no cookie.

T> open(IN,$filename);

always test the result of open.
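
a minimal sketch of a checked open, using the three-argument form and a lexical filehandle. the helper name read_lines and the file name are made up for the example, and the file is assumed not to exist so that the failure path is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file, dying with the OS error on failure.
sub read_lines {
    my ($filename) = @_;
    # $! holds the reason when open fails.
    open(my $in, '<', $filename) or die "Cannot open '$filename': $!\n";
    chomp(my @lines = <$in>);
    close($in);
    return @lines;
}

# Demonstrate the failure path with a file name that is assumed not to exist.
my @lines = eval { read_lines('no-such-counties-file.txt') };
my $error = $@;
print "open failed: $error" if $error;
```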


T> while ($in = <IN>) {

T> $pop_density{$county} = sprintf("%.1f",$pop/$land);
T> $water_pct{$county} = sprintf("%.2f",$water/$land);

ok, you have two hashes keyed by county with number values.

T> # sort in ascending order
T> foreach $county (sort {$a<=>$b} values %pop_density) {
T> print "$county County : $pop_density{$county}\n";

hmmm, what does values %hash return? its values, which are
numbers. great. so you loop over them and print out the numbers followed
by the word 'County' and then the pop_density of a county named for a
number.

nice work!

very readable too!

at least mark your post with <untested and broken code>

T> It would be better if you used a printf() statement when you printed
T> them out so you can have nice columns, etc.

it would have been better if your code was tested and correct.

uri
 
Tsu-na-mi

T> $pop_density{$county} = sprintf("%.1f",$pop/$land);
T> $water_pct{$county} = sprintf("%.2f",$water/$land);

ok, you have two hashes keyed by county with number values.

T> # sort in ascending order
T> foreach $county (sort {$a<=>$b} values %pop_density) {
T> print "$county County : $pop_density{$county}\n";

hmmm, what does values %hash return? its values, which are
numbers. great. so you loop over them and print out the numbers followed
by the word 'County' and then the pop_density of a county named for a
number.

oops. should be

(sort {$hash{$a}<=>$hash{$b}} keys %hash)
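
Applied to the density hash, the corrected loop might look like the sketch below. The hash contents are densities rounded from the sample rows, and @ascending and @descending are illustrative names only.

```perl
use strict;
use warnings;

# Densities keyed by county name (rounded from the sample rows in the question).
my %pop_density = ( Adams => 8.5, Clark => 549.5, Asotin => 32.3 );

# Sort the KEYS, comparing the values they map to, so each $county is a real key.
my @ascending  = sort { $pop_density{$a} <=> $pop_density{$b} } keys %pop_density;
my @descending = reverse @ascending;    # or swap $a and $b in the block

for my $county (@ascending) {
    print "$county County : $pop_density{$county}\n";
}
```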

That'll teach me to code off something from memory without actually
thinking about what it's doing. >_<
 
Tore Aursand

My solution will work for all reasonably expected values.

Rule #1 in my programming life: Never expect that your application will be
fed "reasonable values". Never. Never. Ever!
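
As a rough illustration of that rule, a row could be checked before any arithmetic is done on it. The helper parse_checked and its rejection rules (non-numeric fields, zero land area) are assumptions for this sketch, not something specified in the thread.

```perl
use strict;
use warnings;

# Hypothetical validator: returns the parsed fields, or an empty list if the row is unusable.
sub parse_checked {
    my ($row) = @_;
    my ($name, $population, $water, $land) =
        $row =~ /^(.+?)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/
        or return;
    # A zero land area would cause a division by zero in the density calculation.
    return if $land == 0;
    return ($name, $population, $water, $land);
}

for my $row ("Adams County 16428 4.73 1924.96", "Nowhere County 10 1.0 0", "garbage") {
    my @fields = parse_checked($row);
    print @fields ? "ok:  $fields[0]\n" : "bad: $row\n";
}
```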
 
