I am so lost... sort and writing a shell script in Perl

Discussion in 'Perl Misc' started by Estella, Apr 28, 2004.

  1. Estella

    Estella Guest

    Hello, I just learnt Perl scripting, and I have been trying to do this
    hw assignment, and I got so stuck in sorting a file with the key that
    is calculated using the fields in the file. Here is what I have to do:

    There is a file that contains county name, population size, water area
    (in square miles), land area (in square miles).
    Adams County 16428 4.73 1924.96
    Asotin County 20551 5.34 635.34
    Benton County 142475 57.03 1703.09
    Chelan County 66616 72.25 2921.37
    Clallam County 64525 930.89 1739.45
    Clark County 345238 27.99 628.22
    ....

    So we need to calculate the population density and water percentage,
    and then print out the ascending order of the population density, and
    also ascending order of the water percentage.

    I did something like this, but I couldn't sort the list.
    #!/net/local/bin/perl

    while (<>) {
    my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
    (warn "bad format on line $.:$_"), next;

    $popden = $bb/$dd;
    $waterpec = ($cc/($cc+$dd))*100;

    printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);

    #open FH, ">> $tmp" or die $!;
    }

    foreach (sort keys %popden) {
    printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
    }

    I tried to look at a lot of sort examples online, but I am still
    lost... Is there something wrong with my logic? Or do I have to do
    something more, like writing the list to a file first and then sorting it
    again... or... I dunno.
    Thanks for helping...
     
    Estella, Apr 28, 2004
    #1

  2. Estella

    gnari Guest

    "Estella" <> wrote in message
    news:...
    [assignment]

    I am not going to do your assignment, but maybe a few hints

    > while (<>)


    > ...
    >
    > printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);


    probably you do not want to print out at this stage,
    but rather collect the data in some sortable structure,
    like an array
    push @list,[$aa, $popden, $waterpec];

    >
    > #open FH, ">> $tmp" or die $!;

    it scares me to see this comment inside the loop!

    > }
    >


    at this stage you need to figure out how to sort @list
    by the correct value
    ....

    and then
    foreach (@list) {
    printf("%s %.2f %.2f%%\n", $_->[0], $_->[1], $_->[2]);
    }
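One way the missing sort step could look, as a minimal sketch rather than the assignment's required answer: each element of @list is an array reference, so the comparison block dereferences element 1 (the density). The three sample rows are values rounded from the data in the first post, and @by_density is a name made up for this example.

```perl
use strict;
use warnings;

# Rows in the [name, density, water percentage] layout pushed above.
# The numbers are rounded from three lines of the sample data.
my @list = (
    [ 'Clark County',  549.55, 4.27 ],
    [ 'Adams County',    8.53, 0.25 ],
    [ 'Asotin County',  32.35, 0.83 ],
);

# Compare element 1 (the density) numerically, smallest first.
# @by_density is a hypothetical name for the sorted copy.
my @by_density = sort { $a->[1] <=> $b->[1] } @list;

foreach (@by_density) {
    printf("%s %.2f %.2f%%\n", $_->[0], $_->[1], $_->[2]);
}
```

This prints Adams, Asotin and Clark in that order. Changing the index to 2 in the sort block would order the rows by water percentage instead.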
     
    gnari, Apr 28, 2004
    #2

  3. Estella

    Tore Aursand Guest

    On Tue, 27 Apr 2004 21:46:48 -0700, Estella wrote:
    > There is a file that contains county name, population size, water area
    > (in square miles), land area (in square miles).
    >
    > Adams County 16428 4.73 1924.96
    > Asotin County 20551 5.34 635.34
    > Benton County 142475 57.03 1703.09
    > Chelan County 66616 72.25 2921.37
    > Clallam County 64525 930.89 1739.45
    > Clark County 345238 27.99 628.22
    >
    > #!/net/local/bin/perl


    Please add these:

    use strict;
    use warnings;

    > while (<>) {
    > my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
    > (warn "bad format on line $.:$_"), next;
    >
    > $popden = $bb/$dd;
    > $waterpec = ($cc/($cc+$dd))*100;
    >
    > printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);
    >
    > #open FH, ">> $tmp" or die $!;
    > }


    IMO, better written as:

    my %counties = ();
    while ( <> ) {
        chomp;
        if ( /^(.*?)\s+(\d+)\s+(.*?)\s+(.*)$/ ) {
            $counties{$1} = {
                'population'  => $2,
                'water'       => $3,
                'land'        => $4,
                'pop_density' => $2 / $4,
                'water_perc'  => ($3 / ($3 + $4)) * 100,
            };
        }
        else {
            # Error handling
        }
    }

    > foreach (sort keys %popden) {
    > printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
    > }


    Sorting (the arrays will consist of the hash keys):

    my @sorted_pop_density = sort {
        $counties{$a}->{'pop_density'} <=> $counties{$b}->{'pop_density'}
    } keys %counties;

    my @sorted_water_perc = sort {
        $counties{$a}->{'water_perc'} <=> $counties{$b}->{'water_perc'}
    } keys %counties;
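To show how sorted key lists like these might be used for the two reports, here is a small self-contained sketch. It assumes a %counties hash shaped like the one above but fills it from three hard-coded rows of the sample data instead of reading a file; the report() helper is hypothetical and only exists for this example.

```perl
use strict;
use warnings;

# Stand-in for the %counties hash built by the parsing loop; the raw
# figures are three rows of the sample data from the first post.
my %counties;
for my $row ( [ 'Adams County',   16428,  4.73,   1924.96 ],
              [ 'Clallam County', 64525,  930.89, 1739.45 ],
              [ 'Clark County',   345238, 27.99,  628.22 ] ) {
    my ( $name, $pop, $water, $land ) = @$row;
    $counties{$name} = {
        pop_density => $pop / $land,
        water_perc  => ( $water / ( $water + $land ) ) * 100,
    };
}

# report() is a hypothetical helper: sort the county names by one
# field in ascending order, print that field, and return the names.
sub report {
    my ($field) = @_;
    my @sorted = sort { $counties{$a}{$field} <=> $counties{$b}{$field} }
                 keys %counties;
    printf "%-16s %10.2f\n", $_, $counties{$_}{$field} for @sorted;
    return @sorted;
}

my @by_density = report('pop_density');
print "\n";
my @by_water = report('water_perc');
```

With these three rows the two orderings differ: Clallam County comes second by density but last by water percentage, which is why the hash is sorted once per report instead of being stored in a fixed order.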

    > I tried to look at a lot of sort examples online [...]


    The FAQ covers a bit of this subject (i.e. how to sort a hash by key
    and/or value); try "perldoc -q sort".

    All my code above is untested.


    --
    Tore Aursand <>
    "Scientists are complaining that the new "Dinosaur" movie shows
    dinosaurs with lemurs, who didn't evolve for another million years.
    They're afraid the movie will give kids a mistaken impression. What
    about the fact that the dinosaurs are singing and dancing?" (Jay Leno)
     
    Tore Aursand, Apr 28, 2004
    #3
  4. Estella

    Estella Guest

    Christian Winter <> wrote in message news:<408f5ce2$0$26346$-online.net>...
    > Estella schrieb:
    > > Hello, I just learnt Perl scripting, and I have been trying to do this
    > > hw assignment, and I got so stuck in sorting a file with the key that
    > > is calculated using the fields in the file. Here is what I have to do:
    > >
    > > There is a file that contains county name, population size, water area
    > > (in square miles), land area (in square miles).
    > > Adams County 16428 4.73 1924.96
    > > Asotin County 20551 5.34 635.34
    > > Benton County 142475 57.03 1703.09
    > > Chelan County 66616 72.25 2921.37
    > > Clallam County 64525 930.89 1739.45
    > > Clark County 345238 27.99 628.22
    > > ....
    > >
    > > So we need to calculate the population density and water percentage,
    > > and then print out the ascending order of the population density, and
    > > also ascending order of the water percentage.
    > >
    > > I did something like this, but I couldn't sort the list.

    >
    > Well, you seem to be a little confused with variable types
    > in perl. You are treating $popden as a scalar in the first
    > place, but then you try to access it as a hash.
    >
    > Maybe you should look into "perldoc perlvar" and "perldoc perlref"
    > as well as "perldoc perldata" where you find a lot of information
    > on data types and nested data structures.
    >
    > For your kind of problem, a good approach will be an array
    > of hashes, because every entry has more than one value assigned
    > to it (name, population density and water percentage) and you
    > need it in a sorted order.
    >
    > You may, of course, also use a hash of hashes and only sort
    > it when printing your data, but that would make it even harder
    > to read and understand (IMHO).
    >
    > > #!/net/local/bin/perl
    > >
    > > while (<>) {
    > > my($aa, $bb, $cc, $dd) = /^(\w+.+)\t(\d+)\t(\d+.\d+)\t(\d+.\d+)/ or
    > > (warn "bad format on line $.:$_"), next;

    >
    > Try using a little more explicit variable names here. Imagine
    > you access your code after a year without having touched it in
    > between. You will find it hard to understand what they mean,
    > and for all usenet folks looking at your example it isn't any better.
    >
    > >
    > > $popden = $bb/$dd;
    > > $waterpec = ($cc/($cc+$dd))*100;
    > >
    > > printf("%s %d %.2f%%\n", $aa, $popden, $waterpec);
    > >
    > > #open FH, ">> $tmp" or die $!;
    > > }
    > >
    > > foreach (sort keys %popden) {
    > > printf("%s %.2f %.2f%%\n", $aa, $popden, $waterpec);
    > > }

    >
    > Of course this won't work, as there isn't anything like a
    > hash %popden. You should *really* start your scripts by
    > calling perl with "-w" in the shebang line or "use warnings;",
    > as well as "use strict;". This would have made this mistake
    > obvious.
    >
    > Also your sort call needs a code block that tells it what to
    > sort by; as written, it would just sort lexically on the hash
    > key itself.
    >
    > > I tried to look at a lot of sort examples online, but I am still
    > > lost... Is there something wrong with my logic? Or do I have to do
    > > something more, like writing the list to a file first and then sorting it
    > > again... or... I dunno.

    >
    > If you have built your data structures right, a look at
    > "perldoc -f sort" (actually, you can get help on any perl built-in
    > function by typing "perldoc -f FUNCTIONNAME" on the command line)
    > should be sufficient.
    >
    > As you should do your homework yourself, I'm just putting in
    > the relevant lines like I would write them:
    >
    > Create an array to hold your entries:
    > my @countydata;
    >
    > To capture the needed values:
    > chomp;
    > my ($name, $population, $water, $land) = /([^\t]+)/g;
    > # Match any non-tab group of chars
    >
    > Create a hash to hold the name and calculation results:
    > my %tempdata;
    > $tempdata{"name"} = $name;
    > $tempdata{"density"} = $population / $land;
    > $tempdata{"percent"} = $water / ( $land + $water ) * 100;
    >
    > And add it to your array:
    > push @countydata, \%tempdata;
    >
    > Sorting your array will work like this:
    > @countydata = sort { $a->{"density"} <=> $b->{"density"} ||
    > $a->{"percent"} <=> $b->{"percent"}
    > } @countydata;
    >
    > # Notice the "||" (= OR-Operator), whose right hand side
    > # will only be interpreted if the left hand evaluates to
    > # zero, which means equal.
    >
    > You can now iterate through the sorted array with
    > foreach my $entry ( @countydata ) {
    > ... process entries...
    > }
    >
    > Inside the loop you can access the element's values like
    > print $entry->{"name"}.": density: ".$entry->{"density"}."\n";
    >
    > HTH
    > -Christian
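Put together, the quoted fragments could be assembled roughly as follows. This is only a sketch under stated assumptions: the input lines are hard-coded in a made-up @lines array instead of being read from a file, and they are assumed to be tab-separated, as the regex in the original script implies.

```perl
use strict;
use warnings;

# Hypothetical input: three rows of the sample data, tab-separated.
my @lines = (
    "Adams County\t16428\t4.73\t1924.96\n",
    "Clark County\t345238\t27.99\t628.22\n",
    "Asotin County\t20551\t5.34\t635.34\n",
);

my @countydata;
foreach (@lines) {
    chomp;
    # Match any non-tab group of chars
    my ($name, $population, $water, $land) = /([^\t]+)/g;

    # Declared inside the loop, so every pushed reference points
    # to a fresh hash rather than to one shared hash.
    my %tempdata = (
        name    => $name,
        density => $population / $land,
        percent => $water / ( $land + $water ) * 100,
    );
    push @countydata, \%tempdata;
}

# Ascending by density; ties fall through to the water percentage.
@countydata = sort { $a->{density} <=> $b->{density}
                  || $a->{percent} <=> $b->{percent} } @countydata;

foreach my $entry (@countydata) {
    printf "%s: density %.2f, water %.2f%%\n",
        @{$entry}{qw(name density percent)};
}
```

The assignment asks for two separate orderings, so a second sort with the two comparisons swapped (or with only the percentage) would be needed for the water report.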


    Thank you so much, I got it right now.
     
    Estella, Apr 28, 2004
    #4
  5. Estella

    Tsu-na-mi Guest

    Since all the replies I see seem overcomplex, if efficient, I will
    provide my suggestion. Personally, I value readability and ease of
    maintenance over efficiency, so this should be easy for you to
    follow. I'll start by saying the file format is in incredibly bad
    form, but I'll show you how I would deal with it.

    Since " County " ends every county name, split on it. That
    way "Main County" (2 words) and "Prince George County" (3 words) both
    work, as does "County west of here County", because there is no leading
    " " in front of the first "County". If something is named "X County
    County" or something, it will break, however. If someone wants to
    provide you a weird regular expression to deal with that (find the
    last instance of " County " in the string) I will leave that exercise
    to them. My solution will work for all reasonably expected values.

    ==========================================

    # filename is first argument passed to script
    $filename = shift @ARGV;

    open(IN,$filename);
    while ($in = <IN>) {
    # lose trailing newline
    chomp $in;
    # split county name and variables
    ($county,$other) = split(" County ",$in);
    ($pop,$water,$land) = split(" ",$other);
    # assign values to two hashes, keyed by county name
    # sprintf limits to n decimal places
    $pop_density{$county} = sprintf("%.1f",$pop/$land);
    $water_pct{$county} = sprintf("%.2f",$water/$land);
    }

    # sort in ascending order
    foreach $county (sort {$a<=>$b} values %pop_density) {
    print "$county County : $pop_density{$county}\n";
    }
    # sort in descending order
    foreach $county (reverse sort {$a<=>$b} values %water_pct) {
    print "$county County : $water_pct{$county}\n";
    }
    # sort by county name
    foreach $county (sort keys %water_pct) {
    print "$county County : $pop_density{$county} , $water_pct{$county}\n";
    }

    exit;

    ============================================

    It would be better if you used a printf() statement when you printed
    them out so you can have nice columns, etc.

    --
    Dave
     
    Tsu-na-mi, Apr 29, 2004
    #5
  6. Estella

    Uri Guttman Guest

    >>>>> "T" == Tsu-na-mi <> writes:

    T> Since all the replies I see seem overcomplex, if efficient, I will
    T> provide my suggestion. Personally, I value readability and ease of
    T> maintenance over efficiency, so this should be easy for you to
    T> follow. I'll start by saying the file format is in incredibly bad
    T> form, but I'll show you how I would deal with it.

    and i value correctness over readability.

    no strict
    no warnings

    you get no cookie.

    T> open(IN,$filename);

    always test the result of open.
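For illustration, a minimal sketch of a checked open. It uses the three-argument form with a lexical filehandle, which is a swapped-in idiom and not what the quoted script does; read_lines and the file name are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# read_lines is a hypothetical helper, not part of the quoted script.
sub read_lines {
    my ($filename) = @_;

    # Stop with the system error message ($!) if the file cannot be opened.
    open(my $in, '<', $filename)
        or die "Cannot open '$filename': $!\n";

    chomp(my @lines = <$in>);
    close $in;
    return @lines;
}

# Assuming no file of this name exists, the failure is reported
# instead of the read loop silently seeing no input.
my @lines = eval { read_lines('no-such-file.txt') };
print "Error: $@" if $@;
```

Without the check, a mistyped file name leaves the while loop with nothing to read and the script prints nothing at all, which is hard to tell apart from an empty input file.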


    T> while ($in = <IN>) {

    T> $pop_density{$county} = sprintf("%.1f",$pop/$land);
    T> $water_pct{$county} = sprintf("%.2f",$water/$land);

    ok, you have two hashes keyed by county with number values.

    T> # sort in ascending order
    T> foreach $county (sort {$a<=>$b} values %pop_density) {
    T> print "$county County : $pop_density{$county}\n";

    hmmm, what does values %hash return? its values, which are
    numbers. great. so you loop over them and print out the numbers followed
    by the word 'County' and then the pop_density of a county named for a
    number.

    nice work!

    very readable too!

    at least mark your post with <untested and broken code>

    T> It would be better if you used a printf() statement when you printed
    T> them out so you can have nice columns, etc.

    it would have been better if your code was tested and correct.

    uri
     
    Uri Guttman, Apr 29, 2004
    #6
  7. Estella

    Tsu-na-mi Guest

    > T> $pop_density{$county} = sprintf("%.1f",$pop/$land);
    > T> $water_pct{$county} = sprintf("%.2f",$water/$land);
    >
    > ok, you have two hashes keyed by county with number values.
    >
    > T> # sort in ascending order
    > T> foreach $county (sort {$a<=>$b} values %pop_density) {
    > T> print "$county County : $pop_density{$county}\n";
    >
    > hmmm, what does values %hash return? its values, which are
    > numbers. great. so you loop over them and print out the numbers followed
    > by the word 'County' and then the pop_density of a county named for a
    > number.


    oops. should be

    (sort {$hash{$a}<=>$hash{$b}} keys %hash)

    That'll teach me to code off something from memory without actually
    thinking about what it's doing. >_<
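Applied to the %pop_density hash from the earlier script, the corrected pattern might look like the sketch below. The three densities are rounded from the sample data, and @descending is a name introduced only for this example.

```perl
use strict;
use warnings;

# Densities rounded from three rows of the sample data.
my %pop_density = ( 'Clark' => 549.5, 'Adams' => 8.5, 'Asotin' => 32.3 );

# Sort the keys, but compare the values they map to (ascending).
foreach my $county ( sort { $pop_density{$a} <=> $pop_density{$b} }
                     keys %pop_density ) {
    print "$county County : $pop_density{$county}\n";
}

# Swapping $a and $b in the comparison gives descending order.
my @descending = sort { $pop_density{$b} <=> $pop_density{$a} }
                 keys %pop_density;
```

Because the loop variable is now a key, both the county name and its value are available inside the loop, which is what the version using values %pop_density lost.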
     
    Tsu-na-mi, Apr 30, 2004
    #7
  8. Estella

    Tore Aursand Guest

    On Thu, 29 Apr 2004 14:26:42 -0700, Tsu-na-mi wrote:
    > My solution will work for all reasonably expected values.


    Rule #1 in my programming life: Never expect that your application will be
    fed with "reasonable values". Never. Never. Ever!


    --
    Tore Aursand <>
    "War is too serious a matter to entrust to military men." (Georges
    Clemenceau)
     
    Tore Aursand, Apr 30, 2004
    #8
