How do you sort a 2D array with column headers?

Dennis

I have a numerical array consisting of 5000 rows and 30 columns. The first row
consists of 30 ascii column labels for example L1,L2.....L30. I would like to
sort the column with the header L5 in ascending order leaving the header labels
intact on the first row.

I'm familiar with code like

@array = sort { $a->[1] <=> $b->[1] } @array;

and I have read perldoc -f sort.

But the code above doesn't allow me to sort the array by column labels.

How would I do that?


Any help would be appreciated.

Dennis
 
Greg Bacon

: I have a numerical array consisting of 5000 rows and 30 columns. The
: first row consists of 30 ascii column labels for example
: L1,L2.....L30. I would like to sort the column with the header L5 in
: ascending order leaving the header labels intact on the first row.

Assuming @array is an array of rows, you could use something similar
to the code below.

[14:53] ant% cat try
#! /usr/local/bin/perl

use warnings;
use strict;

sub find_column_index {
    my $a = shift;
    my $col = shift;

    my $header = $a->[0];
    my $colidx = 0;
    for (@$header) {
        last if $_ eq $col;
        ++$colidx;
    }

    $colidx >= @$header ? () : $colidx;
}

sub sort_by_column {
    my $m = shift;
    my $col = shift;

    return unless ref($m) && @$m && $col;

    my $colidx = find_column_index $m, $col;
    return unless defined $colidx;

    @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
                     @{$m}[1..$#$m];
}

my @array = (
    [qw/ L1 L2 L3 L4 L5 /],
    [9, 8, 7, 6, 5],
    [1, 2, 3, 4, 5],
    [0, 0, 0, 0, 0],
);

sort_by_column \@array, 'L3';

for (@array) {
    printf join(" ", ("%5s") x @$_) . "\n", @$_;
}
[14:53] ant% ./try
   L1    L2    L3    L4    L5
    0     0     0     0     0
    1     2     3     4     5
    9     8     7     6     5

Hope this helps,
Greg
 
Dennis

: I have a numerical array consisting of 5000 rows and 30 columns. The
: first row consists of 30 ascii column labels for example
: L1,L2.....L30. I would like to sort the column with the header L5 in
: ascending order leaving the header labels intact on the first row.

: Assuming @array is an array of rows, you could use something similar
: to the code below.
:
: Hope this helps,
: Greg

Thank you Greg!

A lot of neat code. Some of the perl syntax is new to me but I'll get to work
with my Perl books and learn. Thanks again.

Dennis
 
Greg Bacon

: [...]
:
: A lot of neat code. Some of the perl syntax is new to me but I'll get
: to work with my Perl books and learn. Thanks again.

Anything in particular that gave you trouble? This is a discussion
group, after all. :) If you'll permit a guess, reading the perlref,
perllol, and perldsc manpages will help your understanding.

Greg
 
Dennis

: [...]
:
: A lot of neat code. Some of the perl syntax is new to me but I'll get
: to work with my Perl books and learn. Thanks again.

: Anything in particular that gave you trouble? This is a discussion
: group, after all. :) If you'll permit a guess, reading the perlref,
: perllol, and perldsc manpages will help your understanding.

Greg,

Well, I read the Perl manpages you listed above and the subroutine section of
"Perl Cookbook" by Tom Christiansen & N. Torkington.

Below is the code I don't understand:

First in the subroutine sort_by_column

sub sort_by_column {
    my $m = shift;
    my $col = shift;

    return unless ref($m) && @$m && $col;

    my $colidx = find_column_index $m, $col;
    return unless defined $colidx;

    @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
                     @{$m}[1..$#$m];
}

sort_by_column \@array, 'L3';

I don't understand the shift operator and how it moves \@array (a reference to
an array) and 'L3' into $m and $col. I know the input to a subroutine are the
elements of @_ but what does shift mean?

The statement return unless ref($m) && @$m && $col; tests to see that the
reference $m and value $col exist but what's @$m mean? An array whose pointer
reference starts at $m?

Also I'm not sure what the expression @{$m}[1..$#$m] means. Obviously $m is a
pointer to an array, but what about [1..$#$m]?

Next I don't understand some of the code in the subroutine find_column_index:

sub find_column_index {
    my $a = shift;
    my $col = shift;

    my $header = $a->[0];
    my $colidx = 0;
    for (@$header) {
        last if $_ eq $col;
        ++$colidx;
    }

    $colidx >= @$header ? () : $colidx;
}

I take it that "my $header = $a->[0];" means store the pointer reference of the
0'th element into $header? "for (@$header)" means for each element of the input
array do the below? I didn't know "last" would end the loop after the last
statement if the "if" statement was true. Neat. I take it that when you say
"for(@$header)" each element of the array is stored into $_ one by one in the
for loop?

Last, what does $colidx >= @$header ? () : $colidx; mean? If the array element
number of 'L3' is greater than or equal to ...then I get lost.

Thanks for your help, I'm learning a lot!

Dennis
 
Greg Bacon

: [...]
:
: First in the subroutine sort_by_column
:
: sub sort_by_column {
:     my $m = shift;
:     my $col = shift;
:
:     return unless ref($m) && @$m && $col;
:
:     my $colidx = find_column_index $m, $col;
:     return unless defined $colidx;
:
:     @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
:                      @{$m}[1..$#$m];
: }
:
: sort_by_column \@array, 'L3';
:
: I don't understand the shift operator and how it moves \@array (a
: reference to an array) and 'L3' into $m and $col. I know the input to
: a subroutine are the elements of @_ but what does shift mean?

From the perlfunc documentation on the shift operator:

Shifts the first value of the array off and returns it,
shortening the array by 1 and moving everything down. If
there are no elements in the array, returns the undefined
value. If ARRAY is omitted, shifts the "@_" array within
the lexical scope of subroutines . . .

The shifts are plucking off the subroutine's arguments. To see shift
in action, consider the following:

[16:15] ant% cat try
#! /usr/local/bin/perl

$" = "]["; # separator for interpolating arrays

@a = ('apples', 'oranges', 'bananas');
print "[@a]\n";

$first = shift @a;
print "\$first = [$first], \@a = [@a]\n";
[16:15] ant% ./try
[apples][oranges][bananas]
$first = [apples], @a = [oranges][bananas]
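
The same thing happens inside a subroutine, where a bare shift works on
@_. The following is only a minimal sketch: describe_args is a
hypothetical helper, not part of the script above, and it exists just to
show the arguments being pulled off in order.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original script: shows how a bare
# "shift" inside a sub removes arguments from @_ one at a time.
sub describe_args {
    my $first  = shift;        # same as: shift @_
    my $second = shift;        # @_ is now one element shorter again
    my $left   = scalar @_;    # number of arguments not yet shifted off
    return "first=$first second=$second left=$left";
}

print describe_args('apples', 'oranges', 'bananas'), "\n";
# prints: first=apples second=oranges left=1
```

In sort_by_column the first shift picks up \@array and the second picks
up 'L3' in exactly this way.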

: The statement return unless ref($m) && @$m && $col; tests to see that
: the reference $m and value $col exist but what's @$m mean? An array
: whose pointer reference starts at $m?

Yes, but your terminology could stand polishing. (If I seem picky, I'm
only trying to help you learn.) In Perl parlance, we'd say that we're
making sure -- albeit indirectly -- that $m is an array reference, that
$m's thingy (Perl's pedestrian way of saying 'referent', i.e., the array
to which $m refers) has at least one element, and that we have a column
label to look for. See the perlref manpage.

We might have written the following

return unless ref($m) && @$m && $col;

to be more chatty as

unless ($m && ref($m) eq 'ARRAY') {
    warn "'$m' is not an array reference";
    return;
}

unless (@$m > 0) {
    warn "no rows!";
    return;
}

if (!defined($col) || $col eq '') {
    warn "no column label!";
    return;
}

I wrote the check the way I did because sort_by_column operates
in-place, so, at worst, I'd just leave the data alone. One line was
also a little more appealing than twelve. :)

There are also lots of hairy philosophical arguments surrounding this
issue such as "defensive programming is bad style because it hides
bugs", but let's not get into all that.

: Also I'm not sure what the expression @{$m}[1..$#$m] means.
: Obviously $m is a pointer to an array, but what about [1..$#$m]?

Remember that Perl doesn't have pointers but references.

Perl's .. operator can produce ranges, e.g.,

% perl -le 'print 0..9'
0123456789

Recall from the perldata manpage that $#ARRAY gives the index of the
last element of @ARRAY. For example

% perl -le '@a = (1..10); print $#a'
9

(I might be setting a bad example. mjd, rightly IMHO, says using
$#ARRAY is a red flag[*]. The usage is correct in this case, but
do what I say, not what I do. :)

[*] http://groups.google.com/[email protected]

The perlref manpage shows how to dereference arrays, and $#$m yields the
index of the last element in $m's thingy. @{$m}[...] takes a slice of
$m's thingy, i.e., a sublist -- see the perldata manpage.
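
As a small sketch of those pieces together (the data here is made up for
illustration, with a plain string standing in for the header row):

```perl
use strict;
use warnings;

my $m = [ 'header', 30, 10, 20 ];   # $m is a reference to an array

my $last = $#$m;                    # index of the last element: 3
my @body = @{$m}[1 .. $last];       # slice of the referent: (30, 10, 20)

# A slice can also be assigned to, which is what lets the sort
# put the rows back where they came from.
@{$m}[1 .. $last] = sort { $a <=> $b } @body;

print "@$m\n";                      # prints: header 10 20 30
```

Element 0 is never part of the slice, so it stays put.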

Don't get bogged down in the low-level details. Think about what we're
trying to do: we want to leave the first row alone (the header) and
sort everything else, i.e., all the rows from index 1 up to the last
index in $m's thingy. We're operating in-place, so we put the rows back
where we got them:

@{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
                 @{$m}[1..$#$m];

: Next I don't understand some of the code in the subroutine
: find_column_index:
:
: sub find_column_index {
:     my $a = shift;
:     my $col = shift;
:
:     my $header = $a->[0];
:     my $colidx = 0;
:     for (@$header) {
:         last if $_ eq $col;
:         ++$colidx;
:     }
:
:     $colidx >= @$header ? () : $colidx;
: }
:
: I take it that "my $header = $a->[0];" means store the pointer
: reference of the 0'th element into $header?

Yes, we're storing a copy of a reference to the array of column headers.
I used a separate variable to show the code's intent.

: "for (@$header)" means for
: each element of the input array do the below? I didn't know "last"
: would end the loop after the last statement if the "if" statement was
: true. Neat.

Yes. Perl's last operator is like break in C but cooler.

: I take it that when you say "for(@$header)" each element
: of the array is stored into $_ one by one in the for loop?

Yes. See the perlsyn manpage.
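
A short sketch of both points, using made-up data. The second loop is an
assumption about what "cooler" might cover: last accepts a loop label,
so it can leave an outer loop directly.

```perl
use strict;
use warnings;

my $header = [qw/ L1 L2 L3 L4 L5 /];
my @seen;

for (@$header) {            # $_ is set to each element in turn
    last if $_ eq 'L3';     # leave the loop immediately on a match
    push @seen, $_;         # not reached once "last" fires
}
print "@seen\n";            # prints: L1 L2

# With a label, "last" exits the outer loop from inside the inner one.
my @pair;
ROW: for my $r (0 .. 2) {
    for my $c (0 .. 2) {
        if ($r * $c == 2) {
            @pair = ($r, $c);
            last ROW;
        }
    }
}
print "@pair\n";            # prints: 1 2
```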

: Last, what does $colidx >= @$header ? () : $colidx; mean? If the array
: element number of 'L3' is greater than or equal to ...then I get lost.

That's the ternary operator as in C, sometimes called an "inline if".
See the perlsyn manpage.

That code is checking whether we found a match. If the condition is
true (no match), then $colidx will be at least as large as the number of
elements in @$header, and we return () or nothing. Otherwise (what's
after the colon), we send back the desired header's index.
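
To see what the caller gets back in each case, here is a minimal sketch.
index_or_nothing is a hypothetical stand-in for the last line of
find_column_index, taking the index and the header count as plain
numbers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the final expression of find_column_index.
sub index_or_nothing {
    my ($colidx, $count) = @_;
    return $colidx >= $count ? () : $colidx;
}

my $found   = index_or_nothing(2, 5);   # 2
my $first   = index_or_nothing(0, 5);   # 0: false, but still defined
my $missing = index_or_nothing(5, 5);   # () in scalar context is undef
my @list    = index_or_nothing(5, 5);   # () in list context is empty

print "found=$found first=$first\n";
print "missing is ", (defined($missing) ? "defined" : "undef"), "\n";
print "list has ", scalar(@list), " elements\n";
```

Because index 0 is a valid answer, the caller has to test the result
with defined rather than for truth, which is what sort_by_column does.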

Hope this helps,
Greg
 
Mark Jason Dominus

: (I might be setting a bad example. mjd, rightly IMHO, says using
: $#ARRAY is a red flag[*]. The usage is correct in this case, but
: do what I say, not what I do. :)
: [*] http://groups.google.com/[email protected]

In that article, I said I thought I was going to add $#array as a red
flag. I did add it to the class, but I did not accord it 'red flag'
status. A 'red flag' is something that is almost always wrong. After
doing a study, I concluded that although $#array is often wrong, it is
not 'almost always wrong'.

The details of the study are at
http://perl.plover.com/yak/flags/dollar-pound/. Here is the short
version. $#array is commonly used for five things:

1. Generating a list of indices for an array. (Your example above is
one of these; it is @{$m}[1..$#$m].)

2. The upper bound of a C-style 'for' loop, as

for ($i=0; $i <= $#array; $i++) {
    do something with $array[$i];
}

3. As a boundary check to see if a value is in the proper index range
for an array. (2) could be considered a special case of this.
Here's an example:

if ($last_item >= $#list) {
    $Init_Disp_Limits->();
}

4. To pre-extend an array, as with

$#array = $EXPECTED_NUMBER_OF_ITEMS;

5. To access the last element of an array, as with $last = $array[$#array].

In my judgement, all of the class (2) and (5) uses, and many of the
class (3) uses, would have been better written some other way. For
example, I think the example in (5) is obviously better as $last = $array[-1].

Overall, about 20% of the uses of $#array would have been better off
some other way. Class (1) did not seem to be in this 20%. I don't
know any better way to write

%hash = map { $array[$_] => $_ } 0 .. $#array;

without the $#array, for example.
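
For anyone following along, a small sketch of that idiom and of the
$array[-1] form from class (5), with made-up labels as data:

```perl
use strict;
use warnings;

my @array = qw/ L1 L2 L3 L4 L5 /;

# Map each element to its position; 0 .. $#array supplies the indices.
my %hash = map { $array[$_] => $_ } 0 .. $#array;

print $hash{L3}, "\n";       # prints: 2
print $array[-1], "\n";      # prints: L5, same element as $array[$#array]
```

A hash like this would also be one way to look up a column index by
label without walking the header row each time.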
 
Dennis

: Hope this helps,
: Greg

Thanks Greg for the really great Perl code and explanations on how it works. I
really appreciate the time and effort you put in to teach me and all the others
who are reading these posts. I have really learned a lot...much more than the
Perl books I've been reading.

Thanks again.

Dennis
 
Greg Bacon

: Mark Jason Dominus wrote:
:
: > The details of the study are at
: > http://perl.plover.com/yak/flags/dollar-pound/. Here is the short
: > version. $#array is commonly used for five things:
: >
: > 1. Generating a list of indices for an array. (Your example above is
: > one of these; it is @{$m}[1..$#$m].)
:
: I wish the ".." operator, when occurring in a slice, were sufficiently
: magical to allow @{$m}[1..-1] to replace the above.

Amen! Almost anything other than big ugly $#{...} dereferences would
be nice.

: > 2. The upper bound of a C-style 'for' loop, as
: >
: > for ($i=0; $i <= $#array; $i++) {
: >     do something with $array[$i];
: > }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category. I also use this when I want to change the length of @array
: during the loop.

When I find myself constructing parallel arrays, I almost always merge
them into arrays of either hashes or arrays.

Greg
 
ctcgag

Greg Bacon said:
: > 2. The upper bound of a C-style 'for' loop, as
: >
: > for ($i=0; $i <= $#array; $i++) {
: >     do something with $array[$i];
: > }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category. I also use this when I want to change the length of @array
: during the loop.

: When I find myself constructing parallel arrays, I almost always merge
: them into arrays of either hashes or arrays.

I do that sometimes, but I tend to not do it as much as I could for three
reasons. $age[$id] and $sex[$id] take up much, much less room than
$person[$id]{age} and $person[$id]{sex} if there are a lot of entries. The
first way gives me compile time errors if I fat-finger "age" or "sex".
With the first I can easily pass one compartment to general functions
without using map: median(\@age) rather than
median([map $_->{age}, @person]). Of course, the converse is that the
parallel structure makes it harder to pass the whole structure around, but
if I have to do much of that I tend to encompass it into a class anyway.
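
A rough sketch of the two layouts side by side. The data is invented,
and median is a hypothetical helper written here only so the example
runs; the post does not show its implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: median of the numbers in an array reference.
sub median {
    my @sorted = sort { $a <=> $b } @{ $_[0] };
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Parallel arrays, indexed by the same $id. Under "use strict",
# mistyping @age as @agee is a compile-time error.
my @age = (31, 25, 40);
my @sex = qw/ f m f /;

# The same data as an array of hashes. A mistyped key such as
# $person[0]{aeg} compiles fine and just yields undef at run time.
my @person = map { +{ age => $age[$_], sex => $sex[$_] } } 0 .. $#age;

print median(\@age), "\n";                          # prints: 31
print median([ map { $_->{age} } @person ]), "\n";  # prints: 31
```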

Xho
 
Mark Jason Dominus

: I wish the ".." operator, when occurring in a slice, were sufficiently
: magical to allow @{$m}[1..-1] to replace the above.

Someone, I think Simon Cozens, submitted a patch to allow

@a[1..]

to do this. But it wasn't accepted; I forget why not.

: > 2. The upper bound of a C-style 'for' loop, as
: >
: > for ($i=0; $i <= $#array; $i++) {
: >     do something with $array[$i];
: > }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category.

It does. Ignoring the fact that parallel arrays are usually a sign of
misdesign in the program, you can write the code above more simply
and efficiently as:

for $i (0 .. $#array) {
    do something with $array[$i];
}

: I also use this when I want to change the length of @array
: during the loop.

In such cases it makes perfect sense.
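
A small sketch of the difference, with made-up data. The foreach form
works out its range once, before the first pass, while the C-style loop
re-checks $#queue on every pass and so notices when the array grows.

```perl
use strict;
use warnings;

# foreach over a range: 0 .. $#array is fixed when the loop starts.
my @array = (1, 2, 3);
my @visited;
for my $i (0 .. $#array) {
    push @visited, $array[$i];
}

# C-style loop: the condition is evaluated again on each pass.
my @queue = (1, 2, 3);
my @drained;
for (my $i = 0; $i <= $#queue; $i++) {
    push @queue, 99 if $queue[$i] == 2;   # the array grows mid-loop
    push @drained, $queue[$i];
}

print "@visited\n";   # prints: 1 2 3
print "@drained\n";   # prints: 1 2 3 99
```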
 
