How do you sort a 2D array with column headers?

Dennis · Jun 28, 2003

I have a numerical array consisting of 5000 rows and 30 columns. The first row
consists of 30 ascii column labels for example L1,L2.....L30. I would like to
sort the column with the header L5 in ascending order leaving the header labels
intact on the first row.

I'm familiar with code

@array =sort { $a->[1] <=> $b->[1]} @array;

and I have read perldoc -f sort.

But the code above doesn't allow me to sort the array by column labels.

How would I do that?

Any help would be appreciated.

Dennis

Greg Bacon · Jun 28, 2003

: I have a numerical array consisting of 5000 rows and 30 columns. The
: first row consists of 30 ascii column labels for example
: L1,L2.....L30. I would like to sort the column with the header L5 in
: ascending order leaving the header labels intact on the first row.

Assuming @array is an array of rows, you could use something similar
to the code below.

[14:53] ant% cat try
#! /usr/local/bin/perl

use warnings;
use strict;

sub find_column_index {
    my $a   = shift;
    my $col = shift;

    my $header = $a->[0];
    my $colidx = 0;
    for (@$header) {
        last if $_ eq $col;
        ++$colidx;
    }

    $colidx >= @$header ? () : $colidx;
}

sub sort_by_column {
    my $m   = shift;
    my $col = shift;

    return unless ref($m) && @$m && $col;

    my $colidx = find_column_index $m, $col;
    return unless defined $colidx;

    @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
                          @{$m}[1..$#$m];
}

my @array = (
    [qw/ L1 L2 L3 L4 L5 /],
    [9, 8, 7, 6, 5],
    [1, 2, 3, 4, 5],
    [0, 0, 0, 0, 0],
);

sort_by_column \@array, 'L3';

for (@array) {
    printf join(" ", ("%5s") x @$_) . "\n", @$_;
}
[14:53] ant% ./try
   L1    L2    L3    L4    L5
    0     0     0     0     0
    1     2     3     4     5
    9     8     7     6     5

Hope this helps,
Greg

Dennis · Jun 28, 2003

: I have a numerical array consisting of 5000 rows and 30 columns. The
: first row consists of 30 ascii column labels for example
: L1,L2.....L30. I would like to sort the column with the header L5 in
: ascending order leaving the header labels intact on the first row.

: Assuming @array is an array of rows, you could use something similar
: to the code below.
:
: Hope this helps,
: Greg

Thank you Greg!

A lot of neat code. Some of the perl syntax is new to me but I'll get to work
with my Perl books and learn. Thanks again.

Dennis

Greg Bacon · Jun 29, 2003

: [...]
:
: A lot of neat code. Some of the perl syntax is new to me but I'll get
: to work with my Perl books and learn. Thanks again.

Anything in particular that gave you trouble? This is a discussion
group, after all.

If you'll permit a guess, reading the perlref,
perllol, and perldsc manpages will help your understanding.

Greg

Dennis · Jun 29, 2003

: [...]
:
: A lot of neat code. Some of the perl syntax is new to me but I'll get
: to work with my Perl books and learn. Thanks again.

: Anything in particular that gave you trouble? This is a discussion
: group, after all. If you'll permit a guess, reading the perlref,
: perllol, and perldsc manpages will help your understanding.

Greg,

Well I read your above perl manpages and the subroutine section of "Perl
Cookbook" by Tom Christiansen & N. Torkington.

Below is the code I don't understand:

First in the subroutine sort_by_column

sub sort_by_column {
    my $m   = shift;
    my $col = shift;

    return unless ref($m) && @$m && $col;

    my $colidx = find_column_index $m, $col;
    return unless defined $colidx;

    @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
                          @{$m}[1..$#$m];
}

sort_by_column \@array, 'L3';

I don't understand the shift operator and how it moves \@array (a reference to
an array) and 'L3' into $m and $col. I know the input to a subroutine are the
elements of @_ but what does shift mean?

The statement return unless ref($m) && @$m && $col; tests to see that the
reference $m and value $col exist but what's @$m mean? An array whose pointer
reference starts at $m?

Also I'm not sure what the expression @{$m}[1..$#$m] means. obviously a
pointer $m to an array but [1..$#$m]? .

Next I don't understand some of the code in the subroutine find_column_index:

sub find_column_index {
    my $a   = shift;
    my $col = shift;

    my $header = $a->[0];
    my $colidx = 0;
    for (@$header) {
        last if $_ eq $col;
        ++$colidx;
    }

    $colidx >= @$header ? () : $colidx;
}

I take it that "my $header = $a->[0];" means store the pointer reference of the
0'th element into $header? "for (@$header)" means for each element of the input
array do the below? I didn't know "last" would end the loop after the last
statement if the "if" statement was true. Neat. I take it that when you say
"for(@$header)" each element of the array is stored into $_ one by one in the
for loop?

Last what does $colidx >= @$header ? () : $colidx; mean? If the array element
number of 'L3' is greater than or equal to ...then I get lost.

Thanks for your help, I'm learning a lot!

Dennis

Greg Bacon · Jun 29, 2003

: [...]
:
: First in the subroutine sort_by_column
:
: sub sort_by_column {
:     my $m   = shift;
:     my $col = shift;
:
:     return unless ref($m) && @$m && $col;
:
:     my $colidx = find_column_index $m, $col;
:     return unless defined $colidx;
:
:     @{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
:                           @{$m}[1..$#$m];
: }
:
: sort_by_column \@array, 'L3';
:
: I don't understand the shift operator and how it moves \@array (a
: reference to an array) and 'L3' into $m and $col. I know the input to
: a subroutine are the elements of @_ but what does shift mean?

From the perlfunc documentation on the shift operator:

Shifts the first value of the array off and returns it,
shortening the array by 1 and moving everything down. If
there are no elements in the array, returns the undefined
value. If ARRAY is omitted, shifts the "@_" array within
the lexical scope of subroutines . . .

The shifts are plucking off the subroutine's arguments. To see shift
in action, consider the following:

[16:15] ant% cat try
#! /usr/local/bin/perl

$" = "]["; # separator for interpolating arrays

@a = ('apples', 'oranges', 'bananas');
print "[@a]\n";

$first = shift @a;
print "\$first = [$first], \@a = [@a]\n";
[16:15] ant% ./try
[apples][oranges][bananas]
$first = [apples], @a = [oranges][bananas]

: The statement return unless ref($m) && @$m && $col; tests to see that
: the reference $m and value $col exist but what's @$m mean? An array
: whose pointer reference starts at $m?

Yes, but your terminology could stand polishing. (If I seem picky, I'm
only trying to help you learn.) In Perl parlance, we'd say that we're
making sure -- albeit indirectly -- that $m is an array reference, that
$m's thingy (Perl's pedestrian way of saying 'referent', i.e., the array
to which $m refers) has at least one element, and that we have a column
label to look for. See the perlref manpage.

We might have written the following

return unless ref($m) && @$m && $col;

to be more chatty as

unless ($m && ref($m) eq 'ARRAY') {
    warn "'$m' is not an array reference";
    return;
}

unless (@$m > 0) {
    warn "no rows!";
    return;
}

if (!defined($col) || $col eq '') {
    warn "no column label!";
    return;
}

I wrote the check the way I did because sort_by_column operates
in-place, so, at worst, I'd just leave the data alone. One line was
also a little more appealing than twelve.

There are also lots of hairy philosophical arguments surrounding this
issue such as "defensive programming is bad style because it hides
bugs", but let's not get into all that.

: Also I'm not sure what the expression @{$m}[1..$#$m] means.
: obviously a pointer $m to an array but [1..$#$m]? .

Remember that Perl doesn't have pointers but references.

Perl's .. operator can produce ranges, e.g.,

% perl -le 'print 0..9'
0123456789

Recall from the perldata manpage that $#ARRAY gives the index of the
last element of @ARRAY. For example

% perl -le '@a = (1..10); print $#a'
9

(I might be setting a bad example. mjd, rightly IMHO, says using
$#ARRAY is a red flag[*]. The usage is correct in this case, but
do what I say, not what I do.)

[*] http://groups.google.com/[email protected]

The perlref manpage shows how to dereference arrays, and $#$m yields the
index of the last element in $m's thingy. @{$m}[...] takes a slice of
$m's thingy, i.e., a sublist -- see the perldata manpage.

Don't get bogged down in the low-level details. Think about what we're
trying to do: we want to leave the first row alone (the header) and
sort everything else, i.e., all the rows from index 1 up to the last
index in $m's thingy. We're operating in-place, so we put the rows back
where we got them:

@{$m}[1..$#$m] = sort { $a->[$colidx] <=> $b->[$colidx] }
@{$m}[1..$#$m];
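
If it helps to see the pieces separately, here is a minimal sketch of the same slice-and-assign idea on made-up data. The sample rows and the @body_indices variable are only for illustration and are not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical sample data: a header row followed by numeric rows.
my $m = [
    [qw/ L1 L2 /],
    [3, 30],
    [1, 10],
    [2, 20],
];

# $#$m is the last index of the array that $m refers to (3 here),
# so 1 .. $#$m is the list (1, 2, 3): every row except the header.
my @body_indices = 1 .. $#$m;

# Slice out the data rows, sort them on column 0, and assign the
# sorted rows back into the same slots. Row 0 is never touched.
@{$m}[@body_indices] = sort { $a->[0] <=> $b->[0] } @{$m}[@body_indices];

print join(" ", @$_), "\n" for @$m;
```

The header stays in slot 0 because the slice on the left-hand side never mentions index 0.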

: Next I don't understand some of the code in the subroutine
: find_column_index:
:
: sub find_column_index {
:     my $a   = shift;
:     my $col = shift;
:
:     my $header = $a->[0];
:     my $colidx = 0;
:     for (@$header) {
:         last if $_ eq $col;
:         ++$colidx;
:     }
:
:     $colidx >= @$header ? () : $colidx;
: }
:
: I take it that "my $header = $a->[0];" means store the pointer
: reference of the 0'th element into $header?

Yes, we're storing a copy of a reference to the array of column headers.
I used a separate variable to show the code's intent.

: "for (@$header)" means for
: each element of the input array do the below? I didn't know "last"
: would end the loop after the last statement if the "if" statement was
: true. Neat.

Yes. Perl's last operator is like break in C but cooler.

: I take it that when you say "for(@$header)" each element
: of the array is stored into $_ one by one in the for loop?

Yes. See the perlsyn manpage.

: Last what does $colidx >= @$header ? () : $colidx; mean? If the array
: element number of 'L3' is greater than or equal to ...then I get lost.

That's the ternary operator as in C, sometimes called an "inline if".
See the perlsyn manpage.

That code is checking whether we found a match. If the condition is
true (no match), then $colidx will be at least as large as the number of
elements in @$header, and we return () or nothing. Otherwise (what's
after the colon), we send back the desired header's index.
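
For comparison, the same lookup can be sketched with first from List::Util (a core module). The name find_column_index_alt is made up for illustration. Unlike the version above, it returns undef rather than an empty list when the label is missing, so a caller that tests the result with defined still works:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the index of $label in the header row (row 0) of $rows,
# or undef when the label is not present.
sub find_column_index_alt {
    my ($rows, $label) = @_;
    my $header = $rows->[0];
    return first { $header->[$_] eq $label } 0 .. $#$header;
}

# Hypothetical data for the demonstration.
my $rows = [ [qw/ L1 L2 L3 /], [9, 8, 7] ];

for my $label (qw/ L1 L3 L9 /) {
    my $idx = find_column_index_alt($rows, $label);
    print "$label: ", (defined $idx ? $idx : "not found"), "\n";
}
```

Note that index 0 is a valid answer but false in a boolean test, which is why the check has to be defined $idx and not just $idx.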

Hope this helps,
Greg

Mark Jason Dominus · Jun 30, 2003

: (I might be setting a bad example. mjd, rightly IMHO, says using
: $#ARRAY is a red flag[*]. The usage is correct in this case, but
: do what I say, not what I do.)
:
: [*] http://groups.google.com/[email protected]

In that article, I said I thought I was going to add $#array as a red
flag. I did add it to the class, but I did not accord it 'red flag'
status. A 'red flag' is something that is almost always wrong. After
doing a study, I concluded that although $#array is often wrong, it is
not 'almost always wrong'.

The details of the study are at
http://perl.plover.com/yak/flags/dollar-pound/. Here is the short
version. $#array is commonly used for five things:

1. Generating a list of indices for an array. (Your example above is
one of these; it is @{$m}[1..$#$m].)

2. The upper bound of a C-style 'for' loop, as

    for ($i=0; $i <= $#array; $i++) {
        do something with $array[$i];
    }

3. As a boundary check to see if a value is in the proper index range
for an array. (2) could be considered a special case of this.
Here's an example:

    if ($last_item >= $#list) {
        $Init_Disp_Limits->();
    }

4. To pre-extend an array, as with

$#array = $EXPECTED_NUMBER_OF_ITEMS;

5. To access the last element of an array, as with $last = $array[$#array].

In my judgement, all of the class (2) and (5) uses, and many of the
class (3) uses, would have been better written some other way. For
example, I think the example in (5) is obviously better as $last = $array[-1].

Overall, about 20% of the uses of $#array would have been better off
some other way. Class (1) did not seem to be in this 20%. I don't
know any better way to write

%hash = map { $array[$_] => $_ } 0 .. $#array;

without the $#array, for example.
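
As a small runnable sketch of the two idioms mentioned here, the index-lookup hash from class (1) and the negative subscript suggested for class (5); the sample labels and the %index_of name are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @array = qw(L1 L2 L3 L4 L5);

# Class (1): map each element to its position, L1 => 0, L2 => 1, ...
my %index_of = map { $array[$_] => $_ } 0 .. $#array;

# Class (5): negative subscripts count from the end of the array,
# so the last element needs no $#array at all.
my $last = $array[-1];

print "L3 is at index $index_of{L3}; last label is $last\n";
```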

Dennis · Jun 30, 2003

: Hope this helps,
: Greg

Thanks Greg for the really great Perl code and explanations on how it works. I
really appreciate the time and effort you put in to teach me and all the others
who are reading these posts. I have really learned a lot...much more than the
Perl books I've been reading.

Thanks again.

Dennis

Greg Bacon · Jun 30, 2003

: Mark Jason Dominus wrote:
:
: > The details of the study are at
: > http://perl.plover.com/yak/flags/dollar-pound/. Here is the short
: > version. $#array is commonly used for five things:
: >
: > 1. Generating a list of indices for an array. (Your example above is
: > one of these; it is @{$m}[1..$#$m].)
:
: I wish the ".." operator, when occurring in a slice, were sufficiently
: magical to allow @{$m}[1..-1] to replace the above.

Amen! Almost anything other than big ugly $#{...} dereferences would
be nice.

: > 2. The upper bound of a C-style 'for' loop, as
: >
: >     for ($i=0; $i <= $#array; $i++) {
: >         do something with $array[$i];
: >     }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category. I also use this when I want to change the length of @array
: during the loop.

When I find myself constructing parallel arrays, I almost always merge
them into arrays of either hashes or arrays.

Greg

ctcgag · Jul 1, 2003

Greg Bacon said:
: > 2. The upper bound of a C-style 'for' loop, as
: >
: >     for ($i=0; $i <= $#array; $i++) {
: >         do something with $array[$i];
: >     }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category. I also use this when I want to change the length of @array
: during the loop.

: When I find myself constructing parallel arrays, I almost always merge
: them into arrays of either hashes or arrays.

I do that sometimes, but I tend to not do it as much as I could for three
reasons. $age[$id] and $sex[$id] take up much, much less room than
$person[$id]{age} and $person[$id]{sex} if there are a lot of entries. The
first way gives me compile time errors if I fat-finger "age" or "sex".
With the first I can easily pass one compartment to general functions
without using map: median(\@age) rather than
median([map $_->{age}, @person]). Of course, the converse is that the
parallel structure makes it harder to pass the whole structure around, but
if I have to do much of that I tend to encompass it into a class anyway.
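
To make the median comparison concrete, here is a rough sketch of both layouts side by side. The median helper and the sample data are made up for illustration; nothing here comes from an existing module:

```perl
use strict;
use warnings;

# Hypothetical helper: median of the numbers in an array reference.
sub median {
    my ($values) = @_;
    my @sorted = sort { $a <=> $b } @$values;
    return unless @sorted;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Parallel arrays: one array per attribute, indexed by id.
my @age = (31, 25, 40);
my @sex = qw(f m f);

# Array of hashes: one record per id, built from the same data.
my @person = map { +{ age => $age[$_], sex => $sex[$_] } } 0 .. $#age;

# With parallel arrays the whole column is already an array.
my $median_parallel = median(\@age);

# With records, the column has to be extracted first.
my $median_records = median([ map { $_->{age} } @person ]);

print "parallel: $median_parallel, records: $median_records\n";
```

Under strict, a mistyped @agee is a compile-time error, while a mistyped hash key such as $person[0]{agee} just yields undef at run time.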

Xho

Mark Jason Dominus · Jul 1, 2003

: I wish the ".." operator, when occurring in a slice, were sufficiently
: magical to allow @{$m}[1..-1] to replace the above.

Someone, I think Simon Cozens, submitted a patch to allow

@a[1..]

to do this. But it wasn't accepted; I forget why not.

: > 2. The upper bound of a C-style 'for' loop, as
: >
: >     for ($i=0; $i <= $#array; $i++) {
: >         do something with $array[$i];
: >     }
:
: I use this very frequently when I have parallel arrays. Of course,
: that might not exactly fit in your criteria for inclusion in this
: category.

It does. Ignoring the fact that parallel arrays are usually a sign of
misdesign in the program, you can write the code above more simply
and efficiently as:

    for $i (0 .. $#array) {
        do something with $array[$i];
    }

: I also use this when I want to change the length of @array
: during the loop.

In such cases it makes perfect sense.
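
A small sketch of why the two loop forms differ once the array grows inside the loop. The queue contents and variable names are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical work queue: processing 'a' discovers one more item.
my @queue = qw(a b);
my @seen;

# The C-style condition is re-evaluated on every pass, so items
# pushed inside the loop body are still visited.
for (my $i = 0; $i <= $#queue; $i++) {
    push @seen, $queue[$i];
    push @queue, 'c' if $queue[$i] eq 'a';
}

# The endpoints of 0 .. $#list are evaluated once, before the first
# pass, so only the indices that existed at the start are visited.
my @list = qw(a b);
my @visited;
for my $i (0 .. $#list) {
    push @visited, $list[$i];
    push @list, 'c' if $list[$i] eq 'a';
}

print "C-style visited: @seen\n";
print "range visited:   @visited\n";
```

So the range form is the simpler choice when the length is fixed, and the C-style form is the one to reach for when the loop body changes the length.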
