Sorting

keith · Jun 7, 2007

Hello,

I've had a search through CPAN, and have not been able to find an
answer yet, but I would like to know if there is something like
File::Sort which will allow me to specify that there is one or more
header records at the start of the input which should be untouched by
the sort. Does anyone know of such a module (or an easy way to do
this using File::Sort!)

Thx,
k

keith · Jun 7, 2007

> File::Sort which will allow me to specify that there is one or more
> header records at the start of the input which should be untouched by
> the sort.

OK, no responses, so I had time to find more research material, which
led me to this solution. Any advice on ways to tighten this up a tad
without losing too much readability?

The data look like this (delimiters line up vertically):
==================================
Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1
==================================

So, to preserve the header and sort by the 'Products' column:

==================================
#!/usr/local/bin/perl -w
@lines = ();
@key = ();
while (<>)
{
    $row++;
    if ($row == 1)
    {
        print;
        next;
    }
    chomp;
    push @lines,$_;
    push @key, (split(/\|/))[3];
}

@indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
foreach $index (@indices)
{
    print "$lines[$index]\n";
}
==================================

Gunnar Hjalmarsson · Jun 7, 2007

> I would like to know if there is something like File::Sort which will
> allow me to specify that there is one or more header records at the
> start of the input which should be untouched by the sort.

my ( @headers, @records );
while ( <DATA> ) {
    push @headers, $_;
    push @records, <DATA> if /^===/;
}

print @headers, sort @records;

__DATA__
First header
Another header
============================
Record B
Record C
Record A
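Gunnar's version relies on a separator line (===...) to mark where the headers end. If the input instead starts with a known number of header lines, as in the original question, a minimal sketch along the same lines might look like this. The name sort_with_headers and its arguments are hypothetical, chosen for illustration; they are not part of File::Sort or any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Sort): keep the first $n_headers
# lines untouched and sort the remaining lines with $cmp, a sub that
# receives two lines and returns -1, 0 or 1 (defaults to string order).
sub sort_with_headers {
    my ($n_headers, $cmp, @lines) = @_;
    $cmp ||= sub { $_[0] cmp $_[1] };
    $n_headers = @lines if $n_headers > @lines;    # fewer lines than headers
    my @headers = @lines[0 .. $n_headers - 1];
    my @records = @lines[$n_headers .. $#lines];
    return (@headers, sort { $cmp->($a, $b) } @records);
}

my @input = (
    "First header\n",
    "Another header\n",
    "Record B\n",
    "Record C\n",
    "Record A\n",
);
print sort_with_headers(2, undef, @input);
```

Passing a comparison sub that looks at a particular |-separated field instead of the whole line gives the "sort by column" behaviour asked about in the thread.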

Paul Lalli · Jun 7, 2007

> OK, no responses, so I had time to find more research material, which
> led me to this solution. Any advice on ways to tighten this up a tad
> without losing too much readability?

> So, to preserve the header and sort by the 'Products' column:
>
> ==================================
> #!/usr/local/bin/perl -w

use strict;

> @lines = ();
> @key = ();

No need to initialize an array to the empty list; that's what it
already is.

> while (<>)
> {
>     $row++;

This variable already exists for you: its name is '$.'. There's no
need to keep track of the line count separately.

>     if ($row == 1)
>     {
>         print;
>         next;
>     }
>     chomp;
>     push @lines,$_;
>     push @key, (split(/\|/))[3];
> }

> @indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
> foreach $index (@indices)
> {
>     print "$lines[$index]\n";
> }

Rather than messing with a bunch of indices, I would prefer a
Schwartzian transform. The syntax has a bit of a learning curve, but
once you "get it", it becomes intuitive.

So my rewrite of your script comes down to:
#!/opt2/perl/bin/perl
use strict;
use warnings;

my @lines;
while (<DATA>) {
    print and next if $. == 1;
    push @lines, $_;
}
print map  { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map  { [ $_, (split /\|/)[3] ] }
      @lines;
__DATA__
Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1

Paul Lalli

keith · Jun 8, 2007

>> Any advice on ways to tighten this up a tad without losing too much readability?

> rather than messing with a bunch of indices, I would prefer a
> Schwartzian transform. The syntax has a bit of a learning curve, but
> once you "get it", it becomes intuitive.
>
> print map  { $_->[0] }
>       sort { $a->[1] cmp $b->[1] }
>       map  { [ $_, (split /\|/)[3] ] }
>       @lines;

Oh, that's sweet! All I need to do now is sit down and work out
exactly how the feck that works!

keith · Jun 8, 2007

>> rather than messing with a bunch of indices, I would prefer a
>> Schwartzian transform. The syntax has a bit of a learning curve, but
>> once you "get it", it becomes intuitive.

>> print map  { $_->[0] }
>>       sort { $a->[1] cmp $b->[1] }
>>       map  { [ $_, (split /\|/)[3] ] }
>>       @lines;

> Oh, that's sweet! All I need to do now is sit down and work out
> exactly how the feck that works!

I've been working through this, and I think I'm getting there, slowly;
there's something going on here with anonymous list references, for a
start. But how would I use this paradigm if there was a more
complicated key? For example, in my original example, if I needed to
sort by the second column, which contains a date, I would have done
something like:

@fields = split(/\|/);
($dy,$mn,$yr) = split(/\//,$fields[1]);
push @key, "$yr$mn$dy";
etc...

How would this transform approach allow me to do something similar?

Paul Lalli · Jun 8, 2007

>> print map  { $_->[0] }
>>       sort { $a->[1] cmp $b->[1] }
>>       map  { [ $_, (split /\|/)[3] ] }
>>       @lines;

> But how would I use this paradigm if there was a more
> complicated key? For example, in my original example, if I
> needed to sort by the second column, which contains a date, I
> would have done something like:
>
> @fields = split(/\|/);
> ($dy,$mn,$yr) = split(/\//,$fields[1]);
> push @key, "$yr$mn$dy";
> etc...
>
> How would this transform approach allow me to do something similar?

Well, obviously, it's going to be a little messier, but the concept is
the same:

print map  { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map  { [
                 $_,
                 do {
                     my ($d,$m,$y) = split '/', (split /\|/)[1];
                     "$y$m$d";
                 }
             ]
           }
      @lines;

When trying to decipher a Schwartzian transform, read it backwards.
1) We start with the array of @lines.
2) The bottom map transforms the array of lines into a list of array
references. The first element of the array reference is the line
itself, and the second is the value we want to sort by eventually. In
this case, that's the "year-month-day" value.
3) The sort now takes this list of array references, and sorts it by
the second element of each referenced array. That is, it sorts the
array references on our sort key.
4) The top map takes this sorted list of array references and
transforms it to a new list containing the first element of each
referenced array - that is, the original line.
5) print is passed this list of lines.

It might be helpful if you break it out into its individual steps.
In this case, I'll use a generic get_key() to represent obtaining the
sort key from your line. That's the only part of a Schwartzian
transform that ever changes. The syntax is always the same for the
rest of it.

my @lines_keys = map { [ $_, get_key($_) ] } @lines;
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;
my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
print @sorted_lines;
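To see the three steps run end to end, here is a self-contained sketch that fills in the generic get_key() with the date key from the example above and feeds it three rows from the sample data. The body of get_key() is an assumption (it simply repeats the dd/mm/yy to yymmdd rearrangement), and a two-digit year only sorts correctly if every date falls in the same century.

```perl
use strict;
use warnings;

# Hypothetical get_key(): rearrange the dd/mm/yy "Created" field (index 1)
# into yymmdd so that a plain string comparison (cmp) orders the dates
# chronologically, assuming every year is in the same century.
sub get_key {
    my ($line) = @_;
    my $date = (split /\|/, $line)[1];
    my ($d, $m, $y) = split m{/}, $date;
    return "$y$m$d";
}

# A few rows from the thread's sample data (header already printed).
my @lines = (
    "01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
    "10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1\n",
);

# Decorate: pair each line with its sort key.
my @lines_keys = map { [ $_, get_key($_) ] } @lines;
# Sort the pairs on the key (element 1 of each array ref).
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;
# Undecorate: keep only the original line (element 0).
my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
print @sorted_lines;
```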

Hope that helps,
Paul Lalli

keith · Jun 8, 2007

> When trying to decipher a Schwartzian transform, read it backwards.
> 1) We start with the array of @lines.
> 2) The bottom map transforms the array of lines into a list of array
> references. The first element of the array reference is the line
> itself, and the second is the value we want to sort by eventually. In
> this case, that's the "year-month-day" value.
> 3) The sort now takes this list of array references, and sorts it by
> the second element of each referenced array. That is, it sorts the
> array references on our sort key.
> 4) The top map takes this sorted list of array references and
> transforms it to a new list containing the first element of each
> referenced array - that is, the original line.
> 5) print is passed this list of lines.

I think I just had a religious experience. That is new and wonderful,
and thank you for explaining it for me!

Paul Lalli · Jun 8, 2007

>> [description of Schwartzian Transform]

> I think I just had a religious experience. That is new and
> wonderful, and thank you for explaining it for me!

You're welcome. Glad to help.

I would be remiss, however, if I didn't point out that Uri has created
a module which generalizes the creation of a Schwartzian Transform
sort algorithm (amongst other things). It is available on the CPAN,
named Sort::Maker. Using that module, the process becomes:

use Sort::Maker;
my $sorter = make_sorter('ST', string => \&get_key);
print $sorter->(@lines);

# get_key simply extracts the key from your data,
# so in the second example, it would be:
sub get_key {
    my $date = (split /\|/, $_)[1];
    my ($d, $m, $y) = split '/', $date;
    "$y$m$d";
}
# in the original, it would be as simple as
# (use one definition of get_key or the other, not both):
sub get_key {
    (split /\|/)[3];
}
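Sort::Maker's actual options are documented with the module on CPAN. To give a rough idea of what such a generator does internally, here is a minimal hand-rolled sketch: a sub that takes a key-extraction code ref and returns a closure performing the Schwartzian Transform. The name make_st_sorter is hypothetical; this is not Sort::Maker's make_sorter, and it supports only a single key compared as a string.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Sort::Maker's make_sorter(): takes a code ref
# that returns the sort key for $_ and returns a closure that sorts its
# arguments with a Schwartzian Transform, comparing keys as strings.
sub make_st_sorter {
    my ($key_of) = @_;
    return sub {
        return map  { $_->[0] }                      # undecorate
               sort { $a->[1] cmp $b->[1] }          # sort on the cached key
               map  { [ $_, scalar $key_of->() ] }   # decorate: [ item, key ]
               @_;
    };
}

# Key extractor for the 'Products' column (field index 3).
my $by_product = make_st_sorter(sub { (split /\|/)[3] });

my @lines = (
    "01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
);
print $by_product->(@lines);
```

Each key is computed once per element rather than once per comparison, which is the point of the transform when key extraction (here a split) costs more than the comparison itself.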

Paul Lalli

Uri Guttman · Jun 8, 2007

k> I think I just had a religious experience. That is new and wonderful,
k> and thank you for explaining it for me!

if you want a module to do all that (and more) for you, check out
Sort::Maker.

uri

How to sort a CSV file with merge sort JAVA	7	May 6, 2021
Parallel sorting algorithms...	0	Sep 7, 2012
Beginner's Guide to getting CipherSweet working with PDO and MYSQL	1	Dec 1, 2022
Paralle sorting algorithms...	0	Sep 7, 2012
Sorting EBCDIC	4	Dec 15, 2006
ChatBot	4	Jan 19, 2021
Sorting list alphabetically	0	Apr 27, 2014
Sorting a hierarchical table (SQL)	0	Jan 30, 2013
