Sorting

keith

Hello,

I've had a search through CPAN, and have not been able to find an
answer yet, but I would like to know if there is something like
File::Sort which will allow me to specify that there is one or more
header records at the start of the input which should be untouched by
the sort. Does anyone know of such a module (or an easy way to do
this using File::Sort)?

Thx,
k
 
keith

File::Sort which will allow me to specify that there is one or more
header records at the start of the input which should be untouched by
the sort.

OK, no responses, so I had time to find more research material, which
led me to this solution. Any advice on ways to tighten this up a tad
without losing too much readability?

The data look like this (delimiters line up vertically):
==================================
Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1
==================================

So, to preserve the header and sort by the 'Products' column:

==================================
#!/usr/local/bin/perl -w
@lines = ();
@key = ();
while (<>)
{
    $row++;
    if ($row == 1)
    {
        print;
        next;
    }
    chomp;
    push @lines,$_;
    push @key, (split(/\|/))[3];
}

@indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
foreach $index (@indices)
{
    print "$lines[$index]\n";
}
==================================
 
Gunnar Hjalmarsson

I would like to know if there is something like File::Sort which will
allow me to specify that there is one or more header records at the
start of the input which should be untouched by the sort.

my ( @headers, @records );
while ( <DATA> ) {
    push @headers, $_;
    push @records, <DATA> if /^===/;
}

print @headers, sort @records;

__DATA__
First header
Another header
============================
Record B
Record C
Record A
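
If the input has no "===" separator line and you only know how many header records there are, the same idea can be sketched with splice. This is a minimal sketch, not something File::Sort provides: sort_keeping_headers() and the sample input are made up for illustration, and the records are sorted as plain strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep the first $n_headers lines in place and
# sort everything after them as plain strings.
sub sort_keeping_headers {
    my ($n_headers, @lines) = @_;
    my @headers = splice @lines, 0, $n_headers;   # removes the headers from @lines
    return (@headers, sort @lines);
}

# Made-up sample input: two header lines, then three records.
my @input = (
    "First header\n",
    "Another header\n",
    "Record B\n",
    "Record C\n",
    "Record A\n",
);

print sort_keeping_headers(2, @input);
```

The two header lines come out first and untouched, followed by Record A, Record B, Record C.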
 
Paul Lalli

OK, no responses, so I had time to find more research material, which
led me to this solution. Any advice on ways to tighten this up a tad
without losing too much readability?

The data look like this (delimiters line up vertically):
==================================
Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1
==================================

So, to preserve the header and sort by the 'Products' column:

==================================
#!/usr/local/bin/perl -w

use strict;
@lines = ();
@key = ();

No need to initialize an array to the empty list. That's what it is
already.
while (<>)
{
$row++;

This variable already exists for you. Its name is '$.'. No need to
keep track of the line count separately.
if ($row == 1)
{
print;
next;
}
chomp;
push @lines,$_;
push @key, (split(/\|/))[3];

}

@indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
foreach $index (@indices)
{
print "$lines[$index]\n";}

rather than messing with a bunch of indices, I would prefer a
Schwartzian transform. The syntax has a bit of a learning curve, but
once you "get it", it becomes intuitive.

So my rewrite of your script comes down to:
#!/opt2/perl/bin/perl
use strict;
use warnings;

my @lines;
while (<DATA>) {
    print and next if $. == 1;
    push @lines, $_;
}
print map  { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map  { [ $_, (split /\|/)[3] ] }
      @lines;
__DATA__
Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1

Paul Lalli
 
keith

Any advice on ways to tighten this up a tad without losing too much readability?

rather than messing with a bunch of indices, I would prefer a
Schwartzian transform. The syntax has a bit of a learning curve, but
once you "get it", it becomes intuitive.

print map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, (split /\|/)[3] ] }
@lines;

Oh, that's sweet! All I need to do now is sit down and work out
exactly how the feck that works!
 
keith

rather than messing with a bunch of indices, I would prefer a
Schwartzian transform. The syntax has a bit of a learning curve, but
once you "get it", it becomes intuitive.
print map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, (split /\|/)[3] ] }
@lines;

Oh, that's sweet! All I need to do now is sit down and work out
exactly how the feck that works!

I've been working through this, and I think I'm getting there, slowly;
there's something going on here with anonymous list references, for a
start. But how would I use this paradigm if there was a more
complicated key? For example, in my original example, if I needed to
sort by the second column, which contains a date, I would have done
something like:

@fields = split(/\|/);
($dy,$mn,$yr) = split(/\//,$fields[1]);
push @key, "$yr$mn$dy";
etc...

How would this transform approach allow me to do something similar?
 
Paul Lalli

print map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [ $_, (split /\|/)[3] ] }
@lines;

But how would I use this paradigm if there was a more
complicated key? For example, in my original example, if I
needed to sort by the second column, which contains a date, I
would have done something like:

@fields = split(/\|/);
($dy,$mn,$yr) = split(/\//,$fields[1]);
push @key, "$yr$mn$dy";
etc...

How would this transform approach allow me to do something similar?

Well, obviously, it's going to be a little messier, but the concept is
the same:

print map  { $_->[0] }
      sort { $a->[1] cmp $b->[1] }
      map  { [
                 $_,
                 do {
                     my ($d,$m,$y) = split '/', (split /\|/)[1];
                     "$y$m$d";
                 }
             ]
           }
      @lines;
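
To see the date key in action, here is a minimal self-contained sketch that runs the same transform over three records taken from the sample data. The @sorted variable is only there for illustration. It also assumes all dates fall in the same century, since a two-digit year compared as a string cannot tell 1999 from 2099.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three records from the sample data (field 1 is the dd/mm/yy date).
my @lines = (
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
    "10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None\n",
);

my @sorted =
    map  { $_->[0] }                      # 3. keep only the original line
    sort { $a->[1] cmp $b->[1] }          # 2. compare the yymmdd keys as strings
    map  {                                # 1. pair each line with its key
        my ($d, $m, $y) = split m{/}, (split /\|/)[1];
        [ $_, "$y$m$d" ];
    }
    @lines;

print @sorted;
```

The keys come out as 070607, 070529 and 070604, so the records print in the order 1015 (29/05/07), 10222 (04/06/07), 1 (07/06/07).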


When trying to decipher a Schwartzian transform, read it backwards.
1) We start with the array of @lines.
2) The bottom map transforms the array of lines into a list of array
references. The first element of the array reference is the line
itself, and the second is the value we want to sort by eventually. In
this case, that's the "year-month-day" value.
3) The sort now takes this list of array references, and sorts it by
the second element of each referenced array. That is, it sorts the
array references on our sort key.
4) The top map takes this sorted list of array references and
transforms it to a new list containing the first element of each
referenced array - that is, the original line.
5) print is passed this list of lines.

It might be helpful if you break it out into its individual steps.
In this case, I'll use a generic get_key() to represent obtaining the
sort key from your line. That's the only part of a Schwartzian
transform that ever changes. The syntax is always the same for the
rest of it.

my @lines_keys = map { [ $_, get_key($_) ] } @lines;
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;
my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
print @sorted_lines;
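
For something you can run as-is, here is a sketch of those steps with one possible get_key() filled in. The body of get_key() is an assumption for illustration: it returns the 'Products' column and strips the leading whitespace. The loop in the middle prints the intermediate key/line pairs so you can see what the sort actually works on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed get_key(): returns the 'Products' column (field 3),
# with leading whitespace removed.
sub get_key {
    my ($line) = @_;
    my $key = (split /\|/, $line)[3];
    $key =~ s/^\s+//;
    return $key;
}

# Three records from the sample data.
my @lines = (
    "01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK\n",
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
);

my @lines_keys        = map  { [ $_, get_key($_) ] } @lines;
my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;

# Show each key next to the licence number of the line it belongs to.
for my $pair (@sorted_lines_keys) {
    my ($licence) = split /\|/, $pair->[0];
    print "$pair->[1] => $licence\n";
}

my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
print @sorted_lines;
```

The pairs print as CIF0012T => 1015, NIV0327R => 01799, STT0014V => 1, followed by the three full lines in that same order.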

Hope that helps,
Paul Lalli
 
keith

When trying to decipher a Schwartzian transform, read it backwards.
1) We start with the array of @lines.
2) The bottom map transforms the array of lines into a list of array
references. The first element of the array reference is the line
itself, and the second is the value we want to sort by eventually. In
this case, that's the "year-month-day" value.
3) The sort now takes this list of array references, and sorts it by
the second element of each referenced array. That is, it sorts the
array references on our sort key.
4) The top map takes this sorted list of array references and
transforms it to a new list containing the first element of each
referenced array - that is, the original line.
5) print is passed this list of lines.

I think I just had a religious experience. That is new and wonderful,
and thank you for explaining it for me!
 
Paul Lalli

[description of Schwartzian Transform]
I think I just had a religious experience. That is new and
wonderful, and thank you for explaining it for me!

You're welcome. Glad to help.

I would be remiss, however, if I didn't point out that Uri has created
a module which generalizes the creation of a Schwartzian Transform
sort algorithm (amongst other things). It is available on the CPAN,
named Sort::Maker. Using that module, the process becomes:

use Sort::Maker;
my $sorter = make_sorter('ST', string => \&get_key);
print $sorter->(@lines);

# get_key simply extracts the key from your data,
# so in the second example, it would be:
sub get_key {
    my $date = (split /\|/, $_)[1];
    my ($d, $m, $y) = split '/', $date;
    "$y$m$d";
}
# In the original, it would be as simple as:
sub get_key {
    (split /\|/)[3];
}
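
If you are curious what "generalizing" the transform means, here is a rough hand-rolled sketch using only core Perl. To be clear, make_st_sorter() is a hypothetical helper written for this post. It is not how Sort::Maker is implemented and not part of its API. It only shows that the key extraction is the one piece that varies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Sort::Maker: build a sorter from a
# key-extraction sub. The sub is called with the record in $_.
sub make_st_sorter {
    my ($get_key) = @_;
    return sub {
        return map  { $_->[0] }
               sort { $a->[1] cmp $b->[1] }
               map  { [ $_, $get_key->() ] }
               @_;
    };
}

# Sort by the 'Products' column (field 3), as in the original example.
my $by_product = make_st_sorter( sub { (split /\|/)[3] } );

print $by_product->(
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
);
```

The 1015 record (CIF0012T) prints before the 1 record (STT0014V). For anything beyond a single string key, the real module is probably the better choice.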


Paul Lalli
 
Uri Guttman

k> I think I just had a religious experience. That is new and wonderful,
k> and thank you for explaining it for me!

if you want a module to do all that (and more) for you, check out
Sort::Maker.

uri
 
