Sorting

Discussion in 'Perl Misc' started by keith@bytebrothers.co.uk, Jun 7, 2007.

  1. Guest

    Hello,

    I've had a search through CPAN, and have not been able to find an
    answer yet, but I would like to know if there is something like
    File::Sort which will allow me to specify that there is one or more
    header records at the start of the input which should be untouched by
    the sort. Does anyone know of such a module (or an easy way to do
    this using File::Sort)?

    Thx,
    k
     
    , Jun 7, 2007
    #1

  2. Guest

    On 7 Jun, 09:44, wrote:
    > File::Sort which will allow me to specify that there is one or more
    > header records at the start of the input which should be untouched by
    > the sort.


    OK, no responses, so I had time to find more research material, which
    led me to this solution. Any advice on ways to tighten this up a tad
    without losing too much readability?

    The data look like this (delimiters line up vertically):
    ==================================
    Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
    01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
    1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
    10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
    1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
    1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
    10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
    10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
    10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1
    ==================================

    So, to preserve the header and sort by the 'Products' column:

    ==================================
    #!/usr/local/bin/perl -w
    @lines = ();
    @key = ();
    while (<>)
    {
        $row++;
        if ($row == 1)
        {
            print;
            next;
        }
        chomp;
        push @lines,$_;
        push @key, (split(/\|/))[3];
    }

    @indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
    foreach $index (@indices)
    {
        print "$lines[$index]\n";
    }
    ==================================
     
    , Jun 7, 2007
    #2

  3. wrote:
    > I would like to know if there is something like File::Sort which will
    > allow me to specify that there is one or more header records at the
    > start of the input which should be untouched by the sort.


    my ( @headers, @records );
    while ( <DATA> ) {
        push @headers, $_;
        push @records, <DATA> if /^===/;
    }

    print @headers, sort @records;

    __DATA__
    First header
    Another header
    ============================
    Record B
    Record C
    Record A

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 7, 2007
    #3
  4. Paul Lalli Guest

    On Jun 7, 11:27 am, wrote:
    > On 7 Jun, 09:44, wrote:
    >
    > > File::Sort which will allow me to specify that there is one or more
    > > header records at the start of the input which should be untouched by
    > > the sort.

    >
    > OK, no responses, so I had time to find more research material, which
    > led me to this solution. Any advice on ways to tighten this up a tad
    > without losing too much readability?
    >
    > The data look like this (delimiters line up vertically):
    > ==================================
    > Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
    > 01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
    > 1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
    > 10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
    > 1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
    > 1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
    > 10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
    > 10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
    > 10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1
    > ==================================
    >
    > So, to preserve the header and sort by the 'Products' column:
    >
    > ==================================
    > #!/usr/local/bin/perl -w


    use strict;

    > @lines = ();
    > @key = ();


    No need to initialize an array to the empty list. That's what it is
    already.

    > while (<>)
    > {
    > $row++;


    This variable already exists for you. Its name is '$.'. No need to
    keep track of the line count separately.
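
A minimal sketch of the '$.' idea, stretched to cover "one or more" header lines as asked in the original question. The $header_count variable and the in-memory sample input are assumptions made up for illustration; they are not part of the script being reviewed.

```perl
use strict;
use warnings;

# Hypothetical sample input: two header lines followed by records.
my $input = "Header one\nHeader two\nRecord B\nRecord C\nRecord A\n";
my $header_count = 2;    # assumed number of header lines to leave untouched

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my ( @headers, @records );
while ( my $line = <$fh> ) {
    # $. holds the current line number of the filehandle last read from.
    if ( $. <= $header_count ) {
        push @headers, $line;
    }
    else {
        push @records, $line;
    }
}
close $fh;

# Headers keep their original order; only the records are sorted.
my @output = ( @headers, sort @records );
print @output;
```

One caveat if this is adapted to read via <> from several files: '$.' keeps counting across files unless ARGV is closed at each end of file.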

    > if ($row == 1)
    > {
    > print;
    > next;
    > }
    > chomp;
    > push @lines,$_;
    > push @key, (split(/\|/))[3];
    >
    > }
    >
    > @indices = sort {$key[$a] cmp $key[$b]} 0..$#lines;
    > foreach $index (@indices)
    > {
    > print "$lines[$index]\n";}


    Rather than messing with a bunch of indices, I would prefer a
    Schwartzian transform. The syntax has a bit of a learning curve, but
    once you "get it", it becomes intuitive.

    So my rewrite of your script comes down to:
    #!/opt2/perl/bin/perl
    use strict;
    use warnings;

    my @lines;
    while (<DATA>) {
        print and next if $. == 1;
        push @lines, $_;
    }
    print map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [ $_, (split /\|/)[3] ] }
          @lines;
    __DATA__
    Licence | Created| Crtd By | Products | Qty | To Loc | Last | DZone
    01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK
    1|07/06/07| SPODE| STT0014V| 156| SFF15| | S
    10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1
    1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK
    1022|31/05/07| WOODC| DET0065Y| 141| XE4313| | BACK
    10222|04/06/07| COLEROB| FLU0473P| 1640| UAB12| SMITHN| None
    10319|07/06/07| HALLPHIL| SCH3318Q| 240| MDL22| | GDSIN1
    10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1

    Paul Lalli
     
    Paul Lalli, Jun 7, 2007
    #4
  5. Guest

    On 7 Jun, 16:46, Paul Lalli <> wrote:
    > On Jun 7, 11:27 am, wrote:
    > > Any advice on ways to tighten this up a tad without losing too much readability?

    >
    > rather than messing with a bunch of indices, I would prefer a
    > Schwartzian transform. The syntax has a bit of a learning curve, but
    > once you "get it", it becomes intuitive.
    >
    > print map { $_->[0] }
    > sort { $a->[1] cmp $b->[1] }
    > map { [ $_, (split /\|/)[3] ] }
    > @lines;


    Oh, that's sweet! All I need to do now is sit down and work out
    exactly how the feck that works!
     
    , Jun 8, 2007
    #5
  6. Guest

    On 8 Jun, 09:28, wrote:
    > On 7 Jun, 16:46, Paul Lalli <> wrote:
    >
    > > On Jun 7, 11:27 am, wrote:
    > > > Any advice on ways to tighten this up a tad without losing too much readability?

    >
    > > rather than messing with a bunch of indices, I would prefer a
    > > Schwartzian transform. The syntax has a bit of a learning curve, but
    > > once you "get it", it becomes intuitive.

    >
    > > print map { $_->[0] }
    > > sort { $a->[1] cmp $b->[1] }
    > > map { [ $_, (split /\|/)[3] ] }
    > > @lines;

    >
    > Oh, that's sweet! All I need to do now is sit down and work out
    > exactly how the feck that works!


    I've been working through this, and I think I'm getting there, slowly;
    there's something going on here with anonymous list references, for a
    start. But how would I use this paradigm if there was a more
    complicated key? For example, in my original example, if I needed to
    sort by the second column, which contains a date, I would have done
    something like:

    @fields = split(/\|/);
    ($dy,$mn,$yr) = split(/\//,$fields[1]);
    push @key, "$yr$mn$dy";
    etc...

    How would this transform approach allow me to do something similar?
     
    , Jun 8, 2007
    #6
  7. Paul Lalli Guest

    On Jun 8, 5:59 am, wrote:
    > > On 7 Jun, 16:46, Paul Lalli <> wrote:


    > > > print map { $_->[0] }
    > > > sort { $a->[1] cmp $b->[1] }
    > > > map { [ $_, (split /\|/)[3] ] }
    > > > @lines;

    >
    > But how would I use this paradigm if there was a more
    > complicated key? For example, in my original example, if I
    > needed to sort by the second column, which contains a date, I
    > would have done something like:
    >
    > @fields = split(/\|/);
    > ($dy,$mn,$yr) = split(/\//,$fields[1]);
    > push @key, "$yr$mn$dy";
    > etc...
    >
    > How would this transform approach allow me to do something similar?


    Well, obviously, it's going to be a little messier, but the concept is
    the same:

    print map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [
                   $_,
                   do {
                       my ($d,$m,$y) = split '/', (split /\|/)[1];
                       "$y$m$d";
                   }
                 ]
               }
          @lines;
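
To check the date-keyed transform on a few records, here is a self-contained sketch. The three sample lines come from the data earlier in the thread; the @sorted name is only illustrative. It assumes every date is dd/mm/yy with two digits per part, so that the resulting yymmdd strings compare correctly with cmp.

```perl
use strict;
use warnings;

# Three records from the thread's sample data, deliberately out of date order.
my @lines = (
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
    "1015|29/05/07| OOSOFFC| CIF0012T| 192| XP4417| | BACK\n",
    "10106|06/06/07| DALEC| VAN1383T| 0| JLE12| | GDSIN1\n",
);

my @sorted =
    map  { $_->[0] }                  # 3) keep only the original line
    sort { $a->[1] cmp $b->[1] }      # 2) order the pairs by their key
    map  {                            # 1) pair each line with a yymmdd key
        my ( $d, $m, $y ) = split m{/}, ( split /\|/ )[1];
        [ $_, "$y$m$d" ];
    }
    @lines;

print @sorted;
```

With these inputs the keys are 070607, 070529 and 070606, so the 29/05/07 record should come out first and the 07/06/07 record last.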


    When trying to decipher a Schwartzian transform, read it backwards.
    1) We start with the array of @lines.
    2) The bottom map transforms the array of lines into a list of array
    references. The first element of the array reference is the line
    itself, and the second is the value we want to sort by eventually. In
    this case, that's the "year-month-day" value.
    3) The sort now takes this list of array references, and sorts it by
    the second element of each referenced array. That is, it sorts the
    array references on our sort key.
    4) The top map takes this sorted list of array references and
    transforms it to a new list containing the first element of each
    referenced array - that is, the original line.
    5) print is passed this list of lines.

    It might be helpful if you break it out into its individual steps.
    In this case, I'll use a generic get_key() to represent obtaining the
    sort key from your line. That's the only part of a Schwartzian
    transform that ever changes. The syntax is always the same for the
    rest of it.

    my @lines_keys = map { [ $_, get_key($_) ] } @lines;
    my @sorted_lines_keys = sort { $a->[1] cmp $b->[1] } @lines_keys;
    my @sorted_lines = map { $_->[0] } @sorted_lines_keys;
    print @sorted_lines;
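
As a runnable instance of those steps, here is a sketch with a concrete get_key() that pulls out the Qty column. That column is an assumption chosen for illustration; the thread itself sorts on Products and on the date. One caveat worth noting: when the key is numeric, the comparison operator in the sort block changes from cmp to <=> as well, so the key extraction is not quite the only moving part.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Qty column (field index 4) of a line.
sub get_key {
    my ($line) = @_;
    my $qty = ( split /\|/, $line )[4];
    $qty =~ s/\s+//g;    # strip the padding spaces around the number
    return $qty;
}

# Three records from the thread's sample data.
my @lines = (
    "01799|05/06/07| OOS1| NIV0327R| 960| YH3621| | BACK\n",
    "1|07/06/07| SPODE| STT0014V| 156| SFF15| | S\n",
    "10350|07/06/07| QUINNJ| DOS0030K| 4072| CRH52| | GDSIN1\n",
);

my @lines_keys        = map  { [ $_, get_key($_) ] } @lines;
my @sorted_lines_keys = sort { $a->[1] <=> $b->[1] } @lines_keys;    # numeric compare
my @sorted_lines      = map  { $_->[0] } @sorted_lines_keys;
print @sorted_lines;
```

With cmp instead of <=>, the string "4072" would sort before "960", which is why the operator has to follow the key type.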

    Hope that helps,
    Paul Lalli
     
    Paul Lalli, Jun 8, 2007
    #7
  8. Guest

    On 8 Jun, 11:32, Paul Lalli <> wrote:
    > When trying to decipher a Schwartzian transform, read it backwards.
    > 1) We start with the array of @lines.
    > 2) The bottom map transforms the array of lines into a list of array
    > references. The first element of the array reference is the line
    > itself, and the second is the value we want to sort by eventually. In
    > this case, that's the "year-month-day" value.
    > 3) The sort now takes this list of array references, and sorts it by
    > the second element of each referenced array. That is, it sorts the
    > array references on our sort key.
    > 4) The top map takes this sorted list of array references and
    > transforms it to a new list containing the first element of each
    > referenced array - that is, the original line.
    > 5) print is passed this list of lines.


    I think I just had a religious experience. That is new and wonderful,
    and thank you for explaining it for me!
     
    , Jun 8, 2007
    #8
  9. Paul Lalli Guest

    On Jun 8, 6:46 am, wrote:
    > On 8 Jun, 11:32, Paul Lalli <> wrote:


    > > [description of Schwartzian Transform]


    > I think I just had a religious experience. That is new and
    > wonderful, and thank you for explaining it for me!


    You're welcome. Glad to help.

    I would be remiss, however, if I didn't point out that Uri has created
    a module which generalizes the creation of a Schwartzian Transform
    sort algorithm (amongst other things). It is available on the CPAN,
    named Sort::Maker. Using that module, the process becomes:

    use Sort::Maker;
    my $sorter = make_sorter('ST', string => \&get_key);
    print $sorter->(@lines);

    #get_key simply extracts the key from your data
    #so in the second example, it would be:
    sub get_key {
        my $date = (split /\|/, $_)[1];
        my ($d, $m, $y) = split '/', $date;
        "$y$m$d";
    }
    #in the original, it would be as simple as:
    sub get_key {
        (split /\|/)[3];
    }


    Paul Lalli
     
    Paul Lalli, Jun 8, 2007
    #9
  10. Uri Guttman Guest

    >>>>> "k" == keith <> writes:

    k> I think I just had a religious experience. That is new and wonderful,
    k> and thank you for explaining it for me!

    if you want a module to do all that (and more) for you, check out
    Sort::Maker.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Jun 8, 2007
    #10
