sort numeric lists

  • Thread starter The King of Pots and Pans
  • Start date
T

The King of Pots and Pans

On disk I have a 2d table of numbers. Something like this:

9 8 6
4 5 6

I read it in so that I have a list of references to lists. In other
words, a list of the rows.

I want to first sort by column 2, then column 3. When I sort by column
2 I get:

4 5 6
9 8 6

As expected because 5 < 8. When I sort by column 3 I get:

9 8 6
4 5 6

This is unexpected. Since 6 = 6, I don't want it to perform the
sort. See how it was already sorted by column 2 when column 3 elements
were equal, but now column 2 is unsorted again.
 
G

Gregory Toomey

The said:
On disk I have a 2d table of numbers. Something like this:

9 8 6
4 5 6

I read it in so that I have a list of references to lists. In other
words, a list of the rows.

I want to first sort by column 2, then column 3. When I sort by column
2 I get:

4 5 6
9 8 6

As expected because 5 < 8. When I sort by column 3 I get:

9 8 6
4 5 6

This is unexpected. Since 6 = 6, I don't want it to perform the
sort.

Then dont.
See how it was already sorted by column 2 when column 3 elements
were equal, but now column 2 is unsorted again.

What are you trying to do? Partial pivoting for Gaussian elimination? Tensor
algebra?

The Perl sort function works on lists.

gtoomey
 
B

Brad Baxter

On disk I have a 2d table of numbers. Something like this:

9 8 6
4 5 6

I read it in so that I have a list of references to lists. In other
words, a list of the rows.

I want to first sort by column 2, then column 3. When I sort by column
2 I get:

4 5 6
9 8 6

As expected because 5 < 8. When I sort by column 3 I get:

9 8 6
4 5 6

This is unexpected. Since 6 = 6, I don't want it to perform the
sort. See how it was already sorted by column 2 when column 3 elements
were equal, but now column 2 is unsorted again.

Typically, a sort algorithm will NOT preserve the original order when
sorting equal values. It's extra work that slows things down. It may be
unexpected, but is common. Often, sort routines offer an option that WILL
preserve original order. Perl's isn't one (that I'm aware--though I may
be mistaken).

I'm being terse in that explanation. Others may elaborate.

Regards,

Brad
 
U

Uri Guttman

BB> Typically, a sort algorithm will NOT preserve the original order
BB> when sorting equal values. It's extra work that slows things
BB> down. It may be unexpected, but is common. Often, sort routines
BB> offer an option that WILL preserve original order. Perl's isn't
BB> one (that I'm aware--though I may be mistaken).

wrong on several counts. perl's current default sort algorithm is stable
(which means it keeps records with the same key in their original order
relative to each other). and it is not extra work to have a stable sort,
it is just a different algorithm. both the old sort (which you can still
get via a pragma) and the new one are O(N log N) on average which means
they are the same over all speed.

anyhow, the OP's problem has nothing to do with stable sorting. his
problem is that he needs to do a multikey sort and didn't know how to
specify that or do it. and of course with no code shown, who could guess
what he really did. i am using PSI::ESP and my knowledge of sorting to
figure it out. his comment on sorting by column 1 and THEN 2 and THEN 3
makes it seem like he called sort 3 separate times which is so wrong.

so if the OP wants to learn about multikey sorting in perl he should
read this:

http://www.sysarch.com/perl/sort_paper.html

and the promised module in that paper (5 years ago) is actually under
full development right now and mostly working. it is the core of a talk
i am giving at yapc::buffalo this june. the module is called Sort::Maker
and this is what the OP would do to properly sort his rows of numbers

use Sort::Maker ;

my $sorter = make_sorter(
plain => 1,
number => '$_->[0]',
number => '$_->[1]',
number => '$_->[2]',
) ;

my @sorted = $sorter->( @unsorted ) ;

i may have an alpha version of this module ready in a few weeks if
anyone is interested in playing with it. the goal is to have it on cpan
before yapc which is in mid-june.

uri
 
J

Jim Cochrane

On disk I have a 2d table of numbers. Something like this:

9 8 6
4 5 6

I read it in so that I have a list of references to lists. In other
words, a list of the rows.

I want to first sort by column 2, then column 3. When I sort by column
2 I get:

4 5 6
9 8 6

As expected because 5 < 8. When I sort by column 3 I get:

9 8 6
4 5 6

This is unexpected. Since 6 = 6, I don't want it to perform the
sort. See how it was already sorted by column 2 when column 3 elements
were equal, but now column 2 is unsorted again.

Why don't you post your code and let people help you fix it? Otherwise,
you may not get much useful help.
 
B

Brad Baxter

BB> Typically, a sort algorithm will NOT preserve the original order
BB> when sorting equal values. It's extra work that slows things
BB> down. It may be unexpected, but is common. Often, sort routines
BB> offer an option that WILL preserve original order. Perl's isn't
BB> one (that I'm aware--though I may be mistaken).

wrong on several counts. perl's current default sort algorithm is stable
(which means it keeps records with the same key in their original order
relative to each other). and it is not extra work to have a stable sort,
it is just a different algorithm. both the old sort (which you can still
get via a pragma) and the new one are O(N log N) on average which means
they are the same over all speed.

I stand corrected. I appreciate it.

Regards,

Brad
 
T

Tassilo v. Parseval

Also sprach Uri Guttman:
and the promised module in that paper (5 years ago) is actually under
full development right now and mostly working. it is the core of a talk
i am giving at yapc::buffalo this june. the module is called Sort::Maker
and this is what the OP would do to properly sort his rows of numbers

use Sort::Maker ;

my $sorter = make_sorter(
plain => 1,
number => '$_->[0]',
number => '$_->[1]',
number => '$_->[2]',
) ;

my @sorted = $sorter->( @unsorted ) ;

i may have an alpha version of this module ready in a few weeks if
anyone is interested in playing with it. the goal is to have it on cpan
before yapc which is in mid-june.

Ah, the mythical sort module you've been telling us so often about is
eventually going to happen! ;-)

Nice! Its interface is quite cool.

Tassilo
 
U

Uri Guttman

TvP> Ah, the mythical sort module you've been telling us so often about is
TvP> eventually going to happen! ;-)

not mythical, but vaporific!

i actually started it way back then but after a viscious and well
deserved code review, i shelved it. and it took this many years before i
realized the major design flaw which was trying to have key code
extraction be described by some sick data structure. now i just make the
coder pass in the code. that removed over half the complexity. and also
i have damian helping me with the api and pod so it is much more fun. :)

TvP> Nice! Its interface is quite cool.

good to hear that. it was influenced (and it influenced) the sort stuff
that damian posted to the perl 6 lang list about 2 months ago.

uri
 
T

The King of Pots and Pans

Okay some of you asked for the code. Here it is. As you can tell I am
new to perl and am trying my best. I appreciate your kind help.

I cannot use any modules that does not come with the standard perl 5.6
install, as this program will be used on a number of different
machines with just the standard install.

--- code ---
#!/usr/bin/perl -w

use strict;

my @rows = ();
my @sorted = ();

open(DATA,"sort.txt") or die "cannot open file\n";

while(<DATA>)
{
my @fields = split /\s+/;
push @rows, \@fields; # list of refs to lists
}

@sorted = sort {$a->[1] <=> $b->[1]} @rows;
@sorted = sort {$a->[2] <=> $b->[2]} @rows;

for my $fields (@sorted)
{
print join " ", @$fields, "\n";
}

close(DATA);
--- code ---

sort.txt:
6 4 9
5 3 9


The first 'sort' command produces this:
5 3 9
6 4 9

This is correct because 3 < 4. The next 'sort' command produces this:

6 4 9
5 3 9

This is undesirable, as 9 = 9 so the order of the rows is desired to
remain the way it was in the previous sort with 3 < 4.

I hope this was more clear than my original post. I apologize for not
being so clear to begin with.

The data represented is just an example. There are actually many
columns and hundreds of rows.
 
J

Joe Smith

The said:
Okay some of you asked for the code. Here it is. As you can tell I am
new to perl and am trying my best. I appreciate your kind help.

@sorted = sort {$a->[1] <=> $b->[1]} @rows;
@sorted = sort {$a->[2] <=> $b->[2]} @rows;

There's your problem.
You need to make a single two-level sort, not two separate sorts.

@sorted = sort {$a->[1] <=> $b->[1] or $a->[2] <=> $b->[2]} @rows;

-Joe
 
G

Gregory Toomey

The said:
Okay some of you asked for the code. Here it is. As you can tell I am
new to perl and am trying my best. I appreciate your kind help.

I cannot use any modules that does not come with the standard perl 5.6
install, as this program will be used on a number of different
machines with just the standard install.

What you should have asked is how to sort on two fields.

The basic algorithm is to compare the two fields, and if they are equal,
compare the second fields. The gives a single boolean result.

You original post looked like weird matrix algebra & confused people.

gtoomey
 
U

Uri Guttman

RL" == René Larsen said:
@sorted = sort {$a->[1] <=> $b->[1]} @rows;
@sorted = sort {$a->[2] <=> $b->[2]} @rows;

RL> This is where you are making mistakes. You sort the data twice *but* you
RL> only keep the last. You probably meant:

RL> @sorted = sort {$a->[1] <=> $b->[1]} @rows;
RL> @sorted = sort {$a->[2] <=> $b->[2]} @sorted;

RL> But normally when sorting by more than one key, one sorts the *last* key
RL> first:

RL> @sorted = sort {$a->[2] <=> $b->[2]} @rows;
RL> @sorted = sort {$a->[1] <=> $b->[1]} @sorted;

RL> Try sorting your data on paper, and see what happens if you sort it from
RL> left to right or right to left.

that will not be any better than his code. 2 separate sorts will not do it.

RL> But, IMHO it would be better to sort it only once:

RL> @sorted = sort {
RL> $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
RL> } @rows;

this isn't better, it is correct. see my other post where i divined his
problem without even seeing any code. :)

RL> This cuts down on the comparisons, as you only test keys 2 if keys 1 are
RL> equal, and so on.

that has NOTHING to do with it. multikey sorts are not a speed up but a
CORRECY way to do sorting and subsorting. your first code and the OP's
code are both wrong, not slow.

RL> Anorther thing: Your test data is not good. You should at least use
RL> something like:

RL> 6 4 9
RL> 5 7 9

huh? his data was fine as it showed the problem. and your data isn't
good either as with three keys to sort on you should have even more
rows for testing.

uri
 
A

A. Sinan Unur

RL> But, IMHO it would be better to sort it only once:

RL> @sorted = sort {
RL> $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=>
RL> $b->[2]
RL> } @rows;

this isn't better, it is correct. see my other post where i divined
his problem without even seeing any code. :)

Which actually amazed me ... Your ESP is better than mine :)

I had not realized people actually thought sorting the OP's way was a
reasonable thing to do until I saw my stats students this semester sorting
each column in a data set independently.
 
U

Uri Guttman

JG> In article <[email protected]>, Uri Guttman

RL> But normally when sorting by more than one key, one sorts the *last*RL> @sorted = sort {$a->[2] <=> $b->[2]} @rows;

JG> I am afraid that is incorrect. You can emulate a multi-column sort
JG> by sorting on individual columns in multiple passes provided 1)
JG> you have a stable sort (sequence-preserving) and 2) you sort in
JG> inverse order of priority. Try it and see.

yes, that would work. and in older perls the sort function was not
stable (it is now). so that is not always a viable option. so a multikey
sort is still a better choice IMO.

uri
 
T

The King of Pots and Pans

this isn't better, it is correct. see my other post where i divined
his problem without even seeing any code. :)

I appreciate your insight in this matter. You appear to have extensive
education in this area.

For this problem I can only use the standard modules that come with
Perl 5.6. Is perl 5.6 up to the task? Or can the solution only be had
by downloading your module?
 
U

Uri Guttman

TKoPaP> On Wed, 21 Apr 2004 at 12:48 GMT, Uri Guttman spoke:
TKoPaP> I appreciate your insight in this matter. You appear to have extensive
TKoPaP> education in this area.

TKoPaP> For this problem I can only use the standard modules that come
TKoPaP> with Perl 5.6. Is perl 5.6 up to the task? Or can the solution
TKoPaP> only be had by downloading your module?

you can do a multikey sort with any perl. you can also do the multipass
sort mentioned in this thread. both require you to understand some more
about sorting in general but neither is that complicated. and you have
been given enough code and clues in this thread to solve your
problem. so hack it up in a small script and if you can't get to work,
post that code back here. but you have to do the homework first so study
the examples and ideas posted already.

uri
 
P

Paul Lalli

I am afraid that is incorrect. You can emulate a multi-column sort by
sorting on individual columns in multiple passes provided 1) you have a
stable sort (sequence-preserving) and 2) you sort in inverse order of
priority. Try it and see.

The OP has already stated that he only has Perl 5.6 available to him.
Perl's sort() did not become stable until v5.8.0

Paul Lalli
 
C

christie

Uri Guttman said:
TKoPaP> On Wed, 21 Apr 2004 at 12:48 GMT, Uri Guttman spoke:

TKoPaP> I appreciate your insight in this matter. You appear to have extensive
TKoPaP> education in this area.

TKoPaP> For this problem I can only use the standard modules that come
TKoPaP> with Perl 5.6. Is perl 5.6 up to the task? Or can the solution
TKoPaP> only be had by downloading your module?

you can do a multikey sort with any perl. you can also do the multipass
sort mentioned in this thread. both require you to understand some more
about sorting in general but neither is that complicated. and you have
been given enough code and clues in this thread to solve your
problem. so hack it up in a small script and if you can't get to work,
post that code back here. but you have to do the homework first so study
the examples and ideas posted already.

uri
=============================================================
Guys.... Have you heard simplicity is beauty. Don't make your life
more complited than it is.

Here is a hint
9 2 6
5 1 6

using split to differenciate first and second row.
push 9 2 &6 into arr_tmp1
push 5 1 &6 into arr_tmp2

push arr_tmp1[0] & arr_tmp2[0] into array1
push arr_tmp1[1] & arr_tmp2[1] into array2
push arr_tmp1[2] & arr_tmp2[2] into array3

Just sort the array{1-3} . Sort array{1-3} == sort column{1-3}

To display, well.... you know how to display it.
 
U

Uri Guttman

c> Guys.... Have you heard simplicity is beauty. Don't make your life
c> more complited than it is.

my life is very complite. :)

c> Here is a hint

oh boy! i need one of those!

c> 9 2 6
c> 5 1 6

c> using split to differenciate first and second row.

hmm, the OP already had those in an list of lists. why is split needed?

c> push 9 2 &6 into arr_tmp1
c> push 5 1 &6 into arr_tmp2

and what language is that written in?

c> push arr_tmp1[0] & arr_tmp2[0] into array1
c> push arr_tmp1[1] & arr_tmp2[1] into array2
c> push arr_tmp1[2] & arr_tmp2[2] into array3

and what language is that written in?

that isn't even legal pseudo-code! :)

c> Just sort the array{1-3} . Sort array{1-3} == sort column{1-3}

and how does that sort multiple columns?

and your working code is where?

and how do you extend that beyond 2 rows of 3 numbers each?

what are you talking about?

try a little more complication in your life. being correct is better
than being simple.

uri
 
C

ctcgag

A. Sinan Unur said:
RL> But, IMHO it would be better to sort it only once:

RL> @sorted = sort {
RL> $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=>
RL> $b->[2]
RL> } @rows;

this isn't better, it is correct. see my other post where i divined
his problem without even seeing any code. :)

Which actually amazed me ... Your ESP is better than mine :)

I had not realized people actually thought sorting the OP's way was a
reasonable thing to do until I saw my stats students this semester
sorting each column in a data set independently.

Were they sorting by one column, and then sorting by another?
Or where they sorting one column, then re-opening the original file
and sorting that by another? If the former, than they weren't doing it
the way the OP was originally.

But the stability of Excel's sort, and the use of this stability
to do multi-column sorting through sequential independent sorts, are well
known techniques. I've often thought a good interview question for
positions expected to have proficiency in Excel and data analysis would be
to ask them to sort data principally by column A and by column B within
equal A, without using the one-step multi-column sort feature.

Xho
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,781
Messages
2,569,615
Members
45,295
Latest member
EmilG1510

Latest Threads

Top