sort_by is not stable ?

Michel Demazure · Nov 7, 2010

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

_md

Ammar Ali · Nov 7, 2010

sort_by =C2=A0is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

Without code, I have to guess. If I understood your issue correctly,
then you need to specify both fields in sort_by block:

a =3D [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",=

Click to expand...

1]]
=3D> [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b", 1]]

a.sort_by {|e| [e.last, e.first]}

Click to expand...

=3D> [["b", 1], ["e", 2], ["a", 3], ["c", 3], ["d", 3], ["f", 3], ["g", 3]]

Regards,
Ammar

Michel Demazure · Nov 7, 2010

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).
_md

Ammar Ali · Nov 7, 2010

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Click to expand...

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

Click to expand...

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

Maybe this is a question/request for the core mailing list.

Regards,
Ammar

Robert Klemme · Nov 7, 2010

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Click to expand...

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

Click to expand...

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Click to expand...

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

robert

Phillip Gawlowski · Nov 7, 2010

Please, people, trim your quotes.

IIRC quicksort is instable by definition. =A0There may be stable variants
though.

It's only unstable if the implementation is inefficient (says
Wikipedia, so apply your truthiness filters).

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

--=20
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Michel Demazure · Nov 7, 2010

Phillip Gawlowski wrote in post #959912:

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

Robert Klemme wrote :

In this case it's not really needed IMHO

@Robert : yes in this precise case with two fields, you can go around.
But in general, if you work with many fields and have to sort by one of
them, you should not have to remember the previous sorts and specify all
the unconcerned fields.
_md

Phillip Gawlowski · Nov 7, 2010

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

"a" != "b" results in true, thus the couplet is *not* equal.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Robert Klemme · Nov 7, 2010

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

Click to expand...

"a" != "b" results in true, thus the couplet is *not* equal.

No, Michael is right. Every sorting algorithm only ever looks at *sort
keys*. So since our sort key is the second array member (the number)
both mentioned records are equivalent. If you include the other field a
stable sort isn't really interesting here since then all fields are part
of the sort key - stable and instable algorithms will return the same then.

Kind regards

robert

Michel Demazure · Nov 7, 2010

Robert Klemme wrote in post #959935:

No, Michael is right. Every sorting algorithm only ever looks at *sort
keys*. So since our sort key is the second array member (the number)
both mentioned records are equivalent.

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot). Even the parallel version is stable.
In NESL :

function quicksort(a) =
if (#a < 2) then a
else
let pivot = a[#a/2];
lesser = {e in a| e < pivot};
equal = {e in a| e == pivot};
greater = {e in a| e > pivot};
result = {quicksort(v): v in [lesser,greater]};
in result[0] ++ equal ++ result[1];

Which means that ruby's is badly coded.
_md

å†·é›¨ · Nov 7, 2010

[Note: parts of this message were removed to make it a legal post.]

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Click to expand...

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

Click to expand...

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

robert

From a practical viewpoint, stable sort should be implemented as stable
sort.
Not a multiple keys sort... or something else.

w_a_x_man · Nov 7, 2010

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

I've noticed that it's unstable under 1.8.7:

%w(foo bar car war raw rat cat or bat is).sort.
sort_by{|s| s.size}
==>["is", "or", "bar", "cat", "foo", "car", "bat", "rat", "raw",
"war"]
VERSION
==>"1.8.7"

John W Higgins · Nov 7, 2010

[Note: parts of this message were removed to make it a legal post.]

Good Morning,

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot).
...

Which means that ruby's is badly coded.

The rest of the sane world begs to differ.

http://en.wikipedia.org/wiki/Sorting_algorithm#Quicksort - "...Efficient
implementations of quicksort (with in-place partitioning) are typically
unstable sorts..."

http://perldoc.perl.org/sort.html - "...Mergesort is stable, quicksort is
not..."

http://www.codecogs.com/reference/c/stdlib.h/qsort.php - "...The algorithms
implemented by *qsort*, *qsort_r* and *heapsort* are not stable..."

Linux Man page for qsort - "....If two members compare as equal, their order
in the sorted array is undefined...."

Python in all fairness says they have stable sorts - but they really make no
mention of the algorithm used so I'm unclear as to whether they are using
quicksort at all.

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

John

P.S. There is nothing in the world stopping anyone from submitting a patch
to add a stable sort option to Ruby. But to pronounce that the hard work of
others is badly coded when it clearly conforms to the normal standard of
multiple mainstream languages is just plain ignorant at best.

Michel Demazure · Nov 7, 2010

John W Higgins wrote in post #959968:

... just plain ignorant at best.

Message received.
_md

å†·é›¨ · Nov 7, 2010

[Note: parts of this message were removed to make it a legal post.]

In fact, python uses Timsort. Java SE and Android too.
(wikipedia::Timsort <http://en.wikipedia.org/wiki/Timsort>)
It's very efficient, like quicksort.
(btw, quicksort isn't most efficient compare-based sort. see
wikipedia::introsort. <http://en.wikipedia.org/wiki/Introsort>)

w_a_x_man · Nov 7, 2010

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.

John W Higgins · Nov 7, 2010

[Note: parts of this message were removed to make it a legal post.]

Good Afternoon

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.

First, there are two issues here - one is whether or not to use quicksort as
the default sort and then whether or not there is an expectation as to
whether or not quicksort should be stable.

The issue I took was the statement that Ruby's quicksort was "badly coded".
This is not a correct statement given the many other instances of quicksort
that follow the same principal as Ruby's version in that it's not stable. I
think Ruby is just fine with respect to its quicksort implementation.

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

Again, anyone is free to put a patch on the table and really start a good
conversation as to how it would fit in - and quite possibly a stable_sort
method would be an outstanding option here. But there just is a huge
difference between someone coming to the table patch in hand to begin a
conversation - and just saying Ruby is "badly coded". This is a sort
algorithm - it shouldn't be rocket science for someone to create a different
algorithm patch if they wanted to. There are plenty of C reference
implementations available as starting points.

John

Charles Oliver Nutter · Nov 19, 2010

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

I can clarify.

JRuby used to just use the JVM-provided mergesort, which was stable
but markedly slower than Ruby 1.8.6/7's unstable quicksort. Because we
got a couple reports about sort being slower, we opted to do the
"wrong thing" and have a contributor implement a hybrid sort that
performed better than Ruby 1.8.6/7 but was unstable.

About a month later, someone raised the issue about MRI not having a
stable sort, and it was decided to make it stable. AARGH. I have not
kept up with discussions, however, and I believe sorts are still
unstable in 1.9.2, correct?

At this point I almost want there to be both options, since our
contributor (qbproger ... sorry man, your name escapes me at the
moment!) went to a lot of effort to get our current sort running well.

In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress

- Charlie

Michel Demazure · Nov 19, 2010

Charles Nutter wrote in post #962516:

In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

Am I wrong ?
_md

Charles Oliver Nutter · Dec 5, 2010

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

We don't distinguish between the two in JRuby, but it might be a good
trade-off to say that sort_by is stable and sort does not necessarily
have that guarantee.

- Charlie

stable sort_by?	11	Dec 13, 2005
sort_by: multiple fields with reverse sort	15	Oct 13, 2010
#sort_by and #sort_obj	17	Oct 7, 2007
sort_by and directories	1	Apr 28, 2009
sort_by{rand} doesn't shuffle array	4	Jan 3, 2006
Stable sort?	10	Mar 17, 2005
HCaptcha - How to stop page from refreshing on submit if captcha is not checked/validated	1	Aug 29, 2023
[Q] best practice for redefining methods at runtime?	5	Oct 20, 2010

sort_by is not stable ?

Michel Demazure

Ammar Ali

Michel Demazure

Ammar Ali

Robert Klemme

Phillip Gawlowski

Michel Demazure

Phillip Gawlowski

Robert Klemme

Michel Demazure

å†·é›¨

w_a_x_man

John W Higgins

Michel Demazure

å†·é›¨

w_a_x_man

John W Higgins

Charles Oliver Nutter

Michel Demazure

Charles Oliver Nutter

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads