sort_by is not stable ?

  • Thread starter Michel Demazure
  • Start date
M

Michel Demazure

sort_by is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

_md
 
A

Ammar Ali

sort_by =C2=A0is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.

Without code, I have to guess. If I understood your issue correctly,
then you need to specify both fields in sort_by block:
a =3D [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",=
1]]
=3D> [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b", 1]]
a.sort_by {|e| [e.last, e.first]}
=3D> [["b", 1], ["e", 2], ["a", 3], ["c", 3], ["d", 3], ["f", 3], ["g", 3]]

Regards,
Ammar
 
M

Michel Demazure

Ammar Ali wrote in post #959889:
If I understood your issue correctly,
Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]
Without code, I have to guess
The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).
_md
 
A

Ammar Ali

Ammar Ali wrote in post #959889:
If I understood your issue correctly,
Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# = > [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# = > [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]
Without code, I have to guess
The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

Maybe this is a question/request for the core mailing list.

Regards,
Ammar
 
R

Robert Klemme

Ammar Ali wrote in post #959889:
If I understood your issue correctly,
Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]
Without code, I have to guess
The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

robert
 
P

Phillip Gawlowski

Please, people, trim your quotes.

IIRC quicksort is instable by definition. =A0There may be stable variants
though.

It's only unstable if the implementation is inefficient (says
Wikipedia, so apply your truthiness filters).

However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

--=20
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
 
M

Michel Demazure

Phillip Gawlowski wrote in post #959912:
However, QSort is (again, so says Wikipedia), only maintains the
relative order of records, if those are equal.
["a",3] and ["b",3] aren't equal as far as a computer is concerned,
resulting in the OP's problem.

@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

Robert Klemme wrote :
In this case it's not really needed IMHO

@Robert : yes in this precise case with two fields, you can go around.
But in general, if you work with many fields and have to sort by one of
them, you should not have to remember the previous sorts and specify all
the unconcerned fields.
_md
 
P

Phillip Gawlowski

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

"a" != "b" results in true, thus the couplet is *not* equal.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
 
R

Robert Klemme

Phillip Gawlowski wrote in post #959912:
@Phillip : I am not sure I agree. When sorting on the second half, the
couple (["a", 3], ["b", 3]) falls under the "equal" case of the choice.

"a" != "b" results in true, thus the couplet is *not* equal.

No, Michael is right. Every sorting algorithm only ever looks at *sort
keys*. So since our sort key is the second array member (the number)
both mentioned records are equivalent. If you include the other field a
stable sort isn't really interesting here since then all fields are part
of the sort key - stable and instable algorithms will return the same then.

Kind regards

robert
 
M

Michel Demazure

Robert Klemme wrote in post #959935:
No, Michael is right. Every sorting algorithm only ever looks at *sort
keys*. So since our sort key is the second array member (the number)
both mentioned records are equivalent.

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot). Even the parallel version is stable.
In NESL :

function quicksort(a) =
if (#a < 2) then a
else
let pivot = a[#a/2];
lesser = {e in a| e < pivot};
equal = {e in a| e == pivot};
greater = {e in a| e > pivot};
result = {quicksort(v): v in [lesser,greater]};
in result[0] ++ equal ++ result[1];

Which means that ruby's is badly coded.
_md
 
Å

冷雨

[Note: parts of this message were removed to make it a legal post.]

Ammar Ali wrote in post #959889:

If I understood your issue correctly,

Yes, it is exactly that. Your example shows it :

a = [["g", 3], ["c", 3], ["a", 3], ["f", 3], ["e", 2], ["d", 3], ["b",
1]]
b = a.sort_by{ |letter, number| letter}
# => [["a", 3], ["b", 1], ["c", 3], ["d", 3], ["e", 2], ["f", 3], ["g",
3]]
c = b.sort_by{ |letter, number| number}
# => [["b", 1], ["e", 2], ["d", 3], ["c", 3], ["a", 3], ["f", 3], ["g",
3]]

Without code, I have to guess

The code for sort_by refers to ruby_qsort

I think a built-in sort method should be stable (or have a stable
variant).

Thanks for the clarification, now I understand what you mean. Based on
the results of your example, it appears that ruby's qsort is naive.
AFAIK, the majority of qsort implementations are.

IIRC quicksort is instable by definition. There may be stable variants
though.

In this case it's not really needed IMHO. You can either do

x.sort_by {|let,dig| [dig, let]}

or, if you don't want to create all those temporary arrays

x.sort {|(la,da),(lb,db)| da == db ? la <=> lb : da <=> db}
x.sort {|(la,da),(lb,db)| c = da <=> db; c == 0 ? la <=> lb : c}

Kind regards

robert
From a practical viewpoint, stable sort should be implemented as stable
sort.
Not a multiple keys sort... or something else.
 
W

w_a_x_man

sort_by  is not a stable sorting method (ruby 1.9.2 p0)

I am doing word statistics in a text file. I have a long array of
entries [word, number], sorted by alphabetic order on word. When I sort
on number, entries with the same number are shuffled.


I've noticed that it's unstable under 1.8.7:

%w(foo bar car war raw rat cat or bat is).sort.
sort_by{|s| s.size}
==>["is", "or", "bar", "cat", "foo", "car", "bat", "rat", "raw",
"war"]
VERSION
==>"1.8.7"
 
J

John W Higgins

[Note: parts of this message were removed to make it a legal post.]

Good Morning,

Coming back to the beginning, quicksort correctly coded is stable (cut
recursively in three parts, with the middle part made by those elements
equivalent to the choosen pivot).
...


Which means that ruby's is badly coded.

The rest of the sane world begs to differ.

http://en.wikipedia.org/wiki/Sorting_algorithm#Quicksort - "...Efficient
implementations of quicksort (with in-place partitioning) are typically
unstable sorts..."

http://perldoc.perl.org/sort.html - "...Mergesort is stable, quicksort is
not..."

http://www.codecogs.com/reference/c/stdlib.h/qsort.php - "...The algorithms
implemented by *qsort*, *qsort_r* and *heapsort* are not stable..."

Linux Man page for qsort - "....If two members compare as equal, their order
in the sorted array is undefined...."

Python in all fairness says they have stable sorts - but they really make no
mention of the algorithm used so I'm unclear as to whether they are using
quicksort at all.

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

John

P.S. There is nothing in the world stopping anyone from submitting a patch
to add a stable sort option to Ruby. But to pronounce that the hard work of
others is badly coded when it clearly conforms to the normal standard of
multiple mainstream languages is just plain ignorant at best.
 
W

w_a_x_man

I'd rather have the efficient implementation that one can easily make stable
by using the secondary comparison (as you can easily do in this case) than
an inefficient version that really accomplishes nothing that the efficient
version cannot.

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.
 
J

John W Higgins

[Note: parts of this message were removed to make it a legal post.]

Good Afternoon

I'd rather have a stable and efficient sort. If Python can do it,
then there's no excuse for Ruby.

First, there are two issues here - one is whether or not to use quicksort as
the default sort and then whether or not there is an expectation as to
whether or not quicksort should be stable.

The issue I took was the statement that Ruby's quicksort was "badly coded".
This is not a correct statement given the many other instances of quicksort
that follow the same principal as Ruby's version in that it's not stable. I
think Ruby is just fine with respect to its quicksort implementation.

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

Again, anyone is free to put a patch on the table and really start a good
conversation as to how it would fit in - and quite possibly a stable_sort
method would be an outstanding option here. But there just is a huge
difference between someone coming to the table patch in hand to begin a
conversation - and just saying Ruby is "badly coded". This is a sort
algorithm - it shouldn't be rocket science for someone to create a different
algorithm patch if they wanted to. There are plenty of C reference
implementations available as starting points.

John
 
C

Charles Oliver Nutter

As to the concept of whether or not another algorithm should be used - it's
rather interesting to look here -
http://redmine.ruby-lang.org/issues/show/1089 and especially the comments of
Mr. Nutter who while working on JRuby had an apparently clean canvas to work
with and still went with an unstable sort. It's sparse on details and I'm
not trying to put words in his mouth either.

I can clarify.

JRuby used to just use the JVM-provided mergesort, which was stable
but markedly slower than Ruby 1.8.6/7's unstable quicksort. Because we
got a couple reports about sort being slower, we opted to do the
"wrong thing" and have a contributor implement a hybrid sort that
performed better than Ruby 1.8.6/7 but was unstable.

About a month later, someone raised the issue about MRI not having a
stable sort, and it was decided to make it stable. AARGH. I have not
kept up with discussions, however, and I believe sorts are still
unstable in 1.9.2, correct?

At this point I almost want there to be both options, since our
contributor (qbproger ... sorry man, your name escapes me at the
moment!) went to a lot of effort to get our current sort running well.

In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress :)

- Charlie
 
M

Michel Demazure

Charles Nutter wrote in post #962516:
In any case, it wasn't an arbitrary decision. We debate almost
endlessly over these sorts of things, and I think I've taken a good
ten years off my life due to the related stress :)

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

Am I wrong ?
_md
 
C

Charles Oliver Nutter

Just to come back to my initial post : it was not about sort, but about
sort_by.

For sort_by, you need to compute the keys, and sort the indices
according to the keys. It is not in-place sorting. Yes, optimal in-place
quicksort is unstable. But for sort_by, one could use a stable version.

We don't distinguish between the two in JRuby, but it might be a good
trade-off to say that sort_by is stable and sort does not necessarily
have that guarantee.

- Charlie
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top