"monty" < "python"

franzferdinand · Mar 20, 2013

"Monty" < "Python"
True
False
What's the rule about that? Is it the number of letters or what?
thanks

Jan Oelze · Mar 20, 2013

From the docs[0]:

"Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior."

[0] http://docs.python.org/2/reference/expressions.html#not-in
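
A small illustration of the quoted rule, as a sketch for Python 3 (where ord() returns the Unicode code point). The helper name code_points is made up for this example. Comparing two strings gives the same answer as comparing their lists of ord() values:

```python
def code_points(s):
    """Return the ord() value of every character in s."""
    return [ord(ch) for ch in s]

a, b = "monty", "python"

# Lists compare element by element too, so both comparisons agree.
print(code_points(a))   # [109, 111, 110, 116, 121]
print(code_points(b))   # [112, 121, 116, 104, 111, 110]
print(a < b, code_points(a) < code_points(b))   # True True
```

The first characters already differ ('m' is 109, 'p' is 112), so the rest of each string never gets looked at.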

R. Michael Weylandt · Mar 20, 2013

It's lexicographic (order by the first letter; if those are the same,
compare the second; if those are the same, compare the third, ...; if
one ends while the other continues, it's considered 'lower'), based on
the characters' ASCII values (their binary encoding):

http://www.asciitable.com/

Note that all the upper case values appear before the lower case
values. (And there are some other 'characters' like newline before
that but you won't see them)
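
To make the upper/lower case point concrete, here is a rough sketch using only built-in operators. The word list is invented for the example, and str.casefold is just one possible way to get a case-insensitive order:

```python
# Every uppercase ASCII letter has a smaller code than every lowercase one.
print(ord("Z"), ord("a"))      # 90 97
print("Zebra" < "apple")       # True: 'Z' (90) sorts before 'a' (97)
print("Python" < "monty")      # True for the same reason

# A string that runs out first is the smaller one.
print("mont" < "monty")        # True

# For a case-insensitive ordering, compare casefolded copies instead.
words = ["python", "Monty", "monty", "Python"]
print(sorted(words))                     # ['Monty', 'Python', 'monty', 'python']
print(sorted(words, key=str.casefold))   # ['Monty', 'monty', 'python', 'Python']
```

With the casefold key, words that differ only in case count as equal, so they keep their original relative order.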

Cheers,
Michael

Roy Smith · Mar 20, 2013

Jan Oelze said:
From the docs[0]:

"Strings are compared lexicographically using the numeric equivalents (the
result of the built-in function ord()) of their characters. Unicode and 8-bit
strings are fully interoperable in this behavior."

Note, however, that sorting order is a really complicated subject.
Different languages have all sorts of rules for how to alphabetize
entries in a directory or dictionary. Does N sort the same as Ñ? Does
E sort the same as É? What about C and Ç? Are these pairs all the same
letter, one of which is decorated with some mark, or are they different
letters?

If you're worried about these sorts of things, you need to be looking at
the locale module.
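
A minimal sketch of what that might look like, assuming the locale you want is installed on the machine. The function name locale_sorted and the word list are invented for this example, and the resulting order depends entirely on the active locale:

```python
import locale

def locale_sorted(words):
    """Sort words using the collation rules of the current locale."""
    return sorted(words, key=locale.strxfrm)

try:
    # "" means "use whatever the user's environment asks for".
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    # The requested locale is not installed; the default "C" locale
    # stays active, and that one still orders by code point.
    pass

words = ["zebra", "Apple", "éclair", "apple"]
print(sorted(words))          # plain code point order
print(locale_sorted(words))   # order depends on the active locale
```

In the "C" locale both lines should print the same order. In something like en_US.UTF-8 I would expect the accented and mixed-case words to interleave in a more dictionary-like way, but that is up to the platform's collation tables.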

franzferdinand · Mar 20, 2013

Ok, thanks everybody!

Ian Foote · Mar 20, 2013

"Strings are compared lexicographically using the numeric equivalents
(the result of the built-in function ord()) of their characters. Unicode
and 8-bit strings are fully interoperable in this behavior."

This isn't true in python 3:

Python 3.2.3 (default, Oct 19 2012, 19:53:57)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() < str()
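
For anyone who wants to reproduce this, a short sketch for Python 3. The variable names and the choice of the "ascii" codec are assumptions made for the example, and the exact wording of the error message differs between 3.x releases:

```python
raw = b"monty"      # 8-bit string (bytes)
text = "python"     # Unicode string (str)

try:
    raw < text
except TypeError as exc:
    # Python 3 refuses to order bytes against str.
    print("TypeError:", exc)

# Decode the bytes first (you have to know or choose the encoding),
# then the usual code point comparison applies.
print(raw.decode("ascii") < text)   # True
```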

Ian F

Jan Oelze · Mar 20, 2013

Interesting. Thanks!

Grant Edwards · Mar 20, 2013

False
What's the rule about that?

I don't know what "that" refers to in your question, but 'a' comes
before 'y' if that's what you're asking.

Is it the number of letters or what?

Individual letters are compared until a mismatch is found.
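
Spelled out as code, that might look like the following sketch. The function names first_mismatch and less_than are invented here; the real comparison is done inside the interpreter, this only mimics it for str objects:

```python
def first_mismatch(a, b):
    """Return the index where a and b first differ, or None if one is a
    prefix of the other (or they are equal)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

def less_than(a, b):
    """Hand-written equivalent of a < b for two str objects."""
    i = first_mismatch(a, b)
    if i is None:
        # No differing character: the shorter string is the smaller one.
        return len(a) < len(b)
    return ord(a[i]) < ord(b[i])

print(first_mismatch("monty", "python"))   # 0
print(first_mismatch("monty", "month"))    # 4
print(less_than("monty", "python"))        # True
```

So the number of letters only matters when one string is a prefix of the other.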

jmfauth · Mar 20, 2013

----

Courageous people can try to do something with the unicode
collation algorithm (see unicode.org). Some time ago, for the fun,
I wrote something (not perfect) with a reduced keys table (see
unicode.org), only a keys subset for some scripts hold in memory.

It works with Py32 and Py33. In an attempt to just see the
performance and how it "can react", I did an horrible mistake,
I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

jmf

Tim Delaney · Mar 20, 2013

Franz, please pay no attention to jmf. He has become obsessed with a single
small regression in Python 3.3 in performance with how strings perform in a
very small domain that rarely shows up in practice (although as he has
demonstrated, it is easy to create a microbenchmark that makes it appear to
be much worse than it is).

The regression is a consequence of the decision in Python 3.3 to
*correctly* support the full range of Unicode characters whilst also
reducing the required memory where possible. In the vast majority of cases
this is a performance *improvement*. It is only "optimised for the ascii
user" in the sense that in the Unicode standard the pre-existing ASCII
characters only require 1 byte per code point and hence can be stored in
less memory than most other Unicode code points. The possible character
widths are 1, 2 and 4 bytes.

The actual regression occurs when an operation (concatenation, replacement,
etc.) adds a character that is wider than any other character currently in
the string.
In this situation the new string needs to be widened (increase the number
of bytes used by every character) which is a much more expensive operation
than simply creating a new string (which is what would happen if the
character was the same size or smaller).
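
A rough way to see the 1, 2 and 4 byte widths for yourself, assuming CPython 3.3 or later. The helper name bytes_per_char is made up, and sys.getsizeof reports an implementation detail, so treat the numbers as indicative rather than guaranteed:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string of ch by n."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# An ASCII letter, a BMP character (euro sign) and a non-BMP character.
for ch in ("a", "\u20ac", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))

# Appending one wider character makes the whole new string wider.
narrow = "a" * 1000
widened = narrow + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(widened))
```

On CPython I would expect the loop to report 1, 2 and 4, and the widened string to take roughly twice the memory of the narrow one even though only a single character was added.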

It has been acknowledged as a real regression, but he keeps hijacking every
thread where strings are mentioned to harp on about it. He has shown no
inclination to attempt to *fix* the regression and is rapidly coming to be
regarded as a troll by most participants in this list.

Tim Delaney

Michael Torrie · Mar 21, 2013

I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

Just because you keep saying it does not make it true. How is Py33 not
unicode compliant anymore? And maybe you ought to post some real code
too. (not like you've posted to date with the little one-liners that
aren't actually used in practice.)

Steven D'Aprano · Mar 21, 2013

it [Python3.3] is no more unicode compliant

I don't often call people a liar. I prefer to think that they are merely
confused, or honestly hold a mistaken belief. But in this case, I will
make an exception.

JMF, I believe you are deliberately, maliciously lying about Python, to
further what I believe is your irrational hatred of ASCII users and your
desire to punish them with poor performance, even when that affects
*everybody*.

rusi · Mar 21, 2013

----

Courageous people can try to do something with the unicode
collation algorithm (see unicode.org). Some time ago, for the fun,
I wrote something (not perfect) with a reduced keys table (see
unicode.org), only a keys subset for some scripts hold in memory.

It works with Py32 and Py33. In an attempt to just see the
performance and how it "can react", I did an horrible mistake,
I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

Now let's take this piece by piece…
"I did an horrible mistake" : I am sorry. Did you get bruised? Break
some bones? And is 'h' a vowel in French?
"I forgot Py33 is now optimized for ascii user" Ok.
"it is no more unicode compliant" I asked earlier and I ask again --
What do you mean by (non)compliant?

Steven D'Aprano · Mar 21, 2013

"I did an horrible mistake" [...] is 'h' a vowel in French?

No it is not, and writing "an horrible" is a trivial typo which can
easily happen if you start thinking "an awful ..." (for example) and then
change to "horrible". Been there, done that.

But more interesting is the idea that in English we use "an" before words
that start with a vowel, and "a" with words that start with a consonant:

a tiger
a car
a house

but

an elephant
an ambulance
an unit

Wait, what? "An unit"? What rubbish is that?

The rule actually depends on the *sound* of the first syllable, not the
letter. If the first syllable is a consonant sound, we say and write "a",
even if the first letter is a vowel:

a unique opportunity

since the U in "unique" is pronounced as a "Yoo" sound rather than "Ah"
sound. Likewise if the first consonant is silent, we use "an":

an honourable man
half an hour

Now think of somebody who pronounces horrible with a silent "h". In
English, an initial H used to *always* be silent, nowadays only some such
words are. It's more common in dialect though.

"I made a 'orrible mistake in getting a 'Arry Potter tattoo on my
forehead."

"I made an 'orrible mistake in getting an 'Arry Potter tattoo on my
forehead."

Say each sentence aloud. The second sounds far more natural, the "n" in
"an" creates a bridge between the vowel sounds of "a" and "orrible".

By the way, the "n" in "an" is not the only such "bridging" sound. In
Shakespearean times, it was usual to use "mine" in the same fashion:

my wife
my peach

but

mine husband
mine apple

This-language-lesson-was-brought-to-you-by-the-letters-thorn-wynn-and-ash-
ly y'rs,

Terry Reedy · Mar 21, 2013

Ok, thanks everybody!

Threads are like the Sorcerer's Apprentice. You can start 'em, but you
cannot stop 'em ;-)

David H Wild · Mar 21, 2013

Larry Hudson said:
The word "apron" was originally "napron", and over the years the phrase
"a napron" mutated to "an apron". So that became the accepted word.

Similarly, the snake was a nadder - congruent with the natterjack toad.

Chris Angelico · Mar 21, 2013

Similarly, the snake was a nadder - congruent with the natterjack toad.

Hey look, snakes, we're back on topic!

ChrisA

Roy Smith · Mar 21, 2013

Terry Reedy said:
Threads are like the Sorcerer's Apprentice. You can start 'em, but you
cannot stop 'em ;-)

Of course you can stop threads. Just call _exit(). No more threads!

Chris Angelico · Mar 21, 2013

Of course you can stop threads. Just call _exit(). No more threads!

I don't think Mickey Mouse knew about that call, otherwise he'd have
used it. Either that, or he had a completely saturated system and
couldn't type anything at the console, so it took the wizard's SSH
session to deal with the problem using "kill -9".

ChrisA

Wayne Werner · Mar 21, 2013

Of course you can stop threads. Just call _exit(). No more threads!

Thank you for making me laugh this morning - I found that extremely
amusing.

-W
