"monty" < "python"

franzferdinand

"Monty" < "Python"
True
False
What's the rule about that? Is it the number of letters or what?
thanks
 
R. Michael Weylandt

It's lexicographic (order by the first letter; if those are the same,
compare the second; if those are the same, compare the third; ... if
one string ends while the other continues, the shorter one is
considered 'lower'), based on the characters' ASCII values (their
binary encoding):

http://www.asciitable.com/

Note that all the upper case values appear before the lower case
values. (And there are some other 'characters' like newline before
that but you won't see them)
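
As a minimal sketch of the rule above, the numeric values can be inspected with the built-in ord(). The strings other than "Monty" and "Python" are only illustrative, and the variable name code_points is made up for this example.

```python
# Numeric value of a few characters, as reported by ord().
code_points = {ch: ord(ch) for ch in "MPmp"}
print(code_points)

# Same case: the first letters differ, so they decide the result.
print("Monty" < "Python")    # 'M' is 77, 'P' is 80

# Mixed case: every upper case letter has a lower value than any lower case one.
print("Python" < "monty")    # 'P' is 80, 'm' is 109

# One string ends while the other continues: the shorter one is lower.
print("mont" < "monty")
```

All three comparisons should print True.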

Cheers,
Michael
 
Roy Smith

Jan Oelze said:
From the docs[0]:

"Strings are compared lexicographically using the numeric equivalents (the
result of the built-in function ord()) of their characters. Unicode and 8-bit
strings are fully interoperable in this behavior."

Note, however, that sorting order is a really complicated subject.
Different languages have all sorts of rules for how to alphabetize
entries in a directory or dictionary. Does Ñ sort the same as N? Does
É sort the same as E? What about Ç and C? Are these pairs all the same
letter, one of which is decorated with some mark, or are they different
letters?

If you're worried about these sorts of things, you need to be looking at
the locale module.
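
A rough sketch of what using the locale module for sorting might look like. The helper name sorted_for_locale is hypothetical, and which locale names are installed (for example "fr_FR.UTF-8") depends entirely on the system, so only the always-available "C" locale is used here.

```python
import locale

def sorted_for_locale(words, locale_name=""):
    """Sort words by the collation rules of locale_name ("" means the user's default)."""
    saved = locale.setlocale(locale.LC_COLLATE)   # query the current setting
    try:
        # Raises locale.Error if the requested locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore

words = ["zebra", "Eclair", "eclair", "apple"]
print(sorted(words))                    # plain code point order
print(sorted_for_locale(words, "C"))    # the "C" locale gives the same order
```

With a language-specific locale in place of "C", accented and upper case letters would typically be interleaved with their plain counterparts instead of being grouped by code point.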
 
Ian Foote

"Strings are compared lexicographically using the numeric equivalents
(the result of the built-in function ord()) of their characters. Unicode
and 8-bit strings are fully interoperable in this behavior."

This isn't true in python 3:

Python 3.2.3 (default, Oct 19 2012, 19:53:57)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() < str()
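
A small sketch that reproduces this kind of error on Python 3 and shows one way around it. The sample values are assumptions (the original input line is not shown above), and the exact wording of the error message differs between Python versions.

```python
raw = b"monty"      # bytes, the Python 3 counterpart of an 8-bit string
text = "python"     # str, i.e. Unicode text

caught = False
try:
    raw < text      # ordering bytes against str is not defined in Python 3
except TypeError as exc:
    caught = True
    print("TypeError:", exc)

# Decode the bytes first (assuming they hold ASCII text), then compare.
print(raw.decode("ascii") < text)
```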

Ian F
 
Jan Oelze

Interesting. Thanks!
 
Grant Edwards

False
What's the rule about that?

I don't know what "that" refers to in your question, but 'a' comes
before 'y' if that's what you're asking.
Is it the number of letters or what?

Individual letters are compared until a mismatch is found.
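
The "compare until a mismatch" rule can be sketched by hand. The function name lex_less is made up for illustration; the built-in < operator already does this for strings.

```python
def lex_less(a, b):
    """Return True if string a sorts before string b."""
    for x, y in zip(a, b):
        if x != y:
            return ord(x) < ord(y)   # the first mismatch decides
    return len(a) < len(b)           # no mismatch: the shorter string is lower

# The hand-written version agrees with the built-in operator.
for left, right in [("monty", "python"), ("python", "monty"), ("mont", "monty")]:
    print(left, right, lex_less(left, right), left < right)
```

So the number of letters only matters when one string is a prefix of the other.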
 
jmfauth

----

Courageous people can try to do something with the unicode
collation algorithm (see unicode.org). Some time ago, for the fun,
I wrote something (not perfect) with a reduced keys table (see
unicode.org), only a keys subset for some scripts hold in memory.

It works with Py32 and Py33. In an attempt to just see the
performance and how it "can react", I did an horrible mistake,
I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

jmf
 
Tim Delaney

Franz, please pay no attention to jmf. He has become obsessed with a single
small performance regression in Python 3.3, affecting strings in a very
small domain that rarely shows up in practice (although, as he has
demonstrated, it is easy to create a microbenchmark that makes it appear to
be much worse than it is).

The regression is a consequence of the decision in Python 3.3 to
*correctly* support the full range of Unicode characters whilst also
reducing the required memory where possible. In the vast majority of cases
this is a performance *improvement*. It is only "optimised for the ascii
user" in the sense that in the Unicode standard the pre-existing ASCII
characters only require 1 byte per code point and hence can be stored in
less memory than most other Unicode code points. The possible character
widths are 1, 2 and 4 bytes.

The actual regression occurs when concatenating, replacing, etc. brings in
a character that is wider than any character currently in the string. In
this situation the new string needs to be widened (increasing the number
of bytes used by every character), which is a much more expensive operation
than simply creating a new string (which is what would happen if the
character was the same size or smaller).
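
The widths can be observed, at least roughly, with sys.getsizeof. This is a sketch that relies on CPython 3.3+ implementation details; the helper name bytes_per_char and the sample characters are assumptions made for illustration.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by doubling a string of repeated ch."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = [("ASCII", "a"), ("BMP", "\u20ac"), ("beyond the BMP", "\U0001F600")]
for label, ch in samples:
    print(label, bytes_per_char(ch))

# Appending one wider character forces every character to the wider storage.
narrow = "a" * 1000
widened = narrow + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(widened))
```

On CPython the second size should be roughly twice the first, even though only one character was added.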

It has been acknowledged as a real regression, but he keeps hijacking every
thread where strings are mentioned to harp on about it. He has shown no
inclination to attempt to *fix* the regression and is rapidly coming to be
regarded as a troll by most participants in this list.

Tim Delaney
 
Michael Torrie

I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

Just because you keep saying it does not make it true. How is Py33 not
unicode compliant anymore? And maybe you ought to post some real code
too. (not like you've posted to date with the little one-liners that
aren't actually used in practice.)
 
Steven D'Aprano

it [Python3.3] is no more unicode compliant

I don't often call people a liar. I prefer to think that they are merely
confused, or honestly hold a mistaken belief. But in this case, I will
make an exception.

JMF, I believe you are deliberately, maliciously lying about Python, to
further what I believe is your irrational hatred of ASCII users and your
desire to punish them with poor performance, even when that affects
*everybody*.
 
rusi

----

Courageous people can try to do something with the unicode
collation algorithm (see unicode.org). Some time ago, for the fun,
I wrote something (not perfect) with a reduced keys table (see
unicode.org), only a keys subset for some scripts hold in memory.

It works with Py32 and Py33. In an attempt to just see the
performance and how it "can react", I did an horrible mistake,
I forgot Py33 is now optimized for ascii user, it is no more
unicode compliant and I stupidely tested/sorted lists of French
words...

Now let's take this piece by piece...
"I did an horrible mistake" : I am sorry. Did you get bruised? Break
some bones? And is 'h' a vowel in french?
"I forgot Py33 is now optimized for ascii user" Ok.
"it is no more unicode compliant" I asked earlier and I ask again --
What do you mean by (non)compliant?
 
Steven D'Aprano

"I did an horrible mistake" [...] is 'h' a vowel in french?

No it is not, and writing "an horrible" is a trivial typo which can
easily happen if you start thinking "an awful ..." (for example) and then
change to "horrible". Been there, done that.

But more interesting is the idea that in English we use "an" before words
that start with a vowel, and "a" with words that start with a consonant:

a tiger
a car
a house

but

an elephant
an ambulance
an unit

Wait, what? "An unit"? What rubbish is that?

The rule actually depends on the *sound* of the first syllable, not the
letter. If the first syllable is a consonant sound, we say and write "a",
even if the first letter is a vowel:

a unique opportunity

since the U in "unique" is pronounced as a "Yoo" sound rather than an "Ah"
sound. Likewise, if the first consonant is silent, we use "an":

an honourable man
half an hour

Now think of somebody who pronounces horrible with a silent "h". In
English, an initial H used to *always* be silent, nowadays only some such
words are. It's more common in dialect though.

"I made a 'orrible mistake in getting a 'Arry Potter tattoo on my
forehead."

"I made an 'orrible mistake in getting an 'Arry Potter tattoo on my
forehead."

Say each sentence aloud. The second sounds far more natural, the "n" in
"an" creates a bridge between the vowel sounds of "a" and "orrible".

By the way, the "n" in "an" is not the only such "bridging" sound. In
Shakespearean times, it was usual to use "mine" in the same fashion:

my wife
my peach

but

mine husband
mine apple



This-language-lesson-was-brought-to-you-by-the-letters-thorn-wynn-and-ash-
ly y'rs,
 
David H Wild

Larry Hudson said:
The word "apron" was originally "napron", and over the years the phrase
"a napron" mutated to "an apron". So that became the accepted word.

Similarly, the snake was a nadder - congruent with the natterjack toad.
 
Roy Smith

Terry Reedy said:
Threads are like the Sorcerer's Apprentice. You can start 'em, but you
cannot stop 'em ;-)

Of course you can stop threads. Just call _exit(). No more threads!
 
Chris Angelico

Of course you can stop threads. Just call _exit(). No more threads!

I don't think Mickey Mouse knew about that call, otherwise he'd have
used it. Either that, or he had a completely saturated system and
couldn't type anything at the console, so it took the wizard's SSH
session to deal with the problem using "kill -9".

ChrisA
 
