Python 3.0 - is this true?

T

Terry Reedy

Duncan Grisby wrote:

Sorry for the delay in replying. Yes, that's not far off. Most of the
time the lists contain strings, though. A better approximation might
be to read lines from a file and randomly replace them with Nones:

l = []
for line in open("bigfile.txt"):
x = random.randint(0,100)
if x < 4: l.append(None)
else: l.append(line)

So use '' or '\0' instead of None for null lines. Or replace None for
the sort.
And maybe once in a while you end up with something not dissimilar to:

l = []
for line in open("bigfile.txt"):
x = random.randint(0,100)
if x < 4: l.append(None)
elif x < 5: l.append([line,line])
else: l.append(line)

In that kind of case it doesn't really matter what happens to list
items in the sort order, but it's important it doesn't fail to sort
the ones that are strings.

Replace the sublists with a coded string, such as '\0'+line.
If sorting is time (and space) critical, you want a straight string sort
without key *or* cmp function.

tjr
 
D

Duncan Grisby

[...]
l = []
for line in open("bigfile.txt"):
x = random.randint(0,100)
if x < 4: l.append(None)
else: l.append(line)

So use '' or '\0' instead of None for null lines. Or replace None for
the sort.

Both '' and '\0' could conceivably be valid data values. The problem
is that the data comes from a dynamically-typed database, so there is
no guarantee what the data values might be. This for loop with a file
is just an example that generates similar-looking data. It bears no
relation to the way the data is actually acquired.
And maybe once in a while you end up with something not dissimilar to:

l = []
for line in open("bigfile.txt"):
x = random.randint(0,100)
if x < 4: l.append(None)
elif x < 5: l.append([line,line])
else: l.append(line)

In that kind of case it doesn't really matter what happens to list
items in the sort order, but it's important it doesn't fail to sort
the ones that are strings.

Replace the sublists with a coded string, such as '\0'+line.

Again, this is just an example. As I say, in the real application, the
data has come from a dynamically-typed database. There's no easy way
to coerce all the items to the same type, because the application
doesn't know up-front what the data is going to look like. In some
cases, it may be that all the data items are lists of strings. In
other cases, all or most of the data items will be integers, for
example.
If sorting is time (and space) critical, you want a straight string sort
without key *or* cmp function.

That's exactly my point. Currently, the application just builds a list
of values retrieved from the database and asks Python to sort it. No
key or cmp function. With Python 3.0, it's going to require a lot more
work than that.

For my application, Python 3's comparison behaviour is a backwards
step. You can argue all you like that the new behaviour is the "right"
thing to do, and I'd be inclined to agree with you from a
philosophical point of view, but the fact is that it _will_ cause
problems for existing real code. The particularly ironic thing is that
the database's dynamic typing is closely modelled on Python, including
it's ability to gracefully handle mixed-type lists.

Cheers,

Duncan.
 
A

Aahz

Again, this is just an example. As I say, in the real application, the
data has come from a dynamically-typed database. There's no easy way
to coerce all the items to the same type, because the application
doesn't know up-front what the data is going to look like. In some
cases, it may be that all the data items are lists of strings. In
other cases, all or most of the data items will be integers, for
example.


That's exactly my point. Currently, the application just builds a list
of values retrieved from the database and asks Python to sort it. No
key or cmp function. With Python 3.0, it's going to require a lot more
work than that.

Unless I misunderstand you, your application is broken already. Complex
numbers have never been sortable. If your application works more like a
SQL database, then any given column will have only one datatype and as
long as you avoid types that cannot be compared against themselves
you're safe.

(I'll agree that from some perspectives the new behavior of None is a
wart but I think that in the end I agree with people who say that
preventing None from being sorted except intentionally will trap more
bugs earlier.)
For my application, Python 3's comparison behaviour is a backwards
step. You can argue all you like that the new behaviour is the "right"
thing to do, and I'd be inclined to agree with you from a
philosophical point of view, but the fact is that it _will_ cause
problems for existing real code. The particularly ironic thing is that
the database's dynamic typing is closely modelled on Python, including
it's ability to gracefully handle mixed-type lists.

What I think people are arguing about is your use of the word "backward".
I don't think anyone claims that fixing your application will be trivial,
but your application appears to be already broken, and Python 3.0 is
simply forcing you to confront it head-on.
 
T

Tim Rowe

2008/11/24 Aahz said:
(I'll agree that from some perspectives the new behavior of None is a
wart but I think that in the end I agree with people who say that
preventing None from being sorted except intentionally will trap more
bugs earlier.)

So will Python be introducing strong type checking and early binding,
so we can catch more bius earlier (compile rather than run time?) ;-)
 
D

Duncan Grisby

[...]
That's exactly my point. Currently, the application just builds a list
of values retrieved from the database and asks Python to sort it. No
key or cmp function. With Python 3.0, it's going to require a lot more
work than that.

Unless I misunderstand you, your application is broken already.

You misunderstand me.

My application is not broken. It handles a wide range of data types,
_but they are all sortable_. It doesn't involve any complex numbers.
There is no way a complex number could be stored in the database, so
that is not an issue.
Complex
numbers have never been sortable. If your application works more like a
SQL database, then any given column will have only one datatype and as
long as you avoid types that cannot be compared against themselves
you're safe.

My database isn't like a SQL database. It's dynamically typed, just
like Python. The range of types is limited to things that can be
marshalled into the underlying storage, so I can absolutely guarantee
that they are all sortable in Python 2, even though the permitted
range or types is quite wide.

[...]
What I think people are arguing about is your use of the word "backward".
I don't think anyone claims that fixing your application will be trivial,
but your application appears to be already broken, and Python 3.0 is
simply forcing you to confront it head-on.

The post you've quoted was the first time I used the word "backwards".
I didn't start this thread, I merely chipped in when people claimed
there were no real applications that would be affected by this change
to Python. I do have a real, widely deployed application that will be
affected. Claiming that it is not affected, or that it is already
"broken" does not change that fact.

Several people _have_ in fact claimed that I can't possibly have a
hard problem to solve in this respect. In general, the responses can
be grouped into those people claiming that it's trivial to fix, and
those (like you) claiming that it's ok that it's hard to fix because
it's already broken. Neither are true.

This issue is not impossible to deal with, of course, it's just one of
several things that will mean it's hard for us to migrate to Python 3.
What I find disturbing is the attitude that this change to Python's
comparison behaviour can't possibly have any downsides and that anyone
claiming it does is wrong.

Cheers,

Duncan.
 
M

Martin v. Löwis

Sorry for the delay in replying.

Thanks, and I'm in turn sorry myself, too. Here is my experiment:

I only looked at the first test case, where a bigfile.txt has
occasionally None lines. To run this test, I created a file with
33079296 and 704512 lines, by repeatedly cat'ing /etc/passwd.

I then ran this program:

import time,random
l = []

t=time.time()
for line in open("x"):
x = random.randint(0,100)
if x < 4: l.append(None)
else: l.append(line)
print(time.time()-t)

t=time.time()
for i in range(10):
L = l[:]
L.sort()
print(time.time()-t)

def key(n):
return n or ""

t=time.time()
for i in range(10):
L = l[:]
L.sort(key=key)
print(time.time()-t)

This key function is based on the assumption that
there won't be any strings in l, which is reasonable,
because they are all lines read from the input file
(i.e. contain \n at least). If truly empty strings
where possible, I'd map them to chr(0), which I would
assume couldn't occur elsewhere in the input.

With 2.5, I get these results (rounded)

3.8s
7.6s
14.3s

So with the key function, the overhead appears to have doubled.
I think the ratio should decrease as lists grow larger (because
the key function is linear, and the sort is nlogn, assuming the
overhead is in computing the key function), but I haven'd done
any series varying the file size.

With 3.0(rc3), I get (omitting the key-less sorting, as it won't
work)

24.8s
15.9s

So now the IO is much larger than the actual sorting, which is
about the same in 2.5 and 3.0. If you wonder where the overhead
of IO comes from: part of it probably from the UTF-8 decoding,
that the IO library now does transparently.

I don't know whether this overhead qualifies as "not
insignificant", most people probably would claim that a doubling
in time is indeed significant. OTOH, not knowing what the original
problem is (e.g. whether it's interactive or in a batch mode, and
whether bigfile.txt is much smaller or much larger than 32MiB),
I would probably
a) evaluate whether the increase in IO time is acceptable, and
b) if it is, just go with 3.0, as the increase in sorting time
is much smaller than the increase in IO time.

Regards,
Martin
 
A

Aahz

The post you've quoted was the first time I used the word "backwards".
I didn't start this thread, I merely chipped in when people claimed
there were no real applications that would be affected by this change
to Python. I do have a real, widely deployed application that will be
affected. Claiming that it is not affected, or that it is already
"broken" does not change that fact.

Fair enough; I've been mostly skimming the thread and only your use of
"backward" prompted me to chime in. ;-)
This issue is not impossible to deal with, of course, it's just one of
several things that will mean it's hard for us to migrate to Python 3.
What I find disturbing is the attitude that this change to Python's
comparison behaviour can't possibly have any downsides and that anyone
claiming it does is wrong.

Well, I agree with you that it does make things more difficult in some
respects. I also think that it's an aggregate improvement that is
similar to the way L.sort() returns None.
 
S

Steven D'Aprano

So will Python be introducing strong type checking and early binding, so
we can catch more bius earlier (compile rather than run time?) ;-)

Python already has strong type checking.

You're confusing it with static types, which Python doesn't have.
 
S

Steven D'Aprano

Again, this is just an example. As I say, in the real application, the
data has come from a dynamically-typed database. There's no easy way to
coerce all the items to the same type, because the application doesn't
know up-front what the data is going to look like. In some cases, it may
be that all the data items are lists of strings. In other cases, all or
most of the data items will be integers, for example. ....
For my application, Python 3's comparison behaviour is a backwards step.
You can argue all you like that the new behaviour is the "right" thing
to do, and I'd be inclined to agree with you from a philosophical point
of view, but the fact is that it _will_ cause problems for existing real
code. The particularly ironic thing is that the database's dynamic
typing is closely modelled on Python, including it's ability to
gracefully handle mixed-type lists.

If I have understood you correctly, essentially you just need a variation
on the Null object pattern, one that accepts comparisons with any other
type. That's easy, because you don't have to worry about mutual
comparisons of multiple arbitrary types (sorting mixed strings and
numbers and null). All you need is something that sorts below everything
else: something just like None, except sortable.

Unfortunately you can't inherit from NoneType (that I know of...) but
inheriting from object should be good enough. If it isn't, you can use
delegation instead of inheritance. I leave that as an exercise for the
reader.

# not tested in Python 3
class SortableNone(object):
def __lt__(self, other):
return True

Null = SortableNone()

Now just use Null instead of None in your database.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,733
Messages
2,569,440
Members
44,830
Latest member
ZADIva7383

Latest Threads

Top