Fun with fancy slicing

D

Dave Benjamin

Hey all,

I just realized you can very easily implement a sequence grouping function
using Python 2.3's fancy slicing support:

def group(values, size):
return map(None, *[values[i::size] for i in range(size)])
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15),
(16, 17, 18, 19)]
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]

I had to use map(None, *...) instead of zip(*...) to transpose the result
because zip throws away the "orphans" at the end. Anyway, this is a useful
function to have in your toolkit if you need to do pagination or
multi-column display, among other things...

Anyone have any other interesting applications of the extended slice syntax?

Peace,
Dave
 
M

Michael Hudson

Dave Benjamin said:
Anyone have any other interesting applications of the extended slice syntax?

You can use it in a sieve prime finder (ooh, thread crossing :):

/>> def sieve(p):
|.. candidates = range(p)
|.. for i in candidates[2:]:
|.. if not candidates: continue
|.. n = len(candidates[2*i::i])
|.. candidates[2*i::i] = [0]*n
|.. return filter(None, candidates[2:])
\__
->> sieve(50)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

but it's not especially efficient (in speed or memory). Quite cute,
though.

Cheers,
mwh
 
D

Dave Benjamin

You can use it in a sieve prime finder (ooh, thread crossing :):
...
but it's not especially efficient (in speed or memory). Quite cute,
though.

But quite expressive! I'm amazed at how small that code was. It reminds me
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which makes
it easier to figure out what's going on.

Anyway, thanks. That was cool. =)

Peace,
Dave
 
G

Gerrit Holl

I do not seem to have received this message from Michael Hudson through
python-list. I am able to locate it on the newsgroup. Does anyone else
have this problem?

Gerrit Holl.
 
A

Alex Martelli

Dave Benjamin wrote:
...
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which
makes it easier to figure out what's going on.

Hmmm, you mean as in...:

def quicksort(alist):
head = alist[0]
return quicksort([ x in alist[1:] if x<=head ]) + [
head ] + quicksort([ x in alist[1:] if x>head ])

....?


Alex
 
D

David Eppstein

Alex Martelli said:
Dave Benjamin wrote:
...
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which
makes it easier to figure out what's going on.

Hmmm, you mean as in...:

def quicksort(alist):
head = alist[0]
return quicksort([ x in alist[1:] if x<=head ]) + [
head ] + quicksort([ x in alist[1:] if x>head ])

...?

Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.
And if you correct that, it still doesn't work correctly for lists with
repeated items.

How about:

def quicksort(L):
if len(L) < 2: return L
return quicksort([x for x in L if x < L[0]]) + \
[x for x in L if x == L[0]] + \
quicksort([x for x in L if x > L[0]])
 
A

Alex Martelli

David Eppstein wrote:
...
def quicksort(alist):
head = alist[0]
return quicksort([ x in alist[1:] if x<=head ]) + [
head ] + quicksort([ x in alist[1:] if x>head ])

Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.

Right on both counts -- sorry.
And if you correct that, it still doesn't work correctly for lists with
repeated items.

Why not? Having fixed the syntax and added the base-case, I see:

[alex@lancelot perex]$ python -i se.py
import random
x=4*range(5)
random.shuffle(x)
x [4, 4, 3, 1, 1, 1, 1, 2, 0, 0, 3, 4, 3, 4, 2, 0, 3, 0, 2, 2]
quicksort(x) [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]

So what am I missing? Please show a case of "list with repeated
items" where the fixed quicksort "doesn't work correctly", thanks.

btw, my favourite fixed version is:

def quicksort(alist):
try: head = alist[0]
except IndexError: return alist
return quicksort([ x for x in alist[1:] if x<=head ]) + [
head ] + quicksort([ x for x in alist[1:] if x>head ])

but I do see that testing len(alist) is probably better.

How sweet it would be to be able to unpack by coding:
head, *tail = alist
....


Alex
 
D

David Eppstein

Alex Martelli said:
David Eppstein wrote:
...
def quicksort(alist):
head = alist[0]
return quicksort([ x in alist[1:] if x<=head ]) + [
head ] + quicksort([ x in alist[1:] if x>head ])

Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.

Right on both counts -- sorry.
And if you correct that, it still doesn't work correctly for lists with
repeated items.

Why not? Having fixed the syntax and added the base-case, I see:

Sorry, does work correctly, just not efficiently. I didn't see the = in
the first recursive call. But if you call your quicksort on say [0]*n,
each call will split the list as unevenly as possible and you get
quadratic runtime. Of course we know that nonrandom quicksort pivoting
can be quadratic anyway (e.g. on sorted input) but this to my mind is
worse because randomization doesn't make it any better. The fact that
some textbooks (e.g. CLRS!) make this mistake doesn't excuse it either.
 
D

Damien Wyart

* David Eppstein said:
Of course we know that nonrandom quicksort pivoting can be quadratic
anyway (e.g. on sorted input) but this to my mind is worse because
randomization doesn't make it any better. The fact that some textbooks
(e.g. CLRS!) make this mistake doesn't excuse it either.

Could you explain in more detail the error made in CLRS on this topic
(with a reference if possible) ? I did not precisely catch your
explanation here.

Thanks in advance,
 
M

Michael Hudson

Gerrit Holl said:
I do not seem to have received this message from Michael Hudson through
python-list. I am able to locate it on the newsgroup. Does anyone else
have this problem?

Hmm, I saw it :)

Google also has it:

http://groups.google.com/[email protected]

I don't really read the newsgroup closely enough any more to know if
any messages are going missing, I'm afraid.

Cheers,
mwh
 
D

David Eppstein

Damien Wyart said:
Could you explain in more detail the error made in CLRS on this topic
(with a reference if possible) ? I did not precisely catch your
explanation here.

Thanks in advance,

CLRS' partition routine ends up partitioning an n-item array into the
items <= the pivot, the pivot itself, and the items > the pivot.
If the items are all equal, that means that the first part gets n-1
items and the last part gets nothing, regardless of which item you
select as pivot. If you analyze the algorithm on this input, you get
the recurrence T(n)=O(n)+T(n-1)+T(0) which solves to O(n^2).

Throughout most of the quicksort chapter, CLRS at least include
exercises mentioning the case of all equal inputs, asking what happens
for those inputs, and in one exercise suggesting a partition routine
that might partition those inputs more equally (but who knows what it
should do when merely most of the inputs are equal...) However in the
randomized quicksort section they ignored this complication and seemed
to be claiming that their randomized quicksort has O(n log n) expected
time, unconditionally.

This problem was finally corrected in the fourth printing of the second
edition; see the errata at
<http://www.cs.dartmouth.edu/~thc/clrs-2e-bugs/bugs.php>
for more details.
 
D

David C. Fox

Greg said:
Indeed! I came across another use case for this
recently, as well. I was trying to parse some
command strings of the form

command arg arg ...

where some commands had args and some didn't.
I wanted to split the command off the front
and keep the args for later processing once
I'd decoded the command. My first attempt
went something like

command, args = cmdstring.split(" ", maxsplit = 1)

but this fails when there are no arguments,
because the returned list has only one element
in that case.

It would have been very nice to be able to
simply say

command, *args = cmdstring.split()

and get the command as a string and a
list of 0 or more argument strings.

I really ought to write a PEP about this...

In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
return x[0], x[1:]

command, args = first_rest(cmdstring.split())

or, in one step

def chop_word(s):
return first_rest(s.split())

command, args = chop_word(cmdstring)

David
 
L

Lulu of the Lotus-Eaters

|Alex Martelli wrote:
|> How sweet it would be to be able to unpack by coding:
|> head, *tail = alist

| command arg arg ...
| command, *args = cmdstring.split()
|and get the command as a string and a list of 0 or more argument strings.

I think I've written on this list before that I would like this
too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
advance.

For Greg's particular case, one approach that doesn't look too bad is:

command = args.pop(0)
# ... do stuff ...
try:
more = args.pop(0)
except IndexError:
# no more

For a really long list, it would be faster to do an 'args.reverse()' up
front, and '.pop()' off the end (Greg and Alex know all this, of course).

Yours, Lulu...
 
G

Greg Ewing (using news.cis.dfn.de)

Alex said:
How sweet it would be to be able to unpack by coding:
head, *tail = alist

Indeed! I came across another use case for this
recently, as well. I was trying to parse some
command strings of the form

command arg arg ...

where some commands had args and some didn't.
I wanted to split the command off the front
and keep the args for later processing once
I'd decoded the command. My first attempt
went something like

command, args = cmdstring.split(" ", maxsplit = 1)

but this fails when there are no arguments,
because the returned list has only one element
in that case.

It would have been very nice to be able to
simply say

command, *args = cmdstring.split()

and get the command as a string and a
list of 0 or more argument strings.

I really ought to write a PEP about this...
 
D

David Eppstein

"David C. Fox said:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
return x[0], x[1:]

Shouldn't that be called cons?
 
A

Alex Martelli

David said:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
return x[0], x[1:]

Shouldn't that be called cons?

Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...


Alex
 
A

Alex Martelli

Lulu of the Lotus-Eaters wrote:
...
|> head, *tail = alist

previously:
| command arg arg ...
| command, *args = cmdstring.split()
|and get the command as a string and a list of 0 or more argument strings.

I think I've written on this list before that I would like this
too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
advance.

For Greg's particular case, one approach that doesn't look too bad is:

command = args.pop(0)
# ... do stuff ...
try:
more = args.pop(0)
except IndexError:
# no more

I'm not sure what this 'more' is about, actually. Greg's case is
currently best solved [IMHO] with
args = cmdstring.split()
command = args.pop(0)
For a really long list, it would be faster to do an 'args.reverse()' up
front, and '.pop()' off the end (Greg and Alex know all this, of course).

Actually, I don't -- both args.reverse() and args.pop(0) are O(len(args)),
so I don't know their relative speeds offhand. Interestingly enough,
timeit.py, for once, doesn't want to let me know about them either...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags.pop(0)'
Traceback (most recent call last):
File "timeit.py", line 249, in main
x = t.timeit(number)
File "timeit.py", line 158, in timeit
return self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
x=ags.pop(0)
IndexError: pop from empty list
[alex@lancelot python2.3]$

....since each .pop is modifying the ags list, the once-only setup
statement doesn't suffice... OK, so we need to repeat (and thus,
alas, intrinsically measure) the list copying too (sigh):

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23.2 usec per loop

the results of the two snippets aren't identical, but that should
not affect the timing measured (as the use of ags[:] and the
repetition are time-measurement artefacts anyway). So, it does
look like reversing first IS faster -- by a hair. Trying to
remove the ags[:] part / overhead gave me a shock, though...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop

So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???

Tim...? Please HELP a fellow bot in need...!!!


Alex
 
M

Michael Hudson

Alex Martelli said:
So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???

How quiet was your machine when you ran the tests? I see behaviour
more in line with what you'd expect:

[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 31.6 usec per loop
[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 39.8 usec per loop

Cheers,
mwh
 
D

David Eppstein

Alex Martelli said:
David said:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
return x[0], x[1:]

Shouldn't that be called cons?

Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...

Well, but it returns an object composed of a car and a cdr, which is
exactly what a cons is...
 
A

Alex Martelli

Michael said:
Alex Martelli said:
So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???

How quiet was your machine when you ran the tests? I see behaviour

Very, and the numbers were highly repeatable:

-- running just now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 36.9 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.6 usec per loop

-- and from a copy & paste of the screen as of a while ago:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _however_ -- retrying the latter now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 48.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 46.1 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 46.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.4 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _SO_ -- immediately going back to the just-copy tests...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop


SO -- so much for the "quiet"... clearly there _IS_ something periodically
running and distorting the elapsed-time measurements (which are the default
here on Linux) by a hefty factor of 2. So much for this kind of casual
benchmarking...!-) I guess I've been doing a little too much of it...

more in line with what you'd expect:

[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)'
['x=ags[:]'
10000 loops, best of 3: 31.6 usec per loop
[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)'
['x=ags[:].pop(0)'
10000 loops, best of 3: 39.8 usec per loop

Yep, makes more sense. So, moving to the more-stable -c (CPU time,
as given by time.clock):

[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 22 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop

it WOULD seem to be confirmed that reverse-then-pop() is VERY
slightly faster than pop(0) -- 22/23 instead of 23/24 usec for
a 1000-long list... of which 19.2 are the apparently-repeatable
overhead of copying that list. I still wouldn't say that I
"KNOW" this, though -- the margin is SO tiny and uncertain...!!!

It seems to scale up linearly going from 1000 to 5000: just
the copy, 105-108; ags[:].pop(0), 120-123; reverse then pop,
116-117. This is always with the -c (once burned...).


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,898
Latest member
BlairH7607

Latest Threads

Top