is there a better way?

C

Carl Friedrich Bolz

Alex said:
Problem:

You have a list of unknown length, such as this: list =
[X,X,X,O,O,O,O]. You want to extract all and only the X's. You know
the X's are all up front and you know that the item after the last X is
an O, or that the list ends with an X. There are never O's between
X's.


If the list is incredibly long, you should use a bisection approach.
Standard module bisect in the Python library could help, but mostly as
an _example_, since it unfortunately relies on normal alphabetic order,
and alphabetically speaking X should come _after_ O, not _before_.

But the algorithm is still sound:

1. look at the midpoint.
2. if it's an X, so are all previous items -- recurse to second half
3. if it's an O, so are all following items -- recurse to first half

Getting all conditions exactly right is tricky (which is why bisect is a
good model!), but this way you get O(log N) performance for a list of
length N.

If N is not too huge, O(N) might be OK, and is, of course, way simpler
to code!-)

The code would look something like this:

low = 0
high = len(L)
while low < high:
mid = (low + high) // 2
if L[mid] == 0:
high = mid
else:
low = mid + 1
list_of_X = L[:low]


Cheers,

Carl Friedrich Bolz
 
A

Alex Martelli

Charles Krug said:
Problem:

You have a list of unknown length, such as this: list =
[X,X,X,O,O,O,O]. You want to extract all and only the X's. You know
the X's are all up front and you know that the item after the last X is
an O, or that the list ends with an X. There are never O's between
X's.

If the list is incredibly long, you should use a bisection approach.
Standard module bisect in the Python library could help, but mostly as
an _example_, since it unfortunately relies on normal alphabetic order,
and alphabetically speaking X should come _after_ O, not _before_.

Isn't every call to list.index() an O(n) operation? We certainly want
to avoid multiple calls there if we can.

Why should we have ANY call to index whatsoever?
What happens if your split occurs in the middle of your block of Xs?
Then the split before/after fails --the stated algorithm says, "If the
split is an X, choose the front half," so perhaps the statement was
inprecise?

What statement? What are you TALKING about?! What I said, and you
don't quote, was:

how can you POSSIBLY read this as "choose the front half"?! I say to
recurse (iteration works, too, but it's even trickier to code) to the
SECOND half, to find the first-if-any non-'X'.
The only way you'll know if you have an X in a particular block is using
a linear search method, either in Python or with list.index()

Reread markscala's problem statement: all the Xs are up front followed
by 0 or more Os. So, the list is L = ['X']*N + ['O']*M for unknown N>=0
and M>=0. All we have to do is find N and M (if we know either, the
other is immediately computed, since N+M==len(L) and len() is O(1)).
If (as the OP seemed to state) we KNOW that there's only one block of
X's in the list:

1. Find the index of the first X

Why would we do that? He stated, very clearly, and you and I have both
been quoting:

So why would we do any work to find out what we altrady know?
2. Find the index of the last X.

Yep, the only certain task, and it's O(log(N+M)).
3. Delete the block we've found.

And the deletion is definitely linear time (and trivial), of course. I
was focusing on the only interesting part, (2).
Judging from way the OP worded the question, I'd advise making something
that works and that you understand how it works.

After that, s/he can worry about whether or not its performance is
suboptimal.

And indeed, part of what I said (and again you're snipping it rather
than quoting it was:

However, even though the O(N) in the deletion subtask would appear to
justify this shortcut, I think the task is way too trivial to justify a
linear-time approach to point 2 -- the obvious N = L.count('X'), of
course. It seems likely that the whole purpose of the exercise
(probably homework) is to have the student identify and develop a
bisection (a notoriously tricky-to-code thing).


Alex
 
B

Bruno Desthuilliers

(e-mail address removed) a écrit :
Problem:

You have a list of unknown length,

This doesn't exist in Python:
len(alist)
such as this: list =
[X,X,X,O,O,O,O]. You want to extract all and only the X's.

braindead solution - relying on zeros being zeros or any other False value:
all_xxx = filter(None, [X,X,X,O,O,O,O])
You know
the X's are all up front and you know that the item after the last X is
an O, or that the list ends with an X. There are never O's between
X's.

Then it *may* be possible to optimize - but code will break as soon as
this rule change. So is there a real need for optimisation ? (hint: dont
guess, profile)


FWIW, I've collected all suggestions here (and perhaps some others) and
hacked a Q&D benchmark. Usage is:

python test_lists.py [repeat [list_size [xcount]]

where:
* repeat is the number of iteration, default to 1000
* list_size is the size of the list, default to 100
* xcount is the number or non-zero values in the list, default is random

I've run it a few times, and it seems that in most cases,
* the bisect-like approach (Alex Martelli / Karl Friedrich Bolz) is
the winner
* the while/pop approach of the OP (slightly rewritten...) is really wrong

FWIW, one of the dummiest possible approach IMHO
(ie filter(None, alist)) behaves quite correctly wrt some other
'optimized' approachs - and is still safe if the rules about the
composition of the list changes... Could it be that no optimization is
sometime the best optimisation ?-)


# test_lists.py
from itertools import takewhile
from timeit import Timer
from random import randint

def get_list(size, xcount=None):
if xcount is None:
xcount = randint(1, size)
else:
assert xcount <= size
zcount = size - xcount
return [1] * xcount + [0] * zcount

def with_while(alist):
res = []
while alist and alist[0]:
res.append(alist.pop(0))
return res

def with_for(alist):
res = []
for x in alist:
if not x: break
res.append(x)
return res

def with_filter(alist):
return filter(None, alist)

def with_comprehension(alist):
return [x for x in alist if x]

def with_takewhile(alist):
return list(takewhile(lambda x: x!=0, alist))

def with_slice_try(alist):
try:
return alist[:alist.index(0)]
except ValueError:
return alist[:]

def with_slice_safe(alist):
alist.append(0)
return alist[:alist.index(0)]

def with_delslice_safe(alist):
alist.append(0)
del alist[alist.index(0):]
return alist

def with_sect(alist):
low = 0
high = len(alist)
while low < high:
mid = (low + high) // 2
if alist[mid] == 0:
high = mid
else:
low = mid + 1
return alist[:low]

_candidates = [n for n, o in locals().copy().items() \
if n.startswith('with_') and callable(o)]

def run_test(repeat=1000, list_size='100', xcount='None'):
global _candidate

print """
params :
* repeat : %s
* list_size : %s
* xcounts : %s
""" % (repeat, list_size, xcount)
results = {}
for n in _candidates:
stm, stp = ('%s(get_list(%s, %s))' % (n, list_size, xcount),
'from __main__ import %s, get_list' % n)
results[n] = Timer(stm, stp).timeit(repeat)

sorted_results = sorted([(time, (n, time)) \
for n, time in results.items()])
for _, result in sorted_results:
print "%s : %s" % result


def main(args):
try:
repeat = int(args.pop(0))
except:
repeat = 1000
try:
list_size = args.pop(0)
except:
list_size = '100'
try:
xcount = args.pop(0)
except:
xcount = 'None' # -> random

run_test(repeat, list_size, xcount)

if __name__ == '__main__':
import sys
main(sys.argv[1:])


HTH
 
D

drrngrvy

Bruno said:
(e-mail address removed) a écrit :
Problem:

You have a list of unknown length,

This doesn't exist in Python:
len(alist)
such as this: list > [X,X,X,O,O,O,O]. You want to extract all and only the X's.

braindead solution - relying on zeros being zeros or any other False value:
all_xxx
You know
the X's are all up front and you know that the item after the last X is
an O, or that the list ends with an X. There are never O's between
X's.

Then it *may* be possible to optimize - but code will break as soon as
this rule change. So is there a real need for optimisation ? (hint: dont
guess, profile)


FWIW, I've collected all suggestions here (and perhaps some others) and
hacked a Q&D benchmark. Usage is:

python test_lists.py [repeat [list_size [xcount]]

where:
* repeat is the number of iteration, default to 1000
* list_size is the size of the list, default to 100
* xcount is the number or non-zero values in the list, default is random

I've run it a few times, and it seems that in most cases,
* the bisect-like approach (Alex Martelli / Karl Friedrich Bolz) is
the winner
* the while/pop approach of the OP (slightly rewritten...) is really wrong

FWIW, one of the dummiest possible approach IMHO
(ie filter(None, alist)) behaves quite correctly wrt some other
'optimized' approachs - and is still safe if the rules about the
composition of the list changes... Could it be that no optimization is
sometime the best optimisation ?-)


# test_lists.py
from itertools import takewhile
from timeit import Timer
from random import randint

def get_list(size, xcount=None):
if xcount is None:
xcount else:
assert xcount < zcount return [1] * xcount + [0] * zcount

def with_while(alist):
res while alist and alist[0]:
res.append(alist.pop(0))
return res

def with_for(alist):
res for x in alist:
if not x: break
res.append(x)
return res

def with_filter(alist):
return filter(None, alist)

def with_comprehension(alist):
return [x for x in alist if x]

def with_takewhile(alist):
return list(takewhile(lambda x: x!=0, alist))

def with_slice_try(alist):
try:
return alist[:alist.index(0)]
except ValueError:
return alist[:]

def with_slice_safe(alist):
alist.append(0)
return alist[:alist.index(0)]

def with_delslice_safe(alist):
alist.append(0)
del alist[alist.index(0):]
return alist

def with_sect(alist):
low high while low < high:
mid if alist[mid] = 0:
high else:
low return alist[:low]

_candidates if n.startswith('with_') and callable(o)]

def run_test(repeat00, list_size='100', xcount='None'):
global _candidate

print """
params :
* repeat : %s
* list_size : %s
* xcounts : %s
""" % (repeat, list_size, xcount)
results for n in _candidates:
stm, stp 'from __main__ import %s, get_list' % n)
results[n]
sorted_results for n, time in results..items()])
for _, result in sorted_results:
print "%s : %s" % result


def main(args):
try:
repeat except:
repeat try:
list_size except:
list_size try:
xcount except:
xcount
run_test(repeat, list_size, xcount)

if __name__ = '__main__':
import sys
main(sys.argv[1:])


HTH
 
M

Magnus Lycka

Problem:

You have a list of unknown length, such as this: list =
[X,X,X,O,O,O,O]. You want to extract all and only the X's. You know
the X's are all up front and you know that the item after the last X is
an O, or that the list ends with an X. There are never O's between
X's.

I have been using something like this:
_____________________

while list[0] != O:
storage.append(list[0])
list.pop(0)
if len(list) == 0:
break
_____________________

But this seems ugly to me, and using "while" give me the heebies. Is
there a better approach?

It seems few suggested solutions actually do the same thing as
your code shows. I think this does. (Untested)

try:
ix = aList.index(O)
storage, aList = aList[:ix], aList[ix:]
except ValueError:
# No O's
storage, aList = aList, []


The main advantage with my approach over yours is that
all iterations, all the looping, takes place in fast C
code, unless your list elements are Python classes that
implement __cmp__ etc. If testing 'aList[0] == O'
involves Python, things might slow down a bit...

..index() takes linear time, but it's fairly fast. As
Alex suggested, you might want to replace it with a
O(log n) solution for big lists. You might need rather
big lists for the Python based O(log n) to get faster
than the C based O(n) though. Measure.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,265
Latest member
TodLarocca

Latest Threads

Top