Can I overload the compare (cmp()) function for a list's ([]) index function?

xkenneth

Looking to do something similar. I'm working with a lot of timestamps,
and if they're within a couple of seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!
 
xkenneth

Or can I just say...

list.index.__cmp__ = mycmp

and do it that way? I just want to make sure I'm not doing anything
evil.
 
irstas

Wouldn't it be enough to get the items that are "within a couple of
seconds" out of the list and into another list? Then you can process
the other list however you want. Like this:

def isNew(x):
    return x < 5

data = range(20)
print data
out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
print out, data
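
Applied to the timestamps in the question, the same split might look like the sketch below. It assumes the timestamps are datetime objects and that "within a couple of seconds" means within a fixed tolerance of one reference time; is_near, stamps, and target are illustrative names, not from the thread. It uses list comprehensions so it behaves the same on Python 2 and Python 3 (where filter() returns a lazy iterator rather than a list).

```python
from datetime import datetime, timedelta

def is_near(ts, target, tolerance=timedelta(seconds=2)):
    """True if ts lies within tolerance of target, on either side."""
    return abs(ts - target) <= tolerance

# Hypothetical sample data.
stamps = [
    datetime(2007, 7, 18, 9, 20, 0),
    datetime(2007, 7, 18, 9, 30, 25),
    datetime(2007, 7, 18, 9, 30, 26),
    datetime(2007, 7, 18, 9, 40, 0),
]
target = datetime(2007, 7, 18, 9, 30, 24)

# Split into the matching items and everything else.
near = [ts for ts in stamps if is_near(ts, target)]
stamps = [ts for ts in stamps if not is_near(ts, target)]
```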

Why do you want to use 'index'?

Your suggestion "list.index.__cmp__ = mycmp" certainly doesn't do
anything good. In fact, it just fails because the assignment is
illegal. I don't think any documentation suggests doing that, so why
are you even trying to do that? In general, it's just not a good idea
to invent semantics and hope that they work.
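
A quick way to see the failure for yourself is the sketch below. The exact exception type and message may differ between Python versions, so the code catches both AttributeError and TypeError; the body of mycmp is a placeholder.

```python
def mycmp(a, b):
    # Placeholder comparison; its body is irrelevant here.
    return 0

try:
    # Built-in method objects do not accept new or replaced attributes.
    list.index.__cmp__ = mycmp
except (AttributeError, TypeError) as exc:
    failed = True
    print("assignment rejected:", type(exc).__name__)
else:
    failed = False
```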
 
Steven Bethard

Slightly off topic here, but these uses of filter will be slower than
the list comprehension equivalents::

out = [x for x in data if x < 5]
data = [x for x in data if x >= 5]

Here are sample timings::

$ python -m timeit -s "data = range(20)" -s "def is_new(x): return x < 5" "filter(is_new, data)"
100000 loops, best of 3: 5.05 usec per loop
$ python -m timeit -s "data = range(20)" "[x for x in data if x < 5]"
100000 loops, best of 3: 2.15 usec per loop

Functions like filter() and map() are really only more efficient when
you have an existing C-coded function, like ``map(str, items)``. Of
course, if the filter() code is clearer to you, feel free to use it, but
I find that most folks find list comprehensions easier to read than
map() and filter() code.
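
To repeat the comparison on your own machine, the timeit module can also be driven from a script. This is only a sketch: with_filter and with_comprehension are illustrative names, the list() call is needed on Python 3 because filter() is lazy there, and the numbers will vary by interpreter and hardware.

```python
import timeit

data = list(range(20))

def is_new(x):
    return x < 5

def with_filter():
    # list() forces evaluation; on Python 3 filter() returns an iterator.
    return list(filter(is_new, data))

def with_comprehension():
    return [x for x in data if x < 5]

# Both approaches must select the same items before timing means anything.
assert with_filter() == with_comprehension()

t_filter = timeit.timeit(with_filter, number=20000)
t_comp = timeit.timeit(with_comprehension, number=20000)
print("filter: %.4f s, comprehension: %.4f s" % (t_filter, t_comp))
```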

STeVe
 
Paul Rubin

xkenneth said:
Looking to do something similar. I'm working with a lot of timestamps,
and if they're within a couple of seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

This sounds like you want itertools.groupby. What is the exact
requirement?
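
The post does not say how groupby would be applied, so the following is only one possible reading: sort the timestamps and start a new group whenever the gap to the previous one exceeds a tolerance. The helper gap_key and the float timestamps are assumptions for illustration. The key function is stateful and relies on groupby calling it once per item, in order.

```python
from itertools import groupby

def gap_key(tolerance):
    """Return a stateful key function whose value changes whenever the
    gap to the previously seen item exceeds tolerance."""
    state = {"previous": None, "group": 0}

    def key(ts):
        previous = state["previous"]
        if previous is not None and ts - previous > tolerance:
            state["group"] += 1
        state["previous"] = ts
        return state["group"]

    return key

# Hypothetical timestamps in seconds; they must be sorted first.
stamps = sorted([10.0, 0.0, 2.5, 30.0, 1.0, 11.0])
clusters = [list(group) for _, group in groupby(stamps, key=gap_key(2.0))]
print(clusters)
```

Note that this chains neighbours: two items in the same cluster can be further apart than the tolerance if the items between them bridge the gap.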
 
Hrvoje Niksic

xkenneth said:
Looking to do something similar. I'm working with a lot of timestamps,
and if they're within a couple of seconds I need them to be indexed and
removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.
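
On Python 3, where cmp() no longer exists, the same loop can take a one-argument predicate instead of a comparison function. The sketch below uses that variant to index and then remove a timestamp, as the question asks; index_where, the float timestamps, and the 2-second tolerance are assumptions for illustration.

```python
def index_where(lst, predicate):
    """Return the index of the first element for which predicate is true."""
    for i, el in enumerate(lst):
        if predicate(el):
            return i
    raise ValueError("no matching element in list")

# Hypothetical timestamps in seconds.
stamps = [100.0, 250.5, 251.9, 400.0]
target = 252.0

i = index_where(stamps, lambda ts: abs(ts - target) <= 2.0)
removed = stamps.pop(i)  # remove the first match from the list
print(i, removed, stamps)
```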

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:

>>> def mycmp(s1, s2):
...     return cmp(s1.lower(), s2.lower())
...
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
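
The wrapper idea carries over to Python 3, although cmp() is gone there, so the sketch below takes a two-argument equality function instead. Matches and same_ignoring_case are illustrative names. It works because str.__eq__ returns NotImplemented for an unknown type, which makes Python fall back to the wrapper's __eq__.

```python
class Matches:
    """Wrap an item so that == delegates to a custom equality function."""

    def __init__(self, item, same):
        self.item = item
        self.same = same

    def __eq__(self, other):
        return bool(self.same(self.item, other))

def same_ignoring_case(s1, s2):
    return s1.lower() == s2.lower()

words = ['foo', 'bar', 'baz']
found = words.index(Matches('Bar', same_ignoring_case))

try:
    words.index(Matches('nosuchelement', same_ignoring_case))
    missing_raises = False
except ValueError:
    missing_raises = True
```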
 
Gabriel Genellina

The comparison is made by the list elements themselves (using their __eq__
or __cmp__), not by the index method or the list object.
So you would have to modify __cmp__ for all your timestamps (datetime.datetime, I
presume?), but that's not very convenient. A workaround is to wrap the
object you are searching for in a new, different class: since the list
items won't know how to compare themselves to it, Python will try reversing
the operands.
datetime objects are a bit special in this respect: they refuse to
compare to anything else unless the other object has a `timetuple`
attribute (see <http://docs.python.org/lib/datetime-date.html>, note (4)).

<code>
import datetime

class datetime_tol(object):
    # unused, just to trigger the reverse comparison to datetime objects
    timetuple = None
    default_tolerance = datetime.timedelta(0, 10)

    def __init__(self, dt, tolerance=None):
        if tolerance is None:
            tolerance = self.default_tolerance
        self.dt = dt
        self.tolerance = tolerance

    def __cmp__(self, other):
        tolerance = self.tolerance
        if isinstance(other, datetime_tol):
            tolerance = min(tolerance, other.tolerance)
            other = other.dt
        if not isinstance(other, datetime.datetime):
            return cmp(self.dt, other)
        delta = self.dt - other
        return -1 if delta < -tolerance else 1 if delta > tolerance else 0

def index_tol(dtlist, dt, tolerance=None):
    return dtlist.index(datetime_tol(dt, tolerance))


d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1,d2,d3,d4,d5]

assert d3 in L
assert L.index(d3)==2
assert L.index(datetime_tol(d3))==1 # using 10sec tolerance
assert index_tol(L, d3)==1
assert index_tol(L, datetime.datetime(2007, 7, 18, 9, 43, 20),
                 datetime.timedelta(0, 5*60))==4 # 5 minutes tolerance
</code>
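
The code above relies on __cmp__ and cmp(), which exist only in Python 2. A rough Python 3 counterpart, covering equality only, might look like the sketch below. DatetimeTol is an illustrative name, and the sketch assumes that datetime's == returns NotImplemented for an unrelated type, so Python falls back to the wrapper's __eq__.

```python
import datetime

class DatetimeTol:
    """Compares equal to any datetime within `tolerance` of `dt`."""

    default_tolerance = datetime.timedelta(seconds=10)

    def __init__(self, dt, tolerance=None):
        self.dt = dt
        self.tolerance = self.default_tolerance if tolerance is None else tolerance

    def __eq__(self, other):
        tolerance = self.tolerance
        if isinstance(other, DatetimeTol):
            tolerance = min(tolerance, other.tolerance)
            other = other.dt
        if not isinstance(other, datetime.datetime):
            return NotImplemented
        return abs(self.dt - other) <= tolerance

def index_tol(dtlist, dt, tolerance=None):
    return dtlist.index(DatetimeTol(dt, tolerance))

d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1, d2, d3, d4, d5]

exact = L.index(d3)          # plain equality finds d3 itself
fuzzy = index_tol(L, d3)     # the 10-second tolerance matches d2 first
```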
 
alan.haffner

Hrvoje,

That's fun! thx.

--Alan

The cut-and-paste version, with str.lower() in mycmp (strings have no tolower method):
# ----------------------------------------------
class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0


def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())


print ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
print ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
try:
    print ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
except ValueError:
    print "Search String not found!"

# end example
 
