Simple Problem but tough for me if i want it in linear time

C

ChrisChia

dataList = [a, b, c, ...]
where a, b, c are objects of a Class X.
In Class X, it contains self.name and self.number

If i wish to test whether a number (let's say 100) appears in one of
the object, and return that object,
is that only fast way of solving this problem without iterating
through every object to see the number value?

dataList.__contains__ can only check my object instance name...
anyone can solve this in linear complexity?
 
P

Peter Otten

ChrisChia said:
dataList = [a, b, c, ...]
where a, b, c are objects of a Class X.
In Class X, it contains self.name and self.number

If i wish to test whether a number (let's say 100) appears in one of
the object, and return that object,
is that only fast way of solving this problem without iterating
through every object to see the number value?

dataList.__contains__ can only check my object instance name...
anyone can solve this in linear complexity?

Well, iteration as in

next(item for item in dataList if item.number == 100)

is O(n) and list.__contains__() has no magic way to do better. If you need
O(1) lookup you can use a dict or collections.defaultdict that maps
item.number to a list (or set) of X instances:

lookup = {}
for item in dataList:
lookup.setdefault(item.number, []).append(item)

print lookup[100]


Peter
 
M

MRAB

ChrisChia said:
dataList = [a, b, c, ...]
where a, b, c are objects of a Class X.
In Class X, it contains self.name and self.number

If i wish to test whether a number (let's say 100) appears in one of
the object, and return that object,
is that only fast way of solving this problem without iterating
through every object to see the number value?

dataList.__contains__ can only check my object instance name...
anyone can solve this in linear complexity?

Put them into a dict where the key is the number and the value is a list
of the objects with that number.
 
S

Steven D'Aprano

dataList = [a, b, c, ...]
where a, b, c are objects of a Class X. In Class X, it contains
self.name and self.number

If i wish to test whether a number (let's say 100) appears in one of the
object, and return that object,
is that only fast way of solving this problem without iterating through
every object to see the number value?

dataList.__contains__ can only check my object instance name...
anyone can solve this in linear complexity?

If all you want is *linear* time, then simply iterating over the items
will do the job:

for item in dataList:
if item.number == 100:
do_something_with(item)
break


If you want *less* than linear time, then keep the list sorted and do a
binary search, which gives you O(log N) time. Don't sort the list each
time, because that's O(N*log N). Or use a dict, which gives you O(1) time.
 
F

Frederic Rentsch

ChrisChia said:
dataList = [a, b, c, ...]
where a, b, c are objects of a Class X.
In Class X, it contains self.name and self.number

If i wish to test whether a number (let's say 100) appears in one of
the object, and return that object,
is that only fast way of solving this problem without iterating
through every object to see the number value?

dataList.__contains__ can only check my object instance name...
anyone can solve this in linear complexity?

Well, iteration as in

next(item for item in dataList if item.number == 100)

is O(n) and list.__contains__() has no magic way to do better. If you need
O(1) lookup you can use a dict or collections.defaultdict that maps
item.number to a list (or set) of X instances:

lookup = {}
for item in dataList:
lookup.setdefault(item.number, []).append(item)

print lookup[100]


Peter

How about
[obj for obj in dataList if obj.number == 100]

That should create a list of all objects whose .number is 100. No need
to cycle through a loop. If .number doesn't repeat get your object at
index 0. The approach may seem inefficient for the purpose of extracting
a single item, but the list needs to be gone through in any case and the
list comprehension is surely the most efficient way to do it.

Frederic
 
F

Frederic Rentsch

How about
[obj for obj in dataList if obj.number == 100]

That should create a list of all objects whose .number is 100. No need
to cycle through a loop.

What do you think the list comprehension does, if not cycle through a
loop?

What I think is that list comprehensions cycle through a loop a lot
faster than a coded loop (for n in ...:). As at the time of my post only
coded loops had been proposed and the OP was concerned about speed, I
thought I'd propose a list comprehension. I guess my explanation was
poorly phrased. Thanks for the reminder.

Frederic
 
S

Steven D'Aprano

How about

[obj for obj in dataList if obj.number == 100]

That should create a list of all objects whose .number is 100. No
need to cycle through a loop.

What do you think the list comprehension does, if not cycle through a
loop?

What I think is that list comprehensions cycle through a loop a lot
faster than a coded loop (for n in ...:).

I think measurement beats intuition:


[steve@wow-wow ~]$ python -m timeit '[str(i) for i in xrange(100000)]'
10 loops, best of 3: 84 msec per loop

[steve@wow-wow ~]$ python -m timeit 'L=[]
for i in xrange(100000):
L.append(str(i))
'
10 loops, best of 3: 105 msec per loop


But wait... we're not comparing apples with apples. There's an extra name
lookup in the for-loop that the list comp doesn't have. We can fix that:


[steve@wow-wow ~]$ python -m timeit 'L=[]; append = L.append
for i in xrange(100000):
append(str(i))
'
10 loops, best of 3: 86.7 msec per loop


The difference between 84 and 86 msec is essentially measurement error.
Hell, the difference between 84 and 104 msec is not terribly significant
either.


As at the time of my post only
coded loops had been proposed and the OP was concerned about speed, I
thought I'd propose a list comprehension.

Yes, but the OP was concerned with asymptotic speed (big-oh notation),
and both a for-loop and a list-comp are both O(N).

Frankly, I think the OP doesn't really know what he wants, other than
premature optimization. It's amazing how popular that is :)
 
T

Tim Chase

Frankly, I think the OP doesn't really know what he wants, other than
premature optimization. It's amazing how popular that is :)

You see, the trick to prematurely optimizing is to have a good
algorithm for prematurely optimizing...the real question them
becomes "How can I optimize my premature-optimization algorithms
to O(1) instead of O(newsgroup)?"

:)

-tkc


PS: I'm not positive, but O(newsgroup) may asymptotically
approach O(log n) if the question is well formed, but O(2^n) if
flaming, indentation/line-length preferences, the meaning of OOP,
SQL-parameter escaping, McNugget combinations, or suggestions
that Python is "just a scripting language" are involved...
 
A

Aahz

You see, the trick to prematurely optimizing is to have a good
algorithm for prematurely optimizing...the real question them
becomes "How can I optimize my premature-optimization algorithms
to O(1) instead of O(newsgroup)?"

:)

-tkc

PS: I'm not positive, but O(newsgroup) may asymptotically
approach O(log n) if the question is well formed, but O(2^n) if
flaming, indentation/line-length preferences, the meaning of OOP,
SQL-parameter escaping, McNugget combinations, or suggestions that
Python is "just a scripting language" are involved...

+1 QOTW
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,444
Messages
2,571,709
Members
48,796
Latest member
Greg L.
Top