Iterating across a filtered list


Drew

All -

I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

http://pastie.caboo.se/46647
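(A rough sketch of the kind of method in question, in case the pastie is unavailable; this is a reconstruction, not the pastie itself, and it assumes self.contacts maps name strings to contact strings and that both sides are lowercased before matching:)

import re

class AddressBook(object):
    def __init__(self):
        self.contacts = {}   # name -> contact details, e.g. 'Drew': 'drew@example.com'

    def find(self, search):
        # Print the value of every key that matches the user's pattern.
        for name in self.contacts:
            if re.search(search.lower(), name.lower()):
                print self.contacts[name]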

Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}
 

Paul Rubin

Drew said:
I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

If I can decipher your Ruby example (I don't know Ruby), I think you
want:

for name, contact in contacts.iteritems():
    if re.search('search', name):
        print contact

If you just want to filter the dictionary inside an expression, you
can use a generator expression:

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('search', name))

print '\n'.join('%s: %s' % item for item in d)  # prints items from the filtered dict, one per line

Note that d is an iterator, which means it mutates when you step
through it.
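For example, with a made-up contacts dict (the names and numbers below are invented purely for illustration):

import re

contacts = {'drew': '555-0001', 'andrew': '555-0002', 'bob': '555-0003'}

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('drew', name))

print list(d)   # first pass consumes the generator: both matching pairs
print list(d)   # the generator is now exhausted, so this prints []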
 

Drew

If I can decipher your Ruby example (I don't know Ruby), I think you
want:

for name, contact in contacts.iteritems():
    if re.search('search', name):
        print contact

If you just want to filter the dictionary inside an expression, you
can use a generator expression:

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('search', name))

print '\n'.join('%s: %s' % item for item in d)  # prints items from the filtered dict, one per line

Note that d is an iterator, which means it mutates when you step
through it.

Paul -

You're exactly on the mark. I guess I was just wondering if your first
example (that is, breaking the if statement away from the iteration)
was preferred rather than initially filtering and then iterating.
However, your examples make a lot of sense and are quite helpful.

Thanks,
Drew
 

Arnaud Delobelle


There is no need for such a convoluted list comprehension as you
iterate over it immediately! It is clearer to put the filtering logic
in the for loop. Moreover you recalculate the regexp for each element
of the list. Instead I would do something like this:

def find(self, search_str, flags=re.IGNORECASE):
    print "Contact(s) found:"
    search = re.compile(search_str, flags).search
    for name, contact in self.contacts.items():
        if search(name):
            print contact
    print

Although I would rather have one function that returns the list of all
found contacts:

def find(self, search_str, flags=re.IGNORECASE):
    search = re.compile(search_str, flags).search
    for name, contact in self.contacts.items():
        if search(name):
            yield contact

And then another one that prints it.
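For instance, the printing side could be as simple as this (print_found is just a name chosen for the example):

def print_found(self, search_str):
    print "Contact(s) found:"
    for contact in self.find(search_str):
        print contact
    print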
Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}

And that's why you're right to learn Python ;)

HTH
 

Paul Rubin

Drew said:
You're exactly on the mark. I guess I was just wondering if your first
example (that is, breaking the if statement away from the iteration)
was preferred rather than initially filtering and then iterating.

I think the multiple statement version is more in Python tradition.
Python is historically an imperative, procedural language with some OO
features. Iterators like that are a new Python feature and they have
some annoying characteristics, like the way they mutate when you touch
them. It's usually safest to create and consume them in the same
place, e.g. creating some sequence and passing it through map, filter, etc.
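For example (contacts again being a made-up name -> details dict), the iterator below is created and consumed inside a single expression, so there is no half-used iterator left lying around:

import re

contacts = {'drew': '555-0001', 'andrew': '555-0002', 'bob': '555-0003'}

# created and consumed in the same expression, handed straight to join()
print '\n'.join(sorted(contact for name, contact in contacts.iteritems()
                       if re.search('drew', name)))

# the same idea with the filter() builtin, which returns a plain list here
print filter(lambda name: re.search('drew', name), contacts.keys())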
 

Paul Rubin

Arnaud Delobelle said:
It is clearer to put the filtering logic in the for loop. Moreover you
recalculate the regexp for each element of the list.

The re library caches the compiled regexp, I think.
 

Arnaud Delobelle

The re library caches the compiled regexp, I think.

That would surprise me.
How can re.search know that string.lower(search) is the same each
time? Or else there is something that I misunderstand.

Moreover:

In [49]: from timeit import Timer
In [50]: Timer('for i in range(1000): search("abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[50]: 0.36964607238769531

In [51]: Timer('for i in range(1000): re.search("ijk", "abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[51]: 1.4777300357818604
 

Paul Rubin

Bruno Desthuilliers said:
I don't know if I qualify as a Python traditionalist, but I'm using
Python since the 1.5.2 days, and I usually favor list comps or
generator expressions over old-style loops when it comes to this kind
of operation.

I like genexps when they're nested inside other expressions so they're
consumed as part of the evaluation of the outer expression. They're a
bit scary when the genexp-created iterator is saved in a variable.

Listcomps are different, they allocate storage for the entire list, so
they're just syntax sugar for a loop. They have an annoying
misfeature of their own, too.
Python has had functions as first class objects and
(quite-limited-but) anonymous functions, map(), filter() and reduce()
as builtin funcs at least since 1.5.2 (quite some years ago).

True, though no iterators so you couldn't easily use those functions
on lazily-evaluated streams like you can now.
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

Well you could do it that way but it allocates the entire filtered
list in memory. In this example "\n".join() also builds up a string
in memory, but you could do something different, like run the sequence
through another filter or print out one element at a time, in which
case lazy evaluation can be important (imagine that contacts.iteritems
chugs through a billion row table in an SQL database).
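A sketch of that sort of lazy pipeline (rows_from_db is an invented stand-in for a database cursor):

import re

def rows_from_db():
    # stand-in for a cursor lazily yielding (name, contact) rows from a huge table
    for i in xrange(10 ** 9):
        yield ('name%d' % i, 'contact%d' % i)

search = re.compile('name1234$').search
matching = (contact for name, contact in rows_from_db() if search(name))

# only as many rows as needed are pulled through the pipeline;
# nothing resembling a billion-element list is ever built
for contact in matching:
    print contact
    break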
Safest ? Why so ?

Just that things can get confusing if you're consuming the iterator in
more than one place. It can get to be like those old languages where
you had to do your own storage management ;-).
 

Arnaud Delobelle

Paul Rubin wrote:
[snip]
Iterators like that are a new Python feature

List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

You can write this, but:
* it is difficult to argue that it is more readable than Paul's (or
my) 'imperative' version;
* it has no obvious performance benefit, in fact it creates a list
unnecessarily (I know you could use a generator with recent python).
While sequences are iterables, not all iterables are sequences. Know
what you use, and you'll be fine.

....And know when to use for statements :)
 

Bruno Desthuilliers

Paul Rubin wrote:
I think the multiple statement version is more in Python tradition.

I don't know if I qualify as a Python traditionalist, but I'm using
Python since the 1.5.2 days, and I usually favor list comps or generator
expressions over old-style loops when it comes to this kind of operation.
Python is historically an imperative, procedural language with some OO
features.

Python has had functions as first class objects and (quite-limited-but)
anonymous functions, map(), filter() and reduce() as builtin funcs at
least since 1.5.2 (quite some years ago).
Iterators like that are a new Python feature

List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

and they have
some annoying characteristics, like the way they mutate when you touch
them.

While sequences are iterables, not all iterables are sequences. Know
what you use, and you'll be fine.
It's usually safest to create and consume them in the same
place, e.g. creating some sequence and passing it through map, filter, etc.

Safest ? Why so ?
 

Gabriel Genellina

I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

http://pastie.caboo.se/46647

Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}

Just a few changes:

def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

- you can iterate directly over a dictionary's keys using: for key in dict
- you can compile a regexp once to re-use it in all loops; using re.IGNORECASE,
  you don't need to explicitly convert everything to lowercase before comparing
- if all you want to do is print the results, you can even avoid the
  for loop:

print '\n'.join('%s' % self.contacts[name] for name in self.contacts
                if search_re.match(name))
 

Arnaud Delobelle

On Mar 13, 8:59 pm, "Gabriel Genellina" <[email protected]>
wrote:
[snip]
def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

I do not see how

for y in [f(x) for x in L if g(x)]:
    do stuff with y

can be preferable to

for x in L:
    if g(x):
        do stuff with f(x)

What can be the benefit of creating a list by comprehension for the
sole purpose of iterating over it?
 

Gabriel Genellina

On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle wrote:
That would surprise me.
How can re.search know that string.lower(search) is the same each
time? Or else there is something that I misunderstand.

It does.

py> import re
py> x = re.compile("ijk")
py> y = re.compile("ijk")
py> x is y
True

Both separate calls returned identical results. You can show the cache:

py> re._cache
{(<type 'str'>, '%(?:\\((?P<key>.*?)\\))?(?P<modifiers>[-#0-9 +*.hlL]*?)[eEfFgGdiouxXcrs%]', 0): <_sre.SRE_Pattern object at 0x00A786A0>,
 (<type 'str'>, 'ijk', 0): <_sre.SRE_Pattern object at 0x00ABB338>}
 

Gabriel Genellina

On Tue, 13 Mar 2007 18:16:32 -0300, Arnaud Delobelle wrote:
On Mar 13, 8:59 pm, "Gabriel Genellina" <[email protected]>
wrote:
[snip]
def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

I do not see how

for y in [f(x) for x in L if g(x)]:
    do stuff with y

can be preferable to

for x in L:
    if g(x):
        do stuff with f(x)

What can be the benefit of creating a list by comprehension for the
sole purpose of iterating over it?

No benefit...
 

Arnaud Delobelle

On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle wrote:



It does.

py> import re
py> x = re.compile("ijk")
py> y = re.compile("ijk")
py> x is y
True

Both separate calls returned identical results. You can show the cache:

OK I didn't realise this. But even so each time there is the cost of
looking up the regexp string in the cache dictionary.
 

Gabriel Genellina

On Tue, 13 Mar 2007 19:12:12 -0300, Arnaud Delobelle wrote:
OK I didn't realise this. But even so each time there is the cost of
looking up the regexp string in the cache dictionary.

Sure, it's much better to create the regex only once. Just to note that
calling re.compile each time is not as bad as it could be.
 

Bruno Desthuilliers

Paul Rubin wrote:
True, though no iterators so you couldn't easily use those functions
on lazily-evaluated streams like you can now.

Obviously. But what I meant is that Python may not be *so* "historically
imperative" !-)

FWIW, I first learned FP concepts with Python.
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

Well you could do it that way but it allocates the entire filtered
list in memory.

Of course. But then nothing prevents you from using a genexp instead of
the list comp - same final result, and the syntax is quite close:

print "\n".join(contact for name, contact in contacts.items() \
if search.match(name))

So the fact that genexps are still a bit "new" is not a problem here
IMHO - this programming style is not new in Python.
Just that things can get confusing if you're consuming the iterator in
more than one place.

Indeed. But that's not what we have here. And FWIW, in programming, lots
of things tend to be confusing at first.
 

Bruno Desthuilliers

Arnaud Delobelle wrote:
Paul Rubin wrote:
[snip]
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

You can write this, but:
* it is difficult to argue that it is more readable than Paul's (or
my) 'imperative' version;

I personally find it more readable. To me, it tells what, not how.
* it has no obvious performance benefit,
No.


...And know when to use for statements :)

Don't worry, I still use them when appropriate.
 
