Iterating across a filtered list


Drew

All -

I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

http://pastie.caboo.se/46647
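(A rough sketch of the kind of method in question, in case the pastie is unavailable; this is a reconstruction, not the pastie itself, and it assumes self.contacts maps name strings to contact strings and that both sides are lowercased before matching:)

import re

class AddressBook(object):
    def __init__(self):
        self.contacts = {}   # name -> contact details, e.g. 'Drew': 'drew@example.com'

    def find(self, search):
        # Print the value of every key that matches the user's pattern.
        for name in self.contacts:
            if re.search(search.lower(), name.lower()):
                print self.contacts[name]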

Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}
 

Paul Rubin

Drew said:
I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

If I can decipher your Ruby example (I don't know Ruby), I think you
want:

for name, contact in contacts.iteritems():
    if re.search('search', name):
        print contact

If you just want to filter the dictionary inside an expression, you
can use a generator expression:

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('search', name))

print '\n'.join('%s: %s' % item for item in d)  # prints items from the filtered dict, one per line

Note that d is an iterator, which means it mutates when you step
through it.
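For example, with a made-up contacts dict (the names and numbers below are invented purely for illustration):

import re

contacts = {'drew': '555-0001', 'andrew': '555-0002', 'bob': '555-0003'}

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('drew', name))

print list(d)   # first pass consumes the generator: both matching pairs
print list(d)   # the generator is now exhausted, so this prints []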
 

Drew

If I can decipher your Ruby example (I don't know Ruby), I think you
want:

for name, contact in contacts.iteritems():
    if re.search('search', name):
        print contact

If you just want to filter the dictionary inside an expression, you
can use a generator expression:

d = ((name, contact) for (name, contact) in contacts.iteritems()
     if re.search('search', name))

print '\n'.join('%s: %s' % item for item in d)  # prints items from the filtered dict, one per line

Note that d is an iterator, which means it mutates when you step
through it.

Paul -

You're exactly on the mark. I guess I was just wondering if your first
example (that is, breaking the if statement away from the iteration)
was preferred rather than initially filtering and then iterating.
However, your examples make a lot of sense and are quite helpful.

Thanks,
Drew
 

Arnaud Delobelle


There is no need for such a convoluted list comprehension as you
iterate over it immediately! It is clearer to put the filtering logic
in the for loop. Moreover you recalculate the regexp for each element
of the list. Instead I would do something like this:

def find(self, search_str, flags=re.IGNORECASE):
    print "Contact(s) found:"
    search = re.compile(search_str, flags).search
    for name, contact in self.contacts.items():
        if search(name):
            print contact
    print

Although I would rather have one function that returns the list of all
found contacts:

def find(self, search_str, flags=re.IGNORECASE):
    search = re.compile(search_str, flags).search
    for name, contact in self.contacts.items():
        if search(name):
            yield contact

And then another one that prints it.
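For instance, the printing side could be as simple as this (print_found is just a name chosen for the example):

def print_found(self, search_str):
    print "Contact(s) found:"
    for contact in self.find(search_str):
        print contact
    print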
Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}

And that's why you're right to learn Python ;)

HTH
 

Paul Rubin

Drew said:
You're exactly on the mark. I guess I was just wondering if your first
example (that is, breaking the if statement away from the iteration)
was preferred rather than initially filtering and then iterating.

I think the multiple statement version is more in Python tradition.
Python is historically an imperative, procedural language with some OO
features. Iterators like that are a new Python feature and they have
some annoying characteristics, like the way they mutate when you touch
them. It's usually safest to create and consume them in the same
place, e.g. creating some sequence and passing it through map, filter, etc.
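For example (contacts again being a made-up name -> details dict), the iterator below is created and consumed inside a single expression, so there is no half-used iterator left lying around:

import re

contacts = {'drew': '555-0001', 'andrew': '555-0002', 'bob': '555-0003'}

# created and consumed in the same expression, handed straight to join()
print '\n'.join(sorted(contact for name, contact in contacts.iteritems()
                       if re.search('drew', name)))

# the same idea with the filter() builtin, which returns a plain list here
print filter(lambda name: re.search('drew', name), contacts.keys())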
 

Paul Rubin

Arnaud Delobelle said:
It is clearer to put the filtering logic in the for loop. Moreover you
recalculate the regexp for each element of the list.

The re library caches the compiled regexp, I think.
 

Arnaud Delobelle

The re library caches the compiled regexp, I think.

That would surprise me.
How can re.search know that string.lower(search) is the same each
time? Or else there is something that I misunderstand.

Moreover:

In [49]: from timeit import Timer
In [50]: Timer('for i in range(1000): search("abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[50]: 0.36964607238769531

In [51]: Timer('for i in range(1000): re.search("ijk", "abcdefghijk")', 'import re; search=re.compile("ijk").search').timeit(100)
Out[51]: 1.4777300357818604
 

Paul Rubin

Bruno Desthuilliers said:
I don't know if I qualify as a Python traditionalist, but I'm using
Python since the 1.5.2 days, and I usually favor list comps or
generator expressions over old-style loops when it comes to this kind
of operation.

I like genexps when they're nested inside other expressions so they're
consumed as part of the evaluation of the outer expression. They're a
bit scary when the genexp-created iterator is saved in a variable.

Listcomps are different, they allocate storage for the entire list, so
they're just syntax sugar for a loop. They have an annoying
misfeature of their own, too.
Python has had functions as first class objects and
(quite-limited-but) anonymous functions, map(), filter() and reduce()
as builtin funcs at least since 1.5.2 (quite some years ago).

True, though no iterators so you couldn't easily use those functions
on lazily-evaluated streams like you can now.
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

Well you could do it that way but it allocates the entire filtered
list in memory. In this example "\n".join() also builds up a string
in memory, but you could do something different, like run the sequence
through another filter or print out one element at a time, in which
case lazy evaluation can be important (imagine that contacts.iteritems
chugs through a billion row table in an SQL database).
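A sketch of that sort of lazy pipeline (rows_from_db is an invented stand-in for a database cursor):

import re

def rows_from_db():
    # stand-in for a cursor lazily yielding (name, contact) rows from a huge table
    for i in xrange(10 ** 9):
        yield ('name%d' % i, 'contact%d' % i)

search = re.compile('name1234$').search
matching = (contact for name, contact in rows_from_db() if search(name))

# only as many rows as needed are pulled through the pipeline;
# nothing resembling a billion-element list is ever built
for contact in matching:
    print contact
    break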
Safest ? Why so ?

Just that things can get confusing if you're consuming the iterator in
more than one place. It can get to be like those old languages where
you had to do your own storage management ;-).
 

Arnaud Delobelle

Paul Rubin wrote:
[snip]
Iterators like that are a new Python feature

List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

You can write this, but:
* it is difficult to argue that it is more readable than Paul's (or
my) 'imperative' version;
* it has no obvious performance benefit, in fact it creates a list
unnecessarily (I know you could use a generator with recent python).
While sequences are iterables, not all iterables are sequences. Know
what you use, and you'll be fine.

....And know when to use for statements :)
 

Bruno Desthuilliers

Paul Rubin wrote:
I think the multiple statement version is more in Python tradition.

I don't know if I qualify as a Python traditionalist, but I'm using
Python since the 1.5.2 days, and I usually favor list comps or generator
expressions over old-style loops when it comes to this kind of operation.
Python is historically an imperative, procedural language with some OO
features.

Python has had functions as first class objects and (quite-limited-but)
anonymous functions, map(), filter() and reduce() as builtin funcs at
least since 1.5.2 (quite some years ago).
Iterators like that are a new Python feature

List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

and they have
some annoying characteristics, like the way they mutate when you touch
them.

While sequences are iterables, not all iterables are sequences. Know
what you use, and you'll be fine.
It's usually safest to create and consume them in the same
place, e.g. creating some sequence and passing it through map, filter, etc.

Safest ? Why so ?
 

Gabriel Genellina

I'm currently writing a toy program as I learn python that acts as a
simple address book. I've run across a situation in my search function
where I want to iterate across a filtered list. My code is working
just fine, but I'm wondering if this is the most "elegant" way to do
this. Essentially, I'm searching the dict self.contacts for a key that
matches the pattern entered by the user. If so, I print the value
associated with that key. A pastie to the method is below, any help/
advice is appreciated:

http://pastie.caboo.se/46647

Side note: I'm learning python after ruby experience. In ruby I would
do something like:

contacts.find_all{|name,contact| name =~ /search/}.each{|name,contact|
puts contact}

Just a few changes:

def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

- you can iterate directly over a dictionary's keys using: for key in dict
- you can compile a regexp once to re-use it in all loops; using re.IGNORECASE,
  you don't need to explicitly convert everything to lowercase before comparing
- if all you want to do is print the results, you can even avoid the
  for loop:

print '\n'.join('%s' % self.contacts[name] for name in self.contacts
                if search_re.match(name))
 

Arnaud Delobelle

On Mar 13, 8:59 pm, "Gabriel Genellina" <[email protected]>
wrote:
[snip]
def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

I do not see how

for y in [f(x) for x in L if g(x)]:
    do stuff with y

can be preferable to

for x in L:
    if g(x):
        do stuff with f(x)

What can be the benefit of creating a list by comprehension for the
sole purpose of iterating over it?
 

Gabriel Genellina

On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle wrote:
That would surprise me.
How can re.search know that string.lower(search) is the same each
time? Or else there is something that I misunderstand.

It does.

py> import re
py> x = re.compile("ijk")
py> y = re.compile("ijk")
py> x is y
True

Both separate calls returned identical results. You can show the cache:

py> re._cache
{(<type 'str'>, '%(?:\\((?P<key>.*?)\\))?(?P<modifiers>[-#0-9 +*.hlL]*?)[eEfFgGdiouxXcrs%]', 0): <_sre.SRE_Pattern object at 0x00A786A0>,
 (<type 'str'>, 'ijk', 0): <_sre.SRE_Pattern object at 0x00ABB338>}
 

Gabriel Genellina

On Tue, 13 Mar 2007 18:16:32 -0300, Arnaud Delobelle wrote:
On Mar 13, 8:59 pm, "Gabriel Genellina" <[email protected]>
wrote:
[snip]
def find(self, search):
    search_re = re.compile(search, re.IGNORECASE)
    for result in [self.contacts[name] for name in self.contacts
                   if search_re.match(name)]:
        print result

I do not see how

for y in [f(x) for x in L if g(x)]:
    do stuff with y

can be preferable to

for x in L:
    if g(x):
        do stuff with f(x)

What can be the benefit of creating a list by comprehension for the
sole purpose of iterating over it?

No benefit...
 

Arnaud Delobelle

On Tue, 13 Mar 2007 17:19:53 -0300, Arnaud Delobelle wrote:



It does.

py> import re
py> x = re.compile("ijk")
py> y = re.compile("ijk")
py> x is y
True

Both separate calls returned identical results. You can show the cache:

OK I didn't realise this. But even so each time there is the cost of
looking up the regexp string in the cache dictionary.
 

Gabriel Genellina

On Tue, 13 Mar 2007 19:12:12 -0300, Arnaud Delobelle wrote:
OK I didn't realise this. But even so each time there is the cost of
looking up the regexp string in the cache dictionary.

Sure, it's much better to create the regex only once. Just to note that
calling re.compile each time is not as bad as it could be.
 

Bruno Desthuilliers

Paul Rubin wrote:
True, though no iterators so you couldn't easily use those functions
on lazily-evaluated streams like you can now.

Obviously. But what I meant is that Python may not be *so* "historically
imperative" !-)

FWIW, I first learned FP concepts with Python.
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

Well you could do it that way but it allocates the entire filtered
list in memory.

Of course. But then nothing prevents you from using a genexp instead of
the list comp - same final result, and the syntax is quite close:

print "\n".join(contact for name, contact in contacts.items() \
if search.match(name))

So the fact that genexps are still a bit "new" is not a problem here
IMHO - this programming style is not new in Python.
Just that things can get confusing if you're consuming the iterator in
more than one place.

Indeed. But that's not what we have here. And FWIW, in programming, lots
of things tend to be confusing at first.
 

Bruno Desthuilliers

Arnaud Delobelle wrote:
Paul Rubin wrote:
[snip]
Iterators like that are a new Python feature
List comps are not that new (2.0 or 2.1 ?):
print "\n".join([contact for name, contact in contacts.items() \
if search.match(name)])

You can write this, but:
* it is difficult to argue that it is more readable than Paul's (or
my) 'imperative' version;

I personally find it more readable. To me, it tells what, not how.
* it has no obvious performance benefit,
No.


...And know when to use for statements :)

Don't worry, I still use them when appropriate.
 
