Anyway to clarify this code? (dictionaries)

J

javuchi

I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.
 
M

Mike Meyer

javuchi said:
I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?

Lots of them. Let's start by pointing out two bugs:

You're creating a list that's the "cross product" of keys and values,
then handing that to dict. You're handing it a list with entries for
the same key. That behavior may be defined, but I wouldn't count on
it.

fromkeys sets the values in temp to the same value - None. So
temp.values() is a list of None's, so v is None every time through the
loop.

So you could do what that code does with:

def my_search(another, keys, x):
if None >= x:
return another.fromkeys(keys)
else:
return dict()


You probably want something like:

def my_search(another, keys, x):
return dict([[k,another[k]] for k in keys if another[k] >= x]

If you really want to avoid indexing another twice, you could do:

def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

No. But it does take time to do the conversion. I think I'd write it
out "longhand":

def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new

This makes it clear that you only index another twice if you actually
use it. The list comprehension will do the loop in C, but it means you
have to scan two lists instead of one. If you're worried about which
is faster, measure it on your target platform.

<mike
 
B

bonono

Mike said:
def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.
Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.

I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.

This seems like a SQL equivalent.

select * from a where a.key=b.key and a.v >= x
 
B

bonono

Mike said:
def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new
BTW, this would raise exception if k is not in another.
 
B

Bengt Richter

I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.
...
a 0.606494662034
c 0.273998760342
b 0.358066029098
d 0.774406432218

If keys are few compared to the number of keys in another, this may be prefereable:
>>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
>>> my_search(another, 'cb', .3) {'b': 0.35806602909756235}
>>> my_search(another, 'abcd', .4)
{'a': 0.60649466203365532, 'd': 0.77440643221840166}

This sounds like homework though ... ?

Regards,
Bengt Richter
 
B

bonono

Bengt said:
def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
my_search(another, 'cb', .3) {'b': 0.35806602909756235}
my_search(another, 'abcd', .4)
{'a': 0.60649466203365532, 'd': 0.77440643221840166}
Do you need to guard the case "k not in another" ?
 
M

Mike Meyer

Mike said:
def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])
But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.
Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.

I assumed keys was a simple sequence of some kind, because you passed
it to fromkeys. I guess it could be set or a dictionary, in which case
"k in keys" would be a lookup. Were you trying to force a lookup by
creating a dict with the keys from k via fromkeys? If so, using a set
would have the same effect, and be a lot clearer:

temp = set(keys)
return dict([[k, v] for k, v in another.items() if v >= x and k in temp])
I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.

First, remember the warnings about premature optimization. The
following might be worth looking into:

use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]

The set difference is still going to loop through one and do lookups
in the other, but it'll happen in C instead of Python.

Unless your lists are *very* long, the performance differences will
probably negligible, and are liable to change as you change the
underlying platform. So I'd recommend you choose the form that's mostt
readable to you, and go with that.

<mike
 
B

bonono

Mike said:
First, remember the warnings about premature optimization.

Which is why I said the one-liner(your first one) is clean and clear,
and bug free in one go.
use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]
This is definitely better than the other long hand version as the set
operation remove the potential problem of another[k] raise KeyError.
 
S

Steven D'Aprano

javuchi said:
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

You mean, is it faster to write, test, debug and
execute slow Python code rather than letting Python's
built-in routines written in fast C do the job?

I have no idea. Perhaps you should try it and see.
Write some code to do it all manually, and time it.

Make sure you use realistic test data: if your users
will be using dictionaries with 10,000 items, there is
no point in testing only dictionaries with 10 items.
For accuracy, run (say) 20 tests, and look at the
average speed. Of better still, use the timeit module.
 
B

Bengt Richter

Bengt said:
def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
my_search(another, 'cb', .3) {'b': 0.35806602909756235}
my_search(another, 'abcd', .4)
{'a': 0.60649466203365532, 'd': 0.77440643221840166}
Do you need to guard the case "k not in another" ?
Good catch ;-)
What did the OP want as a value if any for that case? None? or no entry at all?
Taking a cue from Mike, I like the set method of getting the common keys, to eliminate the entry (untested)

def my_search(another, keys, x):
return dict((k,another[k]) for k in (set(another)&set(keys)) if another[k]>x)

otherwise, to get Nones, maybe (untested)

def my_search(another, keys, x):
return dict((k,another.get(k)) for k in keys if k not in another or another[k]>x)

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top