Anyway to clarify this code? (dictionaries)

javuchi · Nov 23, 2005

I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.

Mike Meyer · Nov 23, 2005

javuchi said:
I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?

Lots of them. Let's start by pointing out two bugs:

You're creating a list that's the "cross product" of keys and values,
then handing that to dict. You're handing it a list with entries for
the same key. That behavior may be defined, but I wouldn't count on
it.

fromkeys sets the values in temp to the same value - None. So
temp.values() is a list of None's, so v is None every time through the
loop.

So you could do what that code does with:

def my_search(another, keys, x):
if None >= x:
return another.fromkeys(keys)
else:
return dict()

You probably want something like:

def my_search(another, keys, x):
return dict([[k,another[k]] for k in keys if another[k] >= x]

If you really want to avoid indexing another twice, you could do:

def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.

I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

No. But it does take time to do the conversion. I think I'd write it
out "longhand":

def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new

This makes it clear that you only index another twice if you actually
use it. The list comprehension will do the loop in C, but it means you
have to scan two lists instead of one. If you're worried about which
is faster, measure it on your target platform.

<mike

bonono · Nov 23, 2005

Mike said:
def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.

Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.

I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.

This seems like a SQL equivalent.

select * from a where a.key=b.key and a.v >= x

bonono · Nov 23, 2005

Mike said:
def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new

BTW, this would raise exception if k is not in another.

Bengt Richter · Nov 23, 2005

I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.

...
a 0.606494662034
c 0.273998760342
b 0.358066029098
d 0.774406432218

If keys are few compared to the number of keys in another, this may be prefereable:

>>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
>>> my_search(another, 'cb', .3) {'b': 0.35806602909756235}
>>> my_search(another, 'abcd', .4)

Click to expand...

Click to expand...

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

This sounds like homework though ... ?

Regards,
Bengt Richter

bonono · Nov 23, 2005

Bengt said:
def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
my_search(another, 'cb', .3) {'b': 0.35806602909756235}
my_search(another, 'abcd', .4)

Click to expand...

Click to expand...

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

Do you need to guard the case "k not in another" ?

Mike Meyer · Nov 23, 2005

[email protected] said:
Mike said:

def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])
But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.

Click to expand...

Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.

I assumed keys was a simple sequence of some kind, because you passed
it to fromkeys. I guess it could be set or a dictionary, in which case
"k in keys" would be a lookup. Were you trying to force a lookup by
creating a dict with the keys from k via fromkeys? If so, using a set
would have the same effect, and be a lot clearer:

temp = set(keys)
return dict([[k, v] for k, v in another.items() if v >= x and k in temp])

I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.

First, remember the warnings about premature optimization. The
following might be worth looking into:

use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]

The set difference is still going to loop through one and do lookups
in the other, but it'll happen in C instead of Python.

Unless your lists are *very* long, the performance differences will
probably negligible, and are liable to change as you change the
underlying platform. So I'd recommend you choose the form that's mostt
readable to you, and go with that.

<mike

bonono · Nov 23, 2005

Mike said:
First, remember the warnings about premature optimization.

Which is why I said the one-liner(your first one) is clean and clear,
and bug free in one go.

use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]

This is definitely better than the other long hand version as the set
operation remove the potential problem of another[k] raise KeyError.

Steven D'Aprano · Nov 23, 2005

javuchi said:
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

You mean, is it faster to write, test, debug and
execute slow Python code rather than letting Python's
built-in routines written in fast C do the job?

I have no idea. Perhaps you should try it and see.
Write some code to do it all manually, and time it.

Make sure you use realistic test data: if your users
will be using dictionaries with 10,000 items, there is
no point in testing only dictionaries with 10 items.
For accuracy, run (say) 20 tests, and look at the
average speed. Of better still, use the timeit module.

Bengt Richter · Nov 23, 2005

Bengt said:
Bengt said:

def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ...
my_search(another, 'cb', .3) {'b': 0.35806602909756235}
my_search(another, 'abcd', .4)

Click to expand...

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

Click to expand...

Do you need to guard the case "k not in another" ?

Good catch ;-)
What did the OP want as a value if any for that case? None? or no entry at all?
Taking a cue from Mike, I like the set method of getting the common keys, to eliminate the entry (untested)

def my_search(another, keys, x):
return dict((k,another[k]) for k in (set(another)&set(keys)) if another[k]>x)

otherwise, to get Nones, maybe (untested)

def my_search(another, keys, x):
return dict((k,another.get(k)) for k in keys if k not in another or another[k]>x)

Regards,
Bengt Richter

Dictionaries	1	Mar 20, 2014
compare dictionaries	9	Sep 7, 2010
nested dictionaries and functions in data structures.	0	Jan 7, 2014
Help me with this task, please.	3	Mar 22, 2023
updating dictionaries from/to dictionaries	12	Aug 11, 2008
How to put a null check on this code	0	Jan 4, 2022
Python code problem	2	Apr 23, 2023
Is it possible to convert this JS code into .NET?	1	Oct 19, 2022

Anyway to clarify this code? (dictionaries)

javuchi

Mike Meyer

bonono

bonono

Bengt Richter

bonono

Mike Meyer

bonono

Steven D'Aprano

Bengt Richter

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads