searching a value of a dict (each value is a list)

Seongsu Lee · Dec 9, 2007

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

Seongsu Lee · Dec 9, 2007

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

Hi,

I just let the dict work in bidirectional fashion so that
I can find out what I want by both key and value. A mark
or prefix was needed to distinguish between keys originated
from keys and keys originated from values. (value * (-1))

from pprint import pprint
dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}
for k, v in dict.items():
for x in v:
dict[x * -1] = k
pprint(dict)

{-105: 900000,
-104: 900000,
-103: 900000,
-102: 900000,
-101: 900000,
-100: 900000,
-22: 900001,
-21: 900001,
-20: 900001,
-19: 999999,
-18: 999999,
-17: 999999,
-16: 999999,
-15: 999999,
-12: 2,
-11: 2,
-10: 2,
-5: 1,
-4: 1,
-3: 1,
-2: 1,
-1: 1,
1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

What do you think of this? Ideas with less space complexity?

Pablo Ziliani · Dec 9, 2007

Seongsu Lee escribió:

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
(...)

I want to find out the key value which has a specific
integer in the list of its value.

Sorry if this is unhelpful, but have you considered moving your data
model a proper database?
I ask because unless someone knows of a specific module, I think we are
in DB's authentic realm. Is the fastest solution, probably not just for
this particular operation you are trying to do.

Regards,
Pablo

Seongsu Lee · Dec 9, 2007

Seongsu Lee escribiÃ³:

Sorry if this is unhelpful, but have you considered moving your data
model a proper database?
I ask because unless someone knows of a specific module, I think we are
in DB's authentic realm. Is the fastest solution, probably not just for
this particular operation you are trying to do.

Regards,
Pablo

Hi Pablo,

Thank you for your posting! I wanted to solve the problem within
a given environment, python, and I think it is solved by
a dictionary with bidirectional key. I have posted it and want to
know if other knows more elegant way to do it.

Bruno Desthuilliers · Dec 9, 2007

Seongsu Lee a écrit :

Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

A common solution is to build a reversed index - that is, a dict mapping
values to lists of keys. Now if you have a really huge dataset, this may
become cumbersome.

bearophileHUGS · Dec 9, 2007

Seongsu Lee:

What do you think of this? Ideas with less space complexity?

You can put the second group of keys in a second dictionary, so you
don't have to mangle them, and it may be a bit faster.

Regarding the space complexity, I don't know how you can reduce it
with Python. Probably you can create a list of long, sort it and use
bisect on it to find the keys. Such longs can be a combination of
shifted integer OR the other integer that is the key of the original
dict. But I don't how much you can gain like this.

Another solution to reduce space is to use a tiny external module
written in C, Pyrex or D. Here follows some simple D code you can
modify a bit to make it work with Pyd (http://pyd.dsource.org/):

import std.traits: ReturnType;
import std.stdio: writefln;

struct TyInt_int {
int el, n;
int opCmp(TyInt_int other) {
if (el == other.el)
return 0;
return (el < other.el) ? -1 : 1;
}
}

int bisect(TyElem, TyData, TyFun)(TyElem[] a, TyData x, TyFun key) {
int lo = 0;
int hi = a.length;
while (lo < hi) {
int mid = (lo + hi) / 2;
if (x < key(a[mid]))
hi = mid;
else
lo = mid + 1;
}
return lo;
}

void main() {
int[][int] int_arr;
int_arr[1] = [1, 2, 3, 4, 5];
int_arr[1] = [10, 11, 12],
int_arr[900000] = [100, 101, 102, 103, 104, 105],
int_arr[900001] = [20, 21, 22],
int_arr[999999] = [15, 16, 17, 18, 19];

int tot_len = 0;
foreach(arr; int_arr)
tot_len += arr.length;

auto data_arr = new TyInt_int[](tot_len);
int i = 0;
foreach(n, arr; int_arr)
foreach(el; arr)
data_arr[i++] = TyInt_int(el, n);

data_arr.sort;
writefln(bisect(data_arr, 100, (TyInt_int ii){return ii.el;}));
}

Bye,
bearophile

Jeremy C B Nicoll · Dec 9, 2007

Seongsu Lee said:
Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

Are the integers in the lists unique? I mean, could 104 occur in more than
one list? If it did, would it matter which key was returned?

How can I do this with Python? Ideas?

When I see something like this my natural response is to think that the data
structure is inappropriate for the use it's being put to.

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

Even if the dict in this form is needed for some other reason, couldn't the
code that created it also create a reverse index at the same time?

bearophileHUGS · Dec 9, 2007

Jeremy C B Nicoll:

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

If you are talking about my D code then I know it, the creation of the
first dict has to be skipped, if possible... The code I have posted
must be adapted.

Bye,
bearophile

Jeremy C B Nicoll · Dec 10, 2007

Jeremy C B Nicoll:

If you are talking about my D code then I know it...

No I meant the code that used python to iterate over the dict and create
zillions of extra keys. I've deleted earlier posts in the thread and wasn't
sure who suggested that.

Zepo Len · Dec 10, 2007

I have a dictionary with million keys. Each value in the

dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

def find_key(dict, num):
for k in dict:
if num in dict[k]:
return k

Seongsu Lee · Dec 10, 2007

Seongsu Lee said:
Seongsu Lee said:

Hi,

Click to expand...

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

Click to expand...

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

Click to expand...

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

Click to expand...

Are the integers in the lists unique? I mean, could 104 occur in more than
one list? If it did, would it matter which key was returned?

Yes, the intergers in the lists are unique.

When I see something like this my natural response is to think that the data
structure is inappropriate for the use it's being put to.

The code someone else posted to reverse the keys is all very well, but
surely hugely wasteful on cpu, maybe storage, and elapsed time.

Even if the dict in this form is needed for some other reason, couldn't the
code that created it also create a reverse index at the same time?

The reason I use the dict for my data is to speed up the search by
key.

The code could create also a reverse index (a reverse dict) at the
time. But as you said, it waste the space and I wanted to ask someone
who may know some way to reduce the waste of space while searching
fast.

Jeremy C B Nicoll · Dec 10, 2007

Seongsu Lee said:
The reason I use the dict for my data is to speed up the search by key.

Ok, I understand that once the overhead of creating the dict has been done,
getting access to values within it is quick. And taking the time to create
a set of reverse keys speeds up the reverse access.

Rather than scanning the whole dict and creating reverse keys for everything
in it first, there might be an advantage in making search logic test if
there is a reverse key for the negative integer to be searched for, and if
so use it, otherwise scan the dict creating and examining reverse keys until
the integer is found. That way if the integer you're looking for is early
in the dict you'd only create reverse keys for the integers from the start
of the dict until the required one.

We don't know what external(?) process created the data that you've stored
in the dict, nor what you use it for - or more to the point - how often. If
you're going to make just one search of that data then there's little point
in having a fast search after a slow dict creation. On the other hand if
you have many many searches to do the initial overhead might be acceptable.

(I don't know how slow creating the dict would be for a typical example of a
million keys each keying lists of 1-1000 integers.)

The code could create also a reverse index (a reverse dict) at the
time. But as you said, it waste the space and I wanted to ask someone
who may know some way to reduce the waste of space while searching
fast.

Is the dict used by anything else? If the data in it was held in some other
form would that cause your program (or other programs) lots of problems? If
the range of values of the integers being stored is suitable, you might
sensibly use several or many smaller dicts to store all the data (and thus
save time reverse-keying much less of it).

Adonis Vargas · Dec 10, 2007

Seongsu said:
Hi,

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

How can I do this with Python? Ideas?

You can try this:

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

findItem(104, items)

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Hope this helps.

Adonis Vargas

bearophileHUGS · Dec 10, 2007

Adonis Vargas:

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.
Adonis Vargas

After hearing this suggestion for the 300th time, I think it may be
the moment to fix this problem in Python3, and make the Python
compiler issue a syntax error if someone tries to reassign such kind
of words, like dict, set, etc.

Bye,
bearophile

Seongsu Lee · Dec 10, 2007

Seongsu said:
Seongsu said:

Hi,

Click to expand...

I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.

Click to expand...

dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

Click to expand...

I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.

Click to expand...

How can I do this with Python? Ideas?

Click to expand...

You can try this:

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

findItem(104, items)

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Hi,

Yes, it works. But I think it works in O(n * m), doesn't it?
(n is # of keys in the dictionary and m is # of items in the list.)
So, we need to create a reverse index. (a reverse dictionary) or
need something better at least, I think.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Yep.

MonkeeSage · Dec 10, 2007

Seongsu said:
Seongsu said:

Hi,
I have a dictionary with million keys. Each value in the
dictionary has a list with up to thousand integers.
Follow is a simple example with 5 keys.
dict = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}
I want to find out the key value which has a specific
integer in the list of its value. For example, if I search
104 in the list, 900000 must be returned.
How can I do this with Python? Ideas?

Click to expand...

Click to expand...

You can try this:

Click to expand...

items = {1: [1, 2, 3, 4, 5],
2: [10, 11, 12],
900000: [100, 101, 102, 103, 104, 105],
900001: [20, 21, 22],
999999: [15, 16, 17, 18, 19]}

Click to expand...

def findItem(item, dictionary):
for key, value in dictionary.iteritems():
if item in value:
print key, value

Click to expand...

findItem(104, items)

Click to expand...

This will allow you to work with the existing dataset without needing to
duplicate it. It will print all occurrances.

Click to expand...

Hi,

Yes, it works. But I think it works in O(n * m), doesn't it?
(n is # of keys in the dictionary and m is # of items in the list.)
So, we need to create a reverse index. (a reverse dictionary) or
need something better at least, I think.

Also, you should never use reserved words like 'dict' this creates
confusion and can cause Python to misbehave since you are rebinding the
name.

Click to expand...

Yep.

Hope this helps.

Click to expand...

Adonis Vargas- µû¿Â ÅØ½ºÆ® ¼û±â±â -

Click to expand...

- µû¿Â ÅØ½ºÆ® º¸±â -

Click to expand...

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n). But if you only need it once, it is
a waste of time and space to create a reverse dict when your access
time is the same for the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as you
execute the first search; that way you can combine the time spent on
building the reverse dict and the lookup, to get a total of O(n*m)
rather than O(n^2*m). The first search is "free" since you need the
reverse dict anyway.

Regards,
Jordan

Neil Cerutti · Dec 10, 2007

If I'm not mistaken, building a reverse dictionary like that will be
O(n*m) because dict/list access is O(n) (ammortized). Somebody correct
me if I'm wrong. In that case, it really depends on how you will use
the dict to see whether you get any benefit from building the reversed
dict. If you want to do several lookups, then the initial overhead
(speed/memory) of building the reversed dict might be worth it so that
you can just run lookups at O(n).

It also depends on if the dictionary shall be mutated between
reverse lookups.

But if you only need it once, it is a waste of time and space
to create a reverse dict when your access time is the same for
the lookup as for building the reversed dict.

If you do need more than one lookup, it would also be a good
optimization strategy to build the reverse dict in parallel, as
you execute the first search; that way you can combine the time
spent on building the reverse dict and the lookup, to get a
total of O(n*m) rather than O(n^2*m). The first search is
"free" since you need the reverse dict anyway.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

MonkeeSage · Dec 10, 2007

It also depends on if the dictionary shall be mutated between
reverse lookups.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

Well true, but you enter a whole other level of complexity in that
case...something like Theta(log(n*(m-n))). I might have calculated
that incorrectly, but that just goes to show how complex a lookup
is(!) in such a case.

Regards,
Jordan

MonkeeSage · Dec 10, 2007

It also depends on if the dictionary shall be mutated between
reverse lookups.

It wouldn't be merely an optimization if reverse lookups and
mutations were interleaved.

Well true, but you enter a whole other level of complexity in that
case...something like Theta(log(n*(m-n))). I might have calculated
that incorrectly, but that just goes to show how complex a lookup
is(!) in such a case.

Regards,
Jordan

MonkeeSage · Dec 10, 2007

Well true, but you enter a whole other level of complexity in that
case...something like Theta(log(n*(m-n))). I might have calculated
that incorrectly, but that just goes to show how complex a lookup
is(!) in such a case.

Regards,
Jordan

Sorry for the double-post...google is being beligerant right now.

Average of MultiMode of a list of a list	1	Oct 28, 2022
Python 3: dict & dict.keys()	4	Jul 24, 2013
Trying to get the average value of the elements, please help ! JavaScript	3	Dec 13, 2022
set and dict iteration	42	Aug 16, 2012
Python dict as unicode	1	Nov 24, 2010
building a dict	12	Mar 13, 2010
Count each unique element in list of lists	1	Nov 8, 2013
Best way to add a "position" value to each item in a list	5	Jul 9, 2009

searching a value of a dict (each value is a list)

Seongsu Lee

Seongsu Lee

Pablo Ziliani

Seongsu Lee

Bruno Desthuilliers

bearophileHUGS

Jeremy C B Nicoll

bearophileHUGS

Jeremy C B Nicoll

Zepo Len

Seongsu Lee

Jeremy C B Nicoll

Adonis Vargas

bearophileHUGS

Seongsu Lee

MonkeeSage

Neil Cerutti

MonkeeSage

MonkeeSage

MonkeeSage

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads