Help with a reverse dictionary lookup

R

rh0dium

Hi all,

I have a dict which looks like this..

dict={'130nm': {'umc': ['1p6m_1.2-3.3_fsg_ms']},
'180nm': {'chartered': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_1.8-3.3_sal_log', '1p6m_1.8-3.3_sal_ms']},
'250nm': {'umc': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_2.2-3.5_sal_log', '1p6m_1.8-3.3_sal_ms']}
}

For clarification:
130nm,180nm,250nm refer to Nodes
tsmc,chartered,umc refer to Foundries
1p6m..blah, 2p6m..blah refer to Process

I want a function which can do this:

nodes = getNodes()
should return a list [130nm,180nm,250nm]

nodes = getNodes(Foundry="umc")
should return a list [130nm,250nm]

nodes = getNodes(Process="2p6m_1.8-3.3_sal_ms")
should return a list [180nm,250nm]

nodes = getNodes(Foundry="umc", Process="2p6m_1.8-3.3_sal_ms")
should return a list [250nm]

nodes = getNodes(Foundry="foo", Process="2p6m_1.8-3.3_sal_ms")
should return None

Obviously I want to extend this to Each area ( i.e., getFoundry(),
getProcess() ) but I can do that. I just don't know how to easily do
this reverse kind of look up? Can someone help me on this?
 
J

John McMonagle

Hi all,

I have a dict which looks like this..

dict={'130nm': {'umc': ['1p6m_1.2-3.3_fsg_ms']},
'180nm': {'chartered': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_1.8-3.3_sal_log', '1p6m_1.8-3.3_sal_ms']},
'250nm': {'umc': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_2.2-3.5_sal_log', '1p6m_1.8-3.3_sal_ms']}
}

For clarification:
130nm,180nm,250nm refer to Nodes
tsmc,chartered,umc refer to Foundries
1p6m..blah, 2p6m..blah refer to Process

I want a function which can do this:

nodes = getNodes()
should return a list [130nm,180nm,250nm]

I'm not going to write your function for you but the following should
help you get started.

dict.keys() <-- returns a list of the keys in your dictionary

[130nm,180nm,250nm]
nodes = getNodes(Foundry="umc")
should return a list [130nm,250nm]

Here you want to check if the key in the child dictionary matches some
text:

Eg:

Foundry = 'umc'
FoundryList = []
for key in dict.keys:
val = dict.get(key)
if Foundry in val.keys():
FoundryList.append(key)

print FoundryList

[130nm,250nm]
nodes = getNodes(Process="2p6m_1.8-3.3_sal_ms")
should return a list [180nm,250nm]

Same concept here:

Process = ['2p6m_1.8-3.3_sal_ms']
ProcessList = []
for key in dict.keys():
val = dict.get(key)
if Process in val.values():
ProcessList.append(key)

print ProcessList

[180nm,250nm]


So key concepts you need are:

- Get a list of keys from the dictionary <-- dict.keys()
- Get the value matching a key <-- dict.get(key)
- Get a list of values from the dictionary <-- dict.values()

This should be enough to get you going.
nodes = getNodes(Foundry="umc", Process="2p6m_1.8-3.3_sal_ms")
should return a list [250nm]

nodes = getNodes(Foundry="foo", Process="2p6m_1.8-3.3_sal_ms")
should return None

Obviously I want to extend this to Each area ( i.e., getFoundry(),
getProcess() ) but I can do that. I just don't know how to easily do
this reverse kind of look up? Can someone help me on this?
 
L

Lonnie Princehouse

Here's the illegible gibberish version of your function. Once you
understand completely the following line of code, you will be well on
your way to Python nirvana:

getNodes = lambda Foundry=None,Process=None: [node for node,foundries
in dict.iteritems() if ((Foundry is None) and ((Process is None) or
(len([foundry for foundry,processes in foundries.iteritems() if Process
in processes]) > 0))) or ((Foundry in foundries) and ((Process is None)
or (Process in foundries[Foundry])))]

# ex:
getNodes(Foundry="umc", Process="2p6m_1.8-3.3_sal_ms")

# note: returns an empty list rather than None if no matching nodes
exist
# hint: You'll probably want to do rearrange your data structure or do
some caching if you have a large dataset. The dict-and-list solution
doesn't make this search very efficient.

And: I just noticed that len() won't work with a generator expression
as its argument. What's up with that?
 
R

rh0dium

Thanks!!

I got all of this. The problem that I was trying to figure out was
this.

Basically there are multiple combinatories here - I was hoping someone
could point me to a general approach. Writing the actual funtion is
not necessary - as you pointed out I can certainly do that. Here is my
problem - I did exactly as you and said OK I can

if Foundry==None and Process==None:

elif Foundy==None and Process!=None:

elif Foundy!=None and Process==None:

elif Foundy!=None and Process!=None:

But this seems very ugly... And if I want to do this for the three
areas, it seems very repetitive. I was looking for a better way to
break this down which is reusable.

Thanks for the help - I guess I should have been more clear.
 
R

rh0dium

Hey thanks - OK how would you arrange the data structure? I think that
is my problem - I can arrange in any order - I just want something
which makes sense - this "seemed" logical but can you point me in a
better method.. Basically I am parsing a directory structure:

TECHROOT/
130nm/
tsmc/
1p2m<blah>
2p2m<blah>
umc/
3p2m<blah>
180nm/
tsmc/
1p2m<blah>

Any thoughts -- To parse this I use this..


def _getNodes(self):
self.nodes={}
for node in os.listdir(self.foundrypath):
node = os.path.join( self.foundrypath, node)
if os.path.isdir(node):
self.logger.debug("Node found - %s" % node)
self.nodes[os.path.basename(node)]={}
else:
self.logger.debug("Excluding %s" % node)

return self.nodes

def _getFoundries(self):
for node in self.nodes:
node = os.path.join( self.foundrypath, node)
for foundry in os.listdir(node):
foundry = os.path.join( node, foundry)
if os.path.isdir(foundry):
self.logger.debug("Foundry found - %s" % foundry)
if os.path.basename(foundry) != "smsc":

self.nodes[os.path.basename(node)][os.path.basename(foundry)]=[]
else:
self.logger.debug("smsc foundry found - %s" %
foundry)
self.smscfoundry=1
else:
self.logger.debug("Excluding %s" % foundry)

def _getProcesses(self):
for node in self.nodes:
for foundry in self.nodes[node]:
foundry = os.path.join( self.foundrypath, node,
foundry)
for process in os.listdir(foundry):
process = os.path.join( foundry, process )
if os.path.isdir( process ):
self.logger.debug("Process found - %s" %
process )

self.nodes[node][os.path.basename(foundry)].append(os.path.basename(process))
else:
self.logger.debug("Excluding %s" % process)

Please feel free to comment and clean it up. I want it readable so I
can modify it later ;)
 
L

Lonnie Princehouse

The parsing is good; the structure can be transformed after the parsing
is done.

The basic problem with the "reverse lookup" search is that you need to
iterate over lots of things. If you're only doing one search, that's
not too horrible But if you're going to perform multiple searches, you
can do the iteration once and cache the results. This is a trade-off
between space and time, but the space shouldn't be a problem on a
modern computer unless you've got millions of nodes.

# take the already-defined dict and make some lookup tables with
foundries and processes as keys
from sets import Set
foundry_lookup_table = {}
process_lookup_table = {}

for node, foundries in dict.iteritems():
for foundry, processes in foundries.iteritems():
if foundry not in foundry_lookup_table:
foundry_lookup_table[foundry] = Set([node])
else:
foundry_lookup_table[foundry].add(node)
for process in processes:
if node not in process_lookup_table:
process_lookup_table[process] = Set([node])
else:
process_lookup_table[process].add(node)

# Now calls to getNode will be very fast
def getNode(Foundry=None, Process=None):
if Foundry is None and Process is None:
return dict.keys() # all nodes
elif Process is None:
return list(foundry_lookup_table.get(Foundry,Set()))
elif Foundry is None:
return list(process_lookup_table.get(Process,Set()))
else:
return list(
foundry_lookup_table.get(Foundry,Set()).intersection(process_lookup_table.get(Process,Set()))
 
L

Lonnie Princehouse

I made a logic error in that. Must be tired :-( Alas, there is no
undo on usenet.
 
D

Dennis Lee Bieber

I made a logic error in that. Must be tired :-( Alas, there is no
undo on usenet.

Well, if you are fast enough, and your server accepts them, there is
the CANCEL operation. But that may chase the original message around the
world until hitting some server that never cancels.
--
 
D

Dennis Lee Bieber

Basically there are multiple combinatories here - I was hoping someone
could point me to a general approach. Writing the actual funtion is

Lots of general approaches -- but it depends on what you find most
important.

You are attempting to retrieve data from a three level nested
dictionary; but by using any item as a key, it would seem.

Using your dictionary example, and unnesting gives:

130nm umc 1p6m_1.2-3.3_fsg_ms
180nm chartered 2p6m_1.8-3.3_sal_ms
180nm tsmc 1p6m_1.8-3.3_sal_log
180nm tsmc 1p6m_1.8-3.3_sal_ms
250nm umc 2p6m_1.8-3.3_sal_ms
250nm tsmc 1p6m_2.2-3.5_sal_log
250nm tsmc 1p6m_1.8-3.3_sal_ms

If the data is fixed {and I know it isn't, since I've looked ahead
to where you state you are traversing a file system to build the data}
I'd suggest prebuilding separate dictionaries for each look-up order you
need.

However... Did the above flattened list give you any ideas?

I'd suggest a lightweight RDBM (maybe you can get SQLite (sp?)
working [I'm not sure I found a prebuilt version that worked on my
machine]). Once you've populated the table, maybe create indices on
each column (and a composite unique key made from all three columns).
After that, all your retrieval becomes simple SELECT statements...

If the database is persistant, you can create an update process that
first creates file system path names from the table, does an I/O
operation to see if the file exists; if it doesn't, delete the record
having the three pieces. Then, after deletes complete (and if you have
the unique key) go over the file system trying inserts (catching
duplicate key errors for preexisting date -- ignore).
--
 
S

Scott David Daniels

rh0dium said:
Basically there are multiple combinatories here - I was hoping someone
could point me to a general approach. Writing the actual funtion is
not necessary - as you pointed out I can certainly do that. Here is my
problem - I did exactly as you and said OK I can

if Foundry==None and Process==None:

elif Foundy==None and Process!=None:

elif Foundy!=None and Process==None:

elif Foundy!=None and Process!=None:

But this seems very ugly... And if I want to do this for the three
areas, it seems very repetitive. I was looking for a better way to
break this down which is reusable.

First, use identity comparisons with None.
Second, I'd do it like this:

if Foundry is None:
# common ~Foundry stuff
if Process is None:
pass # ~F & ~P stuff
else:
pass # ~F & P stuff
# common ~Foundry stuff needing Process resolved
else:
# common Foundry stuff
if Process is None:
pass # F & ~P stuff
else:
pass # F & P stuff
# common ~Foundry stuff needing Process resolved

Unless there is more Process / ~Process common work, in which
case re-nest. Don't be scared to write "the obvious" in Python.

Second, once it grows to over two, think about doing things by cases:

def AllTrue(Foundry, Process, Vendor):
print 'Triple', (Foundry, Process, Vendor)
def NoVendor(Foundry, Process, Vendor):
print 'Pair', (Foundry, Process)
...
def Nothing(Foundry, Process, Vendor):
print 'Nada'
behavior = {0: AllTrue, 1:NoVendor, 2:NoProcess, 3: OnlyFoundry,
4: NoFoundry, 5:OnlyProcess, 6:OnlyVendor, 7: Nothing}
def dispatch(f, p, v):
return behavior[((f is None) * 2 + (p is None)) * 2
+ (v is None)](f, p, v)
...
dispatch('Intel', 'Silicon', None)


--Scott David Daniels
(e-mail address removed)
 
B

bruno at modulix

rh0dium said:
Hi all,

I have a dict which looks like this..

dict={'130nm': {'umc': ['1p6m_1.2-3.3_fsg_ms']},
'180nm': {'chartered': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_1.8-3.3_sal_log', '1p6m_1.8-3.3_sal_ms']},
'250nm': {'umc': ['2p6m_1.8-3.3_sal_ms'], 'tsmc':
['1p6m_2.2-3.5_sal_log', '1p6m_1.8-3.3_sal_ms']}
}

For clarification:
130nm,180nm,250nm refer to Nodes
tsmc,chartered,umc refer to Foundries
1p6m..blah, 2p6m..blah refer to Process

I want a function which can do this:

nodes = getNodes()
should return a list [130nm,180nm,250nm]

nodes = getNodes(Foundry="umc")
should return a list [130nm,250nm]

nodes = getNodes(Process="2p6m_1.8-3.3_sal_ms")
should return a list [180nm,250nm]

nodes = getNodes(Foundry="umc", Process="2p6m_1.8-3.3_sal_ms")
should return a list [250nm]

nodes = getNodes(Foundry="foo", Process="2p6m_1.8-3.3_sal_ms")
should return None

Obviously I want to extend this to Each area ( i.e., getFoundry(),
getProcess() ) but I can do that. I just don't know how to easily do
this reverse kind of look up? Can someone help me on this?

Not tested:


import pprint

data={
'130nm': {
'umc': ['1p6m_1.2-3.3_fsg_ms']
},
'180nm': {
'chartered': ['2p6m_1.8-3.3_sal_ms'],
'tsmc': ['1p6m_1.8-3.3_sal_log', '1p6m_1.8-3.3_sal_ms']
},
'250nm': {
'umc': ['2p6m_1.8-3.3_sal_ms'],
'tsmc':['1p6m_2.2-3.5_sal_log', '1p6m_1.8-3.3_sal_ms']
}
}


# ----------------------------------------------------------------------
class DataFinder(object):
def __init__(self, data):
self._data = data
self._nodes = data.keys()
self._foundries_nodes = {}
self._procs_nodes = {}
self._procs_foundries = {}

for node, foundries in data.items():
for foundry, procs in foundries.items():
self._foundries_nodes.setdefault(foundry,
[]).append(node)
for proc in procs:
self._procs_nodes.setdefault(proc, []).append(node)
self._procs_foundries.setdefault(proc,
[]).append(foundry)
self._foundries = self._foundries_nodes.keys()
self._procs = self._procs_foundries.keys()

# ------------------------------------------------------------------
# API
# ------------------------------------------------------------------
def get_nodes(foundry=None, process=None):
if foundry is None and process is None:
return self._nodes[:]
if process is None:
return self._get_nodes_for_foundry(foundry)
if foundry is None:
return self._get_nodes_for_proc(proc)
return self._get_nodes_for_foundry_and_proc(foundry, proc)

def get_foundries(node=None, process=None):
if node is None and process is None:
return self._foundries[:]
if process is None:
return self._get_foundries_for_node(node)
if node is None:
return self._get_foundries_for_proc(node)
return self._get_foundries_for_node_and_proc(node, proc)

# TODO : get_procs

#-------------------------------------------------------------------
# IMPL
# ------------------------------------------------------------------
def _get_foundries_for_node(self, node):
return self._foundries_nodes.get(node)

def _get_foundries_for_proc(self, proc):
return self._procs_foundries.get(proc)

def _get_foundries_for_node_and_proc(self, node, proc):
return list(set(self._get_foundries_for_node(node))
& set(self._get_foundries_for_proc(proc)))

# ------------------------------------------------------------------
def _get_nodes_for_foundry(self, foundry):
return self._foundries_nodes.get(foundry)

def _get_nodes_for_proc(self, proc):
return self._procs_nodes.get(proc)

def _get_nodes_for_foundry_and_proc(self, foundry, proc):
return list(set(self._get_nodes_for_foundry(foundry))
& set(self._get_nodes_for_proc(proc)))

# ------------------------------------------------------------------
# MISC
# ------------------------------------------------------------------
def _inspect(self):
ft = pprint.pformat
s = """
---
+ data :
%s
+ foundries_nodes :
%s
+ procs_nodes :
%s
+ procs_foundries :
%s
""" % (ft(self._data),
ft(self._foundries_nodes),
ft(self._procs_nodes),
ft(self._procs_foundries)
)
print s



HTH
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top