Searching for uniqueness in a list of data

rh0dium

Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']

Now I want to split each item on the "_" and compare it with all the
others in the list. If there is a difference, I want to create a list of
the possible choices and ask the user which one they want. I have the
questioning part under control, but I can't seem to get my hands around
the logic; the list could be 2 items or 100 long. The point of this is
that I am trying to narrow a decision down for an end user. In other
words, the end user needs to select one of the list items, and by
breaking it down for them I hope to simplify this.

list=['1p2m_3.3-1.8v_sal_ms','1p6m_3.3-1.8_sal_log']
would only question the first data set ['1p2m', '1p6m' ]

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
If, from the list ['1p2m','1p2m','1p3m'], the user selected 1p2m, then
the next list would only be ['sal','pol'], but if the user initially
selected 1p3m they would be done.

I hope this clarifies what I am trying to do. I just can't seem to get
my hands around this, so an explanation of the logic would really be
helpful. I picture a 2D list but I can't seem to get it.
 
Claudio Grondi

rh0dium said:
Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

(snip)

<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
dictQlevel_1={}
dictQlevel_2={}
dictQlevel_3={}
for item in list:
    splitted = item.split('_')
    dictQlevel_1[splitted[0]] = True
    dictQlevel_2[splitted[1]] = True
    dictQlevel_3[splitted[2]] = True

print 'choose one of: '
for key_1 in dictQlevel_1.keys():
    print key_1
print
usrInput = raw_input()

if usrInput == '':
    print 'choose one of: '
    for key_1 in dictQlevel_1.keys():
        for key_2 in dictQlevel_2.keys():
            print key_1, key_2
    print
    usrInput = raw_input()
else:
    pass
    # or do something

# etc.
</code>

Hope it is what you are looking for.

Claudio
 
Alexander Schmolck

rh0dium said:
Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

(snip)

The easiest way to do this is to have a nested dictionary of prefixes: for
each prefix as key, add a nested dictionary of the rest of the split as value,
or an empty dict if the rest of the split is empty. Accessing the dict with the
user's input will give you all the possible next choices.

Spoiler Warning -- sample implementation follows below.
































(mostly untested)

def addSplit(d, split):
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d

def queryUser(chosen, choices):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    return chosen+next, choices[next]

wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
choices = reduce(addSplit,(s.split('_') for s in wordList), {})
chosen = ""
while choices:
    chosen, choices = queryUser(chosen, choices)
print "You chose:", chosen

'as
 
johnzenger

You can come quite close to what you want without splitting the string
at all. It sounds like you are asking the user to build up a string,
and you want to keep checking through your list to find any items that
begin with the string built up by the user. Try something like this:

mylist = ['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
sofar = ""

loop = True
while loop:
    selections = [ x[len(sofar):x.index("_", len(sofar) + 1)]
                   for x in mylist if x.startswith(sofar) ]
    loop = len(selections) > 1
    if loop:
        print selections
        sofar += raw_input("Pick one of those: ")
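
For readers on Python 3, the same narrowing idea can be sketched as a function instead of an input loop. This is only an illustration, not johnzenger's code: the helper name `next_choices` is made up, it assumes "_" is the field separator, and it compares whole fields rather than raw string prefixes so that '3.3-1.8' is not mistaken for the start of '3.3-1.8v'.

```python
def next_choices(items, chosen, sep="_"):
    """Return the distinct values of the next field among items matching `chosen`.

    `chosen` is the list of fields picked so far (hypothetical helper, not
    from the thread). Order of first appearance is preserved.
    """
    depth = len(chosen)
    seen = []
    for item in items:
        fields = item.split(sep)
        # Keep only items whose leading fields equal the picks so far
        # and that still have a field left to choose.
        if fields[:depth] == chosen and len(fields) > depth:
            if fields[depth] not in seen:
                seen.append(fields[depth])
    return seen


mylist = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8v_pol_ms', '1p3m_3.3-18.v_sal_ms']
print(next_choices(mylist, []))                     # ['1p2m', '1p3m']
print(next_choices(mylist, ['1p2m']))               # ['3.3-1.8v']
print(next_choices(mylist, ['1p2m', '3.3-1.8v']))   # ['sal', 'pol']
```

A level that returns a single value needs no question, so a caller could append it to `chosen` automatically and only prompt when more than one value comes back.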
 
Alexander Schmolck

Alexander Schmolck said:
The easiest way to do this is to have a nested dictionary of prefixes: for
each prefix as key, add a nested dictionary of the rest of the split as value,
or an empty dict if the rest of the split is empty. Accessing the dict with the
user's input will give you all the possible next choices.

Oops I was reading this too hastily -- forgot to compact and take care of sep.
You might also want to google 'trie', BTW.


(again, not really tested)


def addSplit(d, split):
    # build the nested dict: one level per field of the split
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d

def compactify(choices, parentKey='', sep=''):
    # merge levels that offer only one choice into the parent key
    if len(choices) == 1:
        return compactify(choices.values()[0],
                          parentKey+sep+choices.keys()[0], sep)
    else:
        for key in choices.keys():
            newKey, newValue = compactify(choices[key], key, sep)
            if newKey != key: del choices[key]
            choices[newKey] = newValue
        return (parentKey, choices)

def queryUser(chosen, choices, sep=''):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    # no leading separator before the first choice
    prefix = chosen and chosen+sep
    return prefix+next, choices[next]

wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
choices = compactify(reduce(addSplit,(s.split('_') for s in wordList), {}),
                     sep='_')[1]
chosen = ""

while choices:
    chosen, choices = queryUser(chosen, choices, '_')
print "You chose:", chosen
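
The nested-dictionary (trie) idea with compaction can also be sketched in Python 3 without the interactive part, which makes the resulting structure easy to inspect. This is a rough sketch under assumptions, not a port verified against the code above: the function names `build_tree` and `compact` are made up for illustration, and "_" is assumed to be the separator.

```python
def build_tree(words, sep="_"):
    """Nested dict (a trie keyed by field) built from the split words."""
    tree = {}
    for word in words:
        node = tree
        for field in word.split(sep):
            # Reuse the existing branch or start a new empty one.
            node = node.setdefault(field, {})
    return tree


def compact(tree, sep="_"):
    """Merge every chain of single-choice levels into one joined key."""
    result = {}
    for key, child in tree.items():
        # Follow the chain while there is exactly one way to continue.
        while len(child) == 1:
            (only_key, grandchild), = child.items()
            key = key + sep + only_key
            child = grandchild
        result[key] = compact(child, sep)
    return result


word_list = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8v_pol_ms', '1p3m_3.3-18.v_sal_ms']
choices = compact(build_tree(word_list))
print(choices)
# {'1p2m_3.3-1.8v': {'sal_ms': {}, 'pol_ms': {}}, '1p3m_3.3-18.v_sal_ms': {}}
```

The keys of the current dictionary are the options to offer, and an empty dictionary means the selection is complete. Picking '1p3m_3.3-18.v_sal_ms' ends immediately, while '1p2m_3.3-1.8v' leads to one more question.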
 
Paul McGuire

rh0dium said:
Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']

Now I want to split each item up on the "_" and compare it with all
others on the list, if there is a difference I want to create a list of
the possible choices, and ask the user which choice of the list they
want.
<snip>

Check out difflib.
>>> data=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>>> data[0].split("_")
['1p2m', '3.3-1.8v', 'sal', 'ms']
>>> data[1].split("_")
['1p2m', '3.3-1.8', 'sal', 'log']
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, data[0].split("_"), data[1].split("_"))
>>> s.get_matching_blocks()
[(0, 0, 1), (2, 2, 1), (4, 4, 0)]

I believe one interprets the tuples in matching_blocks as:
(seq1index,seq2index,numberOfMatchingItems)

In your case, the sequences have a matching element 0 and matching element
2, each of length 1. I don't fully grok the meaning of the (4,4,0) tuple,
unless this is intended to show that both sequences have the same length.

Perhaps from here, you could locate the gaps in the
SequenceMatcher.matching_blocks property, and prompt for the user's choice.
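
One way to locate those gaps is `SequenceMatcher.get_opcodes()`, which reports the non-matching runs directly instead of leaving them to be inferred from the matching blocks. The Python 3 sketch below is only an illustration: the helper name `differing_fields` is made up, and it compares just two items at a time, so a longer list would need pairwise comparison or a different approach. On the `(4, 4, 0)` tuple: the difflib documentation describes the last triple as a dummy whose value is `(len(a), len(b), 0)`.

```python
from difflib import SequenceMatcher


def differing_fields(first, second, sep="_"):
    """Return pairs of field runs that differ between two items.

    Hypothetical helper for illustration; each pair holds the fields
    from `first` and the fields from `second` that do not match.
    """
    a = first.split(sep)
    b = second.split(sep)
    matcher = SequenceMatcher(None, a, b)
    gaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' runs are the matching blocks; everything else is a gap.
        if tag != "equal":
            gaps.append((a[i1:i2], b[j1:j2]))
    return gaps


data = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8_sal_log']
print(differing_fields(data[0], data[1]))
# [(['3.3-1.8v'], ['3.3-1.8']), (['ms'], ['log'])]
```

Each pair could then be shown to the user as the set of alternatives for that position.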

-- Paul
 
Bruno Desthuilliers

Claudio Grondi wrote:
(snip)
<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']

Avoid using 'list' as an identifier.
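
The reason is that assigning to `list` rebinds the name and hides the built-in type in that scope. A small Python 3 illustration, with made-up function names and sample data:

```python
def pick_shadowed():
    list = ['a_b', 'c_d']          # hides the built-in inside this function
    try:
        return list('abc')         # fails: the name now refers to our data
    except TypeError as exc:
        return str(exc)


def pick_safe():
    items = ['a_b', 'c_d']         # a different name leaves the built-in usable
    return list(item.split('_')[0] for item in items)


print(pick_shadowed())             # 'list' object is not callable
print(pick_safe())                 # ['a', 'c']
```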

(snip)
 
