Searching for uniqueness in a list of data

rh0dium · Mar 1, 2006

Hi all,

I am having a bit of difficulty in figuring out an efficient way to
split up my data and identify the unique pieces of it.

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']

Now I want to split each item on "_" and compare its pieces with those
of every other item in the list. Wherever they differ, I want to build a
list of the possible choices and ask the user which one they want. I
have the questioning part under control; what I can't get my hands
around is the logic, since the list could be 2 items or 100 items long.
The point of this is to narrow a decision down for an end user: the end
user needs to select one of the list items, and by breaking the items
down into their pieces I hope to make that choice simpler.

list=['1p2m_3.3-1.8v_sal_ms','1p6m_3.3-1.8_sal_log']
would only need to ask about the first field, ['1p2m', '1p6m'], since
picking one of those already identifies the item.

list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
If the user selected 1p2m from the list ['1p2m','1p2m','1p3m'], the
next list would only be ['sal','pol']; but if the user selected 1p3m,
they would be done.

I hope this clarifies what I am trying to do. I just can't seem to get
my hands around this, so an explanation of the logic would be really
helpful. I picture a 2D list, but I can't quite get there.
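The "2D list" pictured above can be built directly: split every item on "_" so that each row is one item and each column is one field. A minimal Python 3 sketch, assuming every item has the same number of fields (the function name `varying_fields` is made up for illustration), transposes the rows with zip(*rows) and keeps only the columns that hold more than one distinct value:

```python
def varying_fields(items, sep="_"):
    """Return {field_index: sorted distinct values} for the fields that differ.

    Assumes every item splits into the same number of fields; otherwise
    zip() silently drops the extra trailing fields of longer items.
    """
    rows = [item.split(sep) for item in items]  # the "2D list": one row per item
    result = {}
    for index, column in enumerate(zip(*rows)):  # zip(*rows) yields one tuple per field
        values = sorted(set(column))
        if len(values) > 1:  # skip fields that every item shares
            result[index] = values
    return result


print(varying_fields(['1p2m_3.3-1.8v_sal_ms', '1p6m_3.3-1.8_sal_log']))
# → {0: ['1p2m', '1p6m'], 1: ['3.3-1.8', '3.3-1.8v'], 3: ['log', 'ms']}
```

This looks at all items at once, so for the second example it reports fields 0, 1 and 3 as varying even though only the first needs asking. To get the one-question behaviour described above, the rows have to be re-filtered after each answer and the columns recomputed.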

Claudio Grondi · Mar 1, 2006

<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
dictQlevel_1={}
dictQlevel_2={}
dictQlevel_3={}
for item in list:
    splitted = item.split('_')
    dictQlevel_1[splitted[0]] = True
    dictQlevel_2[splitted[1]] = True
    dictQlevel_3[splitted[2]] = True

print 'choose one of: '
for key_1 in dictQlevel_1.keys():
    print key_1
print
usrInput = raw_input()

if usrInput == '':
    print 'choose one of: '
    for key_1 in dictQlevel_1.keys():
        for key_2 in dictQlevel_2.keys():
            print key_1, key_2
    print
    usrInput = raw_input()
else:
    pass
    # or do something

# etc.
</code>

Hope it is what you are looking for.

Claudio
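The dictionaries above collect every distinct value per field, but the later lists are not narrowed by the earlier answers, and fields that are identical for all remaining items are still offered. A rough Python 3 sketch of the narrowing loop, assuming the items are unique and all split into the same number of fields; `narrow` and its `choose` callback are hypothetical names, with `choose` standing in for the raw_input() prompt so the logic can be exercised without a terminal:

```python
def narrow(items, choose, sep="_"):
    """Narrow `items` down to one item, asking only about fields that differ.

    `choose` receives the sorted distinct values of the current field and
    must return one of them (a wrapper around input() in a real program).
    Assumes the items are unique and all have the same number of fields.
    """
    candidates = [item.split(sep) for item in items]
    if not candidates:
        raise ValueError("nothing to choose from")
    pos = 0
    while len(candidates) > 1:
        values = sorted({fields[pos] for fields in candidates})
        if len(values) > 1:  # only ask when this field actually differs
            picked = choose(values)
            if picked not in values:
                raise ValueError("not one of the offered values: %r" % picked)
            # keep only the items that agree with the answer
            candidates = [f for f in candidates if f[pos] == picked]
        pos += 1
    return sep.join(candidates[0])
```

Interactively, something like narrow(items, lambda values: input("choose one of %s: " % values)) would prompt only at the fields that still differ: with the three-item list, picking 1p2m from ['1p2m', '1p3m'] leads straight to ['pol', 'sal'], while picking 1p3m finishes immediately.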

Alexander Schmolck · Mar 1, 2006

The easiest way to do this is to build a nested dictionary of prefixes: for
each prefix, use it as a key whose value is a nested dictionary built from the
rest of the split (or an empty dict once the split is exhausted). Indexing the
dict with the user's input then gives you all the possible next choices.

Spoiler Warning -- sample implementation follows below.

(mostly untested)

def addSplit(d, split):
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d

def queryUser(chosen, choices):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    return chosen+next, choices[next]

wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
choices = reduce(addSplit,(s.split('_') for s in wordList), {})
chosen = ""
while choices:
    chosen, choices = queryUser(chosen, choices)
print "You chose:", chosen

'as
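For anyone on Python 3, where reduce() has moved to functools and raw_input() is now input(), here is a rough equivalent of the nested-dict idea using dict.setdefault. `build_trie` and `walk` are hypothetical names; `walk` takes a list of answers instead of prompting, so it can be checked without a terminal. Like the version above, it does not yet merge levels that offer only one choice:

```python
def build_trie(words, sep="_"):
    """Nested-dict prefix tree: each field maps to a dict of the fields after it."""
    trie = {}
    for word in words:
        node = trie
        for field in word.split(sep):
            node = node.setdefault(field, {})  # reuse the branch if it already exists
    return trie


def walk(trie, answers, sep="_"):
    """Follow `answers` down the trie; return (chosen so far, remaining choices)."""
    chosen, node = [], trie
    for answer in answers:
        if not node:  # reached a leaf: nothing left to ask
            break
        node = node[answer]  # raises KeyError for an answer that is not offered
        chosen.append(answer)
    return sep.join(chosen), sorted(node)
```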

johnzenger · Mar 1, 2006

You can come quite close to what you want without splitting the string
at all. It sounds like you are asking the user to build up a string,
and you want to keep checking through your list to find any items that
begin with the string built up by the user. Try something like this:

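mylist = ['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
sofar = ""

def nextPiece(x, start):
    # text from start up to the next "_"; the last field has no
    # trailing "_", so fall back to the end of the string
    end = x.find("_", start + 1)
    if end == -1:
        end = len(x)
    return x[start:end]

loop = True
while loop:
    selections = [ nextPiece(x, len(sofar))
                   for x in mylist if x.startswith(sofar) ]
    loop = len(selections) > 1
    if loop:
        print selections
        sofar += raw_input("Pick one of those: ")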
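A variation on the same prefix idea, as a Python 3 sketch: matching on the chosen text plus the separator keeps 1p2m_3.3-1.8 from also matching items that continue with 1p2m_3.3-1.8v, de-duplicating with a set avoids offering ['1p2m', '1p2m'], and taking the next field with str.split(sep, 1)[0] also works for the last field, which has no trailing underscore. `next_choices` is a made-up name, and in this variant the user types just the field, without the leading "_":

```python
def next_choices(items, sofar, sep="_"):
    """Sorted distinct next fields of the items that begin with the fields in `sofar`."""
    # require a separator after the chosen text so '1p2m_3.3-1.8' does not
    # also match '1p2m_3.3-1.8v_...'
    prefix = sofar + sep if sofar else ""
    choices = set()
    for item in items:
        if item.startswith(prefix) and len(item) > len(prefix):
            rest = item[len(prefix):]
            choices.add(rest.split(sep, 1)[0])  # the field up to the next separator
    return sorted(choices)
```

A caller would keep calling next_choices() and append sep plus the typed field to `sofar` until at most one choice is left.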

Alexander Schmolck · Mar 1, 2006

Oops, I was reading this too hastily and forgot to compact the tree and take
care of the separator. You might also want to google 'trie', BTW.

(again, not really tested)

def addSplit(d, split):
    if len(split):
        if split[0] not in d:
            d[split[0]] = addSplit({}, split[1:])
        else:
            addSplit(d[split[0]], split[1:])
    return d

def joinKey(prefix, key, sep):
    # don't start with a separator when nothing has been chosen yet
    if prefix:
        return prefix + sep + key
    return key

def compactify(choices, parentKey='', sep=''):
    # merge chains of single-child nodes into one key
    if len(choices) == 1:
        key = choices.keys()[0]
        return compactify(choices[key], joinKey(parentKey, key, sep), sep)
    else:
        for key in choices.keys():
            newKey, newValue = compactify(choices[key], key, sep)
            if newKey != key: del choices[key]
            choices[newKey] = newValue
        return (parentKey, choices)

def queryUser(chosen, choices, sep=''):
    next = raw_input('So far: %s\nNow type one of %s: ' %
                     (chosen,choices.keys()))
    return joinKey(chosen, next, sep), choices[next]

wordList=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']
# start from the common prefix (if any) that compactify merged together
chosen, choices = compactify(reduce(addSplit,(s.split('_') for s in wordList), {}),
                             sep='_')

while choices:
    chosen, choices = queryUser(chosen, choices, '_')
print "You chose:", chosen
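For Python 3 readers: compactify() above deletes and adds keys while looping over choices.keys(), which is safe in Python 2 because keys() returns a list copy, but not in Python 3, where keys() is a live view of the dict. A sketch that builds a new dict instead (the names `build_trie` and `compact` are hypothetical):

```python
def build_trie(words, sep="_"):
    """Nested-dict prefix tree, one level per field."""
    trie = {}
    for word in words:
        node = trie
        for field in word.split(sep):
            node = node.setdefault(field, {})
    return trie


def compact(node, sep="_"):
    """Return a new trie in which chains of single-child nodes are merged."""
    result = {}
    for key, child in node.items():
        # keep extending the key while there is only one way to continue
        while len(child) == 1:
            next_key, next_child = next(iter(child.items()))
            key, child = key + sep + next_key, next_child
        result[key] = compact(child, sep)
    return result
```

Unlike compactify(), this leaves a single top-level key in place as a one-item choice rather than folding it into a prefix, so a caller may want to accept single choices automatically.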

Paul McGuire · Mar 1, 2006

Check out difflib.

>>> data=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>>> data[0].split("_")
['1p2m', '3.3-1.8v', 'sal', 'ms']
>>> data[1].split("_")
['1p2m', '3.3-1.8', 'sal', 'log']
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, data[0].split("_"), data[1].split("_"))
>>> s.get_matching_blocks()
[(0, 0, 1), (2, 2, 1), (4, 4, 0)]

I believe one interprets the tuples in matching_blocks as:
(seq1index,seq2index,numberOfMatchingItems)

In your case, the sequences have a matching element 0 and matching element
2, each of length 1. I don't fully grok the meaning of the (4,4,0) tuple,
unless this is intended to show that both sequences have the same length.
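For reference, the difflib documentation describes each triple (i, j, n) as meaning a[i:i+n] == b[j:j+n], and states that the last triple is a dummy with the value (len(a), len(b), 0), the only triple with n == 0. So (4, 4, 0) records the lengths of the two lists, which happen to be equal here. A quick Python 3 check on the same data (Python 3 returns Match named tuples, which still unpack as triples):

```python
from difflib import SequenceMatcher

a = '1p2m_3.3-1.8v_sal_ms'.split("_")
b = '1p2m_3.3-1.8_sal_log'.split("_")
blocks = SequenceMatcher(None, a, b).get_matching_blocks()

for i, j, n in blocks:
    # each triple says the runs a[i:i+n] and b[j:j+n] are identical
    assert a[i:i + n] == b[j:j + n]

# the final triple is the (len(a), len(b), 0) sentinel
assert tuple(blocks[-1]) == (len(a), len(b), 0)
```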

Perhaps from here, you could locate the gaps between the blocks returned by
SequenceMatcher.get_matching_blocks(), and prompt for the user's choice.

-- Paul
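Instead of walking the gaps by hand, SequenceMatcher.get_opcodes() already reports them as (tag, i1, i2, j1, j2) tuples. A minimal Python 3 sketch (`differing_fields` is a hypothetical helper); note that difflib aligns two sequences at a time, so for a longer list you would still need to compare items pairwise or fall back to comparing field by field:

```python
from difflib import SequenceMatcher


def differing_fields(x, y, sep="_"):
    """Return (fields of x, fields of y) for every stretch where x and y differ."""
    a, b = x.split(sep), y.split(sep)
    gaps = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            gaps.append((a[i1:i2], b[j1:j2]))
    return gaps
```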

Bruno Desthuilliers · Mar 2, 2006

Claudio Grondi wrote:
(snip)

<code>
list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8v_pol_ms','1p3m_3.3-18.v_sal_ms']

Avoid using 'list' as an identifier.

(snip)
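The reason, in a small Python 3 sketch: once a variable named list exists in a scope, the built-in list type can no longer be called by that name there, so a later list(...) fails with a TypeError. Both function names are made up for illustration:

```python
def without_shadowing():
    items = ['1p2m_3.3-1.8v_sal_ms']
    return list(items[0].split("_")[0])  # the built-in list() is still available


def with_shadowing():
    list = ['1p2m_3.3-1.8v_sal_ms']  # this local name hides the built-in list type
    try:
        return list(list[0].split("_")[0])
    except TypeError as exc:  # a list object cannot be called
        return "TypeError: %s" % exc
```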
