Wrapping around a list

A

amjadcsu

Hello,
I am working on a problem (Bioinformatics domain) where all possible combinations of input string needs to be printed as sublist

For example:
Input string : "LEQN"
Output= "[L","E","Q","N"]["LE","EQ","QN","NL"] ["LEQ","EQN","QNE","NLE"]
["LEQN"]

The code i have written for this is as follows:

from itertools import chain, repeat, islice
from collections import deque

def sliding_window(iterable, n, fill=False, fillvalue=None):
it = iter(iterable)
if fill:
it = chain(it, repeat(fillvalue, n - 1))
w = deque(islice(it, n - 1))
for x in it:
w.append(x)
yield w
w.popleft()

input="LENQ"

lstinput= list(input)
lenlstinput=len(lstinput)
list1=[''.join(x) for x in sliding_window(lstinput, 2)]
list2= [''.join(x) for x in sliding_window(lstinput, 3)]
list3= [''.join(x) for x in sliding_window(lstinput, 4)]



The output i get as follows:
List 1 is ['LE', 'EN', 'NQ'] Should be ['LE','EN','NQ','QL']
List 2 is ['LEN', 'ENQ'] Should be ['LEN','ENQ','NQL','QLE']

So the question i am asking , how can i add wrapping around sublist in my sliding window function.

Thanks
 
C

Chris Angelico

So the question i am asking , how can i add wrapping around sublist in my sliding window function.

By the look of what you now have, the easiest way would probably be to
duplicate the list before you slide. So instead of working on "LEQN",
you work on "LEQNLEQN" - or possibly "LEQNLEQ", or other shortened
versions, if it's easier than pulling out the duplicates.

ChrisA
 
A

Amjad Syed

By the look of what you now have, the easiest way would probably be to

duplicate the list before you slide. So instead of working on "LEQN",

you work on "LEQNLEQN" - or possibly "LEQNLEQ", or other shortened

versions, if it's easier than pulling out the duplicates.



ChrisA


Thanks Chris for the reply. But i would like sliding function to be scalable, as input string can be of 100 letters.
 
C

Chris Angelico

Thanks Chris for the reply. But i would like sliding function to be scalable, as input string can be of 100 letters.

A hundred isn't much to work with, and your code will be fairly
simple. Give it a try with small strings, see how it goes; then try it
on your full-size - I doubt there'll be a problem.

Now, if you were talking about a hundred million letters, then maybe
there'd be a problem. But even there, I'd start with the simple and
clean approach, and only optimize for performance when it becomes
obviously necessary (like when my MUD client was able to saturate one
core of my CPU, just by me typing a lot of commands very rapidly -
that needed fixing!).

ChrisA
 
N

Ned Batchelder

A hundred isn't much to work with, and your code will be fairly
simple. Give it a try with small strings, see how it goes; then try it
on your full-size - I doubt there'll be a problem.

Now, if you were talking about a hundred million letters, then maybe
there'd be a problem. But even there, I'd start with the simple and
clean approach, and only optimize for performance when it becomes
obviously necessary (like when my MUD client was able to saturate one
core of my CPU, just by me typing a lot of commands very rapidly -
that needed fixing!).

ChrisA

Using ChrisA's idea:

def sliding_window(iterable, n, fill=None):
values = list(iterable)
num_values = len(values)
if fill is not None:
values.extend([fill]*(n-1))
need_more = (2*num_values-1) - len(values)
values.extend(values[:need_more])
for start in range(num_values):
yield values[start:start+n]

l = list("LEQNABC")
for n in range(2, len(l)+1):
print [''.join(x) for x in sliding_window(l, n)]
for n in range(2, len(l)+1):
print [''.join(x) for x in sliding_window(l, n, fill="_")]

Produces:

['LE', 'EQ', 'QN', 'NA', 'AB', 'BC', 'CL']
['LEQ', 'EQN', 'QNA', 'NAB', 'ABC', 'BCL', 'CLE']
['LEQN', 'EQNA', 'QNAB', 'NABC', 'ABCL', 'BCLE', 'CLEQ']
['LEQNA', 'EQNAB', 'QNABC', 'NABCL', 'ABCLE', 'BCLEQ', 'CLEQN']
['LEQNAB', 'EQNABC', 'QNABCL', 'NABCLE', 'ABCLEQ', 'BCLEQN', 'CLEQNA']
['LEQNABC', 'EQNABCL', 'QNABCLE', 'NABCLEQ', 'ABCLEQN', 'BCLEQNA',
'CLEQNAB']
['LE', 'EQ', 'QN', 'NA', 'AB', 'BC', 'C_']
['LEQ', 'EQN', 'QNA', 'NAB', 'ABC', 'BC_', 'C__']
['LEQN', 'EQNA', 'QNAB', 'NABC', 'ABC_', 'BC__', 'C___']
['LEQNA', 'EQNAB', 'QNABC', 'NABC_', 'ABC__', 'BC___', 'C____']
['LEQNAB', 'EQNABC', 'QNABC_', 'NABC__', 'ABC___', 'BC____', 'C_____']
['LEQNABC', 'EQNABC_', 'QNABC__', 'NABC___', 'ABC____', 'BC_____',
'C______']

100 elements is really nothing, don't try to over-optimize it. And if
your inputs are really strings, not general iterables, you can use the
same logic but with string operations, and you'll likely have better
performance anyway. Less general, true, but better for your actual problem.

--Ned.
 
N

Neil Cerutti

I am working on a problem (Bioinformatics domain) where all
possible combinations of input string needs to be printed as
sublist

For example:
Input string : "LEQN"
Output= "[L","E","Q","N"]
["LE","EQ","QN","NL"]["LEQ","EQN","QNE","NLE"]["LEQN"]

How about itertools.combinations?

import itertools

s = "LEQN"
for i in range(len(s)):
for comb in itertools.combinations(s, i+1):
print(comb)
Output:
('L',)
('E',)
('Q',)
('N',)
('L', 'E')
('L', 'Q')
('L', 'N')
('E', 'Q')
('E', 'N')
('Q', 'N')
('L', 'E', 'Q')
('L', 'E', 'N')
('L', 'Q', 'N')
('E', 'Q', 'N')
('L', 'E', 'Q', 'N')

For some reason I've got more 2-character combinations than you,
though.
 
R

rusi

Hello,

I am working on a problem (Bioinformatics domain) where all possible combinations of input string needs to be printed as sublist

If we take the standard combinations (Pascal triangle) result
nCr + nCr-1 = n+1Cr
and make it into a recursive python function we get:

def c(n,r):
return (1 if r == 0 else
0 if n == 0 else
c(n-1,r) + c(n-1,r-1))

(yeah the base cases are tricky)

Now instead of enumerating the *number* of combinations if we want the
combinations *themselves* we can do almost isomorphically:
[Writing d analogous to c]

def d(l,r):
if r == 0: return [[]]
if not l: return []
x = l[0]
xs = l[1:]
return d(xs,r) + [[x]+c for c in d(xs,(r-1))]

Note the difference in types:
c(4,2) 6
d("LEQN", 2) [['Q', 'N'], ['E', 'N'], ['E', 'Q'], ['L', 'N'], ['L', 'Q'], ['L', 'E']]

Now we can generator-ize it like so:

def e(l,r):
if r == 0:
yield []
elif l:
x = l[0]
xs = l[1:]
for y in chain(e(xs,r),([x]+c for c in e(xs,(r-1)))):
yield y
# python 3: yield from chain(e(xs,r),([x]+c for c in e(xs,(r-1))))


called as
list(e("LEQN", 2)) [['Q', 'N'], ['E', 'N'], ['E', 'Q'], ['L', 'N'], ['L', 'Q'], ['L', 'E']]

BTW: This is neater in Haskell than in Python.

c n 0 = 1
c 0 r = 0
c n r = c (n-1) r + c (n-1) (r-1)


d l 0 = [[]]
d [] r = []
d (x:xs) r = d xs r ++ [x:c | c <- d xs (r-1)]

In particular the 'd' has list elegance of the python 'd' and generator
efficiency of python 'e'
 
P

Peter Pearson

I am working on a problem (Bioinformatics domain) where all
possible combinations of input string needs to be printed as
sublist

For example:
Input string : "LEQN"
Output= "[L","E","Q","N"]
["LE","EQ","QN","NL"]["LEQ","EQN","QNE","NLE"]["LEQN"]

How about itertools.combinations?

import itertools

s = "LEQN"
for i in range(len(s)):
for comb in itertools.combinations(s, i+1):
print(comb)
Output:
('L',)
('E',)
('Q',)
('N',)
('L', 'E')
('L', 'Q')
('L', 'N')
('E', 'Q')
('E', 'N')
('Q', 'N')
('L', 'E', 'Q')
('L', 'E', 'N')
('L', 'Q', 'N')
('E', 'Q', 'N')
('L', 'E', 'Q', 'N')

For some reason I've got more 2-character combinations than you,
though.

The original poster's combinations comprise letters that are *contiguous*,
in a circular sense. Yours include non-contiguous sets, like LQ.
 
A

Amjad Syed

A hundred isn't much to work with, and your code will be fairly

simple. Give it a try with small strings, see how it goes; then try it

on your full-size - I doubt there'll be a problem.



Now, if you were talking about a hundred million letters, then maybe

there'd be a problem. But even there, I'd start with the simple and

clean approach, and only optimize for performance when it becomes

obviously necessary (like when my MUD client was able to saturate one

core of my CPU, just by me typing a lot of commands very rapidly -

that needed fixing!).



ChrisA

Thanks Chris, the duplication did solve the problem.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,016
Latest member
TatianaCha

Latest Threads

Top