find sublist inside list

Matthias Gallé · May 4, 2009

Hi.

My problem is to replace all occurrences of a sublist with a new element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

If I do this with a string ('acaccgac') I have the advantage of all the
'find' functions, but performance is bad and some extra care must be
taken if one element consists of more than one character (the case of 11,
for example).

So I would really like to work with lists directly, but I could
not find anything to search for a sublist inside a list.
Any suggestions for a simple solution?

Thanks in advance,

--
Matthias Gallé
Project Symbiose
Centre de Recherche INRIA Rennes - Bretagne Atlantique,
Campus de Beaulieu, 35042 Rennes cedex, France
tel: (33|0) 2 9984 7523
http://www.irisa.fr/symbiose/matthias_galle

bearophileHUGS · May 4, 2009

Matthias Gallé:

My problem is to replace all occurrences of a sublist with a new element.
Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

There are several ways to solve this problem. Representing a string as
a list of "chars" (1-strings) is fine if you work with small strings,
but once they get longer you need too much memory and time.

This is a starting point:

mapper = {"a":6, "c":6}
data = 'acaccgac'
[mapper.get(c, c) for c in data]

[6, 6, 6, 6, 6, 'g', 6, 6]

If you need higher performance, and only have to represent a few small
numbers inside a well-defined string such as genomic data, you can use the
characters with ASCII values 6 and 11:

from string import maketrans  # Python 2
tab = maketrans("acgt", "".join([chr(6), chr(6), "gt"]))
s = 'acaccgac'
s.translate(tab)

'\x06\x06\x06\x06\x06g\x06\x06'

Later in processing it's easy to tell apart normal genome bases from
those small numbers.

Note that there is the array.array type in the standard library too,
and elsewhere there is NumPy.

There are several other possible solutions, but I stop here so you can
explain your purposes and constraints better.

Bye,
bearophile

John O'Hagan · May 4, 2009

Hi.

My problem is to replace all occurrences of a sublist with a new element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

HTH,

John

bearophileHUGS · May 4, 2009

John O'Hagan:

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

Oh well, I made a mistake, it seems.
Another solution then:

'acaccgac'.replace("ac", chr(6))

'\x06\x06cg\x06'

Bye,
bearophile

Matthias Gallé · May 4, 2009

John O'Hagan:

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

Oh well, I made a mistake, it seems.
Another solution then:

'acaccgac'.replace("ac", chr(6))

'\x06\x06cg\x06'

Bye,
bearophile

Thanks bearophile and John for your quick answers.
Unfortunately, the int that can replace a sublist can be > 255, but
John's answer looks simple and good enough for me. I will use it as a
starting point.

Thanks again.


bearophileHUGS · May 4, 2009

Matthias Gallé:

the int that can replace a sublist can be > 255

You didn't specify your integer ranges.
There are probably many other solutions to your problem, but you have
to give more information: the typical array size, the typical range
of the numbers, how important total memory use is, how important
running speed is, what kind of processing (or serialization/output)
you later have to do with such arrays, and so on.
Other solutions include using an array('H', []), with 0-255
representing ASCII characters and numbers > 255 and < 2^16
representing the other numbers, etc.
If speed is critical you can even think about writing a small
function with PyInline or D+Pyd, etc.

Bye,
bearophile
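
A rough sketch of the array('H', ...) idea above, written for Python 3. The helper names `encode` and `replace_pair` and the code 300 are made up for illustration; the sketch assumes every element is a single character with a code point of 255 or less and every replacement code lies between 256 and 65535.

```python
from array import array

def encode(chars):
    """Store each single-character element as its code point (assumed <= 255)."""
    return array('H', (ord(c) for c in chars))

def replace_pair(codes, first, second, new_code):
    """Return a new array with each non-overlapping (first, second) pair
    replaced by new_code (assumed to be in the range 256..65535)."""
    out = array('H')
    i = 0
    while i < len(codes):
        if i + 1 < len(codes) and codes[i] == first and codes[i + 1] == second:
            out.append(new_code)
            i += 2
        else:
            out.append(codes[i])
            i += 1
    return out

codes = encode(['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c'])
result = replace_pair(codes, ord('a'), ord('c'), 300)
```

Values below 256 can be turned back into characters with chr(); anything larger is one of the replacement numbers.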

MRAB · May 4, 2009

Matthias said:
John O'Hagan:

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

Oh well, I made a mistake, it seems.
Another solution then:

'acaccgac'.replace("ac", chr(6))

'\x06\x06cg\x06'

Bye,
bearophile

Thanks bearophile and John for your quick answers.
Unfortunately, the int that can replace a sublist can be > 255, but
John's answer looks simple and good enough for me. I will use it as a
starting point.

John's solution changes the length of the list over which it's
iterating.

I'd suggest something more like:

li = ['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
pos = 0
try:
    while True:
        pos = li.index('a', pos)
        if li[pos : pos + 2] == ['a', 'c']:
            li[pos : pos + 2] = [6]
        pos += 1
except ValueError:
    pass
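
The same index-based loop could be wrapped in a function that accepts a pattern of any length. This is only a sketch for Python 3; the name `replace_sublist_inplace` is hypothetical, and it assumes the input is a real list and the pattern is non-empty.

```python
def replace_sublist_inplace(items, pattern, replacement):
    """Replace each non-overlapping occurrence of pattern in the list items
    with the single element replacement, modifying items in place."""
    pattern = list(pattern)
    if not pattern:
        raise ValueError("pattern must not be empty")
    n = len(pattern)
    pos = 0
    while True:
        try:
            # Jump straight to the next candidate start position.
            pos = items.index(pattern[0], pos)
        except ValueError:
            break  # no more candidates
        if items[pos:pos + n] == pattern:
            items[pos:pos + n] = [replacement]
        pos += 1  # move past the replacement (or the failed candidate)
    return items

li = ['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
replace_sublist_inplace(li, ['a', 'c'], 6)
```

Because the position is advanced explicitly after every slice assignment, the loop never depends on a length computed before the list shrank.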

Aahz · May 4, 2009

My problem is to replace all occurrences of a sublist with a new element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

What's your goal? After you do this once, you cannot repeat the
operation with a different sublist because you are not tracking the
source of the numbers. You might look into standard compression
algorithms for information about how to accomplish this.
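
One way to read this point is to keep a table of which number stands for which sublist, so that later replacements can refer to earlier ones and the original can be rebuilt. The sketch below (Python 3) is an illustration only; the names `replace_pair`, `expand` and `rules`, and the numbers 6 and 7, are assumptions rather than anything from the thread.

```python
def replace_pair(seq, pair, symbol):
    """Return a new list with each non-overlapping occurrence of pair replaced by symbol."""
    out, i = [], 0
    while i < len(seq):
        if seq[i:i + 2] == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def expand(seq, rules):
    """Undo the replacements by recursively expanding every symbol found in rules."""
    out = []
    for item in seq:
        if item in rules:
            out.extend(expand(rules[item], rules))
        else:
            out.append(item)
    return out

data = ['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
rules = {}                                # symbol -> the sublist it replaced

rules[6] = ['a', 'c']
step1 = replace_pair(data, rules[6], 6)   # first replacement

rules[7] = [6, 6]                         # a later rule may refer to an earlier symbol
step2 = replace_pair(step1, rules[7], 7)

restored = expand(step2, rules)           # rebuild the original list
```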

John O'Hagan · May 4, 2009

Hi.

My problem is to replace all occurrences of a sublist with a new
element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

HTH,

John

Beware that you are mutating the list you are iterating over. That could
lead to some strange bugs (for instance, if you replaced the deleted items
with a longer sequence, the range(len(li)) would still go up to the
original length).
It is better to build a new list instead, e.g. by appending to it.

[...]

Quite right. While it happens to work in this particular example, as you and
MRAB point out, it's generally dangerous (and in fact this one silently and
uselessly iterates over the last couple of indexes, which no longer exist). A
new list could be created like this:

index=0
newli=[]
while index<len(li):
    if li[index:index+2]==['a', 'c']:
        newli.append(6)
        index += 2
    else:
        newli.append(li[index])
        index += 1

Regards,

John
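
The new-list loop above might be generalised along these lines (a Python 3 sketch; the function name `replace_sublist` is made up here, and the input is assumed to be a list so that slices compare equal to the pattern):

```python
def replace_sublist(items, pattern, replacement):
    """Return a new list in which each non-overlapping occurrence of pattern
    is replaced by the single element replacement; items is left untouched."""
    pattern = list(pattern)
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must not be empty")
    result = []
    index = 0
    while index < len(items):
        if items[index:index + n] == pattern:
            result.append(replacement)
            index += n          # skip the whole matched sublist
        else:
            result.append(items[index])
            index += 1
    return result

li = ['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
newli = replace_sublist(li, ['a', 'c'], 257)
```

Nothing here restricts the replacement to small integers, so values above 255 work the same way.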

mzdude · May 4, 2009


John O'Hagan:

li=['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c']
for i in range(len(li)):
    if li[i:i + 2] == ['a', 'c']:
        li[i:i + 2] = ['6']

Oh well, I made a mistake, it seems.
Another solution then:

'acaccgac'.replace("ac", chr(6))

'\x06\x06cg\x06'

Bye,
bearophile

Thanks bearophile and John for your quick answers.
Unfortunately, the int that can replace a sublist can be > 255, but
John's answer looks simple and good enough for me. I will use it as a
starting point.

substring isn't limited to 0..255
'\x00x257\x00x257\x00x257\x00x257cg\x00x257\x00x257'

Gabriel Genellina · May 4, 2009

mzdude said:
substring isn't limited to 0..255
'\x00x257\x00x257\x00x257\x00x257cg\x00x257\x00x257'

This isn't what you think it is. Look carefully:

py> substring = "\0x%d\0x%d" % (257,257)
py> len(substring)
10
py> list(substring)
['\x00', 'x', '2', '5', '7', '\x00', 'x', '2', '5', '7']
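
For comparison, a small Python 3 sketch of the difference between formatting the digits into a string and asking for the character with that code point (the variable names `formatted` and `single` are just for illustration):

```python
# "\0x%d" is a NUL character, a literal 'x', and then the decimal digits.
formatted = "\0x%d" % 257
# chr(257) is one character whose code point is 257.
single = chr(257)

print(len(formatted), list(formatted))
print(len(single), ord(single))
```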

Terry Reedy · May 4, 2009

Matthias said:
Hi.

My problem is to replace all occurrences of a sublist with a new element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

If I do this with a string ('acaccgac') I have the advantage of all the
'find' functions, but performance is bad and some extra care must be
taken if one element consists of more than one character (the case of 11,
for example).

So I would really like to work with lists directly, but I could
not find anything to search for a sublist inside a list.
Any suggestions for a simple solution?

For a mutable homogeneous array, consider the array module.
Any algorithm that applies to a sequence of chars can be adjusted to
other sequences. For the above case, remember that you can easily
filter None out of a sequence, i.e., replace 'a','c' with 6, None and
then filter when done.
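
A minimal sketch of the placeholder idea in Python 3, assuming None never occurs as a real element. The function name `replace_with_placeholder` is hypothetical.

```python
def replace_with_placeholder(items, pattern, replacement):
    """Overwrite each match with replacement followed by None padding,
    so the length never changes during the scan, then drop the None values."""
    pattern = list(pattern)
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must not be empty")
    work = list(items)  # work on a copy
    i = 0
    while i + n <= len(work):
        if work[i:i + n] == pattern:
            # Same-length slice assignment: positions of later elements stay valid.
            work[i:i + n] = [replacement] + [None] * (n - 1)
            i += n
        else:
            i += 1
    return [x for x in work if x is not None]

result = replace_with_placeholder(['a', 'c', 'a', 'c', 'c', 'g', 'a', 'c'], ['a', 'c'], 6)
```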

Gerard Flanagan · May 5, 2009

Matthias said:
Hi.

My problem is to replace all occurrences of a sublist with a new element.

Example:
Given ['a','c','a','c','c','g','a','c'] I want to replace all
occurrences of ['a','c'] by 6 (result [6,6,'c','g',6]).

For novelty value:

from itertools import izip

def replace2(data, pattern):
    assert len(pattern) == 2
    pattern = tuple(pattern)
    icopy = iter(data)
    icopy.next()
    gen = izip(data, icopy)
    while True:
        item = gen.next()
        if item == pattern:
            yield '6'
            gen.next()
        else:
            yield item[0]

# works if list ends with ['a', 'c']
data = ['g', 'a', 'c', 'a', 'c', 'a', 'a', 'a', 'g', 'a', 'c']
want = 'g66aaag6'
assert ''.join(replace2(data, ['a', 'c'])) == want

# otherwise you lose the last element of the tail
data = ['g', 'a', 'c', 'a', 'c', 'a', 'a', 'a', 'g', 'a', 'c', 'c', 'g']
want = 'g66aaag6cg'
get = 'g66aaag6c'
assert not ''.join(replace2(data, ['a', 'c'])) == want
assert ''.join(replace2(data, ['a', 'c'])) == get

# fix by adding the pattern to the end of the data as a sentinel

def replace2(data, pattern):
    assert len(pattern) == 2
    def _replace2(data, pattern):
        pattern = tuple(pattern)
        icopy = iter(data)
        icopy.next()
        gen = izip(data, icopy)
        while True:
            item = gen.next()
            if item == pattern:
                yield '6'
                gen.next()
            else:
                yield item[0]
    data = data + pattern
    return list(_replace2(data, pattern))[:-1]

data = ['g', 'a', 'c', 'a', 'c', 'a', 'a', 'a', 'g', 'a', 'c']
want = 'g66aaag6'
assert ''.join(replace2(data, ['a', 'c'])) == want

data = ['g', 'a', 'c', 'a', 'c', 'a', 'a', 'a', 'g', 'a', 'c', 'c', 'g']
want = 'g66aaag6cg'
assert ''.join(replace2(data, ['a', 'c'])) == want

print 'done'
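
The code above is Python 2 (izip, .next(), the print statement) and relies on StopIteration from gen.next() silently ending the generator; since Python 3.7 that raises RuntimeError instead. A possible Python 3 rewrite of the same pairwise idea, using one item of lookahead in place of the appended sentinel, might look like this. It is a sketch, not a port of the author's exact approach, and the `replacement` parameter and the `_END` marker are additions made here.

```python
_END = object()  # private marker meaning "the iterator is exhausted"

def replace2(data, pattern, replacement='6'):
    """Yield the items of data, replacing each adjacent pair equal to pattern."""
    assert len(pattern) == 2
    first, second = pattern
    it = iter(data)
    current = next(it, _END)
    while current is not _END:
        following = next(it, _END)
        if following is not _END and current == first and following == second:
            yield replacement
            current = next(it, _END)   # skip both items of the matched pair
        else:
            yield current
            current = following        # the lookahead becomes the current item

data = ['g', 'a', 'c', 'a', 'c', 'a', 'a', 'a', 'g', 'a', 'c', 'c', 'g']
joined = ''.join(replace2(data, ['a', 'c']))
```

With the lookahead, the tail of the list is no longer lost, so no sentinel has to be appended and trimmed.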

mzdude · May 5, 2009

Gabriel Genellina said:
mzdude said:

substring isn't limited to 0..255
'\x00x257\x00x257\x00x257\x00x257cg\x00x257\x00x257'

This isn't what you think it is. Look carefully:

py> substring = "\0x%d\0x%d" % (257,257)
py> len(substring)
10
py> list(substring)
['\x00', 'x', '2', '5', '7', '\x00', 'x', '2', '5', '7']

OOPS. My bad. But I'm not going to give up.

l = ['a','b','c','a','c']
us = unicode("".join(l))
substr = unichr(257) + unichr(257)
us = us.replace(u'ac',substr)
print len(us)
print list(us)

output is
5
[u'a', u'b', u'c', u'\u0101', u'\u0101']
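
In Python 3 every str is already Unicode, so the same trick could be written without unicode() or unichr(), and the marker character can be turned back into an integer afterwards. This sketch assumes each element is a single character and that the marker character never occurs in the data; the names `marker` and `result` are illustrative. Unlike the code above, it inserts one marker per match rather than two.

```python
elements = ['a', 'b', 'c', 'a', 'c']

marker = chr(257)  # one code point standing in for the number 257
text = "".join(elements).replace("ac", marker)

# Convert back to a list, turning the marker character into the integer it represents.
result = [257 if ch == marker else ch for ch in text]
```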
