Extracting subsequences composed of the same character

C

candide

Suppose you have a string, for instance

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

It's not difficult to write a Python code that solves the problem, for
instance :

def f(text):
ch=text
r=[]
if not text:
return r
else:
x=ch[0]
i=0
for c in ch:
if c!=x:
if i>1:
r+=[x*i]
x=c
i=1
else:
i+=1
return r+(i>1)*[i*x]

print f("pyyythhooonnn ---> ++++")


I should confess that this code is rather cumbersome so I was looking
for an alternative. I imagine that a regular expressions approach could
provide a better method. Does a such code exist ? Note that the string
is not restricted to the ascii charset.
 
M

MRAB

Suppose you have a string, for instance

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

It's not difficult to write a Python code that solves the problem, for
instance :
[snip]

I should confess that this code is rather cumbersome so I was looking
for an alternative. I imagine that a regular expressions approach could
provide a better method. Does a such code exist ? Note that the string
is not restricted to the ascii charset.
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---', '-'),
('++++', '+')]
[m[0] for m in re.findall(r"((.)\2+)", s)]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
 
R

Roy Smith

candide said:
Suppose you have a string, for instance

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

I got the following. It's O(n) (with the minor exception that the string
addition isn't, but that's trivial to fix, and in practice, the bunches
are short enough it hardly matters).

#!/usr/bin/env python

s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

last = None
bunches = []
bunch = ''
for c in s:
if c == last:
bunch += c
else:
if bunch:
bunches.append(bunch)
bunch = c
last = c
bunches.append(bunch)

multiples = [bunch for bunch in bunches if len(bunch) > 1]
print multiples
assert(multiples == answer)


[eagerly awaiting a PEP for collections.bunch and
collections.frozenbunch]
 
T

Tim Chase

Suppose you have a string, for instance

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
import re
s = "pyyythhooonnn ---> ++++"
[m.group(0) for m in re.finditer(r"(.)\1+", s)] ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
[(m.group(0),m.group(1)) for m in re.finditer(r"(.)\1+", s)]
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---',
'-'), ('++++', '+')]

-tkc
 
T

Tim Chase

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

Or, if you want to do it with itertools instead of the "re" module:
s = "pyyythhooonnn ---> ++++"
from itertools import groupby
[c*length for c, length in ((k, len(list(g))) for k, g in
groupby(s)) if length > 1]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']


-tkc
 
T

Terry Reedy

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here
you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

Or, if you want to do it with itertools instead of the "re" module:
s = "pyyythhooonnn ---> ++++"
from itertools import groupby
[c*length for c, length in ((k, len(list(g))) for k, g in
groupby(s)) if length > 1]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

Slightly shorter:
[r for r in (''.join(g) for k, g in groupby(s)) if len(r) > 1]
 
C

candide

Thanks, yours responses gave me the opportunity to understand the
"backreference" feature, it was not clear in spite of my intensive study
of the well known RE howto manual.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,481
Members
44,900
Latest member
Nell636132

Latest Threads

Top