finding indices in a sequence of parentheses

Steven Bethard · May 29, 2005

I have a list of strings that looks something like:

lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

The parentheses in the labels indicate where an "annotation" starts and
ends. So for example, the label '(*)' at index 2 of the list means that
I have an annotation at (2, 2), and the labels '(*', '*', '(*', '*))' at
indices 4 through 7 mean that I have an annotation at (4, 7) and an
annotation at (6, 7).

I'd like to determine all indices at which I have an annotation. So for
the data above, I want the indices:

(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

Here's what I'm doing now:

py> def indices(lst):
...     stack = []
...     for i, s in enumerate(lst):
...         if s == 'O':
...             continue
...         stack.extend([i]*s.count('('))
...         if '*' in s and not stack:
...             raise Exception('No start for %r at %i' % (s, i))
...         for _ in range(s.count(')')):
...             try:
...                 yield stack.pop(), i
...             except IndexError:
...                 raise Exception('No start for %r at %i' % (s, i))
...     if stack:
...         raise Exception('No ends for starts at %r' % stack)
...
py> list(indices(['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*',
...               '*)', '*)', 'O']))
[(2, 2), (6, 7), (4, 7), (8, 9), (8, 10)]

I think that works right, but I'm not certain. So two questions:

(1) Can anyone see anything wrong with the code above? and
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

Thanks,

STeVe

[1] Yes, I know easier/clearer/simpler are subjective terms. It's okay,
I'm only looking for opinions here anyway. =)

tiissa · May 29, 2005

Steven said:
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personally extract the parentheses, then zip the lists of indices.
In short:

>>> def indices(mylist):
...     lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
...                                 enumerate(mylist)],[])
...     lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
...                                  enumerate(mylist)],[])
...     return zip(lopen,lclose)
...
>>> indices(lst)
[(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]

Before returning, you can check that the two lists have the same size and
that, in each pair, the index of the '(' is lower than or equal to the
index of the ')'. If not, you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).
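
For readers who want to try this, here is a minimal Python 3 sketch of the zip approach with the two checks just described. The name zip_indices and the choice of ValueError are illustrative assumptions, not part of the original post. Note that the checks only catch a count mismatch or a ')' that comes before the '(' it is paired with; they say nothing about how nested parentheses are matched up.

```python
def zip_indices(labels):
    """Pair the k-th '(' with the k-th ')' (hypothetical helper name)."""
    # One entry per parenthesis, holding the index of the label it came from.
    lopen = [i for i, s in enumerate(labels) for _ in range(s.count('('))]
    lclose = [i for i, s in enumerate(labels) for _ in range(s.count(')'))]

    # Check 1: the two lists must have the same size.
    if len(lopen) != len(lclose):
        raise ValueError('%d "(" but %d ")"' % (len(lopen), len(lclose)))

    # Check 2: no ')' may come before the '(' it is paired with.
    pairs = list(zip(lopen, lclose))
    for start, end in pairs:
        if start > end:
            raise ValueError('")" at %d paired with later "(" at %d'
                             % (end, start))
    return pairs


lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']
print(zip_indices(lst))
```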

Steven Bethard · May 29, 2005

tiissa said:
I'd personally extract the parentheses, then zip the lists of indices.
In short:

>>> def indices(mylist):
...     lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
...                                 enumerate(mylist)],[])
...     lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
...                                  enumerate(mylist)],[])
...     return zip(lopen,lclose)
...
>>> indices(lst)
[(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]

Thanks, that's a good idea. In case anyone else is reading this thread,
and had to mentally unwrap the reduce expressions, I believe they could
be written as:

lopen = [x for i, s in enumerate(lst) for x in [i]*s.count('(')]
lclose = [x for i, s in enumerate(lst) for x in [i]*s.count(')')]

or maybe:

lopen = [i for i, s in enumerate(lst) for _ in xrange(s.count('('))]
lclose = [i for i, s in enumerate(lst) for _ in xrange(s.count(')'))]

Sorry, I have an irrational fear of reduce.

STeVe
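
A quick way to check that the three spellings agree is to run them side by side. This is a small Python 3 sketch, so reduce is imported from functools and range stands in for xrange; the variable names are only for illustration.

```python
from functools import reduce

lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

# The reduce form: build one small list per label, then concatenate them all.
via_reduce = reduce(list.__add__,
                    [[i] * s.count('(') for i, s in enumerate(lst)], [])

# First comprehension: iterate over each small list instead of concatenating.
via_repeat = [x for i, s in enumerate(lst) for x in [i] * s.count('(')]

# Second comprehension: emit the index once per '(' without building lists.
via_range = [i for i, s in enumerate(lst) for _ in range(s.count('('))]

print(via_reduce)
print(via_reduce == via_repeat == via_range)
```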

Raymond Hettinger · May 29, 2005

[Steven Bethard]

I have a list of strings that looks something like:
lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']
. . .
I want the indices:
(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

opener_stack = []
for i, elem in enumerate(lst):
    for c in elem:
        if c == '(':
            opener_stack.append(i)
        elif c == ')':
            print opener_stack.pop(), i

To see something like this in production code, look at
Tools/scripts/texcheck.py

Raymond Hettinger
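
For anyone running this on Python 3, where print is a function, the same loop can be wrapped in a generator. This is only a sketch: the function name paren_pairs is made up here, and like the loop above it does no error checking, so an unmatched ')' surfaces as an IndexError from pop().

```python
def paren_pairs(labels):
    """Yield (start, end) label indices for each matched pair of parentheses."""
    opener_stack = []
    for i, elem in enumerate(labels):
        # Walk each label character by character so the parentheses
        # inside one label are handled in the order they appear.
        for c in elem:
            if c == '(':
                opener_stack.append(i)
            elif c == ')':
                # The most recent unmatched '(' is the one this ')' closes.
                yield opener_stack.pop(), i


lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']
for start, end in paren_pairs(lst):
    print(start, end)
```

Inner annotations come out before the outer ones that enclose them, so sort the result if the order matters.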

Peter Otten · May 30, 2005

tiissa said:
Steven said:

(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personally extract the parentheses, then zip the lists of indices.
In short:

>>> def indices(mylist):
...     lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
...                                 enumerate(mylist)],[])
...     lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
...                                  enumerate(mylist)],[])
...     return zip(lopen,lclose)
...
>>> indices(lst)
[(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]

Before returning, you can check that the two lists have the same size and
that, in each pair, the index of the '(' is lower than or equal to the
index of the ')'. If not, you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).

Not tested but confident should be an oxymoron for a programmer. Some
examples:

lst: ['(', '(', ')', ')']
hettinger  [(1, 2), (0, 3)]
bethard    [(1, 2), (0, 3)]
tiissa     [(0, 2), (1, 3)]  oops (or am I just spoilt by the XML spec?)

lst: ['(', ')(', ')']
hettinger  [(0, 1), (1, 2)]
bethard    [(1, 1), (0, 2)]  oops
tiissa     [(0, 1), (1, 2)]

So far Raymond's solution is my favourite...

Peter
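
The two test cases above can be reproduced with a short script. The sketch below re-implements the three approaches in Python 3 with the error handling stripped out; the function names are just labels for whose idea each one is, not code taken verbatim from the posts.

```python
def hettinger(labels):
    # Character-by-character stack: each ')' closes the latest open '('.
    stack, pairs = [], []
    for i, elem in enumerate(labels):
        for c in elem:
            if c == '(':
                stack.append(i)
            elif c == ')':
                pairs.append((stack.pop(), i))
    return pairs


def bethard(labels):
    # Label-by-label stack: every '(' in a label is pushed before any ')'
    # in that same label is popped, which goes wrong for a label like ')('.
    stack, pairs = [], []
    for i, s in enumerate(labels):
        stack.extend([i] * s.count('('))
        for _ in range(s.count(')')):
            pairs.append((stack.pop(), i))
    return pairs


def tiissa(labels):
    # No stack: the k-th '(' is zipped with the k-th ')', ignoring nesting.
    lopen = [i for i, s in enumerate(labels) for _ in range(s.count('('))]
    lclose = [i for i, s in enumerate(labels) for _ in range(s.count(')'))]
    return list(zip(lopen, lclose))


for case in (['(', '(', ')', ')'], ['(', ')(', ')']):
    print('lst:', case)
    for func in (hettinger, bethard, tiissa):
        print('%-10s %r' % (func.__name__, func(case)))
```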

tiissa · May 30, 2005

Peter said:
Not tested but confident should be an oxymoron for a programmer.

Not tested but confident is an oxymoron for mathematicians.
Programmers know better than that; they leave bugs in their code to have
more work to do.

OTOH, you're right. Matching parentheses cannot be done without a stack;
that should have rung a bell.

Peter Otten · May 30, 2005

tiissa said:
Not tested but confident is an oxymoron for mathematicians.

I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Peter

tiissa · May 30, 2005

Peter said:
I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Some might be satisfied if your tests covered the whole set of inputs.

When that's possible, that may be useless. But that's not a matter to
bother them with.

(And of course, you just don't tell them how you generated their output
with the same program.)

Steven Bethard · May 31, 2005

Raymond said:
[Steven Bethard]

I have a list of strings that looks something like:
lst = ['O', 'O', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

. . .

opener_stack = []
for i, elem in enumerate(lst):
    for c in elem:
        if c == '(':
            opener_stack.append(i)
        elif c == ')':
            print opener_stack.pop(), i

Thanks Raymond, this is definitely an elegant solution. It was also easy
to add all the error checking I needed into this one. For the curious,
my final solution looks something like:

def indices(lst):
    stack = []
    for i, elem in enumerate(lst):
        for c in elem:
            if c == '(':
                stack.append(i)
            elif c == ')' and not stack:
                raise Exception('")" at %i without "("' % i)
            elif c == ')':
                yield stack.pop(), i
            elif c == 'O' and stack:
                raise Exception('"O" at %i after "(" at %i' %
                                (i, stack[-1]))
            elif c == '*' and not stack:
                raise Exception('"*" at %i without "("' % i)
    if stack:
        raise Exception('"(" at %r without ")"' % stack)

Thanks again!

STeVe
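
To see which input triggers each of the four checks, the function can be driven with a few deliberately broken lists. This is a rough Python 3 sketch: indices is repeated from the post above only so the snippet runs on its own, and the helper error_for is a made-up name for illustration.

```python
def indices(lst):
    # Same logic as the final solution above, repeated so this runs alone.
    stack = []
    for i, elem in enumerate(lst):
        for c in elem:
            if c == '(':
                stack.append(i)
            elif c == ')' and not stack:
                raise Exception('")" at %i without "("' % i)
            elif c == ')':
                yield stack.pop(), i
            elif c == 'O' and stack:
                raise Exception('"O" at %i after "(" at %i' %
                                (i, stack[-1]))
            elif c == '*' and not stack:
                raise Exception('"*" at %i without "("' % i)
    if stack:
        raise Exception('"(" at %r without ")"' % stack)


def error_for(labels):
    """Return the error message for labels, or None if they are valid."""
    try:
        # indices() is a generator, so nothing is checked until it is consumed.
        list(indices(labels))
    except Exception as exc:
        return str(exc)
    return None


for labels in ([')'], ['*'], ['(*', 'O', '*)'], ['(*'], ['(*)', 'O']):
    print(labels, '->', error_for(labels))
```

Because the checks run lazily, a caller that stops iterating early will never see an error that sits further down the list.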
