finding indices in a sequence of parentheses

S

Steven Bethard

I have a list of strings that looks something like:

lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

The parentheses in the labels indicate where an "annotation" starts and
ends. So for example, the label '(*)' at index 2 of the list means that
I have an annotation at (2, 2), and the labels '(*', '*', '(*', '*))' at
indices 4 through 7 mean that I have an annotation at (4, 7) and an
annotation at (6, 7).

I'd like to determine all indices at which I have an annotation. So for
the data above, I want the indices:

(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

Here's what I'm doing now:

py> def indices(lst):
.... stack = []
.... for i, s in enumerate(lst):
.... if s == 'O':
.... continue
.... stack.extend(*s.count('('))
.... if '*' in s and not stack:
.... raise Exception('No start for %r at %i' % (s, i))
.... for _ in range(s.count(')')):
.... try:
.... yield stack.pop(), i
.... except IndexError:
.... raise Exception('No start for %r at %i' % (s, i))
.... if stack:
.... raise Exception('No ends for starts at %r' % stack)
....
py> list(indices(['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*',
'*)', '*)', '0']))
[(2, 2), (6, 7), (4, 7), (8, 9), (8, 10)]

I think that works right, but I'm not certain. So two questions:

(1) Can anyone see anything wrong with the code above? and
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

Thanks,

STeVe

[1] Yes, I know easier/clearer/simpler are subjective terms. It's okay,
I'm only looking for opinions here anyway. =)
 
T

tiissa

Steven said:
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personnally extract the parenthesis then zip the lists of indices.
In short:
... lopen=reduce(list.__add__, [*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
...
>>> indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]
>>>

Before returning, you can check if the lists have same size and if the
'(' has lower or equal index than ')' in each of these couples. If not
you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).
 
S

Steven Bethard

tiissa said:
I'd personnally extract the parenthesis then zip the lists of indices.
In short:
... lopen=reduce(list.__add__, [*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
...
indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]


Thanks, that's a good idea. In case anyone else is reading this thread,
and had to mentally unwrap the reduce expressions, I believe they could
be written as:

lopen = [x for i, s in enumerate(lst) for x in *s.count('(')]
lclose = [x for i, s in enumerate(lst) for x in *s.count(')')]

or maybe:

lopen = [i for i, s in enumerate(lst) for _ in xrange(s.count('('))]
lclose = [i for i, s in enumerate(lst) for _ in xrange(s.count(')'))]

Sorry, I have an irrational fear of reduce. ;)

STeVe
 
R

Raymond Hettinger

[Steven Bethard]
I have a list of strings that looks something like:
lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)'] . . .
I want the indices:
(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)



opener_stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
opener_stack.append(i)
elif c == ')':
print opener_stack.pop(), i


To see something like this in production code, look at
Tools/scripts/texcheck.py



Raymond Hettinger
 
P

Peter Otten

tiissa said:
Steven said:
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personnally extract the parenthesis then zip the lists of indices.
In short:
... lopen=reduce(list.__add__, [*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
...
indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]

Before returning, you can check if the lists have same size and if the
'(' has lower or equal index than ')' in each of these couples. If not
you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).


Not tested but confident should be an oxymoron for a programmer. Some
examples:

lst: ['(', '(', ')', ')']
hettinger [(1, 2), (0, 3)]
bethard [(1, 2), (0, 3)]
tiissa [(0, 2), (1, 3)] oops (or am I just spoilt by the XML spec?)

lst: ['(', ')(', ')']
hettinger [(0, 1), (1, 2)]
bethard [(1, 1), (0, 2)] oops
tiissa [(0, 1), (1, 2)]

So far Raymond's solution is my favourite...

Peter
 
T

tiissa

Peter said:
Not tested but confident should be an oxymoron for a programmer.

Not tested but confident is an oxymoron for mathemtaticians.
Programmers know better than that, they leave bugs in their code to have
more work to do. ;)

OTOH, you're right. Matching parentheses cannot be done without a stack,
that should have rung a bell.
 
P

Peter Otten

tiissa said:
Not tested but confident is an oxymoron for mathemtaticians.

I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Peter
 
T

tiissa

Peter said:
I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Some would maybe be satisfied if your tests cover the whole set of input.

When that's possible, that may be useless. But that's not a matter to
bother them with. ;)

(And of course, you just don't tell them how you generated their output
with the same program.)
 
S

Steven Bethard

Raymond said:
[Steven Bethard]
I have a list of strings that looks something like:
lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

. . .

opener_stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
opener_stack.append(i)
elif c == ')':
print opener_stack.pop(), i

Thanks Raymond, this is definitely an elegant solution. It was also easy
to add all the error checking I needed into this one. For the curious,
my final solution looks something like:

def indices(lst):
stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
stack.append(i)
elif c == ')' and not stack:
raise Exception('")" at %i without "("' % i)
elif c == ')':
yield stack.pop(), i
elif c == 'O' and stack:
raise Exception('"O" at %i after "(" at %i' %
(i, stack[-1]))
elif c == '*' and not stack:
raise Exception('"*" at %i without "("' % i)
if stack:
raise Exception('"(" at %r without ")"' % stack)

Thanks again!

STeVe
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top