inserting bracketings into a string

Steven Bethard

I'm trying to insert some bracketings in a string based on a set of
labels and associated start and end indices. For example, I'd like to
do something like:
>>> text = 'abcde fgh ijklmnop qrstu vw xyz'
>>> spans = [('A', 0, 9), ('B', 6, 9), ('C', 25, 31)]
>>> insert_bracketings(text, spans)
'[A abcde [B fgh]] ijklmnop qrstu [C vw xyz]'

My current implementation looks like:
def insert_bracketings(text, spans):
    starts = [start for _, start, _ in spans]
    ends = [end for _, _, end in spans]
    indices = sorted(set(starts + ends))
    # Cut the text at every start/end index, keeping each piece's boundaries.
    splits = [(text[start:end], start, end)
              for start, end in zip([None] + indices, indices + [None])]
    start_map, end_map = {}, {}
    for label, start, end in spans:
        start_map.setdefault(start, []).append('[%s ' % label)
        end_map.setdefault(end, []).append(']')
    result = []
    for string, start, end in splits:
        if start in start_map:
            result.extend(start_map[start])
        result.append(string)
        if end in end_map:
            result.extend(end_map[end])
    return ''.join(result)

but it seems like there ought to be an easier way. Can anyone help me?

Thanks in advance,

Steve
 
Michael Loritsch

Below is a somewhat more compact implementation that produces the same
result. I'm not entirely sure it qualifies as 'better', but I do
believe it is ultimately more readable.

def insert_brackets(text, spans):
    brackets = []
    for span in spans:
        brackets.append((span[1], ("".join(('[', span[0], " ")))))
        brackets.append((span[2], ']'))
    brackets.sort()  # Note: (n, '[X ') < (n, ']')
    answer = []
    lastIndex = 0
    for bracket in brackets:
        if lastIndex == bracket[0]:  # Repeated index
            answer.append(bracket[1])
        else:  # Non repeated index
            answer.extend((text[lastIndex:bracket[0]], bracket[1]))
        lastIndex = bracket[0]
    answer.append(text[lastIndex:])  # Keep any text after the last bracket
    return "".join(answer)

Regards,

Michael Loritsch
 
Peter Otten

Not tested beyond what you see:

text = 'abcde fgh ijklmnop qrstu vw xyz'
spans = [('A', 0, 9), ('B', 6, 9), ('C', 25, 31)]

def insert_bracketings(text, spans):
    inserts = [(s, "[%s " % r) for (r, s, t) in spans]
    inserts.extend([(t, "]") for (r, s, t) in spans])
    inserts.sort()
    inserts.reverse()
    text = list(text)
    for (r, s) in inserts:
        text.insert(r, s)
    return "".join(text)

assert ('[A abcde [B fgh]] ijklmnop qrstu [C vw xyz]'
        == insert_bracketings(text, spans))

Peter
 
Eddie Corns

def insert_bracketings(txt, spans):
    text = list(txt)
    for tg, start, end in spans:
        # Prefix the span's first character and suffix its last one.
        text[start] = '[%s %s' % (tg, text[start])
        text[end-1] = '%s]' % text[end-1]
    return ''.join(text)

print(insert_bracketings('abcde fgh ijklmnop qrstu vw xyz', [('A', 0, 9), ('B', 6, 9), ('C', 25, 31)]))

Might not give what you expect if two spans start at the same place but you
haven't defined that.
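
A quick illustration of that caveat, as a sketch only: the spans below are made up for the example, with 'A' and 'B' sharing a start index and 'B' being the shorter one. Because each span prefixes whatever is already stored at its start position, the span listed later ends up outermost.

```python
def insert_bracketings(txt, spans):
    # Same approach as above: decorate the first and last character of each span.
    text = list(txt)
    for tag, start, end in spans:
        text[start] = '[%s %s' % (tag, text[start])
        text[end - 1] = '%s]' % text[end - 1]
    return ''.join(text)

# Hypothetical spans: 'A' covers 0..9, 'B' covers 0..5, both start at index 0.
result = insert_bracketings('abcde fgh', [('A', 0, 9), ('B', 0, 5)])
print(result)  # [B [A abcde] fgh]
```

Both labels appear, but the 'B' tag opens first, so the brackets read as if 'A' were the shorter span. If that matters, listing the shorter span first (or sorting spans by length beforehand) would be one way around it.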

Eddie
 
Steven Bethard

Eddie said:
Might not give what you expect if two spans start at the same place but you
haven't defined that.

If two spans start in the same place, I need them both to appear, but
the order is not important, so I believe your code here should work
fine. Very nice, thank you!

Steve
 
Peter Otten

Still not tested, but should do slightly better than my previous version.
Requires Python 2.4 or later:

from operator import itemgetter

def insert_bracketings(text, spans):
    inserts = [(s, "[%s " % r) for (r, s, t) in spans]
    inserts.extend((t, "]") for (r, s, t) in spans)
    inserts.sort(key=itemgetter(0), reverse=True)
    text = list(text)
    for (r, s) in inserts:
        text.insert(r, s)
    return "".join(text)

Apart from cosmetics, this sorts on position only, so at a shared position
the start tag is inserted first and the end tag lands in front of it in the
output. Relies on all end tags being equal.
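
To make the difference between the two versions concrete, here is a small sketch under stated assumptions: the helper names `apply_inserts` and `build_inserts` and the adjacent spans are invented for the illustration, and only the two sort orders come from the code in this thread.

```python
from operator import itemgetter

def apply_inserts(text, inserts):
    # Insert each (position, tag) pair in the given order.
    chars = list(text)
    for pos, tag in inserts:
        chars.insert(pos, tag)
    return "".join(chars)

def build_inserts(spans):
    # Start tags first, then end tags, as in the function above.
    inserts = [(start, "[%s " % label) for (label, start, end) in spans]
    inserts.extend((end, "]") for (label, start, end) in spans)
    return inserts

# Hypothetical adjacent spans: 'A' ends at index 3, exactly where 'B' starts.
spans = [('A', 0, 3), ('B', 3, 6)]

# Stable sort on position only, as in the version above.
by_position = build_inserts(spans)
by_position.sort(key=itemgetter(0), reverse=True)
stable = apply_inserts('abcdef', by_position)

# Plain tuple sort followed by reverse(), as in the earlier version.
by_tuple = build_inserts(spans)
by_tuple.sort()
by_tuple.reverse()
plain = apply_inserts('abcdef', by_tuple)

print(stable)  # [A abc][B def]
print(plain)   # [A abc[B ]def]
```

With the position-only sort, ties keep their list order (start tags, then end tags), and since later inserts at the same index land in front of earlier ones, the closing bracket ends up before the next opening tag.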

Peter
 
