G
Gregory Nans
hello,
i need some help to 'tree-ify' a string...
for example i have strings such as :
s = """A(here 's , B(A ) silly test) C(to show D(what kind) of stuff i
need))"""
and i need to parse them to have a tree representation such as :
["A", [
"here 's" ,
[ "B", [
'A ) "silly test'
]],
["C", [
"to show",
["D", [
"what kind"
]],
"of stuff i need",
]]
]]
notes : "ALLCAPS(something else)" are some kind of tag, "something
else" may contain ALLCAPS() and thus be nested or contain others
characters. the parenthesis between 'A' and 'silly test' isn't a
mistake but a case to handle string is quite well formed = orphan
parenthesis will always be surrounded by whitespace - like in 'A )
silly test' or 'A ( silly test'
if have started to write a function which give me pos and nested
string, but still need help to finish to tree-ify it from there... or
maybe there is a more obvious way ? (i think i didn't need to create
real finite state automata for this stuff)
thanks
-1 D what kind 42 @ 51
-1 C to show D(what kind) of stuff i need 32 @ 68
-1 B A ) silly test 14 @ 28
-1 A here 's , B(A ) silly test) C(to show D(what kind) of stuff i
need) 2 @ 69
-------------------%<------------------
def parse(s):
n = len(s)
p = re.compile(r'([A-Z_]+\()')
pos = []
iterator = p.finditer(s)
for match in iterator:
pos.append(match.span())
j = 0
pos.reverse()
for (start, end) in pos:
opened = 0
i = end
while i < n - j:
if s == '(':
opened += 1
if s == ')' and s[i - 1] != ' ':
opened -= 1
if opened < 0:
print opened, s[start:end-1], s[end:i], "%s @ %s" % (end,
i)
break
i += 1
i need some help to 'tree-ify' a string...
for example i have strings such as :
s = """A(here 's , B(A ) silly test) C(to show D(what kind) of stuff i
need))"""
and i need to parse them to have a tree representation such as :
["A", [
"here 's" ,
[ "B", [
'A ) "silly test'
]],
["C", [
"to show",
["D", [
"what kind"
]],
"of stuff i need",
]]
]]
notes : "ALLCAPS(something else)" are some kind of tag, "something
else" may contain ALLCAPS() and thus be nested or contain others
characters. the parenthesis between 'A' and 'silly test' isn't a
mistake but a case to handle string is quite well formed = orphan
parenthesis will always be surrounded by whitespace - like in 'A )
silly test' or 'A ( silly test'
if have started to write a function which give me pos and nested
string, but still need help to finish to tree-ify it from there... or
maybe there is a more obvious way ? (i think i didn't need to create
real finite state automata for this stuff)
thanks
-1 D what kind 42 @ 51
-1 C to show D(what kind) of stuff i need 32 @ 68
-1 B A ) silly test 14 @ 28
-1 A here 's , B(A ) silly test) C(to show D(what kind) of stuff i
need) 2 @ 69
-------------------%<------------------
def parse(s):
n = len(s)
p = re.compile(r'([A-Z_]+\()')
pos = []
iterator = p.finditer(s)
for match in iterator:
pos.append(match.span())
j = 0
pos.reverse()
for (start, end) in pos:
opened = 0
i = end
while i < n - j:
if s == '(':
opened += 1
if s == ')' and s[i - 1] != ' ':
opened -= 1
if opened < 0:
print opened, s[start:end-1], s[end:i], "%s @ %s" % (end,
i)
break
i += 1