Help needed with nested parsing of file into objects

richard · Jun 4, 2012

Hi guys, I am having a bit of difficulty finding the best approach /
solution to parsing a file into a list of objects / nested objects. Any
help would be greatly appreciated.

#file format to parse .txt

Code:

An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3

An instance of TestArray
 a=1
 b=2
 c=3

Expected output: a list of 2 TestArray objects, these being the
parents. The first one has an attribute holding a list of the 2
instances of A (the parent's children) and another attribute holding a
list of just the 1 child instance of B, with that child object in turn
containing an attribute holding a list of the 2 instances of C.
The nesting could be deeper; this is just an example. An instance of
TestArray may or may not have any nesting at all, as illustrated by the
second TestArray. Basically I just want to create a list of objects,
where the objects may or may not contain more nested objects as
attributes, but I need a generic way to do it that works for any depth.

#end result: list of objects, with the objects printed as dicts

Code:

parsed = [
    {
        "a":"a",
        "b":"b",
        "c":"c",
        "A_elements":[
            {
                "a":1,
                "b":2,
                "c":3
            },
            {
                "a":1,
                "b":2,
                "c":3
            }
        ],
        "B_elements":[
            {
                "a":1,
                "b":2,
                "c":3,
                "C_elements":[
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    },
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    }
                ]
            }
        ]
    },

    {
        "a":"1",
        "b":"2",
        "c":"3",
    }

]

#this is what I have so far, which works with the 2nd instance, but I
can't figure out the best way to handle the multi-nested objects.

Code:

import re
def test_parser(filename):
    parent_stanza = None
    stanzas = []

    class parentStanza:
        pass

    fo = open(filename)

    for line in fo:
        line = line.strip()
        if re.search("An instance of TestArray", line):
            if parent_stanza:
                stanzas.append(parent_stanza)
            parent_stanza = parentStanza()
        if parent_stanza and "=" in line:
            attr, val = line.split("=")
            setattr(parent_stanza, attr, val)
    else:
        # for/else: runs once the loop is exhausted, storing the last stanza
        stanzas.append(parent_stanza)
    return stanzas

stanzas = test_parser("test.txt")

import pprint
for stanza in stanzas:
    pprint.pprint(stanza.__dict__)
    n=raw_input("paused")

Roy Smith · Jun 4, 2012

richard said:
Hi guys i am having a bit of dificulty finding the best approach /
solution to parsing a file into a list of objects / nested objects any
help would be greatly appreciated.

The first question is "Why do you want to do this?" Is this some
pre-existing file format imposed by an external system that you can't
change? Or are you just looking for a generic way to store nested
structures in a file?

If the latter, then I would strongly suggest not rolling your own. Take
a look at json or pickle (or even xml) and adopt one of those.
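
For the "generic way to store nested structures" case, a minimal sketch of what adopting json might look like, assuming the data can be held as plain dicts and lists. The `records` name and its contents are made up for illustration and are not taken from the original file.

```python
import json

# Hypothetical nested data, shaped like the structure asked about above.
records = [
    {"a": "a", "A_elements": [{"a": 1}, {"d": 1}]},
    {"a": "1"},
]

# Serialize to text (this string could just as well be written to a file) ...
text = json.dumps(records, indent=1)

# ... and read it back; the nesting survives the round trip.
restored = json.loads(text)
print(restored == records)
```

This only helps when the file format is under your control; it does nothing for a format imposed from outside.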

Alain Ketterlin · Jun 4, 2012

richard said:

Hi guys i am having a bit of dificulty finding the best approach /
solution to parsing a file into a list of objects / nested objects any
help would be greatly appreciated.

#file format to parse .txt

Code:

An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
[...]

Below is a piece of code that seems to work on your data. It builds a
raw tree; I leave it to you to adapt it and build the objects you want.
The assumption is that the number of leading blanks faithfully denotes
depth.

As noted in another message, you're probably better off using an
existing syntax (json, python literals, yaml, xml, ...)

-- Alain.

#!/usr/bin/env python

import sys
import re

RE = re.compile("( *)(.*)")
stack = [("-",[])] # tree nodes are: (head,[children])
for line in sys.stdin:
    matches = RE.match(line)
    if len(matches.group(2)) > 0:
        depth = 1 + len(matches.group(1))
        while len(stack) > depth:
            stack[-2][1].append(stack[-1])
            del stack[-1]
            pass
        stack.append( (matches.group(2),[]) )
        pass
    pass
while len(stack) > 1:
    stack[-2][1].append(stack[-1])
    del stack[-1]
    pass

print(stack)
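
The same stack idea can be tried without a file on stdin. The following is only a sketch, written for Python 3 and reading from a string; the `parse_tree` name and the shortened `sample` text are assumptions made for illustration, and it relies on the same premise that one leading blank means one level of depth.

```python
import re

LINE = re.compile(r"( *)(.*)")

def parse_tree(text):
    """Return a list of (head, children) nodes, nested by leading blanks."""
    stack = [("-", [])]          # sentinel root; open nodes are stacked above it
    for line in text.splitlines():
        indent, head = LINE.match(line).groups()
        if not head:
            continue             # skip blank lines
        depth = 1 + len(indent)
        # Close every open node at this depth or deeper: attach it to its parent.
        while len(stack) > depth:
            node = stack.pop()
            stack[-1][1].append(node)
        stack.append((head, []))
    while len(stack) > 1:        # close whatever is still open at end of input
        node = stack.pop()
        stack[-1][1].append(node)
    return stack[0][1]

sample = "\n".join([
    "An instance of TestArray",
    " a=a",
    " List of 1 A elements:",
    "  Instance of A element",
    "   a=1",
])
tree = parse_tree(sample)
print(tree)
```

Each node stays a raw `(head, children)` tuple here; turning heads such as `a=a` into attributes is a separate step.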

richard · Jun 4, 2012

Thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.

Eelco · Jun 5, 2012

richard said:
thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.

Hi Richard,

Despite the fact that it is a preexisting format, it is very close
indeed to valid YAML code.

Writing your own whitespace-aware parser can be a bit of a pain, but
since YAML does this for you, I would argue the cleanest solution
would be to bootstrap that functionality, rather than roll your own
solution or resort to hard-to-maintain regex voodoo.

Here is my solution. As a bonus, it directly constructs a custom
object hierarchy (obviously you would want to expand on this, but the
essentials are there). One caveat: at the moment, the conversion to
YAML relies on the apparent convention that instances never directly
contain other instances, and lists never directly contain lists. This
means all instances are list entries and get a '-' prepended, and this
just works. If this is not a general rule, you'd have to keep track of
an enclosing scope stack and emit dashes based on that. Anyway, the
idea is there, and I believe it to be one worth looking at.

<code>
import yaml

class A(yaml.YAMLObject):
    yaml_tag = u'!A'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'A' + str(self.__dict__)

class B(yaml.YAMLObject):
    yaml_tag = u'!B'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'B' + str(self.__dict__)

class C(yaml.YAMLObject):
    yaml_tag = u'!C'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'C' + str(self.__dict__)

class TestArray(yaml.YAMLObject):
    yaml_tag = u'!TestArray'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'TestArray' + str(self.__dict__)

class myList(yaml.YAMLObject):
    yaml_tag = u'!myList'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'myList' + str(self.__dict__)

data = \
"""
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3
An instance of TestArray
 a=1
 b=2
 c=3
""".strip()

#remove trailing whitespace and seemingly erroneous colon in line 5
lines = [' '+line.rstrip().rstrip(':') for line in data.split('\n')]

def transform(lines):
    """transform text line by line"""
    for line in lines:
        #regular mapping lines
        if line.find('=') > 0:
            yield line.replace('=', ': ')
        #instance lines
        p = line.find('nstance of')
        if p > 0:
            s = p + 11
            e = line[s:].find(' ')
            if e == -1: e = len(line[s:])
            tag = line[s:s+e]
            whitespace= line.partition(line.lstrip())[0]
            yield whitespace[:-2]+' -'+ ' !'+tag
        #list lines
        p = line.find('List of')
        if p > 0:
            whitespace= line.partition(line.lstrip())[0]
            #two blanks, so that 'myList:' stays aligned with its sibling keys
            yield whitespace[:-2]+'  '+ 'myList:'

##transformed = (transform( lines))
##for i,t in enumerate(transformed):
##    print '{:>3}{}'.format(i,t)

transformed = '\n'.join(transform( lines))
print(transformed)

#recent PyYAML releases require an explicit Loader argument
res = yaml.load(transformed, Loader=yaml.Loader)
print(res)
print(yaml.dump(res))
</code>

richard · Jun 5, 2012

Hi guys, I am still struggling to get the code that was posted to me on
this forum to work in my favour and produce the output in the format
shown above. This is what I have so far. Any help will be greatly
appreciated.

import re

def test_parser(filename):
    class Stanza:
        def __init__(self, values):
            for attr, val in values:
                setattr(self, attr, val)

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1]))
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])

    fo = open(filename, "r")
    RE = re.compile("( *)(.*)")
    stack = [("-",[])]
    for line in fo:
        matches = RE.match(line)
        if len(matches.group(2)) > 0:
            depth = 1 + len(matches.group(1))
            while len(stack) > depth:
                stack[-2][1].append(build(stack[-1]))
                del stack[-1]
            stack.append( (matches.group(2),[]) )
    while len(stack) > 1:
        stack[-2][1].append(stack[-1])
        del stack[-1]
    return stack

stanzas = test_parser("test.txt")

richard · Jun 5, 2012

Hi Eelco, many thanks for the reply / solution; it definitely looks
like a clean way to go about it. However, I don't think installing
3rd-party libs like yaml on the server is on the cards at the moment.

Alain Ketterlin · Jun 5, 2012

richard said:
Hi guys still struggling to get the code that was posted to me on this
forum to work in my favour and get the output in the format shown
above. This is what I have so far. Any help will be greatly
appreciated.

[...]

def test_parser(filename):
    class Stanza:
        def __init__(self, values):
            for attr, val in values:
                setattr(self, attr, val)

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1]))
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])

You forgot one case:

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            #match = re.search("Instance of (.+) element", couple[0])
            #return ("attr_%s" % match.group(1),Stanza(couple[1]))
            return dict(couple[1])
        elif "An instance of" in couple[0]: # you forgot that case
            return dict(couple[1])
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])
        else:
            pass # put a test here

richard said:
    fo = open(filename, "r")
    RE = re.compile("( *)(.*)")
    stack = [("-",[])]
    for line in fo:
        matches = RE.match(line)
        if len(matches.group(2)) > 0:
            depth = 1 + len(matches.group(1))
            while len(stack) > depth:
                stack[-2][1].append(build(stack[-1]))
                del stack[-1]
            stack.append( (matches.group(2),[]) )
    while len(stack) > 1:
        stack[-2][1].append(stack[-1])

Change this to:

        stack[-2][1].append(build(stack[-1])) # call build() here also

richard said:
        del stack[-1]
    return stack

Actually the first and only element of stack is a container: all you
need is the second element of the only tuple in stack, so:

    return stack[0][1]

and this is your list. If you need it pretty printed, you'll have to
work the hierarchy.

-- Alain.
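
Putting the corrections above together, one possible self-contained version is sketched below. It is written for Python 3 and reads from a string instead of a file; the names `parse_text`, `LINE` and `LIST_HEAD` are made up for this sketch. Two deliberate deviations from the code quoted in this thread: the list pattern uses `\d+` so that counts of 10 or more still match, and an unrecognised line raises an error instead of silently returning None. Values are kept as strings, and one leading blank per nesting level is assumed.

```python
import re

LINE = re.compile(r"( *)(.*)")
LIST_HEAD = re.compile(r"List of \d+ (.+) elements")

def build(node):
    """Turn one raw (head, children) node into its final value."""
    head, children = node
    if "=" in head:
        attr, val = head.split("=", 1)
        return attr, val                      # plain attribute
    if "Instance of" in head or "An instance of" in head:
        return dict(children)                 # children are (key, value) pairs
    match = LIST_HEAD.search(head)
    if match:
        return "%s_elements" % match.group(1), children
    raise ValueError("unrecognised line: %r" % head)

def parse_text(text):
    stack = [("-", [])]                       # sentinel root node
    for line in text.splitlines():
        indent, head = LINE.match(line).groups()
        if not head:
            continue                          # skip blank lines
        depth = 1 + len(indent)
        while len(stack) > depth:             # close finished nodes
            stack[-2][1].append(build(stack[-1]))
            del stack[-1]
        stack.append((head, []))
    while len(stack) > 1:                     # close what is still open
        stack[-2][1].append(build(stack[-1]))
        del stack[-1]
    return stack[0][1]

sample = "\n".join([
    "An instance of TestArray",
    " a=a",
    " List of 1 B elements",
    "  Instance of B element",
    "   b=2",
    "   List of 1 C elements",
    "    Instance of C element",
    "     c=3",
    "",
    "An instance of TestArray",
    " a=1",
])
parsed = parse_text(sample)
print(parsed)
```

Because every node is built only when it is closed, its children have already been converted by then, which is what lets the same `build()` handle any depth.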

richard · Jun 5, 2012

Hi Alain, thanks for the reply. With regard to the missing case "An
Instance of", I'm not sure where / how that is working, as the case I
put in originally, "Instance of", is in the file and is being handled
in the previous case. Also, when running the final solution I'm getting
a list of [None, None] as the final stack? Just busy debugging it to
see what's going wrong. But sorry, I should have been clearer with
regard to the format mentioned above. The objects are being printed out
as dicts, so where you put in

        elif "An Instance of" in couple[0]:
            return dict(couple[1])

should it still be?

        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1])) # instantiating new stanza object and setting attributes.

richard · Jun 5, 2012

Hi Alain, thanks for the reply. I amended the code and am just busy
debugging, but the stack I get back is just [None, None]. Also, I
should have been clearer when I mentioned the format above: the dicts
are actually objects instantiated from classes and just printed out as
obj.__dict__ for representation purposes. So where you have replaced
the following, I presume this was because of my format confusion.
Thanks

        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1])) #instantiating new object and setting attributes

with

        elif "Instance of" in couple[0]:
            #match = re.search("Instance of (.+) element", couple[0])
            #return ("attr_%s" % match.group(1),Stanza(couple[1]))
            return dict(couple[1])

richard · Jun 5, 2012

Sorry, silly mistake made with "An instance" and "Instance of". Code
amended below for the fix:

        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif re.search("Instance of .+",couple[0]):
            #match = re.search("Instance of (.+) element", couple[0])
            #return ("attr_%s" % match.group(1),Stanza(couple[1]))
            return dict(couple[1])
        elif re.search("An instance of .+", couple[0]):
            return dict(couple[1])
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])
        else:
            pass

Alain Ketterlin · Jun 5, 2012

richard said:
[...]

Hi Alain thanks for the reply. With regards to the missing case "An
Instance of" im not sure where/ how that is working as the case i put
in originally "Instance of" is in the file and been handled in the
previous case.

Both cases are different in your example above. Top level elements are
labeled "An instance ...", whereas "inner" instances are labeled
"Instance of ...".

Also when running the final solution im getting a list of [None, None]
as the final stack?

There's only one way this can happen: by falling through to the last
case of build(). Check the regexps etc. again.
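
As an illustration of that fall-through, here is a small sketch in which `classify` is a hypothetical stand-in for `build()`, reduced to labels. It assumes the test was written as "An Instance of" with a capital "I"; the `in` test is case-sensitive, so that branch never matches the file's "An instance of" lines and the function ends without a return.

```python
def classify(head):
    # Same shape as the build() cases discussed in this thread, reduced to labels.
    if "=" in head:
        return "attribute"
    elif "Instance of" in head:
        return "inner instance"
    elif "An Instance of" in head:   # capital "I": never matches the file's text
        return "top-level instance"
    # no final return: Python returns None here

results = [classify("An instance of TestArray"), classify("Instance of A element")]
print(results)
```

With two top-level "An instance of ..." stanzas in the file, two such None results would be one way to end up with [None, None].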

just busy debugging it to see whats going wrong. But sorry should have
been clearer with regards to the format mentioned above. The objects
are been printed out as dicts so where you put in

        elif "An Instance of" in couple[0]:
            return dict(couple[1])

should still be ?
        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1])) # instantiating new stanza object and setting attributes.

Your last "Instance of..." case is correct, but "An instance..." is
different, because there's no containing object, so it's probably more
like: return Stanza(couple[1]).

-- Alain.
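
For the object-based variant, a rough sketch of how `build()` might look is given below. It is not the code from the thread: as a simplification, inner instances also return a bare `Stanza`, so the lists hold objects directly instead of ("attr_X", object) pairs, and the `\d+` pattern and the final `raise` are additions made for this sketch.

```python
import re

class Stanza(object):
    """Bare attribute holder, as in the code quoted above."""
    def __init__(self, values):
        for attr, val in values:
            setattr(self, attr, val)

def build(couple):
    head, children = couple
    if "=" in head:
        attr, val = head.split("=", 1)
        return attr, val
    elif "An instance of" in head:            # top level: no containing object
        return Stanza(children)
    elif "Instance of" in head:               # inner: an element of a list
        return Stanza(children)
    elif "List of" in head:
        match = re.search(r"List of \d+ (.+) elements", head)
        return "%s_elements" % match.group(1), children
    raise ValueError("unrecognised line: %r" % head)

# Build a tiny hierarchy by hand, innermost node first.
inner = build(("Instance of A element", [("a", "1")]))
a_list = build(("List of 1 A elements:", [inner]))
top = build(("An instance of TestArray", [("a", "a"), a_list]))
print(top.a, top.A_elements[0].a)
```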

richard · Jun 5, 2012

A big thank you to everyone who has helped me tackle / shed light on
this problem. It is working great. Much appreciated.
