Help needed with nested parsing of file into objects


richard

Hi guys, I am having a bit of difficulty finding the best approach/solution
to parsing a file into a list of objects/nested objects. Any
help would be greatly appreciated.

#file format to parse .txt
Code:
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3

An instance of TestArray
 a=1
 b=2
 c=3

expected output
A list of 2 TestArray objects as the parents. The first one has an
attribute holding a list of the 2 instances of A (the parent's
children), and another attribute holding a list of just the 1 child
instance of B; that child object in turn has an attribute holding a
list of the 2 instances of C.
The nesting could go deeper; this is just an example. An instance of
TestArray may or may not have any nesting at all, as the second
TestArray illustrates. Basically I just want to create a list of
objects, where each object may or may not contain more nested objects
as attributes, but I need a generic way to do it that works for any
depth.

#end result: list of objects, with the objects printed as dicts

Code:
parsed = [
    {
        "a":"a",
        "b":"b",
        "c":"c",
        "A_elements":[
            {
                "a":1,
                "b":2,
                "c":3
            },
            {
                "a":1,
                "b":2,
                "c":3
            }
        ],
        "B_elements":[
            {
                "a":1,
                "b":2,
                "c":3,
                "C_elements":[
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    },
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    }
                ]
            }
        ]
    },

    {
        "a":"1",
        "b":"2",
        "c":"3",
    }

]

#this is what I have so far, which works with the 2nd instance, but I can't
figure out the best way to handle the multi-nested objects.

Code:
import re
def test_parser(filename):
    parent_stanza = None
    stanzas = []

    class parentStanza:
        pass

    fo = open(filename)

    for line in fo:
        line = line.strip()
        if re.search("An instance of TestArray", line):
            if parent_stanza:
                stanzas.append(parent_stanza)
            parent_stanza = parentStanza()
        if parent_stanza and "=" in line:
            attr, val = line.split("=")
            setattr(parent_stanza, attr, val)
    else:
        stanzas.append(parent_stanza)
    return stanzas

stanzas = test_parser("test.txt")

import pprint
for stanza in stanzas:
    pprint.pprint(stanza.__dict__)
    n=raw_input("paused")
 

Roy Smith

richard said:
Hi guys i am having a bit of dificulty finding the best approach /
solution to parsing a file into a list of objects / nested objects any
help would be greatly appreciated.

The first question is "Why do you want to do this?" Is this some
pre-existing file format imposed by an external system that you can't
change? Or are you just looking for a generic way to store nested
structures in a file?

If the latter, then I would strongly suggest not rolling your own. Take
a look at json or pickle (or even xml) and adopt one of those.
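As an illustration of that suggestion, the standard-library `json` module already round-trips arbitrarily nested lists and dicts, so no hand-written parser is needed when the format is yours to choose. A minimal sketch; the data below is an illustrative copy of the shape of the desired `parsed` list, not real output.

```python
import json

# Illustrative nested data in the same shape as the desired "parsed" list.
parsed = [
    {
        "a": "a", "b": "b", "c": "c",
        "A_elements": [{"a": 1, "b": 2, "c": 3}, {"d": 1, "e": 2, "f": 3}],
        "B_elements": [
            {"a": 1, "b": 2, "c": 3,
             "C_elements": [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 3}]},
        ],
    },
    {"a": "1", "b": "2", "c": "3"},
]

# Serialise to text (this is what would be written to the file) ...
text = json.dumps(parsed, indent=2)

# ... and read it back: nesting of any depth comes back unchanged.
restored = json.loads(text)
print(restored[0]["B_elements"][0]["C_elements"][1])  # {'a': 1, 'b': 2, 'c': 3}
```

For an actual file, `json.dump(parsed, f)` and `json.load(f)` do the same against an open file object.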
 

Alain Ketterlin

richard said:
Hi guys i am having a bit of dificulty finding the best approach /
solution to parsing a file into a list of objects / nested objects any
help would be greatly appreciated.

#file format to parse .txt
Code:
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
[...]

Below is a piece of code that seems to work on your data. It builds a
raw tree; I leave it to you to adapt it and build the objects you want.
The assumption is that the number of leading blanks faithfully denotes
depth.

As noted in another message, you're probably better off using an
existing syntax (json, python literals, yaml, xml, ...)

-- Alain.

#!/usr/bin/env python

import sys
import re

RE = re.compile("( *)(.*)")
stack = [("-",[])] # tree nodes are: (head,[children])
for line in sys.stdin:
    matches = RE.match(line)
    if len(matches.group(2)) > 0:
        depth = 1 + len(matches.group(1))
        while len(stack) > depth:
            stack[-2][1].append(stack[-1])
            del stack[-1]
            pass
        stack.append( (matches.group(2),[]) )
        pass
    pass
while len(stack) > 1:
    stack[-2][1].append(stack[-1])
    del stack[-1]
    pass

print(stack)
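The same stack algorithm can be wrapped in a function that takes any iterable of lines, which makes it easy to try on a string instead of stdin. A Python 3 sketch; `build_tree` is a hypothetical name. Like the script above, it assumes each nesting level adds exactly one leading space: an indentation jump of two or more spaces would make the next sibling look like a child of the previous line.

```python
import re

# Leading spaces (group 1) give the depth; the rest of the line (group 2) is the node head.
LINE_RE = re.compile(r"( *)(.*)")


def build_tree(lines):
    """Return the top-level nodes; each node is a (head, [children]) tuple."""
    stack = [("-", [])]  # sentinel root at depth 0
    for line in lines:
        match = LINE_RE.match(line.rstrip())  # rstrip drops "\n" / "\r\n"
        indent, text = match.group(1), match.group(2)
        if not text:
            continue  # skip blank lines
        depth = 1 + len(indent)
        # Close every open node at the same or a deeper level.
        while len(stack) > depth:
            node = stack.pop()
            stack[-1][1].append(node)
        stack.append((text, []))
    # Close whatever is still open at the end of the input.
    while len(stack) > 1:
        node = stack.pop()
        stack[-1][1].append(node)
    return stack[0][1]
```

Returning `stack[0][1]` skips the sentinel root, so the result is just the list of top-level "An instance of TestArray" nodes.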
 

richard

richard said:
Hi guys i am having a bit of dificulty finding the best approach /
solution to parsing a file into a list of objects / nested objects any
help would be greatly appreciated.
[...]

thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.
 

Eelco

thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.

Hi Richard,

Despite the fact that it is a preexisting format, it is very close
indeed to valid YAML code.

Writing your own whitespace-aware parser can be a bit of a pain, but
since YAML does this for you, I would argue the cleanest solution
would be to bootstrap that functionality, rather than roll your own
solution or resort to hard-to-maintain regex voodoo.

Here is my solution. As a bonus, it directly constructs a custom
object hierarchy (obviously you would want to expand on this, but the
essentials are there). One caveat: at the moment, the conversion to
YAML relies on the apparent convention that instances never directly
contain other instances, and lists never directly contain lists. This
means all instances are list entries and get a '-' prepended, and this
just works. If this is not a general rule, you'd have to keep track of
an enclosing scope stack and emit dashes based on that. Anyway, the
idea is there, and I believe it to be one worth looking at.

<code>
import yaml

class A(yaml.YAMLObject):
    yaml_tag = u'!A'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'A' + str(self.__dict__)

class B(yaml.YAMLObject):
    yaml_tag = u'!B'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'B' + str(self.__dict__)

class C(yaml.YAMLObject):
    yaml_tag = u'!C'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'C' + str(self.__dict__)

class TestArray(yaml.YAMLObject):
    yaml_tag = u'!TestArray'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'TestArray' + str(self.__dict__)

class myList(yaml.YAMLObject):
    yaml_tag = u'!myList'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'myList' + str(self.__dict__)


data = \
"""
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3
An instance of TestArray
 a=1
 b=2
 c=3
""".strip()

#remove trailing whitespace and the seemingly erroneous colon in line 5; prefix every line with two spaces
lines = ['  '+line.rstrip().rstrip(':') for line in data.split('\n')]


def transform(lines):
    """transform text line by line"""
    for line in lines:
        #regular mapping lines
        if line.find('=') > 0:
            yield line.replace('=', ': ')
        #instance lines
        p = line.find('nstance of')
        if p > 0:
            s = p + 11
            e = line[s:].find(' ')
            if e == -1: e = len(line[s:])
            tag = line[s:s+e]
            whitespace= line.partition(line.lstrip())[0]
            yield whitespace[:-2]+' -'+ ' !'+tag
        #list lines
        p = line.find('List of')
        if p > 0:
            whitespace= line.partition(line.lstrip())[0]
            yield whitespace[:-2]+'  '+ 'myList:'

##transformed = (transform( lines))
##for i,t in enumerate(transformed):
##    print('{:>3}{}'.format(i,t))

transformed = '\n'.join(transform( lines))
print(transformed)

# a non-safe loader such as yaml.Loader is needed to construct the custom !A, !B, ... objects;
# recent PyYAML versions also require the Loader argument to be passed explicitly
res = yaml.load(transformed, Loader=yaml.Loader)
print(res)
print(yaml.dump(res))
</code>
 

richard

richard said:
[...]
thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.

Hi guys, still struggling to get the code that was posted to me on this
forum to work in my favour and get the output in the format shown
above. This is what I have so far. Any help will be greatly
appreciated.

output trying to achieve
parsed = [
    {
        "a":"a",
        "b":"b",
        "c":"c",
        "A_elements":[
            {
                "a":1,
                "b":2,
                "c":3
            },
            {
                "a":1,
                "b":2,
                "c":3
            }
        ],
        "B_elements":[
            {
                "a":1,
                "b":2,
                "c":3,
                "C_elements":[
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    },
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    }
                ]
            }
        ]
    },

    {
        "a":"1",
        "b":"2",
        "c":"3",
    }

]

file format unchangeable

An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3

An instance of TestArray
 a=1
 b=2
 c=3

import re

def test_parser(filename):
    class Stanza:
        def __init__(self, values):
            for attr, val in values:
                setattr(self, attr, val)

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1]))
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])

    fo = open(filename, "r")
    RE = re.compile("( *)(.*)")
    stack = [("-",[])]
    for line in fo:
        matches = RE.match(line)
        if len(matches.group(2)) > 0:
            depth = 1 + len(matches.group(1))
            while len(stack) > depth:
                stack[-2][1].append(build(stack[-1]))
                del stack[-1]
            stack.append( (matches.group(2),[]) )
    while len(stack) > 1:
        stack[-2][1].append(stack[-1])
        del stack[-1]
    return stack

stanzas = test_parser("test.txt")
 

richard

thank you both for your replies. Unfortunately it is a pre-existing
file format imposed by an external system that I can't
change. Thank you for the code snippet.

Hi Richard,

Despite the fact that it is a preexisting format, it is very close
indeed to valid YAML code.

[...]

Hi Eelco, many thanks for the reply/solution; it definitely looks like
a clean way to go about it. However, I don't think installing 3rd-party
libs like yaml on the server is on the cards at the moment.
 

Alain Ketterlin

[I'm leaving the data in the message in case anybody has troubles going
up-thread.]
Hi guys still struggling to get the code that was posted to me on this
forum to work in my favour and get the output in the format shown
above. This is what I have so far. Any help will be greatly
apprectiated.

output trying to achieve
parsed = [
    {
        "a":"a",
        "b":"b",
        "c":"c",
        "A_elements":[
            {
                "a":1,
                "b":2,
                "c":3
            },
            {
                "a":1,
                "b":2,
                "c":3
            }
        ],
        "B_elements":[
            {
                "a":1,
                "b":2,
                "c":3,
                "C_elements":[
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    },
                    {
                        "a":1,
                        "b":2,
                        "c":3
                    }
                ]
            }
        ]
    },

    {
        "a":"1",
        "b":"2",
        "c":"3",
    }

]

file format unchangeable

An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3

An instance of TestArray
 a=1
 b=2
 c=3

def test_parser(filename):
    class Stanza:
        def __init__(self, values):
            for attr, val in values:
                setattr(self, attr, val)

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            match = re.search("Instance of (.+) element", couple[0])
            return ("attr_%s" % match.group(1),Stanza(couple[1]))
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])

You forgot one case:

    def build(couple):
        if "=" in couple[0]:
            attr, val = couple[0].split("=")
            return attr,val
        elif "Instance of" in couple[0]:
            #match = re.search("Instance of (.+) element", couple[0])
            #return ("attr_%s" % match.group(1),Stanza(couple[1]))
            return dict(couple[1])
        elif "An instance of" in couple[0]: # you forgot that case
            return dict(couple[1])
        elif "List of" in couple[0]:
            match = re.search("List of \d (.+) elements", couple[0])
            return ("%s_elements" % match.group(1),couple[1])
        else:
            pass # put a test here
    fo = open(filename, "r")
    RE = re.compile("( *)(.*)")
    stack = [("-",[])]
    for line in fo:
        matches = RE.match(line)
        if len(matches.group(2)) > 0:
            depth = 1 + len(matches.group(1))
            while len(stack) > depth:
                stack[-2][1].append(build(stack[-1]))
                del stack[-1]
            stack.append( (matches.group(2),[]) )
    while len(stack) > 1:
        stack[-2][1].append(stack[-1])

Change this to:

        stack[-2][1].append(build(stack[-1])) # call build() here also
        del stack[-1]
    return stack

Actually the first and only element of stack is a container: all you
need is the second element of the only tuple in stack, so:

    return stack[0][1]

and this is your list. If you need it pretty printed, you'll have to
work the hierarchy.

-- Alain.
 

richard

[...]

Hi Alain, thanks for the reply. With regards to the missing case "An
instance of", I'm not sure where/how that is working, as the case I put
in originally, "Instance of", is in the file and is being handled in the
previous case. Also, when running the final solution I'm getting a list
of [None, None] as the final stack? Just busy debugging it to see
what's going wrong. But sorry, I should have been clearer with regards to
the format mentioned above. The objects are being printed out as dicts,
so where you put in

elif "An Instance of" in couple[0]:
    return dict(couple[1])

should it still be?
elif "Instance of" in couple[0]:
    match = re.search("Instance of (.+) element", couple[0])
    return ("attr_%s" % match.group(1),Stanza(couple[1])) # instantiating new stanza object and setting attributes.
 

richard

[...]

Hi Alain, thanks for the reply. I amended the code and am just busy
debugging it, but the stack I get back just returns [None, None]. Also, I
should have been clearer when I mentioned the format above: the dicts
are actually objects instantiated from classes and just printed out as
obj.__dict__ for representation purposes. So where you have
replaced the following, I presume this was because of my format
confusion. Thanks
elif "Instance of" in couple[0]:
    match = re.search("Instance of (.+) element", couple[0])
    return ("attr_%s" % match.group(1),Stanza(couple[1])) #instantiating new object and setting attributes
with
elif "Instance of" in couple[0]:
    #match = re.search("Instance of (.+) element", couple[0])
    #return ("attr_%s" % match.group(1),Stanza(couple[1]))
    return dict(couple[1])
 

richard

[...]

Sorry, silly mistake made with "An instance" and "Instance of"; code
amended below with the fix:

if "=" in couple[0]:
    attr, val = couple[0].split("=")
    return attr,val
elif re.search("Instance of .+",couple[0]):
    #match = re.search("Instance of (.+) element", couple[0])
    #return ("attr_%s" % match.group(1),Stanza(couple[1]))
    return dict(couple[1])
elif re.search("An instance of .+", couple[0]):
    return dict(couple[1])
elif "List of" in couple[0]:
    match = re.search("List of \d (.+) elements", couple[0])
    return ("%s_elements" % match.group(1),couple[1])
else:
    pass
 

Alain Ketterlin

richard said:
[...]

Hi Alain thanks for the reply. With regards to the missing case "An
Instance of" im not sure where/ how that is working as the case i put
in originally "Instance of" is in the file and been handled in the
previous case.

Both cases are different in your example above. Top level elements are
labeled "An instance ...", whereas "inner" instances are labeled
"Instance of ...".
Also when running the final solution im getting a list of [None, None]
as the final stack?

There's only one way this can happen: by falling through to the last
case of build(). Check the regexps etc. again.
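The reason is that a Python function which reaches its end without a `return` statement returns `None`, so any line matching none of the `if`/`elif` tests in build() is stored in the tree as `None`. One plausible trigger, consistent with the capitalised "An Instance of" quoted in this thread, is a case-sensitive mismatch. A tiny illustration; `classify` and `classify_strict` are hypothetical cut-down stand-ins for build():

```python
def classify(head):
    # Hypothetical stand-in for build(); only the header tests matter here.
    if "=" in head:
        return "attribute"
    elif "Instance of" in head:
        return "inner instance"
    elif "An Instance of" in head:  # capital "I": never matches "An instance of ..."
        return "top-level instance"
    # No branch matched: Python implicitly returns None here.


def classify_strict(head):
    # Same idea, but fail loudly instead of silently returning None.
    if "=" in head:
        return "attribute"
    elif head.startswith("Instance of"):
        return "inner instance"
    elif head.startswith("An instance of"):
        return "top-level instance"
    raise ValueError("unrecognised line: %r" % head)


print(classify("An instance of TestArray"))         # None
print(classify_strict("An instance of TestArray"))  # top-level instance
```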
just busy debugging it to see whats going wrong. But sorry should have
been clearer with regards to the format mentioned above. The objects
are been printed out as dicts so where you put in

elif "An Instance of" in couple[0]:
    return dict(couple[1])

should still be ?
elif "Instance of" in couple[0]:
    match = re.search("Instance of (.+) element", couple[0])
    return ("attr_%s" % match.group(1),Stanza(couple[1])) # instantiating new stanza object and setting attributes.

Your last "Instance of..." case is correct, but "An instance..." is
different, because there's no containing object, so it's probably more
like: return Stanza(couple[1]).

-- Alain.
 

richard

[...]

A big thank you to everyone who has helped me tackle/shed light on
this problem; it is working great. Much appreciated.
 
