Ok, I'm quite new to Python

Michael

But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

//Text File:

*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
*MATERIAL_COUNT 0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
*MESH_FACE_LIST {
*MESH_FACE 1 2 3
}
}
}
/* ... More GEOMOBJECTS ...*/


but I have no idea what the best way to do this is?
Any thoughts??

Many Thanks

Mike
 

Mike C. Fletcher

Michael said:
But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

Well, if you have (or know how to write) an EBNF grammar, SimpleParse
would likely be ideal for this. See the VRML97 sample grammar in
SimpleParse (or even the VRML97 loader in OpenGLContext for a more
real-world example).

Primary value of SimpleParse for this kind of thing is that it's fast
compared to most other Python parser generators while still being easy
to use. If you're loading largish (say 10s of MBs) models the speed can
be quite useful. (It was originally written explicitly to produce a
fast VRML97 parser (btw)).

If you're loading *huge* models (100s of MBs), you may need to go for a
C/C++ extension to directly convert from an on-disk buffer to objects,
but try it with the Python versions first. Even with 100s of MBs, you
can write SimpleParse grammars fast enough to parse them quite quickly;
it just requires a little more care with how you structure your productions.

Michael said:
but I have no idea what the best way to do this is?
Any thoughts??

Mostly it's just a matter of what you feel comfortable with. There's
quite a range of Python text-processing tools available. See the text
"Text Processing in Python" (available in both dead-tree and online
format) for extensive treatment of various approaches, from writing your
own recursive descent parsers through using one of the parser-generators.

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
 

Terry Reedy

Michael said:
But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

A more useful-to-the-reader and possibly more fruitful-to-you subject line
would have been something like 'Need help parsing text files'.

tjr
 

Jeff Epler

Well, here's an sre-based scanner and recursive-descent parser based on
my understanding of the grammar you gave.

Using a real scanner and parser may or may not be a better choice, but
it's not hard in Python to create a scanner and write a
recursive-descent parser for a simple grammar.

Jeff

------------------------------------------------------------------------
# This code is in the public domain
import re
import sys


class PeekableIterator:
    """Wrap an iterable and allow one item of lookahead."""
    def __init__(self, s):
        self.s = iter(s)
        self._peek = []

    def atend(self):
        try:
            self.peek()
        except StopIteration:
            return True
        return False

    def peek(self):
        if not self._peek:
            self._peek = [next(self.s)]
        return self._peek[0]

    def __next__(self):
        if self._peek:
            return self._peek.pop()
        return next(self.s)

    def __iter__(self):
        return self


def tok(scanner, s):
    return s


def num(scanner, s):
    try:
        return int(s)
    except ValueError:
        return float(s)


# re.Scanner is the sre-based scanner (the old `sre` module is now `re`).
# Rules whose action is None (comments, whitespace) are skipped.
scanner = re.Scanner([
    (r"/\*(?:[^*]|[*]+[^/])*\*/", None),
    (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),
    (r"//.*$", None),
    (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),
    (r"[{}]", tok),
    (r'"(?:[^\\"]|\\.)*"', tok),
    (r"[ \t\r\n]*", None),
], re.MULTILINE)


class Node:
    def __init__(self, name):
        self.name = name
        self.contents = []

    def add(self, v):
        self.contents.append(v)

    def __str__(self):
        sc = " ".join(map(repr, self.contents))
        return "<%s: %s>" % (self.name, sc)

    __repr__ = __str__


def parse_nodes(t):
    # Parse the nodes of a "{ ... }" block; the "{" is already consumed.
    n = []
    while True:
        if t.peek() == "}":
            next(t)
            break
        n.append(parse_node(t))
    return n


def parse_contents(n, t):
    if t.atend():
        return
    if t.peek() == "{":
        next(t)
        for n1 in parse_nodes(t):
            n.add(n1)
    while True:
        if t.atend():
            break
        p = t.peek()
        if p == "}":
            break
        # A "*NAME" token starts the next node, so stop collecting values.
        if isinstance(p, str) and p.startswith("*"):
            break
        n.add(next(t))


def parse_node(t):
    n = Node(next(t))
    parse_contents(n, t)
    return n


def parse_top(t):
    while not t.atend():
        yield parse_node(t)


def main(source=sys.stdin):
    tokens, rest = scanner.scan(source.read())
    if rest:
        print("Garbage at end of file:", repr(rest))
    for n in parse_top(PeekableIterator(tokens)):
        print(n)


if __name__ == '__main__':
    main()
------------------------------------------------------------------------
$ python michael.py < michael.txt # and reindented for show
<*Version: 200>
<*SCENE: <AMBIENT_COLOUR: 0.0 0.0 0.0>>
<*MATERIAL_LIST: <*MATERIAL_COUNT: 0>>
<*GEOMOBJECT:
    <*NODE_NAME: '"obj1"'>
    <*MESH:
        <*MESH_VERTEX_LIST:
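
Once you have a tree like that, pulling values into "relevant fields" is mostly a matter of walking it. Here is a rough sketch, not part of the parser itself: the helper names find_all and mesh_vertices are made up for illustration, the Node class is a small stand-in with the same name/contents shape as the one in the parser, and I am assuming each *MESH_VERTEX line reads "index x y z".

```python
class Node:
    # Stand-in with the same shape as the parser's Node:
    # a name plus a list of child Nodes and plain values.
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


def find_all(node, name):
    """Yield the node itself and every descendant Node with the given name."""
    if node.name == name:
        yield node
    for child in node.contents:
        if isinstance(child, Node):
            yield from find_all(child, name)


def mesh_vertices(geomobject):
    """Map vertex index -> (x, y, z); assumes '*MESH_VERTEX index x y z'."""
    return {v.contents[0]: tuple(v.contents[1:4])
            for v in find_all(geomobject, "*MESH_VERTEX")}


# Hand-built tree matching the sample *GEOMOBJECT (hypothetical test data).
obj = Node("*GEOMOBJECT", [
    Node("*NODE_NAME", ['"obj1"']),
    Node("*MESH", [
        Node("*MESH_VERTEX_LIST", [
            Node("*MESH_VERTEX", [0, 0, 0, 0]),
            Node("*MESH_VERTEX", [1, 0, 1, 2]),
        ]),
    ]),
])

vertices = mesh_vertices(obj)
print(vertices)  # {0: (0, 0, 0), 1: (0, 1, 2)}
```

The same find_all walk could presumably be reused for *MESH_FACE or *NODE_NAME; which fields you want is up to you.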


Bengt Richter

Michael said:
But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
                     ^^^^^^^^^^^^^^^^^[1]  ^^^^^[2]  ^^^^^^^^^^^[3] ^^^^^^^^-
fields:
-^^^^^[4]
[1] ok
[2] where?
[3] which
[4] relevant to what?
[5] ;-)
//Text File:

*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
*MATERIAL_COUNT 0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
*MESH_FACE_LIST {
*MESH_FACE 1 2 3
}
}
}
/* ... More GEOMOBJECTS ...*/


but I have no idea what the best way to do this is?
                                        ^^^^^^^[1]
[1] do what?
Any thoughts??

I'd probably start with stripping out the tokens with a regular expression
and then process the list to build a tree that you can then walk. To start:

>>> data = """
... *Version 200
... *SCENE {
... AMBIENT_COLOUR 0.0 0.0 0.0
... }
...
... *MATERIAL_LIST{
... *MATERIAL_COUNT 0
... }
... *GEOMOBJECT {
... *NODE_NAME "obj1"
... *MESH {
... *MESH_VERTEX_LIST{
... *MESH_VERTEX 0 0 0 0
... *MESH_VERTEX 1 0 1 2
... }
... *MESH_FACE_LIST {
... *MESH_FACE 1 2 3
... }
... }
... }
... """
>>> import re
>>> rxs = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')
>>> tokens = rxs.findall(data)
>>> tokens
['*Version', '200', '*SCENE', '{', 'AMBIENT_COLOUR', '0.0', '0.0', '0.0', '}',
 '*MATERIAL_LIST', '{', '*MATERIAL_COUNT', '0', '}', '*GEOMOBJECT', '{',
 '*NODE_NAME', '"obj1"', '*MESH', '{', '*MESH_VERTEX_LIST', '{',
 '*MESH_VERTEX', '0', '0', '0', '0', '*MESH_VERTEX', '1', '0', '1', '2', '}',
 '*MESH_FACE_LIST', '{', '*MESH_FACE', '1', '2', '3', '}', '}', '}']

I would think that isolates the basic info of interest. It should not be hard to make a tree or
extract what suits your purposes, but I'm not going to guess what those are ;-)
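
For what it's worth, one possible way to turn such a flat token list into a tree is sketched below. This is only a sketch under my own assumptions: the names build_tree and is_keyword are invented for the example, every statement is taken to be a keyword followed by zero or more plain values and an optional { ... } block, and values stay as strings.

```python
import re

TOKEN_RX = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')


def is_keyword(tok):
    # Keywords start with '*', a letter, or '_';
    # numbers and quoted strings count as plain values.
    return tok[0] == "*" or tok[0] == "_" or tok[0].isalpha()


def build_tree(tokens, pos=0):
    """Return (nodes, next_pos); each node is (keyword, values, children)."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != "}":
        keyword = tokens[pos]
        pos += 1
        values = []
        # Collect the plain values that follow the keyword.
        while (pos < len(tokens)
               and tokens[pos] not in ("{", "}")
               and not is_keyword(tokens[pos])):
            values.append(tokens[pos])
            pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = build_tree(tokens, pos + 1)
            pos += 1  # step over the closing '}'
        nodes.append((keyword, values, children))
    return nodes, pos


data = """
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
}
}
"""
tree, _ = build_tree(TOKEN_RX.findall(data))
name, values, children = tree[0]
print(name, [child[0] for child in children])
```

Each node is a plain (keyword, values, children) tuple, so walking the result is just a loop over nested lists. There is no error handling for unbalanced braces here.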

Regards,
Bengt Richter
 
