Ok, I'm quite new to Python

Discussion in 'Python' started by Michael, Oct 13, 2004.

  1. Michael

    Michael Guest

    But I'm a good C++ programmer.

    What I want to do is parse a text file and store the information in relevant
    fields:

    //Text File:

    *Version 200
    *SCENE {
    AMBIENT_COLOUR 0.0 0.0 0.0
    }

    *MATERIAL_LIST{
    *MATERIAL_COUNT 0
    }
    *GEOMOBJECT {
    *NODE_NAME "obj1"
    *MESH {
    *MESH_VERTEX_LIST{
    *MESH_VERTEX 0 0 0 0
    *MESH_VERTEX 1 0 1 2
    }
    *MESH_FACE_LIST {
    *MESH_FACE 1 2 3
    }
    }
    }
    /* ... More GEOMOBJECTS ...*/


    but I have no idea what the best way to do this is.
    Any thoughts?

    Many Thanks

    Mike
     
    Michael, Oct 13, 2004
    #1

  2. Michael wrote:

    >But i'm a good c++ programmer.
    >
    >What i want to do is parse a text file and store the information in relevant
    >fields:
    >
    >

    Well, if you have (or know how to write) an EBNF grammar, SimpleParse
    would likely be ideal for this. See the VRML97 sample grammar in
    SimpleParse (or even the VRML97 loader in OpenGLContext for a more
    real-world example).

    The primary value of SimpleParse for this kind of thing is that it's fast
    compared to most other Python parser generators while still being easy
    to use. If you're loading largish models (say, tens of MBs), the speed can
    be quite useful. (Incidentally, it was originally written specifically to
    produce a fast VRML97 parser.)

    If you're loading *huge* models (hundreds of MBs), you may need to go for a
    C/C++ extension that converts directly from an on-disk buffer to objects,
    but try the Python versions first. Even with hundreds of MBs, you can
    write SimpleParse grammars that parse them quite quickly; it just
    requires a little more care in how you structure your productions.

    >but I have no idea what the best way to do this is?
    >Any thoughts??
    >
    >

    Mostly it's just a matter of what you feel comfortable with. There's
    quite a range of Python text-processing tools available. See the text
    "Text Processing in Python" (available in both dead-tree and online
    format) for extensive treatment of various approaches, from writing your
    own recursive descent parsers through using one of the parser-generators.
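To give a feel for the hand-written recursive-descent end of that range, here is a rough Python 3 sketch for the sample file. It assumes (the thread never defines the format) that names are identifiers optionally prefixed with `*`, values are numbers or double-quoted strings, blocks are `{ ... }`, and comments are `/* ... */` or `//`. The function names are illustrative only, and the comment stripping is naive: it would also remove `//` appearing inside a quoted string.

```python
import re

# Assumed token classes: braces, "quoted strings", names (optionally *-prefixed),
# and numbers such as 200, 0.0 or -1.5.
TOKEN_RE = re.compile(r'[{}]|"[^"]*"|\*?[A-Za-z_]\w*|-?\d*\.?\d+')


def tokenize(text):
    text = re.sub(r'/\*.*?\*/', ' ', text, flags=re.DOTALL)  # drop /* ... */
    text = re.sub(r'//[^\n]*', ' ', text)                    # drop // to end of line
    return TOKEN_RE.findall(text)


def is_name(tok):
    return tok[0] == '*' or tok[0] == '_' or tok[0].isalpha()


def convert(tok):
    """Turn a value token into a str (quotes removed), int or float."""
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        return float(tok)


def parse_block(tokens, pos):
    """Parse entries from tokens[pos:] up to a '}' or the end of input.

    Each entry is a (name, values, children) tuple; returns (entries, pos),
    where pos points at the '}' that stopped the loop (or past the end).
    """
    entries = []
    while pos < len(tokens) and tokens[pos] != '}':
        name = tokens[pos]
        if not is_name(name):
            raise SyntaxError('expected a name, got %r' % name)
        pos += 1
        values = []
        # Plain values run until the next name, brace, or end of input.
        while (pos < len(tokens) and tokens[pos] not in ('{', '}')
               and not is_name(tokens[pos])):
            values.append(convert(tokens[pos]))
            pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == '{':
            children, pos = parse_block(tokens, pos + 1)  # recurse one level down
            if pos >= len(tokens):
                raise SyntaxError("missing '}'")
            pos += 1  # step over the closing '}'
        entries.append((name, values, children))
    return entries, pos


def parse(text):
    tokens = tokenize(text)
    entries, pos = parse_block(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected '}'")
    return entries
```

On the sample above, parse() returns a list of (name, values, children) tuples whose first entry is ('*Version', [200], []). Each parse_block() call handles one level of braces, which is the essence of recursive descent; a parser generator such as SimpleParse would instead build the equivalent machinery from a grammar.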

    Good luck,
    Mike

    ________________________________________________
    Mike C. Fletcher
    Designer, VR Plumber, Coder
    http://www.vrplumber.com
    http://blog.vrplumber.com
     
    Mike C. Fletcher, Oct 13, 2004
    #2

  3. Michael

    Terry Reedy Guest

    "Michael" <> wrote in message
    news:ckhrcs$g92$...
    > But i'm a good c++ programmer.
    >
    > What i want to do is parse a text file and store the information in
    > relevant
    > fields:


    A more useful-to-the-reader and possibly more fruitful-to-you subject line
    would have been something like 'Need help parsing text files'.

    tjr
     
    Terry Reedy, Oct 13, 2004
    #3
  4. Michael

    Jeff Epler Guest

    Well, here's an sre-based scanner and recursive-descent parser based on
    my understanding of the grammar you gave.

    Using a real scanner and parser may or may not be a better choice, but
    it's not hard in Python to create a scanner and write a
    recursive-descent parser for a simple grammar.

    Jeff

    ------------------------------------------------------------------------
    # This code is in the public domain
    import re
    import sys


    class PeekableIterator:
        """Wrap an iterable so the next item can be inspected without consuming it."""

        def __init__(self, s):
            self.s = iter(s)
            self._peek = []

        def atend(self):
            try:
                self.peek()
            except StopIteration:
                return True
            return False

        def peek(self):
            if not self._peek:
                self._peek = [next(self.s)]
            return self._peek[0]

        def __next__(self):
            if self._peek:
                return self._peek.pop()
            return next(self.s)

        def __iter__(self):
            return self


    def tok(scanner, s):
        return s

    def num(scanner, s):
        try:
            return int(s)
        except ValueError:
            return float(s)

    # re.Scanner (undocumented, but part of the standard re module) tries the
    # patterns as ordered alternatives at each position and calls the action
    # on the matched text; an action of None discards the match.
    scanner = re.Scanner([
        (r"/\*(?:[^*]|[*]+[^/])*\*/", None),   # /* block comments */
        (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),   # names, optionally *-prefixed
        (r"//.*$", None),                      # // line comments
        (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),    # integers and floats
        (r"[{}]", tok),                        # braces
        (r'"(?:[^\\"]|\\.)*"', tok),           # quoted strings
        (r"[ \t\r\n]+", None),                 # whitespace
    ], re.MULTILINE)


    class Node:
        """A named node holding child Nodes and/or plain values."""

        def __init__(self, name):
            self.name = name
            self.contents = []

        def add(self, v):
            self.contents.append(v)

        def __str__(self):
            sc = " ".join(map(repr, self.contents))
            return "<%s: %s>" % (self.name, sc)

        __repr__ = __str__


    def parse_nodes(t):
        """Parse child nodes until the closing '}', which is consumed."""
        n = []
        while True:
            if t.peek() == "}":
                next(t)
                break
            n.append(parse_node(t))
        return n

    def parse_contents(n, t):
        """Attach an optional { ... } block, then any plain values, to node n."""
        if t.atend():
            return
        if t.peek() == "{":
            next(t)
            for n1 in parse_nodes(t):
                n.add(n1)
        while not t.atend():
            p = t.peek()
            # Stop at the end of the enclosing block or at the next *NAME.
            if p == "}":
                break
            if isinstance(p, str) and p.startswith("*"):
                break
            n.add(next(t))

    def parse_node(t):
        n = Node(next(t))
        parse_contents(n, t)
        return n

    def parse_top(t):
        while not t.atend():
            yield parse_node(t)


    def main(source=sys.stdin):
        tokens, rest = scanner.scan(source.read())
        if rest:
            print("Garbage at end of file:", repr(rest))
        for n in parse_top(PeekableIterator(tokens)):
            print(n)

    if __name__ == '__main__':
        main()
    ------------------------------------------------------------------------
    $ python michael.py < michael.txt    # and reindented for show
    <*Version: 200>
    <*SCENE: <AMBIENT_COLOUR: 0.0 0.0 0.0>>
    <*MATERIAL_LIST: <*MATERIAL_COUNT: 0>>
    <*GEOMOBJECT:
        <*NODE_NAME: '"obj1"'>
        <*MESH:
            <*MESH_VERTEX_LIST:
                <*MESH_VERTEX: 0 0 0 0>
                <*MESH_VERTEX: 1 0 1 2>
            >
            <*MESH_FACE_LIST: <*MESH_FACE: 1 2 3>>
        >
    >
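Once a tree like this exists, storing the information in relevant fields is mostly a tree walk. The sketch below relies only on the `.name` and `.contents` attributes of the Node objects printed above; the `Node` class in it is a hypothetical stand-in that takes its contents as constructor arguments, so the example runs on its own. It also assumes, since the thread never says, that `*MESH_VERTEX` holds an index followed by x, y, z, and that `*MESH_FACE` holds vertex indices. `Mesh`, `children` and `mesh_from_geomobject` are illustrative names.

```python
from dataclasses import dataclass, field


class Node:
    """Hypothetical stand-in for the Node class above (a name plus a contents list)."""

    def __init__(self, name, *contents):
        self.name = name
        self.contents = list(contents)


@dataclass
class Mesh:
    name: str = ""
    vertices: dict = field(default_factory=dict)  # assumed: index -> (x, y, z)
    faces: list = field(default_factory=list)     # assumed: tuples of vertex indices


def children(node, name):
    """Yield the direct children of `node` whose .name equals `name`.

    Plain values (ints, floats, strings) have no .name, so they are skipped.
    """
    for item in node.contents:
        if getattr(item, "name", None) == name:
            yield item


def mesh_from_geomobject(geom):
    """Copy the fields of one *GEOMOBJECT node into a Mesh."""
    mesh = Mesh()
    for n in children(geom, "*NODE_NAME"):
        mesh.name = n.contents[0].strip('"')  # the scanner keeps the quotes
    for m in children(geom, "*MESH"):
        for vlist in children(m, "*MESH_VERTEX_LIST"):
            for v in children(vlist, "*MESH_VERTEX"):
                index, *xyz = v.contents
                mesh.vertices[index] = tuple(xyz)
        for flist in children(m, "*MESH_FACE_LIST"):
            for f in children(flist, "*MESH_FACE"):
                mesh.faces.append(tuple(f.contents))
    return mesh
```

With the parser above, the same function could be applied to each `*GEOMOBJECT` node yielded by parse_top(). For the sample object it gives Mesh(name='obj1', vertices={0: (0, 0, 0), 1: (0, 1, 2)}, faces=[(1, 2, 3)]).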


     
    Jeff Epler, Oct 13, 2004
    #4
  5. On Wed, 13 Oct 2004 00:03:41 +0000 (UTC), "Michael" <> wrote:

    >But i'm a good c++ programmer.
    >
    >What i want to do is parse a text file and store the information in relevant
     ^^^^^^^^^^^^^^^^^[1]                       ^^^^^[2]  ^^^^^^^^^^^[3] ^^^^^^^^-
    >fields:
    -^^^^^^[4]
    [1] ok
    [2] where?
    [3] which
    [4] relevant to what?
    [5] ;-)
    >
    >//Text File:
    >
    >*Version 200
    >*SCENE {
    > AMBIENT_COLOUR 0.0 0.0 0.0
    > }
    >
    >*MATERIAL_LIST{
    > *MATERIAL_COUNT 0
    > }
    >*GEOMOBJECT {
    > *NODE_NAME "obj1"
    > *MESH {
    > *MESH_VERTEX_LIST{
    > *MESH_VERTEX 0 0 0 0
    > *MESH_VERTEX 1 0 1 2
    > }
    > *MESH_FACE_LIST {
    > *MESH_FACE 1 2 3
    > }
    > }
    > }
    >/* ... More GEOMOBJECTS ...*/
    >
    >
    >but I have no idea what the best way to do this is?
                                                 ^^^^^^^[1]
    [1] do what?
    >Any thoughts??
    >

    I'd probably start by stripping out the tokens with a regular expression,
    then process the list to build a tree that you can then walk. To start:

    >>> data = """\
    ... *Version 200
    ... *SCENE {
    ... AMBIENT_COLOUR 0.0 0.0 0.0
    ... }
    ...
    ... *MATERIAL_LIST{
    ... *MATERIAL_COUNT 0
    ... }
    ... *GEOMOBJECT {
    ... *NODE_NAME "obj1"
    ... *MESH {
    ... *MESH_VERTEX_LIST{
    ... *MESH_VERTEX 0 0 0 0
    ... *MESH_VERTEX 1 0 1 2
    ... }
    ... *MESH_FACE_LIST {
    ... *MESH_FACE 1 2 3
    ... }
    ... }
    ... }
    ... """

    >>> import re
    >>> rxs = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')
    >>> tokens = rxs.findall(data)
    >>> tokens
    ['*Version', '200', '*SCENE', '{', 'AMBIENT_COLOUR', '0.0', '0.0', '0.0', '}',
     '*MATERIAL_LIST', '{', '*MATERIAL_COUNT', '0', '}', '*GEOMOBJECT', '{',
     '*NODE_NAME', '"obj1"', '*MESH', '{', '*MESH_VERTEX_LIST', '{',
     '*MESH_VERTEX', '0', '0', '0', '0', '*MESH_VERTEX', '1', '0', '1', '2', '}',
     '*MESH_FACE_LIST', '{', '*MESH_FACE', '1', '2', '3', '}', '}', '}']

    I would think that isolates the basic info of interest. It should not be hard to build a
    tree from it or extract what suits your purposes, but I'm not going to guess what those are ;-)
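Following that suggestion, one way to turn the flat token list into a tree is a single pass with an explicit stack (a sketch, not code from the thread): a name starts a new node in the innermost open block, `{` opens a block for the most recent node, `}` closes it, and anything else is a value of the most recent node. Values stay as strings here; the dict layout ('name', 'values', 'children') and the helper names are assumptions for illustration. The tokenizing regex is the one from the session above.

```python
import re

# The tokenizing regex from the interactive session above.
TOKEN_RE = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')


def new_node(name):
    return {'name': name, 'values': [], 'children': []}


def build_tree(tokens):
    """Build nested dicts from a flat token list using an explicit stack."""
    root = new_node(None)
    stack = [root]   # stack[-1] is the node whose { ... } block is open
    last = root      # most recently created (or closed) node; values attach here
    for tok in tokens:
        if tok == '{':
            stack.append(last)
        elif tok == '}':
            if len(stack) == 1:
                raise SyntaxError("unbalanced '}'")
            last = stack.pop()
        elif tok[0] == '*' or tok[0] == '_' or tok[0].isalpha():
            last = new_node(tok)
            stack[-1]['children'].append(last)
        else:
            last['values'].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing '}'")
    return root['children']


def walk(nodes, depth=0):
    """Yield (depth, node) pairs in document order."""
    for node in nodes:
        yield depth, node
        yield from walk(node['children'], depth + 1)
```

For the data above, `[n['values'] for _, n in walk(build_tree(tokens)) if n['name'] == '*MESH_VERTEX']` gives `[['0', '0', '0', '0'], ['1', '0', '1', '2']]`. Avoiding recursion while building keeps very deeply nested input from hitting Python's recursion limit, although walk() itself still recurses.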

    Regards,
    Bengt Richter
     
    Bengt Richter, Oct 13, 2004
    #5

Similar Threads
  1. Simon Harvey
    Replies:
    3
    Views:
    609
    Kevin Spencer
    Aug 6, 2004
  2. Eskimo

    C# Databinder.Eval didn't work quite right...

    Eskimo, Nov 5, 2004, in forum: ASP .Net
    Replies:
    3
    Views:
    3,476
    Scott Allen
    Nov 7, 2004
  3. Raghu Raman

    Panel control is quite enough ?

    Raghu Raman, Dec 5, 2004, in forum: ASP .Net
    Replies:
    2
    Views:
    340
    Raghu Raman
    Dec 6, 2004
  4. newbie
    Replies:
    0
    Views:
    315
    newbie
    Sep 8, 2005
  5. Hans Baumann
    Replies:
    0
    Views:
    2,021
    Hans Baumann
    Feb 6, 2006