Newbie programmer question: How do parsers work? (Python examples?)

bio_enthusiast

I was wondering exactly how you create a parser. I'm learning
Python and have recently come across this material. I'm interested
in the method or art of writing a parser.

If anyone has some Python code to post for an abstract parser, or links
to some informative tutorials, that would be great.
 
Neil Cerutti

Start with the comp.compilers FAQ.

http://compilers.iecc.com/faq.txt

For a treatment aimed at non-computer scientists, check out Jack
Crenshaw's Let's Build a Compiler series. The sample code is in
Pascal, but translating it to Python is an OK way to learn Python
and get a taste of writing several different recursive-descent
parsers.

http://compilers.iecc.com/crenshaw/

You don't really need to know Pascal to read the simple, small
functions, as long as you have experience in other similar
languages.
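To give a taste of the technique, here is a minimal recursive-descent sketch in Python. It is not a translation of Crenshaw's Pascal code: the grammar (integer arithmetic with +, -, *, / and parentheses) and every name in it (tokenize, Parser, ParseError, evaluate) are hypothetical choices for illustration. Each grammar rule becomes one method, and each method calls the methods for the rules beneath it, which is what makes the descent "recursive".

```python
import re

# Hypothetical grammar for this sketch, one Parser method per rule:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor

TOKEN_RE = re.compile(r'\s*(\d+|[-+*/()])')


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


def tokenize(text):
    """Split text into integer literals and single-character operators."""
    text = text.rstrip()  # drop trailing whitespace so the loop ends cleanly
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)  # skip spaces, then read one token
        if not match:
            raise ParseError(f'unexpected input at position {pos}')
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the current token without consuming it (None at end)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the current token (None at end)."""
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek() is not None:  # leftover tokens mean malformed input
            raise ParseError(f'unexpected token {self.peek()!r}')
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ('+', '-'):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == '+' else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ('*', '/'):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == '*' else value / rhs
        return value

    def factor(self):
        token = self.advance()
        if token == '(':
            value = self.expr()
            if self.advance() != ')':
                raise ParseError("expected ')'")
            return value
        if token == '-':  # unary minus
            return -self.factor()
        if token is not None and token.isdigit():
            return int(token)
        raise ParseError(f'unexpected token {token!r}')


def evaluate(text):
    return Parser(text).parse()


print(evaluate('2 + 3 * (4 - 1)'))  # prints 11
```

Operator precedence falls out of the call structure: term() binds * and / more tightly than expr() binds + and -. Note that / here is Python's true division, so evaluate('7 / 2') gives 3.5.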
 
Brian Mills

You have a lot of choices with this sort of thing. What you'd use
depends largely on what sorts of files/input you'll be parsing.

For example, a common machine-friendly data format is the
comma-separated file. These, or really any file that uses a
character-based field separator (including newline characters), are
usually best read with something like str.split(separator), which
returns a list of the fields in the string you give it. Example:
>>> line = 'Brian,student,California,555-0127'
>>> line.split(',')
['Brian', 'student', 'California', '555-0127']

It is more helpful if the file has a header line:
>>> data = 'Name,Occupation,Location,Phone\n' + \
...        'Brian,student,California,555-0127\n' + \
...        'Ann,carpenter,Georgia,555-3825'
>>> entries = data.split('\n')
>>> entries
['Name,Occupation,Location,Phone', 'Brian,student,California,555-0127', 'Ann,carpenter,Georgia,555-3825']
>>> header = entries[0].split(',')
>>> entries = entries[1:]
>>> for entry in entries:
...     tokens = entry.split(',')
...     # do whatever you need with tokens here
...
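For real comma-separated files, a plain split(',') breaks as soon as a field contains a quoted comma. A minimal sketch using the standard library's csv module instead: csv.DictReader reads the header line for you and returns each row keyed by column name. The helper name parse_records and the third data row (with a quoted comma in the name) are hypothetical, invented for illustration.

```python
import csv
import io


def parse_records(text):
    """Parse CSV text with a header line into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))  # first line supplies the keys
    return list(reader)


# The last row is invented to show a quoted field containing a comma.
data = ('Name,Occupation,Location,Phone\n'
        'Brian,student,California,555-0127\n'
        'Ann,carpenter,Georgia,555-3825\n'
        '"Doe, Jane",student,Oregon,555-0100\n')

for record in parse_records(data):
    print(record['Name'], '->', record['Occupation'])
```

For the last row, record['Name'] is 'Doe, Jane', where split(',') would have produced five fields instead of four.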

A more powerful tool is the regular expression engine, which you'll be
using quite a lot if you get into heavy text parsing. Some people
describe regular expressions as a mini-language of their own, and they
are by no means Python-specific: Perl, Java, various Unix shells, and
others all have roughly equivalent facilities.

Python's regular expression engine is very object-oriented. As a simple
primer: you first import the re module, make a Pattern object with
re.compile(), and then call one of the Pattern's several matching
methods (such as match(), search(), or findall()) on a line. A common
example is reading the version of the installed Globus software, here
'4.0.2', from a filename.
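A minimal sketch of those steps, with two assumptions: the filename is the first path in the list further down, and the pattern (a hypothetical choice) is three runs of digits separated by literal dots. The helper name find_version is also made up for illustration.

```python
import re

# Assumed pattern: three runs of digits separated by literal dots.
VERSION_PATTERN = re.compile(r'\d+\.\d+\.\d+')


def find_version(text):
    """Return the first dotted version number in text, or None."""
    match = VERSION_PATTERN.search(text)  # search() scans the whole string
    return match.group() if match else None


# Assumed example filename, taken from the list of paths in this post.
path = '/users/username/globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so'
print(find_version(path))  # prints 4.0.2
```

search() is used rather than match() because match() only tries the start of the string, and the version sits in the middle of the path.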

This seems a bit overblown just to find the version: after all, we
could have just split the string on '/' to make a list of tokens,
grabbed token 4, split that again on '-', and taken token 1. The
advantage of regular expressions is that they're very flexible. The
same pattern would work on any of the following:

/users/username/globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so
.../../globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so
ftp ftp.server.com -e 'get globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so'

and so on. The moral: when the string you're parsing is fairly regular,
use something like split(); when it can vary a lot, use regular
expressions. As you might expect, split() is quite a bit faster.
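A minimal sketch of that trade-off, using the three paths listed above. The hypothetical version_by_split() hard-codes the positions described earlier (token 4, then token 1 after splitting on '-'), while version_by_regex() uses an assumed three-dotted-numbers pattern. Only the first path has the layout the split version expects.

```python
import re

VERSION_PATTERN = re.compile(r'\d+\.\d+\.\d+')


def version_by_split(path):
    # Positional approach: only works if the "globus-<version>" directory
    # is always the fifth '/'-separated token.
    return path.split('/')[4].split('-')[1]


def version_by_regex(text):
    # Pattern approach: finds the first dotted number wherever it appears.
    match = VERSION_PATTERN.search(text)
    return match.group() if match else None


samples = [
    '/users/username/globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so',
    '.../../globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so',
    "ftp ftp.server.com -e 'get globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so'",
]

for sample in samples:
    try:
        by_split = version_by_split(sample)
    except IndexError:
        by_split = None  # the fixed-layout assumption did not hold
    print(by_split, version_by_regex(sample))
# prints "4.0.2 4.0.2" for the first sample, then "None 4.0.2" for the other two
```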

I should stress that this is a very bare-bones example and doesn't even
begin to scratch the surface of what regular expressions can do. It's
also a little too general, since any string fragment of the form
[number].[number].[number] will match. An excellent resource on
regular expressions in Python (I believe it was lifted from the original
documentation, but I digress):

http://www.amk.ca/python/howto/regex/
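One way to make the pattern less general is to anchor it to the text around the version. A minimal sketch, assuming the version always directly follows "globus-"; the path below, with an extra python-2.4.3 directory, is invented to show the difference. The parentheses form a capturing group, so group(1) returns just the number.

```python
import re

LOOSE = re.compile(r'\d+\.\d+\.\d+')
# Assumed tighter pattern: the version must directly follow "globus-".
GLOBUS_VERSION = re.compile(r'globus-(\d+\.\d+\.\d+)')

# Hypothetical path with a second version number earlier in the string.
path = '/opt/python-2.4.3/globus/globus-4.0.2/lib/libglobus_gridftp_server_gcc32.so'

print(LOOSE.search(path).group())            # prints 2.4.3
print(GLOBUS_VERSION.search(path).group(1))  # prints 4.0.2
```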

XML is another common format you'll have to work through, but I don't
have much experience in this area. If memory serves, Python comes with
built-in XML parsers, including xml.etree.ElementTree, which turns an XML
file into a tree of elements you can walk and search. Hopefully others
can fill in on that part.
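A minimal sketch using the standard library's xml.etree.ElementTree module, which builds a tree of Element objects (not a dictionary). The XML document and the helper name list_people are hypothetical, invented for this example and reusing the names from the CSV example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for this example.
XML_TEXT = """<people>
  <person name="Brian"><occupation>student</occupation></person>
  <person name="Ann"><occupation>carpenter</occupation></person>
</people>"""


def list_people(xml_text):
    """Return (name, occupation) pairs from the <person> elements."""
    root = ET.fromstring(xml_text)  # parse into a tree; root is <people>
    return [(person.get('name'), person.findtext('occupation'))
            for person in root.findall('person')]  # direct children only


for name, occupation in list_people(XML_TEXT):
    print(name, occupation)
```

get() reads an attribute, while findtext() returns the text of the first matching child element.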

Also, don't be afraid of having the interpreter open next to your
editor of choice, and of running test patterns through any parsing code
you're writing. Regular expressions in particular are very easy to
screw up, no matter how long you've been using them.
 
Terry Reedy

I believe the book "Text Processing in Python" by David Mertz has
information on parsing, including parsing modules written in Python.

tjr
 
Ramon Diaz-Uriarte

A very nice module, with many links for extra info:

http://pyparsing.wikispaces.com/
 
