parsing of initialization files

giulianodammando

While developing a simple numerical simulation program, I need to read
some initialization parameters from a file that looks like this:

# Global Setup

species = 1;

\begin{specie}<1>
name = NITROGEN;
aindex = 7;
ionstages = 1;
\begin{ionstage}<1>
nmax = -1;
iindex = 10;
ionDB = XSTAR;
eLevsDB = XSTAR;
bbCollDB = CHIANTI;
bfCollDB = NIST, XSTAR;
bbRadDB = OPIP;
bfRadDB = XSTAR;
ffRadDB = off;
\end{ionstage}<1>
\end{specie}<1>

# Here stay transport calculation
# related options.

# spatial mesh options
\begin{GridOpt}
Grid = Uniform; # uniform spacing
GridStep = 20; # step size in (cm)
\end{GridOpt}

# set physical domain extension
\begin{DomainOpt}
DomainLength = 100; # cm
\end{DomainOpt}

# Specify Boundary Conditions on intensity
\begin{RadBoundaryCond}
RadBoundaryPolicy = ZeroOnBoth; # no radiation
\end{RadBoundaryCond}

# NEEDS_WORK
# set other stuff
\begin{OtherOpt}
edenPolicy = ConstField;
eden = 1e12; # cm^-3
adenPolicy = ConstField;
aden = 1e16; # cm^-3
etemPolicy = ConstField
\end{OtherOpt}

I'm not a professional programmer, so I've implemented the reading
scheme as a simple token iterator (basically, skip the # comments and
tokenize the rest) that feeds a simple parser.
The parser recognizes four kinds of token:
1. \environment --> starts an environment of some kind
2. variable names / comma-separated lists of values
3. the "=" token, meaning assignment (typically an internal variable is
assigned a numeric/string value or a list of these)
4. the ";" token, meaning the end of the right-hand side of an assignment

This results in about 2000 lines of code, because of an enormous switch
statement that tests for each valid option (and reports an error if
something is wrong).
I'd like to use a more flexible approach, but I have no idea where to
start. It would also be useful to read small matrices and vectors of
floats (to restart a simulation from a previously interrupted one).
So I need something closer to a grammar-driven parsing approach.
I would appreciate any suggestions or pointers to resources (available
libraries, documents) that would help solve this design problem.
Thanks
 
Michael

While developing a simple numerical simulation program, I need to read
some initialization parameters from a file that looks like this:

(bunch of stuff deleted)

I'd like to use a more flexible approach, but I have no idea where to
start.

Two answers, depending on your parameters:
1) If the format of the parameter file is fixed, and you're just
looking to write the grammar for it, check out Lex and YACC. Those are
the standard lexer and parser generator tools. Here are links to GNU's
implementations (called Flex and Bison, respectively):
http://flex.sourceforge.net/
http://www.gnu.org/software/bison/

2) If you have control over the parameter file format, consider
changing it to some kind of XML-based thing. XML is designed to be
more parseable.
So things like
\begin{specie}<1>
name = NITROGEN;
\begin{ionstage}<1>
nmax = -1;
\end{ionstage}<1>
\end{specie}<1>

would become
<specie>
<name>NITROGEN</name>
<ionstage>
<nmax>-1</nmax>
</ionstage>
</specie>

In that case, look at Xerces (http://xml.apache.org). Unless I'm
really worried about the size of these files, I tend to use a DOM-based
approach to reading them, rather than an event-based approach, because
the client code is easier to write.

Good luck with your project.

Michael
 
Jerry Coffin

In the development of a simple numerical simulation software i need to
read some initialization parameters from a file that looks like:

[ ... example elided ]

The two obvious options for a parser are top-down and bottom-up. A
bottom-up parser is typically written with a tool like yacc. You supply
it with a description of the grammar, and it creates a table-driven
parser. The grammar description is usually something like a warped
version of BNF. For better or worse, it's a language all its own that's
probably not topical here, but (even though you're not technically
writing a compiler) almost certainly would be in comp.compilers. Given
the small size and low complexity of your grammar, you might also want to
consider using Boost.Spirit, which is a parser generator written as a
set of C++ templates.

There's no theoretical reason you couldn't create top-down parsers the
same way, but most top-down parsers are written by hand, using recursive
descent. The basic idea is that you still have a grammar, but you
basically write a single function to recognize each non-terminal in your
grammar. Terminals in the grammar are normally recognized directly in
the lexer.

Glancing through your example, it looks like the grammar should come out
something like this.

file: statements

statements: statement | statement statements
statement: assignment | environment

assignment: variable '=' value ';'

environment: header statements footer
header: '\begin{' NAME '}<' NUMBER '>'
footer: '\end{' NAME '}<' NUMBER '>'

For the moment I've made this right-recursive (e.g. the definition of
statements). If you decide to use a bottom-up parser, you'll want to
make it left-recursive.
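
To make the recursive-descent idea concrete, here is a minimal sketch with one member function per non-terminal of the grammar above. The names Env and Parser are made up for illustration. It also departs from the grammar in two places, both assumptions on my part: the <index> after the closing brace is optional (the sample file has environments such as \begin{GridOpt} with no index), and an environment may be empty. Values are kept as raw text, so checking them is left to a later step.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: one node per environment; the file is the root.
struct Env {
    std::string name, index;                    // "specie" and "1"; index may be empty
    std::map<std::string, std::string> values;  // variable -> raw value text
    std::vector<Env> children;                  // nested environments
};

class Parser {
    std::string src;
    std::string::size_type pos;

    static bool space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

    // Skip white space and '#' comments (a comment runs to the end of its line).
    void skip() {
        while (pos < src.size() && (space(src[pos]) || src[pos] == '#')) {
            if (src[pos] == '#') pos = std::min(src.find('\n', pos), src.size());
            else ++pos;
        }
    }
    // True if the remaining input starts with 's'; nothing is consumed.
    bool at(std::string const &s) {
        skip();
        return src.compare(pos, s.size(), s) == 0;
    }
    void expect(std::string const &s) {
        if (!at(s)) throw std::runtime_error("expected '" + s + "'");
        pos += s.size();
    }
    // Raw text up to, but not including, the first character found in 'stops'.
    std::string until(std::string const &stops) {
        skip();
        std::string::size_type end = src.find_first_of(stops, pos);
        if (end == std::string::npos) throw std::runtime_error("unexpected end of input");
        std::string text = src.substr(pos, end - pos);
        pos = end;
        while (!text.empty() && space(text[text.size() - 1])) text.erase(text.size() - 1);
        return text;
    }

    // header and footer share one shape: keyword NAME '}' [ '<' index '>' ]
    void tag(std::string const &keyword, std::string &name, std::string &index) {
        expect(keyword);
        name = until("}");
        expect("}");
        index.clear();
        if (at("<")) { expect("<"); index = until(">"); expect(">"); }
    }
    // assignment: variable '=' value ';'
    void assignment(Env &env) {
        std::string variable = until("=");
        expect("=");
        std::string value = until(";");
        expect(";");
        env.values[variable] = value;
    }
    // statements: repeat until '\end{' or end of input
    void statements(Env &env) {
        for (;;) {
            skip();
            if (pos >= src.size() || at("\\end{")) return;
            if (at("\\begin{")) env.children.push_back(environment());
            else assignment(env);
        }
    }
    // environment: header statements footer
    Env environment() {
        Env env;
        tag("\\begin{", env.name, env.index);
        statements(env);
        std::string name, index;
        tag("\\end{", name, index);
        if (name != env.name || index != env.index)
            throw std::runtime_error("mismatched \\end{" + name + "}");
        return env;
    }

public:
    explicit Parser(std::string const &text) : src(text), pos(0) {}

    // file: statements
    Env parse() {
        Env root;
        statements(root);
        if (pos < src.size()) throw std::runtime_error("\\end without matching \\begin");
        return root;
    }
};
```

Usage would be along the lines of Parser(text).parse(), where text holds the whole file. Note that this sketch has no separate lexer and does only crude error recovery: an assignment with a missing semicolon swallows text up to the next ';' instead of being reported at the right line.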

I also haven't defined 'variable' or 'value' for the moment, because I'd
make that part of the parser data driven. Instead of the case statements
you used, I'd create a map that contained all allowable variable names,
and with each I'd associate a mini-parser that knew how to parse the
specific type of data that can be assigned to that variable. These would
all descend from a common base class that is passed (for example) a
string containing the raw data from the stream, up to, but not including,
the semicolon.
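
As a rough sketch of that table-driven idea (assuming C++11 for std::shared_ptr): every name below, including value_parser, number_parser, keyword_parser, make_table and check, is invented for illustration. The option names come from the sample file, but the set of legal keywords for each option is a guess, since only "Uniform" appears there.

```cpp
#include <cstdlib>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical common base class: each mini-parser checks one raw value string.
struct value_parser {
    virtual ~value_parser() {}
    virtual bool operator()(std::string const &raw) const = 0;
};

// Accepts any text that strtod consumes completely, e.g. "1e12" or "-1".
struct number_parser : value_parser {
    bool operator()(std::string const &raw) const override {
        char *end = 0;
        std::strtod(raw.c_str(), &end);
        return !raw.empty() && *end == '\0';
    }
};

// Accepts one keyword out of a fixed set, e.g. "Uniform".
struct keyword_parser : value_parser {
    std::set<std::string> allowed;
    explicit keyword_parser(std::set<std::string> const &words) : allowed(words) {}
    bool operator()(std::string const &raw) const override {
        return allowed.count(raw) != 0;
    }
};

typedef std::map<std::string, std::shared_ptr<value_parser> > parser_table;

// Build the table once; supporting a new option means adding one line here
// instead of a new case in a switch statement.
parser_table make_table() {
    parser_table table;
    table["GridStep"] = std::make_shared<number_parser>();
    table["eden"] = std::make_shared<number_parser>();
    std::set<std::string> grids;
    grids.insert("Uniform");  // the only value seen in the sample file
    table["Grid"] = std::make_shared<keyword_parser>(grids);
    return table;
}

// True if 'variable' is a known option and 'raw' is acceptable for it.
bool check(parser_table const &table, std::string const &variable,
           std::string const &raw) {
    parser_table::const_iterator it = table.find(variable);
    return it != table.end() && (*it->second)(raw);
}
```

The assignment rule of the grammar would then call check(table, variable, raw) and report an error when it returns false, so the grammar code never needs to know the individual option names.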

Some (maybe most) of those could be driven by external data as well --
for example:
name = NITROGEN;

You could use something like this:

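#include <algorithm>
#include <fstream>
#include <iterator>
#include <set>
#include <string>

// Assumed to be defined elsewhere: the common base class 'parser' and
// make_lower(), which returns a lower-cased copy of its argument.
class name_parser : public parser {
    std::set<std::string> elements;
public:
    // Read every white-space separated name in the file into the set.
    explicit name_parser(std::string const &file_name) {
        std::ifstream infile(file_name.c_str());
        std::istream_iterator<std::string> input(infile), end;
        std::copy(input, end, std::inserter(elements, elements.end()));
    }

    // True if 'name' (compared case-insensitively) is an allowed element.
    bool operator()(std::string const &name) const {
        return elements.find(make_lower(name)) != elements.end();
    }
};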

This would use a file of allowable element names:
hydrogen
helium
lithium
[ ... ]
ununhexium

or, if you want to arrange it like a periodic table, you can do that
[though it really needs to be wider than is convenient on Usenet]:

hydrogen helium
lithium beryllium boron carbon nitrogen oxygen fluorine neon
[ ... ]

Basically, as long as you just have names separated by white space (spaces,
tabs, new-lines) the program doesn't care at all about how the white
space is arranged. For that matter, the order doesn't matter either.
I've put them in atomic order for the moment, but the set will keep
them in alphabetical order as it reads them -- and for a list this
short, the order in the file makes no practical difference to speed.

Of course, you might only allow a subset of the elements -- in which case
this file would only list those you allow. If you want to add new
elements as they're discovered, you can do that without changing the
program logic at all.
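
The name_parser class above calls make_lower, which is not defined anywhere in this post. A minimal sketch of such a helper, assuming C++11 and plain ASCII names, could look like this:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of 'text' (ASCII only).
std::string make_lower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}
```

With this in place, NITROGEN in the init file matches nitrogen in the names file, provided the names file itself is written in lower case.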
 
Jerry Coffin

[ ... ]
name_parser(std::string file_name) {
std::ifstream infile(file_name);
std::istream_iterator<std::string>(input), end;

Oops. That should be:

std::istream_iterator<std::string> input(infile), end;

Sorry 'bout that.
 
giulianodammando

Michael said:
Two answers, depending on your parameters:
1) If the format of the parameter file is fixed, and you're just
looking to write the grammar for it, check out Lex and YACC. Those are
the standard lexer and parser generator tools. Here are links to GNU's
implementations (called Flex and Bison, respectively):
http://flex.sourceforge.net/
http://www.gnu.org/software/bison/

2) If you have control over the parameter file format, consider
changing it to some kind of XML-based thing. XML is designed to be
more parseable.
In that case, look at xerces (http://xml.apache.org). Unless I'm
really worried about the size of these files, I tend to use a DOM-based
approach to reading them, rather than an event-based approach, because
the client code is easier.

Good luck with your project.

Michael

I also think the XML approach would be more feasible, because there are
plenty of XML parsers freely available on the net, but I tend to avoid
it because the init file is (at the moment) meant to be edited by hand,
and XML is a bit too verbose for that (the data are buried under tons
of formatting tags).
Obviously it would be fantastic if one could fill in the init file
transparently through a simple HTML form in a web browser! I'm aware
that this would be trivial for even moderately skilled people, but at
the moment it would be too complicated for me.

Thank you very much, Michael.
 
giulianodammando

Jerry said:
In the development of a simple numerical simulation software i need to
read some initialization parameters from a file that looks like:

[ ... example elided ]
... the small size and complexity of your grammar, you might also want to
consider using Boost::Spirit, which is a parser generator written as a
set of C++ templates.

In fact, I was considering Spirit as a parsing framework.
Its heavy use of high-level generic programming is causing me some
problems, but obviously it takes some time to become familiar with such
a complex tool.

There's no theoretical reason you couldn't create top-down parsers the
same way, but most top-down parsers are written by hand, using recursive
descent. The basic idea is that you still have a grammar, but you
basically write a single function to recognize each non-terminal in your
grammar. Terminals in the grammar are normally recognized directly in
the lexer.

Glancing through your example, it looks like the grammar should come out
something like this.

file: statements

statements: statement | statement statements
statement: assignment | environment

assignment: variable '=' value ';'

environment: header statements footer
header: '\begin{' NAME '}<' NUMBER '>'
footer: '\end{' NAME '}<' NUMBER '>'

This is exactly the same grammar I wrote yesterday, taking inspiration
from the manual of my MetaPost installation! It should also allow me to
parse things like:

\begin{vector}<dim>
<number list>
\end{vector}<dim>

\begin{matrix}<dim_1, dim_2>
<n-tuple list>
\end{matrix}<dim_1, dim_2>

name = NITROGEN;

You could use something like this:

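#include <algorithm>
#include <fstream>
#include <iterator>
#include <set>
#include <string>

// Assumed to be defined elsewhere: the common base class 'parser' and
// make_lower(), which returns a lower-cased copy of its argument.
class name_parser : public parser {
    std::set<std::string> elements;
public:
    // Read every white-space separated name in the file into the set.
    explicit name_parser(std::string const &file_name) {
        std::ifstream infile(file_name.c_str());
        std::istream_iterator<std::string> input(infile), end;
        std::copy(input, end, std::inserter(elements, elements.end()));
    }

    // True if 'name' (compared case-insensitively) is an allowed element.
    bool operator()(std::string const &name) const {
        return elements.find(make_lower(name)) != elements.end();
    }
};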

And these would be the semantic actions I attach to the parsers!
Thank you very much, Jerry. I will try to implement your ideas.
 
Jerry Coffin

[ ... ]
This is exactly the same grammar I wrote yesterday, taking inspiration
from the manual of my MetaPost installation! It should also allow me to
parse things like:

\begin{vector}<dim>
<number list>
\end{vector}<dim>

\begin{matrix}<dim_1, dim_2>
<n-tuple list>
\end{matrix}<dim_1, dim_2>

With one minor change, it should anyway -- in 'header' and 'footer'
you'd need to change the 'NUMBER' to something like 'list', where a list
is defined as a list of numbers:

list: NUMBER | list ',' NUMBER
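
In hand-written code, the left recursion in that rule simply turns into a loop. A minimal sketch, where the function name parse_list is made up and the input is assumed to be the raw text between '<' and '>' (or the body of a vector environment):

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// list: NUMBER | list ',' NUMBER
// Reads "3, 4" into {3, 4}; throws if the text is not a comma-separated
// list of numbers.
std::vector<double> parse_list(std::string const &text) {
    std::istringstream in(text);
    std::vector<double> numbers;
    double value;
    char sep;
    if (!(in >> value)) throw std::runtime_error("expected a number");
    numbers.push_back(value);
    while (in >> sep) {  // anything left must be ',' followed by a number
        if (sep != ',' || !(in >> value))
            throw std::runtime_error("expected ',' followed by a number");
        numbers.push_back(value);
    }
    return numbers;
}
```

The numbers are read as double so the same function can serve for vector data; for dimensions you would probably want to check that each value is a positive integer.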

Thank you very much, Jerry. I will try to implement your ideas.

Glad to help -- I hope things work out well...
 
