LaTeX-Like Parsing in C


nedelm

My problem is with parsing. I have this (arbitrary, read from a file)
string, let's say:

"Directory: /file{File:/filename(/size) }"

I would like it to behave similarly to LaTeX: I parse it, and then I
write it out for different variables, like:

"Directory: File:.(0) File:..(0) File:a.out(12) File:foo(1) "

But I keep getting into a mess of complications. I'm using C (of
course). How do I parse it? strpbrk(,"/{}") (what then?) How can I get
the string into a data structure that I could write out? Algorithms?

-Neil
 

Richard Heathfield

Start with a lexing stage, where you simply break the input into lexical
tokens, doing your best to identify them as you go but not worrying too
much about odd cases. Store your lexical tokens in some kind of dynamic
data structure such as a linked list. Yes, strpbrk will work for this,
or even strtok if your input is writeable.

That will massively reduce the complexity of the parsing stage, since
you won't have to worry about tokenisation (because each token is
simply the next node on the linked list), and so you can focus purely
on the grammar that you are trying to implement.
 

Chris Dollin

Richard said:
> Start with a lexing stage, where you simply break the input into lexical
> tokens, doing your best to identify them as you go but not worrying too
> much about odd cases. Store your lexical tokens in some kind of dynamic
> data structure such as a linked list. Yes, strpbrk will work for this,
> or even strtok if your input is writeable.

And if your tokenisation rules are sufficiently bizarre [1], you can
resort to tools such as [f]lex, which [typically|can] generate C
code/tables for you.

> That will massively reduce the complexity of the parsing stage, since
> you won't have to worry about tokenisation (because each token is
> simply the next node on the linked list), and so you can focus purely
> on the grammar that you are trying to implement.

And again, if you end up with a sufficiently complex grammar [1again],
there are tools that will help. But if you're in control of the grammar,
such complexity may be a grammar smell ...

(Also helpful: existing books. And writing unit tests.)

[1] What counts as "sufficiently" is variable.
 
