LaTeX-Like Parsing in C

nedelm · Jul 26, 2007

My problem is with parsing. I have this (arbitrary, read from a file) string, let's say:

"Directory: /file{File:/filename(/size) }"

I would like it to behave similarly to LaTeX. I parse it, and then I write it out for different variables, like:

"Directory: File:.(0) File:..(0) File:a.out(12) File:foo(1) "

But I keep getting into a mess of complications. I'm using C (of course). How do I parse it? strpbrk(, "/{}") (and then what?) How can I get the string into a data structure that I could write out? Algorithms?

-Neil

Richard Heathfield · Jul 26, 2007

Start with a lexing stage, where you simply break the input into lexical
tokens, doing your best to identify them as you go but not worrying too
much about odd cases. Store your lexical tokens in some kind of dynamic
data structure such as a linked list. Yes, strpbrk will work for this,
or even strtok if your input is writeable.
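A minimal sketch of such a lexing stage, built on strpbrk and a singly linked list. The token kinds, the names `lex`, `tok_append` and `tok_free`, and the rule that a variable name is the run of alphanumeric characters after '/' are all assumptions made for illustration; the original post never spells out the grammar.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds for the template language in the question. */
enum tok_kind { TOK_TEXT, TOK_VAR, TOK_OPEN, TOK_CLOSE };

struct token {
    enum tok_kind kind;
    char *text;            /* literal text, variable name, or the brace itself */
    struct token *next;
};

/* Free a whole token list. */
void tok_free(struct token *t)
{
    while (t != NULL) {
        struct token *next = t->next;
        free(t->text);
        free(t);
        t = next;
    }
}

/* Append a copy of s[0..len) at *tail; returns the new tail, or NULL
   if allocation fails (the list built so far is left intact). */
static struct token **tok_append(struct token **tail, enum tok_kind kind,
                                 const char *s, size_t len)
{
    struct token *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->text = malloc(len + 1);
    if (t->text == NULL) {
        free(t);
        return NULL;
    }
    memcpy(t->text, s, len);
    t->text[len] = '\0';
    t->kind = kind;
    t->next = NULL;
    *tail = t;
    return &t->next;
}

/* Split s into tokens. Returns NULL for empty input or on allocation
   failure (a real lexer would want to tell those two apart). */
struct token *lex(const char *s)
{
    struct token *head = NULL, **tail = &head;

    while (*s != '\0' && tail != NULL) {
        const char *p = strpbrk(s, "/{}");
        if (p == NULL)
            p = s + strlen(s);            /* no special characters left */

        if (p > s) {                      /* literal text up to the special */
            tail = tok_append(tail, TOK_TEXT, s, (size_t)(p - s));
            s = p;
        } else if (*p == '{') {
            tail = tok_append(tail, TOK_OPEN, p, 1);
            s = p + 1;
        } else if (*p == '}') {
            tail = tok_append(tail, TOK_CLOSE, p, 1);
            s = p + 1;
        } else {                          /* '/': assumed to start a name */
            const char *e = p + 1;
            while (isalnum((unsigned char)*e))
                e++;
            tail = tok_append(tail, TOK_VAR, p + 1, (size_t)(e - (p + 1)));
            s = e;
        }
    }
    if (tail == NULL) {                   /* allocation failed part-way */
        tok_free(head);
        return NULL;
    }
    return head;
}
```

Under these assumptions, calling lex on the example string from the question should give nine tokens: the text "Directory: ", the variable file, an opening brace, the text "File:", the variable filename, the text "(", the variable size, the text ") ", and a closing brace. Because the input is copied rather than modified, this works on string literals, which strtok would not.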

That will massively reduce the complexity of the parsing stage, since
you won't have to worry about tokenisation (because each token is
simply the next node on the linked list), and so you can focus purely
on the grammar that you are trying to implement.
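To illustrate how little is left for the parsing stage once tokens exist, here is a rough sketch that walks a token list and writes out the expansion. Everything in it is an assumption for illustration: the token layout (redeclared so the example stands alone), the hand-built `demo` list, the `file_info` record, the fixed-size `out` buffer, and the guessed semantics that `/file{...}` repeats its body once per file while `/filename` and `/size` name fields of the current file. Nested braces are not handled.

```c
#include <stdio.h>
#include <string.h>

enum tok_kind { TOK_TEXT, TOK_VAR, TOK_OPEN, TOK_CLOSE };

struct token {
    enum tok_kind kind;
    const char *text;
    const struct token *next;
};

struct file_info { const char *name; long size; };  /* hypothetical record */

struct out { char buf[256]; size_t len; };           /* zero-initialise before use */

/* Append s to the output buffer, truncating silently if it is full. */
static void emit(struct out *o, const char *s)
{
    size_t n = strlen(s);
    if (o->len + n >= sizeof o->buf)
        n = sizeof o->buf - 1 - o->len;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

/* Expand one copy of a loop body (tokens up to the closing brace) for f. */
static void expand_body(const struct token *t, const struct file_info *f,
                        struct out *o)
{
    char num[32];
    for (; t != NULL && t->kind != TOK_CLOSE; t = t->next) {
        if (t->kind == TOK_TEXT) {
            emit(o, t->text);
        } else if (t->kind == TOK_VAR && strcmp(t->text, "filename") == 0) {
            emit(o, f->name);
        } else if (t->kind == TOK_VAR && strcmp(t->text, "size") == 0) {
            snprintf(num, sizeof num, "%ld", f->size);
            emit(o, num);
        }
        /* unknown variables and stray braces are ignored in this sketch */
    }
}

/* Top level: copy text through, repeat each /file{...} body per file. */
static void expand(const struct token *t, const struct file_info *files,
                   size_t nfiles, struct out *o)
{
    while (t != NULL) {
        if (t->kind == TOK_TEXT) {
            emit(o, t->text);
            t = t->next;
        } else if (t->kind == TOK_VAR && strcmp(t->text, "file") == 0 &&
                   t->next != NULL && t->next->kind == TOK_OPEN) {
            const struct token *body = t->next->next;
            const struct token *after = body;
            size_t i;
            while (after != NULL && after->kind != TOK_CLOSE)
                after = after->next;      /* find the closing brace */
            if (after != NULL)
                after = after->next;      /* resume just past it */
            for (i = 0; i < nfiles; i++)
                expand_body(body, &files[i], o);
            t = after;
        } else {
            t = t->next;                  /* skip anything unrecognised */
        }
    }
}

/* Hand-built tokens for "Directory: /file{File:/filename(/size) }". */
static const struct token demo[9] = {
    { TOK_TEXT,  "Directory: ", &demo[1] },
    { TOK_VAR,   "file",        &demo[2] },
    { TOK_OPEN,  "{",           &demo[3] },
    { TOK_TEXT,  "File:",       &demo[4] },
    { TOK_VAR,   "filename",    &demo[5] },
    { TOK_TEXT,  "(",           &demo[6] },
    { TOK_VAR,   "size",        &demo[7] },
    { TOK_TEXT,  ") ",          &demo[8] },
    { TOK_CLOSE, "}",           NULL }
};
```

Given a zero-initialised `struct out` and an array of four `file_info` records (".", 0), ("..", 0), ("a.out", 12) and ("foo", 1), calling expand on `demo` should leave the output string from the question in the buffer. The walker never looks at individual characters; all of that was settled by the lexing stage.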

Chris Dollin · Jul 27, 2007

Richard Heathfield said:

Start with a lexing stage, where you simply break the input into lexical
tokens, doing your best to identify them as you go but not worrying too
much about odd cases. Store your lexical tokens in some kind of dynamic
data structure such as a linked list. Yes, strpbrk will work for this,
or even strtok if your input is writeable.

And if your tokenisation rules are sufficiently bizarre [1], you can
resort to tools such as [f]lex, which [typically|can] generate C
code/tables for you.

That will massively reduce the complexity of the parsing stage, since
you won't have to worry about tokenisation (because each token is
simply the next node on the linked list), and so you can focus purely
on the grammar that you are trying to implement.

And again, if you end up with a sufficiently complex grammar [1again],
there are tools that will help. But if you're in control of the grammar,
such complexity may be a grammar smell ...

(Also helpful: existing books. And writing unit tests.)

[1] What counts as "sufficiently" is variable.
