Vuun Harjnes
Hi,
I hope this is acceptable for the C newsgroup. I wasn't sure where else to
post this.
I'm trying to write a lex file to correctly identify the tokens in my
config file, but I've run into a little puzzle. Perhaps this is just a
limitation of lex, but I'd like to hear other opinions.
Here's my config file format:
[header]
label = value.
It's pretty similar to Windows .ini files, except that the header string can
contain most printable characters (e.g. !@#$%, or even ] if it's escaped).
The value can contain a similar set of characters.
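The question I have is: how can I tokenize the file if I've got strings that
are of arbitrary length and can contain almost any character?
I.e. the following won't work, because lex prefers the longest match and the
final catch-all rule swallows nearly every line whole.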
%{
#include <stdio.h>
%}
W [[:alnum:]_]
B [[:blank:]]
%%
{W}+ printf("WORD ");
\[ printf("OBRACE ");
\] printf("EBRACE ");
{B} /* Ignore whitespace */
\n printf("\n");
..+ printf("ARBITRARY-STRING ");
%%
The following is the closest I've been able to get to what I want.
%{
#include <stdio.h>
%}
W [[:alnum:]_]
B [[:blank:]]
V .
%%
\[(\\.|[^\\\]\n])*\] printf("HEADER ");
{W}+ printf("WORD ");
{B}*={B}*{V}+ printf("VALUE ");
{B} /* Ignore whitespace */
\n printf("\n");
. printf("UNKNOWN ");
%%
While this works, it's not ideal, as I've still got to parse VALUE and HEADER
to extract what I'm after. Is there a better way to do this?
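To make that extra step concrete, this is roughly the post-processing I mean. It is only a minimal sketch: header_text() and value_text() are hypothetical helper names of my own (not part of lex), and it assumes each helper is handed the matched text (yytext) exactly as the HEADER and VALUE rules above matched it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text between '[' and ']' of a HEADER match
 * into out, turning each escape "\x" into a plain "x".
 * Returns 0 on success, -1 if the token is malformed or out is too small. */
static int header_text(const char *tok, char *out, size_t outsz)
{
    size_t len = strlen(tok);
    size_t o = 0;

    if (outsz == 0 || len < 2 || tok[0] != '[' || tok[len - 1] != ']')
        return -1;

    /* Walk the characters strictly between the outer brackets. */
    for (size_t i = 1; i + 1 < len; i++) {
        char c = tok[i];
        if (c == '\\') {
            if (i + 2 >= len)   /* backslash would escape the closing ']' */
                return -1;
            c = tok[++i];       /* keep the escaped character literally */
        }
        if (o + 1 >= outsz)     /* leave room for the terminating NUL */
            return -1;
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical helper: given a VALUE match such as " = foo bar", return a
 * pointer to the first character after '=' and any blanks that follow it,
 * or NULL if there is no '=' where one is expected. */
static const char *value_text(const char *tok)
{
    while (*tok == ' ' || *tok == '\t')
        tok++;
    if (*tok != '=')
        return NULL;
    tok++;
    while (*tok == ' ' || *tok == '\t')
        tok++;
    return tok;
}
```

In the lex actions these would be called as header_text(yytext, buf, sizeof buf) and value_text(yytext), which is exactly the second pass over the token that I'd like to avoid.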
As a side note, what's the preferred way to parse config files? I've only
recently started to go down the lex/yacc path, but I'm noticing other tools
out there (like ANTLR). Are they any good, or better suited for what I want
to do?
Thanks for any response
Vuun