ASCII file parser - to read between brackets ()

olson_ord

Hi,
My ASCII file is not exactly a comma-separated file. The following is
a small but complete example of such a file. (This is the ISCAS circuit
file format that I need to read in.)

----------- Example c17.bench ----------------
INPUT(1)
INPUT(2)
INPUT(3)
INPUT(6)
INPUT(7)

OUTPUT(22)
OUTPUT(23)

10 = NAND(1, 3)
11 = NAND(3, 6)
16 = NAND(2, 11)
19 = NAND(11, 7)
22 = NAND(10, 16)
23 = NAND(16, 19)
----------------End of Example ----------------

I would like to note that the numbers can also be replaced by
other symbols, so they should be treated as strings and not
as numbers.
E.g. "1" can also be "N_1" or "Node_1", or any other string
representation.

Now I would like to know what my approach to reading in this
file should be, i.e. the algorithm.

Off the top of my head, I think I would just read in each line
as a string and then search the string for various keywords. On
finding a keyword, I would find the locations of the two brackets ()
and then parse the values between them.

I am wondering if this approach is the right way to go.
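
For readers who want to see the approach above in code, here is a minimal sketch. It is not the poster's code: the `BenchLine` struct and the `trim` and `parse_line` helpers are hypothetical names, and it assumes one statement per line, a single pair of brackets, and comma-separated names inside them.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one parsed line; the names are illustrative only.
struct BenchLine {
    std::string keyword;            // "INPUT", "OUTPUT", "NAND", ...
    std::string target;             // left-hand side of '=', empty for INPUT/OUTPUT
    std::vector<std::string> args;  // names found between the brackets
};

// Strip leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Returns false for blank lines or lines without a "(...)" pair.
bool parse_line(const std::string& line, BenchLine& out) {
    out = BenchLine();
    const std::string::size_type open = line.find('(');
    const std::string::size_type close = line.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return false;

    // Everything before '(' is either "KEYWORD" or "target = KEYWORD".
    std::string head = line.substr(0, open);
    const std::string::size_type eq = head.find('=');
    if (eq != std::string::npos) {
        out.target = trim(head.substr(0, eq));
        head = head.substr(eq + 1);
    }
    out.keyword = trim(head);

    // Split the text between the brackets on commas; names stay strings.
    const std::string inside = line.substr(open + 1, close - open - 1);
    std::string::size_type start = 0;
    while (start <= inside.size()) {
        std::string::size_type comma = inside.find(',', start);
        if (comma == std::string::npos) comma = inside.size();
        const std::string name = trim(inside.substr(start, comma - start));
        if (!name.empty()) out.args.push_back(name);
        start = comma + 1;
    }
    return true;
}
```

Each line read with `std::getline` can be passed to `parse_line`. Because the names are kept as strings, "N_1" or "Node_1" need no special handling.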

Thanks a lot guys,
O.O.
 
Victor Bazarov

> Hi,
> My ascii file is not exactly a comma separated file. [...]
>
> Now I would like to know what should be my approach to reading in this
> file, i.e. the algorithm.
>
> Off the top of my head I think I would just have to read in each line
> as a string. Then I would search the string for various keywords. On
> finding a keyword I would then find the location of the two brackets ()
> - and then parse the values between them.
>
> I am wondering if this approach is the right way to go.

Sounds fine. I don't see any C++ relation, however. Please don't just
say that you're "writing it in C++". The algorithm you've described can
just as easily be written in almost any other language. Did you mean to
post it to 'comp.programming'?

V
 
TB

(e-mail address removed) said:
> Off the top of my head I think I would just have to read in each line
> as a string. Then I would search the string for various keywords. On
> finding a keyword I would then find the location of the two brackets ()
> - and then parse the values between them.

Tokenize the input before parsing.
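
As a rough illustration of this suggestion, the sketch below splits one line into tokens. The token definition is an assumption, not something taken from the file format's specification: each of '(', ')', ',' and '=' is a token by itself, and any other run of non-whitespace characters is a name. The function name `tokenize` is hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split one line into tokens under the assumed definition:
// punctuation characters are single-character tokens, whitespace separates
// tokens, and everything else accumulates into a name token.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : line) {
        const bool is_space = std::isspace(static_cast<unsigned char>(ch)) != 0;
        const bool is_punct = ch == '(' || ch == ')' || ch == ',' || ch == '=';
        if (is_space || is_punct) {
            // A separator ends the name being collected, if any.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            if (is_punct) tokens.push_back(std::string(1, ch));
        } else {
            current += ch;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With this definition, the line `10 = NAND(1, 3)` becomes the eight tokens `10`, `=`, `NAND`, `(`, `1`, `,`, `3`, `)`, and a parser can then work on the token list instead of on raw characters.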
 
Ivan Vecerina

: Hi,
: My ascii file is not exactly a comma separated file. The following is
: a small but complete example of such a file. (This is the ISCAS circuit
: file format that I need to read in.)
:
: ----------- Example c17.bench ----------------
: INPUT(1)
: INPUT(2)
: INPUT(3)
: INPUT(6)
: INPUT(7)
:
: OUTPUT(22)
: OUTPUT(23)
:
: 10 = NAND(1, 3)
: 11 = NAND(3, 6)
: 16 = NAND(2, 11)
: 19 = NAND(11, 7)
: 22 = NAND(10, 16)
: 23 = NAND(16, 19)
: ----------------End of Example ----------------
:
: I would like to note that the numbers can also be replaced by some
: other symbols - so they are actually to be treated as strings and not
: as numbers.
: Eg. "1" can also be "N_1" or "Node_1" - or any other string
: representation.
:
: Now I would like to know what should be my approach to reading in this
: file, i.e. the algorithm.
:
: Off the top of my head I think I would just have to read in each line
: as a string. Then I would search the string for various keywords. On
: finding a keyword I would then find the location of the two brackets ()
: - and then parse the values between them.
:
: I am wondering if this approach is the right way to go.

There are several ways in which this can be accomplished.
But because I don't know the complete 'grammar' of the file,
I am not sure which would be the most appropriate
(e.g. I assume there is not only NAND, but also XOR etc.
Can a more complex expression be used? Unary NOT?)

In any case, rather than parsing each line manually, you
could use one of the existing lexers or parser generators,
such as flex (with or without bison; a bit old-fashioned,
but it works - http://www.gnu.org/software/flex/),
or boost::spirit (http://www.boost.org/libs/spirit/index.html).

If the files are simple enough, a regular-expression package
might be an alternative for extracting the needed identifiers from
each line (e.g. http://www.boost.org/libs/regex/doc/index.html).
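
A minimal sketch of the regular-expression route follows. It uses `std::regex` from the `<regex>` header, which is available in compilers supporting C++11 or later; Boost.Regex offers a very similar interface. The function name `extract_names` is hypothetical, and the pattern assumes that names consist only of letters, digits and underscores, which the original post does not guarantee.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every identifier-like run in a line, in order of appearance.
// Assumption: a name is one or more letters, digits or underscores.
std::vector<std::string> extract_names(const std::string& line) {
    static const std::regex name_re("[A-Za-z0-9_]+");
    std::vector<std::string> names;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(line.begin(), line.end(), name_re); it != end; ++it)
        names.push_back(it->str());
    return names;
}
```

For a gate line such as `10 = NAND(1, 3)`, the result holds the output name, the gate keyword and the input names in that order; for `INPUT(Node_1)` it holds the keyword followed by the single name.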


These are among a number of other options...
hth-Ivan
 
olson_ord

Dear Victor,
Thanks for responding. I forgot to mention in my post that I am
working in C++. I know that my algorithm was general, but sometimes a
particular language has features that handle this situation
differently. E.g. I had heard of regexes in Perl, and I thought I
could not use them in C++. I also did not know what "tokenize" means;
I learned that only after TB suggested it.
That's what I was looking for.
Thanks,
O.O.
 
Victor Bazarov

> Thanks for responding. I forgot to mention in my post that I am
> dealing with C++.

That's what I was afraid of...
> I know that my algorithm was general, but sometimes a
> certain language may have some features to handle this situation
> differently. E.g. I had heard of RegEx's in perl, and I thought I
> could not use them in C++.

They are not part of the language yet. As soon as you see the TR1
implemented, you could try using <regex> and whatever it is going to
contain. Until then, alas, no language mechanism to help you except some
very simple ones, like 'string', 'fstream', and others of which you are
probably already aware.
> Also I did not know of what Tokenize means
> which I learnt only after TB suggested it.

"Tokenize" usually means "identify and split the input stream into tokens"
and it can mean _whatever_you_make_it_to_mean_ because it depends entirely
on your definition of "a token".

V
 
olson_ord

Thanks Ivan. I have heard of regexes, but I have not used them
much. I think I will start with a string tokenizer, and if that becomes
too complicated I will try the regex approach.
O.O.
 
pillbug

> Thanks Ivan. I have heard of RegEx's - but I have not used them
> much. I think I would start with string tokenizer and if that becomes
> too complicated I would attempt this method.
> O.O.

most implementations of sscanf will handle this for you no problem.

if you don't like sscanf you can use regex.

if you don't like regex you can use lex (you probably don't need a parser,
just the scanner should suffice).

if you don't like lex/yacc you can write your own scanner; the grammar
you have there isn't too complex.

if you don't want to write your own scanner you can..


actually that's sort of the problem people have with c++. your options
when dealing with any particular problem are, quite literally, endless.

perl kind of herds you into trying to approach everything with regexes
and hashes, while vb/.net will get you to buy some prebuilt item. i'm
guessing that's why you came to a c++ group, to find out what approach c++
lends itself to most easily. unfortunately, c++ lends itself to just
about every solution :p
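
To make the first option in the list above concrete, here is a small sketch using `std::sscanf` with scansets. The helper name `parse_gate` is hypothetical, and the sketch assumes exactly two inputs per gate and names shorter than 64 characters; INPUT and OUTPUT lines would need their own, simpler format string.

```cpp
#include <cstdio>
#include <string>

// Parse a line of the form "<out> = <GATE>(<a>, <b>)".
// Returns true and fills the four strings only when all four fields match.
bool parse_gate(const std::string& line, std::string& out, std::string& gate,
                std::string& a, std::string& b) {
    char o[64], g[64], x[64], y[64];
    // Each %63[^...] reads up to 63 characters that are not in the set;
    // a space in the format skips any amount of whitespace, including none.
    const int matched = std::sscanf(
        line.c_str(), " %63[^ \t=] = %63[^ \t(] ( %63[^ \t,] , %63[^ \t)] )",
        o, g, x, y);
    if (matched != 4) return false;
    out = o;
    gate = g;
    a = x;
    b = y;
    return true;
}
```

The width limit in each conversion keeps the fixed-size buffers from overflowing. A line such as `INPUT(1)` fails to match because it has no `=`, so the function returns false and the caller can try another pattern.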
 
