String Generation using Mask Parsing

James Arnold

Hello,

I am new to C and I am trying to write a few small applications to get
some hands-on practise! I am trying to write a random string
generator, based on a masked input. For example, given the string
"AAANN" it would return a string containing 3 alphanumeric characters
followed by 2 digits. This part I have managed :)
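
For readers following along, here is a minimal sketch of what such a flat generator might look like. It is not the poster's actual code: the function name gen_flat, the helper pick, and the exact character sets behind 'A' and 'N' are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed meaning of the mask letters: 'A' = alphanumeric, 'N' = digit. */
static const char ALNUM[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
static const char DIGITS[] = "0123456789";

/* Pick one character from a set of n characters.
   rand() % n is slightly biased, which is fine for a toy generator. */
static char pick(const char *set, size_t n)
{
    return set[(size_t)rand() % n];
}

/* Fill out[] (cap bytes, including the terminating NUL) from a flat mask.
   Returns 0 on success, -1 on an unknown mask character or if out[] is
   too small. */
int gen_flat(const char *mask, char *out, size_t cap)
{
    size_t n = 0;

    assert(mask != NULL && out != NULL);
    if (cap == 0)
        return -1;
    for (; *mask != '\0'; mask++) {
        char c;

        switch (*mask) {
        case 'A': c = pick(ALNUM, sizeof ALNUM - 1); break;
        case 'N': c = pick(DIGITS, sizeof DIGITS - 1); break;
        default:  return -1;
        }
        if (n + 1 >= cap)       /* keep one byte for the NUL */
            return -1;
        out[n++] = c;
    }
    out[n] = '\0';
    return 0;
}
```

Calling gen_flat("AAANN", buf, sizeof buf) with a 16-byte buf fills it with five characters, the last two being digits. Call srand() once at start-up if different output is wanted on each run.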

I would now like to add some complexity to this, such as repetitions
and grouping. For example, I'd like to have masks similar to:
"AAN*10", which would return two alphanumeric chars followed by a
sequence of 10 numeric characters. However, the characters could be
grouped, such as: "A(AN)*10", which would now return an alphanumeric
character followed by a sequence of ten alternating alphanumeric/
numeric characters.

I'm not really sure where to start with this next step as I have
minimal experience. Any pointers in the right direction, or sample
code, would be appreciated.

Thanks in advance,
James.
 
CBFalconer

James said:
I am new to C and I am trying to write a few small applications
to get some hands-on practise! I am trying to write a random
string generator, based on a masked input. For example, given
the string "AAANN" it would return a string containing 3
alphanumeric characters followed by 2 digits. This part I have
managed :)

I would now like to add some complexity to this, such as
repetitions and grouping. For example, I'd like to have masks
similar to: "AAN*10", which would return two alphanumeric chars
followed by a sequence of 10 numeric characters. However, the
characters could be grouped, such as: "A(AN)*10", which would
now return an alphanumeric character followed by a sequence of
ten alternating alphanumeric/numeric characters.

I'm not really sure where to start with this next step as I
have minimal experience. Any pointers in the right direction,
or sample code, would be appreciated.

I think a study of regular expressions, as implemented in Unix and
Linux, would be instructive here.
 
James Arnold

I think a study of regular expressions, as implemented in Unix and
Linux, would be instructive here.

I am already familiar with regular expressions, but I was under the
impression they can't be used to match braces when nested? So if for
example I wanted to do A(A(N)*10)*5, regular expressions wouldn't be
appropriate?

It turns into a grammar parsing problem.

I have been looking at Lex/Yacc (well, Flex/Bison) and written a
grammar to handle what I would like. I've compiled it and managed to
get it to output the detected tokens, but it definitely seemed to be
overkill for such a small program. Currently I'm just iterating
through a string and switch()'ing on each character, which covers most
of the functionality I'd like. I figured there must be a way of
tracking nested depth and calling the parse routine recursively for
each matched group?

After that, my suggestion would be to divide the program into two parts.
The first one would input a string like "AA(AN){10}" and expand it to
something like "AAANANANANANANANANANAN", which is really what you're
looking at.
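
As an illustration of that first part, here is a rough sketch of an expander for the "{count}" notation used in the suggestion. The names struct buf, put, expand_seq and expand_mask are hypothetical, and the sketch assumes a count applies to the single letter or parenthesised group immediately before it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer that remembers whether anything went wrong. */
struct buf { char *s; size_t len, cap; int err; };

/* Append n bytes, always leaving room for the terminating NUL. */
static void put(struct buf *b, const char *src, size_t n)
{
    if (b->err || n >= b->cap - b->len) { b->err = 1; return; }
    memcpy(b->s + b->len, src, n);
    b->len += n;
}

/* Expand items until ')' or end of string; returns where parsing stopped. */
static const char *expand_seq(const char *p, struct buf *b)
{
    while (*p != '\0' && *p != ')' && !b->err) {
        size_t start = b->len;          /* this item's expansion starts here */

        if (*p == '(') {                /* group: recurse, then expect ')' */
            p = expand_seq(p + 1, b);
            if (b->err || *p != ')') { b->err = 1; break; }
            p++;
        } else if (*p == 'A' || *p == 'N') {
            put(b, p++, 1);
        } else {
            b->err = 1;
            break;
        }
        if (*p == '{') {                /* optional {count} after the item */
            size_t item_len = b->len - start;
            unsigned long count = 0, i;

            if (!isdigit((unsigned char)*++p)) { b->err = 1; break; }
            while (isdigit((unsigned char)*p) && count <= 100000UL)
                count = count * 10 + (unsigned long)(*p++ - '0');
            if (count > 100000UL || *p != '}') { b->err = 1; break; }
            p++;
            if (count == 0)
                b->len = start;         /* {0} removes the item again */
            /* The item is already present once; append count-1 more copies. */
            for (i = 1; i < count && item_len > 0 && !b->err; i++)
                put(b, b->s + start, item_len);
        }
    }
    return p;
}

/* Expand mask into out[] (cap bytes). Returns 0 on success, -1 on error. */
int expand_mask(const char *mask, char *out, size_t cap)
{
    struct buf b;
    const char *end;

    assert(mask != NULL && out != NULL && cap > 0);
    b.s = out; b.len = 0; b.cap = cap; b.err = 0;
    end = expand_seq(mask, &b);
    if (b.err || *end != '\0')          /* bad input, overflow or stray ')' */
        return -1;
    out[b.len] = '\0';
    return 0;
}
```

With this sketch, "AA(AN){10}" expands to AA followed by ten copies of AN, and a nested mask such as "A(A(N){2}){2}" expands to "AANNANN". The flat result can then be fed to a simple one-character-at-a-time generator.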

This is also something I had considered, but I want to be able to use
a range for a specified repetition, e.g. repeat between 5 and 10 times.
This is fine, but if I want to generate 50 different outcomes the full
mask would need to be expanded each time, rather than just the
repeated bit. Surely that is not going to be very efficient? :)

Thanks for the replies!
 
Ben Bacarisse

[You or your news reader is not adding attribution lines. This is not
a good idea and you should have a look to see if you can fix it.]

James Arnold said:
I am already familiar with regular expressions, but I was under the
impression they can't be used to match braces when nested? So if for
example I wanted to do A(A(N)*10)*5, regular expressions wouldn't be
appropriate?

I think the suggestion was only that you could look at REs for how to
write your masks. You are right that REs won't be any good as a way of
implementing this, but their notation is worth borrowing. For example,
some REs use (abc){3,6} for 3 to 6 repeats of "abc", and you might one
day want things like [aeiou] rather than just the A and N indicators.
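
To make the {3,6} idea concrete, here is one possible sketch of reading such a range from a mask and picking a random repeat count from it. The names read_number, parse_range and pick_count are invented for this example, and the upper limit of 1000 is an arbitrary assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Read a decimal number at *pp and advance past it.
   Returns -1 if there is no digit or the value exceeds an arbitrary cap. */
static int read_number(const char **pp, int *value)
{
    const char *p = *pp;
    int v = 0;

    if (!isdigit((unsigned char)*p))
        return -1;
    while (isdigit((unsigned char)*p)) {
        v = v * 10 + (*p++ - '0');
        if (v > 1000)           /* cap chosen for this sketch; avoids overflow */
            return -1;
    }
    *pp = p;
    *value = v;
    return 0;
}

/* Parse "{n}" or "{min,max}" at *pp. On success stores the bounds, moves
   *pp past the '}' and returns 0; on malformed input returns -1 and
   leaves *pp alone. */
int parse_range(const char **pp, int *min, int *max)
{
    const char *p;
    int lo, hi;

    assert(pp != NULL && *pp != NULL && min != NULL && max != NULL);
    p = *pp;
    if (*p != '{')
        return -1;
    p++;
    if (read_number(&p, &lo) != 0)
        return -1;
    hi = lo;                    /* "{n}" means exactly n repeats */
    if (*p == ',') {
        p++;
        if (read_number(&p, &hi) != 0)
            return -1;
    }
    if (*p != '}' || hi < lo)
        return -1;
    *min = lo;
    *max = hi;
    *pp = p + 1;
    return 0;
}

/* Random repeat count in [min, max]; rand() % n is slightly biased. */
int pick_count(int min, int max)
{
    assert(0 <= min && min <= max);
    return min + rand() % (max - min + 1);
}
```

A generator would call parse_range right after a letter or a closing bracket, then call pick_count once per generated string so that each output gets its own repeat count.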

James Arnold said:
I have been looking at Lex/Yacc (well, Flex/Bison) and written a
grammar to handle what I would like. I've compiled it and managed to
get it to output the detected tokens, but it definitely seemed to be
overkill for such a small program.

Agreed. You have at most brackets and a couple of operators. No
need for lex and yacc.

James Arnold said:
Currently I'm just iterating
through a string and switch()'ing on each character, which covers most
of the functionality I'd like. I figured there must be a way of
tracking nested depth and calling the parse routine recursively for
each matched group?

That's roughly what I'd do. In fact, I'd probably make what you call
the parse routine do the actual generation as well. The parsing will
be so simple that actually storing the parse in some form is probably
not needed.
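
A rough sketch of that idea, using the "*count" notation from the masks earlier in the thread, could look like this. All names here (struct out, emit, skip_item, gen_item, gen_seq, generate) are made up for the example, and it assumes a count binds to the letter or group immediately before it. To repeat an item it simply re-walks the same stretch of the mask, so every repetition gets fresh random characters and nothing is stored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

struct out { char *s; size_t len, cap; int err; };

static const char *gen_seq(const char *p, struct out *o);

/* Append one random character: a digit for 'N', alphanumeric otherwise. */
static void emit(struct out *o, char kind)
{
    static const char alnum[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    static const char digits[] = "0123456789";

    if (o->len + 1 >= o->cap) { o->err = 1; return; }   /* keep room for NUL */
    o->s[o->len++] = (kind == 'N')
        ? digits[(size_t)rand() % (sizeof digits - 1)]
        : alnum[(size_t)rand() % (sizeof alnum - 1)];
}

/* Return a pointer just past the item at p ('A', 'N' or a balanced
   "(...)"), or NULL if the item is malformed. */
static const char *skip_item(const char *p)
{
    int depth = 0;

    if (*p == 'A' || *p == 'N') return p + 1;
    if (*p != '(') return NULL;
    do {
        if (*p == '\0') return NULL;    /* unbalanced bracket */
        if (*p == '(') depth++;
        else if (*p == ')') depth--;
        p++;
    } while (depth > 0);
    return p;
}

/* Generate output for one item; groups recurse into gen_seq. */
static void gen_item(const char *p, struct out *o)
{
    if (*p == '(') gen_seq(p + 1, o);
    else emit(o, *p);
}

/* Parse and generate until ')' or end of string; returns the stop position. */
static const char *gen_seq(const char *p, struct out *o)
{
    while (*p != '\0' && *p != ')' && !o->err) {
        const char *item = p;
        long count = 1, i;

        p = skip_item(p);
        if (p == NULL) { o->err = 1; return item; }
        if (*p == '*') {                /* optional repeat count */
            p++;
            if (!isdigit((unsigned char)*p)) { o->err = 1; return p; }
            for (count = 0; isdigit((unsigned char)*p); p++) {
                count = count * 10 + (*p - '0');
                if (count > 100000L) { o->err = 1; return p; }
            }
        }
        for (i = 0; i < count && !o->err; i++)
            gen_item(item, o);          /* re-walk the item for each repeat */
    }
    return p;
}

/* Generate one random string for mask into out[] (cap bytes).
   Returns 0 on success, -1 on a bad mask or a buffer that is too small. */
int generate(const char *mask, char *out, size_t cap)
{
    struct out o;
    const char *end;

    assert(mask != NULL && out != NULL && cap > 0);
    o.s = out; o.len = 0; o.cap = cap; o.err = 0;
    end = gen_seq(mask, &o);
    if (o.err || *end != '\0') return -1;
    out[o.len] = '\0';
    return 0;
}
```

With this sketch, generate("AAN*10", buf, sizeof buf) produces two alphanumerics followed by ten digits, and "A(A(N)*10)*5" produces 56 characters.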

James Arnold said:
This is also something I had considered, but I want to be able to use
a range for a specified repetition, e.g. repeat between 5 and 10 times.
This is fine, but if I want to generate 50 different outcomes the full
mask would need to be expanded each time, rather than just the
repeated bit. Surely that is not going to be very efficient? :)

I agree. If you parse and generate on the fly, there is no need for
either an intermediate mask or a stored parse tree.
 
