finite state automaton


Roedy Green

Consider a simple finite state automaton to parse property files.

They look like this:
# a comment
keyword=value

I want to categorise each fragment of text as either comment, keyword
or value. Now throw in a complication. Inside any of those three
things might be literals of the form \uffff

I find myself creating all kinds of rinky dink mechanisms to handle
the literals. I wondered if there is a clean way to do it.

There are two problems.


1) It is clumsy to invent three literal states, one for in-comment, one
for in-keyword and one for in-value, just so the automaton can remember
what it was doing. Yet the whole idea of a finite state automaton is that
the memory of the system is supposed to be encapsulated in the state.

2) You leave the literal state based on a count, not the presence of
some delimiter. I could create 5 states to mark progress down the
literal, but this seems a bit nuts.
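For illustration only, here is one way the two problems could be folded into a single extra state: a saved "return state" plus a countdown. This is a sketch, not anything from the thread. The names ReturnStateFsa, classify, returnState and remaining are made up, and it assumes every backslash starts a well-formed \uXXXX literal and that only '#' starts a comment. The two extra variables are just a compact encoding of the 3 x 5 product states described above, so the machine is still finite.

```java
class ReturnStateFsa {
    enum State { KEYWORD, VALUE, COMMENT, ESCAPE }

    /**
     * Tags every char of one line: K = keyword, V = value, C = comment,
     * E = part of a unicode literal, '=' = the separator itself.
     * Assumption: each backslash begins a well-formed six-char literal.
     */
    static String classify(String line) {
        StringBuilder tags = new StringBuilder();
        State state = line.startsWith("#") ? State.COMMENT : State.KEYWORD;
        State returnState = state; // what we were doing before the literal
        int remaining = 0;         // chars of the literal still to consume
        for (char c : line.toCharArray()) {
            if (state == State.ESCAPE) {
                tags.append('E');
                if (--remaining == 0) {
                    state = returnState; // leave on a count, not a delimiter
                }
            } else if (c == '\\') {
                returnState = state;
                state = State.ESCAPE;
                remaining = 5;           // the 'u' plus four hex digits
                tags.append('E');
            } else if (state == State.KEYWORD && c == '=') {
                state = State.VALUE;
                tags.append('=');
            } else {
                tags.append(state == State.KEYWORD ? 'K'
                          : state == State.VALUE ? 'V' : 'C');
            }
        }
        return tags.toString();
    }
}
```

Whether this counts as "clean" is a matter of taste: it trades the extra states for two variables that live outside the state enum.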
 

Raymond DeCampo

Roedy said:
Consider a simple finite state automaton to parse property files.

They look like this:
# a comment
keyword=value

I want to categorise each fragment of text as either comment, keyword
or value. Now throw in a complication. Inside any of those three
things might be literals of the form \uffff

I find myself creating all kinds of rinky dink mechanisms to handle
the literals. I wondered if there is a clean way to do it.

There are two problems.


1) It is clumsy to invent three literal states, one for in-comment, one
for in-keyword and one for in-value, just so the automaton can remember
what it was doing. Yet the whole idea of a finite state automaton is that
the memory of the system is supposed to be encapsulated in the state.

2) You leave the literal state based on a count, not the presence of
some delimiter. I could create 5 states to mark progress down the
literal, but this seems a bit nuts.

Roedy,

Why not run the property file through a pre-processor to handle escape
sequences, similar to what javac does? After all, the standard property
file format supports \\ and \ followed by a line break for line
continuation and who knows what else....

HTH,
Ray
 

Stefan Ram

The clean way is a scanner with two layers:

The first layer converts each \u-Sequence to a code point.

The second layer then reads code points supplied by the first
layer and does not have to care about the \u-sequences
anymore.
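A rough sketch of what the two layers might look like in Java follows. It is an illustration under simplifying assumptions, not a full properties parser: the names TwoLayerScanner, decodeUnicodeEscapes, classify and scan are invented here, only '#' comments and the '=' separator are recognised, and line continuation is ignored. Note that a \uXXXX in the file yields one UTF-16 char, so a code point outside the BMP would arrive as two escapes.

```java
import java.util.ArrayList;
import java.util.List;

class TwoLayerScanner {
    enum Kind { COMMENT, KEYWORD, VALUE }

    /** A classified fragment of already-decoded text. */
    static final class Fragment {
        final Kind kind;
        final String text;
        Fragment(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Layer 1: replace each well-formed unicode escape with its char. */
    static String decodeUnicodeEscapes(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("\\\\", i)) {
                out.append("\\\\");      // escaped backslash: copy untouched
                i += 2;
            } else if (in.startsWith("\\u", i) && isHex4(in, i + 2)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(in.charAt(i++));
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) return false;
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }

    /** Layer 2: sees only decoded chars, so it needs no literal states. */
    static List<Fragment> classify(String decodedLine) {
        List<Fragment> out = new ArrayList<>();
        if (decodedLine.startsWith("#")) {
            out.add(new Fragment(Kind.COMMENT, decodedLine));
            return out;
        }
        int eq = decodedLine.indexOf('=');
        if (eq < 0) {
            out.add(new Fragment(Kind.KEYWORD, decodedLine));
            return out;
        }
        out.add(new Fragment(Kind.KEYWORD, decodedLine.substring(0, eq)));
        out.add(new Fragment(Kind.VALUE, decodedLine.substring(eq + 1)));
        return out;
    }

    /** Both layers chained for one raw line. */
    static List<Fragment> scan(String rawLine) {
        return classify(decodeUnicodeEscapes(rawLine));
    }
}
```

One consequence of decoding first: an escaped '#' or '=' becomes indistinguishable from a literal one by the time the second layer sees it, and the original spelling of the escape is lost.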
 

Raymond DeCampo

Stefan said:
The clean way is a scanner with two layers:

The first layer converts each \u-Sequence to a code point.

The second layer then reads code points supplied by the first
layer and does not have to care about the \u-sequences
anymore.

Gee, thanks for replying to my post, removing my contribution, removing
the OP's name making it seem as if I wrote what the OP did to the casual
observer, and then re-stating my idea. That was really helpful.

Ray
 

Stefan Ram

Roedy Green said:
I want to categorise each fragment of text as either comment, keyword
or value. Now throw in a complication. Inside any of those three
things might be literals of the form \uffff
I find myself creating all kinds of rinky dink mechanisms to handle
the literals. I wondered if there is a clean way to do it.

The clean way is a scanner with two layers:

The first layer converts each \u-Sequence to a code point.

The second layer then reads code points supplied by the first
layer and does not have to care about the \u-sequences
anymore.
 

Roedy Green

Raymond DeCampo said:
Why not run the property file through a pre-processor to handle escape
sequences, similar to what javac does? After all, the standard property
file format supports \\ and \ followed by a line break for line
continuation and who knows what else....

I considered that, but I wanted to display the file literally. If the
file contained embedded \uxxxx characters in binary, I wanted to
display them differently from ones properly encoded with \uxxxx.

I have since solved the problem with a kludge: a lookahead that handles
the entire sequence as if it were a single char from the overall state
machine's point of view.
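The lookahead idea might look roughly like the sketch below. This is a guess at the general shape, not the code behind the page linked in this post. The names LookaheadScanner, Token, scanLine and isUnicodeEscape are invented, and the sketch assumes only '#' comments and the '=' separator. The outer machine never changes state for a literal; it peeks six chars ahead and, if they form a literal, emits them as one token flagged as an escape, keeping the original spelling so it can be displayed differently.

```java
import java.util.ArrayList;
import java.util.List;

class LookaheadScanner {
    enum State { KEYWORD, VALUE, COMMENT }

    /** One run of text, tagged with the state it was seen in. */
    static final class Token {
        final State state;
        final String text;
        final boolean escape; // true if text is a six-char unicode literal
        Token(State state, String text, boolean escape) {
            this.state = state; this.text = text; this.escape = escape;
        }
    }

    static List<Token> scanLine(String line) {
        List<Token> tokens = new ArrayList<>();
        State state = line.startsWith("#") ? State.COMMENT : State.KEYWORD;
        StringBuilder run = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            if (isUnicodeEscape(line, i)) {
                // Lookahead hit: treat the whole literal as a single "char".
                flush(tokens, state, run);
                tokens.add(new Token(state, line.substring(i, i + 6), true));
                i += 6;
                continue;
            }
            char c = line.charAt(i++);
            if (state == State.KEYWORD && c == '=') {
                flush(tokens, state, run);
                state = State.VALUE;   // the separator itself is dropped
            } else {
                run.append(c);
            }
        }
        flush(tokens, state, run);
        return tokens;
    }

    /** True if a backslash, a 'u' and four hex digits start at index i. */
    static boolean isUnicodeEscape(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }

    private static void flush(List<Token> tokens, State state, StringBuilder run) {
        if (run.length() > 0) {
            tokens.add(new Token(state, run.toString(), false));
            run.setLength(0);
        }
    }
}
```

Because each token carries both its state and the escape flag, a renderer could colour by state and, say, underline the escapes, without the state machine ever needing an in-literal state.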

You can see it working at http://mindprod.com/jgloss/properties.html
 
