finite state automaton

Discussion in 'Java' started by Roedy Green, Dec 23, 2005.

  1. Roedy Green

    Roedy Green Guest

    Consider a simple finite state automaton to parse property files.

    They look like this:
    # a comment
    keyword=value

    I want to categorise each fragment of text as either comment, keyword
    or value. Now throw in a complication. Inside any of those three
    things might be literals of the form \uffff
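The basic automaton, before any \uffff literals are considered, can be sketched in a few lines of Java. The class name `SimplePropsFsa`, its state names and the one-letter output codes (C comment, K keyword, S the `=` separator, V value) are illustrative assumptions, not Roedy's code.

```java
/**
 * Minimal finite state automaton for the simple property-file shape:
 * "# comment" lines and "keyword=value" lines. Escapes are deliberately ignored.
 */
class SimplePropsFsa {
    enum State { LINE_START, COMMENT, KEYWORD, VALUE }

    /**
     * Returns one letter per input char: C = comment, K = keyword,
     * S = the '=' separator, V = value. Newlines are copied through.
     */
    static String classify(String text) {
        StringBuilder out = new StringBuilder(text.length());
        State state = State.LINE_START;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {                  // every line starts afresh
                out.append('\n');
                state = State.LINE_START;
                continue;
            }
            if (state == State.LINE_START) {  // the first char decides the line type
                state = (c == '#') ? State.COMMENT : State.KEYWORD;
            }
            switch (state) {
                case COMMENT:
                    out.append('C');
                    break;
                case KEYWORD:
                    if (c == '=') {           // the first '=' ends the keyword
                        out.append('S');
                        state = State.VALUE;
                    } else {
                        out.append('K');
                    }
                    break;
                default:                      // VALUE: everything up to end of line
                    out.append('V');
            }
        }
        return out.toString();
    }
}
```

For example, `SimplePropsFsa.classify("# a\nk=v\n")` returns `"CCC\nKSV\n"`. Everything the machine remembers lives in `state`, which is exactly what the \uffff literals complicate.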

    I find myself creating all kinds of rinky dink mechanisms to handle
    the literals. I wondered if there is a clean way to do it.

    There are two problems.


    1) It is clumsy to invent three literal states, one for in-comment, one
    for in-keyword and one for in-value, just so it can remember what it was
    doing. Yet the whole idea of a finite state automaton is that the memory
    of the system is supposed to be encapsulated in the state.

    2) You leave the literal state based on a count, not the presence of
    some delimiter. I could create 5 states to mark progress down the
    literal, but this seems a bit nuts.
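Both problems are commonly handled by keeping the extra memory in ordinary variables beside the state: one LITERAL state shared by all three contexts, a saved `returnState` (problem 1) and a `hexDigitsLeft` counter (problem 2). Strictly, the automaton's state is then the tuple (state, returnState, hexDigitsLeft); since each part ranges over a finite set, this is still a finite automaton, just written compactly. The sketch below is hypothetical: its names, its lower-case marking of escape chars, and its choice to throw IllegalArgumentException on a malformed escape (which is also what java.util.Properties.load does) are assumptions, and line continuations are ignored.

```java
/**
 * A property-file automaton that also handles backslash escapes. A single
 * LITERAL state serves keyword, value and comment: returnState remembers which
 * of them to resume (problem 1) and hexDigitsLeft counts the digits (problem 2).
 */
class EscapingPropsFsa {
    enum State { LINE_START, COMMENT, KEYWORD, VALUE, ESCAPE, LITERAL }

    /**
     * One letter per input char: C/K/V for comment, keyword and value, S for
     * the '=' separator, and the lower-case letter of the enclosing category
     * for every char of an escape sequence. Line continuations are not handled.
     */
    static String classify(String text) {
        StringBuilder out = new StringBuilder(text.length());
        State state = State.LINE_START;
        State returnState = State.LINE_START; // category to resume after an escape
        int hexDigitsLeft = 0;                // progress through the four hex digits
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                if (state == State.LITERAL) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                out.append('\n');
                state = State.LINE_START;
                continue;
            }
            if (state == State.LINE_START) {
                state = (c == '#') ? State.COMMENT : State.KEYWORD;
            }
            switch (state) {
                case ESCAPE:                   // the char right after the backslash
                    out.append(escapeLetter(returnState));
                    if (c == 'u') {
                        state = State.LITERAL;
                        hexDigitsLeft = 4;
                    } else {
                        state = returnState;   // two-char escape such as an escaped '='
                    }
                    break;
                case LITERAL:
                    if (!isHexDigit(c)) {
                        throw new IllegalArgumentException("bad hex digit at index " + i);
                    }
                    out.append(escapeLetter(returnState));
                    if (--hexDigitsLeft == 0) {
                        state = returnState;   // leave on a count, not on a delimiter
                    }
                    break;
                default:                       // COMMENT, KEYWORD or VALUE
                    if (c == '\\') {
                        returnState = state;
                        state = State.ESCAPE;
                        out.append(escapeLetter(returnState));
                    } else if (state == State.KEYWORD && c == '=') {
                        out.append('S');
                        state = State.VALUE;
                    } else {
                        out.append(state.name().charAt(0)); // 'C', 'K' or 'V'
                    }
            }
        }
        if (state == State.LITERAL) {
            throw new IllegalArgumentException("truncated escape at end of input");
        }
        return out.toString();
    }

    private static char escapeLetter(State category) {
        return Character.toLowerCase(category.name().charAt(0));
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```

For example, `EscapingPropsFsa.classify("a\\=b=c")` (an escaped `=` inside the keyword) yields `"KkkKSV"`: the escaped `=` does not end the keyword, the real one does.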
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, Dec 23, 2005
    #1

  2. Roedy Green wrote:
    > Consider a simple finite state automaton to parse property files.
    >
    > They look like this:
    > # a comment
    > keyword=value
    >
    > I want to categorise each fragment of text as either comment, keyword
    > or value. Now throw in a complication. Inside any of those three
    > things might be literals of the form \uffff
    >
    > I find myself creating all kinds of rinky dink mechanisms to handle
    > the literals. I wondered if there is a clean way to do it.
    >
    > There are two problems.
    >
    >
    > 1) It is clumsy to invent three literal states, one for in-comment, one
    > for in-keyword and one for in-value, just so it can remember what it was
    > doing. Yet the whole idea of a finite state automaton is that the memory
    > of the system is supposed to be encapsulated in the state.
    >
    > 2) You leave the literal state based on a count, not the presence of
    > some delimiter. I could create 5 states to mark progress down the
    > literal, but this seems a bit nuts.


    Roedy,

    Why not run the property file through a pre-processor to handle escape
    sequences, similar to what javac does? After all, the standard property
    file format also supports \\, and \ followed by a line break for line
    continuation, and who knows what else...
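Ray's suggestion amounts to a separate pass that rewrites the text before the automaton sees it. javac does something similar: per the Java Language Specification, unicode escapes are translated before the source is split into tokens. A minimal sketch, assuming the pass should decode only \uXXXX and leave other backslash pairs (such as \\ or \=) for a later stage; the class and method names are hypothetical.

```java
/**
 * Pre-processing pass: replaces each backslash-u escape (a backslash, 'u' and
 * four hex digits) with the char it denotes, before any classification runs.
 */
class EscapePreprocessor {
    static String decodeUnicodeEscapes(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c != '\\' || i + 1 == in.length()) {
                out.append(c);                 // ordinary char (or a trailing backslash)
                i++;
            } else if (in.charAt(i + 1) != 'u') {
                // Some other escape, e.g. a doubled backslash or an escaped '=':
                // copy both chars, so a doubled backslash cannot start a unicode
                // escape and the later stage still sees the original escape.
                out.append(c).append(in.charAt(i + 1));
                i += 2;
            } else {
                if (i + 6 > in.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int j = i + 2; j < i + 6; j++) {
                    int digit = hexValue(in.charAt(j));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + j);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                i += 6;
            }
        }
        return out.toString();
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(decodeUnicodeEscapes("key\\u0041=value")); // prints keyA=value
    }
}
```

One consequence of this design is that, after the pass, the automaton can no longer tell an escaped char from one stored raw in the file.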

    HTH,
    Ray

    --
    XML is the programmer's duct tape.
    Raymond DeCampo, Dec 31, 2005
    #2

  3. Roedy Green

    Stefan Ram Guest

    Stefan Ram, Dec 31, 2005
    #3
  4. Roedy Green

    Stefan Ram Guest

    Raymond DeCampo <> was quoting:
    >>I want to categorise each fragment of text as either comment, keyword
    >>or value. Now throw in a complication. Inside any of those three
    >>things might be literals of the form \uffff
    >>I find myself creating all kinds of rinky dink mechanisms to handle
    >>the literals. I wondered if there is a clean way to do it.


    The clean way is a scanner with two layers:

    The first layer converts each \u-sequence to a code point.

    The second layer then reads code points supplied by the first
    layer and does not have to care about the \u-sequences
    anymore.
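The two layers can be sketched as a decoding source wrapped around a java.io.Reader, feeding a state machine that never sees a \u-sequence. Two simplifications, both assumptions of this sketch: layer 1 returns UTF-16 chars rather than combining surrogate pairs (written as two escapes) into true code points, and layer 2 does not interpret other escapes such as \= or line continuations. The class names and the "CATEGORY:text" output format are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Layer 1: decodes backslash-u escapes, so the layer above never sees them. */
class UnescapingSource {
    private static final int NONE = -2;
    private final Reader in;
    private int pending = NONE; // char to pass through verbatim on the next call

    UnescapingSource(Reader in) { this.in = in; }

    /** Returns the next decoded char, or -1 at end of input. */
    int next() throws IOException {
        if (pending != NONE) {
            int p = pending;
            pending = NONE;
            return p;
        }
        int c = in.read();
        if (c != '\\') return c;
        int n = in.read();
        if (n != 'u') {  // another escape: pass both chars on untouched
            pending = n; // -1 if the input ended right after the backslash
            return c;
        }
        int value = 0;
        for (int k = 0; k < 4; k++) {
            int digit = hexValue(in.read());
            if (digit < 0) throw new IllegalArgumentException("malformed backslash-u escape");
            value = value * 16 + digit;
        }
        return value;
    }

    private static int hexValue(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1; // also covers -1 (end of input)
    }
}

/** Layer 2: a plain state machine over decoded chars; it knows nothing about escapes. */
class FragmentClassifier {
    enum State { LINE_START, COMMENT, KEYWORD, VALUE }

    static List<String> fragments(Reader reader) throws IOException {
        UnescapingSource src = new UnescapingSource(reader);
        List<String> result = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        State state = State.LINE_START;
        for (int c = src.next(); c != -1; c = src.next()) {
            if (c == '\n') {
                flush(state, text, result);
                state = State.LINE_START;
                continue;
            }
            if (state == State.LINE_START) {
                state = (c == '#') ? State.COMMENT : State.KEYWORD;
            }
            if (state == State.KEYWORD && c == '=') {
                flush(state, text, result); // the first '=' ends the keyword
                state = State.VALUE;
            } else {
                text.append((char) c);
            }
        }
        flush(state, text, result); // the last line may lack a newline
        return result;
    }

    /** Convenience overload for in-memory text. */
    static List<String> fragments(String text) {
        try {
            return fragments(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected from a StringReader
        }
    }

    private static void flush(State state, StringBuilder text, List<String> result) {
        if (text.length() > 0) result.add(state + ":" + text);
        text.setLength(0);
    }
}
```

One trade-off of decoding first: an escaped `=` (\u003d) or newline (\u000a) reaches layer 2 as an ordinary `=` or line end. java.util.Properties instead locates the key/value separator on the raw text before converting escapes, so there \u003d stays part of the key.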
    Stefan Ram, Dec 31, 2005
    #4
  5. Stefan Ram wrote:
    > Raymond DeCampo <> was quoting:
    >
    >>>I want to categorise each fragment of text as either comment, keyword
    >>>or value. Now throw in a complication. Inside any of those three
    >>>things might be literals of the form \uffff
    >>>I find myself creating all kinds of rinky dink mechanisms to handle
    >>>the literals. I wondered if there is a clean way to do it.

    >
    >
    > The clean way is a scanner with two layers:
    >
    > The first layer converts each \u-sequence to a code point.
    >
    > The second layer then reads code points supplied by the first
    > layer and does not have to care about the \u-sequences
    > anymore.
    >


    Gee, thanks for replying to my post, removing my contribution, removing
    the OP's name so that to the casual observer it seems as if I wrote what
    the OP did, and then re-stating my idea. That was really helpful.

    Ray

    Raymond DeCampo, Jan 1, 2006
    #5
  6. Roedy Green

    Stefan Ram Guest

    Roedy Green <>
    might have written, quoted or indirectly quoted something like:
    >I want to categorise each fragment of text as either comment, keyword
    >or value. Now throw in a complication. Inside any of those three
    >things might be literals of the form \uffff
    >I find myself creating all kinds of rinky dink mechanisms to handle
    >the literals. I wondered if there is a clean way to do it.


    The clean way is a scanner with two layers:

    The first layer converts each \u-sequence to a code point.

    The second layer then reads code points supplied by the first
    layer and does not have to care about the \u-sequences
    anymore.
    Stefan Ram, Jan 1, 2006
    #6
  7. Roedy Green

    Roedy Green Guest

    On Sat, 31 Dec 2005 20:46:42 GMT, Raymond DeCampo
    <> wrote, quoted or indirectly quoted someone who
    said :

    >Why not run the property file through a pre-processor to handle escape
    >sequences, similar to what javac does? After all, the standard property
    >file format supports \\ and \ followed by a line break for line
    >continuation and who knows what else....


    I considered that, but I wanted to display the file literally. If the
    file contained characters embedded directly in binary, I wanted to
    display them differently from ones properly encoded as \uxxxx.

    I have since solved the problem with a kludge: a lookahead that treats
    the entire escape sequence as if it were a single char from the point of
    view of the overall state machine.
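A lookahead of this kind might look like the following hypothetical sketch (not Roedy's actual code): `unitAt` peeks ahead and, when a \uXXXX sequence starts at the current position, consumes all six chars as one unit that remembers it was escaped. The state machine then steps once per unit, and the display can still tell escaped chars from raw non-ASCII ones. The names, the tag strings, and the treatment of other escapes as two-char units are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Lookahead scanning: an escape sequence is consumed in one step and reaches the
 * state machine as a single unit that still knows it was escaped, so the display
 * can treat it differently from a char stored raw in the file.
 */
class LookaheadScanner {
    enum State { LINE_START, COMMENT, KEYWORD, VALUE }

    /** One step of input as the state machine sees it. */
    static final class Unit {
        final char value;      // decoded char; for other escapes, the char after the backslash (untranslated)
        final int length;      // number of source chars consumed
        final boolean escaped; // true if the unit came from a backslash sequence

        Unit(char value, int length, boolean escaped) {
            this.value = value;
            this.length = length;
            this.escaped = escaped;
        }
    }

    /** The lookahead: decides how many source chars starting at pos form one unit. */
    static Unit unitAt(String text, int pos) {
        char c = text.charAt(pos);
        if (c != '\\' || pos + 1 == text.length()) {
            return new Unit(c, 1, false);
        }
        if (text.charAt(pos + 1) == 'u' && pos + 6 <= text.length() && isHex(text, pos + 2, pos + 6)) {
            return new Unit((char) Integer.parseInt(text.substring(pos + 2, pos + 6), 16), 6, true);
        }
        return new Unit(text.charAt(pos + 1), 2, true); // other (or malformed) escape: two chars
    }

    /** Tags each unit with its category letter, marking escaped units and raw non-ASCII chars. */
    static List<String> tagUnits(String text) {
        List<String> tags = new ArrayList<>();
        State state = State.LINE_START;
        for (int pos = 0; pos < text.length(); ) {
            Unit u = unitAt(text, pos);
            pos += u.length;
            boolean plain = !u.escaped;
            if (plain && u.value == '\n') {
                state = State.LINE_START;
                continue;
            }
            if (state == State.LINE_START) {
                state = (plain && u.value == '#') ? State.COMMENT : State.KEYWORD;
            }
            String tag;
            if (state == State.KEYWORD && plain && u.value == '=') {
                tag = "S";
                state = State.VALUE;
            } else {
                tag = state.name().substring(0, 1); // "C", "K" or "V"
            }
            if (u.escaped) {
                tag += "(escaped)";
            } else if (u.value > 0x7E) {
                tag += "(raw)"; // char above '~' (non-ASCII) stored directly in the file
            }
            tags.add(tag);
        }
        return tags;
    }

    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) return false;
        }
        return true;
    }
}
```

For example, `tagUnits("k\\u00e9=v" + (char) 0xE9)` gives `[K, K(escaped), S, V, V(raw)]`: the escaped and the raw é are the same char, but each keeps its own display tag.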

    You can see it working at http://mindprod.com/jgloss/properties.html


    Roedy Green, Jan 2, 2006
    #7
