Pattern matching for a terminal emulator

Discussion in 'C Programming' started by Captain Dondo, Apr 19, 2007.

  1. I'm working on a terminal emulator for an embedded system.

    The key requirements are small size, code clarity, maintainability, and
    portability. We have machines that regularly see a service life of 30 years
    so it's not impossible that this code will be around that long.

    I'm trying to use termcap info to map incoming strings to display
    actions.

    In other words, I have an array that holds termcap info:

    termcap = {
    "ae=^O",
    "as=^N",
    "cm=\E[%i%d;%dH",
    "cs=\E[%i%d;%dr",
    ...
    }

    I'm trying to come up with a nice, clear algorithm for matching incoming
    characters to the patterns in the termcap array.

    So far I've struck out pretty much completely.

    For example, for the cm string, the terminal can see incoming strings like
    this:

    {esc}[3;4H
    {esc}[;4H
    {esc}[3;H

    I can't quite come up with a parser that can handle that, and which
    doesn't get all convoluted...

    I would love to have some suggestions on how to match those patterns.

    --Yan
    Captain Dondo, Apr 19, 2007
    #1

  2. Captain Dondo

    Bin Chen Guest

    On Apr 19, 9:42 pm, Captain Dondo <> wrote:
    > I'm working on a terminal emulator for an embedded system.
    >
    > The key requirements are small size, code clarity, maintainability, and
    > portability. We have machines that regularly see a service life of 30 years
    > so it's not impossible that this code will be around that long.
    >
    > I'm trying to use termcap info to map incoming strings to display
    > actions.
    >
    > In other words, I have an array that holds termcap info:
    >
    > termcap = {
    > "ae=^O",
    > "as=^N",
    > "cm=\E[%i%d;%dH",
    > "cs=\E[%i%d;%dr",
    > ...
    > }
    >
    > I'm trying to come up with a nice, clear algorithm for matching incoming
    > characters to the patterns in the termcap array.
    >
    > So far I've struck out pretty much completely.
    >
    > For example, for the cm string, the terminal can see incoming strings like
    > this:
    >
    > {esc}[3;4H
    > {esc}[;4H
    > {esc}[3;H
    >
    > I can't quite come up with a parser that can handle that, and which
    > doesn't get all convoluted...
    >
    > I would love to have some suggestions on how to match those patterns.


    Have you considered the regular expression?
    Bin Chen, Apr 19, 2007
    #2

  3. On Thu, 19 Apr 2007 06:47:30 -0700, Bin Chen wrote:

    > On Apr 19, 9:42 pm, Captain Dondo <> wrote:
    >> I'm working on a terminal emulator for an embedded system.
    >>
    >> The key requirements are small size, code clarity, maintainability, and
    >> portability. We have machines that regularly see a service life of 30 years
    >> so it's not impossible that this code will be around that long.
    >>
    >> I'm trying to use termcap info to map incoming strings to display
    >> actions.
    >>
    >> In other words, I have an array that holds termcap info:
    >>
    >> termcap = {
    >> "ae=^O",
    >> "as=^N",
    >> "cm=\E[%i%d;%dH",
    >> "cs=\E[%i%d;%dr",
    >> ...
    >> }
    >>
    >> I'm trying to come up with a nice, clear algorithm for matching incoming
    >> characters to the patterns in the termcap array.
    >>
    >> So far I've struck out pretty much completely.
    >>
    >> For example, for the cm string, the terminal can see incoming strings like
    >> this:
    >>
    >> {esc}[3;4H
    >> {esc}[;4H
    >> {esc}[3;H
    >>
    >> I can't quite come up with a parser that can handle that, and which
    >> doesn't get all convoluted...
    >>
    >> I would love to have some suggestions on how to match those patterns.

    >
    > Have you considered the regular expression?


    I have.... I'm just not sure how to apply it.

    The characters come in one at a time; I need to scan the list to see if
    the character matches the first character of any pattern. If it does,
    then I need to remember that.

    When the next character comes in, I need to scan the list to see if it
    matches the next character in the patterns, and so on.

    The pattern match ends when the match fails, at which point I need to see
    if any one pattern was matched fully.
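
The character-at-a-time scan described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table `seqs` is hypothetical, holds literal sequences only (no numeric parameters), and has already had `\E` and `^X` translated to real bytes; the names `matcher`, `matcher_reset` and `matcher_feed` are made up for this example. One bit per pattern records which candidates are still alive.

```c
#include <stddef.h>

/* Hypothetical table of literal sequences, already decoded to real bytes. */
static const char *const seqs[] = {
    "\017",     /* ae: ^O */
    "\016",     /* as: ^N */
    "\033[H",   /* illustrative: cursor home */
    "\033[2J",  /* illustrative: clear screen */
};
#define NSEQ (sizeof seqs / sizeof seqs[0])

struct matcher {
    unsigned live;  /* bit i set: seqs[i] matches everything seen so far */
    size_t pos;     /* number of characters consumed so far */
};

static void matcher_reset(struct matcher *m)
{
    m->live = (1u << NSEQ) - 1u;
    m->pos = 0;
}

/* Feed one character. Returns the index of a fully matched sequence,
   -1 if more input is needed, or -2 if no sequence can match.
   The matcher resets itself after a match or a failure. If one
   sequence is a prefix of another, the shorter one wins. */
static int matcher_feed(struct matcher *m, char c)
{
    size_t i;
    int done = -1;

    for (i = 0; i < NSEQ; i++) {
        if (!(m->live & (1u << i)))
            continue;                      /* already ruled out */
        if (seqs[i][m->pos] != c)
            m->live &= ~(1u << i);         /* mismatch: drop candidate */
        else if (seqs[i][m->pos + 1] == '\0')
            done = (int)i;                 /* last character matched */
    }
    m->pos++;
    if (done >= 0) {
        matcher_reset(m);
        return done;
    }
    if (m->live == 0) {
        matcher_reset(m);
        return -2;
    }
    return -1;
}
```

Call `matcher_reset()` once, then pass each incoming byte to `matcher_feed()`. The state is a single `unsigned` plus a position, which keeps the memory cost small, but the bitmask limits the table to the width of `unsigned`.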

    .....
    Captain Dondo, Apr 19, 2007
    #3
  4. Captain Dondo

    Richard Bos Guest

    Captain Dondo <> wrote:

    > I'm trying to use termcap info to map incoming strings to display
    > actions.
    >
    > In other words, I have an array that holds termcap info:
    >
    > termcap = {
    > "ae=^O",
    > "as=^N",
    > "cm=\E[%i%d;%dH",
    > "cs=\E[%i%d;%dr",
    > ...
    > }


    Except for the \E, which isn't a valid C escape character, and the ^O
    and ^N, which aren't C escape characters at all, those look very much
    like *scanf() strings to me.

    > For example, for the cm string, the terminal can see incoming strings like
    > this:
    >
    > {esc}[3;4H
    > {esc}[;4H
    > {esc}[3;H
    >
    > I can't quite come up with a parser that can handle that, and which
    > doesn't get all convoluted...


    Perhaps you don't need to? You might be able to get sscanf() to do the
    job for you.
    Translate the \E and ^Whatever in your strings to the corresponding real
    C characters (presumably your compiler allows \E as an extension, so you
    _might_ not need to do that bit, but your code does become non-portable
    if you rely on this; and ^Letter is presumably quite easy to do). Then,
    see if you can mangle either the %spec bits, or your call to sscanf(),
    so that it accepts your input.
    If that doesn't work, the easiest way to get your hands on a sscanf()-
    variation which can handle your termcap strings would be to start from
    normal sscanf() code and modify that. If Ganuck code would be
    acceptable, you could use that; if not, many textbooks have you write
    one as an exercise, and include simplified sample code, which may
    already be useful enough. IIRC K&R is one of these.

    Richard
    Richard Bos, Apr 19, 2007
    #4
  5. Captain Dondo

    Guest

    On 19 Apr, 14:42, Captain Dondo <> wrote:
    > I'm working on a terminal emulator for an embedded system.
    >
    > The key requirements are small size, code clarity, maintainability, and
    > portability. We have machines that regularly see a service life of 30 years
    > so it's not impossible that this code will be around that long.
    >
    > I'm trying to use termcap info to map incoming strings to display
    > actions.
    >
    > In other words, I have an array that holds termcap info:
    >
    > termcap = {
    > "ae=^O",
    > "as=^N",
    > "cm=\E[%i%d;%dH",
    > "cs=\E[%i%d;%dr",
    > ...
    > }
    >
    > I'm trying to come up with a nice, clear algorithm for matching incoming
    > characters to the patterns in the termcap array.
    >
    > So far I've struck out pretty much completely.
    >
    > For example, for the cm string, the terminal can see incoming strings like
    > this:
    >
    > {esc}[3;4H
    > {esc}[;4H
    > {esc}[3;H
    >
    > I can't quite come up with a parser that can handle that, and which
    > doesn't get all convoluted...


    I'd look at how others have done it - for example in tools like xterm,
    or putty...
    , Apr 19, 2007
    #5
  6. On Thu, 19 Apr 2007 14:18:42 +0000, Richard Bos wrote:

    > Captain Dondo <> wrote:
    >
    >> I'm trying to use termcap info to map incoming strings to display
    >> actions.
    >>
    >> In other words, I have an array that holds termcap info:
    >>
    >> termcap = {
    >> "ae=^O",
    >> "as=^N",
    >> "cm=\E[%i%d;%dH",
    >> "cs=\E[%i%d;%dr",
    >> ...
    >> }

    >
    > Except for the \E, which isn't a valid C escape character, and the ^O
    > and ^N, which aren't C escape characters at all, those look very much
    > like *scanf() strings to me.
    >
    >> For example, for the cm string, the terminal can see incoming strings like
    >> this:
    >>
    >> {esc}[3;4H
    >> {esc}[;4H
    >> {esc}[3;H
    >>
    >> I can't quite come up with a parser that can handle that, and which
    >> doesn't get all convoluted...

    >
    > Perhaps you don't need to? You might be able to get sscanf() to do the
    > job for you.
    > Translate the \E and ^Whatever in your strings to the corresponding real
    > C characters (presumably your compiler allows \E as an extension, so you
    > _might_ not need to do that bit, but your code does become non-portable
    > if you rely on this; and ^Letter is presumably quite easy to do). Then,
    > see if you can mangle either the %spec bits, or your call to sscanf(),
    > so that it accepts your input.
    > If that doesn't work, the easiest way to get your hands on a sscanf()-
    > variation which can handle your termcap strings would be to start from
    > normal sscanf() code and modify that. If Ganuck code would be
    > acceptable, you could use that; if not, many textbooks have you write
    > one as an exercise, and include simplified sample code, which may
    > already be useful enough. IIRC K&R is one of these.


    I've thought about running a pre-parser to replace those non-standard
    chars, but I never followed it through to using sscanf....

    I think I'll try that. There's some slight overhead - I'd have to create
    multiple entries for the variants that can omit numbers - but that's easy
    enough to do.

    Thanks!

    --Yan
    Captain Dondo, Apr 19, 2007
    #6
  7. Captain Dondo

    CptDondo Guest

    wrote:
    > On 19 Apr, 14:42, Captain Dondo <> wrote:


    >>
    >> I can't quite come up with a parser that can handle that, and which
    >> doesn't get all convoluted...

    >
    > I'd look at how others have done it - for example in tools like xterm,
    > or putty...
    >


    Most of those use the convoluted method... Basically the patterns are
    hard-coded into the program logic. I'm trying to do something where we
    can add functionality to the terminal (e.g. going from monochrome to
    color) without rewriting the logic....
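
One way to keep the patterns out of the program logic is a table that pairs each capability with an action function, so that adding a capability means adding a row. The sketch below is purely illustrative: `struct screen`, the `act_*` functions and `cap_find` are hypothetical names invented here, and the pattern strings are kept in their raw termcap notation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical display state. */
struct screen { int row, col; int alt_charset; };

typedef void (*action_fn)(struct screen *, const int *args);

static void act_cm(struct screen *s, const int *a) { s->row = a[0]; s->col = a[1]; }
static void act_as(struct screen *s, const int *a) { (void)a; s->alt_charset = 1; }
static void act_ae(struct screen *s, const int *a) { (void)a; s->alt_charset = 0; }

struct capability {
    const char *name;      /* termcap capability name */
    const char *pattern;   /* raw termcap string */
    action_fn action;      /* what to do once the pattern has matched */
};

/* New terminal features are added by adding rows here. */
static const struct capability caps[] = {
    { "ae", "^O",            act_ae },
    { "as", "^N",            act_as },
    { "cm", "\\E[%i%d;%dH",  act_cm },
};

/* Look a capability up by name; returns NULL if it is not in the table. */
static const struct capability *cap_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof caps / sizeof caps[0]; i++)
        if (strcmp(caps[i].name, name) == 0)
            return &caps[i];
    return NULL;
}
```

The matcher then only needs to report which row matched and which numbers it collected; the row's `action` does the rest.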

    But I think I've hit on something thanks to the comments here. :)

    --Yan
    CptDondo, Apr 19, 2007
    #7
  8. On Thu, 19 Apr 2007 14:18:42 GMT, (Richard
    Bos) wrote:

    > Captain Dondo <> wrote:
    >
    > > I'm trying to use termcap info to map incoming strings to display
    > > actions.
    > >

    You need this generality only if someone other than the developer(s),
    such as a user or admin, must later modify the emulation, or perhaps
    select among multiple (many?) emulations. To emulate just one specific
    terminal/mode (or even family), I would hardcode at least its common
    structure (e.g. the X3.64 style: CSI, operands, some modifier
    characters from columns 2-3, then a terminator character from columns
    4-5, or columns 6-7 plus something I don't recall), which leaves the
    rest of the problem simpler.
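
Hardcoding the common structure could look roughly like this small state machine. It is a sketch under assumptions: it recognises only ESC `[`, decimal parameters separated by `;`, and a final byte in the range 0x40 to 0x7E; intermediate bytes and private markers such as `?` simply abandon the sequence. The names `csi`, `csi_init` and `csi_feed` are made up for the example.

```c
enum { CSI_MAXP = 4 };
enum csi_state { ST_GROUND, ST_ESC, ST_CSI };

struct csi {
    enum csi_state st;
    int p[CSI_MAXP];   /* parameters; -1 means the number was omitted */
    int np;            /* number of parameter slots in use */
    char final;        /* final byte of the last complete sequence */
};

static void csi_init(struct csi *c)
{
    c->st = ST_GROUND;
    c->np = 0;
    c->final = 0;
}

/* Feed one byte. Returns 1 when a complete sequence has been
   recognised (see p[], np and final), otherwise 0. */
static int csi_feed(struct csi *c, unsigned char ch)
{
    switch (c->st) {
    case ST_GROUND:
        if (ch == 0x1b)
            c->st = ST_ESC;
        return 0;
    case ST_ESC:
        if (ch == '[') {
            c->st = ST_CSI;
            c->np = 1;
            c->p[0] = -1;
        } else {
            c->st = ST_GROUND;
        }
        return 0;
    case ST_CSI:
        if (ch >= '0' && ch <= '9') {
            int *v = &c->p[c->np - 1];
            if (*v < 0)
                *v = 0;
            if (*v < 1000)                 /* clamp to avoid overflow */
                *v = *v * 10 + (ch - '0');
        } else if (ch == ';' && c->np < CSI_MAXP) {
            c->p[c->np++] = -1;            /* start next parameter */
        } else if (ch >= 0x40 && ch <= 0x7e) {
            c->final = (char)ch;           /* e.g. 'H' or 'r' */
            c->st = ST_GROUND;
            return 1;
        } else {
            c->st = ST_GROUND;             /* unexpected byte: abandon */
        }
        return 0;
    }
    return 0;
}
```

After a return of 1, a `switch` on `final` selects the action, and a parameter of -1 is replaced by that command's default.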

    But if you want to (or must) stay with using termcap(ish) strings:

    > > In other words, I have an array that holds termcap info:

    <snip>
    > > "cm=\E[%i%d;%dH",

    <snip>
    > Except for the \E, which isn't a valid C escape character, and the ^O
    > and ^N, which aren't C escape characters at all, those look very much
    > like *scanf() strings to me.
    >
    > > For example, for the cm string, the terminal can see incoming strings like
    > > this:
    > >
    > > {esc}[3;4H
    > > {esc}[;4H
    > > {esc}[3;H
    > >
    > > I can't quite come up with a parser that can handle that, and which
    > > doesn't get all convoluted...

    >
    > Perhaps you don't need to? You might be able to get sscanf() to do the
    > job for you.
    > Translate the \E and ^Whatever in your strings to the corresponding real
    > C characters (presumably your compiler allows \E as an extension, so you
    > _might_ not need to do that bit, but your code does become non-portable
    > if you rely on this; and ^Letter is presumably quite easy to do). Then,
    > see if you can mangle either the %spec bits, or your call to sscanf(),
    > so that it accepts your input.


    But: *printf %i will generate only decimal digits, but *scanf %i will
    accept optional whitespace, optional sign and digits, and also allow
    0octal and 0xhex forms; *scanf %d and (even!) %u will allow the first
    two but not the third; these might result in false matches for
    improper input, if that is a concern. (Perhaps not, if the source from
    which the terminal emulator is receiving this data never makes
    mistakes, and the comms path never silently corrupts.) Conversely,
    *scanf %i or %d will fail, and cause scanning to stop, if the number
    is entirely omitted, as is legal for (most) X3.64/VT100 escape
    sequences, as the OP's example shows. Similarly %i (or %d) followed
    immediately by %d, applied to contiguous digits which they could have
    generated (or did) in *printf, will fail because the first specifier
    doesn't 'see' the boundary that (would have) occurred in generation
    and uses up (all) the data that should match the second specifier.
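
The behaviours described above can be checked with a few one-line wrappers around sscanf(). The sketch also includes a stricter alternative, which is my own illustration rather than anything proposed in the thread: `strict_num` (a hypothetical name) consumes ASCII decimal digits only, treats "no digits" as a default value, and clamps large values.

```c
#include <stdio.h>

/* Thin wrappers so the return values can be inspected. */
static int scan_d(const char *s, int *v) { return sscanf(s, "%d", v); }
static int scan_i(const char *s, int *v) { return sscanf(s, "%i", v); }
static int scan_two(const char *s, int *a, int *b)
{
    return sscanf(s, "%d%d", a, b);
}

/* Strict alternative: accept decimal digits only, no sign and no
   whitespace. Zero digits is allowed and yields dflt. Returns a
   pointer to the first character after the digits. */
static const char *strict_num(const char *s, int dflt, int *out)
{
    int v = 0, seen = 0;

    while (*s >= '0' && *s <= '9') {
        if (v < 1000)                  /* clamp: avoids int overflow */
            v = v * 10 + (*s - '0');
        seen = 1;
        s++;
    }
    *out = seen ? v : dflt;
    return s;
}
```

For instance, `scan_d(" +7", &v)` succeeds with 7, `scan_i("010", &v)` yields 8 because of the octal prefix, `scan_d(";4H", &v)` converts nothing, and `scan_two("34", &a, &b)` puts 34 into `a` and leaves `b` unset.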

    Also a space in *scanf format will match any amount of whitespace in
    the data or none, not just a single space. If necessary, you can do
    the latter by changing to %1[ ], but since I think trying to use
    actual *scanf is not worth it anyway, see below, I wouldn't bother. I
    also don't recall offhand any terminal commands that use exactly a
    space as (required) data, although some (ADM3A, IIRC) do use a single
    character whose code STARTS at (ASCII) space.

    > If that doesn't work, the easiest way to get your hands on a sscanf()-
    > variation which can handle your termcap strings would be to start from
    > normal sscanf() code and modify that. If Ganuck code would be
    > acceptable, you could use that; if not, many textbooks have you write
    > one as an exercise, and include simplified sample code, which may
    > already be useful enough. IIRC K&R is one of these.
    >

    This problem appears to me enough different from and simpler than what
    *scanf must do that I would find it easier to start from scratch and
    build up, perhaps looking at *scanf for ideas in the unlikely event I
    had difficulty with a particular point.

    I would probably also try to make the scans 'restartable': for each
    match that is still possible I record its state, and each newly
    received character only advances or fails each scan from that state.
    I would do this only if it doesn't make the code too complex (and
    difficult to maintain) or require more space than allowable; even a
    small amount of static state could be too much if this must run
    multithreaded and can't be provided with an instance pointer or
    similar.
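
A from-scratch, restartable scan along these lines might be sketched as below. Everything here is an assumption for illustration: the names `scan`, `scan_start`, `scan_feed` and `scan_feed_all` are invented, patterns are assumed to be already decoded to real bytes, `%d` is taken to mean "zero or more decimal digits", `%i` is skipped as consuming no input, `%%` is not handled, and a pattern must end in a literal character. All state lives in a caller-supplied struct, so nothing is static.

```c
#include <stddef.h>

enum { SCAN_MAXARGS = 2 };
enum scan_result { SCAN_FAIL, SCAN_PENDING, SCAN_MATCH };

struct scan {
    const char *pat;         /* decoded pattern, e.g. "\033[%i%d;%dH" */
    size_t pos;              /* next unmatched position in pat */
    int args[SCAN_MAXARGS];  /* numbers found; -1 means omitted */
    int nargs;               /* numbers completed so far */
    int in_num;              /* nonzero while inside a digit run */
};

static void scan_start(struct scan *s, const char *pat)
{
    s->pat = pat;
    s->pos = 0;
    s->nargs = 0;
    s->in_num = 0;
}

/* Advance one scan by one character from its recorded state. */
static enum scan_result scan_feed(struct scan *s, char ch)
{
    for (;;) {
        const char *p = s->pat + s->pos;

        if (p[0] == '%' && p[1] == 'i') {
            s->pos += 2;                       /* consumes no input */
        } else if (p[0] == '%' && p[1] == 'd') {
            if (s->nargs >= SCAN_MAXARGS)
                return SCAN_FAIL;
            if (ch >= '0' && ch <= '9') {
                int *v = &s->args[s->nargs];
                if (!s->in_num) { *v = 0; s->in_num = 1; }
                if (*v < 1000)                 /* clamp to avoid overflow */
                    *v = *v * 10 + (ch - '0');
                return SCAN_PENDING;
            }
            if (!s->in_num)
                s->args[s->nargs] = -1;        /* number was omitted */
            s->nargs++;
            s->in_num = 0;
            s->pos += 2;                       /* retry ch on what follows */
        } else if (p[0] != '\0' && p[0] == ch) {
            s->pos++;
            p = s->pat + s->pos;
            while (p[0] == '%' && p[1] == 'i') { s->pos += 2; p += 2; }
            return p[0] == '\0' ? SCAN_MATCH : SCAN_PENDING;
        } else {
            return SCAN_FAIL;
        }
    }
}

/* Feed ch to every scan still alive. Returns the index of the first
   pattern completed by this character, or -1 if none completed. */
static int scan_feed_all(struct scan *scans, int *alive, int n, char ch)
{
    int i, hit = -1;

    for (i = 0; i < n; i++) {
        if (!alive[i])
            continue;
        switch (scan_feed(&scans[i], ch)) {
        case SCAN_FAIL:    alive[i] = 0; break;
        case SCAN_MATCH:   if (hit < 0) hit = i; alive[i] = 0; break;
        case SCAN_PENDING: break;
        }
    }
    return hit;
}
```

To use it, call `scan_start()` on every pattern and set all `alive` flags to 1 at the start of each sequence, then pass each received character to `scan_feed_all()` until it returns an index or every flag is clear.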

    - formerly david.thompson1 || achar(64) || worldnet.att.net
    David Thompson, May 21, 2007
    #8
