Convert string with control character in caret notation to real control character string.

Discussion in 'C Programming' started by Bart Vandewoestyne, Sep 25, 2012.

  1. I am working my way through the book 'Modern Compiler Implementation in C' and am now working on the lexer from Chapter 2:

    https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex

    Part of the exercise is that strings with escape sequences and control characters in caret notation must be supported. Between lines 153 and 163, I make sure my strings support escape sequences like \ddd with ASCII code ddd (3 decimal digits). Between lines 187 and 194 I try to do the same for control characters in caret notation. I haven't yet succeeded in putting the value of the control character in the result variable. I wonder if it is doable with a single sscanf line like for the \ddd case...

    What would be the most elegant and standard-conforming way to grab the value of the matched control character?

    Regards,
    Bart
    Bart Vandewoestyne, Sep 25, 2012
    #1

  2. Bart Vandewoestyne

    BartC Guest

    BartC, Sep 25, 2012
    #2

  3. Bart Vandewoestyne

    James Kuyper Guest

    On 09/25/2012 05:22 AM, Bart Vandewoestyne wrote:
    > I am working my way through the book 'Modern Compiler Implementation in C' and am now working on the lexer from Chapter 2:
    >
    > https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex
    >
    > Part of the exercise is that strings with escape sequences and control characters in caret notation must be supported. Between lines 153 and 163, I make sure my strings support escape sequences like \ddd with ASCII code ddd (3 decimal digits). Between lines 187 and 194 I try to do the same for control characters in caret notation. I haven't yet succeeded in putting the value of the control character in the result variable. I wonder if it is doable with a single sscanf line like for the \ddd case...
    >
    > What would be the most elegant and standard-conforming way to grab the value of the matched control character?


    The C standard provides ways of specifying only a few control characters
    (5.2.2p2):

    > Alphabetic escape sequences representing nongraphic characters in the
    > execution character set are intended to produce actions on display
    > devices as follows:
    >
    > \a (alert) Produces an audible or visible alert without changing the
    > active position.
    > \b (backspace) Moves the active position to the previous position on
    > the current line. If the active position is at the initial position
    > of a line, the behavior of the display device is unspecified.
    > \f (form feed) Moves the active position to the initial position at
    > the start of the next logical page.
    > \n (new line) Moves the active position to the initial position of the
    > next line.
    > \r (carriage return) Moves the active position to the initial position
    > of the current line.
    > \t (horizontal tab) Moves the active position to the next horizontal
    > tabulation position on the current line. If the active position is
    > at or past the last defined horizontal tabulation position, the
    > behavior of the display device is unspecified.
    > \v (vertical tab) Moves the active position to the initial position of
    > the next vertical tabulation position. If the active position is at
    > or past the last defined vertical tabulation position, the
    > behavior of the display device is unspecified.


    Note that the numerical values of these escape sequences are not
    specified by the standard, only the intended behavior if they are sent
    to the display device. The standard goes out of its way to avoid
    specifying anything more than it absolutely must about the character
    sets supported by a C implementation, or the encodings used for those
    character sets.

    If you need to refer to any control characters that don't correspond to
    one of the above escape sequences, there's no solution that's portable
    to all implementations of C. If you're willing to restrict the
    portability of your code to systems using a particular encoding for the
    control characters you're interested in, then you can use the octal
    escape sequences to specify them explicitly.
    --
    James Kuyper
    James Kuyper, Sep 25, 2012
    #3
  4. Bart Vandewoestyne

    James Kuyper Guest

    On 09/25/2012 06:19 AM, BartC wrote:
    >
    >
    > "Bart Vandewoestyne" <> wrote in message
    > news:...
    >> I am working my way through the book 'Modern Compiler Implementation in C'
    >> and am now working on the lexer from Chapter 2:
    >>
    >> https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex

    >
    > What language is that in?


    It looks like 'lex', or perhaps 'flex', which would be consistent with
    the extension on the file name. Several key parts of a lex file are
    transferred, almost verbatim, to the output file which is ordinary
    (though rather convoluted and unreadable) C code. Since his question is
    about how to represent control characters in that C code, the question
    is topical, but it requires a knowledge of lex to realize that fact.
    --
    James Kuyper
    James Kuyper, Sep 25, 2012
    #4
  5. Re: Convert string with control character in caret notation to real control character string.

    Bart Vandewoestyne <> writes:

    > I am working my way through the book 'Modern Compiler Implementation
    > in C' and am now working on the lexer from Chapter 2:
    >
    > https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex
    >
    > Part of the exercise is that strings with escape sequences and control
    > characters in caret notation must be supported. Between lines 153 and
    > 163, I make sure my strings support escape sequences like \ddd with
    > ASCII code ddd (3 decimal digits). Between lines 187 and 194 I try to
    > do the same for control characters in caret notation. I haven't yet
    > succeeded in putting the value of the control character in the result
    > variable. I wonder if it is doable with a single sscanf line like
    > for the \ddd case...


    No, that's "trying too hard". It's simpler than that.

    > What would be the most elegant and standard-conforming way to grab the
    > value of the matched control character?


    result = yytext[2] - '@';

    This assumes a lot about the character set, but that's fine in this case
    because the notation itself ('^A' etc.) is tied to the character set.

    --
    Ben.
    Ben Bacarisse, Sep 25, 2012
    #5
  6. Re: Convert string with control character in caret notation to real control character string.

    "BartC" <> writes:

    > "Bart Vandewoestyne" <> wrote in message
    > news:...
    >> I am working my way through the book 'Modern Compiler Implementation
    >> in C' and am now working on the lexer from Chapter 2:
    >>
    >> https://github.com/BartVandewoestyn...Compiler_Implementation_in_C/chap02/tiger.lex

    >
    > What language is that in?


    It's lex, but the part inside the outermost {}s is C (and the question
    was indeed a C question).

    --
    Ben.
    Ben Bacarisse, Sep 25, 2012
    #6
  7. On Tuesday, September 25, 2012 1:03:07 PM UTC+2, Ben Bacarisse wrote:
    >
    >> What would be the most elegant and standard-conforming way to grab the
    >> value of the matched control character?


    Before I looked at Ben's post, the solution that I came up with was:

    char key;
    sscanf(yytext, "^%c", &key);
    *string_buf_ptr++ = key - 64;

    But Ben's solution reads:

    > result = yytext[2] - '@';


    which I corrected to

    result = yytext[1] - '@';

    ;-)

    and this is indeed a lot shorter and more elegant! I love it when I can make my code more readable with shorter statements! :)

    Regards,
    Bart
    Bart Vandewoestyne, Sep 25, 2012
    #7
  8. Re: Convert string with control character in caret notation to real control character string.

    Bart Vandewoestyne <> writes:

    > On Tuesday, September 25, 2012 1:03:07 PM UTC+2, Ben Bacarisse wrote:
    >>
    >>> What would be the most elegant and standard-conforming way to grab the
    >>> value of the matched control character?

    >
    > Before I looked at Ben's post, the solution that I came up with was:
    >
    > char key;
    > sscanf(yytext, "^%c", &key);
    > *string_buf_ptr++ = key - 64;
    >
    > But Ben's solution reads:
    >
    >> result = yytext[2] - '@';

    >
    > which I corrected to
    >
    > result = yytext[1] - '@';
    >
    > ;-)


    Yes, I am sure you are right about the 1 but here's why I wrote 2: The
    code you had when I looked was: sscanf(yytext + 1, "^%c", &result); so I
    assumed that the ^ was in yytext[1] and the character to be adjusted
    would therefore be in yytext[2]. :)

    <snip>
    --
    Ben.
    Ben Bacarisse, Sep 25, 2012
    #8
  9. On Tuesday, September 25, 2012 2:14:10 PM UTC+2, Ben Bacarisse wrote:
    >
    > Yes, I am sure you are right about the 1 but here's why I wrote 2: The
    > code you had when I looked was: sscanf(yytext + 1, "^%c", &result); so I
    > assumed that the ^ was in yytext[1] and the character to be adjusted
    > would therefore be in yytext[2]. :)


    I forgive you ;-)

    Regards,
    Bart
    Bart Vandewoestyne, Sep 25, 2012
    #9