getc and ungetc

Discussion in 'C Programming' started by Bill Cunningham, Nov 18, 2004.

  1. Would getc and ungetc be the best and most simple what to parse
    expressions for a parser?

    Bill
    Bill Cunningham, Nov 18, 2004
    #1
    1. Advertising

  2. Bill Cunningham

    Trent Buck Guest

    Quoth Bill Cunningham on or about 2004-11-18:
    > Would getc and ungetc be the best and most simple what to parse
    > expressions for a parser?


    ungetc can't be applied more than once `in a row' (i.e. sequentially).
    I suspect that makes for a rather unsuitable function for a parser.

    -t
    Trent Buck, Nov 18, 2004
    #2
    1. Advertising

  3. Bill Cunningham

    Chris Barts Guest

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Bill Cunningham wrote:
    | Would getc and ungetc be the best and most simple what to parse
    | expressions for a parser?
    |

    It's usually easier to do like K&R did in "The C Programming Language"
    (at the end, when they designed the RPN calculator): Read in a block of
    text, and then define your own getchar()/ungetchar() functions to push
    and pop characters on and off that buffer. You can read in more text (a
    block at a time, for efficiency and convenience) when your getchar()
    function tries to read beyond the end of your buffer.

    You need to write a bit more code yourself, but it's fairly trivial code
    and you can reuse it in any other parser you write.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.2 (GNU/Linux)
    Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

    iD8DBQFBnXYsKxatjOtX+j0RAnY8AJ4gpW7DPFF/VgtMv7V2cdfzYINTawCeOcqi
    fpvs+q6HPDjLx6sVc1ozUXk=
    =A6oi
    -----END PGP SIGNATURE-----
    Chris Barts, Nov 19, 2004
    #3
  4. Bill Cunningham

    Malcolm Guest

    "Trent Buck" <> wrote
    >
    > > Would getc and ungetc be the best and most simple what to parse
    > > expressions for a parser?

    >
    > ungetc can't be applied more than once `in a row' (i.e. sequentially).
    > I suspect that makes for a rather unsuitable function for a parser.
    >

    It depends on your parser design.
    Most simple parsers used for things like computer languages divide the input
    stream into token, and then parse from left to right with one token of "look
    ahead". So if your tokens are single characters then getc() and ungetc() may
    be adequate.
    Of course if you have ambitions to build a natural language parser, or even
    some simple grammars with unusual characteristics, then this scheme won't
    work, and you will need some method of scanning up and down many tokens on
    input.
    Malcolm, Nov 20, 2004
    #4
  5. Bill Cunningham

    SM Ryan Guest

    # Of course if you have ambitions to build a natural language parser, or even
    # some simple grammars with unusual characteristics, then this scheme won't
    # work, and you will need some method of scanning up and down many tokens on
    # input.

    Such a parser runs the risk of exponential running time, while a tabular
    parser doesn't need to back up and has cubic time worst case. ungetc has
    at most marginal usability for some types of lexical scanners.

    --
    SM Ryan http://www.rawbw.com/~wyrmwif/
    There are subtler ways of badgering a witness.
    SM Ryan, Nov 20, 2004
    #5
  6. "SM Ryan" <> wrote in
    message news:...
    > # Of course if you have ambitions to build a natural language parser, or

    even
    > # some simple grammars with unusual characteristics, then this scheme

    won't
    > # work, and you will need some method of scanning up and down many tokens

    on
    > # input.
    >
    > Such a parser runs the risk of exponential running time, while a tabular
    > parser doesn't need to back up and has cubic time worst case. ungetc has
    > at most marginal usability for some types of lexical scanners.
    >

    I have k&r 2. What about redirecting fgetc to stdio and using it insteat
    getc?

    Bill
    Bill Cunningham, Nov 20, 2004
    #6
  7. "Bill Cunningham" <> wrote in message
    news:cbNnd.214$-media.com...
    >
    > "SM Ryan" <> wrote in
    > message news:...
    > > # Of course if you have ambitions to build a natural language parser, or

    > even
    > > # some simple grammars with unusual characteristics, then this scheme

    > won't
    > > # work, and you will need some method of scanning up and down many

    tokens
    > on
    > > # input.
    > >
    > > Such a parser runs the risk of exponential running time, while a tabular
    > > parser doesn't need to back up and has cubic time worst case. ungetc has
    > > at most marginal usability for some types of lexical scanners.
    > >

    > I have k&r 2. What about redirecting fgetc to stdio and using it insteat
    > getc?
    >
    > Bill


    I've heard of top down recursive parsers.
    Bill
    Bill Cunningham, Nov 20, 2004
    #7
  8. On Thu, 18 Nov 2004 20:21:41 UTC, "Bill Cunningham" <>
    wrote:

    > Would getc and ungetc be the best and most simple what to parse
    > expressions for a parser?


    Yes. But be aware of that ungetc can only unget ONE char at a time.

    On other hand you can simply by macro or function extend getc and
    ungetc to unget more than one char. Your UNGETC() would accept a
    number of chars to give back in your GETC() in reverse order. Your
    GETC() will give back the chars UNGETC had received before it gets new
    chars from the stream itself.

    A bit tricky is to ungent a char that you have never gotten - legally
    as there is nothing that forbids it and ungenc requires the char that
    is to unget. Be sure that you does NOT tries to unget EOF, this won't
    work. This can be very useful when you has a long list of keyword
    delemiters with same meaning to a long list of similar keywords.

    Your parser may convert keywords or keychars into tokens and save the
    tokens until all or a part of the input stream is readed and then work
    on the generated token, it may work in other ways you thinks it
    matches your requirements.

    getc() (in conjunktion with ungetc() ) gives you the strongest
    possible control over the stream you can ever need. When needed you
    can siply count the number of chars, words, lines... readed in as side
    effect, reset these counters as needed..... You avoids supervising of
    buffers - you does need one.

    Build your parser as state mashine and you can reuse the same code
    again and again beside the little number of statements you needs to
    handle a specific (sub)state. You gets high flexibility as you would
    easy extend the functionality of the parser by create a new
    (sub)state. Makes maintenance an easy work.

    --
    Tschau/Bye
    Herbert

    Visit http://www.ecomstation.de the home of german eComStation
    Herbert Rosenau, Nov 21, 2004
    #8
  9. Bill Cunningham

    Malcolm Guest

    "Bill Cunningham" <> wrote
    >
    > I've heard of top down recursive parsers.
    >

    What is important in terms of the tokenizer is that they be left-right with
    only one token of lookahead. Most practical parsers are in this class.

    Unfortunately, using getc / ungetc means that the tokens are constrained to
    be single characters. This may be Ok for a very simple application, but if
    the tokens are naturally several characters long it will be a nuisance. You
    can do it, for instance if you have a mathematical function tan() then you
    could build it up from the latters 't' 'a' and 'n' rather than reading it in
    as a single token TAN. But it is generally a lot easier to do the lexical
    analysis before the parse proper.
    Malcolm, Nov 22, 2004
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. TTroy
    Replies:
    11
    Views:
    729
    Dik T. Winter
    Mar 14, 2005
  2. Argento

    scanf(), ungetc() behaviour.

    Argento, Mar 3, 2006, in forum: C Programming
    Replies:
    62
    Views:
    1,533
  3. av

    ungetc

    av, Jul 15, 2006, in forum: C Programming
    Replies:
    15
    Views:
    856
  4. getc() and EOF

    , Jan 26, 2007, in forum: C Programming
    Replies:
    5
    Views:
    467
    santosh
    Jan 27, 2007
  5. David Masover
    Replies:
    0
    Views:
    99
    David Masover
    Jun 6, 2008
Loading...

Share This Page