A simple parser

Discussion in 'C Programming' started by jacob navia, Oct 16, 2006.

  1. jacob navia

    jacob navia Guest

    Summary:

    I have changed (as proposed by Chuck) the code to use isalpha()
    instead of (c>='a' && c <= 'z') etc.

    I agree that EBCDIC exists :)
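
Editor's note on why the range test is a problem: C only guarantees that the digits '0' to '9' are contiguous. Letters need not be, and in EBCDIC there are gaps between 'i' and 'j' and between 'r' and 's', so (c>='a' && c<='z') also accepts characters that are not letters. A minimal sketch of the isalpha() approach follows. The name is_ident_start is made up for this illustration and is not the function in the posted program.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if c may start a C identifier.
   c is assumed to come from fgetc(), so it is either EOF or a
   value representable as unsigned char. */
static int is_ident_start(int c)
{
    /* The <ctype.h> functions are only defined for such values. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return c == '_' || isalpha(c);
}
```

If the character comes from a plain char rather than from fgetc(), it should be converted to unsigned char before the call.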

    I eliminated the goto statement, obviously it is better in a tutorial
    to stick to structured programming whenever possible...

    Now the code is around 1000 bytes long. Not bad for what the code
    is doing. But I was somehow disappointed that nobody questioned
    the algorithm for finding all functions in a C file without
    having a full blown C parser. Somehow, it is an important utility,
    and it is very small and simple.

    Thanks to all people that participated in the discussion.

    jacob
    jacob navia, Oct 16, 2006
    #1

  2. jacob navia <> writes:
    > Summary:
    >
    > I have changed (as proposed by Chuck) the code to use isalpha()
    > instead of (c>='a' && c <= 'z') etc.
    >
    > I agree that EBCDIC exists :)
    >
    > I eliminated the goto statement, obviously it is better in a tutorial
    > to stick to structured programming whenever possible...
    >
    > Now the code is around 1000 bytes long. Not bad for what the code
    > is doing. But I was somehow disappointed that nobody questioned
    > the algorithm for finding all functions in a C file without
    > having a full blown C parser. Somehow, it is an important utility,
    > and it is very small and simple.
    >
    > Thanks to all people that participated in the discussion.


    You're welcome.

    Perhaps if you posted the revised code, you'd get more substantial
    comments.

    I haven't yet looked at your code in any detail, but it occurs to me
    that you can't *reliably* find all functions in a C file without
    having a full C parser *and* preprocessor. I worked on something
    related some years ago (an application that searched for struct and
    union declarations in header files) and I had to (a) use a full C
    parser, with special treatment for typedefs, and (b) feed my tool the
    output of the C preprocessor (the alternative would have been to
    re-implement the preprocessor).

    A more heuristic approach that catches *most* function declarations in
    real code might very well be good enough for most purposes, but it's
    important to note the limitations. (I don't remember whether you did
    so in your original post.)

    One thing to watch out for is whether your tool works correctly on
    machine-generated C source code. I suspect you may be making some
    assumptions about the code layout, aspects that are ignored by the
    compiler and that are likely to be ignored by anything that generates
    C source code (such as lex, yacc, or a frontend for another language).

    If your tool isn't intended to work on such code, that's fine, but you
    should explicitly note that fact.
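
Editor's note, to make the limitation concrete: below is a stripped-down sketch of the "closing paren followed by opening brace" test, run over a string instead of a file. The name count_paren_brace is made up for this illustration, and comments, string literals and character constants are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the places where a ')' at brace level 0
   and paren level 0 is followed, after optional white space, by '{'. */
static int count_paren_brace(const char *src)
{
    int level = 0;       /* brace nesting depth */
    int parenlevel = 0;  /* parenthesis nesting depth */
    int found = 0;
    size_t i = 0;

    assert(src != NULL);
    while (src[i] != '\0') {
        char c = src[i++];

        if (c == '{') {
            level++;
        } else if (c == '}') {
            if (level > 0)
                level--;
        } else if (c == '(') {
            parenlevel++;
        } else if (c == ')') {
            if (parenlevel > 0)
                parenlevel--;
            if (level == 0 && parenlevel == 0) {
                while (isspace((unsigned char)src[i]))
                    i++;
                if (src[i] == '{')
                    found++;   /* the '{' itself is counted next turn */
            }
        }
    }
    return found;
}
```

With this sketch, "int f(void) { return 0; }" gives one hit, while a K&R definition such as "int f(a) int a; { return a; }" and a definition whose braces come from macros, such as "int f(void) BEGIN return 0; END", give none. The second case is the kind that needs the preprocessor output to get right.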

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Oct 16, 2006
    #2

  3. jacob navia

    jacob navia Guest

    Keith Thompson wrote:
    > Perhaps if you posted the revised code, you'd get more substantial
    > comments.


    OK. Here it is. Blanks instead of tabs.
    ---------------------------------------------------------------cut here
    /* A simple scanner that will take a file of C source code and
       print the names of all functions therein, in the following format:
           "Function XXXX found line dddd .... ddddd"
       Algorithm: it scans for a closing parenthesis and an immediately
       following opening brace. Comments can appear between the closing
       paren and the opening brace, but no other characters besides white
       space. Functions must have the correct prototype; K & R syntax is
       not supported.
    */
    #include <stdio.h>
    #include <ctype.h>
    // Longest Identifier we support. Sorry Java guys..
    #define MAXID 1024
    // Buffer for remembering the function name
    static char IdBuffer[MAXID];
    // Line number counter. We start at line 1
    static int line = 1;

    // This function reads a character and if it is \n it bumps
    // the line counter.
    static int Fgetc(FILE *f)
    {
        int c = fgetc(f);
        if (c == '\n')
            line++;
        return c;
    }

    // Return 1 if the character is a legal C identifier
    // character, zero if not. The parameter "Start"
    // means if an identifier START character (excluding
    // numbers) is desired.
    static int IsIdentifier(int c,int start)
    {
        if (c == '_' || isalpha(c))
            return 1;
        if (start == 0 && isdigit(c))
            return 1;
        return 0;
    }
    // Just prints the function name
    static int PrintFunction(FILE *f)
    {
        printf("Function %s: line %d ...",IdBuffer,line);
        return Fgetc(f);
    }

    // Reads a global identifier into our name buffer
    static int ReadId(char c,FILE *f)
    {
        int i = 1;
        IdBuffer[0] = c;
        while (i < MAXID-1) {
            c = Fgetc(f);
            if (c != EOF) {
                if (IsIdentifier(c,0))
                    IdBuffer[i++] = c;
                else break;
            }
            else break;
        }
        IdBuffer[i] = 0;
        return c;
    }

    // Skips strings
    static int ParseString(FILE *f)
    {
        int c = Fgetc(f);
        while (c != EOF && c != '"') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '"')
            c = Fgetc(f);
        return c;
    }
    // Skips comments
    static int ParseComment(FILE *f)
    {
        int c = Fgetc(f);

        while (1) {
            while (c != '*') {
                c = Fgetc(f);
                if (c == EOF)
                    return EOF;
            }
            c = Fgetc(f);
            if (c == '/')
                break;
        }
        return Fgetc(f);
    }

    // Skips // comments
    static int ParseCppComment(FILE *f)
    {
        int c = Fgetc(f);
        while (c != EOF && c != '\n') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '\n')
            c = Fgetc(f);
        return c;
    }

    // Checks if a comment is followed after a '/' char
    static int CheckComment(int c,FILE *f)
    {
        if (c == '/') {
            c = Fgetc(f);
            if (c == '*')
                c = ParseComment(f);
            else if (c == '/')
                c = ParseCppComment(f);
        }
        return c;
    }

    // Skips white space and comments
    static int SkipWhiteSpace(int c,FILE *f)
    {
        c = CheckComment(c,f);
        do {
            if (c <= ' ')
                c = Fgetc(f);
            c = CheckComment(c,f);
        }
        while (c <= ' ');
        return c;
    }

    // Skips chars between simple quotes
    static int ParseQuotedChar(FILE *f)
    {
        int c = Fgetc(f);
        while (c != EOF && c != '\'') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '\'')
            c = Fgetc(f);
        return c;
    }


    int main(int argc,char *argv[]){
        if (argc == 1) {
            printf("Usage: %s <file.c>\n",argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1],"r");
        if (f == NULL) {
            printf("Can't find %s\n",argv[1]);
            return 2;
        }
        int c = Fgetc(f);
        int level = 0;
        int parenlevel = 0;
        int inFunction = 0;
        while (c != EOF) {
            // Note that each of the switches must advance the character
            // read so that we avoid an infinite loop.
            switch (c) {
            case '"':
                c = ParseString(f);
                break;
            case '/':
                c = CheckComment(c,f);
                break;
            case '\'':
                c = ParseQuotedChar(f);
                break;
            case '{':
                level++;
                c = Fgetc(f);
                break;
            case '}':
                if (level == 1 && inFunction) {
                    printf(" %d\n",line);
                    inFunction = 0;
                }
                if (level > 0)
                    level--;
                c = Fgetc(f);
                break;
            case '(':
                parenlevel++;
                c = Fgetc(f);
                break;
            case ')':
                if (parenlevel > 0)
                    parenlevel--;
                c = Fgetc(f);
                if ((parenlevel|level) == 0) {
                    c = SkipWhiteSpace(c,f);
                    if (c == '{') {
                        level++;
                        inFunction = 1;
                        c = PrintFunction(f);
                    }
                }
                break;
            default:
                if ((level | parenlevel) == 0 && IsIdentifier(c,1))
                    c = ReadId(c,f);
                else c = Fgetc(f);
            }
        }
        fclose(f);
        return 0;
    }
    jacob navia, Oct 16, 2006
    #3
  4. jacob navia <> writes:
    > Keith Thompson wrote:
    >> Perhaps if you posted the revised code, you'd get more substantial
    >> comments.

    >
    > OK. Here it is. Blanks instead of tabs.
    > ---------------------------------------------------------------cut here

    [snip]

    You're still using "//" comments, and mixing declarations and
    statements.

    I can see the use for the latter, but the reasons for avoiding "//"
    comments on Usenet, even assuming they're 100% legal and portable,
    have been explained here many times.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Oct 16, 2006
    #4
  5. CBFalconer

    CBFalconer Guest

    Keith Thompson wrote:
    > jacob navia <> writes:
    >> Keith Thompson wrote:
    >>
    >>> Perhaps if you posted the revised code, you'd get more substantial
    >>> comments.

    >>
    >> OK. Here it is. Blanks instead of tabs.
    >> -------------------------------------------------cut here

    > [snip]
    >
    > You're still using "//" comments, and mixing declarations and
    > statements.
    >
    > I can see the use for the latter, but the reasons for avoiding "//"
    > comments on Usenet, even assuming they're 100% legal and portable,
    > have been explained here many times.


    For those interested, here is Jacob's code revised for portability to
    C90. It will even compile under C99. I also passed it through
    indent. Now you should be able to compile and thrash.

    Aside to Jacob - see how easy it is to be semi-portable. There are
    still portability problems, such as return 1 in main and the use of
    "c <= ' '". I hope this isn't the lexer in lcc.

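Editor's note on the two portability points: the only portable return values from main are 0, EXIT_SUCCESS and EXIT_FAILURE (the last two from <stdlib.h>). The test "c <= ' '" relies on the character set placing all white space at or below the space character, and it is also true for EOF, which is negative, so the do/while loop in SkipWhiteSpace() never ends if the file stops after a closing parenthesis and trailing white space. A hedged sketch using isspace() follows. The name skip_space is hypothetical, and comment handling is left out to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical replacement for the white-space test only.
   c is the character already read from f. Returns the first
   character that is not white space, or EOF at end of input. */
static int skip_space(int c, FILE *f)
{
    assert(f != NULL);
    while (c != EOF && isspace(c))
        c = fgetc(f);
    return c;
}
```

The explicit EOF test is what stops the loop at end of input. isspace(EOF) is already 0, so the test is there for the reader rather than for the compiler.
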
    /*
    Subject: Re: A simple parser
    Date: Tue, 17 Oct 2006 00:08:19 +0200
    From: jacob navia <>
    Newsgroups: comp.lang.c

    Keith Thompson wrote:
    > Perhaps if you posted the revised code, you'd get more
    > substantial comments.


    OK. Here it is. Blanks instead of tabs.
    ------------------------cut here */

    /* A simple scanner that will take a file of C source code and
    print the names of all functions therein, in the following
    format:
    "Function XXXX found line dddd .... ddddd"
    Algorithm. It scans for a terminating parentheses and an
    immediately following opening brace. Comments can appear
    between the closing paren and the opening braces, but no
    other characters besides white space. Functions must have
    the correct prototype, K & R syntax is not supported.
    */
    #include <stdio.h>
    #include <ctype.h>

    /* Longest Identifier we support. Sorry Java guys. */
    #define MAXID 1024

    /* Buffer for remembering the function name */
    static char IdBuffer[MAXID];

    /* Line number counter. We start at line 1 */
    static int line = 1;

    /* This function reads a character and
       if it is \n it bumps the line counter. */
    static int Fgetc(FILE * f)
    {
        int c;

        c = fgetc(f);
        if (c == '\n')
            line++;
        return c;
    }

    /* Return 1 if the character is a legal C identifier
       character, zero if not. The parameter "Start"
       means if an identifier START character (excluding
       numbers) is desired */
    static int IsIdentifier(int c, int start)
    {
        if (c == '_' || isalpha(c))
            return 1;
        if (start == 0 && isdigit(c))
            return 1;
        return 0;
    }

    /* Just prints the function name */
    static int PrintFunction(FILE * f)
    {
        printf("Function %s: line %d ...", IdBuffer, line);
        return Fgetc(f);
    }

    /* Reads a global identifier into our name buffer */
    static int ReadId(char c, FILE * f)
    {
        int i = 1;

        IdBuffer[0] = c;
        while (i < MAXID - 1) {
            c = Fgetc(f);
            if (c != EOF) {
                if (IsIdentifier(c, 0))
                    IdBuffer[i++] = c;
                else
                    break;
            }
            else
                break;
        }
        IdBuffer[i] = 0;
        return c;
    }

    /* Skips strings */
    static int ParseString(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '"') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '"')
            c = Fgetc(f);
        return c;
    }

    /* Skips comments */
    static int ParseComment(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (1) {
            while (c != '*') {
                c = Fgetc(f);
                if (c == EOF)
                    return EOF;
            }
            c = Fgetc(f);
            if (c == '/')
                break;
        }
        return Fgetc(f);
    }

    /* Skips "//" comments */
    static int ParseCppComment(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '\n') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '\n')
            c = Fgetc(f);
        return c;
    }

    /* Checks if a comment is followed after a '/' char */
    static int CheckComment(int c, FILE * f)
    {
        if (c == '/') {
            c = Fgetc(f);
            if (c == '*')
                c = ParseComment(f);
            else if (c == '/')
                c = ParseCppComment(f);
        }
        return c;
    }

    /* Skips white space and comments */
    static int SkipWhiteSpace(int c, FILE * f)
    {
        c = CheckComment(c, f);
        do {
            if (c <= ' ')
                c = Fgetc(f);
            c = CheckComment(c, f);
        }
        while (c <= ' ');
        return c;
    }

    /* Skips chars between simple quotes */
    static int ParseQuotedChar(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '\'') {
            if (c == '\\')
                c = Fgetc(f);
            if (c != EOF)
                c = Fgetc(f);
        }
        if (c == '\'')
            c = Fgetc(f);
        return c;
    }

    int main(int argc, char *argv[])
    {
        FILE *f;
        int c;
        int level = 0;
        int parenlevel = 0;
        int inFunction = 0;

        if (argc == 1) {
            printf("Usage: %s <file.c>\n", argv[0]);
            return 1;
        }

        f = fopen(argv[1], "r");
        if (f == NULL) {
            printf("Can't find %s\n", argv[1]);
            return 2;
        }

        c = Fgetc(f);
        while (c != EOF) {
            /* Note that each of the switches must advance the
               character read so that we avoid an infinite loop. */
            switch (c) {
            case '"':
                c = ParseString(f);
                break;
            case '/':
                c = CheckComment(c, f);
                break;
            case '\'':
                c = ParseQuotedChar(f);
                break;
            case '{':
                level++;
                c = Fgetc(f);
                break;
            case '}':
                if (level == 1 && inFunction) {
                    printf(" %d\n", line);
                    inFunction = 0;
                }
                if (level > 0)
                    level--;
                c = Fgetc(f);
                break;
            case '(':
                parenlevel++;
                c = Fgetc(f);
                break;
            case ')':
                if (parenlevel > 0)
                    parenlevel--;
                c = Fgetc(f);
                if ((parenlevel | level) == 0) {
                    c = SkipWhiteSpace(c, f);
                    if (c == '{') {
                        level++;
                        inFunction = 1;
                        c = PrintFunction(f);
                    }
                }
                break;
            default:
                if ((level | parenlevel) == 0 && IsIdentifier(c, 1))
                    c = ReadId(c, f);
                else
                    c = Fgetc(f);
            }
        }
        fclose(f);
        return 0;
    }

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Oct 17, 2006
    #5
  6. Bill Pursell

    Bill Pursell Guest

    CBFalconer wrote (modified code by Jacob Navia):

    > static int line = 1;
    >
    > /* This function reads a character and
    > if it is \n it bumps the line counter. */
    > static int Fgetc(FILE * f)
    > {
    > int c;
    >
    > c = fgetc(f);
    > if (c == '\n')
    > line++;
    > return c;
    > }
    >
    > c = Fgetc(f);
    > while (c != EOF) {
    > /* Each case must advance the read of f */
    > ....
    > }


    I'm curious to hear opinions on the following adaptation:

    #ifndef NDEBUG
    static size_t count;
    #endif

    static int
    Fgetc(FILE *f)
    {
        int c;

        c = fgetc(f);
    #ifndef NDEBUG
        count++;
    #endif
        if (c == '\n')
            line++;
        return c;
    }

    while (c != EOF) {
    #ifndef NDEBUG
        size_t prev_count = count;
    #endif
        ...
        assert( count == prev_count +1);
    }

    To paraphrase, I'm replacing the comment that
    each case must read a character from the input
    stream with an assertion. Doing so requires
    a little bit of extra overhead in the code. Is that
    overhead (and potential bugs that it might introduce)
    worth it? It strikes me that the potential for having
    a bug is now lessened, and the resulting bug will
    be easier to detect. Other opinions appreciated.

    Also, does the assertion accurately capture the
    overflow of count? I started to write an explicit
    check for count ==0, but realized that the assertion
    will still hold when count overflows.
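
Editor's note on the overflow question: size_t is an unsigned type, and unsigned arithmetic wraps around, so on the usual implementations (where size_t is at least as wide as unsigned int) prev_count + 1 wraps in step with count. Note also that the cases which skip a string or a comment read more than one character per loop iteration, so a progress check for this scanner may need to be "count changed" rather than "count went up by exactly one". A minimal sketch under those assumptions follows; all three helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true if exactly one character was read.
   Still correct when the counter wraps, because size_t
   arithmetic is modular. */
static int advanced_by_one(size_t prev, size_t now)
{
    return now == prev + 1;
}

/* Hypothetical helper: true if at least one character was read.
   Only fooled if the counter wraps all the way back to prev. */
static int made_progress(size_t prev, size_t now)
{
    return now != prev;
}

/* Possible use at the bottom of the scanning loop. */
static void check_progress(size_t prev, size_t now)
{
    assert(made_progress(prev, now));
    (void)prev;   /* avoid unused-parameter warnings under NDEBUG */
    (void)now;
}
```

Here (size_t)-1 is the largest size_t value, so advanced_by_one((size_t)-1, 0) is true, which is the wraparound case asked about.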

    --
    Bill Pursell
    Bill Pursell, Oct 17, 2006
    #6
  7. jacob navia

    jacob navia Guest

    Keith Thompson wrote:
    > jacob navia <> writes:
    >
    >>Keith Thompson wrote:
    >>
    >>>Perhaps if you posted the revised code, you'd get more substantial
    >>>comments.

    >>
    >>OK. Here it is. Blanks instead of tabs.
    >>---------------------------------------------------------------cut here

    >
    > [snip]
    >
    > You're still using "//" comments, and mixing declarations and
    > statements.
    >
    > I can see the use for the latter, but the reasons for avoiding "//"
    > comments on Usenet, even assuming they're 100% legal and portable,
    > have been explained here many times.
    >


    Look, I have spent all my time in the last 5-8 years writing
    a C99 compliant version of my compiler system. To tell me
    that now I should please heathfield and co and come back to C89
    because he doesn't want to change some compiler options is too
    much really.
    jacob navia, Oct 17, 2006
    #7
  8. jacob navia said:

    <snip>

    > Look, I have spent all my time in the last 5-8 years writing
    > a C99 compliant version of my compiler system.


    I was not aware that lcc-win32 was C99-"compliant". Last time this came up,
    you denied it. Has the situation changed? Or do you just mean that the
    source code of the compiler is C99-conforming? That should not have taken
    5-8 years to achieve. I've been writing C99-conforming code (without ever
    having /heard/ of C99), since long before 1999. It's hardly difficult, is
    it?

    > To tell me
    > that now I should please heathfield and co and come back to C89
    > because he doesn't want to change some compiler options is too
    > much really.


    You misunderstand the point entirely. If the point were to please
    "heathfield", then of course you'd be foolish to take any notice of some
    random hack on Usenet with what you perceive as an axe to grind. This is
    far more about the "and co" than it is about "heathfield".

    Why do some people - especially comp.lang.c subscribers - choose to invoke
    their compilers in conforming mode? Why, it's so that they can get as much
    compiler assurance as possible that their source code will be portable to
    other compilers, other platforms, other operating systems.

    But surely everywhere supports // comments, doesn't it? Well, no, everywhere
    doesn't support // comments. And even if everywhere did support such
    comments, to get *gcc* to support them we have to invoke it in a mode that
    conforms neither to C90 nor to C99, and thus we don't get as much compiler
    assurance as possible that the rest of our source code will be portable to
    other compilers, other platforms, other operating systems. What is true for
    me is true for others, too. This isn't about "heathfield". This is about
    those who write C portably because they need their code to work even on
    computers that Jacob Navia hasn't heard of and perhaps cannot imagine.

    What's more, even if my version of gcc had a C99 mode (which it doesn't),
    would there be any point in using it? Well, yes IF my intent were to use
    C99 features in my own code. But, as someone whose code must remain as
    portable as it reasonably can be, I dare not use C99 features that are not
    also C90 features until C99 becomes as widely implemented as C90 currently
    is. And, practically speaking, that means there's no point even /looking/
    until Microsoft, gcc, and at least one mainframe implementor provide
    C99-conforming implementations (compiler *and* library) off the shelf.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
    Richard Heathfield, Oct 17, 2006
    #8
  9. Richard

    Richard Guest

    Richard Heathfield <> writes:

    > jacob navia said:
    >
    > <snip>
    >
    >> Look, I have spent all my time in the last 5-8 years writing
    >> a C99 compliant version of my compiler system.

    >
    > I was not aware that lcc-win32 was C99-"compliant". Last time this came up,
    > you denied it. Has the situation changed? Or do you just mean that the
    > source code of the compiler is C99-conforming? That should not have taken
    > 5-8 years to achieve. I've been writing C99-conforming code (without ever
    > having /heard/ of C99), since long before 1999. It's hardly difficult, is
    > it?


    I would love to know how many target platforms your code is "ported to"
    that don't support a form of GCC and the C99 subsets Jacob uses.

    While it is laudable that you think all your code should compile on
    every C compiler out there, I don't think that makes it of paramount
    importance for Jacob, or those of us who do make use of compiler and
    platform specifics because of a need to harness some feature or
    optimization.

    I have spent a lot of time programming GUI messaging systems in C - they
    sure as hell won't port, and there was no reason whatsoever to consider
    that the C99 features used would be detrimental to the code's continued
    use.
    Richard, Oct 17, 2006
    #9
  10. jacob navia <> writes:
    > Keith Thompson wrote:
    >> jacob navia <> writes:
    >>
    >>>Keith Thompson wrote:
    >>>
    >>>>Perhaps if you posted the revised code, you'd get more substantial
    >>>>comments.
    >>>
    >>>OK. Here it is. Blanks instead of tabs.
    >>>---------------------------------------------------------------cut here

    >> [snip]
    >> You're still using "//" comments, and mixing declarations and
    >> statements.
    >> I can see the use for the latter, but the reasons for avoiding "//"
    >> comments on Usenet, even assuming they're 100% legal and portable,
    >> have been explained here many times.

    >
    > Look, I have spent all my time in the last 5-8 years writing
    > a C99 compliant version of my compiler system. To tell me
    > that now I should please heathfield and co and come back to C89
    > because he doesn't want to change some compiler options is too
    > much really.


    lcc-win32 is not C99 compliant, and I am not Richard Heathfield.

    Shall I explain again why "//" comments on Usenet are a bad idea, or
    do you remember the reasons?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Oct 17, 2006
    #10
  11. jacob navia

    jacob navia Guest

    Keith Thompson wrote:
    > jacob navia <> writes:
    >
    >>Keith Thompson wrote:
    >>
    >>>jacob navia <> writes:
    >>>
    >>>
    >>>>Keith Thompson wrote:
    >>>>
    >>>>
    >>>>>Perhaps if you posted the revised code, you'd get more substantial
    >>>>>comments.
    >>>>
    >>>>OK. Here it is. Blanks instead of tabs.
    >>>>---------------------------------------------------------------cut here
    >>>
    >>>[snip]
    >>>You're still using "//" comments, and mixing declarations and
    >>>statements.
    >>>I can see the use for the latter, but the reasons for avoiding "//"
    >>>comments on Usenet, even assuming they're 100% legal and portable,
    >>>have been explained here many times.

    >>
    >>Look, I have spent all my time in the last 5-8 years writing
    >>a C99 compliant version of my compiler system. To tell me
    >>that now I should please heathfield and co and come back to C89
    >>because he doesn't want to change some compiler options is too
    >>much really.

    >
    >
    > lcc-win32 is not C99 compliant, and I am not Richard Heathfield.
    >
    > Shall I explain again why "//" comments on Usenet are a bad idea, or
    > do you remember the reasons?
    >


    I cut the lines so that they pass everywhere. I verified that. The
    message looked perfectly fine in my news reader. But you are
    right, line breaking *could* be a problem.
    jacob navia, Oct 17, 2006
    #11
  12. CBFalconer

    CBFalconer Guest

    CBFalconer wrote:
    >

    .... snip ...
    >
    > For those interested here is Jacobs code revised for portability to
    > C90. It will even compile under C99. I also passed it through
    > indent. Now you should be able to compile and thrash.
    >
    > Aside to Jacob - see how easy it is to be semi-portable. Still are
    > portability problems, such as return 1 in main and the use of
    > "c <= ' '". I hope this isn't the lexer in lcc.
    >

    .... snip ...

    I did some further simple reformatting and removal of hideous
    coding. Now I have a peculiar anomaly. The code compiles with gcc
    and runs on itself quite nicely when the output file is "a.exe" (by
    default). Operation with no params gives help. However, once I
    rename that to "cfunct.exe" no parameter operation goes wild, and
    execution on its own code gives nothing. I suspect some of the
    non-standard coding is the reason, and will look at it further
    later. This was done with gcc 3.2.1 under DJGPP and W98. The
    actual version compiled is below, for those interested.

    My instinct tells me that the use of argv[0] is to blame. I doubt
    very much that other systems will have the same malfunction. At
    any rate, Jacob's code has been shown to be non-portable.

    This all showed up when I started to build a simple filter to
    exterminate // comments.

    An aside: I always format do while loops as:

    do {
        ...
    } while (condition);

    indent, as I have it set up now, makes that

    do {
        ...
    }
    while (condition);

    which casual reading parses as a while loop with an empty statement
    phase. I initially revised that to "while (condition) continue;",
    which produced parse errors and located the problem. This also
    shows up the advantage of using continue in empty loops.
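
Editor's note: a small sketch of the "continue in an empty loop" habit described above. The helper skip_line is hypothetical and not part of cfunct.c. Writing the body as an explicit continue shows that the loop is empty on purpose, which a lone semicolon does not.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: discard the rest of the current line.
   Returns the terminating '\n', or EOF if the input ended first. */
static int skip_line(FILE *f)
{
    int c;

    assert(f != NULL);
    while ((c = getc(f)) != EOF && c != '\n')
        continue;
    return c;
}
```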

    /* A simple scanner that will take a file of C source code and
    print the names of all functions therein, in the following
    format:
    "Function XXXX found line dddd .... ddddd"
    Algorithm. It scans for a terminating parentheses and an
    immediately following opening brace. Comments can appear
    between the closing paren and the opening braces, but no
    other characters besides white space. Functions must have
    the correct prototype, K & R syntax is not supported.
    */
    #include <stdio.h>
    #include <ctype.h>

    /* Longest Identifier we support. Sorry Java guys. */
    #define MAXID 1024

    /* Buffer for remembering the function name */
    static char IdBuffer[MAXID];

    /* Line number counter. We start at line 1 */
    static int line = 1;

    /* ----------------- */

    /* This function reads a character and
       if it is \n it bumps the line counter. */
    static int Fgetc(FILE * f)
    {
        int c;

        if ('\n' == (c = fgetc(f))) line++;
        return c;
    } /* Fgetc */

    /* ----------------- */

    /* Return 1 if the character is a legal C identifier
       character, zero if not. The parameter "Start"
       means if an identifier START character (excluding
       numbers) is desired */
    static int IsIdentifier(int c, int start)
    {
        if (c == '_' || isalpha(c)) return 1;
        if (start == 0 && isdigit(c)) return 1;
        return 0;
    } /* IsIdentifier */

    /* ----------------- */

    /* Just prints the function name */
    static int PrintFunction(FILE * f)
    {
        printf("Function %s: line %d ...", IdBuffer, line);
        return Fgetc(f);
    } /* PrintFunction */

    /* ----------------- */

    /* Reads a global identifier into our name buffer */
    static int ReadId(char c, FILE * f)
    {
        int i = 1;

        IdBuffer[0] = c;
        while (i < MAXID - 1) {
            c = Fgetc(f);
            if (EOF == c) break;
            else {
                if (IsIdentifier(c, 0)) IdBuffer[i++] = c;
                else break;
            }
        }
        IdBuffer[i] = 0;
        return c;
    } /* ReadId */

    /* ----------------- */

    /* Skips strings */
    static int ParseString(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '"') {
            if (c == '\\') c = Fgetc(f);
            if (c != EOF) c = Fgetc(f);
        }
        if (c == '"') c = Fgetc(f);
        return c;
    } /* ParseString */

    /* ----------------- */

    /* Skips comments */
    static int ParseComment(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (1) {
            while (c != '*') {
                c = Fgetc(f);
                if (c == EOF) return EOF;
            }
            c = Fgetc(f);
            if (c == '/') break;
        }
        return Fgetc(f);
    } /* ParseComment */

    /* ----------------- */

    /* Skips "//" comments */
    static int ParseCppComment(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '\n') {
            if (c == '\\') c = Fgetc(f);
            if (c != EOF) c = Fgetc(f);
        }
        if (c == '\n') c = Fgetc(f);
        return c;
    } /* ParseCppComment */

    /* ----------------- */

    /* Checks if a comment is followed after a '/' char */
    static int CheckComment(int c, FILE * f)
    {
        if (c == '/') {
            c = Fgetc(f);
            if (c == '*') c = ParseComment(f);
            else if (c == '/') c = ParseCppComment(f);
        }
        return c;
    } /* CheckComment */

    /* ----------------- */

    /* Skips white space and comments */
    static int SkipWhiteSpace(int c, FILE * f)
    {
        c = CheckComment(c, f);
        do {
            if (c <= ' ') c = Fgetc(f);
            c = CheckComment(c, f);
        } while (c <= ' ');
        return c;
    } /* SkipWhiteSpace */

    /* ----------------- */

    /* Skips chars between simple quotes */
    static int ParseQuotedChar(FILE * f)
    {
        int c;

        c = Fgetc(f);
        while (c != EOF && c != '\'') {
            if (c == '\\') c = Fgetc(f);
            if (c != EOF) c = Fgetc(f);
        }
        if (c == '\'') c = Fgetc(f);
        return c;
    } /* ParseQuotedChar */

    /* ----------------- */

    int main(int argc, char *argv[])
    {
        FILE *f;
        int c;
        int level = 0;
        int parenlevel = 0;
        int inFunction = 0;

        if (argc == 1) {
            printf("Usage: %s <file.c>\n", argv[0]);
            return 1;
        }

        f = fopen(argv[1], "r");
        if (f == NULL) {
            printf("Can't find %s\n", argv[1]);
            return 2;
        }

        c = Fgetc(f);
        while (c != EOF) {
            /* Note that each of the switches must advance the
               character read so that we avoid an infinite loop. */
            switch (c) {
            case '"':
                c = ParseString(f);
                break;
            case '/':
                c = CheckComment(c, f);
                break;
            case '\'':
                c = ParseQuotedChar(f);
                break;
            case '{':
                level++;
                c = Fgetc(f);
                break;
            case '}':
                if (level == 1 && inFunction) {
                    printf(" %d\n", line);
                    inFunction = 0;
                }
                if (level > 0) level--;
                c = Fgetc(f);
                break;
            case '(':
                parenlevel++;
                c = Fgetc(f);
                break;
            case ')':
                if (parenlevel > 0) parenlevel--;
                c = Fgetc(f);
                if ((parenlevel | level) == 0) {
                    c = SkipWhiteSpace(c, f);
                    if (c == '{') {
                        level++;
                        inFunction = 1;
                        c = PrintFunction(f);
                    }
                }
                break;
            default:
                if ((level | parenlevel) == 0 && IsIdentifier(c, 1))
                    c = ReadId(c, f);
                else
                    c = Fgetc(f);
            }
        }
        fclose(f);
        return 0;
    } /* main, cfunct.c */

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Oct 18, 2006
    #12
  13. Flash Gordon

    Flash Gordon Guest

    CBFalconer wrote:
    > CBFalconer wrote:
    > ... snip ...
    >> For those interested, here is Jacob's code revised for portability to
    >> C90. It will even compile under C99. I also passed it through
    >> indent. Now you should be able to compile and thrash.
    >>
    >> Aside to Jacob - see how easy it is to be semi-portable. There are
    >> still portability problems, such as return 1 in main and the use of
    >> "c <= ' '". I hope this isn't the lexer in lcc.
    >>

    > ... snip ...
    >
    > I did some further simple reformatting and removal of hideous
    > coding. Now I have a peculiar anomaly. The code compiles with gcc
    > and runs on itself quite nicely when the output file is "a.exe" (by
    > default). Operation with no params gives help. However, once I
    > rename that to "cfunct.exe" no parameter operation goes wild, and
    > execution on its own code gives nothing. I suspect some of the
    > non-standard coding is the reason, and will look at it further
    > later. This was done with gcc 3.2.1 under DJGPP and W98. The
    > actual version compiled is below, for those interested.
    >
    > My instinct tells me that the use of argv[0] is to blame.


    Possibly since you certainly don't allow for it being NULL.


    > int main(int argc, char *argv[])
    > {
    > FILE *f;
    > int c;
    > int level = 0;
    > int parenlevel = 0;
    > int inFunction = 0;
    >

    if (argc == 0) {
    printf("Usage: <file.c>\n");
    return 1;
    }

    > if (argc == 1) {


    if (argc == 1 || argc > 2) {

    > printf("Usage: %s <file.c>\n", argv[0]);
    > return 1;
    > }
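
    As a rough illustration of the NULL point, here is a minimal sketch of a guard for the usage message. The helper program_name and its fallback parameter are made up for this example and are not part of the posted code; it simply avoids passing a NULL or empty argv[0] to printf.

```c
#include <stddef.h>

/* Hypothetical helper (not in the posted code): choose a name for the usage
   message even when argc is 0 or argv[0] is NULL or an empty string. */
static const char *program_name(int argc, char *argv[], const char *fallback)
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    return fallback;
}
```

    The usage line would then read something like printf("Usage: %s <file.c>\n", program_name(argc, argv, "cfunct")); where "cfunct" is only an assumed fallback name.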


    <snip>

    The return value is still non-standard.
    --
    Flash Gordon
    Flash Gordon, Oct 18, 2006
    #13
  14. jacob navia

    jacob navia Guest

    Flash Gordon wrote:
    >
    > <snip>
    >
    > The return value is still non-standard.


    ???
    Since when is there a standard return value?

    This returns zero for no error, 1 for argument error
    and 2 if the file could not be opened...
    jacob navia, Oct 18, 2006
    #14
  15. jacob navia <> writes:
    > Flash Gordon wrote:
    >> <snip>
    >> The return value is still non-standard.

    >
    > ???
    > Since when is there a standard return value?
    >
    > This returns zero for no error, 1 for argument error
    > and 2 if the file could not be opened...


    The standard return values for main() are EXIT_SUCCESS or 0 for
    success, EXIT_FAILURE for failure. Any other values are non-portable.
    In particular, there are real-world systems where "exit(1)" or
    "return 1;" from main() will cause the program to terminate and
    indicate *success* to the calling environment.

    It's often possible to define return values other than the standard
    ones, but they're likely to be system-specific, and they should be
    clearly documented.

    You didn't know that?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Oct 18, 2006
    #15
  16. jacob navia

    jacob navia Guest

    Keith Thompson wrote:
    > jacob navia <> writes:
    >
    >>Flash Gordon wrote:
    >>
    >>><snip>
    >>>The return value is still non-standard.

    >>
    >>???
    >>Since when is there a standard return value?
    >>
    >>This returns zero for no error, 1 for argument error
    >>and 2 if the file could not be opened...

    >
    >
    > The standard return values for main() are EXIT_SUCCESS or 0 for
    > success, EXIT_FAILURE for failure. Any other values are non-portable.
    > In particular, there are real-world systems where "exit(1)" or
    > "return 1;" from main() will cause the program to terminate and
    > indicate *success* to the calling environment.
    >
    > It's often possible to define return values other than the standard
    > ones, but they're likely to be system-specific, and they should be
    > clearly documented.
    >
    > You didn't know that?
    >


    Excuse me but it is
    int main(int argc,char *argv[])

    "int" means at least 16 bits return value. I can choose more
    than 30 000 values, and I used 3:
    zero for no error, one for argument error, and two for open failure
    error. Other error codes (that I do not use) could be syntax error in
    the source file, etc.

    EXIT_SUCCESS or EXIT_FAILURE are just too few values to use.
    Or you mean that all error codes are unnecessary and that
    only "failure" should be returned instead of more detailed
    error reports???

    I can't understand the argument here, which is not based on
    any standard whatsoever. "main" returns an "int", not a boolean
    value of just success or failure. And that has a reason.

    Error codes are a habit for me. I always use them to convey
    more information to the calling program than just "failure"...

    WHAT FAILED?

    The file couldn't be opened? Syntax error in the file?

    Error codes allow you to differentiate the different possibilities.
    jacob navia, Oct 18, 2006
    #16
  17. On Thu, 19 Oct 2006 00:49:02 +0200, in comp.lang.c , jacob navia
    <> wrote:

    >Keith Thompson wrote:
    >> jacob navia <> writes:
    >>
    >> The standard return values for main() are EXIT_SUCCESS or 0 for
    >> success, EXIT_FAILURE for failure. Any other values are non-portable.


    >Excuse me but it is
    >int main(int argc,char *argv[])
    >
    >"int" means at least 16 bits return value. I can choose more
    >than 30 000 values, and I used 3:


    Keith's point is that only the three listed above are guaranteed by
    the standard to be meaningful.

    For your information, many OSen treat return values as specific error
    codes. Defining your own is fraught with peril. I recall that
    returning a large positive multiple of two or any negative number on
    any VMS-based system could provide hours of amusement with the scary
    messages you get from the OS. I recall once being told that I had
    initiated a cluster-wide shutdown due to a fire alert, or being
    requested to place the tape into DRA0 or somesuch....

    Even MSDOS does this - there's a prescribed set of error codes in one
    of the system headers that lists what you should return for no memory,
    invalid file handle etc. If you return one of these values, some DOS
    tools will assume you have encountered such an error, and take
    unnecessary remedial action.

    >Error codes are a habit for me. I always use them to convey
    >more information to the calling program than just "failure"...


    This is a good plan. Keith's point is that you can't do this portably
    with the return from main, without unexpected side-effects. You need
    to find a different way to signal the precise error to the user, or
    drop into system-specific return codes and consider portability
    issues.
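
    One possible shape for "a different way", sketched under assumptions: keep the detailed error kinds inside the program, print the detail on stderr, and collapse the exit status to the two portable values. The enum and function names below are invented for illustration and do not come from the posted code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error kinds, mirroring the three outcomes discussed here. */
enum cf_error { CF_OK, CF_BAD_ARGS, CF_OPEN_FAILED };

/* Describe the precise error for a human reader. */
static const char *cf_describe(enum cf_error e)
{
    switch (e) {
    case CF_OK:          return "no error";
    case CF_BAD_ARGS:    return "bad arguments";
    case CF_OPEN_FAILED: return "cannot open file";
    }
    return "unknown error";
}

/* Print the detail on stderr and return a portable exit status. */
static int cf_exit_status(enum cf_error e)
{
    if (e == CF_OK)
        return EXIT_SUCCESS;
    fprintf(stderr, "cfunct: %s\n", cf_describe(e));
    return EXIT_FAILURE;
}
```

    With something like this, main could end in return cf_exit_status(err); the caller sees only success or failure, while the user still learns what failed.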

    >Error codes allow you to differentiate the different possibilities.


    Agreed.
    --
    Mark McIntyre

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it."
    --Brian Kernighan
    Mark McIntyre, Oct 19, 2006
    #17
  18. In article <4536af5c$0$25916$>,
    jacob navia <> wrote:
    >EXIT_SUCCESS or EXIT_FAILURE are just too few values to use.
    >Or you mean that all error codes are unnecessary and that
    >only "failure" should be returned instead of more detailed
    >error reports???


    It depends on the context your program will be used in.

    Many operating systems interpret the return value of a C program as
    meaning success or failure. For example, in unix, a non-zero return
    value (usually) indicates failure and may cause a script to terminate.
    If you want to fit in with the operating system's conventions, and
    don't know for sure what the operating system is going to be, then the
    standard C way to do it is EXIT_FAILURE and EXIT_SUCCESS.

    (As has been pointed out before, the standard could instead have
    mapped 0 and 1, but for some reason that wasn't done - presumably
    for compatibility with existing programs on odd platforms.)

    If you don't care about the operating system's conventions, or you
    only care about portability to, say, Posix systems, you can use
    any values that seem appropriate.

    -- Richard
    Richard Tobin, Oct 19, 2006
    #18
  19. jacob navia <> writes:
    > Keith Thompson wrote:
    >> jacob navia <> writes:
    >>
    >>>Flash Gordon wrote:
    >>>
    >>>><snip>
    >>>>The return value is still non-standard.
    >>>
    >>>???
    >>>Since when is there a standard return value?
    >>>
    >>>This returns zero for no error, 1 for argument error
    >>>and 2 if the file could not be opened...

    >> The standard return values for main() are EXIT_SUCCESS or 0 for
    >> success, EXIT_FAILURE for failure. Any other values are non-portable.
    >> In particular, there are real-world systems where "exit(1)" or
    >> "return 1;" from main() will cause the program to terminate and
    >> indicate *success* to the calling environment.
    >> It's often possible to define return values other than the standard
    >> ones, but they're likely to be system-specific, and they should be
    >> clearly documented.
    >> You didn't know that?

    >
    > Excuse me but it is
    > int main(int argc,char *argv[])
    >
    > "int" means at least 16 bits return value. I can choose more
    > than 30 000 values, and I used 3:
    > zero for no error, one for argument error, and two for open failure
    > error. Other error codes (that I do not use) could be syntax error in
    > the source file, etc.
    >
    > EXIT_SUCCESS or EXIT_FAILURE are just too few values to use.
    > Or you mean that all error codes are unnecessary and that
    > only "failure" should be returned instead of more detailed
    > error reports???
    >
    > I can't understand the argument here, which is not based on
    > any standard whatsoever. "main" returns an "int", not a boolean
    > value of just success or failure. And that has a reason.


    Not based on any standard whatsoever?? It's based on the C standard,
    ISO/IEC 9899:1999.

    C99 5.1.2.2.3p1:

    ... a return from the initial call to the main function is
    equivalent to calling the exit function with the value returned by
    the main function as its argument ...

    C99 7.20.4.3p5:

    Finally, control is returned to the host environment. If the value
    of status is zero or EXIT_SUCCESS, an implementation-defined form
    of the status _successful termination_ is returned. If the value
    of status is EXIT_FAILURE, an implementation-defined form of the
    status _unsuccessful termination_ is returned. Otherwise the
    status returned is implementation-defined.

    Note carefully that last sentence; for "exit(1);" or "return 1;", the
    status returned is defined by the *implementation*, not by your own
    program.

    > Error codes are a habit for me. I always use them to convey
    > more information to the calling program than just "failure"...
    >
    > WHAT FAILED?
    >
    > The file couldn't be opened? Syntax error in the file?
    >
    > Error codes allow you to differentiate the different possibilities.


    That's great if your implementation allows for it, but exit codes
    cannot *portably* distinguish results other than success vs. failure.

    Some concrete examples:

    In Unix-like systems, all but the low-order 8 bits of the status are
    silently ignored, so exit(256) has exactly the same effect as exit(0).
    It's common for applications to use multiple return codes for
    different failure modes (grep specifies 2 different failure codes;
    curl currently specifies 76), but there's no universal standard other
    than zero for success, non-zero for failure.
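
    To make the 8-bit truncation concrete, here is a minimal sketch. The helper name unix_visible_status is made up for illustration; it only models the masking described above and does not call any system API.

```c
/* Hypothetical model: the part of an exit status that a parent process
   on a Unix-like system gets to see (the low-order 8 bits only). */
static int unix_visible_status(int status)
{
    return (int)((unsigned int)status & 0xFFu);
}
```

    Under this model unix_visible_status(256) is 0, the same as for exit(0), and unix_visible_status(257) is 1.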

    In VMS, the convention is that an odd-numbered status indicates
    success and an even-numbered status indicates failure (with a lot more
    rules for interpreting specific values). As a special case, in a C
    program, a status of 0 is translated to 1, so that exit(0) will work
    as expected; this translation is *not* done for exit(1). So "exit(0)"
    and "exit(1)" both have exactly the same effect; both indicate that
    the program terminated successfully.
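
    The VMS behaviour just described can be modelled in a few lines. This is a simplified sketch of only the two rules mentioned here (0 is translated to 1; odd means success), with invented function names; the real rules for interpreting specific values are richer.

```c
/* Hypothetical model of the translation described above: a C status of 0
   becomes 1; every other value is passed through unchanged. */
static int vms_status_from_c(int status)
{
    return status == 0 ? 1 : status;
}

/* Odd status means success, even status means failure. */
static int vms_is_success(int vms_status)
{
    return (vms_status & 1) != 0;
}
```

    In this model both exit(0) and exit(1) come out as success, while exit(2) comes out as failure.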

    It would be perfectly valid for an implementation to treat 0 as
    success, 1 as failure, and map *all* non-zero status values to 1. (I
    don't know of any systems that actually do this.)

    Now if you want to define a set of status codes for your program, and
    the program is intended to run only on some particular platform,
    that's just fine. If you want to define the same set of status codes
    for *all* platforms, even if it violates the established conventions
    on some systems, I personally think it's a bad idea but you can go
    ahead and do it if you insist.

    If you think that success vs. failure just isn't enough information, I
    don't necessarily disagree; your argument is with the C standard, not
    with me.

    Furthermore, if you want your program to convey extra information via
    the exit status, you really should document it. I can't use the
    information if I don't know it's there.

    jacob, did you really not know this?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Oct 19, 2006
    #19
  20. CBFalconer

    CBFalconer Guest

    jacob navia wrote:
    >

    .... snip ...
    >
    > WHAT FAILED?
    >
    > The file couldn't be opened? Syntax error in the file?


    Look at the first paragraph of my article. If you wish, send me an
    email and I will return a package with the problems. I have no
    idea what is going wrong or why: printfs placed at various points,
    including as the first statement in main, all fail. The problem
    might be some failure in my system for all I know, and may not be
    reproducible elsewhere.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
    CBFalconer, Oct 19, 2006
    #20