Remove the comments and excess white space in C source code

Discussion in 'C Programming' started by F.F., Sep 20, 2013.

  1. F.F.

    F.F. Guest

    F.F., Sep 20, 2013
    #1

  2. F.F.

    Eric Sosman Guest

    I took only a brief look, so my remarks may be incomplete.
    In no particular order:

    - Interchange lines 219-221 with 222-224.

    - The tests at line 246 are wrong because of the `char' type.

    - I think check_preprocessor_statements() will fail if the
    source starts with white space (e.g., newlines) followed by '#'.

    - I think lines 168-173 will mess up source constructs like
    `x = y / *ptr;', turning them into (unterminated) comments.

    - Trigraph sequences and "digraphs" aren't handled properly.
    (This could be considered a feature rather than a bug.)

    - I don't think white space before what you call "tokens" will
    be removed. For example, it looks like `puts ( "Hello" ) ;' will
    become `puts ("Hello");' rather than `puts("Hello");'.

    - Lines that end with a backslash-newline pair aren't handled
    properly.

    - Lines 119-141 are a *terrible* idea! One crummy little
    I/O error (or bug!), and you can kiss your source code good-bye!

    - Speaking of I/O errors, rip_file() is careful to detect
    them but not so careful about closing FILE streams afterward.
    (In fact, it never closes the overwritten input which is its
    principal output, so never gets a chance to detect errors in
    closing -- but since the original source is already trashed by
    then it may not make much difference. Even if all goes well,
    though, rip_file() leaks an open FILE stream for each source it
    processes; feed it enough sources and it may well run out.)
    (Hmmm: I wonder what happens if you mention the same source
    file name twice on the command line ...)

    - Higher-level remark: I think the program might be simpler
    if re-cast as a state machine, instead of spreading the logic
    across a whole bunch of brittle-looking functions. ("Brittle"
    because there's always this question about whether the function
    has or has not swallowed the current character, and perhaps more;
    that's the sort of thing that's easy to lose track of.) This looks
    more like a job for one simple loop surrounding a big `switch'
    statement, with cases corresponding to the current context.
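
A minimal sketch of the loop-plus-`switch' shape suggested above, not a drop-in replacement for the posted program. The function name strip_comments, the state names, and the choice to work on an in-memory string rather than a FILE stream are all assumptions made for illustration. Each comment is replaced by one space (a `//' comment keeps its newline), and trigraphs, line splicing, and malformed input are deliberately ignored.

```c
#include <stddef.h>

/* Lexical contexts the scanner can be in (names are illustrative). */
enum state {
    CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
    STRING, STRING_ESC, CHARLIT, CHARLIT_ESC
};

/* Copy src to dst, replacing each comment with a single space.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the '\0'. */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; }  /* hold the slash back */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARLIT;
            dst[n++] = c;
            break;
        case SLASH:
            if (c == '*') st = BLOCK_COMMENT;
            else if (c == '/') st = LINE_COMMENT;
            else {                 /* plain division: emit the held slash */
                dst[n++] = '/';
                st = CODE;
                src--;             /* re-scan c in the CODE state */
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { dst[n++] = '\n'; st = CODE; }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') { dst[n++] = ' '; st = CODE; }
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case STRING:
            dst[n++] = c;
            if (c == '\\') st = STRING_ESC;
            else if (c == '"') st = CODE;
            break;
        case STRING_ESC:
            dst[n++] = c;
            st = STRING;
            break;
        case CHARLIT:
            dst[n++] = c;
            if (c == '\\') st = CHARLIT_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHARLIT_ESC:
            dst[n++] = c;
            st = CHARLIT;
            break;
        }
    }
    if (st == SLASH)
        dst[n++] = '/';            /* input ended right after a lone slash */
    dst[n] = '\0';
    return n;
}
```

Because the current context lives in one variable, the question of whether a character has already been swallowed has a single answer per case: only the SLASH case ever re-scans. With this sketch, `x = y / *ptr;' passes through unchanged, since the slash is followed by a space rather than `*'.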
     
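On the overwrite-in-place concern raised above, one common alternative is to write the result to a temporary file, check every I/O step including fclose(), and only then rename it over the original. The sketch below assumes a hypothetical helper named replace_file and a caller-chosen temporary path; neither comes from the posted program. It also assumes POSIX behaviour for rename(), which replaces an existing target; ISO C leaves that case implementation-defined, and some systems refuse.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes of data to tmp_path, then rename
   tmp_path over path only if every step succeeded. The original file is
   left untouched on any failure. Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp_path,
                 const char *data, size_t len)
{
    FILE *out = fopen(tmp_path, "wb");
    int ok;

    if (out == NULL)
        return -1;

    ok = (fwrite(data, 1, len, out) == len);

    /* fclose() flushes buffered data, so its result matters too,
       and calling it on every path avoids leaking the stream. */
    if (fclose(out) != 0)
        ok = 0;

    /* Assumes rename() replaces an existing file (true on POSIX). */
    if (ok && rename(tmp_path, path) != 0)
        ok = 0;

    if (!ok) {
        remove(tmp_path);          /* best-effort cleanup */
        return -1;
    }
    return 0;
}
```

The stream is closed exactly once whether or not the write succeeded, so processing many files does not accumulate open FILE streams.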
    Eric Sosman, Sep 20, 2013
    #2

  3. F.F.

    Tim Rentsch Guest

    In addition to Eric Sosman's list (and overlap in some
    cases), I would list these problems:

    1. Some spaces that can be taken out aren't.

    2. Some spaces that must be left in are removed,
    e.g., in `return/**/ 0;'.

    3. Comments are not removed from preprocessor
    directives.

    4. Line boundaries are ignored when deciding whether
    a '#' starts a preprocessor directive.

    5. Preprocessor directives after regular program
    text don't have a newline inserted before them.
    Or apparently only sometimes don't, e.g.,

    int main(){
    #define FOO 1

    misbehaves.

    6. There needs to be a final newline added if the
    last output line is non-empty (which it almost
    always will be in real programs).

    7. The formatting program generally assumes its
    input is well-formed C source, with little or
    no effort to detect bad input.

    8. The approach is generally too simplistic to be
    completely effective, especially if it matters
    what happens with spaces in macro expansions,
    which it does in some programs because of how
    the stringizing operator works.
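
For point 4 above, a '#' begins a directive only when it is the first non-white-space character on its line. A minimal sketch of that check follows; the name is_directive_line is an assumption for illustration. It expects a pointer to the start of a line, and it assumes comments have already been replaced by spaces and that digraphs (`%:') and trigraphs are not in play.

```c
/* Return 1 if the line beginning at `line` is a preprocessor directive,
   i.e. its first non-white-space character is '#'; otherwise return 0.
   `line` must point at the first character after the previous newline
   (or at the start of the file). */
int is_directive_line(const char *line)
{
    while (*line == ' ' || *line == '\t' || *line == '\v' || *line == '\f')
        line++;
    return *line == '#';
}
```

Applied once per line rather than once per file, the same test also covers source that starts with blank lines before its first '#'.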
     
    Tim Rentsch, Sep 22, 2013
    #3
  4. F.F.

    Tim Rentsch Guest

    That turns out to be a lot harder than it might seem, because of
    interactions between the different levels of textual processing
    (trigraphs, line splicing, comments, preprocessor lines, etc),
    not to mention the question of when adjacent tokens can be
    safely agglutinated.
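
To give a feel for the agglutination question, here is a deliberately conservative heuristic, not a complete answer: given the last character of one token and the first character of the next, it says whether the white space between them should be kept. The name space_needed and the operator set are assumptions for illustration. It errs on the side of keeping spaces and knows nothing about string literals, digraphs, or macro arguments.

```c
#include <ctype.h>
#include <string.h>

/* Identifier or number character. The cast avoids undefined behaviour
   when plain char is signed and holds a negative value. */
static int is_word(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Character that can take part in a multi-character operator. */
static int is_op(char c)
{
    return c != '\0' && strchr("+-*/%&|^<>=!.:#", c) != NULL;
}

/* Return 1 if removing the space between prev and next could change
   how the text is split into tokens, 0 if removal looks safe. */
int space_needed(char prev, char next)
{
    if (is_word(prev) && is_word(next))
        return 1;                  /* `return 0' must not become `return0' */
    if (is_op(prev) && is_op(next))
        return 1;                  /* `+ +' vs `++', `/ *' vs a comment opener */
    if (prev != '\0' && strchr("eEpP", prev) != NULL
        && (next == '+' || next == '-'))
        return 1;                  /* `0x1e +1' would fuse into one pp-number */
    return 0;
}
```

Under this rule `puts ( "Hello" ) ;' loses all of its spaces, while `a + +b' keeps the one between the two plus signs.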
     
    Tim Rentsch, Sep 22, 2013
    #4
