Preprocessor issue - token spacing

Discussion in 'C Programming' started by Mamluk Caliph, Dec 2, 2007.

  1. For quite some time I've used code like the following to be able to
    "cut&paste" parts from different header files of the same name. It has
    worked with GCC, MS, Borland and Keil, to name a few.

    In principle the mechanism looks like this:

    #define CHAINPATH /usr/include
    #define DEFSTR( x ) \
    #x

    #define FNAME( path, file ) \
    DEFSTR( path/file )

    #define BUILDCHAIN( file ) \
    FNAME( CHAINPATH, file )


    #include BUILDCHAIN( stdio.h )

    int main(int argc, char **argv){
    printf("Hello world\n");
    }

    For some reason, when I try to compile this with CodeWarrior, an extra
    space is inserted between the path and the filename.

    The extra space results in a "file not found" error. I'm not sure this
    is a bug, and I would be grateful to know whether there is any right or
    wrong concerning token spacing. When it comes to the preprocessor I'm
    usually confused, so it might be me missing something really obvious.

    Also, if somebody has another suggestion for handling multiple headers
    of the same name, that would be welcome too.

    /Michael
     
    Mamluk Caliph, Dec 2, 2007
    #1

  2. On Sun, 02 Dec 2007 09:13:42 -0800, Mamluk Caliph wrote:
    > For quite some time I've used code like the following to be able to
    > "cut&paste" parts from different header files of the same name. It has
    > worked with GCC, MS, Borland and Keil, to name a few.
    >
    > In principle the mechanism looks like this:
    >
    > #define CHAINPATH /usr/include
    > #define DEFSTR( x ) \
    > #x
    >
    > #define FNAME( path, file ) \
    > DEFSTR( path/file )
    >
    > #define BUILDCHAIN( file ) \
    > FNAME( CHAINPATH, file )
    >
    >
    > #include BUILDCHAIN( stdio.h )
    >
    > int main(int argc, char **argv){
    > printf("Hello world\n");
    > }
    >
    > For some reason, when I try to compile this with CodeWarrior, an extra
    > space is inserted between the path and the filename.
    >
    > The extra space results in a "file not found" error. I'm not sure this
    > is a bug, and I would be grateful to know whether there is any right or
    > wrong concerning token spacing. When it comes to the preprocessor I'm
    > usually confused, so it might be me missing something really obvious.


    The rules for #include are very lenient, and the details of how tokens
    other than "..." and <...> map to those two forms are left mostly to the
    implementation.

    However, stringizing itself is well specified, and it's possible to test
    whether that's the problem you're having by writing a program to test
    only that:

    #define CHAINPATH /usr/include
    #define DEFSTR( x ) \
    #x

    #define FNAME( path, file ) \
    DEFSTR( path/file )

    #define BUILDCHAIN( file ) \
    FNAME( CHAINPATH, file )

    #include <stdio.h>

    int main(int argc, char **argv){
    puts(BUILDCHAIN( stdio.h ));
    }

    This is required to print "/usr/include/stdio.h", and if CodeWarrior
    inserts a space in this case as well, I believe it has a bug. Whitespace
    surrounding macro arguments or macro definitions is supposed to be
    ignored, so the only relevant spacing would be between path and /, or
    between / and file. There isn't any spacing in either case, so there
    should not be any spacing in the resulting string.

    > Also, if somebody has another suggestion for handling multiple headers
    > of the same name, that would be welcome too.


    Generally speaking, it would be good to simply completely avoid this. If
    you absolutely need this, could you explain why? There are some possible
    better ways of doing this, but they won't work in all cases, so depending
    on why you need this, they may or may not apply.
     
    Harald van Dijk, Dec 2, 2007
    #2

  3. Mamluk Caliph <> writes:
    > For quite some time I've used code like the following to be able to
    > "cut&paste" parts from different header files of the same name. It has
    > worked with GCC, MS, Borland and Keil, to name a few.
    >
    > In principle the mechanism looks like this:
    >
    > #define CHAINPATH /usr/include
    > #define DEFSTR( x ) \
    > #x
    >
    > #define FNAME( path, file ) \
    > DEFSTR( path/file )
    >
    > #define BUILDCHAIN( file ) \
    > FNAME( CHAINPATH, file )
    >
    >
    > #include BUILDCHAIN( stdio.h )
    >
    > int main(int argc, char **argv){
    > printf("Hello world\n");
    > }
    >
    > For some reason, when I try to compile this with CodeWarrior, an extra
    > space is inserted between the path and the filename.

    [...]

    Macro definitions are defined in terms of sequences of (preprocessing)
    tokens. Your first definition:

    #define CHAINPATH /usr/include

    defines CHAINPATH as a sequence of 4 distinct tokens:

    / usr / include

    Normally I'd suggest something like this:

    #define CHAINPATH "/usr/include"

    and using string literal concatenation to build the final
    "/usr/include/stdio.h" string literal, but I don't think concatenation
    applies to header names (which look like string literals, but really
    aren't) -- and a quick experiment with gcc shows that it accepts this:
    #include "/usr/include/stdio.h"
    but not this:
    #include "/usr/include/" "stdio.h"

    You might have to resort to a custom-built preprocessor that you
    invoke to generate your source files during your build procedure.

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Dec 2, 2007
    #3
  4. Keith Thompson wrote:

    > Normally I'd suggest something like this:
    >
    > #define CHAINPATH "/usr/include"
    >
    > and using string literal concatenation to build the final
    > "/usr/include/stdio.h" string literal, but I don't think concatenation
    > applies to header names (which look like string literals, but really
    > aren't)


    That's correct. String literal concatenation occurs in translation
    phase 6, too late to affect #include directives, which are executed in
    phase 4. There is a footnote to that effect in section 6.10.2.

    --
    Thad
     
    Thad Smith, Dec 2, 2007
    #4
  5. On Dec 2, 8:32 pm, Harald van Dijk <> wrote:
    > On Sun, 02 Dec 2007 09:13:42 -0800, Mamluk Caliph wrote:
    > > For quite some time I've used code like the following to be able to
    > > "cut&paste" parts from different header files of the same name. It has
    > > worked with GCC, MS, Borland and Keil, to name a few.
    > >
    > > In principle the mechanism looks like this:
    > >
    > > #define CHAINPATH /usr/include
    > > #define DEFSTR( x ) \
    > > #x
    > >
    > > #define FNAME( path, file ) \
    > > DEFSTR( path/file )
    > >
    > > #define BUILDCHAIN( file ) \
    > > FNAME( CHAINPATH, file )
    > >
    > > #include BUILDCHAIN( stdio.h )
    > >
    > > int main(int argc, char **argv){
    > > printf("Hello world\n");
    > > }
    > >
    > > For some reason, when I try to compile this with CodeWarrior, an extra
    > > space is inserted between the path and the filename.
    > >
    > > The extra space results in a "file not found" error. I'm not sure this
    > > is a bug, and I would be grateful to know whether there is any right or
    > > wrong concerning token spacing. When it comes to the preprocessor I'm
    > > usually confused, so it might be me missing something really obvious.
    >
    > The rules for #include are very lenient, and the details of how other
    > tokens than "..." and <...> map to those two forms are left mostly to the
    > implementation.
    >
    > However, stringizing itself is well specified, and it's possible to test
    > whether that's the problem you're having by writing a program to test
    > only that:
    >
    > #define CHAINPATH /usr/include
    > #define DEFSTR( x ) \
    > #x
    >
    > #define FNAME( path, file ) \
    > DEFSTR( path/file )
    >
    > #define BUILDCHAIN( file ) \
    > FNAME( CHAINPATH, file )
    >
    > #include <stdio.h>
    >
    > int main(int argc, char **argv){
    > puts(BUILDCHAIN( stdio.h ));
    > }
    >
    > This is required to print "/usr/include/stdio.h", and if CodeWarrior
    > inserts a space in this case as well, I believe it has a bug. Whitespace
    > surrounding macro arguments or macro definitions is supposed to be
    > ignored, so the only relevant spacing would be between path and /, or
    > between / and file. There isn't any spacing in either case, so there
    > should not be any spacing in the resulting string.
    >


    You're right! Stringizing is not the problem: the extra space does not
    appear in the modified program.

    > > Also, if somebody has another suggestion for handling multiple headers
    > > of the same name, that would be welcome too.
    >
    > Generally speaking, it would be good to simply completely avoid this. If
    > you absolutely need this, could you explain why? There are some possible
    > better ways of doing this, but they won't work in all cases, so depending
    > on why you need this, they may or may not apply.


    Yes, and I think I've experienced the reason "why" many times over.
    It's not something that I enjoy doing, and I consider it to break
    almost every rule I know regarding good programming practice.

    The reason I do this, however, is that I work on embedded
    applications for small to puny targets, and in the embedded world
    toolchains often come with crippled or incomplete standard libraries.
    Even so, they provide quite a lot of value, and rewriting them just to
    get them right is almost always too much work. So what I do is add
    what's missing (usually functions) and "merge" the original header
    files with the additional declarations.

    Sometimes I also need to redefine a function or macro because it's
    either wrong (the need for reentrancy sometimes forces me to
    reimplement certain functions with my own versions) or it doesn't fit
    the target for some other reason. The latter is *really* messy, and
    neither habit is something I would recommend. I just haven't figured
    out another way so far.
     
    Mamluk Caliph, Dec 7, 2007
    #5
  6. Mamluk Caliph wrote:
    > On Dec 2, 8:32 pm, Harald van Dijk <> wrote:
    >> On Sun, 02 Dec 2007 09:13:42 -0800, Mamluk Caliph wrote:
    >>> [...]
    >>> Also, if somebody has another suggestion for handling multiple headers
    >>> of the same name, that would be welcome too.
    >> Generally speaking, it would be good to simply completely avoid this. If
    >> you absolutely need this, could you explain why? There are some possible
    >> better ways of doing this, but they won't work in all cases, so depending
    >> on why you need this, they may or may not apply.
    >
    > Yes, and I think I've experienced the reason "why" many times over.
    > It's not something that I enjoy doing, and I consider it to break
    > almost every rule I know regarding good programming practice.
    >
    > The reason I do this, however, is that I work on embedded
    > applications for small to puny targets, and in the embedded world
    > toolchains often come with crippled or incomplete standard libraries.
    > Even so, they provide quite a lot of value, and rewriting them just to
    > get them right is almost always too much work. So what I do is add
    > what's missing (usually functions) and "merge" the original header
    > files with the additional declarations.


    Unless I've missed something, the obvious approach is to
    give your modified headers distinct names: "mcstdio.h" or
    something of the kind, and #include them via those names.
    The content of one of these might look something like

    #ifndef MCSTDIO_H
    #define MCSTDIO_H

    #ifdef TINYCHIP
    /* Vendor's <stdio.h> is almost perfect, but
    * I need to do something sneaky with stderr
    */
    #include <stdio.h>
    extern FILE * mc_get_stderr_substitute(void);
    #undef stderr
    #define stderr mc_get_stderr_substitute()
    #endif

    #ifdef MEGACHIP
    /* For once, a vendor's <stdio.h> is fine */
    #include <stdio.h>
    #endif

    #ifdef CHIPOFFTHEOLDBLOCK
    /* Vendor's <stdio.h> is completely hopeless;
    * implement my own substitute here
    */
    #endif

    #endif /* MCSTDIO_H */

    In other words, handle the variations between platforms
    within the headers themselves, instead of by #include'ing
    variants of the headers. If the prospect of merging many
    modified headers into a single file is daunting, use one
    more level of indirection:

    #ifndef MCSTDIO_H
    #define MCSTDIO_H

    #ifdef TINYCHIP
    #include "/tools/tinychip/tiny_stdio.h"
    #endif

    #ifdef MEGACHIP
    #include <stdio.h>
    #endif

    #ifdef CHIPOFFTHEOLDBLOCK
    #include "/tools/tinychip/oldblock_stdio.h"
    #endif

    #endif /* MCSTDIO_H */

    --
     
    Eric Sosman, Dec 7, 2007
    #6
  7. On Dec 7, 11:10 pm, Eric Sosman <> wrote:
    > Mamluk Caliph wrote:
    > > On Dec 2, 8:32 pm, Harald van Dijk <> wrote:
    > >> On Sun, 02 Dec 2007 09:13:42 -0800, Mamluk Caliph wrote:
    > >>> [...]
    > >>> Also, if somebody has another suggestion for handling multiple headers
    > >>> of the same name, that would be welcome too.
    > >> Generally speaking, it would be good to simply completely avoid this. If
    > >> you absolutely need this, could you explain why? There are some possible
    > >> better ways of doing this, but they won't work in all cases, so depending
    > >> on why you need this, they may or may not apply.
    >
    > > Yes, and I think I've experienced the reason "why" many times over.
    > > It's not something that I enjoy doing, and I consider it to break
    > > almost every rule I know regarding good programming practice.
    >
    > > The reason I do this, however, is that I work on embedded
    > > applications for small to puny targets, and in the embedded world
    > > toolchains often come with crippled or incomplete standard libraries.
    > > Even so, they provide quite a lot of value, and rewriting them just to
    > > get them right is almost always too much work. So what I do is add
    > > what's missing (usually functions) and "merge" the original header
    > > files with the additional declarations.
    >
    > Unless I've missed something, the obvious approach is to
    > give your modified headers distinct names: "mcstdio.h" or
    > something of the kind, and #include them via those names.
    > The content of one of these might look something like
    >
    > #ifndef MCSTDIO_H
    > #define MCSTDIO_H
    >
    > #ifdef TINYCHIP
    > /* Vendor's <stdio.h> is almost perfect, but
    > * I need to do something sneaky with stderr
    > */
    > #include <stdio.h>
    > extern FILE * mc_get_stderr_substitute(void);
    > #undef stderr
    > #define stderr mc_get_stderr_substitute()
    > #endif
    >
    > #ifdef MEGACHIP
    > /* For once, a vendor's <stdio.h> is fine */
    > #include <stdio.h>
    > #endif
    >
    > #ifdef CHIPOFFTHEOLDBLOCK
    > /* Vendor's <stdio.h> is completely hopeless;
    > * implement my own substitute here
    > */
    > #endif
    >
    > #endif /* MCSTDIO_H */
    >
    > In other words, handle the variations between platforms
    > within the headers themselves, instead of by #include'ing
    > variants of the headers. If the prospect of merging many
    > modified headers into a single file is daunting, use one
    > more level of indirection:
    >
    > #ifndef MCSTDIO_H
    > #define MCSTDIO_H
    >
    > #ifdef TINYCHIP
    > #include "/tools/tinychip/tiny_stdio.h"
    > #endif
    >
    > #ifdef MEGACHIP
    > #include <stdio.h>
    > #endif
    >
    > #ifdef CHIPOFFTHEOLDBLOCK
    > #include "/tools/tinychip/oldblock_stdio.h"
    > #endif
    >
    > #endif /* MCSTDIO_H */


    This is actually very close to how different targets and vendors are
    handled today, except for the names of the header files themselves.

    For smaller (as in fewer lines of code) applications this solution
    would work well. In cases where one integrates other people's code, or
    complete external projects for that matter, it would force a
    modification in each source file (small, but nevertheless). Depending
    on how many files are concerned and how they are managed, this could be
    more or less difficult.

    I would not mind having the header files named distinctly, but
    forcing other team members to use them would most likely be cumbersome
    and error-prone.

    So what I do is provide an alternative set of header files in a
    separate directory structure and then modify the build system (one
    common point of management) so that the modified ones are found
    before any vendor-provided ones. Actually, quite few files are needed,
    so I could do without the header filename mechanism as well and
    hardcode the include paths in the sources.

    However, it would be even better if the mechanism worked, since it
    makes it possible to handle different build-host installations more
    easily. The CHAINPATH macro is, btw, not hardcoded in reality but is
    provided externally by the build system.

    In essence, my issue can be summarized as follows: "How do I
    include something based on a macro?"

    For example, this works in most compilers I've tried:

    #define ANAME "stdio.h"
    #include ANAME

    Why I can't combine paths with filenames is beyond my understanding
    though...

    /Michael
     
    Mamluk Caliph, Dec 9, 2007
    #7

Similar Threads
  1. Cronus
    Replies:
    1
    Views:
    717
    Paul Mensonides
    Jul 15, 2004
  2. G Fernandes
    Replies:
    1
    Views:
    555
  3. Wessi
    Replies:
    3
    Views:
    911
    Lawrence Kirby
    Aug 11, 2005
  4. John Devereux

    converting a preprocessor token to a string constant

    John Devereux, Sep 26, 2005, in forum: C Programming
    Replies:
    3
    Views:
    432
    John Devereux
    Sep 26, 2005
  5. cashdeskmac

    This is an unexpected token. The expected token is 'NAME'

    cashdeskmac, Jul 13, 2007, in forum: ASP .Net
    Replies:
    2
    Views:
    825
    cashdeskmac
    Jul 13, 2007