Declaration of main()

Discussion in 'C Programming' started by BartC, Mar 29, 2014.

  1. BartC

    BartC Guest

    Never thought I'd be asking about this, but it's giving me some trouble!

    I want to use a declaration that looks like this:

    typedef unsigned char* ichar;

    int main(int nparams,ichar (*params)[]) {
        int i;

        for (i=0; i<nparams; ++i)
            printf("%d: %s\n",i,(*params)[i]);

    }

    (Why? Because this will be the output of a code generator.)

    This works perfectly well with four different C compilers. But Clang doesn't
    like it: it insists the params type must be char** (and it can't be signed
    nor unsigned either, just unspecified, however the latter only gives a
    warning; as I have it above, it is an error).

    Is there in fact something wrong with the way I'm doing it?

    One way to get around it, seems to be to move the main function, which
    appears to be special to Clang, outside of the non-C source language.
    Another is to make a special case when compiling a function called 'main',
    and bodge the output that way. But I don't particularly want to do this, and
    it's just pandering to this very fussy compiler.
     
    BartC, Mar 29, 2014
    #1

  2. Eric Sosman

    Eric Sosman Guest

    This is equivalent to

    int main(int nparams, ichar *(*params))

    ... which is equivalent to

    int main(int nparams, ichar **params)

    ... which is equivalent to

    int main(int nparams, unsigned char* **params)

    ... which is equivalent to

    int main(int nparams, unsigned char ***params)

    ... which is in no way equivalent to or even remotely like

    int main(int argc, char **argv)

    Left as an exercise.
     
    Eric Sosman, Mar 29, 2014
    #2

  3. Kaz Kylheku

    Kaz Kylheku Guest

    argv is a pointer to char*

    what you have here is a pointer to (incomplete) array of char*

    This is a needless complication.

    Generate an additional local and a cast to initialize it:

    int main(int nparams, char **params_in) {
        ichar (*params)[] = (ichar (*)[]) params_in;

    Really, does your code generator even have to take over
    main?

    You can have

    int my_main(/*whatever you want*/)
    {
    }


    int main(int argc, char **argv)
    {
    /* generate call to my_main */
    }

    This main can be part of a fixed module: your own "BartCRT.o"
    that is linked in. :)
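
    A minimal sketch of the wrapper idea above, leaving out the ichar
    typedef for simplicity. The names my_main and run_generated are
    hypothetical, and the cast assumes that a pointer to an array addresses
    its first element, which holds on mainstream platforms. A real program's
    main(int argc, char **argv) would simply return run_generated(argc, argv).

```c
#include <stdio.h>

/* Stand-in for the generated entry point (the name is illustrative).
   Prints each argument and returns how many it saw. */
static int my_main(int nparams, char *(*params)[]) {
    int i;
    for (i = 0; i < nparams; ++i)
        printf("%d: %s\n", i, (*params)[i]);
    return nparams;
}

/* Hypothetical shim: the single place where char ** is converted to the
   pointer-to-array type the generated code wants. */
int run_generated(int argc, char **argv) {
    return my_main(argc, (char *(*)[])argv);
}
```

    Keeping the cast in one fixed module means the code generator never has
    to emit a definition of main itself.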
     
    Kaz Kylheku, Mar 29, 2014
    #3
  4. BartC

    BartC Guest

    Forgetting the ichar part for a minute, CDECL tells me that:

    char *(*params)[]

    is 'pointer to array of pointer to char'. Which likely corresponds to the
    actual structure of the argv data (and exactly matches what I expressed in
    the source language). The difference from char** is that one of the pointers
    points to the whole array, instead of the first element, which is taken care
    of with the special indexing used.

    And type-wise, no compiler has complained, not even Clang; it just doesn't
    like it for main().
    Apparently the only thing wrong with it is that it is not char**.
     
    BartC, Mar 29, 2014
    #4
  5. BartC

    BartC Guest

    I thought that was exactly what argv was.
    Yes, but I'd have to write it in a more C-like language. I was trying to
    minimise the need for that (for cases where there are complex C declarations
    I don't know about and can't really replicate, such as FILE); I didn't
    expect to need it for main()!
     
    BartC, Mar 29, 2014
    #5
  6. Eric Sosman

    Eric Sosman Guest

    No, that is *not* what you expressed in the source language.
    Have you forgotten that C functions do not and cannot take array
    parameters? 6.7.6.3p7:

    "A declaration of a parameter as ‘‘array of type’’ shall
    be adjusted to ‘‘qualified pointer to type’’, [...]"

    I stand by my claim that your code asks for three levels of
    indirection, not two. (And I'd be interested to hear what
    cdecl makes of the *entire* function declaration, not just
    a snippet of text out of the middle. Context Matters.)
    Yes, that's what's wrong with it. It's not `char**', it's
    `unsigned char***' with three asterisks, not two, and that's
    wrong. R-O-N-G, wrong.
     
    Eric Sosman, Mar 29, 2014
    #6
  7. BartC

    BartC Guest

    (I meant the source language I'm translating to C.

    In that language, and again disregarding the ichar subtype, the type of
    'params' is 'ref [] ref char'; an Algol-68-style left-to-right type-spec
    which means 'pointer to array of pointer to char', exactly what CDECL told
    me 'char *(*params)[]' meant. Two levels of pointer.)
    I would guess that applies to the top-level parameter type. I'm fairly
    certain you can pass pointer-to-array types in C (and I've done that quite a
    lot).
    That extra [] threw me too, but I don't think it counts in this context.

    The online cdecl I used is down at the minute. However, I understand that
    top-level arrays as parameters are treated differently to those elsewhere.
    But this wasn't a top-level one.
     
    BartC, Mar 29, 2014
    #7
  8. Eric Sosman

    Eric Sosman Guest

    Fine, but you're translating it to erroneous C.
    Okay, let's just try a little experiment. Put the
    following into a source file:

    #include <stdio.h>

    /* BartC's main, renamed to protect the innocent */
    int bartc(int nparams, char *(*params)[]) {
        printf("Hello from bartc: %d params at %p\n",
               nparams, (void*) params);
        return 0;
    }

    /* A "statutory" main */
    int main(int argc, char **argv) {
        puts("Hello from main!");
        /* Call the function BartC thinks is compatible with
         * a statutory main, passing the actual main's own
         * arguments. If BartC is right and his function is
         * truly equivalent to a real main, this will work.
         */
        return bartc(argc, argv);
    }

    Feed it to your favorite C compilers (using C11, C99, C90, C89,
    or even K&R C, with or without TC's) and see what they have to
    say about it.
     
    Eric Sosman, Mar 29, 2014
    #8
  9. BartC

    BartC Guest

    I don't even need to try and compile that (although I did). It's obvious
    that the two access argv in different ways (one a pointer to pointer, the
    other pointer to array of pointers). Both will work.

    The question is, is argv in main() *only* defined as pointer to pointer?
    Clang obviously thinks so.

    Try this experiment of mine:

    #include <stdio.h>

    void print_cstyle(int n, char** array){
        int i;
        for (i=0; i<n; ++i)
            printf("C: %d %s\n",i,array[i]);
    }

    void print_bstyle(int n, char*(*array)[]){
        int i;
        for (i=0; i<n; ++i)
            printf("B: %d %s\n",i,(*array)[i]);
    }

    /* A "statutory" main */
    int main(void) {
        char *s[] = {"one","two","three"};
        int n=sizeof(s)/sizeof(s[0]);

        print_cstyle(n,s);
        puts("");
        print_bstyle(n,&s);

    }

    The array 's' is a little like the argv data.

    I'm printing this using two functions: print_cstyle() which accesses it like
    you say main() does. And print_bstyle(), which uses my pointer-to-array
    style.

    Both work. There are no casts involved. But is print_bstyle() valid, defined
    C? If so, then I should be able to use pointer-to-array types for argv in
    main.
     
    BartC, Mar 29, 2014
    #9
  10. Eric Sosman

    Eric Sosman Guest



    Missing the final NULL, but okay.
    I think it's valid, yes. But note that sneaky little `&'
    in the call: You're not passing it what main() receives, you're
    not passing it what the environment will pass to main(). If
    you declare main() the same way you've declared print_bstyle(),
    you have declared main() incorrectly.[*]

    print_bstyle() is (on cursory inspection) a valid C function,
    but plenty of valid C functions are not main()-like.

    [*] Obligatory nitpick: An implementation may support other
    forms of main() besides the two required by the Standard, like
    `double main(const struct args_s *, unsigned long)', and perhaps
    somewhere you'll find an implementation that supports your style
    of main(). I've not seen one in not quite four decades of using C,
    but maybe you'll get lucky.
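
    To make the point about the `&' concrete, here is a small sketch. The
    function names are made up for illustration. Both access styles reach the
    same strings and the same address, but the caller has to pass different
    expressions (`s' versus `&s') because the parameter types differ.

```c
#include <string.h>

/* C style: pointer to the first element. */
const char *get_cstyle(char **array, int i) {
    return array[i];
}

/* Pointer-to-array style: dereference first, then index. */
const char *get_bstyle(char *(*array)[], int i) {
    return (*array)[i];
}

/* Returns 1 if both styles see the same strings at the same address.
   `s` decays to char **, while `&s` has type char *(*)[3]. */
int styles_agree(void) {
    char *s[] = {"one", "two", "three"};
    int n = (int)(sizeof s / sizeof s[0]);
    int i;
    for (i = 0; i < n; ++i)
        if (strcmp(get_cstyle(s, i), get_bstyle(&s, i)) != 0)
            return 0;
    /* Same address, different pointer types. */
    return (void *)s == (void *)&s;
}
```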
     
    Eric Sosman, Mar 29, 2014
    #10


  11. Yes, very much so.

    If you haven't already done so, download

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    which is the latest freely available draft of the ISO C standard. Go to
    section 5.1.2.2.1 (which applies to hosted implementations).

    It specifies that a program's entry point is named "main", and that it
    may be defined as:

    int main(void) { /* ... */ }

    or as

    int main(int argc, char *argv[]) { /* ... */ }

    or equivalent; or in some other implementation-defined manner.

    The "or equivalent" permits you to use different names for "argc"
    and "argv" (but *not* for "main"), or to use equivalent typedefs,
    or to write "char **argv" rather than "char *argv[]" (these are
    equivalent only as parameter declarations, not in other contexts).
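
    As a rough illustration of "or equivalent", the sketch below uses
    ordinary functions rather than main itself. The names count_t, cstring,
    entry_a and entry_b are invented for the example. Both functions have
    the same type, int(int, char **), so either fits the same function
    pointer without a cast.

```c
#include <stddef.h>

typedef int count_t;      /* illustrative typedef for int */
typedef char *cstring;    /* illustrative typedef for char * */

/* Parameters spelled with typedefs and array syntax. */
int entry_a(count_t nparams, cstring params[]) {
    int n = 0;
    while (n < nparams && params[n] != NULL)
        ++n;
    return n;
}

/* Parameters spelled as plain int and pointer to pointer. */
int entry_b(int argc, char **argv) {
    return entry_a(argc, argv);
}

/* Both initializers are accepted because the function types are identical. */
int (*const entry_points[2])(int, char **) = { entry_a, entry_b };
```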

    The "or in some other implementation-defined manner" means that
    a compiler *may* permit other definitions of main, as long as it
    documents them, but need not do so. For example, I believe that
    Microsoft's C compiler explicitly permits "void main(void)".

    Furthermore, since these requirements are not stated as
    "constraints", a compiler is not required to diagnose violations
    of them. Even if a compiler does not document that it will accept a
    different form, it may still do so, with or without a diagnostic.
    And if it does, the program's behavior is not defined by the
    standard.

    All this is entirely consistent with the behavior you're seeing:
    some compilers quietly accept different forms, others do not.

    Your definition:
    int main(int nparams,ichar (*params)[]) { /* ... */ }
    (where ichar is a typedef for unsigned char*) is equivalent to:
    int main(int nparams,unsigned char * (*params)[]) { /* ... */ }

    "params" is of type "pointer to array of pointer to unsigned char".

    The fact that this is the output of a code generator doesn't seem
    to be relevant. If you happen to have a compiler that accepts your
    definition, the runtime environment that invokes the program will
    still almost certainly assume that main accepts either no arguments,
    or two arguments of types int and char**. By treating the char**
    argument as if it were of a completely different type, it's unlikely
    that your program will do what you want it to do.

    You need to find a different way to do whatever it is you're trying to
    do.
    The main function is special to *any* C compiler. The compiler isn't
    being "very fussy", it's simply following the C language definition.
    You should do the same.
     
    Keith Thompson, Mar 29, 2014
    #11
  12. You were mistaken. argv is of type char**. The particular value
    passed into main happens to point to the first element of an array
    of char*; that's a statement about the *value* of argv, not its type.

    [...]
    The C language defines what the parameters of main can be, with some
    limited flexibility.

    What's unusual about main (vs. other functions) isn't that its
    definition is restricted to just a few forms, it's that its definition
    is more *flexible* than other functions. It is the interface between
    your program and the calling environment. If it were defined like other
    functions, the environment would define a prototype for it, and you as a
    programmer would have to write a definition that conforms to that
    prototype. For historical reasons, there is no actual prototype, and
    you are permitted a bit more flexibility.

    You can write a function that takes parameters
    (int nparams,ichar (*params)[]); you just can't call it "main".
     
    Keith Thompson, Mar 29, 2014
    #12
  13. BartC

    BartC Guest

    OK, I will have to fix it then. (I have to keep in Clang's good books
    because the other non-gcc compilers, although they don't care about how I
    define main(), complain about a lot of other things!)
    That is a valid point. Although with a pointer to array like that, stepping
    to the next array is not always meaningful.
     
    BartC, Mar 29, 2014
    #13
  14. BartC

    BartC Guest

    I'm talking nonsense!

    Obviously this can also be written in my source language, and I can simply
    use pointer-to-pointer for params like C expects ('ref ref char params' in
    my parlance).

    In fact the simple solution for my program as it is, was to switch to
    pointer-to-pointer, and change the handful of accesses to params from
    indexing to pointer offsets (this source language doesn't mix pointers and
    arrays like C does). But in general I will use the wrapper.
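
    For reference, a sketch of what access by pointer offset can look like
    in the generated C. The function names are hypothetical. The second
    function relies on the fact that argv[argc] is a null pointer for the
    real main.

```c
#include <string.h>

/* Sums the lengths of nparams strings using pointer offsets only. */
size_t total_length(int nparams, char **params) {
    size_t total = 0;
    int i;
    for (i = 0; i < nparams; ++i)
        total += strlen(*(params + i));
    return total;
}

/* Same walk without a count, stopping at the terminating null pointer. */
int count_until_null(char **params) {
    int n = 0;
    while (*params++ != NULL)
        ++n;
    return n;
}
```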
     
    BartC, Mar 29, 2014
    #14
  15. That's not what "char *(*params)[]" means. It declares "params" as a
    pointer to an array, which is perfectly valid as a parameter type (it's
    a pointer type, not an array type).

    The fact that the array type is incomplete is a bit troubling, but not
    invalid as far as I can tell.
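
    A quick way to check this with a C11 compiler is _Generic, which selects
    on the type of an expression. This is only a sketch with invented
    function names: the first parameter is written as an array and is
    adjusted to char **, the second is already a pointer (to an array) and
    is left alone.

```c
/* Requires C11 for _Generic. */

/* Top-level array parameter: adjusted to char **, so this yields 1. */
int array_param_is_ptr_to_ptr(char *params[]) {
    return _Generic(params, char **: 1, default: 0);
}

/* Pointer to array: not adjusted and not char **, so this yields 0. */
int ptr_to_array_is_ptr_to_ptr(char *(*params)[]) {
    return _Generic(params, char **: 1, default: 0);
}
```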

    [...]
     
    Keith Thompson, Mar 29, 2014
    #15
  16. Yes. Clang is quite correct.

    If you define main with a second parameter that's of a type incompatible
    with char**, how are you going to persuade the runtime environment to
    invoke it with data of the type it expects?
     
    Keith Thompson, Mar 29, 2014
    #16
  17. Kaz Kylheku

    Kaz Kylheku Guest

    You don't have to; it will work fine. Unless the machine is completely bizarre,
    a pointer to an array has a representation which is interchangeable with a
    pointer to the first element of an array of the same type.

    A formal parameter of type "pointer to incomplete array of T" should have
    no trouble accepting a "pointer to T" actual argument.

    If you have a machine where a T ** and a T *(*)[] have a different,
    incompatible representation and are passed differently as function
    parameters, I'd like to hear about it.

    Type mismatches can also confound due to invalid aliasing, but in practice
    that isn't a problem across module boundaries in this type of situation.

    And anyway, there is no aliasing issue between an array and an array element.
    Arrays cannot be assigned as a unit in C, and if they could be, then any
    modification to an object of type T as an array element involved in
    an array-level operation would have to be suspected as modifying some T that is
    pointed-at by a T * pointer and vice versa.

    So, basically this only fails because it's "artificially" rejected by a type
    check, which has nothing to do with the run-time environment not being able to
    handle it if the check is removed and code is generated anyway.
     
    Kaz Kylheku, Mar 30, 2014
    #17
  18. Kaz Kylheku

    Kaz Kylheku Guest

    That it is incomplete is actually an important part of this hack, since it
    allows params[] to represent "any" number of arguments without running into the
    additional undefined behavior of overrunning the dimension of an array.
     
    Kaz Kylheku, Mar 30, 2014
    #18
  19. Kaz Kylheku

    Kaz Kylheku Guest

    Now suppose that the bounds information for different pointer types
    is binary compatible, and normalized to bytes. The argv pointer, generated as a
    char ** would carry a bounds field denoting the extent, in bytes, of the
    argv vector (including null terminating entry). If this pointer is
    reinterpreted as a pointer to an incomplete array, then the bounds info nicely
    becomes reinterpreted as the actual size of that array.

    A bounds checking system based on units of element size rather than bytes
    would have a problem dealing with pointers to incomplete types, where
    the element size is not known.

    On the other hand, a bounds checking system could have complicated, rich
    meta-data which describes details about an object, such as all of the
    dimensions of a de-facto multi-dimensional array (and has ways of dealing with
    incomplete objects). That meta-data might depend on the structure of the
    static type declaration for its correct interpretation.
     
    Kaz Kylheku, Mar 30, 2014
    #19
  20. BartC

    BartC Guest

    This is exactly it.

    What makes it more bizarre, is that the actual argv data almost certainly
    exists as an array of char* pointers anyway, otherwise how would it be
    possible to step or index the argv value?

    Yet the language spec makes it impossible (when using Clang at least) to
    actually define it as an array! Which is a method completely acceptable in
    any other context.
     
    BartC, Mar 30, 2014
    #20