Sub-sub-expression evaluation order

Discussion in 'C Programming' started by Frank Wallingford, Dec 9, 2004.

  1. Note: For those with instant reactions, this is NOT the common "why is i
    = i++ not defined?" question. Please read on.


    I came across an interesting question when talking with my colleagues.
    According to the standard, most operators evaluate their operands in an
    unspecified order. This means that in code like this:

    f() + g()

    the two functions may be called in either order, and still be standards-
    compliant.

    Fine. Now, when the evaluation of one sub-expression begins, is there
    anything in the standard that says that it must complete (ignoring side-
    effects) before the other sub-expression evaluation begins? In other
    words, can pieces of the sub-expressions be evaluated in an interleaving
    fashion?

    Take this example:

    f(i++) + g(i++)

    The difference from the first example is the sequence points for the
    function calls.

    Now, obviously since f() and g() may be called in any order, this code
    is non-deterministic. My question is: can the compiler evaluate the sub-
    expressions in the following order, interleaving the sub-sub-expression
    evaluations?

    t1 = i;
    t2 = i;
    i++;
    i++;
    f(t1);
    g(t2);

    Note that all side-effects were finished before the function-call
    sequence points, so this doesn't violate that.

    If there is nothing in the standard to prevent this, then not only is
    the code "f(i++) + g(i++)" non-deterministic, but it remains as
    undefined as "i++ + i++".


    Here is an example where I think the above problem exists, although it's
    not obvious (please ignore the obvious problems and poor design; this is
    pseudo-code to illustrate a point):


    const char *a(void)
    {
    static char *x;
    if (x) free(x);
    x = (char*)malloc(2);
    strcpy(x, "a");
    return x;
    }

    printf("%s %s\n", a(), a());


    The obvious problem here is that the second call to a() (in either
    order) will free the pointer returned by the first call.

    Here is the revised version, but I believe that the following is still
    undefined:

    const char *a(void) { ... same as above ... }

    static list *strings; // implementation not important

    const char *cache(const char *in)
    {
    char *out;
    out = strdup(in);
    add_to_list(out);
    return out;
    }

    printf("%s %s\n", cache(a()), cache(a()));


    The cache() function only exists so that calls like 'printf' can be made
    - that is, they copy the argument off and return the copy (and save it
    in a list to be freed later). This way, the second call to a() doesn't
    affect anything by freeing the old value.

    But, similar to the above, I believe that an implementation is free to
    evaluate the printf call in this order:

    char *t1 = a();
    char *t2 = a();
    char *t3 = cache(t1);
    char *t4 = cache(t2);
    printf("%s %s\n", t3, t4);


    Is this right? Is it undefined, according to the standard?


    -Frank

    --
     
    Frank Wallingford, Dec 9, 2004
    #1
    1. Advertising

  2. Frank Wallingford wrote:
    > ...
    > I came across an interesting question when talking with my colleagues.
    > According to the standard, most operators evaluate their operands in an
    > unspecified order. This means that in code like this:
    >
    > f() + g()
    >
    > the two functions may be called in either order, and still be standards-
    > compliant.


    That's correct.

    > Fine. Now, when the evaluation of one sub-expression begins, is there
    > anything in the standard that says that it must complete (ignoring side-
    > effects) before the other sub-expression evaluation begins? In other
    > words, can pieces of the sub-expressions be evaluated in an interleaving
    > fashion?


    Yes. The evaluation can be implemented in any way, as long as it
    produces the result that satisfies the requirements imposed by the
    language specification.

    > Take this example:
    >
    > f(i++) + g(i++)
    >
    > The difference from the first example is the sequence points for the
    > function calls.


    There are indeed sequence points at the beginning of each function's
    execution. However, the standard doesn't require the evaluation of
    argument subexpressions to be "tightly packed" with the execution of the
    corresponding function. In other words, in this particular case the
    expression might be interpreted as two consecutive 'i++' evaluations
    (not separated by any sequence point) followed by calls to 'f' and 'g'.
    This means that the modifications of 'i' are potentially violating the
    requirements of the standard and the code produces undefined behavior.

    > Now, obviously since f() and g() may be called in any order, this code
    > is non-deterministic.


    If by "non-deterministic" you mean that the code produces unspecified
    result, you are wrong. The code produces undefined behavior.

    > My question is: can the compiler evaluate the sub-
    > expressions in the following order, interleaving the sub-sub-expression
    > evaluations?
    >
    > t1 = i;
    > t2 = i;
    > i++;
    > i++;
    > f(t1);
    > g(t2);


    Yes. That's exactly what I said above. Note though, that there wouldn't
    really be any sequence points between two modifications of 'i', which
    leads to UB.

    > Note that all side-effects were finished before the function-call
    > sequence points, so this doesn't violate that.


    What exactly do you mean by "that" here?

    > If there is nothing in the standard to prevent this, then not only is
    > the code "f(i++) + g(i++)" non-deterministic, but it remains as
    > undefined as "i++ + i++".


    Exactly.

    > Here is an example where I think the above problem exists, although it's
    > not obvious (please ignore the obvious problems and poor design; this is
    > pseudo-code to illustrate a point):
    >
    > const char *a(void)
    > {
    > static char *x;
    > if (x) free(x);
    > x = (char*)malloc(2);
    > strcpy(x, "a");
    > return x;
    > }
    >
    > printf("%s %s\n", a(), a());
    >
    > The obvious problem here is that the second call to a() (in either
    > order) will free the pointer returned by the first call.


    That's correct. But I'd say that the nature of the problem is very
    different here.

    > Here is the revised version, but I believe that the following is still
    > undefined:
    >
    > const char *a(void) { ... same as above ... }
    >
    > static list *strings; // implementation not important
    >
    > const char *cache(const char *in)
    > {
    > char *out;
    > out = strdup(in);
    > add_to_list(out);
    > return out;
    > }
    >
    > printf("%s %s\n", cache(a()), cache(a()));
    >
    > The cache() function only exists so that calls like 'printf' can be made
    > - that is, they copy the argument off and return the copy (and save it
    > in a list to be freed later). This way, the second call to a() doesn't
    > affect anything by freeing the old value.
    >
    > But, similar to the above, I believe that an implementation is free to
    > evaluate the printf call in this order:
    >
    > char *t1 = a();
    > char *t2 = a();
    > char *t3 = cache(t1);
    > char *t4 = cache(t2);
    > printf("%s %s\n", t3, t4);
    >
    > Is this right?


    Yes, that's perfectly possible.

    >Is it undefined, according to the standard?


    I don't see any problems with the last portion of code. It doesn't have
    any violations present in the 'f(i++) + g(i++)' example. Why exactly do
    you suspect that it is undefined?

    --
    Best regards,
    Andrey Tarasevich
     
    Andrey Tarasevich, Dec 10, 2004
    #2
    1. Advertising

  3. Andrey Tarasevich wrote:
    >> Here is the revised version, but I believe that the following is still
    >> undefined:
    >>
    >> const char *a(void) { ... same as above ... }
    >>
    >> static list *strings; // implementation not important
    >>
    >> const char *cache(const char *in)
    >> {
    >> char *out;
    >> out = strdup(in);
    >> add_to_list(out);
    >> return out;
    >> }
    >>
    >> printf("%s %s\n", cache(a()), cache(a()));
    >>
    >> The cache() function only exists so that calls like 'printf' can be made
    >> - that is, they copy the argument off and return the copy (and save it
    >> in a list to be freed later). This way, the second call to a() doesn't
    >> affect anything by freeing the old value.
    >>
    >> But, similar to the above, I believe that an implementation is free to
    >> evaluate the printf call in this order:
    >>
    >> char *t1 = a();
    >> char *t2 = a();
    >> char *t3 = cache(t1);
    >> char *t4 = cache(t2);
    >> printf("%s %s\n", t3, t4);
    >>
    >> Is this right?

    >
    > Yes, that's perfectly possible.
    >
    >>Is it undefined, according to the standard?

    >
    > I don't see any problems with the last portion of code. It doesn't have
    > any violations present in the 'f(i++) + g(i++)' example. Why exactly do
    > you suspect that it is undefined?
    >


    Oops... I'm sorry. I misread the code. Of course, the potential
    evaluation order you suggested leads to the same problem as in the
    original version of the code. If you integrate the functionality of
    'cache()' into 'a()' then the problem will disappear (somehow I thought
    that this has already been done)

    --
    Best regards,
    Andrey Tarasevich
     
    Andrey Tarasevich, Dec 10, 2004
    #3
  4. Andrey Tarasevich <> wrote in
    news::

    > Andrey Tarasevich wrote:
    >>> Here is the revised version, but I believe that the following is
    >>> still undefined:
    >>>
    >>> const char *a(void) { ... same as above ... }
    >>>
    >>> static list *strings; // implementation not important
    >>>
    >>> const char *cache(const char *in)
    >>> {
    >>> char *out;
    >>> out = strdup(in);
    >>> add_to_list(out);
    >>> return out;
    >>> }
    >>>
    >>> printf("%s %s\n", cache(a()), cache(a()));
    >>>
    >>> The cache() function only exists so that calls like 'printf' can be
    >>> made - that is, they copy the argument off and return the copy (and
    >>> save it in a list to be freed later). This way, the second call to
    >>> a() doesn't affect anything by freeing the old value.
    >>>
    >>> But, similar to the above, I believe that an implementation is free
    >>> to evaluate the printf call in this order:
    >>>
    >>> char *t1 = a();
    >>> char *t2 = a();
    >>> char *t3 = cache(t1);
    >>> char *t4 = cache(t2);
    >>> printf("%s %s\n", t3, t4);
    >>>
    >>> Is this right?
    >>>Is it undefined, according to the standard?

    >>
    >> I don't see any problems with the last portion of code. It doesn't
    >> have any violations present in the 'f(i++) + g(i++)' example. Why
    >> exactly do you suspect that it is undefined?
    >>

    >
    > Oops... I'm sorry. I misread the code. Of course, the potential
    > evaluation order you suggested leads to the same problem as in the
    > original version of the code. If you integrate the functionality of
    > 'cache()' into 'a()' then the problem will disappear (somehow I
    > thought that this has already been done)


    Right - the last example is undefined because if the implementation re-
    orders the sub-expressions and calls 'a' twice before calling 'cache',
    then it ends up accessing memory that has been deallocated.

    Thanks for your answers.

    -Frank

    --
     
    Frank Wallingford, Dec 10, 2004
    #4
  5. Frank Wallingford

    Chris Torek Guest

    In article <Xns95BAB949C3D9fwallingfordhotmailc@69.28.186.158>
    Frank Wallingford <> wrote:
    >Note: For those with instant reactions, this is NOT the common "why is i
    >= i++ not defined?" question. Please read on.


    Indeed. (It is almost a comp.std.c question, it is so well-specified :) )

    [much snippage]
    > f(i++) + g(i++)
    >... then not only is
    >the code "f(i++) + g(i++)" non-deterministic, but it remains as
    >undefined as "i++ + i++".


    Yes. There are sequence points before the calls to f() and g(),
    after the evaluation of the various parameters, but the parameter
    evaluations can be "interleaved", giving this undefinedness (I
    believe -- you might check in comp.std.c to see if there is any
    sort of consensus on this one).

    [another example, mostly snipped]

    >But, similar to the above, I believe that an implementation is free to
    >evaluate the printf call in this order:
    >
    > char *t1 = a();
    > char *t2 = a();
    > char *t3 = cache(t1);
    > char *t4 = cache(t2);
    > printf("%s %s\n", t3, t4);
    >
    >Is this right? Is it undefined, according to the standard?


    I believe so, yes.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Dec 10, 2004
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jan Engelhardt
    Replies:
    3
    Views:
    382
    Mike Wahler
    Aug 20, 2003
  2. Christos TZOTZIOY Georgiou

    Introspection: expression evaluation order - precedence

    Christos TZOTZIOY Georgiou, Oct 24, 2003, in forum: Python
    Replies:
    0
    Views:
    315
    Christos TZOTZIOY Georgiou
    Oct 24, 2003
  3. Replies:
    26
    Views:
    679
    Mark McIntyre
    Jun 15, 2007
  4. Replies:
    2
    Views:
    295
    Andrew Koenig
    Jul 7, 2007
  5. Stasa Medjedovic

    Order of expression evaluation

    Stasa Medjedovic, May 15, 2008, in forum: C++
    Replies:
    6
    Views:
    464
    Kai-Uwe Bux
    May 16, 2008
Loading...

Share This Page