Restricted pointer parameters in printf()

Peter Ammon

It's my understanding that the printf() function is declared as

int printf(const char * restrict format, ...);

in stdio.h. And loosely speaking, if a parameter is declared as
restricted, then accesses to the object must go through that parameter.

Does this mean that

printf("%s", "%s");

is illegal if the string literals are coalesced into the same pointer?
printf() would be accessing the same object through different
parameters, one of which is declared as restricted.

Another possibility is

printf("%s is tricky", "is tricky");

since the string literal objects can again overlap even though their
pointers have different values. Is that code potentially illegal?

Thanks for your thoughts,
-Peter
 
Jack Klein

With the exception of the destination strings for (v)s(n)printf(),
members of the *printf() family never modify memory passed in via a
pointer to char.

The restrict keyword is completely unnecessary and has no effect on
objects that are not modified. Its purpose is to allow the compiler
to perform optimizations based on the fact that modifying one object
is guaranteed not to change the value of some other object. There is
no possibility of that happening when the objects are not modified.
 
Peter Ammon

Jack said:
> On Sat, 10 Apr 2004 04:46:37 GMT, Peter Ammon
>
> With the exception of the destination strings for (v)s(n)printf(),
> members of the *printf() family never modify memory passed in via a
> pointer to char.
>
> The restrict keyword is completely unnecessary and has no effect on
> objects that are not modified.

Why's that?
> Its purpose is to allow the compiler
> to perform optimizations based on the fact that modifying one object
> is guaranteed not to change the value of some other object. There is
> no possibility of that happening when the objects are not modified.

Correct me if I'm wrong, but I thought that the restrict keyword was
defined in terms of "accesses," which would include reads as well as
writes?
 
Jack Klein

> Why's that?
>
> Correct me if I'm wrong, but I thought that the restrict keyword was
> defined in terms of "accesses," which would include reads as well as
> writes?

The restrict keyword has its roots in the noalias keyword, which was
proposed, vehemently disputed, and finally left out of the original
C standard (ANSI C89/ISO C90).

A weakness of C compared to other languages is the possibility of
aliasing. Let's take a trivial program like this:

    #include <stdio.h>

    int arr1 [5] = { 2, 4, 6, 8, 10 };
    int arr2 [5] = { 2, 4, 6, 8, 10 };

    void some_func(int *a, int *b, int count)
    {
        while (count--)
        {
            if (*a == *b)
            {
                *a = *b + 2;
            }
            ++a;
        }
    }

    int main(void)
    {
        int b = 2;

        some_func(arr1, &b, 5);
        some_func(arr2, arr2, 5);
        printf("arr1 = { %d, %d, %d, %d, %d }\n",
               arr1[0], arr1[1], arr1[2], arr1[3], arr1[4]);
        printf("arr2 = { %d, %d, %d, %d, %d }\n",
               arr2[0], arr2[1], arr2[2], arr2[3], arr2[4]);
        return 0;
    }

A simple optimization in some_func() would be for the compiler to read
*b once, and cache it in a local variable or register, instead of
reading it each time. But what if b is the address of one of the ints
in the range a[0] to a[count - 1]? Let's run it and look at the
output:

arr1 = { 4, 4, 6, 8, 10 }
arr2 = { 4, 6, 6, 8, 10 }

Both calls to some_func() were made with a count of 5 and an input
array of 5 ints containing 2, 4, 6, 8, and 10. And finally, both
calls were made with a pointer to an int having the value 2.

If the compiler performed the optimization of reading *b only once and
caching the value, both arrays would have had identical values after
being processed by some_func(), which would have been wrong. When
some_func() is called the second time, b points to arr2[0], so the
first pass of the loop causes *b to be modified. This causes *b to
match *a on the second pass of the loop, and the code correctly
modifies both arr2[0] and arr2[1].

The price of this flexibility in C is the fact that the compiler
cannot perform the optimization of caching the value of *b, since
modifying an object through pointer a could modify what b points to.

Declaring b as a pointer to const int would not by itself permit
that optimization, because the int could still be modified through a.
If b were restrict qualified, on the other hand, the compiler could
perform it. If it did, the results of the second function call would
be wrong, but that would be the programmer's fault, for invoking
undefined behavior.

The description of the restrict keyword in the standard is rather
complex, but it still boils down to informing the compiler that the
value of an object will not be modified unexpectedly via a different
lvalue. This allows, but does not require, the compiler to perform
certain optimizations when the programmer specifies, via the restrict
keyword, that an object will not be modified through another pointer.

One of the examples from the standard might be helpful in
understanding the intent:

====
10 EXAMPLE 3 The function parameter declarations

    void h(int n, int * restrict p, int * restrict q, int * restrict r)
    {
        int i;
        for (i = 0; i < n; i++)
            p[i] = q[i] + r[i];
    }

illustrate how an unmodified object can be aliased through two
restricted pointers. In particular, if a and b are disjoint arrays, a
call of the form h(100, a, b, b) has defined behavior, because array b
is not modified within function h.
====

So even though the term "access" is used several times in the
definition of the restrict qualifier, it has no meaning for values
that are not modified at all within the scope of the qualifier.
 
Chris Torek

> It's my understanding that the printf() function is declared as
>
> int printf(const char * restrict format, ...);
>
> in stdio.h. And loosely speaking, if a parameter is declared as
> restricted, then accesses to the object must go through that parameter.

Yes, but this is "too loosely". :)

> Does this mean that
> printf("%s", "%s");
> is illegal if the string literals are coalesced into the same pointer?
> ... Another possibility is
> printf("%s is tricky", "is tricky");

Both of these calls are OK, as I think Jack Klein noted elsethread.

If you want a simpler, yet still loose but not (I hope) *too* loose,
way to describe "restrict", think of it as you saying to the
compiler:

OK, here is this pointer "p", that has the "restrict" on it. I
promise you, Mr Compiler, that I will not change the object at
p[i] without actually *writing* p[i], so that you can make
optimization assumptions.

In other words, the compiler is allowed to assume that any change
to *(p + i) for any integer i happens by writing to *(p + j) for
some integer j such that j==i. (The draft has additional wording
that lets the programmer use expressions like ((p+2)[i-2]) as well,
for instance, but this captures the essence.)

If you never write to format[i] at all -- as is the case here,
since format points to const-qualified char -- the compiler can assume
that format[i] never changes while printf() is doing its thing internally.

Optimization of variable accesses is mostly a matter of clever
caching, including "reading ahead" (so that inputs are available
by the time they are needed) and "writing behind" (so that outputs
do not cause the program to stop and wait for the output to finish).
If the only kind of caching you ever do is read-related, everything
"just works" -- your caches could even wind up with multiple copies
of a single variable, but they all have the same value anyway, so
who cares? :)

Unfortunately, there are some subtler interactions when one includes
write steps. Consider the (not very good) routine:

    #define OPERATE ... /* some operation here */
    void operate(double *result, double *inputs, size_t n) {

        *result = 0.0;
        for (size_t i = 0; i < n; i++)
            *result = *result OPERATE inputs[i];
    }

followed by a call of the form:

double buf[N]; /* N >= 11 */
operate(buf + 10, buf, 11);

This tries to put the result in buf[10] while using buf[10] as one
of the inputs.

With no "restrict" qualifiers, this code is OK -- when i reaches 10
inside operate(), buf[10] is the accumulated result-so-far, and
it is fed back into the operation.

If you were to write "double *restrict result", the differences
between various kinds of caching begin to show up: if the inputs[]
array is read too soon, or the output *result is not written
soon enough, inputs[10] will not hold the (presumably) desired value
at the right time.

Since I do not have the final C99 standard, I am not willing to
say what it might require in these cases. I *am* sure, however,
that for read-only "restrict"ed pointers (as in the printf()
example), it *is* OK to have aliases going on (despite the wording
in the draft I use, which seems to forbid it).
 
Peter Ammon

Thanks to both Jack Klein and Chris Torek for clarifying the issues. I
also lack the official C standard and was confused by the draft. Time
to "upgrade" I suppose. :)
 
