passing uninitialised data

Mark

Hello,

I have a rather generic question. Suppose we work with fwrite(), or sendto()
(not a C standard function, but I'll assume everyone knows it) or any other
API that takes a pointer to a buffer to transmit. When such a buffer is
uninitialised, does that in some way invoke undefined behaviour?

I'm asking because when such code is run under 'valgrind' it reports a
lot about uninitialised memory or 'unaddressable values in syscalls'. Does
that imply the code will behave in an undefined way because uninitialised
storage is used as a buffer?

Would appreciate the comments on it. Thanks.

Mark
 
Eric Sosman

Hello,

I have a rather generic question. Suppose we work with fwrite(), or sendto()
(not a C standard function, but I'll assume everyone knows it) or any other
API that takes a pointer to a buffer to transmit. When such a buffer is
uninitialised, does that in some way invoke undefined behaviour?

Evaluating the pointer as an argument expression and passing
it to a function as a parameter value is well-defined. It's what
happens thereafter that could be problematic.

If the called function uses its perfectly good pointer to read
uninitialized data, the data's value is indeterminate and the
behavior might be undefined. I say "might" because it depends on
the type of pointer the called function uses: an `unsigned char*',
for example, can be used to read any value, even indeterminate.
(Note that the pointer used for the access might be of a different
type than what was passed; the function may have converted the
pointer value before using it.)

On the other hand, passing a pointer to uninitialized data
sometimes makes perfectly good sense:

char uninitialized[42];
strcpy(uninitialized, "Hello, world!");
puts(uninitialized);

The fact that uninitialized[] is uninitialized before the strcpy()
call is unimportant because strcpy() doesn't care what was in the
buffer beforehand: it'll be obliterated anyhow. The puts() call
is okay even though uninitialized[14] through [41] are *still*
uninitialized, because puts() will stop reading before it gets
to the problematic part.

I'm asking because when such code is run under 'valgrind' it reports a
lot about uninitialised memory or 'unaddressable values in syscalls'. Does
that imply the code will behave in an undefined way because uninitialised
storage is used as a buffer?

"A lot about uninitialized memory" is a rather vague report,
not much to go on. "Unaddressable values" sounds worse, but I'm
not sure what circumstance valgrind is actually complaining about.
Consult the valgrind documentation for the meaning of its messages.
 
James Kuyper

Hello,

I have a rather generic question. Suppose we work with fwrite(), or sendto()
(not a C standard function, but I'll assume everyone knows it) or any other
API that takes a pointer to a buffer to transmit. When such a buffer is
uninitialised, does that in some way invoke undefined behaviour?

I'm asking because when such code is run under 'valgrind' it reports a
lot about uninitialised memory or 'unaddressable values in syscalls'. Does
that imply the code will behave in an undefined way because uninitialised
storage is used as a buffer?

Would appreciate the comments on it. Thanks.

"If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate." (6.7.9p10)

An indeterminate value is defined as "either an unspecified value or a
trap representation" (3.19.2)

"Certain object representations need not represent a value of the object
type. If the stored value of an object has such a representation and is
read by an lvalue expression that does not have character type, the
behavior is undefined. ... Such a representation is called a trap
representation." (6.2.6.1p5)

So far, this looks bad. However, let's go a little further:
The description of fwrite() says "... For each object, size calls are
made to the fputc function, taking the values (in order) from an array
of unsigned char exactly overlaying the object. ..." (7.21.8.2p2). In
other words, it accesses the data using only lvalues of type unsigned
char, a character type. Therefore, the undefined behavior mentioned in
6.2.6.1p5 doesn't apply.

While sendto() is not a C standard library function, if it's written in
C, the fact that you mention it in this context suggests that it may be
the kind of function that would also access the data buffer one char
(signed, plain or unsigned) at a time; if so, it's safe too.

If sendto() is not written in C, the behavior is undefined by the C
standard, regardless of whether or not the buffer is uninitialized.
Programs that call functions which are not part of the C standard
library, and are not written in C, are outside the scope of the C
standard, and therefore have undefined behavior due to "the
omission of any explicit definition of behavior." (4p2)

Still, the buffer contains unspecified values. What point is there in
writing it? I've heard of such tricks being used to access whatever data
was last stored in the same block of memory, thereby circumventing
certain weak kinds of security measures - but that's why many systems
now automatically fill uninitialized memory with a fixed bit pattern
such as 0 or 0XDEADBEEF.
 
Barry Schwarz

"If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate." (6.7.9p10)

An indeterminate value is defined as "either an unspecified value or a
trap representation" (3.19.2)

"Certain object representations need not represent a value of the object
type. If the stored value of an object has such a representation and is
read by an lvalue expression that does not have character type, the
behavior is undefined. ... Such a representation is called a trap
representation." (6.2.6.1p5)

So far, this looks bad. However, let's go a little further:
The description of fwrite() says "... For each object, size calls are
made to the fputc function, taking the values (in order) from an array
of unsigned char exactly overlaying the object. ..." (7.21.8.2p2). In
other words, it accesses the data using only lvalues of type unsigned
char, a character type. Therefore, the undefined behavior mentioned in
6.2.6.1p5 doesn't apply.

While an unsigned char cannot contain a trap value, I believe J.2,
even though it is not normative, gives the clearest statement of the
intent:

"The behavior is undefined in the following circumstances: ... The
value of an object with automatic storage duration is used while it is
indeterminate."
 
James Kuyper

While an unsigned char cannot contain a trap value, I believe J.2,
even though it is not normative, gives the clearest statement of the
intent:

"The behavior is undefined in the following circumstances: ... The
value of an object with automatic storage duration is used while it is
indeterminate."

You might be right about the intent, but if so, what significance should
we attach to the fact that 6.2.6.1p5 goes out of its way to say "that
does not have character type"? If it doesn't mean that accesses through
lvalues of character type are safe, what does it mean? If such accesses
were not intended to be safe, why bother writing that phrase at all? The
passage would be easier to read, write, and understand without that
phrase, so I doubt that the phrase's presence was a mere accident.
 
Angel

Hello,

I have a rather generic question. Suppose we work with fwrite(), or sendto()
(not a C standard function, but I'll assume everyone knows it) or any other
API that takes a pointer to a buffer to transmit. When such a buffer is
uninitialised, does that in some way invoke undefined behaviour?

Passing a pointer to a function that expects such is well defined, no
matter if the object the pointer points to is initialized or not. For
some functions this makes perfect sense, like the first argument to
memcpy() for instance.

However, the old saying "garbage in, garbage out" applies here. Passing
nonsense data to a function that tries to process it in some way will
likely not end well and may result in undefined behaviour somewhere during
the execution of that function. It depends on what random bits happen
to be in the uninitialized object and what the function tries to do with
it.
 
James Kuyper

Passing a pointer to a function that expects such is well defined, no
matter if the object the pointer points to is initialized or not.

There's a reason why valgrind is giving him warning messages about this.
It's quite trivial to write code that has undefined behavior when passed
a pointer to an uninitialized buffer. What precisely is encompassed by
the phrase "that expects such"?
 
Angel

There's a reason why valgrind is giving him warning messages about this.
It's quite trivial to write code that has undefined behavior when passed
a pointer to an uninitialized buffer. What precisely is encompassed by
the phrase "that expects such"?

I mean that passing a pointer to a function that requires a pointer
argument is well-defined. (Assuming that the pointer is of a compatible
type, of course.) Maybe my wording is unclear; sorry about that.
English is not my first language.

What I mean to say is that

void f(int *arg1);
int x;

f(&x);

by itself does not cause undefined behaviour even though x is not
initialized. But if the function f() dereferences that pointer argument
and tries to do something with the uninitialized integer object (increment
it by one, for example), that might cause undefined behaviour.

I don't think we disagree here, do we? Again, sorry if I was unclear.
 
James Kuyper

I mean that passing a pointer to a function that requires a pointer
argument is well-defined. (Assuming that the pointer is of a compatible
type, of course.) Maybe my wording is unclear; sorry about that.
English is not my first language.

Sure, passing the pointer is itself well defined. Whether the behavior
of the function is well defined when passed a pointer to uninitialized
data is a very different question.

What I mean to say is that

void f(int *arg1);
int x;

f(&x);

by itself does not cause undefined behaviour even though x is not
initialized. But if the function f() dereferences that pointer argument
and tries to do something with the uninitialized integer object (increment
it by one, for example), that might cause undefined behaviour.

I don't think we disagree here, do we? Again, sorry if I was unclear.

He was asking about the behavior of executing the function, not just the
behavior of the function call itself, and he was asking very
specifically about functions which will dereference the pointer.
fwrite() certainly does so; and the function named sendto() that's
available on my desktop does as well. The fact that he bothered to ask
about it suggests that his sendto() probably does so as well, even if
it's different from the one on my machine. As a result, your answer
was, at best, incomplete, almost to the point of uselessness.

However, since they dereference the pointers using lvalues of character
type, the behavior is well-defined, at least according to my
understanding. Ben's take on that issue is a bit closer to yours.
 
Kaz Kylheku

There's a reason why valgrind is giving him warning messages about this.

Valgrind does not complain when pointers to uninitialized storage are propagated
through the program, only when uninitialized storage is accessed. At that time,
the origin of the storage is traced (such as the call trace to the malloc
call that produced the storage).

It's quite trivial to write code that has undefined behavior when passed
a pointer to an uninitialized buffer. What precisely is encompassed by
the phrase "that expects such"?

/* well-defined behavior possible */
struct foo { int important_field; };

void expects_pointer_to_unitialized_foo(struct foo *init_me)
{
    /* writes a member without reading anything from *init_me */
    init_me->important_field = 42;
}
 
Barry Schwarz

You might be right about the intent, but if so, what significance should
we attach to the fact that 6.2.6.1p5 goes out of its way to say "that
does not have character type"? If it doesn't mean that accesses through
lvalues of character type are safe, what does it mean? If such accesses
were not intended to be safe, why bother writing that phrase at all? The
passage would be easier to read, write, and understand without that
phrase, so I doubt that the phrase's presence was a mere accident.

I believe p5 is providing a definition of trap representation (and the
fact that unsigned char cannot have one) rather than limiting what
behavior is undefined.
 
James Kuyper

I believe p5 is providing a definition of trap representation (and the
fact that unsigned char cannot have one) rather than limiting what
behavior is undefined.

Even if your interpretation is correct, then 6.2.6.1p5 still doesn't
give such code undefined behavior, because in that case, the values can
only be unspecified, not trap representations, so the behavior of the
call to fwrite() is still defined.

As of C2011, the phrase "trap representation" is no longer in italics in
that section, and 3.19.4p1 is now the place where that phrase is
defined. That definition is "an object representation that need not
represent a value of the object type". It says nothing to exclude
character types, and the fact that, under certain circumstances, a trap
representation can result in undefined behavior, is not part of the
definition. I believe that this is not a change in the meaning of "trap
representation", but merely a clarification of something that C99 said
less clearly. The first and last sentences of that clause are now
redundant with 3.19.4p1, and could be removed; if so, "such a
representation" would have to be replaced with "a trap representation",
which would actually make the meaning clearer. Not naming the concept
until after it has been used several times was always rather clumsy writing.

6.2.6.1p5 is (now) solely about the circumstances under which a trap
representation can lead to undefined behavior. Reading one through an
lvalue of character type isn't one of them. The behavior of the fwrite()
call is defined by 7.21.8.2 even if the buffer is uninitialized: either
the specified number of bytes will be written from that buffer, or an
error code indicating an I/O error will be returned. If successful, the
actual values written to the file are unspecified, but if written and
read back using a binary stream, they must compare equal to the
unspecified values in the buffer (7.21.2p3). No matter what those values
are, fwrite() must return (after an unspecified amount of time, the same
as any call to fwrite() with defined behavior), and the succeeding lines
in the program will be executed. None of those things would be
guaranteed if the behavior were undefined.
 
Shao Miller

Accessing an uninitialized variable of type unsigned char is (as
of C11) undefined behavior in some circumstances, but only some.
This result obtains by virtue of 6.3.2.1 p2, in particular the
last sentence.

Obviously, accessing an uninitialized variable is not /always/
undefined behavior, because if it were then there would be
no point in adding the new proviso to 6.3.2.1 p2; undoubtedly
the committee would simply have added some non-normative text
instead.

And since the question is about a pointer into a buffer being passed,
then it'd seem that there're no trap representations for the 'unsigned
char' values in that buffer, right?

This goes back to Defect Report 260, doesn't it?

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

- Shao Miller
 
Tim Rentsch

pete said:
The last statement of 6.3.2.1 p2,
makes me think that
if the lvalue designates an object of automatic storage duration
that could have been declared with the register storage class
(never had its address taken),
and that object is uninitialized (not declared with an initializer
and no assignment to it has been performed prior to use),
then the program really doesn't have to reserve storage for that object.

You're assuming that the compiler can tell whether that
situation will occur. Sometimes it can, but in the
general case it can't (Halting Problem, etc).
 
Tim Rentsch

pete said:
It's the only reason that I can think of for why

int main(void) {unsigned char x; return x - x;}

should be undefined.

It turns out that some actual hardware has out-of-band marking
(for registers, I think) that indicates non-initialization. Those
machines (can) trap on a reference to an uninitialized register.
My understanding is that the passage we are discussing was added
specifically because such machines currently exist, and more
generally because it is useful to allow similar kinds of traps
or what-have-you, yet still be inside the boundaries of being
a conforming implementation.
 
glen herrmannsfeldt

(snip, someone wrote)
It turns out that some actual hardware has out-of-band marking
(for registers, I think) that indicates non-initialization. Those
machines (can) trap on a reference to an uninitialized register.

More usual would be tagged memory, which can have an uninitialized,
or otherwise trap representation.

Stories are that the original WATFOR on the 7090 used parity
errors to detect uninitialized variables. It could actually
set invalid parity before starting execution of the compiled
program.

I believe some Burroughs machines tag memory locations for their
contents. I will guess that there is a trap representation for
the tag.

Though presumably one could say that C requires that such memory
or parity bits be initialized to valid values before the program
starts executing.

My understanding is that the passage we are discussing was added
specifically because such machines currently exist, and more
generally because it is useful to allow similar kinds of traps
or what-have-you, yet still be inside the boundaries of being
a conforming implementation.

Well, at least signaling NaN exists on modern systems.

-- glen
 