UDB and pointer increments and decrements

Richard

I'm still battling with this causing UDB:

while(e-- > s);

If s points to the start of a string and e becomes less than s, then e
is not really pointing to a defined char. Fine.

But UDB?

Yes, e has a UDV (undefined value), but would this really cause a
program to misbehave? On any platform? Remember, this value of e is
never used again in this case.

I ask because theoretically s can be pointing to the middle of a bigger
string. We then call a function with s as a parameter.

The function called can have no idea that s is a pointer to the middle
of a string. Therefore it can have no idea how to "do undefined things"
when e is decremented past the start of s. e and s are strictly
char *s. It would be so "not C" if the compiler generated code to check
the contents pointed to in order to determine the range of the object
into the middle of which s points. I mean, then we may as well have
array limits and exceptions built into the language.
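
To make the scenario concrete, here is a simplified sketch (the names
and the string-reversal context are only illustrative):

#include <string.h>

static void scan_back(char *s)
{
    char *e = s + strlen(s);   /* e starts at the terminating '\0' */
    while (e-- > s)            /* the final e-- steps to s - 1     */
        ;
}

int main(void)
{
    char buf[] = "hello world";
    scan_back(buf + 6);   /* s points mid-array: e ends at buf + 5,
                             still inside buf, so this is defined   */
    scan_back(buf);       /* s is the start of buf: the final e--
                             computes buf - 1, which is the UB case */
    return 0;
}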

I'm not being difficult here. Explain how this works. My problem (and I
admit it's a problem) is that I feel too much of C is being elevated to
an almost ADA-type status and (in this group) C is losing that "down and
dirty and efficient" feeling for which it is famous.
 
Jean-Marc Bourguet

Richard said:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.
>
> But UDB?
>
> Yes, e has a UDV (undefined value), but would this really cause a
> program to misbehave? On any platform? Remember, this value of e is
> never used again in this case.

1/ C has four levels of definition (defined behavior;
implementation-defined behavior, including locale-specific behavior;
unspecified behavior; and undefined behavior), no more. Spending effort
to try and classify undefined behavior more finely is probably not
worthwhile. And it seems to me that's what you want: different rules
for the undefined value created by decrementing a pointer than for all
the others. There is precedent (the similar one-past-the-end-of-an-array
pointer comes immediately to mind), but yours would be more limited
than that one (or you'd have got opposition from the DOS folk, as
allowing such pointers in comparisons would have constrained them a
lot, probably limiting the size of an object to 32767 bytes instead of
the 65535 they have got).
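
A minimal sketch of that asymmetry, as an illustration:

int a[8];
int *past = a + 8;    /* OK: one past the end may be formed and
                         compared, though not dereferenced         */
int *before = a - 1;  /* not OK: merely computing a pointer before
                         the start of the array is undefined       */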

2/ Optimizers tend to use undefined behavior in creative ways. For
example, things like value propagation can optimize out the 'then' part
of the 'if' in this code:

if (i == INT_MAX) {
    /* ... anything that doesn't modify i ... */
}
++i;   /* overflows -- undefined behavior -- when i == INT_MAX */

(reasoning: incrementing i overflows if i is INT_MAX, which would be
undefined behavior, so the optimizer can assume i isn't INT_MAX and
that the result of the comparison is false). Optimizations like this
are one of the reasons undefined behavior can be non-causal (you just
have to be sure that the code causing the undefined behavior would have
been executed). And note that optimizers do such propagation beyond the
current function; they can potentially even do it for the whole
program, and that's the way they are heading.
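
In other words, the compiler may compile the fragment above as if it
read (a sketch of the effective transformation, not any particular
compiler's output):

++i;   /* the i == INT_MAX branch is assumed unreachable and dropped */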

Yours,
 
jameskuyper

Richard said:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.
>
> But UDB?
>
> Yes, e has a UDV (undefined value), but would this really cause a
> program to misbehave? On any platform? Remember, this value of e is
> never used again in this case.
>
> I ask because theoretically s can be pointing to the middle of a
> bigger string. We then call a function with s as a parameter. The
> function called can have no idea that s is a pointer to the middle of
> a string. Therefore it can have no idea how to "do undefined things"
> when e is decremented past the start of s. e and s are strictly
> char *s. It would be so "not C" if the compiler generated code to
> check the contents pointed to in order to determine the range of the
> object into the middle of which s points. I mean, then we may as well
> have array limits and exceptions built into the language.

It's too late - the language that makes the behavior undefined was
inserted into the standard precisely for the purpose of allowing (but
not mandating) array limit checks. In order to make array limit
checks mandatory, the behavior could not be undefined - it would have
to be either standard-defined or implementation-defined. Because the
behavior is undefined, an implementation is currently free to deal
with array limits by ignoring them.

Permitting array limit checks was done, in part, because there were
(and are) real implementations that perform them. On some machines,
such checks are built into the hardware; avoiding them would require
software emulation. In other cases, the checks are performed in
software.

I know of at least two ways of implementing pointers that make array
limit checks feasible: fat pointers, and segmented memory. Of course,
in both cases this means that pointer values cannot be correctly
understood by treating them as simple numbers, which might be a
conceptual hurdle for you. If you need an explanation of how those
techniques work, I can provide it.
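
As a sketch of the fat-pointer idea (no real ABI implied; the names
are made up):

#include <stdlib.h>

/* a checked implementation could represent char * as three addresses */
struct fat_ptr {
    char *base;    /* start of the underlying array object           */
    char *limit;   /* one past the end; checked likewise on ++       */
    char *cur;     /* the address actually pointed at                */
};

/* decrement with a bounds check: trap rather than silently
   produce an invalid pointer value */
struct fat_ptr fat_dec(struct fat_ptr p)
{
    if (p.cur == p.base)   /* would move before the array */
        abort();
    p.cur--;
    return p;
}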

More subtly, an implementation can use the existence of code that
might, under certain circumstances, have undefined behavior, to
justify optimizations of related code that will fail only under
exactly those same circumstances. As a result, the actual catastrophic
failure might occur while executing code other than the code whose
execution makes the behavior undefined. But I've already explained
that possibility in more detail in another message.
 
Richard

jameskuyper said:
> It's too late - the language that makes the behavior undefined was
> inserted into the standard precisely for the purpose of allowing (but
> not mandating) array limit checks.

That makes sense. Thanks.

> In order to make array limit checks mandatory, the behavior could not
> be undefined - it would have to be either standard-defined or
> implementation-defined. Because the behavior is undefined, an
> implementation is currently free to deal with array limits by
> ignoring them.

And them remaining undefined? Unspecified would have been better,
surely?

> Permitting array limit checks was done, in part, because there were
> (and are) real implementations that perform them. On some machines,
> such checks are built into the hardware; avoiding them would require
> software emulation. In other cases, the checks are performed in
> software.
>
> I know of at least two ways of implementing pointers that make array
> limit checks feasible: fat pointers, and segmented memory. Of course,
> in both cases this means that pointer values cannot be correctly
> understood by treating them as simple numbers, which might be a
> conceptual hurdle for you. If you need an explanation of how those
> techniques work, I can provide it.

I know about segmented memory. I have written oodles of VGA libraries
for them in x86, using the various addressing modes. The point that
has been totally taken out of context is that these segmented
representations are STILL represented as numbers in my debugger.
Nothing more, nothing less. Yes, I call them numbers. Addresses.
Numbers.

> More subtly, an implementation can use the existence of code that
> might, under certain circumstances, have undefined behavior, to
> justify optimizations of related code that will fail only under
> exactly those same circumstances. As a result, the actual
> catastrophic failure might occur while executing code other than the
> code whose execution makes the behavior undefined. But I've already
> explained that possibility in more detail in another message.

I appreciate the time you have taken to explain. I would still love
someone to explain the case I asked about above, though. The one where
s is pointing into the middle of an array. Or did you, and I didn't
understand?
 
Keith Thompson

Richard said:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.
>
> But UDB?

A small note: you're the only person I've ever seen refer to undefined
behavior as "UDB". Most posters here (at least those who choose to
abbreviate it) refer to it as "UB". Why do you feel the need to
invent your own abbreviation when there's already a perfectly good one
in widespread use? (One could argue that "UB" could also mean
unspecified behavior, but I've never seen it used that way, and it's
generally clear enough from the context.)

Yes, the behavior is undefined, simply because the standard doesn't
define the behavior. That's all "undefined behavior" means.

> Yes, e has a UDV (undefined value), but would this really cause a
> program to misbehave? On any platform? Remember, this value of e is
> never used again in this case.

Yes. I don't have a real-world example, but if the containing object
happens to be allocated at the beginning of a memory segment, it could
easily blow up. And, as has been mentioned elsethread, a compiler is
allowed to *assume* that undefined behavior does not occur, and
perform code transformations based on that assumption (after all, if
the behavior is already undefined, it can't make things worse); that
may be a more realistic risk for most modern systems.

> I ask because theoretically s can be pointing to the middle of a
> bigger string. We then call a function with s as a parameter.

Undefined behavior occurs if a pointer is decremented past the
beginning of an array object, not if it's decremented past the initial
value of a function parameter. Given this:

char s[100];

char *func(char *ptr) { return ptr - 1; }

calling func(s+10) has well-defined behavior, but calling func(s) has
undefined behavior. (I haven't compiled the above, so there may be
some dumb mistakes.)

> The function called can have no idea that s is a pointer to the
> middle of a string.

Right.

> Therefore it can have no idea how to "do undefined things" when e is
> decremented past the start of s. e and s are strictly char *s.

It doesn't deliberately "do undefined things"; that's not the point.
The point is that the standard doesn't define what it does. In my
example above, I'm thinking of a hypothetical system on which
constructing the pointer value s-1 causes a hardware trap (because s
is allocated at the beginning of a segment, and the hardware
"decrement address" instruction traps in this case). The code
generated for the body of the function has no awareness of this.

For example, assume an implementation on which signed integer overflow
causes a trap.

int func(int n) { return n + 1; }

func(42) has well-defined behavior, and returns 43. func(INT_MAX) has
undefined behavior, and (on this particular implementation) causes a
trap (or does something arbitrarily strange if an optimizing compiler
rearranges code based on the assumption that no UB occurs). The
function has no awareness of this; it just returns the result of n +
1.

> It would be so "not C" if the compiler generated code to check the
> contents pointed to in order to determine the range of the object
> into the middle of which s points. I mean, then we may as well have
> array limits and exceptions built into the language.

The compiler is *allowed* to perform such checks, but it's not
required to. That's why the behavior is undefined, rather than being
defined to do whatever a failing check would do.

> I'm not being difficult here. Explain how this works. My problem (and
> I admit it's a problem) is that I feel too much of C is being
> elevated to an almost ADA-type status and (in this group) C is losing
> that "down and dirty and efficient" feeling for which it is famous.

(It's "Ada", not "ADA".)

C loses none of its "down and dirty and efficient" feeling because of
this. In fact, the generated code can gain in efficiency because the
compiler is allowed to trust the user to avoid undefined behavior and
to perform aggressive optimization based on that assumption.

A C implementation that does exactly what you seem to expect it to do
(treat addresses as simple integers, allow arbitrary addresses to be
computed, etc.) would be conforming. An implementation that performs
aggressive bounds checking can also be conforming.
 
Keith Thompson

Richard said:
> jameskuyper writes: [...]
>> checks mandatory, the behavior could not be undefined - it would
>> have to be either standard-defined or implementation-defined.
>> Because the behavior is undefined, an implementation is currently
>> free to deal with array limits by ignoring them.
>
> And them remaining undefined? Unspecified would have been better,
> surely?

Better how?

Unspecified behavior is "use of an unspecified value, or other
behavior where this International Standard provides two or more
possibilities and imposes no further requirements on which is chosen
in any instance".

For the behavior of, for example, attempting to access an array
outside its bounds to be unspecified rather than undefined, the
standard would have to provide a number of possible behaviors, and
anything other than one of those behaviors would be non-conforming.

Suppose I have an array object declared within a function, and I write
to element -1 of that array. I could clobber nearly anything,
including the function's stored return address or some other vital
piece of information. How would you restrict the possible
consequences of that to "two or more possibilities"?
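
To make that concrete (a sketch; what actually gets clobbered depends
entirely on the implementation):

void f(void)
{
    int a[4];
    a[-1] = 0;   /* undefined: on a typical stack implementation this
                    may hit a saved register, a canary, or the return
                    address -- or appear to do nothing at all         */
}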

> [snip]
>
> I appreciate the time you have taken to explain. I would still love
> someone to explain the case I asked about above, though. The one
> where s is pointing into the middle of an array. Or did you, and I
> didn't understand?

See my other recent response in this thread.
 
Flash Gordon

Richard wrote, on 23/09/08 16:44:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.
>
> But UDB?
>
> I'm not being difficult here. Explain how this works. My problem (and
> I admit it's a problem) is that I feel too much of C is being
> elevated to an almost ADA-type status and (in this group) C is losing
> that "down and dirty and efficient" feeling for which it is famous.

Another poster and I suggested an object starting at the beginning
of a page or segment, and *hardware* that traps on trying to decrement
to before the start of the page/segment. No software checks need be
involved!
 
jameskuyper

The behavior is only undefined if e points at the beginning of an
array, or points at a different array than the one s points at. For the
code you posted on the "Highly efficient string reversal code" thread,
it's quite likely that s does point at the beginning of an array; if it
does, and that array contains a zero-length string, then e will also
end up pointing at the start of the array.

However, in that same code, if s starts out pointing in the middle of
a bigger string, then e won't end up pointing at the beginning of the
array. And no one said anything to suggest to you that the behavior
would be undefined in that case.

In this thread, you chose to start a new discussion without
cross-referencing the old one, and without giving any context for
"while(e-- > s)". Using only the information you've provided on this
thread, it's not possible to derive the fact that a) e points at the
same array as s and b) e does not point at the beginning of an array.
Richard said:
> That makes sense. Thanks.
>
> And them remaining undefined? Unspecified would have been better,
> surely?

I'm not quite sure whether you're talking about prohibiting array
limit checks, or mandating them. Making the behavior of such pointer
arithmetic undefined, as is currently the case, neither mandates nor
prohibits array limit checks.

If the behavior is unspecified, the standard must, at least
implicitly, provide a range of permitted behaviors. Depending upon
what that range is, it could mandate array limit checks, for instance
by requiring that an unspecified signal be raise()d.

Alternatively, it could also prohibit array limit checks, by saying
that moving a pointer beyond its valid range produces a pointer to an
unspecified but valid location, and that writing through such a
pointer value has no effect. Note that this would prohibit array limit
checks only in the sense that they would be invisible to the user; I
see no way of implementing such a requirement without the
implementation performing array limit checks to determine whether or
not a write is required to have no effect. Notice that if the location
is unspecified, and writes were actually permitted to have an effect,
then the consequences would be pretty much indistinguishable from
undefined behavior. Being allowed to write to an arbitrary memory
location can have arbitrarily bad consequences on many
implementations.

> I appreciate the time you have taken to explain. I would still love
> someone to explain the case I asked about above, though. The one
> where s is pointing into the middle of an array. Or did you, and I
> didn't understand?

I'll use a segmented architecture as an example, since you're familiar
with the concepts. Consider the possibility that e points at the
beginning of an array. That array might have been allocated at the
beginning of a memory segment. As a built-in feature of the hardware,
or as the result of code generated by the compiler, any attempt to
decrement a pointer that already points at the beginning of a segment
could cause the program to abort. Consider the possibility that e and
s point into different memory segments. As a built-in feature of the
hardware, or as the result of code generated by the compiler, any
attempt to compare pointers into different memory segments for order
(<, >, <=, >=) might cause the program to abort. An implementation that
produces such behavior can be perfectly conforming so long as it makes
sure never to allocate different parts of the same object in different
memory segments.
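
A sketch of both failure modes on such a hypothetical implementation
(the allocation assumptions are, of course, only illustrative):

#include <stdio.h>

char a[16];   /* suppose the implementation places a at the very
                 start of a memory segment...                     */
char b[16];   /* ...and b in a different segment                  */

int main(void)
{
    char *e = a;
    e--;          /* computing a - 1 may trap in the hardware's
                     "decrement address" instruction itself       */
    if (e < b)    /* ordering pointers into different segments
                     is also undefined and may likewise trap      */
        puts("compared pointers into different segments");
    return 0;
}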
 
Old Wolf

Richard said:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.

What if the string is at the very start of
the address space? Where does 'e' point after
decrementing it?

There are CPUs or MMUs that will trap upon
loading an obviously bogus pointer such as
this one, which doesn't even describe a
memory location that exists.
 
James Kuyper

Rosario wrote:
[...]
> for what i can see for this group the speaking time of varios "UB"
> (undefinite behaviours) is more time consuming that programming

That's because undefined (not "undefinite") behavior is the single most
serious kind of problem C code can have. It's also because most of the
code that people bring to this group because they're having problems
with it has undefined behavior. That's a selection effect; syntax
errors and constraint violations are easily caught by the compiler; the
programs that actually compile and still fail tend to have subtler
problems, usually involving undefined behavior.
 
Tim Rentsch

Richard said:
> I'm still battling with this causing UDB:
>
> while(e-- > s);
>
> If s points to the start of a string and e becomes less than s, then
> e is not really pointing to a defined char. Fine.
>
> But UDB?
>
> Yes, e has a UDV (undefined value), but would this really cause a
> program to misbehave? On any platform? Remember, this value of e is
> never used again in this case.
>
> I ask because theoretically s can be pointing to the middle of a
> bigger string. We then call a function with s as a parameter. The
> function called can have no idea that s is a pointer to the middle of
> a string. Therefore it can have no idea how to "do undefined things"
> when e is decremented past the start of s. e and s are strictly
> char *s. It would be so "not C" if the compiler generated code to
> check the contents pointed to in order to determine the range of the
> object into the middle of which s points. I mean, then we may as well
> have array limits and exceptions built into the language.

jameskuyper said:
> It's too late - the language that makes the behavior undefined was
> inserted into the standard precisely for the purpose of allowing (but
> not mandating) array limit checks. [...]

Nonsense. Allowing a pointer to be decremented to before the
start of an array is still compatible with doing array limit
checks, just as allowing a pointer to be incremented past the end
of an array is compatible with doing array limit checks.
The rationale document makes clear that decrementing a pointer
to before the start of an array was rejected because it would
impose overly burdensome requirements on implementations.
Array limit checks are equally possible whether e-- is allowed
or not.
 
