UB while dealing with invalid raw pointers, the std::uninitialized_fill case

  • Thread starter Francesco S. Carta

Francesco S. Carta

Hi there,
as far as I've been able to understand, if a raw pointer contains an
invalid value (that is, it does not point to any valid object of the
type it is a pointer to) then some of the actions performed on these
pointers will lead to UB.

As it seems, two actions in particular should be safe and well defined:
- zeroing the invalid pointer;
- assigning a valid value to the invalid pointer;
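
For instance, a minimal sketch of those two operations (assuming "p" has
just been deleted):

int main() {
    int* p = new int(1);
    delete p;       // p now holds an invalid (dangling) value

    p = 0;          // zeroing it: fine

    int i = 2;
    p = &i;         // assigning a valid value to it: also fine
    return 0;
}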

One issue that has been recently raised in this group is about storing
invalid raw pointers into a container such as std::vector; the
rationale that led to defining it as a potential source of UB is the
lvalue to rvalue conversion that will be performed on those raw
pointers during internal reallocations of the container.

Since the only significant action that gets performed during the
reallocation is to copy such invalid pointer values from one storage
location to another, it should boil down to something equivalent to
this:

int* p = new int;
delete p;

Now "p" contains and invalid value.

int* q = p;

During the above assignment, an lvalue to rvalue conversion is performed
on "p", leading to undefined behavior.

Now my question is, would the following test also lead to an lvalue to
rvalue conversion on "p", therefore leading to UB?

int* p = new int;
delete p;
int* q = new int;
if(q != p) {
//...
}

If that's the case, then any uninitialized_fill performed on a storage
area of raw pointers will lead to UB, since the Standard specifies, as
its expected effect, a comparison between two invalid pointers:

[citation formatted for presentation]
20.4.4.2 uninitialized_fill [lib.uninitialized.fill]

template <class ForwardIterator, class T>
void uninitialized_fill(ForwardIterator first,
ForwardIterator last,
const T& x);

1 Effects:

for (; first != last; ++first)
new (static_cast<void*>(&*first))
typename iterator_traits<ForwardIterator>::value_type(x);

Would all the above mean that we shouldn't really worry about UB when
dealing with invalid pointers into standard containers as long as we
don't dereference such invalid pointers, and accordingly, would that
mean that the standard needs to be modified to state these actions
(copying and comparing of invalid pointers) as well-defined?

Thank you for your attention.
 

Alf P. Steinbach /Usenet

* Francesco S. Carta, on 03.09.2010 12:52:
Hi there,
as far as I've been able to understand, if a raw pointer contains an invalid
value (that is, it does not point to any valid object of the type it is a
pointer to) then some of the actions performed on these pointers will lead to UB.

Hm, well you need to define "invalid" more precisely, e.g. as "is not valid".
;-) Or even more precisely, "can not be dereferenced without UB". For example, 0
is a valid pointer value, as is 1+p where p points to the last element in an array.
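
For instance, just a sketch:

int main() {
    int a[4] = { 0, 1, 2, 3 };
    int* p = &a[3];            // points to the last element
    int* q = p + 1;            // one past the end: a valid pointer value...
    // int x = *q;             // ...but dereferencing it would be UB
    bool same = (q == a + 4);  // comparing it, however, is fine
    (void) same;
    return 0;
}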

Neither C++98 nor C++0x does, as far as I know, define "invalid pointer", but
C++0x defines "valid pointer" as pointing to a byte in memory or being zero, in
C++0x §3.9.2/3. The definition in C++0x is perhaps too permissive. If taken
literally the validity of a pointer would in general not be deducible but would
depend on whether the address in question had been remapped by the HW, e.g. p
would be /valid/ immediately after delete p unless the delete affected the
validity of the address itself (e.g. by changing paging or segment setup).

But OK...

As it seems, two actions in particular should be safe and well defined:
- zeroing the invalid pointer;
- assigning a valid value to the invalid pointer;

One issue that has been recently raised in this group is about storing invalid
raw pointers into a container such as std::vector; the rationale that led to
define it as a potential source of UB is about the lvalue to rvalue conversion
that will be performed on those raw pointers during internal reallocations of
the container.

Since the only significant action that gets performed during the reallocation is
to copy such invalid pointer values from a storage to another, it should boil
down to something equivalent to this:

int* p = new int;
delete p;

Now "p" contains and invalid value.

int* q = p;

During the above assignment, an lvalue to rvalue conversion is performed on "p",
leading to undefined behavior.

Now my question is, would the following test also lead to an lvalue to rvalue
conversion on "p", therefore leading to UB?

int* p = new int;
delete p;
int* q = new int;
if(q != p) {
//...
}

Yes, this invokes rvalue conversion and UB.

If that's the case, then any uninitialized_fill performed on a storage area of
raw pointers will lead to UB, as the Standard depicts, as expected effect, the
fact of comparing two invalid pointers:

[citation formatted for presentation]
20.4.4.2 uninitialized_fill [lib.uninitialized.fill]

template <class ForwardIterator, class T>
void uninitialized_fill(ForwardIterator first,
ForwardIterator last,
const T& x);

1 Effects:

for (; first != last; ++first)
new (static_cast<void*>(&*first))
typename iterator_traits<ForwardIterator>::value_type(x);

Huh, no.

'first' and 'last' here are not invalid pointers: if pointers, then they point
/to/ the area to be filled.

Would all the above mean that we shouldn't really worry about UB when dealing
with invalid pointers into standard containers as long as we don't dereference
such invalid pointers,

Formally you invoke UB when a vector containing invalid pointers is destroyed.

That's because a simplistic implementation may iterate over the vector contents
and do pseudo destructor calls on the pointers (or it can do anything at all).

In practice it's not anything I'd worry about, because leaving a vector with
invalid pointers is common practice, so implementations have to not crash on
that. However, to play nice, I guess one should always zero a pointer in a
vector (or other container) after making it invalid. Just making sure.
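
For instance (Widget here is just a stand-in type):

#include <vector>

struct Widget {};

int main() {
    std::vector<Widget*> v;
    v.push_back(new Widget);

    delete v[0];    // the stored pointer is now dangling
    v[0] = 0;       // zero it, so later copies/reallocations/destruction
                    // of the vector never touch an invalid value
    return 0;
}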

and accordingly, would that mean that the standard needs
to be modified to state these actions (copying and comparing of invalid
pointers) as well-defined?

I don't think so.

If the standard was all too clear about everything then we'd have nothing to
discuss.


Cheers & hth.,

- Alf
 

Alf P. Steinbach /Usenet

* Alf P. Steinbach /Usenet, on 03.09.2010 13:28:
* Francesco S. Carta, on 03.09.2010 12:52:

Hm, well you need to define "invalid" more precisely, e.g. as "is not valid".
;-) Or even more precisely, "can not be dereferenced without UB". For example, 0
is a valid pointer value, as is 1+p where p points to the last element in an array.

Hm, let me be more clear on that, before someone protests: a zero pointer can be
dereferenced without UB, namely in a typeid expression.
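
A minimal sketch of that case -- note it only applies when the operand
is of polymorphic class type (otherwise typeid does not evaluate its
operand at all), and the specified outcome is a std::bad_typeid
exception rather than UB:

#include <iostream>
#include <typeinfo>

struct Base { virtual ~Base() {} };   // polymorphic, so typeid(*p) is evaluated at run time

int main() {
    Base* p = 0;
    try {
        std::cout << typeid(*p).name() << '\n';   // dereferencing the null pointer here...
    } catch (std::bad_typeid&) {
        std::cout << "bad_typeid thrown\n";       // ...throws bad_typeid instead of being UB
    }
    return 0;
}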

I'm concerned, however, that "can be dereferenced" possibly does not hold for
1+p, so that my "even more precisely" then becomes less precise :-(.

Deref of 1+p was slated to become part of C++0x, for compatibility with C99
rules, but I don't recall exactly what happened and it takes too much time to
find it, especially if it's not there...

So perhaps the most precise that can be said is just "is not valid".

Whatever.



Cheers,

- Alf
 

Francesco S. Carta

Alf P. Steinbach /Usenet said:
* Francesco S. Carta, on 03.09.2010 12:52:

Hm, well you need to define "invalid" more precisely, e.g. as "is not
valid". ;-) Or even more precisely, "can not be dereferenced without
UB". For example, 0 is a valid pointer value, as is 1+p where p points
to the last element in an array.

Neither C++98 nor C++0x does, as far as I know, define "invalid
pointer", but C++0x defines "valid pointer" as pointing to a byte in
memory or being zero, in C++0x §3.9.2/3. The definition in C++0x is
perhaps too permissive. If taken literally the validity of a pointer
would in general not be deducible but would depend on whether the
address in question had been remapped by the HW, e.g. p would be /valid/
immediately after delete p unless the delete affected the validity of
the address itself (e.g. by changing paging or segment setup).

But OK...

OK, I've read your self-follow-up, just for the record. Correctly
defining an invalid pointer seems to be impossible, but we have some
agreed cases of valid and invalid pointer values:

- the null-pointer value is a valid and non-dereferenceable value;
- the address of a valid object is a valid pointer value;
- the address of a valid object becomes an invalid pointer value after
the object gets destroyed;
- the value of an uninitialized pointer is an invalid pointer value and,
according to the following, it also is a singular pointer value:

[lib.iterator.requirements] p. 5

"[...] Iterators can also have singular values that are not associated
with any container. [Example: After the declaration of an uninitialized
pointer x (as with int* x;), x must always be assumed to have a singular
value of a pointer. ] Results of most expressions are undefined for
singular values; the only exception is an assignment of a non-singular
value to an iterator that holds a singular value. [...]"

...and as the above states, the only thing that can be done with a
singular pointer value is to assign a non-singular pointer value to it.

Following the informal reasoning above, we can safely assign zero or the
address of a valid object to that invalid pointer (i.e. that pointer
containing a singular value).
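
A minimal sketch of that rule (the commented-out comparison is the kind
of expression the quoted paragraph leaves undefined):

int main() {
    int* x;            // uninitialized: x must be assumed to hold a singular value
    // if (x == 0) {}  // even this comparison reads the singular value
    int obj = 0;
    x = &obj;          // OK: assigning a non-singular value to it
    x = 0;             // also OK: the null pointer value is non-singular
    return 0;
}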

Fast forward now...
Yes, this invokes rvalue conversion and UB.

OK about the conversion, still not convinced about the UB.

Fast forward once more...
If that's the case, then any uninitialized_fill performed on a storage
area of
raw pointers will lead to UB, as the Standard depicts, as expected
effect, the
fact of comparing two invalid pointers:

[citation formatted for presentation]
20.4.4.2 uninitialized_fill [lib.uninitialized.fill]

template <class ForwardIterator, class T>
void uninitialized_fill(ForwardIterator first,
ForwardIterator last,
const T& x);

1 Effects:

for (; first != last; ++first)
new (static_cast<void*>(&*first))
typename iterator_traits<ForwardIterator>::value_type(x);

Huh, no.

'first' and 'last' here are not invalid pointers: if pointers, then they
point /to/ the area to be filled.

Here we come to the point, assume this program, which should be
well-defined and well-behaving:

//-------
#include <cstddef>   // size_t
#include <iostream>
#include <memory>    // uninitialized_fill
#include <new>       // operator new / operator delete

using namespace std;

int main() {
    size_t n = 4;
    // allocate raw, uninitialized storage for n ints
    int* start = static_cast<int*>(
        operator new(n * sizeof(int))
    );
    int* end = start + n;
    // construct n ints with value 42 in that raw storage
    uninitialized_fill(start, end, 42);
    for(int* i = start; i < end; ++i) {
        cout << *i << endl;
    }
    operator delete(start);
    return 0;
}
//-------

By the time "start" gets initialized, it points to an uninitialized
storage area big enough to hold an int, but since that storage is
uninitialized, the pointer is currently invalid (we cannot dereference
it without invoking UB). "end" is an invalid pointer too.

Let's now enter the uninitialized_fill template function.

It gets called with this pseudo-signature:

uninitialized_fill<int*, int>(...)

which means that in the "Effects" section cited above, we have:

for (; first != last; ++first)

where "first == start" and "last == end", and all of them are of type
"class ForwardIterator = int*"

Following from all the above, we should have a standard algorithm that
invokes UB by comparing two invalid pointers.

Where is my reasoning flawed?
Formally you invoke UB when a vector containing invalid pointers is
destroyed.

That's because a simplistic implementation may iterate over the vector
contents and do pseudo destructor calls on the pointers (or it can do
anything at all).

In practice it's not anything I'd worry about, because leaving a vector
with invalid pointers is common practice, so implementations have to not
crash on that. However, to play nice, I guess one should always zero a
pointer in a vector (or other container) after making it invalid. Just
making sure.



I don't think so.

If the standard was all too clear about everything then we'd have
nothing to discuss.

That doesn't really seem a good reason to keep a self-contradicting
standard (if it really is the case). I'd like to think that you're just
kidding :)
 

Alf P. Steinbach /Usenet

* Francesco S. Carta, on 03.09.2010 15:06:
OK, I've read your self-follow-up, just for the records. Correctly defining an
invalid pointer seems to be impossible, but we have some agreed cases of valid
and invalid pointer values:

- the null-pointer value is a valid and non-dereferenceable value;
- the address of a valid object is a valid pointer value;
- the address of a valid object becomes an invalid pointer value after the
object gets destroyed;
- the value of an uninitialized pointer is an invalid pointer value and,
according to the following, it also is a singular pointer value:

Yah, mostly that was my reasoning in my old "pointers tutorial" (referenced from
my blog, right hand column somewhere).

I introduced the concept of "RealGood" pointers there.

But unfortunately that term did not catch on.


[snip]
Here we come to the point, assume this program, which should be well-defined and
well-behaving:

//-------
#include <iostream>
#include <memory>

using namespace std;

int main() {
    size_t n = 4;
    int* start = static_cast<int*>(
        operator new(n * sizeof(int))
    );
    int* end = start + n;
    uninitialized_fill(start, end, 42);
    for(int* i = start; i < end; ++i) {
        cout << *i << endl;
    }
    operator delete(start);
    return 0;
}
//-------

By the time "start" gets initialized, it points to an uninitialized storage area
big enough to hold an int, but since that storage is uninitialized, the pointer
is currently invalid (we cannot dereference it without invoking UB). "end" is an
invalid pointer too.

Let's now enter the uninitialized_fill template function.

It gets called with this pseudo-signature:

uninitialized_fill<int*, int>(...)

which means that in the "Effects" section cited above, we have:

for (; first != last; ++first)

where "first == start" and "last == end", and all of them are of type "class
ForwardIterator = int*"

Following from all the above, we should have a standard algorithm that invokes
UB by comparing two invalid pointers.

Where is my reasoning flawed?

The pointers are not invalid. They can be dereferenced. What you can't do is to
invoke an rvalue conversion on *p, because that would use an indeterminate
value. p itself is valid, *p is a valid reference, (*p)+2, for example, is bad.


[snip]
That doesn't really seem a good reason to keep a self-contradicting standard (if
it really is the case). I'd like to think that you're just kidding :)

He he. :)

Only partially... ;-)


Cheers & hth.,

- Alf
 

Stuart Redmann

Francesco S. Carta wrote:
[snip]

Would all the above mean that we shouldn't really worry about UB when
dealing with invalid pointers into standard containers as long as we
don't dereference such invalid pointers, and accordingly, would that
mean that the standard needs to be modified to state these actions
(copying and comparing of invalid pointers) as well-defined?

I would rather like it if the standard made this some kind of
platform-dependent behavior. Since nobody can cite a convincing
rationale for the UB, and apparently lots of people keep deleted
pointers in containers, it makes little sense to say that all those
programs exhibit UB.

Regards,
Stuart
 

Francesco S. Carta

Alf P. Steinbach /Usenet said:
[snip]
The pointers are not invalid. They can be dereferenced. What you can't
do is to invoke an rvalue conversion on *p, because that would use an
indeterminate value. p itself is valid, *p is a valid reference, (*p)+2,
for example, is bad.

All right... I think I'm starting to understand. The difference between
the above and the following (restored from your previous reply)...
Yes, this invokes rvalue conversion and UB.

...is that "start" points to allocated (even if uninitialized) memory,
while "p" points to deallocated memory; is this all the difference that
makes one case well-defined and the other UB?
 

Francesco S. Carta

Alf P. Steinbach /Usenet said:
Formally you invoke UB when a vector containing invalid pointers is
destroyed.

That's because a simplistic implementation may iterate over the vector
contents and do pseudo destructor calls on the pointers (or it can do
anything at all).

I forgot to ask about this. What is a pseudo destructor call?
 

Alf P. Steinbach /Usenet

* Francesco S. Carta, on 03.09.2010 17:34:
I forgot to ask about this. What is a pseudo destructor call?

template< class Type >
void destroy( T& x ) { x.~Type(); }


invoked with say Type as int.
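
For example, a minimal usage sketch of that helper (with the parameter
spelled Type):

template< class Type >
void destroy( Type& x ) { x.~Type(); }

int main() {
    int n = 42;
    destroy(n);    // pseudo destructor call on an int: a no-op

    int* p = 0;
    destroy(p);    // same for a pointer; a simplistic vector<T*> could do
                   // exactly this for each stored pointer on destruction
    return 0;
}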


Cheers & hth.,

- Alf (and else-thread: yes)
 

Francesco S. Carta

Alf P. Steinbach /Usenet said:
* Francesco S. Carta, on 03.09.2010 17:34:

template< class Type >
void destroy( T& x ) { x.~Type(); }


invoked with say Type as int.

Ah, I see. Though, I cannot imagine any implementation of std::vector
not calling the destructor on the elements it contains - all the more
so as it resolves to a no-op for types that do not have a destructor.
I wonder why you labeled that as a simplistic implementation.
 

Francesco S. Carta

Quoting the earlier exchange with Alf:
[snip]

The pointers are not invalid. They can be dereferenced. What you can't
do is to invoke an rvalue conversion on *p, because that would use an
indeterminate value. p itself is valid, *p is a valid reference, (*p)+2,
for example, is bad.

All right... I think I'm starting to understand. The difference between
the above and the following (restored from your previous reply)...
Yes, this invokes rvalue conversion and UB.

...is that "start" points to allocated (even if uninitialized) memory,
while "p" points to deallocated memory; is this all the difference that
makes one case well-defined and the other UB?

Alf, I suppose your "(and else-thread: yes)" was about the above.

I wonder why the info sometimes has to be "extracted" from you "better
knowing" ones with pliers ;-)

But I think it's fine, nonetheless. It forced me to dig into the issue
until I realized the actual answer: had you pointed it out directly, it
would not have had the same value for my comprehension, so thank you
very much for your replies, Alf.
 

Bo Persson

Stuart said:
I would rather like it if the standard made it some kind of
platform- dependent. Since nobody can cite some convincing
rationale for UB, and apparently lots of people use deleted pointer
in containers, it makes little sense to say that all those programs
show UB.

You can't test for UB, because it IS undefined. "Seems to work" isn't
good enough! :)

Those of us who used to program with 16-bit segmented memory (80286)
know that a deallocated segment WILL trap if you load a segment
register with a selector for a segment that has been removed from the
descriptor tables.

The hardware is still there for 32-bit x86, but the most popular
operating systems have decided to just use a single segment, which
masks that problem. Is this good enough reason to change the language
definition? Who knows!?


Bo Persson
 

Pavel

Andy said:
Hmm, I'm still puzzled. I've worked with architectures where a block of
memory is precisely bounded (fortunately for my sanity, never on a '286
- I've only ever used the MMU to get to bigger addresses, and _that_
gave me a headache) but even in those merely accessing the memory
containing the invalid pointer wouldn't cause a fault.
It's easy to understand: an implementation must map a pointer to some
hardware type -- that is, define how it is stored in memory and (very
probably, for a pointer type) how it is stored in registers. Some
architectures have special registers specifically for pointers; in
particular, on the 8086/8088 (not even the 286), one kind of pointer,
namely "long" pointers (two 16-bit words), had to be loaded into a pair
of 16-bit special registers, for example DS:SI or DS:DI. Using special
registers for pointers, as opposed to using general registers for both
pointers and integers, has some advantages (as well as drawbacks); in
particular, as soon as a special pointer register is loaded, the
hardware can speculate that it will probably be dereferenced soon, so
it can start some background job of reloading shadow translation tables
(as in protected mode of the 80286) or updating the nearby cache lines
(as in modern CPUs), etc. You can think of it as a "pre-fetch" or
"proactive hardware-level dereferencing". It is up to the hardware
designer how to handle the situation where the "hardware dereferencing"
can't be done because the memory pointed to does not exist -- whether
to work around the error or throw it back in the face of the faulty
program.

Why would a compiler load the pointers into registers if no
dereferencing is required by the program? -- for example, to compare
them (a good explanation of why comparing invalid pointers should be
UB). In real mode of the 8086, for example, the addresses A000:ABCD and
AA00:0BCD pointed to the same byte in memory -- it would have been
logical for the hardware registers to provide some help in comparing
these for equality and lesser/greater (in reality they did not, though,
as far as I can remember).

Pointing to non-existing memory is only one example of how an
uninitialized pointer can alarm the hardware; others may include
special "mode" bits in the pointers where not every combination makes
sense or is allowed in the current CPU mode, and what not (although I
don't have ready examples for these).
Copying the pointer - I can see why that could cause a fault, if the
system chose to copy the pointer as a pointer not raw bytes, and loaded
into some kind of pointer register. At that time the HW has the
opportunity to validate it.

All the architectures I've done this on BTW give you the memory you ask
for, and range check it - so de-referencing p+1 where p points to the
last element would fault. (And I've not used C in anger on any of them,
never mind C++)

But I still can't imagine any architecture where copying the bits of an
invalid pointer into an int (via a union) might cause a fault, but any
valid pointer would be fine. Not even the ones where ints are sometimes
BCD - on those valid pointers might fault.

Ah well, these days I'm on X86/X64 only (or .net :( ) and the
architecture is flat with no protection - so not my problem!

Andy

-Pavel
 

Bo Persson

Andy said:
Hmm, I'm still puzzled. I've worked with architectures where a
block of memory is precisely bounded (fortunately for my sanity,
never on a '286 - I've only ever used the MMU to get to bigger
addresses, and _that_ gave me a headache) but even in those merely
accessing the memory containing the invalid pointer wouldn't cause
a fault.
Copying the pointer - I can see why that could cause a fault, if the
system chose to copy the pointer as a pointer not raw bytes, and
loaded into some kind of pointer register. At that time the HW has
the opportunity to validate it.

Right. :)

And the language standard has chosen not to prescribe how the hardware
should copy a pointer.

All the architectures I've done this on BTW give you the memory you
ask for, and range check it - so de-referencing p+1 where p points
to the last element would fault. (And I've not used C in anger on
any of them, never mind C++)

But I still can't imagine any architecture where copying the bits
of an invalid pointer into an int (via a union) might cause a
fault, but any valid pointer would be fine. Not even the ones
where ints are sometimes BCD - on those valid pointers might fault.

Ah well, these days I'm on X86/X64 only (or .net :( ) and the
architecture is flat with no protection - so not my problem!

No, but it is a language problem. You can very well "limit" the
portability by specifying flat memory, 8-bit bytes, 32-bit ints, and
IEEE floating point for your programs. Java does that for you, and I
guess .NET does too.

C++ wants to be *natively* implementable on a wider range of
platforms. Therefore it cannot specify copying a pointer as an int,
because that might hurt badly on hardware with special address
registers, or CPUs without special integer registers.

http://en.wikipedia.org/wiki/CDC_6600#The_Central_Processor_.28CP.29

http://en.wikipedia.org/wiki/Motorola_68000#Internal_registers


It is true that most current hardware doesn't show these
peculiarities, but why limit a programming language to current
hardware?


Bo Persson
 

Stuart Redmann

You can't test for UB, because it IS undefined. "Seems to work" isn't
good enough! :)

I don't quite get you. What do you mean by "testing" for UB? "Seems to
work" is good enough for me when the compiler emits the right binary
(if the produced binary is well-formed, I don't need to care whether
the compiled source code is ill-formed).
Those of us who used to program with 16-bit segmented memory (80286)
know that a deallocated segment WILL trap if you load a segment
register with a selector for a segment that has been removed from the
descriptor tables.

Right. But from my point of view loading a pointer into the segment
registers is only necessary if you want to dereference it. It should
not be a problem to compare two invalid pointers for equality without
loading either of them into the segment selectors. The same goes for
destructing an invalid pointer (which is a no-op even on the 8086
AFAIK).
The hardware is still there for 32-bit x86, but the most popular
operating systems have decided to just use a single segment, which
masks that problem. Is this good enough reason to change the language
definition? Who knows!?

I wanted to say that the standard could say something like this: any
use of invalid pointers except dereferencing is platform-dependent;
dereferencing them is UB.

Regards,
Stuart
 

James Kanze

On 04/09/2010 13:01, Bo Persson wrote:

[...]
I suspect the language is already limited.
Take this nice simple bit of C:
char* p = malloc(10);
How is the compiler supposed to know that the pointer returned from
malloc is supposed to be a pointer to chars?

At that level, it doesn't. All it can do is memorize the upper
and lower bounds.
Some architectures have different pointer types depending on
the target data. That'll give me 10 bytes back, and I suppose
the compiler could work out that the cast from void* to char*
should do a pointer conversion to a char type pointer with
a bound of 10.
Then I write
float* fp = (void*)p;
Floats are bigger. Let's say that they have a size of 8 bytes
on this architecture - so is it going to put a range check on
the memory pointer of 1 float?

No, but it can easily generate code which will detect that fp+1
will access bytes beyond the address p+10 (if it is
dereferenced).
And if it does, and I cast it back to char*, will it remember
the size of 10?

It doesn't have to. All it has to do is remember the upper and
lower bounds.
What do you do on those TI graphics processors, where the
native address is a bit address not a byte address? Is
sizeof(char) == 8?

Or bigger. That's a requirement of the standard.
(We just wrote _carefully_, especially when we were dealing with the
arrays of 3-bit items which were the graphics memory, whose sizeof was
not a good thing to ask!)

Most likely, if you were using C, the compiler had some
extensions to support the extra addressing possibilities.
 

Francesco S. Carta

On 3 Sep., Francesco S. Carta wrote:
[...]
Would all the above mean that we shouldn't really worry about UB when
dealing with invalid pointers into standard containers as long as we
don't dereference such invalid pointers, and accordingly, would that
mean that the standard needs to be modified to state these actions
(copying and comparing of invalid pointers) as well-defined?
I would rather like it if the standard made this some kind of
platform-dependent behavior. Since nobody can cite a convincing
rationale for the UB, and apparently lots of people keep deleted
pointers in containers, it makes little sense to say that all those
programs exhibit UB.

I'd rather the standard just require those programs to be legal
and well defined. There are two issues:
-- lvalue to rvalue conversion of an invalid pointer, and
-- allowing a container to make gratuitous lvalue to rvalue
conversions, for no real reason.
I have no problem with the lvalue to rvalue conversion remaining
undefined behavior, but I do object to the idea that it might
occur when I'm not looking, when there's no reason for it to
occur. If, for example, I have a class which contains an
std::vector<T*> myVector (for some type T), and in the
destructor I write:

for (std::vector<T*>::iterator iter = myVector.begin();
iter != myVector.end();
++ iter) {
delete *iter;
}

there should be no undefined behavior. Similarly in the case
of:

delete myVector;
myVector = NULL;

In these two cases (and many others), there's just no reason for
an lvalue to rvalue conversion of the invalid pointer to occur
(and in fact, it doesn't occur in any existing implementation).


So, if I understand you correctly, the problem is that the standard
does not explicitly forbid lvalue to rvalue conversions during the
destruction of a container?

I suppose you're /not/ including, say, vector reallocation as a case
where the lvalue to rvalue conversion would be gratuitous - I ask
because I cannot conceive any way to reallocate the internals of a
vector without performing such a conversion, at least not under the
current standard.
 

Francesco S. Carta

James Kanze wrote:
[snip]
What do you do on those TI graphics processors, where the
native address is a bit address not a byte address? Is
sizeof(char) == 8?

Or bigger. That's a requirement of the standard.

Uh? Is that an oversight of yours, James, or did you really mean that?

As I read the standard, sizeof(char) == 1, no more, no less... maybe
you mistook that line for CHAR_BIT == 8?

If an implementation has sizeof(char) != 1, that would not be
conforming, if I'm not mistaken.
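
A tiny illustration of the distinction:

#include <climits>
#include <iostream>

int main() {
    // sizeof(char) is 1 by definition; what varies across platforms is
    // the number of bits in a byte, CHAR_BIT, which must be at least 8.
    std::cout << "sizeof(char) = " << sizeof(char)
              << ", CHAR_BIT = " << CHAR_BIT << '\n';
    return 0;
}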
 

Bo Persson

Stuart said:
I don't quite get you. What do you mean by "testing" for UB? "Seems
to work" is good enough for me when the compiler emits the right
binary (if the produced binary is well-formed, I don't need to care
whether the compiled source code is ill-formed).

I mean running unit tests for your code.

Because the UB doesn't have to be consistent, passing the tests
doesn't tell us if the code works, just that it works sometimes.
"Seems to work".


UB is really evil!


Bo Persson
 
