Test if const_iterator may be dereferenced - with no direct access to original vector.

mathog

What does one do in this situation:

....
Glib::ustring::const_iterator icc;
....
icc = _spans[lastspan].input_stream_first_character;
....
// need a test here to see if the next line is safe
if(*icc){

In the textbook examples one has both the vector and the iterator, so
the test can be rolled together on one line like:

if(icc != avector.end() && *icc)

In this case it isn't entirely clear which vector that iterator is
referencing. (Because _spans hangs onto the iterator but does not
store the vector, at least not publicly.) It might (might!) be possible
to hunt the vector down, by chasing backwards through half a dozen
objects to find it, but why should that be necessary? Is there not in
C++ something like:

if(icc->dereferencable()){

?

This came up in a situation where an empty text span was embedded
between others with characters. So on the 3rd span (or whatever it was)
the value of icc was set to a non-dereferencable value from the get go,
that value having been stored there long ago and far away in the code.

Of course without the missing test the program segfaulted when it tried
to dereference the const_iterator for this empty span.

I suppose that the desired result could be accomplished with try/catch,
but wonder if C++ iterators do not in general have some method for doing
this.

Thank you,

David Mathog
 
Bart van Ingen Schenau

What does one do in this situation:

...
Glib::ustring::const_iterator icc;
...
icc = _spans[lastspan].input_stream_first_character;
...
// need a test here to see if the next line is safe
if(*icc){

In the textbook examples one has both the vector and the iterator, so
the test can be rolled together on one line like:

if(icc != avector.end() && *icc)

In this case it isn't entirely clear which vector that iterator is
referencing. (Because _spans hangs onto the iterator but does not store
the vector, at least not publicly.)

Within the concept of C++ iterators, you always need *two* iterators: one
to indicate the current position and another to indicate the end of the
range. And although it is common for the end of a range to coincide with
the end of a container, this is by no means part of the concept of
iterators.
For that reason, the common solution would be:

...
Glib::ustring::const_iterator icc, end;
...
icc = _spans[lastspan].input_stream_first_character;
end = _spans[lastspan].input_stream_end;
...
// need a test here to see if the next line is safe if(*icc){
if (icc != end && *icc) /* do something */
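A minimal sketch of the two-iterator test, under stated assumptions: `std::string` stands in for `Glib::ustring` so the example is self-contained, `Span` is a hypothetical stand-in for the real span type, and `split_with_empty_middle` and `first_char_is_nonzero` are illustrative helpers that do not exist in the original code. The member name `input_stream_end` is the one proposed in this reply, not a member known to exist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical span type: it stores BOTH ends of its character range.
struct Span {
    std::string::const_iterator input_stream_first_character;
    std::string::const_iterator input_stream_end;
};

// Builds three spans over `text`: [0, cut), an empty one at `cut`, and
// [cut, size). The spans hold iterators into `text`, so `text` must outlive
// them and must not be modified while they are in use.
std::vector<Span> split_with_empty_middle(const std::string& text,
                                          std::size_t cut) {
    assert(cut <= text.size());
    const std::string::const_iterator mid =
        text.begin() + static_cast<std::string::difference_type>(cut);
    std::vector<Span> spans;
    spans.push_back(Span{text.begin(), mid});
    spans.push_back(Span{mid, mid});  // empty span: first == end
    spans.push_back(Span{mid, text.end()});
    return spans;
}

// True only if the span is non-empty and its first character is non-zero.
// The end test runs first, so *icc is never evaluated for an empty span.
bool first_char_is_nonzero(const Span& span) {
    std::string::const_iterator icc = span.input_stream_first_character;
    std::string::const_iterator end = span.input_stream_end;
    return icc != end && *icc;
}
```

For the text "abcdef" cut at 3, the first and third spans pass the test and the empty middle span is rejected without being dereferenced.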

The important change here is that a span knows where it ends. For the
calling code, it does not matter if that end coincides with the end of a
vector, or if that end happens to be the start of the next span.

It might (might!) be possible to hunt the vector down, by chasing
backwards through half a dozen objects to find it, but why should that
be necessary? Is there not in C++ something like:

if(icc->dereferencable()){

?

There are several problems with requiring such a function.
First of all, the function can't tell if the iterator is still within the
range it is meant to iterate over, because ranges are not required to end
on a non-dereferenceable iterator.
Secondly, iterators are meant to be lightweight objects. Not much more
than a pointer or a wrapper around one with knowledge of how to access the
next element. As such, determining dereferenceability becomes as hard as
determining dereferenceability for a plain pointer, which means
practically impossible.

This came up in a situation where an empty text span was embedded
between others with characters. So on the 3rd span (or whatever it was)
the value of icc was set to a non-dereferencable value from the get go,
that value having been stored there long ago and far away in the code.

Of course without the missing test the program segfaulted when it tried
to dereference the const_iterator for this empty span.

I suppose that the desired result could be accomplished with try/catch,
but wonder if C++ iterators do not in general have some method for doing
this.

As the segfault was the result of undefined behaviour, try/catch would
not have reliably helped you.
The general method for checking if an iterator is still within range is
to test if it has not reached the end iterator for that range yet.
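To illustrate the difference, here is a small sketch using `std::string` (an assumption; the original code uses `Glib::ustring`). `at_throws` and `peek` are hypothetical helper names. The first function relies on `std::string::at()`, which is specified to throw `std::out_of_range`, so try/catch is meaningful there. The second shows the iterator case, where nothing is thrown on misuse and the end test has to come before the dereference.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Checked access: at() throws std::out_of_range when pos >= size(),
// so catching the exception is a reliable test here.
bool at_throws(const std::string& s, std::size_t pos) {
    try {
        (void)s.at(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Iterator access: dereferencing `end` is undefined behaviour and throws
// nothing, so the comparison against `end` must happen first.
// Returns false (leaving `out` untouched) when the range is exhausted.
bool peek(std::string::const_iterator it,
          std::string::const_iterator end,
          char& out) {
    if (it == end) {
        return false;
    }
    out = *it;
    return true;
}
```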

Bart van Ingen Schenau
 
Marcel Müller

If I then append to the string, so the buffer now contains

"ABCdefghijklmnop"

Without any change whatsoever to the iterator it has now become valid -
it points at d.

No, this is undefined behavior. Changing a vector or string invalidates
all existing iterators of this instance. You must consider that the
append operation could require a reallocation.
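One common way to survive such a modification is to remember an index rather than an iterator and to rebuild the iterator afterwards. The following is only a sketch; `append_keeping_position` is a hypothetical helper name, and `std::string` is assumed as the container.

```cpp
#include <string>

// Appends `suffix` to `s` and returns an iterator to the same logical
// position that `pos` referred to before the append.
// Precondition: `pos` is a valid iterator into `s` (s.end() is allowed).
std::string::iterator append_keeping_position(std::string& s,
                                              std::string::iterator pos,
                                              const std::string& suffix) {
    // Convert to an offset while `pos` is still valid.
    const std::string::difference_type offset = pos - s.begin();
    s.append(suffix);  // may reallocate; `pos` must not be used after this
    // Rebuild the iterator from the (possibly new) buffer.
    return s.begin() + offset;
}
```

With s holding "ABC" and pos equal to s.end(), appending "defghijklmnop" yields an iterator that refers to 'd', which is the effect the quoted text hoped to get for free from the stale iterator.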

However, your answer that you can't check whether an iterator is valid
and dereferenceable is right. I think this has mainly been done for
performance reasons. In C++, iterators are intended to be very cheap to
copy. In many cases they are only one machine word in size.

In other languages, like Java, iterators are heap objects. It doesn't
matter whether they are a few bytes larger or not.


Marcel
 
James Kanze

On 01/05/2013 21:57, mathog wrote:
No, there isn't, and for good reasons.

The good reason is probably because there's no way of
implementing it, given that you need a second iterator to know
whether you're at the end or not. Every iterator I wrote before
STL came along supported something like this (usually
icc.isValid()).

Inside a std::string there's usually a buffer containing the string. (I
don't think there _has_ to be, but that's another matter (1) ). That
string is a load of characters, usually bytes.
Imagine I have an internal vector, which for efficiency the string code
has initially allocated as 16 bytes even though it only contains "ABC".
I'll use ? as a marker for "undefined". The bytes are then

"ABC?????????????"

An iterator to C can be de-referenced, but if you increment it you get
one that cannot be de-referenced. There's nothing about that ? that
marks it as something that can't be accessed. Without
accessing the original collection there's nothing the iterator
can use either - and for reasons I don't know the original STL
design doesn't contain references from iterators into the
collection (2). And if you _do_ de-reference it you'll just
get whatever character happens to be in the first question
mark.

With most modern implementations, you'll get an assertion
failure. (At least, this is the case with VC++ and g++.)

If I then append to the string, so the buffer now contains

"ABCdefghijklmnop"

Without any change whatsoever to the iterator it has now become valid -
it points at d.

What happens in this case is undefined behavior. I suspect,
however, that most implementations would miss that error
(supposing that capacity() had been larger than the new string).

I can then set it to end(). Typically this will be an address one more
than p. Again, without reference to the collection you can't tell if
it's valid. And if you do de-reference it - well, you might get the byte
that follows p. Or that page in the processor's memory space might not
have been allocated, and you get an exception. So once more you are in
the realms of undefined behaviour.

(1) I just checked. In C++98 there's no requirement for there to be an
internal buffer, but for C++11 there is!
(2) But I can guess. Suppose the collection was on the heap, and was
deleted? Suppose the iterator was re-pointed into a different
collection? And if the collection was deleted, and another one of the
same type created in the same heap location, what then?

In most implementations, iterators register with the container,
so that they can be marked as invalid in such cases.
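An iterator that can answer "am I valid?" has to carry more than a bare position. As a rough sketch of the pre-STL style mentioned earlier in this reply (icc.isValid()), the hypothetical `BoundedCursor` below stores its own end. It is an illustration only, not a standard or Glib class, and it assumes `std::string` as the underlying container.

```cpp
#include <cassert>
#include <string>

// Hypothetical self-describing cursor: it carries its own end, so it can
// answer isValid() without access to the container.
class BoundedCursor {
public:
    BoundedCursor(std::string::const_iterator first,
                  std::string::const_iterator last)
        : cur_(first), end_(last) {}

    // True while the cursor refers to an element of [first, last).
    bool isValid() const { return cur_ != end_; }

    // Precondition: isValid().
    char current() const {
        assert(isValid());
        return *cur_;
    }

    // Advances one element; does nothing once the end has been reached.
    void next() {
        if (cur_ != end_) {
            ++cur_;
        }
    }

private:
    std::string::const_iterator cur_;
    std::string::const_iterator end_;
};
```

The price is the one discussed in this thread: the cursor holds two iterators instead of one, and it still cannot notice that the underlying string was modified or destroyed. Detecting that requires the container to track its iterators, as checked implementations do.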
 
