copy smaller array into bigger array?

James Kanze

"Francesco"
Could you please explain to me why it isn't good to use
"&from[Nfrom]", given that "from + Nfrom" is exactly
the same thing, and since the standard allows me to take
the address of the first element past the array either
way? Am I misreading the standard, or is it just a matter
of taste?
The former takes the address of a nonexistent element. The
latter only uses pointer arithmetic.
The former dereferences a pointer to a nonexistent element.
&from[Nfrom] is, by definition, &(*(from + Nfrom)). And
that * operator in there results in undefined behavior.
I don't agree here.

Whether you agree or not, it's what the standard says.
As I understand it, it is the act of _accessing_ the first
element past the array that is undefined behavior,

Concerning unary *, the standard says that "the result is an lvalue
referring to the object or function to which the expression
points." There must be an object or function there, or you have
undefined behavior.

In this regard, the standard bases itself on C90. In C90, the
question was actually raised, and answered by the C
committee---it is undefined behavior. The authors of the C
standard didn't like this, so in C99, they introduced a special
case:
    The unary & operator returns the address of its operand.
    If the operand has type "type", the result has type
    "pointer to type". If the operand is the result of a
    unary * operator, neither that operator nor the &
    operator is evaluated and the result is as if both were
    omitted, except that the constraints on the operators
    still apply and the result is not an lvalue. Similarly,
    if the operand is the result of a [] operator, neither
    the & operator nor the unary * that is implied by the []
    is evaluated and the result is as if the & operator were
    removed and the [] operator were changed to a +
    operator. Otherwise, the result is a pointer to the
    object or function designated by its operand.
The special case is present because such an expression would
otherwise be undefined behavior---C++ doesn't have this special
case.

Your point is, however, interesting. I don't know why the C
committee took this route, rather than simply stating that it is
only undefined behavior if there is an lvalue to rvalue
conversion or an attempt to modify the object through the
lvalue. Offhand, that would seem to me to be a simpler and
more general solution.
not the act of composing, getting or passing its address in
any manner.
Stroustrup does this in his example code...
    int v[] = { 1, 2, 3, 4};
    int* p3 = &v[4];
...and I wasn't able to find any clause of the standard giving
a different behavior or different requisites of behavior to
the process of creating an address using unary *, unary & and
the subscript operator.

The standard very clearly says that v[4] is the exact equivalent
of *(v + 4). Period. Unlike the C standard, it doesn't make
any exceptions. The standard also says that the result of a
unary * _must_ designate an object, and in this case, there is
no object at v+4, so the expression is undefined behavior.

FWIW, this fact wasn't recognized when the C standard was
originally adopted, which means that it wasn't realized when
much of the C++ standard was being written (and perhaps when
Stroustrup wrote the text you cite). Once the problem was
pointed out, the C committee, in a response to a DR or in a
request for interpretation, recognized that as the standard
stood, it was undefined behavior---by that time, they were
already in the process of creating C99, so they added the
special text to remove the undefined behavior. I seem to
remember a little bit of discussion in the C++ committee as to
whether C++ should take the same steps, but in the end, it
didn't. (One of the arguments, IIRC, was that programmers
shouldn't use this form anyway, since in something like:
    std::vector< int > v( 4 ) ;
    int* p3 = &v[ 4 ] ;
it is also undefined behavior, and in a checking implementation,
will cause a fatal error. And I think it's impossible to
implement a checking implementation where this wouldn't be the
case; vector<>::operator[] would have to return a proxy which
overloaded operator&... and also operator.)
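
As a rough illustration of the distinction drawn above, the sketch below copies a smaller array into a larger one and forms the end of the source range with "from + Nfrom" rather than "&from[Nfrom]". The helper names copy_into and copy_demo are made up for this example and do not come from the thread; it is only one way to write it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the thread): copies all Nfrom elements of
// `from` into the beginning of the larger array `to`.
template <std::size_t Nfrom, std::size_t Nto>
void copy_into(const int (&from)[Nfrom], int (&to)[Nto])
{
    assert(Nfrom <= Nto);  // the destination must be at least as large

    // `from + Nfrom` is pure pointer arithmetic: it forms the
    // one-past-the-end pointer without ever applying unary * to it.
    std::copy(from, from + Nfrom, to);

    // Deliberately avoided, since it is defined as &(*(from + Nfrom)):
    //     std::copy(from, &from[Nfrom], to);
}

// Small demonstration: returns true if the first three elements were
// overwritten and the remaining two were left untouched.
bool copy_demo()
{
    const int small[3] = { 1, 2, 3 };
    int big[5] = { 0, 0, 0, 0, 0 };
    copy_into(small, big);
    return big[0] == 1 && big[1] == 2 && big[2] == 3
        && big[3] == 0 && big[4] == 0;
}
```

Both spellings typically compile to the same machine code; the point of preferring the first is only that the standard defines its behavior.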
 
Francesco

--
James Kanze (GABI Software)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Your post is really interesting. I'm happy to have stumbled upon such
an issue, given that it still seems to be debated.

Now please take a look at these passages I extracted from
[N2914=09-0104] - by the way, is this the latest normative reference? I
hope I didn't mistake the meaning of "working draft" and "current
draft" :-/ If I did, could somebody please point out which document
number I should get?

[ Please excuse me in advance for any mistakes I may have made in citing
the standard. I've cut most of the paragraphs I'm citing just to
highlight the passages that seem meaningful in this context; don't
take any cut as an attempt to change the sense of the sentences. ]

----------------------

4 [conv] / 5

There are some contexts where certain conversions are suppressed. For
example, the lvalue-to-rvalue conversion is not done on the operand of
the unary & operator.

5.3.1 [expr.unary.op] / 3

The result of the unary & operator is a pointer to its operand.

5.7 [expr.add] / 6

Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is
undefined. 78 <- superscript, for the following footnote:

[same page, footnote]
78) [...] an implementation need only provide one extra byte (which
might overlap another object in the program) just after the end of the
object in order to satisfy the “one past the last element”
requirements.

8.3.4 [dcl.array] / 6

Except where it has been declared for a class (13.5.5), the subscript
operator [] is interpreted in such a way that E1[E2] is identical to
*((E1)+(E2)).

24.2 [iterator.concepts] / 6

[...] a regular pointer to an array guarantees that there is a pointer
value pointing past the last element of the array, [...]

---------------------------------

Footnote 78 seems to say that there will actually be something
at that address.
Also, 4 [conv] / 5 seems to be the same exception introduced by C99, am I
right?

I'd really like to understand what the current position of the
standard is on this point, regardless of whether my intuitions were
right or not.

Best regards,
Francesco
 
James Kanze

Your post is really interesting. I'm happy to have stumbled upon
such an issue, given that it still seems to be debated.

It's not still being debated. It was debated by the C
committee: they made a definite statement that
&array[onePastTheEnd] is undefined behavior, and made a change
in the standard to allow it. After that change in C, there was
a little discussion about propagating it to C++, but on the
whole, I think the consensus was that it wasn't worth it,
perhaps based on the fact that the C++ committee expects people
to use vector, and not C style arrays, and that it isn't
reasonably possible to make it work with vector.
Now please take a look at these passages I extracted from
[N2914=09-0104] - by the way, is this the latest normative
reference? I hope I didn't mistake the meaning of "working
draft" and "current draft" :-/ If I did, could somebody please
point out which document number I should get?

N2914 is, I think, the latest working draft (unless there is a
later one I missed---the "latest working draft" is constantly
changing). It certainly is *not* the current standard, and
differs from it in a large number of ways. To my knowledge,
however, nothing has changed concerning this issue since C++98.
----------------------
4 [conv] / 5
There are some contexts where certain conversions are suppressed. For
example, the lvalue-to-rvalue conversion is not done on the operand of
the unary & operator.
5.3.1 [expr.unary.op] / 3
The result of the unary & operator is a pointer to its operand.
5.7 [expr.add] / 6
Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is
undefined. 78 <- superscript, for the following footnote:
[same page, footnote]
78) [...] an implementation need only provide one extra byte (which
might overlap another object in the program) just after the end of the
object in order to satisfy the “one past the last element”
requirements.
8.3.4 [dcl.array] / 6
Except where it has been declared for a class (13.5.5), the subscript
operator [] is interpreted in such a way that E1[E2] is identical to
*((E1)+(E2)).
24.2 [iterator.concepts] / 6
[...] a regular pointer to an array guarantees that there is a pointer
value pointing past the last element of the array, [...]
---------------------------------

Most of this is irrelevant. The unary operator * results in
undefined behavior unless the dereferenced pointer designates an
object: "the result is an lvalue referring to the object or
function to which the expression points". That's all we have
concerning the semantics, so if the expression doesn't point to
an object, behavior is undefined. No later operations on the
expression can undo the undefinedness.
Footnote 78 seems to say that there will actually be something
at that address.

Footnotes aren't normative, and the footnote in question says
just the opposite---that there is no need for an object of the
correct type to be present. (The footnote is misleading,
however, since there is not even a need for the extra byte
unless the system traps invalid pointers; none that I know of
do.)
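
To make the role of the one-past-the-end pointer concrete, here is a minimal sketch in which that pointer is only ever formed by arithmetic and compared against, so it makes no difference whether any object (or even a byte) lives at that address. The names sum_range and sum_demo are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums the half-open range [first, last).
int sum_range(const int* first, const int* last)
{
    assert(first <= last);  // both must point into (or one past) the same array
    int total = 0;
    // `last` serves purely as a sentinel: it is compared, never dereferenced.
    for (const int* p = first; p != last; ++p) {
        total += *p;        // p always designates a real element here
    }
    return total;
}

// Small demonstration over the four-element array used in the thread.
int sum_demo()
{
    const int v[] = { 1, 2, 3, 4 };
    const std::size_t n = sizeof v / sizeof v[0];
    return sum_range(v, v + n);  // v + n, not &v[n]
}
```

With the array above, sum_demo() returns 10.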
Also, 4 [conv] / 5 seems to be the same exception introduced by C99, am I
right?

Not at all. First, of course, the text you cite is a
non-normative note. And it doesn't really make much
sense---conversions aren't "suppressed", they simply aren't done
unless necessary. The lvalue to rvalue conversion only occurs
if an lvalue expression is used in a context which requires an
rvalue. There are lots of places it doesn't occur: on the left
side of an assignment, for starters.

And of course, that's totally irrelevant here; I suggested that
a better way of specifying the C exception might be to limit the
undefined behavior to lvalue to rvalue conversions and attempts
to modify the object through the lvalue, but neither standard
actually takes this route, and C++ doesn't have the C exception.
The unary * operator (for built-in types) must refer to an
object of the correct type, or the behavior is undefined (since
the standard doesn't define it, see §1.3.13).

And that's all. You can't get around that---the C standard
creates a specific exception if (and only if) the expression is
immediately the operand of a & operator, but C++ doesn't have
this exception.
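
For the std::vector case mentioned earlier in the thread, one possible way to obtain an end pointer without writing &v[v.size()] is to index an element that does exist and then use pointer arithmetic. This is only a sketch: the names end_ptr and vector_demo are hypothetical, and it assumes a non-empty vector, since &v[0] on an empty vector has the same problem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a pointer one past the last element of a
// non-empty vector, without ever indexing past the end.
const int* end_ptr(const std::vector<int>& v)
{
    assert(!v.empty());       // &v[0] requires at least one element
    return &v[0] + v.size();  // index a real element, then pointer arithmetic
}

// Small demonstration: the distance from the first element to the end
// pointer should equal the vector's size.
bool vector_demo()
{
    std::vector<int> v(4);
    const int* last = end_ptr(v);
    return static_cast<std::size_t>(last - &v[0]) == v.size();
}
```

In C++11 and later, v.data() + v.size() expresses the same thing and also works for an empty vector.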
 
Francesco

Thank you very much once again for your detailed response, all the
more so for replying to such an out-of-place-and-time post as
mine ;-) Your comments are also valuable to me because they
correct some false assumptions I had made about the importance of the
various parts of the standard (notes in particular).

After that, yes, I definitely mistook the meaning of current/working
draft. I'll get the current standard as soon as possible, so I
must get C++98, if I understood you correctly.

In another post, in another thread, I said something to the effect that
the only thing that interested me was seeing things work, but now I'm
really getting interested in things like "what is *expected* to
work" and "what works *breaking* the expectations".

I know it might be a different approach from "real life" issues &
needs, mine is a hobbyist - ah-hem! - a "private researcher"
approach ;-)

Thanks a lot, again,
best regards,
Francesco
 
James Kanze

[...]
After that, yes, I definitely mistook the meaning of
current/working draft. I'll get the current standard as soon
as possible, so I must get C++98, if I understood you
correctly.

The current standard is formally C++03, but that's really just
corrections (no fundamental changes) to C++98. The next version
will be significantly different, although exactly how is still
being debated. The document you cited is a draft of this next
version; it tends to change quite a lot. (For example, the
document you cited has concepts, which in the end won't be in
the next version.)
In another post, in another thread, I said something to the
effect that the only thing that interested me was seeing things
work, but now I'm really getting interested in things
like "what is *expected* to work" and "what works *breaking*
the expectations".

It's a fairly delicate issue. On one hand, we have a standard
which says more or less (mostly more) exactly what must work,
what shouldn't work and what is not defined. On the other hand,
we have real compilers, and in the end, they're what we have to
deal with: if your code doesn't compile, the fact that the
standard says it's legal doesn't advance your project, and in
almost all real projects, we end up having to use things which
aren't covered by the standard: threads, sockets, GUI
interfaces... Still, I like to think of the standard as part of
the contract between the compiler vendor and me, even if the
compiler vendors often ignore certain parts of the standard
(e.g. export).
 
Francesco

Thank you for the further details and for the clarification about the
exact standard version I should get.

Interesting. I think I would be better off getting several different
compilers to check my code.

Thanks a lot, once more.

Best regards,
Francesco
 
