Why is prefix increment faster than postfix increment?

Branimir Maksimovic

Keith said:
The implementation can use floating point registers in the
implementation of memcpy() if it can guarantee that doing so meets the
standard's requirements for memcpy(). Hardware exceptions aren't the
only consideration, as Christian Bau very clearly explained (see
above).

For "floating point registers", you can substitute any conceivable
implementation detail, including carrier pigeons carrying clay
tablets. It just has to work.


Unrealistic, but perfectly legal as far as the standard is concerned.

So basically you think that there is an implementation that will
run-time check every value before using mechanisms that produce
exceptions, and otherwise apply string-copy semantics for memcpy?
This is the complete opposite of what the original message says,
that memcpy can have x=y semantics, which is clearly not the case.


Greetings, Bane.
 
Branimir Maksimovic

William said:
Branimir said:
Christian said:
This is just a wrong example, but if we observe this:
double x[2], y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.

memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at least
equal to, the first case, even if memcpy has to copy the same number of bytes
and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where the alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works and works the way you expect does
not make it defined behaviour.
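Whatever one concludes about the cast, the question can be sidestepped by never forming a misaligned double pointer at all. A minimal sketch, assuming the value lives at some byte offset inside an unsigned char buffer; the name load_double is made up for illustration and is not from the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* load_double: hypothetical helper. Reads a double stored at an arbitrary
   byte offset inside buf by copying its bytes into a properly aligned
   local object, so no misaligned double pointer is ever dereferenced. */
double load_double(const unsigned char *buf, size_t offset)
{
    double d;
    assert(buf != NULL);
    memcpy(&d, buf + offset, sizeof d);
    return d;
}
```

The caller is responsible for making sure that buf holds at least offset + sizeof(double) bytes and that those bytes were written from a valid double.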

Ok, since I don't have the C standard, you owe me a quote.
Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

this case.
However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)

I think that you don't understand the point.
This would also be undefined behavior whether the
chunk is aligned or not.
You claim undefined behavior in *every* case.
So memcpy would also be undefined behavior if implemented
in C.
What is true is that a memory allocator takes care that the returned
pointer is always suitably aligned for the requested size.
Usually an allocator grabs a big chunk from the OS
and then organizes it in its internal data structure.
Returned pointers usually point within that data structure.
This is a perfectly legal thing to do.
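As a rough illustration of that description, here is a minimal bump allocator. It is only a sketch under stated assumptions: malloc stands in for "the OS", POOL_SIZE is an arbitrary figure, the name my_malloc is borrowed from the discussion, and there is no free. Every returned pointer is rounded to the strictest fundamental alignment (C11 max_align_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool size, chosen only for illustration. */
#define POOL_SIZE 4096u

static unsigned char *pool = NULL;  /* big chunk, obtained once */
static size_t pool_used = 0;        /* bytes already handed out */

/* my_malloc: returns storage aligned for any object type,
   or NULL if the request cannot be satisfied. */
void *my_malloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);

    if (size == 0 || size > POOL_SIZE)
        return NULL;
    if (pool == NULL) {
        pool = malloc(POOL_SIZE);   /* malloc result is suitably aligned */
        if (pool == NULL)
            return NULL;
    }

    /* Round the request up so the next block also starts aligned. */
    size_t need = (size + align - 1) / align * align;
    if (need > POOL_SIZE - pool_used)
        return NULL;

    void *p = pool + pool_used;
    pool_used += need;

    /* Sanity check; the integer conversion is implementation-defined. */
    assert((uintptr_t)p % align == 0);
    return p;
}
```

Because the pool comes from malloc, it has no declared type, so storing a double or any other object into the returned block does not run into the effective-type rules.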

Greetings, Bane.
 
William Hughes

Branimir said:
William said:
Branimir said:
Christian Bau wrote:
This is just a wrong example, but if we observe this:
double x[2], y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.

memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at least
equal to, the first case, even if memcpy has to copy the same number of bytes
and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where the alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works and works the way you expect does
not make it defined behaviour.

Ok, since I don't have the C standard, you owe me a quote.
Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

this case.
However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)

I think that you don't understand the point.


What point? Clearly my_malloc can be written
in C (or C++) just by having my_malloc call
malloc [note: the fact that this is not
sensible is not relevant to an existence proof].
This has nothing to do with the fact that
char addresses have no alignment restrictions.

Why do you think alignment restrictions on
char variables would prevent a memory allocator
from being written?

Or perhaps you are using the word
"allocator" to mean something like memcpy.
True, if there are alignment restrictions on
char variables then it is not trivial to
write my_memcpy in C. You need to
do some calculations and some masking and
shifting to handle end cases (this sort
of thing must be done by C implementations
that define a byte to be smaller than
the smallest unit the hardware can address).
However, it is quite possible.
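The masking and shifting described above can be sketched as follows. This is an illustration only: it assumes 32-bit words holding four 8-bit bytes, numbers byte 0 as the least significant, and uses a made-up function name, copy_byte_in_words. Memory is only ever read or written a whole word at a time.

```c
#include <assert.h>
#include <stdint.h>

/* copy_byte_in_words: hypothetical helper. Copies byte src_idx of src_word
   into byte dst_idx of *dst_word using only whole-word reads and writes.
   Byte 0 is the least significant byte (an assumption of this sketch). */
void copy_byte_in_words(uint32_t *dst_word, unsigned dst_idx,
                        uint32_t src_word, unsigned src_idx)
{
    assert(dst_word != 0 && dst_idx < 4 && src_idx < 4);

    /* Isolate the wanted source byte. */
    uint32_t byte = (src_word >> (8 * src_idx)) & 0xFFu;

    /* Clear the destination byte, then merge the new byte in;
       the other three bytes of the destination are preserved. */
    uint32_t mask = (uint32_t)0xFFu << (8 * dst_idx);
    *dst_word = (*dst_word & (uint32_t)~mask) | (byte << (8 * dst_idx));
}
```

A my_memcpy for such hardware would apply the same read-modify-write step to the partial words at either end of the copy and move whole words in between.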


- William Hughes
 
Branimir Maksimovic

William said:
Branimir said:
William said:
Branimir Maksimovic wrote:
Christian Bau wrote:
This is just a wrong example, but if we observe this:
double x[2], y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.

memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at least
equal to, the first case, even if memcpy has to copy the same number of bytes
and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where the alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works and works the way you expect does
not make it defined behaviour.

Ok, since I don't have the C standard, you owe me a quote.
This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.

Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

this case.
However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)

I think that you don't understand the point.


What point? Clearly my_malloc can be written
in C (or C++) just by having my_malloc call
malloc [note: the fact that this is not
sensible is not relevant to an existence proof].
This has nothing to do with the fact that
char addresses have no alignment restrictions.

The point is that if you return such a pointer to the application,
every attempt to dereference or store anything within
that memory region would produce undefined behavior
by your definition.
Unions were invented just to provide the right alignment
for every member listed in a common memory block.
Why do you think alignment restrictions on
char variables would prevent a memory allocator
from being written?

Oh, alignment restrictions would not have prevented a memory
allocator from being written; they would just turn a lot of legal
C code into undefined behavior.
Or perhaps you are using the word
"allocator" to mean something like memcpy.
True, if there are alignment restrictions on
char variables then it is not trivial to
write my_memcpy in C.

Not that it is not trivial, it is impossible
in that case.

You need to
do some calculations and some masking and
shifting to handle end cases (this sort
of thing must be done by C implementations
that define a byte to be smaller than
the smallest unit the hardware can address).

This is not the case and is not conformant, at least with
the C++ standard.
Such an implementation would simply have CHAR_BIT == 32
in limits.h, for example, on an implementation that can address
a 32-bit unit minimally, because sizeof(char) *must*
always be equal to 1, not 0.25 or 4 or such.
This is how it is defined in C++. Perhaps
someone with the C standard can confirm or rebut
this assertion?

Greetings, Bane.
 
Jordan Abel

This is not the case and is not conformant, at least with
the C++ standard.
Such an implementation would simply have CHAR_BIT == 32
in limits.h, for example, on an implementation that can address
a 32-bit unit minimally, because sizeof(char) *must*
always be equal to 1, not 0.25 or 4 or such.
This is how it is defined in C++. Perhaps
someone with the C standard can confirm or rebut
this assertion?

Greetings, Bane.


Or you could have the compiler handle everything. The "as if" rule
strikes again.
 
William Hughes

Branimir said:
William said:
Branimir said:
William Hughes wrote:
Branimir Maksimovic wrote:
Christian Bau wrote:
This is just a wrong example, but if we observe this:
double x[2], y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.

memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at least
equal to, the first case, even if memcpy has to copy the same number of bytes
and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will set t to the same value as y, just very slowly, but it is
undefined behavior.

Only on implementations where the alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works and works the way you expect does
not make it defined behaviour.

Ok, since I don't have the C standard, you owe me a quote.


This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.

Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

this case.


However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)

I think that you don't understand the point.


What point? Clearly my_malloc can be written
in C (or C++) just by having my_malloc call
malloc [note: the fact that this is not
sensible is not relevant to an existence proof].
This has nothing to do with the fact that
char addresses have no alignment restrictions.

The point is that if you return such a pointer to the application,
every attempt to dereference or store anything within
that memory region would produce undefined behavior
by your definition.

Not if the attempt to store and dereference were made to
the start of the region.
Unions were invented just to provide the right alignment
for every member listed in a common memory block.


Oh, alignment restrictions would not have prevented a memory
allocator from being written; they would just turn a lot of legal
C code into undefined behavior.

Such as?

Not that it is not trivial, it is impossible
in that case.

I suppose you think it is impossible,
given two four byte ints, to copy the first byte
from the first int to the third byte of the
second int without reading anything smaller than an int.
You need to

This is not the case and is not conformant, at least with
the C++ standard.
Such an implementation would simply have CHAR_BIT == 32
in limits.h, for example, on an implementation that can address
a 32-bit unit minimally, because sizeof(char) *must*
always be equal to 1, not 0.25 or 4 or such.
This is how it is defined in C++. Perhaps
someone with the C standard can confirm or rebut
this assertion?

sizeof(char) *must* always be equal to 1, but
the underlying hardware does not have to address
something as small as CHAR_BIT bits. However, the
abstract machine must address something as small
as CHAR_BIT bits, so the implementation may have to
perform some tricks.
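Both parts of that statement can be checked at compile time. A small sketch using C11 _Static_assert; the helper name object_bits is invented for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Both facts are guaranteed by the C standard, so these checks never
   fire on a conforming implementation, whatever the hardware addresses. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* object_bits: hypothetical helper. Width in bits of an object
   representation that occupies the given number of bytes. */
size_t object_bits(size_t size_in_bytes)
{
    assert(size_in_bytes <= (size_t)-1 / CHAR_BIT);  /* no overflow */
    return size_in_bytes * CHAR_BIT;
}
```

On a machine that can only address 32-bit units, an implementation may either report CHAR_BIT == 32 or keep a smaller byte and synthesize byte addressing itself; in both cases sizeof(char) stays 1.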

-William Hughes
 
Mark McIntyre

So basically you think that there is an implementation that will
run-time check every value

What Keith thinks is immaterial. The point is, it's allowed by the
standard, so an implementation is perfectly at liberty to do it.
before using mechanisms that produce
exceptions, and otherwise apply string-copy semantics for memcpy?

This bit seems quite disconnected from either Keith's comment or the
preceding text, including any of the snipped bits. Seems to me that
someone has inserted it to muddy the waters and/or deflect criticism
from themselves.
 
Keith Thompson

William Hughes said:
peter said:
William Hughes wrote:
peter koch wrote:
Christian Bau wrote: [snip]
Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) > 1)?

The undefined behaviour comes from reading from the uninitialized
variable y.

This is silly. Under this interpretation the code snippet

a = b;

invokes undefined behaviour. There is a limit to pedantry
even in comp.lang.c.

The following code:

double x, y;
x = y;

*does* invoke undefined behavior by reading an uninitialized variable.
An implementation that implicitly initializes doubles to some trap
value could even trap on the assignment.
[An interesting question. Does reading an uninitialized variable
as a sequence of unsigned char invoke undefined behaviour?]

No.

I think a lot of the confusion in this thread started from an
assumption that the value of x would be read as a double; after all,
if you're not going to use it as a double, there's no point in
declaring it as one. But then, there's no real point in using
memcpy() like this, so any assumptions of reasonableness aren't going
to get you very far.

There is no undefined behavior in the code quoted above.
 
Branimir Maksimovic

Mark said:
What Keith thinks is immaterial. The point is, it's allowed by the
standard, so an implementation is perfectly at liberty to do it.


This bit seems quite disconnected from either Keith's comment or the
preceding text, including any of the snipped bits. Seems to me that
someone has inserted it to muddy the waters and/or deflect criticism
from themselves.

Whether this is legal or not does not have anything to do with
the original context, speaking about muddy waters.
I just tried to keep up the context in which
this thread started.
This is important with regard to what makes the original poster conclude
that memcpy(&x,&y,sizeof(x)) should always be faster than or equal to
memcpy(&x,&y,sizeof(x)-1).
Clearly the memcpy parameter is not about the number of copy operations, and
in this respect his conclusion is invalid, because if
memcpy(&x,&y,sizeof(x)-1) does not produce a trap value,
the implementation can use load register, store register,
which can be faster than memcpy (&x,&y,sizeof(x)), if that one
ends with a trap representation value.
On systems where there are no hardware exceptions,
the implementation can use load register, store register for both.

Greetings, Bane.
 
William Hughes

Keith said:
The following code:

double x, y;
x = y;

*does* invoke undefined behavior by reading an uninitialized variable.
An implementation that implicitly initializes doubles to some trap
value could even trap on the assignment.


Ok, I was wrong. There is no limit to pedantry in comp.lang.c

I think that when dealing with code snippets (as opposed to full
programs) certain assumptions can be made, e.g. variables are
initialized if necessary, prototypes are in scope etc.
Note, that if the prototype for memcpy is not in scope then
we do have undefined behaviour as there is no guarantee that
the address of a double can be used for an unsigned char without
conversion.

- William Hughes
 
Branimir Maksimovic

Branimir said:
This is important with regard to what makes the original poster conclude
that memcpy(&x,&y,sizeof(x)) should always be faster than or equal to

"should always be faster or equal on a particular implementation"
 
Branimir Maksimovic

Branimir said:
"should always be faster or equal on a particular implementation"
To be clear, this isn't a quote but rather a statement of what I meant,
because the original text can be interpreted in several ways.
 
Keith Thompson

Branimir Maksimovic said:
Keith Thompson wrote: [...]
The implementation can use floating point registers in the
implementation of memcpy() if it can guarantee that doing so meets the
standard's requirements for memcpy(). Hardware exceptions aren't the
only consideration, as Christian Bau very clearly explained (see
above).

For "floating point registers", you can substitute any conceivable
implementation detail, including carrier pigeons carrying clay
tablets. It just has to work.


Unrealistic, but perfectly legal as far as the standard is concerned.

So basically you think that there is an implementation that will
run-time check every value before using mechanisms that produce
exceptions, and otherwise apply string-copy semantics for memcpy?

No, I never said anything about the existence of any particular
implementation. I merely said that such an implementation would be
legal as far as the standard is concerned.
This is the complete opposite of what the original message says,
that memcpy can have x=y semantics, which is clearly not the case.

I don't know who said that, or exactly what the wording was, but your
statement of it is incorrect. An assignment can invoke undefined
behavior if the source expression is uninitialized or has a trap
representation. memcpy() just copies the raw bytes of the
representation.

memcpy() can legally be implemented using some kind of assignment *if*
the implementation can guarantee that it behaves the same way as
copying the bytes. One way to do this would be to check for potential
traps or bit-altering conversions before each assignment, falling back
to a byte-by-byte copy if necessary, but that would probably be slower
than just doing the byte-by-byte copy in the first place.

The standard doesn't require efficient implementations, only correct
ones.
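The requirement that memcpy() reproduce the raw bytes can be written down as a small check. This is a sketch, not a statement about how any particular library implements memcpy; the function name roundtrip_bytes is made up. It pushes an arbitrary bit pattern through a double object without ever reading that object as a double:

```c
#include <assert.h>
#include <string.h>

/* roundtrip_bytes: hypothetical helper. Stores an arbitrary bit pattern
   in one double object, copies it to another with memcpy, and reports
   whether the representation arrived unchanged. The doubles are only
   accessed as bytes, so a trap representation is never read as a value. */
int roundtrip_bytes(const unsigned char *bytes)
{
    double a, b;
    assert(bytes != NULL);
    memcpy(&a, bytes, sizeof a);  /* a may now hold any bit pattern   */
    memcpy(&b, &a, sizeof b);     /* must copy the bytes verbatim     */
    return memcmp(&b, bytes, sizeof b) == 0;
}
```

An implementation that uses a floating-point load and store inside memcpy() is fine only if this function still returns 1 for every possible pattern.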
 
William Hughes

Branimir said:
Whether this is legal or not does not have anything to do with
the original context, speaking about muddy waters.
I just tried to keep up the context in which
this thread started.

You failed miserably.
This is important with regard to what makes the original poster conclude
that memcpy(&x,&y,sizeof(x)) should always be faster than or equal to
memcpy(&x,&y,sizeof(x)-1).

Christian Bau (who was not the original poster) did not conclude this.
His point was that memcpy(&x,&y,sizeof(x)) *could* be faster
than memcpy(&x,&y,sizeof(x)-1).
Clearly the memcpy parameter is not about the number of copy operations, and
in this respect his conclusion is invalid, because if
memcpy(&x,&y,sizeof(x)-1) does not produce a trap value,
the implementation can use load register, store register,

Nope, you have to preserve the last byte. You will need more than
load register, store register. The point is that
it *may be* possible to do memcpy (&x,&y,sizeof(x)) by
using load register, store register. In this case
memcpy (&x,&y,sizeof(x)) may be faster than memcpy(&x,&y,sizeof(x)-1).

Stop digging.

-William Hughes
 
Christian Bau

Branimir Maksimovic said:
Thank you for proving my point. memcpy can't have x=y semantics
in any way. It can only have the same final effect, but the paths are
different, as x=y is allowed to produce a hardware exception
but memcpy(&x,&y,sizeof(x)); is not.

And if the compiler can guarantee that x = y will have no hardware
exceptions, then it can replace the memcpy with the assignment.
 
Christian Bau

Branimir Maksimovic said:
So basically what you are saying is that if particular hardware
does not cause hardware exceptions, then the implementation can use
floating point registers?
In such a case both memcpys can use registers without problem.
The case where the implementation checks every
size bytes for a trap value and otherwise uses some other means
to copy is completely unrealistic.

In my example, the first memcpy copied _one byte less_ than a complete
double. Therefore it could not be replaced by an assignment: The last
byte would have been overwritten, and memcpy is not allowed to copy
eight bytes when you tell it to copy only seven.
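The constraint on the shorter copy can be demonstrated directly. A minimal sketch with a made-up function name, last_byte_preserved, which inspects the final byte of the destination before and after copying sizeof(double) - 1 bytes:

```c
#include <assert.h>
#include <string.h>

/* last_byte_preserved: hypothetical helper. Copies all but the last byte
   of *src into *dst and reports whether the last byte of *dst kept its
   old value. The objects must not overlap. Bytes are inspected through
   unsigned char, so *dst is never read as a double after the copy. */
int last_byte_preserved(double *dst, const double *src)
{
    const unsigned char *last = (const unsigned char *)dst + sizeof *dst - 1;
    unsigned char before, after;

    assert(dst != NULL && src != NULL);
    before = *last;
    memcpy(dst, src, sizeof *dst - 1);
    after = *last;
    return before == after;
}
```

A conforming memcpy must make this return 1, which is why a plain whole-register load and store cannot be substituted for the shorter copy without extra work to restore the final byte.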

Implementations where a floating point load and store would copy any
possible bit pattern without problems are quite common.
 
Michael Mair

Branimir said:
Whether this is legal or not does not have anything to do with
the original context, speaking about muddy waters.
I just tried to keep up the context in which
this thread started.
This is important with regard to what makes the original poster conclude
that memcpy(&x,&y,sizeof(x)) should always be faster than or equal to
memcpy(&x,&y,sizeof(x)-1).
Clearly the memcpy parameter is not about the number of copy operations, and
in this respect his conclusion is invalid, because if
memcpy(&x,&y,sizeof(x)-1) does not produce a trap value,
the implementation can use load register, store register,
which can be faster than memcpy (&x,&y,sizeof(x)), if that one
ends with a trap representation value.
On systems where there are no hardware exceptions,
the implementation can use load register, store register for both.

You are obviously and, in my opinion, not in the least innocently,
misinterpreting the original example and its intent and the facts
stated by C. Bau and seem impervious to the fact that you have to
adjust your reasoning continuously to make up for your plain
"Nonsense" reply which was clearly wrong.

-Michael
 
Michael Mair

Christian said:
And if the compiler can guarantee that x = y will have no hardware
exceptions, then it can replace the memcpy with the assignment.

Honestly, if he has not understood it by now, then I suspect
he just does not want to understand it so that he does not have to admit
he was wrong.

Cheers
Michael
 
Christian Bau

Branimir Maksimovic said:
I just read what was written. If an implementation is allowed to apply
x=y semantics to memcpy, then it is a normal thing to allow exceptions
to happen.

An implementation is allowed to replace memcpy with an assignment if _on
this implementation_ all bytes will be copied without any modification.
Another implementation, where this is not the case, is _not_ allowed to
replace memcpy with an assignment.
 
Branimir Maksimovic

William said:
You failed miserably.


Christian Bau (who was not the original poster) did not conclude this.
His point was that memcpy(&x,&y,sizeof(x)) *could* be faster
than memcpy(&x,&y,sizeof(x)-1).


Nope, you have to preserve the last byte.

What byte?
An implementation can do anything that does not change observable
behavior. Since you have stored something that is not
a valid double object, the last byte would not change observable behavior,
because using x as a double further on in the program
produces undefined behavior anyway.

you have to do
memcpy(&x,&y,sizeof(x)-1);
memcpy(((char*)(void*)&x)+sizeof(x)-1,((char*)(void*)&y)+sizeof(x)-1,1);
in order to preserve last byte and use x as a valid double.

So any conclusions about speed in this case are quite nonsense.

The point is that
it *may be* possible to do memcpy (&x,&y,sizeof(x)) by
using load register, store register. In this case
memcpy (&x,&y,sizeof(x)) may be faster than memcpy(&x,&y,sizeof(x)-1).

Of course it is always possible that one variant can
be faster than the other.
Stop digging.
I just started all this with a rough comment and didn't have any
intention of digging in the first place.
So I'll just shut up.

Greetings, Bane.
 
