Why is prefix increment faster than postfix increment?


Branimir Maksimovic

Christian said:
memcpy has some defined meaning, defined by the C Standard (and the C++
Standard uses the same definition). The implementation is free to do
whatever it likes, as long as it guarantees that the results will be the
same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to
be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The
implementation would have to know for example that assigning NaN's or
negative zeroes or denormalised numbers etc. doesn't change the bit
pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it
can use floating point registers for copying these bytes instead of
calling memcpy.

So basically what you are saying is that if the particular hardware
does not cause hardware exceptions then the implementation can use
floating point registers?
In that case both memcpy's can use registers without a problem.
The case where an implementation checks every size bytes for a trap
value and otherwise uses some other means to copy is completely
unrealistic.

Greetings, Bane.
 

Keith Thompson

Branimir Maksimovic said:
So basically what you are saying is that if the particular hardware
does not cause hardware exceptions then the implementation can use
floating point registers?

The implementation can use floating point registers in the
implementation of memcpy() if it can guarantee that doing so meets the
standard's requirements for memcpy(). Hardware exceptions aren't the
only consideration, as Christian Bau very clearly explained (see
above).

For "floating point registers", you can substitute any conceivable
implementation detail, including carrier pigeons carrying clay
tablets. It just has to work.
In that case both memcpy's can use registers without a problem.
The case where an implementation checks every size bytes for a trap
value and otherwise uses some other means to copy is completely
unrealistic.

Unrealistic, but perfectly legal as far as the standard is concerned.
 

William Hughes

Branimir said:
William said:
Branimir said:
William Hughes wrote:
peter koch wrote:
Christian Bau skrev:

[snip]

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

and is just utterly contrived and useless.

Contrived, yes. Most simple examples of complex behaviour
are contrived. Useless, no. Indeed this code is not meant
to be used, but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]

You
also have my sympathy when you call a poster who suggests using
assignment to assign for a "complete bullshitter".


The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

No I didn't claim undefined behavior.
I claimed that first case would probably produce hardware exception

And this differs from undefined behaviour how?
(are you claiming implementation defined behaviour?)

If the implementation is allowed to use floating point registers
for memcpy, then yes, implementation defined behavior.

No. Check the standard. memcpy has to work! An
implementation can use floating point registers
for memcpy only if they do not cause problems.
In the case that the implementation uses FPU registers for a
memcpy of floating point variables, that would be
normal.

No. Check the standard. memcpy has to work!
It is irrelevant how many bytes are copied.

Question is: Are such implementations conformant?

Yes, this is the important question. Pity you did not
answer it earlier.

e.g.:
double x, y;             // produces an FPU exception if x,y gets a trap value?
memcpy(&x,&y,sizeof(x)); // produces an exception if FPU registers are used
                         // and y has a trap representation value,
                         // which is non-conformant as I understand memcpy semantics

So as memcpy is probably conformant, the statement that
memcpy(&x,&y,sizeof(x)-1) will probably lead to a hardware trap is
wrong.
Conclusion: if FPU registers are allowed to be used
for memcpy then it is normal to allow hardware exceptions
during memcpy.

Yes and if my Grandmother had wheels she would be a bus. If
FPU registers are going to cause problems then they cannot
be used during memcpy.
The compiler wouldn't care whether memcpy produces an exception or not
in that case.

A conforming compiler cannot produce code that produces an
exception in this case.

- William Hughes
 

William Hughes

Branimir said:
Christian said:
This is just a wrong example, but if we observe this:
double x[2]; double y = 0.;
memcpy((char*)x+1,&y,sizeof(y));
double t = *(double*)((char*)x+1); /* depends on
hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.
memcpy(x,&y,sizeof(y));
t = *x;

It is obvious that the second case will always be faster than, or at
least equal to, the first case, even if memcpy has to copy the same
number of bytes and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash; there are others where it
will set t to the same value as y, just very slowly; but it is
undefined behavior.

Only on implementations where alignment requirement for a type
is not met.

No! It is undefined behaviour on any implementation.
The fact that it works, and works the way you expect, does
not make it defined behaviour.
This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.

Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written for C
(i.e. malloc cannot be written)

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
alligned for anything no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)

- William Hughes
 

William Hughes

Branimir said:
In the case that sizeof(x) == 1, I agree.


Well, I don't need to, cause I don't use memcpy to assign variables.

Someone who is not aware of the semantics of memcpy but
makes pronouncements about its behaviour is properly called
a bullshitter.

- William Hughes.
 

Jordan Abel

Jordan Abel said:
peter koch wrote:
Christian Bau skrev:

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Just a marvellous example you gave us - code that in C++ (and
C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

For example, you could end up with a trap representation in x,
say a signalling NaN of some kind. And in any case you're not
guaranteed anything useful about the value you might get.

Even if the memcpy() stores a trap representation in x, there's no
undefined behavior until you try to read x as a double. The
quoted code doesn't do that.

BTW, please keep your text down to about 72 columns so it doesn't
overflow an 80-column screen when quoted. My newsreader lets me
reformat quoted text easily, but others might not.

Sorry about that - and, right - if you want to get pedantic about it
there's no undefined behavior invoked _here_ [except possibly for
reading from an uninitialized variable] - and indeed none at all if
you follow the memcpy with ((unsigned char *)&x)[(sizeof x)-1] =
((unsigned char *)&y)[(sizeof x)-1] or otherwise finish the job... I
just assumed you wouldn't have a double unless you intended to use
it as such.
 

Keith Thompson

Jordan Abel said:
Sorry about that - and, right - if you want to get pedantic about it
there's no undefined behavior invoked _here_ [except possibly for
reading from an uninitialized variable] - and indeed none at all if
you follow the memcpy with ((unsigned char *)&x)[(sizeof x)-1] =
((unsigned char *)&y)[(sizeof x)-1] or otherwise finish the job... I
just assumed you wouldn't have a double unless you intended to use
it as such.

Any time the term "undefined behavior" is used in a discussion, you
can assume that pedantry is appropriate.
 

Michael Mair

Branimir said:
Thank you for proving my point. memcpy can't have x=y semantics
in any way. It can only have the same final effect, but the paths are
different, as x=y is allowed to produce a hardware exception
but memcpy(&x,&y,sizeof(x)); is not.

"Curiouser and curiouser."
If we really agreed from the start, why did you originally claim
-as can still be seen above- that the first one would generate a
hardware exception whereas the second one would probably not?
Now, you are making the second case the potentially dangerous one
if replaced by "x = y;". Note that, if y is properly initialized,
we have no trap representation, so the replacement is valid.
There are clear rules for potentially arriving at a trap
representation, so Christian Bau's original statement, maybe
modified by an initializer for y (for clarity), still stands.


Cheers
Michael
 

William Hughes

Greg said:
Old said:
Not true, unless the additional operations are independent of the
X operations.

For example, if you apply the same logic to a file system, then
appending data to a file should increase the amount of space
required to store a file. But for many filesystems that is not
true.

Similar possibilities apply to the CPU case. Maybe the extra
operation fits within some timing interval that had to happen
anyway. Maybe the extra instruction means the whole operation
can be done with different assembly instructions that work out
faster. Maybe the CPU's pipelining is better in one case than
the other. Etc.

If the additional operations follow the ones in common, then it would
be difficult to see how executing those instructions would be able to
speed up the previous set of instructions that have already executed.

But even if the additional instructions came before or were
interspersed with the ones in common, the only way that the additional
instructions would not add time to the procedure would be if the
program could execute two instructions in less time than it could
execute one of those instructions. [Note that the one instruction must
also be one of the two executed in the comparison]

On a macro scale, because similar operations can be composed of
different sub-operations, adding an operation may make an existing one
faster. But as the granularity of the operations becomes finer, at a
certain point every operation is independent of another and each
executes in constant time.

This still does not make the case. You are ignoring the possibility
of optimizations that can be done only if a sufficient number
of operations are done. Consider the very common case
where storage can only be done in chunks of N bytes.

To copy n < N bytes you

-read N bytes from the storage device to a buffer
-copy n bytes to the buffer
-write N bytes from the buffer to the storage device

To copy N bytes you

-copy N bytes to the buffer
-write N bytes from the buffer to the storage device

In this case copying each byte may be an independent operation,
so copying N bytes to the buffer takes N times as long as
copying one byte, but if the reads and writes to storage
dominate then copying N bytes to storage can be faster than
copying 1 byte.

In theory there is no way to be certain that a smaller
number of operations will execute faster than a larger
number of operations without knowledge of the implementation,
because a perverse implementation could choose to execute
the smaller number of operations more slowly. However, even
ignoring perverse implementations, you still cannot
be certain that a smaller number of operations will
execute faster than a larger number, even if the operations
are independent.

- William Hughes
 

Annajiat

For built-in data types,
generally, ++i is faster than i++ as it takes at least one CPU cycle less
than i++.

For simplicity of understanding, assume:
example 1:
i=10;
cout<<i++;
output: 10.

Computer had to
1) read i
2) keep the current value
3) use it
4) store the value of i+1;
=4 steps

If it were
i=10;
cout<<++i;
output: 11.
Computer had to
1) read i
2) store the value of i+1
3) use it
=3 steps (<4)

However, this was presented this way for simplicity of understanding.
The computer may do similar things in a different order.

What is the source of my information?
1) I learnt them in my class.
2) Programs that I coded for judging on the UVA OJ (http://acm.uva.es/p)
using ++i got a higher rank on CPU time (measured up to .001) than
the same code using i++.

However, please note that I am not an expert, just a student.
--
Thanking you
Annajiat Alim Rasel
Secretary
BUCC
BRAC University Computer Club
BUCC: http://groups-beta.google.com/group/bucc
BUCC Programming Contest Wing:
http://groups.yahoo.com/group/buacm
http://groups-beta.google.com/group/buacm
 

peter koch

William Hughes skrev:
peter said:
Christian Bau skrev:
[snip]

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));
[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

The undefined behaviour comes from reading from the uninitialized
variable y.
Contrived, yes. Most simple examples of complex behaviour
are contrived. Useless, no. Indeed this code is not meant
to be used, but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.
And as an example of that it is useless. It just demonstrates that the
two operations involved are different (as one operation is a byte-wise
copy and one is an aligned copy). Also the example puts an extra burden
on the compiler since it more or less requires it to know about the
semantics of memcpy. This might very well be the case, of course, but
it nevertheless makes the example "special".

/Peter
A related question. Is it ever better to use an int
variable, even when a char is big enough?

This all depends on the platform, of course. On e.g. the new 64-bit
CPUs it would normally be faster to use a long if you use the Microsoft
compiler. Sad but true.
[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]

That is an even worse example of "less is more".
The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

Reading from uninitialized variables is undefined behaviour in C++ and
almost certainly also in C. And calling someone a "bullshitter" is
rude and inappropriate in a forum like this.
- William Hughes

/Peter
 

Villy Kruse

For built-in data types,
generally, ++i is faster than i++ as it takes at least one CPU cycle less
than i++.

For simplicity of understanding, assume:
example 1:
i=10;
cout<<i++;
output: 10.

Computer had to
1) read i
2) keep the current value
3) use it
4) store the value of i+1;
=4 steps

If it were
i=10;
cout<<++i;
output: 11.
Computer had to
1) read i
2) store the value of i+1
3) use it
=3 steps (<4)

Counter example:

x = i++;

get i
store x
add 1
store i

x = ++i;

get i
add 1
store x
store i

In both cases 4 simple instructions.

Do try to see what a real optimizing compiler would do.

Villy
 

William Hughes

peter said:
William Hughes skrev:
peter said:
Christian Bau skrev:

[snip]

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes
undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

The undefined behaviour comes from reading from the uninitialized
variable y.

This is silly. Under this interpretation the code snippet

a = b;

invokes undefined behaviour. There is a limit to pedantry
even in comp.lang.c.

[An interesting question. Does reading an uninitialized variable
as a sequence of unsigned char invoke undefined behaviour]
And as an example of that it is useless. It just demonstrates that the
two operations involved are different (as one operation is a byte-wise
copy and one is an aligned copy).

Since this was meant to show how the execution speed of an
operation defined in abstract terms can depend on the implementation,
and alignment issues are certainly implementation dependent,
your remark is flat out wrong. You did check some of the
context before replying, didn't you?

Also the example puts an extra burden
on the compiler since it more or less requires it to know about the
semantics of memcpy. This might very well be the case, of course, but
it nevertheless makes the example "special".


Since the whole point is that the implementation may be "special"
and the compiler may be able to do interesting tricks
(and it is anything but uncommon for a compiler to know about
the semantics of library functions), this
comment is ridiculous. You did check some of the
context before replying, didn't you?
/Peter

This all depends on the platform, of course.

You mean that doing something with a smaller object can be slower
than doing the same thing with a larger object depending on the
platform.
You did check some of the context before replying, didn't you?

On e.g. the new 64-bit
CPUs it would normally be faster to use a long if you use the Microsoft
compiler. Sad but true.
[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]

That is an even worse example of "less is more".

Which is why it was prefaced with "For a less useful example"
and closed with "such an implementation cannot be considered
reasonable". (You did read these parts, didn't you?).
Reading from uninitialized variables is undefined behaviour in C++ and
almost certainly also in C.

Funny, when repeatedly challenged the poster did produce a rationale
for the undefined behaviour (more exactly, the claimed probability
of hardware traps). This did not involve reading from uninitialized
variables.
And calling someoone a "bullshitter" is
rude and unappropriate in a forum like this.

-William Hughes
 

Richard Bos

William Hughes said:
peter said:
The undefined behaviour comes from reading from the uninitialized
variable y.

This is silly. Under this interpretation the code snippet

a = b;

invokes undefined behaviour. There is a limit to pedantry
even in comp.lang.c.

[An interesting question. Does reading an uninitialized variable
as a sequence of unsigned char invoke undefined behaviour]

No. unsigned char has no trap values.

Richard
 

William Hughes

Richard said:
William Hughes said:
peter said:
The undefined behaviour comes from reading from the uninitialized
variable y.

This is silly. Under this interpretation the code snippet

a = b;

invokes undefined behaviour. There is a limit to pedantry
even in comp.lang.c.

[An interesting question. Does reading an uninitialized variable
as a sequence of unsigned char invoke undefined behaviour]

No. unsigned char has no trap values.

Understood, but is this enough?

Consider the following implementation. A float variable
is declared but not initialized. The implementation assigns
an address that is only valid if the operating system does
some work. However, the operating system is not instructed
to do this work until just before the variable is initialized.
Thus, until the variable is initialized, the address of its
first byte is invalid. An attempt to use that address may
cause a hardware trap (e.g. the address resolves to a page
not owned by the process and a segfault occurs).
Is such an implementation conforming? (An earlier discussion
about examining free'd pointers suggests
that such an implementation is not conforming, but the
situations are not identical.)

-William Hughes
 

Mark McIntyre

Michael Mair wrote:

Thank you for proving my point. memcpy can't have x=y semantics
in any way.

You miss the point entirely. As long as memcpy() fulfills the
requirements of the Standard, it can have any internal semantics it
likes, including carrier pigeons, moving stars around like abacus
beads, or whatever.
It can only have the same final effect,

Precisely. The same effect. In other words, indistinguishable from ...
 

Kai-Uwe Bux

William said:
Richard said:
William Hughes said:
peter koch wrote:
The undefined behaviour comes from reading from the uninitialized
variable y.

This is silly. Under this interpretation the code snippet

a = b;

invokes undefined behaviour. There is a limit to pedantry
even in comp.lang.c.

[An interesting question. Does reading an uninitialized variable
as a sequence of unsigned char invoke undefined behaviour]

No. unsigned char has no trap values.

Understood, but is this enough?

Consider the following implementation. A float variable
is declared but not initialized. The implementation assigns
an address that is only valid if the operating system does
some work. However, the operating system is not instructed
to do this work until just before the variable is initialized.
Thus, until the variable is initialized, the address of its
first byte is invalid. An attempt to use that address may
cause a hardware trap (e.g. the address resolves to a page
not owned by the process and a segfault occurs).
Is such an implementation conforming? (An earlier discussion
about examining free'd pointers suggests
that such an implementation is not conforming, but the
situations are not identical.)

Hm, according to the standard an object is a region of memory; and an object
is created by its definition. Since

double y;

is a definition, it creates the object and henceforth the object (i.e., the
region of memory) is something that the program is allowed to operate on.
So, I think, the implementation you described is non-conforming.



Best

Kai-Uwe Bux
 

Branimir Maksimovic

Michael said:
"Curiouser and curiouser."
If we really agreed from the start, why did you originally claim
-as can still be seen above- that the first one would generate a
hardware exception whereas the second one would probably not?
Now, you are making the second case the potentially dangerous one
if replaced by "x = y;". Note that, if y is properly initialized,
we have no trap representation, so the replacement is valid.
There are clear rules for potentially arriving at a trap
representation, so Christian Bau's original statement, maybe
modified by an initializer for y (for clarity), still stands.

I just read what was written. If an implementation is allowed to apply
x=y semantics to memcpy then it is a normal thing to allow exceptions
to happen.
I think that this is clear at this point.

Greetings, Bane.
 
