min/max by value - 25% faster

Eirik · Feb 22, 2012

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?

#define MIN(a, b) a<b?a:b
#define MAX(a, b) a>b?a:b

namespace std
{
inline const int min(int const a, int const b)
{
return MIN(a, b);
}
inline const int max(int const a, int const b)
{
return MAX(a, b);
}
}

caller:
{
std::vector<int> v(2);
v[0] = get_some_random_number();
v[1] = get_some_random_number();
int m = std::min(v[0], v[1]); // ~25% faster when the overload above
is chosen
}

/Eirik

Mike McCarty · Feb 22, 2012

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?

gwowen · Feb 22, 2012

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?

Quite. And with what compiler options? Is the function even being
inlined, let alone having the reference/dereference elided?

Eirik · Feb 22, 2012

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?

~25% faster than just using std::min(int const&, int const&) as
specified by the standard.
How many nano seconds faster is not really relevant. But it ran for
about 20 seconds with about 420.000 iterations (see test iterations
below) per second.

test setup:
int some_dummy_test_value = 0;
size_t vector_size = 1000;
vector<int> x = fill_vector_with_random_numbers(vector_size);
vector<int> y = fill_vector_with_random_numbers(vector_size);

test iteration:
for(size_t i(0); i != vector_size; ++i)
{
some_dummy_test_value += std::min(x, y);
some_dummy_test_value += std::max(x, y);
}

Mike McCarty · Feb 22, 2012

420K iterations/second? That's ~2us per iteration == very slow (as in unbelievably so). It should take on order of nanoseconds, not microseconds.

In any case, it is possible that the optimizer realized that min(x,y)+max(x,y) == x+y and elided the conditional branches entirely. It may not have been able to make that elision via references. You might want to try making your benchmark less trivial.

Joshua Maurice · Feb 22, 2012

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?

#define MIN(a, b) a<b?a:b
#define MAX(a, b) a>b?a:b

namespace std
{
inline const int min(int const a, int const b)
{
return MIN(a, b);
}
inline const int max(int const a, int const b)
{
return MAX(a, b);
}

}

caller:
{
std::vector<int> v(2);
v[0] = get_some_random_number();
v[1] = get_some_random_number();
int m = std::min(v[0], v[1]); // ~25% faster when the overload above
is chosen

}

As others have asked, what compiler, what platform, what compiler
options - specifically is optimization enabled?

Jorgen Grahn · Feb 22, 2012

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?
....
namespace std
{
inline const int min(int const a, int const b)
{
return MIN(a, b);
}

With my compiler (g++ -O3 for amd64) I get the same assembly code when
using the overload -- which is exactly what I'd expect for something
this easy to optimize.

int foo(int a, int b)
{
return std::min(a, b);
}

_Z3fooii:
cmpl %edi, %esi
movl %edi, %eax
cmovle %esi, %eax
ret

/Jorgen

BGB · Feb 22, 2012

420K iterations/second? That's ~2us per iteration == very slow (as in unbelievably so). It should take on order of nanoseconds, not microseconds.

yes. 2us is about 2x as long as it takes Windows to do a lock/unlock
mutex pair on my system (via the "CreateMutex()" /
"WaitForSingleObject()" route), which is itself absurdly slow (an
"EnterCriticalSection()" / "LeaveCriticalSection()" pair is 15ns IIRC,
or about 67x faster).

if so, yes, that is pretty damn slow.
unless of course someone is using an interpreter, HW emulator, or
dragged some old-as-hell PC out of the closet ("how fast does it go on
my early generation 386?").

far more likely, there is some sort of measurement issue, such as giving
the wrong number of iterations or something...

In any case, it is possible that the optimizer realized that min(x,y)+max(x,y) == x+y and elided the conditional branches entirely. It may not have been able to make that elision via references. You might want to try making your benchmark less trivial.

it depends.

it the compiler can figure out that the references aren't needed, it can
optimize things away.

otherwise, it could be wasting a little bit of time getting/setting via
references (which involve an additional indirection internally).

granted, in general, I would advise against putting too much weight into
benchmarks of this sort without good reason though (very rarely are
these the sorts of issues which impact the overall performance of a
program).

generally, it is a good idea not to try to micro-optimize stuff unless
it is really needed (not that one can't be aware of this stuff, or
believe that it is all unknowable magic or something, but rather it is
more that trying to optimize too much early on tends to result in
overall worse code, and often not in areas which are actually
particularly relevant to the overall running time).

this is also part of why there are profilers: they can help point out
where optimization effort would be most useful.

or such...

Eirik · Feb 23, 2012

With my compiler (g++ -O3 for amd64) I get the same assembly code when
using the overload -- which is exactly what I'd expect for something
this easy to optimize.

int foo(int a, int b)
{
return std::min(a, b);

}

_Z3fooii:
cmpl %edi, %esi
movl %edi, %eax
cmovle %esi, %eax
ret

/Jorgen

That example will of course result int the same, as foo takes
arguments by value...

Miles Bader · Feb 23, 2012

Eirik said:
That example will of course result int the same, as foo takes
arguments by value...

Er, isn't that the whole point -- that std::min yields _exactly the
same code_ as your hand-rolled not-using-references MIN thingy
(meaning there's no reason to not use std::min)?

-miles

Jorgen Grahn · Feb 23, 2012

That example will of course result int the same, as foo takes
arguments by value...

It was the simplest case I could come up with quickly without having
it all optimized away.

Then please come up with a better example. "int foo(const int& a,
const int& b)" perhaps? The burden of proof must ultimately be on you.

/Jorgen

std::min/max vs own functions	10	Apr 11, 2011
Hi! can someone help me with converting observablehq's viewof operator in javascript.	1	Apr 3, 2021
Trying to get the average value of the elements, please help ! JavaScript	3	Dec 13, 2022
How to stop an animated image from sticking to top of page?	2	Feb 17, 2022
Programming math challenge gives wrong answer	2	Aug 6, 2023
Operator () overloading, cannot return by value.	8	Sep 8, 2008
Char array as a function returned value	2	Nov 23, 2019
Is there in memory tree structure faster than std::map?	11	Aug 12, 2012

min/max by value - 25% faster

Eirik

Mike McCarty

gwowen

Eirik

Mike McCarty

Joshua Maurice

Jorgen Grahn

BGB

Eirik

Miles Bader

Jorgen Grahn

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads