min/max by value - 25% faster

E

Eirik

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?

#define MIN(a, b) a<b?a:b
#define MAX(a, b) a>b?a:b

namespace std
{
inline const int min(int const a, int const b)
{
return MIN(a, b);
}
inline const int max(int const a, int const b)
{
return MAX(a, b);
}
}

caller:
{
std::vector<int> v(2);
v[0] = get_some_random_number();
v[1] = get_some_random_number();
int m = std::min(v[0], v[1]); // ~25% faster when the overload above
is chosen
}

/Eirik
 
M

Mike McCarty

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?
 
G

gwowen

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?

Quite. And with what compiler options? Is the function even being
inlined, let alone having the reference/dereference elided?
 
E

Eirik

~25% faster than what?

0.3ns vs. 0.4ns?

I.e. how did you measure this and how can you be sure what you're seeing isn't just noise?

~25% faster than just using std::min(int const&, int const&) as
specified by the standard.
How many nano seconds faster is not really relevant. But it ran for
about 20 seconds with about 420.000 iterations (see test iterations
below) per second.

test setup:
int some_dummy_test_value = 0;
size_t vector_size = 1000;
vector<int> x = fill_vector_with_random_numbers(vector_size);
vector<int> y = fill_vector_with_random_numbers(vector_size);

test iteration:
for(size_t i(0); i != vector_size; ++i)
{
some_dummy_test_value += std::min(x, y);
some_dummy_test_value += std::max(x, y);
}
 
M

Mike McCarty

420K iterations/second? That's ~2us per iteration == very slow (as in unbelievably so). It should take on order of nanoseconds, not microseconds.

In any case, it is possible that the optimizer realized that min(x,y)+max(x,y) == x+y and elided the conditional branches entirely. It may not have been able to make that elision via references. You might want to try making your benchmark less trivial.
 
J

Joshua Maurice

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?

#define MIN(a, b) a<b?a:b
#define MAX(a, b) a>b?a:b

namespace std
{
        inline const int min(int const a, int const b)
        {
                return MIN(a, b);
        }
        inline const int max(int const a, int const b)
        {
                return MAX(a, b);
        }

}

caller:
{
        std::vector<int> v(2);
        v[0] = get_some_random_number();
        v[1] = get_some_random_number();
        int m = std::min(v[0], v[1]); // ~25% faster when the overload above
is chosen

}

As others have asked, what compiler, what platform, what compiler
options - specifically is optimization enabled?
 
J

Jorgen Grahn

I see a ~25% increase in performance (VC2010 SP1) when adding an
overload that takes integers by value instead of const references.
Why are these not implemented as a part of STL for int, double, etc..?
....
namespace std
{
inline const int min(int const a, int const b)
{
return MIN(a, b);
}

With my compiler (g++ -O3 for amd64) I get the same assembly code when
using the overload -- which is exactly what I'd expect for something
this easy to optimize.

int foo(int a, int b)
{
return std::min(a, b);
}

_Z3fooii:
cmpl %edi, %esi
movl %edi, %eax
cmovle %esi, %eax
ret

/Jorgen
 
B

BGB

420K iterations/second? That's ~2us per iteration == very slow (as in unbelievably so). It should take on order of nanoseconds, not microseconds.

yes. 2us is about 2x as long as it takes Windows to do a lock/unlock
mutex pair on my system (via the "CreateMutex()" /
"WaitForSingleObject()" route), which is itself absurdly slow (an
"EnterCriticalSection()" / "LeaveCriticalSection()" pair is 15ns IIRC,
or about 67x faster).

if so, yes, that is pretty damn slow.
unless of course someone is using an interpreter, HW emulator, or
dragged some old-as-hell PC out of the closet ("how fast does it go on
my early generation 386?").

far more likely, there is some sort of measurement issue, such as giving
the wrong number of iterations or something...

In any case, it is possible that the optimizer realized that min(x,y)+max(x,y) == x+y and elided the conditional branches entirely. It may not have been able to make that elision via references. You might want to try making your benchmark less trivial.

it depends.

it the compiler can figure out that the references aren't needed, it can
optimize things away.

otherwise, it could be wasting a little bit of time getting/setting via
references (which involve an additional indirection internally).


granted, in general, I would advise against putting too much weight into
benchmarks of this sort without good reason though (very rarely are
these the sorts of issues which impact the overall performance of a
program).

generally, it is a good idea not to try to micro-optimize stuff unless
it is really needed (not that one can't be aware of this stuff, or
believe that it is all unknowable magic or something, but rather it is
more that trying to optimize too much early on tends to result in
overall worse code, and often not in areas which are actually
particularly relevant to the overall running time).

this is also part of why there are profilers: they can help point out
where optimization effort would be most useful.


or such...
 
E

Eirik

With my compiler (g++ -O3 for amd64) I get the same assembly code when
using the overload -- which is exactly what I'd expect for something
this easy to optimize.

int foo(int a, int b)
{
    return std::min(a, b);

}

_Z3fooii:
        cmpl    %edi, %esi
        movl    %edi, %eax
        cmovle  %esi, %eax
        ret

/Jorgen

That example will of course result int the same, as foo takes
arguments by value...
 
M

Miles Bader

Eirik said:
That example will of course result int the same, as foo takes
arguments by value...

Er, isn't that the whole point -- that std::min yields _exactly the
same code_ as your hand-rolled not-using-references MIN thingy
(meaning there's no reason to not use std::min)?

-miles
 
J

Jorgen Grahn

That example will of course result int the same, as foo takes
arguments by value...

It was the simplest case I could come up with quickly without having
it all optimized away.

Then please come up with a better example. "int foo(const int& a,
const int& b)" perhaps? The burden of proof must ultimately be on you.

/Jorgen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top