std::min/max vs own functions

eLVa · Apr 11, 2011

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

pD = std::min<int>(vlp,std::max<int>(vlm,pR));
}
}

template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }

void fn2(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

pD = min2<int>(vlp,max2<int>(vlm,pR));
}
}

int main(int argc, char *argv[]) {

int h = 400, w = 500;
uchar *T1[h], *T2[h];
uint *T3[h];

for (int i=0; i<h; ++i) {
T1 = new uchar[w];
T2 = new uchar[w];
T3 = new uint[w];
}

for (int j=0; j<h; ++j) {
for (int i=0; i<w; ++i) {
T1[j] = rand()%256;
T2[j] = rand()%256;
}
}

struct timeval t1, t2;
gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;

gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn2(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;
}

Then I compiled it with :
g++ test.cpp -O3 -o test

And this are the results :
Took 1.58356
Took 0.789235

So the second version which use templated functions is approx 2times
faster. Is it normal ?

I tested it on MacosX (10.5) (the timings shown above), and on Linux
where timings are (5.45 for the first version and 4.71 for the second
one : the machine is older, thus the big difference, but still the
second is faster).

Can anyone reproduce this and tell me if this is normal.

Thanks

eLVa · Apr 11, 2011

Different isn't the same. Since max2 doesn't do the same thing as
std::max, despite the misleading name, the measurements don't mean much.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Sorry for the mistake, it's a copy paste error, of course you should
read

template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?

Alf P. Steinbach /Usenet · Apr 11, 2011

* eLVa, on 11.04.2011 19:58:

Sorry for the mistake, it's a copy paste error, of course you should
read

template<class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?

I think your testing code is pretty obscure.

Can you reproduce in small simple test program, regardless of testing std first
or second?

If so, might be that std::max is instrumented in some way. Check your compiler's
documentation about switches and preprocessor symbols. Please post your results
-- however it turns out!

Cheers,

- Alf

Puppet_Sock · Apr 11, 2011

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

pD = std::min<int>(vlp,std::max<int>(vlm,pR));
}

}

Hmmm... vlm and vlp are int, getting assigned the
results of uchar arithmetic.

Hmmm... pR is a uchar, but the max that is getting
called is max<int>. Hmmm...

Hmmm... pD is a unit getting assigned the result
of std::min<int> Hmmm...

Also, you are doing a compare of i to 0 and a compare
of i to w-1 for each call of std::max and std::min.
And you are doing two averages for most cases.
So, I'm expecting that calls to min and max do not
dominate this function.

I wonder if you can't make a function that is
dominated by the max/min calls. I don't know just
what is happening to produce the reported values
you gave, but I'm thinking they don't show what
you think they show.
Socks

Jeff Flinn · Apr 11, 2011

eLVa said:
Sorry for the mistake, it's a copy paste error, of course you should
read

template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?

What happens if you call fn2 first, then fn?

Jeff

eLVa · Apr 11, 2011

Hi everyone,

Click to expand...

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

Click to expand...

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

Click to expand...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

Click to expand...

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

Click to expand...

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

Click to expand...

pD = std::min<int>(vlp,std::max<int>(vlm,pR));
}

Click to expand...

}

Click to expand...

Hmmm... vlm and vlp are int, getting assigned the
results of uchar arithmetic.

Is that a problem ?

Hmmm... pR is a uchar, but the max that is getting
called is max<int>. Hmmm...

Click to expand...

Again, is it a problem ?
It's a subsample of my code, where later I use int calculations ...

Hmmm... pD is a unit getting assigned the result
of std::min<int> Hmmm...

Also, you are doing a compare of i to 0 and a compare
of i to w-1 for each call of std::max and std::min.
And you are doing two averages for most cases.
So, I'm expecting that calls to min and max do not
dominate this function.

Click to expand...

The other function (fn2) is doing the same so why would it be
dominated by that time in one case, not in the other ?

I wonder if you can't make a function that is
dominated by the max/min calls. I don't know just
what is happening to produce the reported values
you gave, but I'm thinking they don't show what
you think they show.
Socks

Click to expand...

I tried to but depending on what I do, the results are not showing
those differences ...
I'm not sure what to test, and how to diagnose the real problem!

eLVa · Apr 11, 2011

What happens if you call fn2 first, then fn?

I get the same results (fn2 is two times faster than fn).

eLVa · Apr 11, 2011

(e-mail address removed):

Hi everyone,

Click to expand...

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome. ...
int h = 400, w = 500;
uchar *T1[h], *T2[h];
uint *T3[h];

Click to expand...

for (int i=0; i<h; ++i) {
T1 = new uchar[w];
T2 = new uchar[w];
T3 = new uint[w];
}

Click to expand...

If you are so keen on performance then note that allocating each row
separately may take quite a lot of time, and accessing the data later may
also be slower because of worse locality and double indirection (the
latter is not a problem in this concrete example as you take out the row
pointers beforehand). In general it might be faster to allocate the whole
array as new uchar[w*h] and access as T1[j*w+i]. YMMV, of course.

This is a sample of my code where I use a function that's applied on
two "lines".
In my real code, the array are allocated continuously, but for the
test, I found it simpler
to juste do it quickly and allocating each row separately.

The purpose of this sample was to show the problem, and I don't think
allocating data continuously might change it.

std::min/max are also template functions.

FWIW, there seems to be some effect similar to your findings also in
Windows (VS2010 64-bit Release build, with timeofday replaced by clock
()). Seems to be due to different inlining of functions, but I'm not
sure. The difference is not so large, ca 0.93s vs 0.71s. In any case it's
quite strange, std::min/max should not have any intrinsic overhead.

Click to expand...

Yes I know that std::min/max are template functions, however I don't
understand where the overhead is coming from.
Do any of you experience the same, or is it some kind of compiler
trick ? (I have no knowledge on this, and that's why I'm eager to know
about your testings).

Martijn van Buul · Apr 11, 2011

* eLVa:

gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;

While I doubt it'll explain the differences you're seeing, one word of
advice:

You're benchmarking against 'real time' here, which may be skewing your
results, especially on a loaded system. It might even be getting distorted
by efforts to keep the system clock synchronised.

It's usually safer to benchmark against process runtime - On Unix-like
systems, see getrusage or clock.

eLVa · Apr 11, 2011

* eLVa, on 11.04.2011 19:58:

I think your testing code is pretty obscure.

Can you reproduce in small simple test program, regardless of testing stdfirst
or second?

If so, might be that std::max is instrumented in some way. Check your compiler's
documentation about switches and preprocessor symbols. Please post your results
-- however it turns out!

After some testing, I have found that the particular compiler I was
using (/usr/bin/g++ v4.0.1 on MacOsX) does not give the same timings
as a recent one (g++ v4.4). In this one, both functions works with the
same timings.
As for the Linux version, I have not tested, but I suppose this is the
same issue.

I consider my problem fixed (as it seems it was just a problem of
updating my version of g++)
Does any of you have the same difference in timings ?

RaZiel · Apr 11, 2011

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

Then I compiled it with :
g++ test.cpp -O3 -o test

And this are the results :
Took 1.58356
Took 0.789235

So the second version which use templated functions is approx 2times
faster. Is it normal ?

I tested it on MacosX (10.5) (the timings shown above), and on Linux
where timings are (5.45 for the first version and 4.71 for the second
one : the machine is older, thus the big difference, but still the
second is faster).

Can anyone reproduce this and tell me if this is normal.

Thanks

MinGW 4.5.2:
Took 2.4804
Took 2.574

msvc-9.0: (GetTickCount)
Took 3.604
Took 3.01

- RaZ

Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023
Counter-intuitive io vs no-io time readings	6	Apr 9, 2014
Need Help: Program to Accept 2 Matrices and Show their Sum	0	Aug 21, 2022
Python point location of intersect between two lines	0	Feb 28, 2018
Problems with std::less	3	Jun 10, 2007
Compile-time introspection of free-floating functions, does this work?	4	Feb 13, 2012
member template -- reusing type information?	0	Aug 9, 2008
Templating composed functions	2	Jul 6, 2008

std::min/max vs own functions

eLVa

eLVa

Alf P. Steinbach /Usenet

Puppet_Sock

Jeff Flinn

eLVa

eLVa

eLVa

Martijn van Buul

eLVa

RaZiel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads