std::min/max vs own functions

E

eLVa

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

pD = std::min<int>(vlp,std::max<int>(vlm,pR));
}
}

template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }

void fn2(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

pD = min2<int>(vlp,max2<int>(vlm,pR));
}
}

int main(int argc, char *argv[]) {

int h = 400, w = 500;
uchar *T1[h], *T2[h];
uint *T3[h];

for (int i=0; i<h; ++i) {
T1 = new uchar[w];
T2 = new uchar[w];
T3 = new uint[w];
}

for (int j=0; j<h; ++j) {
for (int i=0; i<w; ++i) {
T1[j] = rand()%256;
T2[j] = rand()%256;
}
}

struct timeval t1, t2;
gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;

gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn2(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;
}

Then I compiled it with :
g++ test.cpp -O3 -o test

And this are the results :
Took 1.58356
Took 0.789235

So the second version which use templated functions is approx 2times
faster. Is it normal ?

I tested it on MacosX (10.5) (the timings shown above), and on Linux
where timings are (5.45 for the first version and 4.71 for the second
one : the machine is older, thus the big difference, but still the
second is faster).

Can anyone reproduce this and tell me if this is normal.

Thanks
 
E

eLVa

Different isn't the same. Since max2 doesn't do the same thing as
std::max, despite the misleading name, the measurements don't mean much.

--
  Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Sorry for the mistake, it's a copy paste error, of course you should
read

template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?
 
A

Alf P. Steinbach /Usenet

* eLVa, on 11.04.2011 19:58:
Sorry for the mistake, it's a copy paste error, of course you should
read

template<class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?

I think your testing code is pretty obscure.

Can you reproduce in small simple test program, regardless of testing std first
or second?

If so, might be that std::max is instrumented in some way. Check your compiler's
documentation about switches and preprocessor symbols. Please post your results
-- however it turns out!


Cheers,

- Alf
 
P

Puppet_Sock

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
    int vlm, vlp;
    for (int i=0; i<w; ++i) {
        vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
        vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

        pD = std::min<int>(vlp,std::max<int>(vlm,pR));
    }

}


Hmmm... vlm and vlp are int, getting assigned the
results of uchar arithmetic.

Hmmm... pR is a uchar, but the max that is getting
called is max<int>. Hmmm...

Hmmm... pD is a unit getting assigned the result
of std::min<int> Hmmm...

Also, you are doing a compare of i to 0 and a compare
of i to w-1 for each call of std::max and std::min.
And you are doing two averages for most cases.
So, I'm expecting that calls to min and max do not
dominate this function.

I wonder if you can't make a function that is
dominated by the max/min calls. I don't know just
what is happening to produce the reported values
you gave, but I'm thinking they don't show what
you think they show.
Socks
 
J

Jeff Flinn

eLVa said:
Sorry for the mistake, it's a copy paste error, of course you should
read

template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?

What happens if you call fn2 first, then fn?

Jeff
 
E

eLVa

Hi everyone,
Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)
I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...
#include <algorithm>
#include <iostream>
#include <sys/time.h>
typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;
void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
    int vlm, vlp;
    for (int i=0; i<w; ++i) {
        vlm = (i>0 ? (pL[i-1]+pL)/2 : pL);
        vlp = (i<w-1 ? (pL+pL[i+1])/2 : pL);

        pD = std::min<int>(vlp,std::max<int>(vlm,pR));
    }


Hmmm... vlm and vlp are int, getting assigned the
results of uchar arithmetic.


Is that a problem ?
Hmmm... pR is a uchar, but the max that is getting
called is max<int>. Hmmm...


Again, is it a problem ?
It's a subsample of my code, where later I use int calculations ...
Hmmm... pD is a unit getting assigned the result
of std::min<int> Hmmm...

Also, you are doing a compare of i to 0 and a compare
of i to w-1 for each call of std::max and std::min.
And you are doing two averages for most cases.
So, I'm expecting that calls to min and max do not
dominate this function.


The other function (fn2) is doing the same so why would it be
dominated by that time in one case, not in the other ?
I wonder if you can't make a function that is
dominated by the max/min calls. I don't know just
what is happening to produce the reported values
you gave, but I'm thinking they don't show what
you think they show.
Socks

I tried to but depending on what I do, the results are not showing
those differences ...
I'm not sure what to test, and how to diagnose the real problem!
 
E

eLVa

(e-mail address removed):




Hi everyone,
Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome. ...
    int h = 400, w = 500;
    uchar *T1[h], *T2[h];
    uint *T3[h];
    for (int i=0; i<h; ++i) {
        T1 = new uchar[w];
        T2 = new uchar[w];
        T3 = new uint[w];
    }


If you are so keen on performance then note that allocating each row
separately may take quite a lot of time, and accessing the data later may 
also be slower because of worse locality and double indirection (the
latter is not a problem in this concrete example as you take out the row
pointers beforehand). In general it might be faster to allocate the whole
array as new uchar[w*h] and access as T1[j*w+i]. YMMV, of course.


This is a sample of my code where I use a function that's applied on
two "lines".
In my real code, the array are allocated continuously, but for the
test, I found it simpler
to juste do it quickly and allocating each row separately.

The purpose of this sample was to show the problem, and I don't think
allocating data continuously might change it.

std::min/max are also template functions.

FWIW, there seems to be some effect similar to your findings also in
Windows (VS2010 64-bit Release build, with timeofday replaced by clock
()). Seems to be due to different inlining of functions, but I'm not
sure. The difference is not so large, ca 0.93s vs 0.71s. In any case it's
quite strange, std::min/max should not have any intrinsic overhead.

Yes I know that std::min/max are template functions, however I don't
understand where the overhead is coming from.
Do any of you experience the same, or is it some kind of compiler
trick ? (I have no knowledge on this, and that's why I'm eager to know
about your testings).
 
M

Martijn van Buul

* eLVa:
gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;

While I doubt it'll explain the differences you're seeing, one word of
advice:

You're benchmarking against 'real time' here, which may be skewing your
results, especially on a loaded system. It might even be getting distorted
by efforts to keep the system clock synchronised.

It's usually safer to benchmark against process runtime - On Unix-like
systems, see getrusage or clock.
 
E

eLVa

* eLVa, on 11.04.2011 19:58:







I think your testing code is pretty obscure.

Can you reproduce in small simple test program, regardless of testing stdfirst
or second?

If so, might be that std::max is instrumented in some way. Check your compiler's
documentation about switches and preprocessor symbols. Please post your results
  --  however it turns out!

After some testing, I have found that the particular compiler I was
using (/usr/bin/g++ v4.0.1 on MacOsX) does not give the same timings
as a recent one (g++ v4.4). In this one, both functions works with the
same timings.
As for the Linux version, I have not tested, but I suppose this is the
same issue.

I consider my problem fixed (as it seems it was just a problem of
updating my version of g++)
Does any of you have the same difference in timings ?
 
R

RaZiel

Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

Then I compiled it with :
g++ test.cpp -O3 -o test

And this are the results :
Took 1.58356
Took 0.789235

So the second version which use templated functions is approx 2times
faster. Is it normal ?

I tested it on MacosX (10.5) (the timings shown above), and on Linux
where timings are (5.45 for the first version and 4.71 for the second
one : the machine is older, thus the big difference, but still the
second is faster).

Can anyone reproduce this and tell me if this is normal.

Thanks

MinGW 4.5.2:
Took 2.4804
Took 2.574

msvc-9.0: (GetTickCount)
Took 3.604
Took 3.01

- RaZ
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top