Concurrent code performance

  • Thread starter Saeed Amrollahi
  • Start date
S

Saeed Amrollahi

Hi All C++ developers

I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.

// concurrent code
#include <thread>
#include <iostream>

using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r1 = 0, r2 = 0;
thread t1(Sum, 0, 1000000, std::ref(r1));
thread t2(Sum, 1000000, 2000000, std::ref(r2));
t1.join(); t2.join();
cout << r1 + r2 << '\n';

return 0;
}

// sequential code
#include <iostream>

using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r = 0;
Sum(0, 2000000, r);

cout << r<< '\n';

return 0;
}

I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum
$ time ./concurrent_sum
1999999999000
real 0m0.014s
user 0m0.016s
sys 0m0.004s
$ time ./sequential_sum
1999999999000
real 0m0.021s
user 0m0.020s
sys 0m0.000s
Of course the time command differs in several
execution, but honestly, I didn't see so much difference between two
execution.
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

FYI, I use GCC 4.6.0 under Ubuntu on a Dell single processor notebook.
Please shed some light.

-- Saeed Amrollahi
 
M

Marc

Saeed said:
Hi All C++ developers

I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.

// concurrent code
#include <thread>
#include <iostream>

using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r1 = 0, r2 = 0;
thread t1(Sum, 0, 1000000, std::ref(r1));
thread t2(Sum, 1000000, 2000000, std::ref(r2));

I would make Sum return a long long instead of updating a reference.
If you are unlucky, r1 and r2 end up on the same cache line, you have
false sharing and the concurrent version ends up slower than the
sequential one.
t1.join(); t2.join();
cout << r1 + r2 << '\n';

return 0;
}
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

Your basic code is very simple: a loop, a couple additions, almost
nothing. It is not really surprising that code meant to handle
multiple threads is several times larger. Now it might be possible for
libstdc++ to factor more of that code into the shared library, but
that's a different, more complicated question.
 
S

Saeed Amrollahi

Saeed Amrollahi  wrote:
Hi All C++ developers
I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.
// concurrent code
#include <thread>
#include <iostream>
using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
  while (first < last)
    res += first++;
}
int main()
{
  long long r1 = 0, r2 = 0;
  thread t1(Sum, 0, 1000000, std::ref(r1));
  thread t2(Sum, 1000000, 2000000, std::ref(r2));

I would make Sum return a long long instead of updating a reference.
If you are unlucky, r1 and r2 end up on the same cache line, you have
false sharing and the concurrent version ends up slower than the
sequential one.

I know. As I noted, my code is not well-written. but I tried
to write two equivalent code.
Can you explain your answer in more detail? the cache line thing
and false sharing ...
Your basic code is very simple: a loop, a couple additions, almost
nothing. It is not really surprising that code meant to handle
multiple threads is several times larger. Now it might be possible for
libstdc++ to factor more of that code into the shared library, but
that's a different, more complicated question.

Why it's not surprising? is the size of my threads big?
Would you explain more.

TIA,
-- Saeed Amrollahi
 
M

Marc

Saeed said:
Can you explain your answer in more detail? the cache line thing
and false sharing ...

There's a wikipedia article on this...
Why it's not surprising? is the size of my threads big?
Would you explain more.

Size of thread: basically how much you call operator new.
Size of object file: basically length of (used) source code.

The basic code is self contained and fits in less than 10 lines. The
threaded code includes the <thread> header, which is rather long...
 
P

Pavel

Saeed said:
Hi All C++ developers ....

I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum ....

Is there any problem in my measurement?
Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
depending on how you count). Speed-up factor would almost never be exactly 2 for
2 threads as there are lots of resources in your system that can become
bottlenecks (you may consider reading "What Every Programmer Should Know About
Memory" by Ulrich Drepper, available, for example, at
www.akkadia.org/drepper/cpumemory.pdf); also, context switches would take some
time. Probably your single-CPU notebook has hyper-threading turned on (try 'cat
/proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
in real time at all.
Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?
.....
The concurrent code has thread library code linked in whereas the sequential one
doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
see what exactly functions were added as compared to the sequential version.
-- Saeed Amrollahi


Hope this helps,
-Pavel
 
G

Goran

Hi All C++ developers

I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.

// concurrent code
#include <thread>
#include <iostream>

using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
  while (first < last)
    res += first++;

}

int main()
{
  long long r1 = 0, r2 = 0;
  thread t1(Sum, 0, 1000000, std::ref(r1));
  thread t2(Sum, 1000000, 2000000, std::ref(r2));
  t1.join(); t2.join();
  cout << r1 + r2 << '\n';

  return 0;

}

// sequential code
#include <iostream>

using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
  while (first < last)
   res += first++;

}

int main()
{
  long long r = 0;
  Sum(0, 2000000, r);

  cout << r<< '\n';

  return 0;

}

I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
$ g++ -std=c++0x  -pedantic -o2 sequential_sum.c++ -o sequential_sum
$ time ./concurrent_sum
1999999999000
real   0m0.014s
user  0m0.016s
sys   0m0.004s
$ time ./sequential_sum
1999999999000
real   0m0.021s
user  0m0.020s
sys   0m0.000s
Of course the time command differs in several
execution, but honestly, I didn't see so much difference between two
execution.
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

I don't see a problem. On a single-core machine, I am surprised you
saw even that much of an improvement. I guess, just like Pavel, that
hyper-threading is working. Your code should give better performance
on a multi-core machines. Given what it does, it should hardly touch
memory, if at all, so memory should not be a bottleneck.

Your executable is 7 times bigger because you linked in (part of)
pthreads in it.

Goran.
 
S

Saeed Amrollahi

Saeed Amrollahi  wrote:

There's a wikipedia article on this...



Size of thread: basically how much you call operator new.
Size of object file: basically length of (used) source code.

The basic code is self contained and fits in less than 10 lines. The
threaded code includes the <thread> header, which is rather long...

Thank you for your answers Marc.
-- Saeed
 
S

Saeed Amrollahi

Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
depending on how you count). Speed-up factor would almost never be exactly 2 for
2 threads as there are lots of resources in your system that can become
bottlenecks
Memory" by Ulrich Drepper, available, for example, atwww.akkadia.org/drepper/cpumemory.pdf);
I know about the paper, but I haven't to read it yet.
also, context switches would take some
time. Probably your single-CPU notebook has hyper-threading turned on (try 'cat
/proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
in real time at all.


....
The concurrent code has thread library code linked in whereas the sequential one
doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
see what exactly functions were added as compared to the sequential version.




Hope this helps,
-Pavel

Thank you very much. Besides concurrency concepts, I learned
a few UNIX-related commands.
-- Saeed
 
S

Saeed Amrollahi

Hi All C++ developers
I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.
// concurrent code
#include <thread>
#include <iostream>
using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
  while (first < last)
    res += first++;

int main()
{
  long long r1 = 0, r2 = 0;
  thread t1(Sum, 0, 1000000, std::ref(r1));
  thread t2(Sum, 1000000, 2000000, std::ref(r2));
  t1.join(); t2.join();
  cout << r1 + r2 << '\n';
  return 0;

// sequential code
#include <iostream>
using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
  while (first < last)
   res += first++;

int main()
{
  long long r = 0;
  Sum(0, 2000000, r);
  cout << r<< '\n';
  return 0;

I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
$ g++ -std=c++0x  -pedantic -o2 sequential_sum.c++ -o sequential_sum
$ time ./concurrent_sum
1999999999000
real   0m0.014s
user  0m0.016s
sys   0m0.004s
$ time ./sequential_sum
1999999999000
real   0m0.021s
user  0m0.020s
sys   0m0.000s
Of course the time command differs in several
execution, but honestly, I didn't see so much difference between two
execution.
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

I don't see a problem. On a single-core machine, I am surprised you
saw even that much of an improvement. I guess, just like Pavel, that
hyper-threading is working. Your code should give better performance
on a multi-core machines. Given what it does, it should hardly touch
memory, if at all, so memory should not be a bottleneck.

Your executable is 7 times bigger because you linked in (part of)
pthreads in it.

Goran.

Thank you Goran. I just found devising appropriate Multi-threaded
algorithms is much more difficult than just creating and using several
threads.
-- Saeed
 
J

Jorgen Grahn

.
As you use GCC, you can run 'objdump -d' on your concurrent binary

Nitpick: gcc has nothing to do with it. If the binary is on a format
which objdump understands, objdump will understand it. (The object
files generated by gcc happen to be in that category.)

'nm -nC foo.o' might be a simpler tool for this, by the way.

/Jorgen
 
P

Pavel

Jorgen said:
Nitpick: gcc has nothing to do with it. If the binary is on a format
which objdump understands, objdump will understand it. (The object
files generated by gcc happen to be in that category.)

'nm -nC foo.o' might be a simpler tool for this, by the way.

/Jorgen

Yeah, should have probably say "GNU" or whatever but was lazy during wee hours.

Gcc had something do to with it, however: the line of thinking was something
like: uses GCC-on-Linux => should have GNU binutils => has objdump, compressed
with losses to the first keyword :). The point was not that objdump understood
the OP's binary format but that the OP should have objdump installed.

-Pavel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,540
Members
45,025
Latest member
KetoRushACVFitness

Latest Threads

Top