Threads with STL - can I run faster?

sunilkher · Nov 23, 2005

Here is a small sample program that I have.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <iostream>
#include <string>

using namespace std;

pthread_t threads[10];
pthread_attr_t thr_attr;
int thr_in[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int totalIter = 0;
int thr_cnt = 0;
bool debug = false;

extern "C" void *do_something(void *tid);

int main( int argc, const char* argv[] )
{
    int thr_var = 0;

    // both the thread count and the iteration count are required
    if (argc < 3)
    {
        cout << "usage: " << argv[0] << " threads iterations [debug]" << endl;
        return 1;
    }

    //------------------
    // how many threads?
    //------------------
    thr_cnt = atoi(argv[1]);
    if (thr_cnt < 1) thr_cnt = 1;   // avoids i % 0 in do_something
    if (thr_cnt > 8)
    {
        cout << "WARNING: Limiting the thread count to 8" << endl;
        thr_cnt = 8;
    }

    //--------------------------
    // how much work to be done?
    //--------------------------
    totalIter = atoi(argv[2]);
    if (totalIter > 5000000)
    {
        cout << "WARNING: Limiting the iteration count to 5000000" << endl;
        totalIter = 5000000;
    }

    //-------------------------------
    // do you want to check up on me?
    //-------------------------------
    if (argv[3] != NULL) debug = true;

    //--------
    // threads
    //--------
    pthread_attr_init(&thr_attr);
    pthread_attr_setdetachstate(&thr_attr, PTHREAD_CREATE_JOINABLE);

    // Handles live in threads[0..thr_cnt-1] so that the join loop below
    // waits on exactly the threads created here; thread ids stay 1-based.
    for (thr_var = 0; thr_var < thr_cnt; thr_var++)
        pthread_create(&threads[thr_var], &thr_attr,
                       do_something, (void *) &(thr_in[thr_var + 1]));

    for (thr_var = 0; thr_var < thr_cnt; thr_var++)
        pthread_join(threads[thr_var], NULL);

    pthread_attr_destroy(&thr_attr);

    return 0;
}

void *do_something(void *tid)
{
    int myThreadId = *((int *)tid);
    FILE *fp = NULL;
    if (debug)
    {
        char filename[50] = "";
        sprintf(filename, "%d.out", myThreadId);
        fp = fopen(filename, "w");
        // fp stays NULL if the file cannot be opened; logging is then skipped
        if (fp != NULL)
            fprintf(fp, "thread #%d processing starts\n", myThreadId);
    }

    for (int i = 1; i <= totalIter; i++)
    {
        // each thread handles only the indices assigned to it
        if (i % thr_cnt == myThreadId - 1)
        {
            if (fp != NULL)
            {
                fprintf(fp, "thread #%d processing index %d\n", myThreadId, i);
            }

            string a("abc"), b;
            b = a;
        }
    }

    if (fp != NULL)
    {
        fprintf(fp, "thread #%d processing finish\n", myThreadId);
        fflush(fp);
        fclose(fp);
    }

    // returning from the start routine ends the thread, so a separate
    // pthread_exit call is not needed
    return NULL;
}

Now when I run this with 1 thread, here is the time taken.

/home/skher/testIPC/testThr> time $BIN/testThr 1 5000000

real 0m0.65s
user 0m0.47s
sys 0m0.14s
/home/skher/testIPC/testThr>

Impressive, considering I am doing 5 million iterations. So I thought
that with 2 or more threads I should be done in even less time.
But here is what I found.

/home/skher/testIPC/testThr> time $BIN/testThr 2 5000000

real 0m34.67s
user 0m58.48s
sys 0m5.20s
/home/skher/testIPC/testThr>

Why is this? My guess is that whenever I allocate any STL object, the
_node_alloc template defined in _alloc.c takes and releases a lock
through the class _Node_Alloc_Lock, whose mutex is a static member
variable.

Part of that class code is shown here.

template <bool __threads, int __inst>
class _Node_Alloc_Lock {
public:
    _Node_Alloc_Lock() {

# ifdef _STLP_SGI_THREADS
        if (__threads && __us_rsthread_malloc)
# else /* !_STLP_SGI_THREADS */
        if (__threads)
# endif
            _S_lock._M_acquire_lock();
    }

    ~_Node_Alloc_Lock() {
# ifdef _STLP_SGI_THREADS
        if (__threads && __us_rsthread_malloc)
# else /* !_STLP_SGI_THREADS */
        if (__threads)
# endif
            _S_lock._M_release_lock();
    }

    static _STLP_STATIC_MUTEX _S_lock;
};

OK. Now my question (worth a million dollars, if only to me).

How do I get around this? How do I make my program run faster with more
threads? As you can see, the threads are really independent of each
other, since they work on different indexes (indices), yet they still
compete for a resource, namely the lock taken while creating an STL
object. How do I make this competition go away, so that my program runs
faster with more threads?

Any help will be appreciated. Thanx, Sunil.
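
For illustration, the "different indexes" idea above can also be expressed as contiguous slices, so that each thread loops only over its own share instead of scanning every index and testing it with a modulo. This is a minimal sketch, not the poster's code: the helper name slice_for and its 0-based worker numbering are assumptions made for this example.

```cpp
#include <utility>

// Hypothetical helper (not part of the original program).
// Splits the 1-based indices 1..total among n workers and returns the
// half-open range [first, last) for worker t, where t is 0-based.
// The first (total % n) workers receive one extra index each.
std::pair<int, int> slice_for(int t, int n, int total)
{
    if (n <= 0 || t < 0 || t >= n || total <= 0)
        return std::make_pair(1, 1);            // empty range

    int base  = total / n;                      // minimum share per worker
    int extra = total % n;                      // leftover indices
    int first = 1 + t * base + (t < extra ? t : extra);
    int len   = base + (t < extra ? 1 : 0);
    return std::make_pair(first, first + len);
}
```

A worker would then run "for (int i = r.first; i < r.second; i++)" over its own range r. This only removes the wasted loop iterations; it does not by itself change how often the allocator lock is taken.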

Neil Cerutti · Nov 23, 2005

> OK. Now my (worth million dollar only to me) question.
>
> How do I get around this? How do I make my program run faster
> with more threads. If you see, the threads are really mutually
> exclusive since they are working on different indexes (indices)
> but still compete with each other for resources viz. lock while
> creating STL object. How do I make this competition go away
> thus making my program run faster with more threads.
>
> Any help will be appreciated. Thanx, Sunil.

Unless you literally have more than one processor, I'd say forget
it. Unfortunately, the question is off-topic for this group. Try
comp.programming, or a group specific to your C++ implementation.

peter koch · Nov 23, 2005

(e-mail address removed) wrote:

> Here is a small sample program that I have.
> [snip]
>
> Now when I run this with 1 thread, here is the time taken.
>
> /home/skher/testIPC/testThr> time $BIN/testThr 1 5000000
>
> real 0m0.65s
> user 0m0.47s
> sys 0m0.14s
> /home/skher/testIPC/testThr>
>
> impressive, considering I am doing 5 million iterations. So, I thought
> when I run with 2 or more threads, I should be done even in less time.
> But here is what I found.
>
> /home/skher/testIPC/testThr> time $BIN/testThr 2 5000000
>
> real 0m34.67s
> user 0m58.48s
> sys 0m5.20s
> /home/skher/testIPC/testThr>
>
> Why is this? I guess this is because whenever I allocate any STL
> object, using the _node_alloc template defined in _alloc.c, it has a
> lock and unlock mechanism using a static class _Node_Alloc_Lock which
> has a static member variable.
>
> [snip]
I have not examined your code in depth, but it looks like the work done
in the one-thread scenario differs from the work done in the n-thread
scenario. Still, it does not matter, as this is probably not the
correct group. Offhand, I know that memory allocation can be much more
expensive in a multithreaded environment, so this is certainly a
factor. Also, it is not certain that using more threads will make your
program faster: if the task is CPU-bound, a speedup requires hardware
with multiple CPUs. But you had better ask in comp.programming.threads
or a group dedicated to your platform.

Peter
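
As a hedged illustration of the allocation point above: one common way to reduce allocator traffic is to construct the strings once per thread and reuse their storage inside the loop, rather than constructing and destroying them for every index. The sketch below assumes this is acceptable for the real workload; the function name process_indices and its parameters are hypothetical, and whether a given library then skips its allocator lock depends on that library's string implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post).
// Handles first_index, first_index + thr_cnt, ... up to total_iter and
// returns how many indices it processed.
std::size_t process_indices(int first_index, int thr_cnt, int total_iter)
{
    if (thr_cnt <= 0 || first_index < 1)
        return 0;                   // guards against a loop that never advances

    std::string a("abc");           // constructed once per thread, not per index
    std::string b;
    b.reserve(a.size());            // ask for the needed capacity up front

    std::size_t handled = 0;
    for (int i = first_index; i <= total_iter; i += thr_cnt)
    {
        b.assign(a);                // copies into existing storage when capacity allows
        ++handled;
    }
    return handled;
}
```

Striding by thr_cnt also means each thread visits only its own indices. Timing the one-thread and two-thread runs again would be needed to see whether this changes anything on a particular platform.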

sunilkher · Nov 24, 2005

Yes, I will ask in that group, but since both of you raised the same
point: I am indeed using a 4-CPU machine running Sun Solaris.
Thanx, Sunil.

Larry I Smith · Nov 24, 2005

> Yes, I would ask it in that group but since both of you have asked the
> same question - I am indeed using a 4 CPU machine running Sun Solaris.
> Thanx, Sunil.

It's been a long time since I used Solaris, but I seem to remember
that the pthreads implementation on Solaris did not use multiple
CPUs, yet the Solaris native thread library (lwp?) did use
multiple CPUs. You'd better ask on a Solaris newsgroup to
get definitive answers.

Pthreads on Linux (2.6+), I believe, will use multiple CPUs.
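
One way to check this rather than rely on memory is to ask the system how many processors are online and to request system contention scope for the threads, so that they are scheduled against all threads on the machine. This is a sketch under assumptions: the helper names are hypothetical, _SC_NPROCESSORS_ONLN is a common extension (available on Solaris and Linux) rather than part of POSIX, and nothing in this thread establishes that contention scope is what slowed the program down.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical helper: number of processors currently online,
// or -1 if the system cannot report it.
long online_processors()
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Hypothetical helper: initializes attr as joinable with system
// contention scope. Returns 0 on success, otherwise the pthread error
// code of the first call that failed (attr is destroyed in that case).
int init_system_scope_attr(pthread_attr_t *attr)
{
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_attr_setscope(attr, PTHREAD_SCOPE_SYSTEM);

    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The attribute object would be passed to pthread_create in place of a default-initialized one and released with pthread_attr_destroy once the threads have been created.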

Regards,
Larry
