map/set and multithreading


He Shiming

Hi All,

I've got a question regarding using STL map and set class in a multithreaded
situation.

I'm developing a database program. The program fetches and parses millions
of records from the database and builds a map/set to allow faster access to
records by certain rules. The type of the map/set is:

map<long, set<long> > m_map;

It maps a "category id" to a set of "record ids", so that I can easily get a
set of record IDs by accessing m_map[CATEGORY_ID] later on.

The fetch-and-parse process is very long; it usually takes more than 3 minutes.
During this period I wanted to give the user some feedback on the UI,
ideally showing some already-parsed records.

So I put the parsing process in a background thread; it does all the fetching
and calls such as m_map[CATEGORY_ID].insert(RECORD_ID);. The UI thread
accesses the map every few seconds to pull record IDs out. It uses a
map<long, set<long> >::const_iterator to find "CATEGORY_ID", and a
set<long>::const_iterator to remember the current position of RECORD_ID in
the set, then does a little iteration and reads all the relevant record IDs
out. (During this process the background thread is still running.)

I noticed that if I let the background thread finish before attempting to
refresh the UI, it works flawlessly. But if I do what I planned above and
refresh the UI every few seconds, somehow lots of records go missing. The
debug output shows 100 m_map[CATEGORY_ID].insert(RECORD_ID) insertions (the
RECORD_IDs are all different), but the size of the set ends up at only 30 or
40 items; the exact number is random. I checked the debug output further and
found that the thread runs fine with no errors or exceptions, and all
records are parsed the way they're supposed to be.

Now, why is the number of set entries less than the number of insertions
(again, the keys are all different)? My guess is some kind of protection:
while another thread is accessing the set, insertions are dropped. Is that
true? How do I avoid this? I even tried pausing the background thread while
the UI refreshes and had no luck.

I'm working on the Windows platform and I don't think it's a problem with
compiler settings; the relevant options are all checked. Any help will be
much appreciated.

Thanks in advance,
 

davidrubin

There is obviously a race condition between your threads. Since nothing in
standard C++ is thread-aware, you need to use a platform-specific or
third-party library to provide concurrency mechanisms.
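
For example, here is a minimal sketch of such protection, assuming a
C++11-or-later compiler with std::mutex; on the OP's Windows toolchain a
CRITICAL_SECTION would play the same role. The names m_map, m_mapMutex and
addRecord are just illustrative:

#include <map>
#include <set>
#include <mutex>

std::map<long, std::set<long> > m_map;   // shared between both threads
std::mutex m_mapMutex;                   // guards every access to m_map

// Background thread: hold the lock for each insertion.
void addRecord(long categoryId, long recordId)
{
    std::lock_guard<std::mutex> lock(m_mapMutex);
    m_map[categoryId].insert(recordId);
}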
 

Kai-Uwe Bux

He said:
[...] The debug output shows 100 m_map[CATEGORY_ID].insert(RECORD_ID)
insertions (the RECORD_IDs are all different), but the size of the set ends
up at only 30 or 40 items; the exact number is random. [...]

Now, why is the number of set entries less than the number of insertions
(again, the keys are all different)? My guess is some kind of protection:
while another thread is accessing the set, insertions are dropped. Is that
true? How do I avoid this? I even tried pausing the background thread while
the UI refreshes and had no luck.

The C++ standard does not make any statements about threads. Thus, the
behavior of standard containers in a multithreaded environment is entirely
up to the implementation. You will need to check the documentation of your
library as to which thread-safety guarantees it provides and which idioms
of access it supports in a multi-threaded setting.

A common rule is: while one thread is writing, no other thread may access
the container, not even for reading. If this is the case with your
implementation, you will need to use some explicit locking. Also note that
just suspending the writing thread is not good enough, since you might
suspend it in the middle of a write operation.
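
As an illustration of such locking on the reading side, here is a minimal
sketch that reuses the hypothetical m_map and m_mapMutex names from the
sketch earlier in the thread (again assuming a C++11 mutex or an equivalent
platform lock). The UI thread copies the set it needs while holding the
lock and iterates the copy afterwards, so it never holds iterators into the
shared container while the writer runs:

#include <map>
#include <set>
#include <mutex>

extern std::map<long, std::set<long> > m_map;   // shared container
extern std::mutex m_mapMutex;                   // guards every access to m_map

// UI thread: take a snapshot of one category under the lock,
// then iterate the copy with the lock released.
std::set<long> snapshotCategory(long categoryId)
{
    std::lock_guard<std::mutex> lock(m_mapMutex);
    std::map<long, std::set<long> >::const_iterator it = m_map.find(categoryId);
    if (it == m_map.end())
        return std::set<long>();
    return it->second;   // copied while the writer is blocked
}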

I'm working on the Windows platform and I don't think it's a problem with
compiler settings; the relevant options are all checked. Any help will be
much appreciated.

You should ask in a newsgroup for your particular compiler. As said above:
multi-threading is platform specific.


Best

Kai-Uwe Bux
 

Stephan Brönnimann

He said:
[...] So I put the parsing process in a background thread; it does all the
fetching and calls such as m_map[CATEGORY_ID].insert(RECORD_ID);. The UI
thread accesses the map every few seconds to pull record IDs out. [...]

Why use threads at all? Just provide a callback object to the DB
interface:

// Fetch and parse rows from the database, invoking a callback
// every numRows() rows so the UI can be refreshed in between.
class DbInterface {
public:
    class Callback;

    void fetch(const Callback& cb);
};

class DbInterface::Callback {
public:
    virtual ~Callback() {}

    // return: true if fetching should continue,
    //         false if fetching should stop
    virtual bool operator()() const = 0;

    // number of rows after which operator()
    // should be called
    virtual int numRows() const = 0;
};

void DbInterface::fetch(const Callback& cb)
{
    int numFetched = 0;
    for (;;) {
        // ... fetch and parse one row here; break out when no rows remain ...
        ++numFetched;
        if (0 == (numFetched % cb.numRows())) {
            if (!cb()) break;   // the callback may ask to stop fetching
        }
    }
}
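
For illustration, a hypothetical callback that refreshes the UI every 1000
rows might look like the sketch below; refreshUi() is assumed to exist
elsewhere in the application. Because it runs on the fetching thread, no
write to the shared data can happen while it executes:

void refreshUi();   // assumed application function that repaints the record list

class UiRefreshCallback : public DbInterface::Callback {
public:
    // Called every numRows() rows, on the fetching thread.
    virtual bool operator()() const
    {
        refreshUi();
        return true;    // keep fetching
    }
    virtual int numRows() const { return 1000; }
};

// Usage:
//   DbInterface db;
//   db.fetch(UiRefreshCallback());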

Regards, Stephan
 

Stephan Brönnimann

He needs to simultaneously read and write the map.

I agree that this is the OP's design. As others have already stated,
access to shared data must be protected in that case.

My approach is different: it uses a callback object that is called every
time x rows have been fetched. Until the callback function returns, nobody
writes to the shared data, which therefore needs no protection.
From the user's perspective the behaviour should not change:
while the application reads the data from the database (seemingly in the
background), the user can access the latest "results".


Regards, Stephan
 
