g++ unordered_multimap buggy?

R

Ralf Goertz

Hi,

I have now spent a day to track down a segmentation fault in a program
of mine that used to work. This time it crashed always exactly at the
same place (I was running it with new data). The program uses
unordered_multimap<string, SomeClass> and I repeatedly call a subroutine
with two const SomeClass & elements obtained from iterators of that map.
gdb told me that the segfault occured at the line where I called the
subroutine. Only after switching off optimization (which in this case is
very expensive because the map contains millions of elements and
execution time increased from 20 minutes to over an hour) I got a
different source line where the error occured:

Program terminated with signal 11, Segmentation fault.
#0 0x000000000041f642 in std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>,
std::allocator<std::string> >::_M_insert_unique (
this=0x7fff600cf070, linknodes=...) at /usr/include/c++/4.5/bits/stl_tree.h:1181

In that file we have

1171 template<typename _Key, typename _Val, typename _KeyOfValue,
1172 typename _Compare, typename _Alloc>
1173 pair<typename _Rb_tree<_Key, _Val, _KeyOfValue,
1174 _Compare, _Alloc>::iterator, bool>
1175 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
1176 _M_insert_unique(const _Val& __v)
1177 {
1178 _Link_type __x = _M_begin();
1179 _Link_type __y = _M_end();
1180 bool __comp = true;
1181 while (__x != 0)
1182 {
1183 __y = __x;
1184 __comp = _M_impl._M_key_compare(_KeyOfValue()(__v), _S_key(__x));
1185 __x = __comp ? _S_left(__x) : _S_right(__x);
1186 }
1187 iterator __j = iterator(__y);
1188 if (__comp)
1189 {
1190 if (__j == begin())
1191 return pair<iterator, bool>(_M_insert_(__x, __y, __v), true);
1192 else
1193 --__j;
1194 }
1195 if (_M_impl._M_key_compare(_S_key(__j._M_node), _KeyOfValue()(__v)))
1196 return pair<iterator, bool>(_M_insert_(__x, __y, __v), true);
1197 return pair<iterator, bool>(__j, false);
1198 }


I still don't see where there should be a problem but after switching to
boost::unordered_multimap every thing works fine again.

I don't want to exclude the possibility that my program is buggy but
given that

* it had worked previously,
* works fine with boost,
* crashes only with specific input after many executions of that line
* doesn't use any pointers at all

what are the odds that the g++ implementation is the culprit? Maybe there
is something wrong with the hashing?
 
R

Ralf Goertz

Leigh Johnston wrote:

If it was a bug in the g++ implementation you should be able to ferment
a test case

How? If it had to do with the hashing e.g. whether or not the map needs
to rehash. I don't say that there are no much shorter programs that
would exhibit the problem but if the problem is data driven and I happen
to come across it with a huge data set how am I supposed to pin down the
exact circumstances? As i said the program worked with other (huge)
datasets, it even ran more than 3 months without crashing. If the
circumstances for crashing were frequent there would be somebody out
there having had the same problem, right?
but I doubt it is: a bug in your code could manifest
differently when not using g++ implementation for example if you are
trashing the heap.

What the program does is comparing instances (elements of the map) and
collapsing them if they are thought equal. That is: copy the features of
one instance to the other and erase the first from the map. I don't see
what I could do wrong here as I use only stl.

I don't understand hash maps very well but they seem complicated. Is it
possible that there is a bug that manifests itself only in very rare
circumstances or are they a simple piece of code?
 
K

Kai-Uwe Bux

Ralf said:
Hi,

I have now spent a day to track down a segmentation fault in a program
of mine that used to work. This time it crashed always exactly at the
same place (I was running it with new data). The program uses
unordered_multimap<string, SomeClass> and I repeatedly call a subroutine
with two const SomeClass & elements obtained from iterators of that map.
gdb told me that the segfault occured at the line where I called the
subroutine.

Hm, that sounds like you could be invalidating the references somewhere
between obtaining them and passing them to the subroutine. But without
seeing the code, we are limited to guesswork.

Only after switching off optimization (which in this case is
very expensive because the map contains millions of elements and
execution time increased from 20 minutes to over an hour) I got a
different source line where the error occured:

Program terminated with signal 11, Segmentation fault.
#0 0x000000000041f642 in std::_Rb_tree<std::string, std::string,
#std::_Identity<std::string>, std::less<std::string>,
std::allocator<std::string> >::_M_insert_unique (
this=0x7fff600cf070, linknodes=...) at
/usr/include/c++/4.5/bits/stl_tree.h:1181

In that file we have

1171 template<typename _Key, typename _Val, typename _KeyOfValue,
1172 typename _Compare, typename _Alloc>
1173 pair<typename _Rb_tree<_Key, _Val, _KeyOfValue,
1174 _Compare, _Alloc>::iterator, bool>
1175 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
1176 _M_insert_unique(const _Val& __v)
1177 {
1178 _Link_type __x = _M_begin();
1179 _Link_type __y = _M_end();
1180 bool __comp = true;
1181 while (__x != 0)
1182 {
1183 __y = __x;
1184 __comp = _M_impl._M_key_compare(_KeyOfValue()(__v),
_S_key(__x));
1185 __x = __comp ? _S_left(__x) : _S_right(__x);
1186 }
1187 iterator __j = iterator(__y);
1188 if (__comp)
1189 {
1190 if (__j == begin())
1191 return pair<iterator, bool>(_M_insert_(__x, __y, __v), true);
1192 else
1193 --__j;
1194 }
1195 if (_M_impl._M_key_compare(_S_key(__j._M_node),
_KeyOfValue()(__v)))
1196 return pair<iterator, bool>(_M_insert_(__x, __y, __v), true);
1197 return pair<iterator, bool>(__j, false);
1198 }


I still don't see where there should be a problem but after switching to
boost::unordered_multimap every thing works fine again.

I don't want to exclude the possibility that my program is buggy but
given that

* it had worked previously,
* works fine with boost,
* crashes only with specific input after many executions of that line
* doesn't use any pointers at all

what are the odds that the g++ implementation is the culprit?

Close to 0 :)

Maybe there is something wrong with the hashing?

I would rather suspect that your code has undefined behavior:

a) it crashes
b) randomly
c) and at different places depending on the optimization level
d) swapping the implementation of unordered_multimap changes the behavior

and, as mentioned above, invalidation of references seems a probable cause
of undefined behavior; in particular as your program does not use pointers.


Best,

Kai-Uwe Bux
 
R

Ralf Goertz

Kai-Uwe Bux said:
Hm, that sounds like you could be invalidating the references somewhere
between obtaining them and passing them to the subroutine. But without
seeing the code, we are limited to guesswork.



Close to 0 :)

:(

Thanks for taking a look.
I would rather suspect that your code has undefined behavior:

a) it crashes
b) randomly
c) and at different places depending on the optimization level

No that is not true. It crashes at the same point regardless of
optimization. The reason I see a different sourceline is – I suppose –
that some calls are optimized away.
d) swapping the implementation of unordered_multimap changes the behavior

and, as mentioned above, invalidation of references seems a probable cause
of undefined behavior; in particular as your program does not use pointers.

How can that be? This is the relevant part of the code

for (;ii!=itp.first;){
AIterator a1=nodes.getIterator(ii->second,0);
AIterator a2=nodes.getIterator(ii->second,1);
AIterator const &ai1(ii->second < itp.first->second ? a2 : a1),
&ai2(ii->second > itp.first->second ? a2 : a1);
if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

So I declare AIterators a1 and a2 (getIterator returns a valid
iterator!=end()) and then I declare ai1 and ai2 to be const references
to a1 and a2 repectively depending on some sort criterion. The program
crashes at isEqual (when compiled with -O3) but in the stl-header
without optimization. But in both cases it is the same run in the loop.

Ralf
 
K

Kai-Uwe Bux

Ralf said:
:(

Thanks for taking a look.


No that is not true. It crashes at the same point regardless of
optimization. The reason I see a different sourceline is ? I suppose ?
that some calls are optimized away.


How can that be? This is the relevant part of the code

for (;ii!=itp.first;){
AIterator a1=nodes.getIterator(ii->second,0);
AIterator a2=nodes.getIterator(ii->second,1);
AIterator const &ai1(ii->second < itp.first->second ? a2 : a1),
&ai2(ii->second > itp.first->second ? a2 : a1);
if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

So I declare AIterators a1 and a2 (getIterator returns a valid
iterator!=end()) and then I declare ai1 and ai2 to be const references
to a1 and a2 repectively depending on some sort criterion. The program
crashes at isEqual (when compiled with -O3) but in the stl-header
without optimization. But in both cases it is the same run in the loop.

Hm, indeed: there is nothing invalidating references that jumps right at me.
Here are some suggestions / observations:

a) If the AIterator objects are small, it probably does not pay off to use
AIterator const references. So, I would try:

AIterator ai1 = ii->second < itp.first->second ? a2 : a1;
AIterator ai2 = ii->second > itp.first->second ? a2 : a1;

and see if that changes anything.

b) Also: the cases

ii->second < itp.first->second

and

ii->second > itp.first->second

have the common false ii->second == itp.first->second. I cannot see whether
that is intentional, but it sure would be a _rare_ condition, and I would
check whether the spurious crashed might be related to that case.

c) A remark on style: Is the short circuit evaluation

if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

just a result of trimming down the example? I would condent that

if( isEqual(ai1->second, ai2->second) ) {
merge(ai2,ai1);
}

expresses more clearly what happens.


Best,

Kai-Uwe Bux
 
R

Ralf Goertz

Okay, inserting a check as to whether a2 is indeed not equal to end()
solves the problem. However, I can't see how it can be equal to end() as
I have a set with all possible keys and those iterators are basically
obtained by equal_range(). getIterator(,0) returns end() (which was
checked but I trimmed that when I pasted the code here) if there was
only one element with that key. getIterator(,1) returned the iterator
following the the first in the range. So logically, a2 could never have
been end() and I omitted the check for optimization. It turns out that
in the case it crashes equal_range shows that there are 50 elements with
that key.

c) A remark on style: Is the short circuit evaluation

if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

just a result of trimming down the example? I would condent that

if( isEqual(ai1->second, ai2->second) ) {
merge(ai2,ai1);
}

expresses more clearly what happens.

No, merge() returns boolean because it can fail too in which case the
deletion in the following block of code will not ben done.

Thanks again for looking at the code. Although I still don't understand
what's going on I have it now working with both implemeantions of
unordered_multimap. I will continue to try to solve the puzzle.
 
B

Balog Pal

Ralf Goertz said:
It turns out that
in the case it crashes equal_range shows that there are 50 elements with
that key.

Normally happens when you use a compare funtion that breaks the rules on
transitiveness, antisymmetry, etc... if you use gcc, it has cool defines to
add much diagnostic to STL, flagging use of invalid iterators, such wrong
functors and much more. Did you try them?
 
R

Ralf Goertz

Probably 0. I think I found an explanation.
How can that be? This is the relevant part of the code

for (;ii!=itp.first;){
AIterator a1=nodes.getIterator(ii->second,0);
AIterator a2=nodes.getIterator(ii->second,1);
AIterator const &ai1(ii->second < itp.first->second ? a2 : a1),
&ai2(ii->second > itp.first->second ? a2 : a1);
if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

So I declare AIterators a1 and a2 (getIterator returns a valid
iterator!=end()) and then I declare ai1 and ai2 to be const references
to a1 and a2 repectively depending on some sort criterion. The program
crashes at isEqual (when compiled with -O3) but in the stl-header
without optimization. But in both cases it is the same run in the loop.

A colleague of mine recreated the input file from our database
(regrettably without saving the former version, because it is 1.8 GByte
big) and the problem disappeared. As I said in the other post a1 was
checked against end() while a2 wasn't. Logically, it couldn't have been
end() as long as all the keys were unique which they probably weren't in
the input file. So my multimap contained the same id (value) for the
same key. I used the value to retrieve an iterator from a normal map.
When I found it the first time it got deleted (because it was considered
equal to another instance) then I tried to retrieve it again and boom.
The reason why it only happened with the g++ implementation would be
that in the boost implementation there was a different order so that the
non existing key would have been retrieved as a1 instead of a2.

So in the end it was good that it crashed because that gave a hint to a
problem in the input. On the other hand I should have checked for that
possibility and exited gracefully with a meaningful message. Thanks
again for looking into it.
 
R

Ruben Safir

Ralf Goertz said:
:(

Thanks for taking a look.


No that is not true. It crashes at the same point regardless of
optimization. The reason I see a different sourceline is ? I suppose ?
that some calls are optimized away.


How can that be? This is the relevant part of the code

for (;ii!=itp.first;){
AIterator a1=nodes.getIterator(ii->second,0);
AIterator a2=nodes.getIterator(ii->second,1);
AIterator const &ai1(ii->second < itp.first->second ? a2 : a1),
&ai2(ii->second > itp.first->second ? a2 : a1);
if(isEqual(ai1->second, ai2->second) && merge(ai2,ai1)) {
}

So I declare AIterators a1 and a2 (getIterator returns a valid
iterator!=end()) and then I declare ai1 and ai2 to be const references
to a1 and a2 repectively depending on some sort criterion.

Most of this is over my head, but that seems to describe a problem

I don't think it is safe to assign non-constant references to constant
reference varriables.

In fact, I'm suprised you can do that without an explicit cast.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,008
Latest member
Rahul737

Latest Threads

Top