multiset segfault

A

Arthur J. O'Dwyer

I'm seeing a bug at the moment that I can't track down. It's
part of a moderately large program, but here is a small program
that exhibits the bug on gcc. (The program code follows at the
bottom of the message.)
The symptom is this:

% g++ --version
g++ (GCC) 3.2.1
[...]
% g++ -W -Wall -pedantic test.cpp
% ./a.out
push_back(3402,tian).
push_back(3401,tian).
push_back(3405,wu)Segmentation fault
%

So it appears something is going wrong in multiset::insert.
I'm not very familiar with the STL, so I suppose I'm abusing
multiset in some way, but for the life of me I can't figure
out how, or how to fix it.
(The goal of the TerminalList class is to be a container
of Terminals, but to allow very fast access to the list of
Terminals containing a certain 'syllable' or 'unino'. I'm
using 'lower_bound' and 'upper_bound' to extract a vector
of Terminals with a given syllable from myStrHash; please
tell me if you can see a better way, but this is unrelated
to the segfault bug, which is why I snipped the code that
does all that from the program below.)

Thanks much,
-Arthur

==code begins==

#include <cstdio>
#include <cctype>
#include <vector>
#include <set>
#include <string>
using namespace std;

struct Terminal
{
int unino;
string syllable;
int d;

Terminal(int u, const string& y, int c):
unino(u), syllable(y), d(c)
{}
};

struct myStrPred {
bool operator() (Terminal* a, Terminal* b)
{ return (a->syllable < b->syllable); }
};

struct TerminalList
{
void push_back(const Terminal& t);

typedef multiset<Terminal*, myStrPred> StrHashT;
vector<Terminal> myVector;
StrHashT myStrHash;
};

void TerminalList::push_back(const Terminal& t)
{
myVector.push_back(t);
Terminal* pt = &myVector.back();

printf("push_back(%04x,%s)",
pt->unino, pt->syllable.c_str());
fflush(stdout);

myStrHash.insert(pt);

printf(".\n");
}

int main()
{
TerminalList characters;
characters.push_back(Terminal(0x3402, "tian", 1));
characters.push_back(Terminal(0x3401, "tian", 1));
characters.push_back(Terminal(0x3405, "wu", 3));
return 0;
}
 
V

Victor Bazarov

Arthur J. O'Dwyer said:
I'm seeing a bug at the moment that I can't track down. It's
part of a moderately large program, but here is a small program
that exhibits the bug on gcc. (The program code follows at the
bottom of the message.)
The symptom is this:

% g++ --version
g++ (GCC) 3.2.1
[...]
% g++ -W -Wall -pedantic test.cpp
% ./a.out
push_back(3402,tian).
push_back(3401,tian).
push_back(3405,wu)Segmentation fault
%

So it appears something is going wrong in multiset::insert.
I'm not very familiar with the STL, so I suppose I'm abusing
multiset in some way, but for the life of me I can't figure
out how, or how to fix it.

The only way to fix it is to insert something that doesn't change
between calls to your 'push_back' function. I am nor really sure
what to recommend. An index should definitely be OK, but then
you need a mechanism to extract the values from the vector...
(The goal of the TerminalList class is to be a container
of Terminals, but to allow very fast access to the list of
Terminals containing a certain 'syllable' or 'unino'. I'm
using 'lower_bound' and 'upper_bound' to extract a vector
of Terminals with a given syllable from myStrHash; please
tell me if you can see a better way, but this is unrelated
to the segfault bug, which is why I snipped the code that
does all that from the program below.)

Thanks much,
-Arthur

==code begins==

#include <cstdio>
#include <cctype>
#include <vector>
#include <set>
#include <string>
using namespace std;

struct Terminal
{
int unino;
string syllable;
int d;

Terminal(int u, const string& y, int c):
unino(u), syllable(y), d(c)
{}
};

struct myStrPred {
bool operator() (Terminal* a, Terminal* b)
{ return (a->syllable < b->syllable); }
};

struct TerminalList
{
void push_back(const Terminal& t);

typedef multiset<Terminal*, myStrPred> StrHashT;
vector<Terminal> myVector;
StrHashT myStrHash;
};

void TerminalList::push_back(const Terminal& t)
{
myVector.push_back(t);

This invalidates all references and pointers to elements of that
vector _if_ reallocation occurs. In your case, reallocation does
not occur until the third element is inserted (probably).
Terminal* pt = &myVector.back();

Here you take an address to an element of the vector. That address
_can_ and _will_ be invalidated by the next push_back.
printf("push_back(%04x,%s)",
pt->unino, pt->syllable.c_str());
fflush(stdout);

myStrHash.insert(pt);

Here you attempt to preserve that address which can and will become
invalid next time you call this function.
printf(".\n");
}

I'd rewrite your class as

------------------------ Only changed parts
struct myStrPred {
vector<Terminal>& container;
myStrPred(vector<Terminal>& c) : container(c) {}
bool operator() (int i1, int i2)
{ return (container[i1].syllable < container[i2].syllable); }
};

struct TerminalList
{
void push_back(const Terminal& t);

typedef multiset<int, myStrPred> StrHashT;
vector<Terminal> myVector;
StrHashT myStrHash;

TerminalList() : myStrHash(myStrPred(myVector)) {}
};

void TerminalList::push_back(const Terminal& t)
{
int next_ind = myVector.size();
myVector.push_back(t);

printf("push_back(%04x,%s)",
myVector[next_ind].unino, myVector[next_ind].syllable.c_str());
fflush(stdout);

myStrHash.insert(next_ind);

printf(".\n");
}
--------------------------------------------------------
 
?

=?iso-8859-1?Q?Joaqu=EDn=20M=AA=20L=F3pez=20Mu=F1o

"Arthur J. O'Dwyer" ha escrito:

[segfault stuff deleted]
(The goal of the TerminalList class is to be a container
of Terminals, but to allow very fast access to the list of
Terminals containing a certain 'syllable' or 'unino'. I'm
using 'lower_bound' and 'upper_bound' to extract a vector
of Terminals with a given syllable from myStrHash; please
tell me if you can see a better way, but this is unrelated
to the segfault bug, which is why I snipped the code that
does all that from the program below.)

I've written a library called Boost.MultiIndex, to be promptly released
in Boost 1.32, that can help you construct this sort of hybrid
containers. Documentation can be consulted at

http://boost-consulting.com/boost/libs/multi_index/doc/index.html

In particular, the section "A bidirectional list with fast lookup" in
the
tutorial shows how to construct a container that, AFAICS, can
effectively replace your TerminalList. Maybe this is helpful to you.

Regards,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
 
A

Arthur J. O'Dwyer

"Arthur J. O'Dwyer" ha escrito:

I've written a library called Boost.MultiIndex, to be promptly released
in Boost 1.32, that can help you construct this sort of hybrid
containers. Documentation can be consulted at

http://boost-consulting.com/boost/libs/multi_index/doc/index.html

In particular, the section "A bidirectional list with fast lookup" in
the tutorial shows how to construct a container that, AFAICS, can
effectively replace your TerminalList. Maybe this is helpful to you.

Yes and no. ;-( The 'multi_index_container' template *is* exactly
what I'm looking for; I certainly hope Boost becomes at least a
/de facto/ standard for C++ library support, if not part of the
standard itself! But as I understand it, to use 'multi_index_container'
I would need to download and install *all* of Boost (or at least the
large portion of it #included by multi_index_container.hpp) on all
the machines on which I intended to compile my program. And that's
a little much, for my humble purposes. :) [Being as there are three
of them, and one of them is a Linux system on which I have neither
administrative privileges nor megabytes of userspace to spare.]
Thank you for bringing it to my attention, though; I'm going to
have to start looking at what else Boost can do that I've been doing
by hand. Maybe one of these days it will become worth my while to
tie all my larger programs to Boost.

Meanwhile, I'm still wondering what's wrong with my use of
'multiset'...

-Arthur
 
?

=?iso-8859-1?Q?Joaqu=EDn=20M=AA=20L=F3pez=20Mu=F1o

"Arthur J. O'Dwyer" ha escrito:
[...]
Yes and no. ;-( The 'multi_index_container' template *is* exactly
what I'm looking for; I certainly hope Boost becomes at least a
/de facto/ standard for C++ library support, if not part of the
standard itself! But as I understand it, to use 'multi_index_container'
I would need to download and install *all* of Boost (or at least the
large portion of it #included by multi_index_container.hpp) on all
the machines on which I intended to compile my program.

Yep, that's true.
And that's
a little much, for my humble purposes. :) [Being as there are three
of them, and one of them is a Linux system on which I have neither
administrative privileges nor megabytes of userspace to spare.]

Well, there's a recurrent problem with the entry barriers to using
Boost, the main one being the huge size of the library.
But then again, if your main obstacle is how to convince your manager
to use Boost as part of the common development base, you can
put the matter like this: How long will it take you to craft your
TerminalList solution and debug it? A week? A week of your time
is orders of magnitude more expensive than the HD space taken
by Boost.

That said, there's an utility called bcp for extracting libraries
out of Boost, dragging their dependencies along:

http://boost-consulting.com/boost/tools/bcp/bcp.html

I can comment on how covnenient this tool is, but you might want to
have a look at it.

Thank you for bringing it to my attention, though; I'm going to
have to start looking at what else Boost can do that I've been doing
by hand. Maybe one of these days it will become worth my while to
tie all my larger programs to Boost.

Meanwhile, I'm still wondering what's wrong with my use of
'multiset'...

Ummm... The problem lies in that you're storing pointers to a
*vector* inside myStrHash. But these pointers are *not* stable,
the elements inside myVector will be relocated when its size
grows; in your code this happens at about the second insertion.
You have two solutions:

1. Use an std::list insiead of an std::vector. std::list references
are stable, i.e elements won't be relocated over time.
2. Redefine myStrHash to use indices (numbers) instead of
pointers.

I'd rather use #1, though.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
 
A

Arthur J. O'Dwyer

"Arthur J. O'Dwyer" ha escrito:
[re: don't want to install Boost everywhere myself]
Well, there's a recurrent problem with the entry barriers to using
Boost, the main one being the huge size of the library.
But then again, if your main obstacle is how to convince your manager
to use Boost as part of the common development base, you can
put the matter like this: How long will it take you to craft your
TerminalList solution and debug it? A week? A week of your time
is orders of magnitude more expensive than the HD space taken
by Boost.

*If* that were my main obstacle. ;) The program in question is
just a hobby project anyway.
That said, there's an utility called bcp for extracting libraries
out of Boost, dragging their dependencies along:

http://boost-consulting.com/boost/tools/bcp/bcp.html

I'll take a look.

Ummm... The problem lies in that you're storing pointers to a
*vector* inside myStrHash. But these pointers are *not* stable,

Oh, bother! See, I knew it was something simple... :(
the elements inside myVector will be relocated when its size
grows; in your code this happens at about the second insertion.
You have two solutions:

1. Use an std::list insiead of an std::vector. std::list references
are stable, i.e elements won't be relocated over time.

Aha. This would be the easiest solution, then. Thanks.
For future reference, are there any other STL containers for
which the elements' addresses are guaranteed to be stable?
And I seem to recall that in a std::vector, the elements are
guaranteed to be in contiguous memory --- but this is not true
of std::list --- is it true of any other containers?

I've been relying mostly on the Dinkum C++ library reference
page and on Google for my STL information. Do you have any
suggestions of a better STL reference for the casual C++ user?
[Read: one that might manage to warn me about dumb mistakes like
that one.]

-Arthur
 
?

=?iso-8859-1?Q?Joaqu=EDn=20M=AA=20L=F3pez=20Mu=F1o

"Arthur J. O'Dwyer" ha escrito:
[...]
Oh, bother! See, I knew it was something simple... :(


Aha. This would be the easiest solution, then. Thanks.
For future reference, are there any other STL containers for
which the elements' addresses are guaranteed to be stable?
And I seem to recall that in a std::vector, the elements are
guaranteed to be in contiguous memory --- but this is not true
of std::list --- is it true of any other containers?

std::vector is the only container guaranteeing that elements
are in contiguous memory.

I've been relying mostly on the Dinkum C++ library reference
page and on Google for my STL information. Do you have any
suggestions of a better STL reference for the casual C++ user?
[Read: one that might manage to warn me about dumb mistakes like
that one.]

Well,

* C++ FAQ Lite (http://www.parashift.com/c++-faq-lite/) contains
some sections on STL containers, and it is anyway worth reading the
whole contents even if not specifically dealing with STL.
* The reference at cppreference (http://www.cppreference.com/cpp_stl.html)
is a little more readable than Dinkumware's stuff.
* Thinking in C++ (http://www.bruceeckel.com/) is a C++ programming
guide downloadable for free. Vol II has some stuff on STL.

In general, if you google for "STL tutorial" you'll find many more
links. Beware to check the date of the docs you're reading: if it
is prior to 1998 (when the standard was published) then you'd rather
not use it, much info will be outdated.

HTH

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
 
A

Arthur J. O'Dwyer

Aha. This would be the easiest solution, then. Thanks.

FYI: changed from std::vector to std::list in the two places I
used pointers to elements [thus ickifying a couple of loops that
expected random access to the underlying vectors, but...], and
the program now works just fine![1] Thanks for your help!

http://www.contrib.andrew.cmu.edu/~ajo/workshop/hao.cpp

-Arthur

[1] - Modulo the ickiness of the source code; it's still a
very rough version. Refactoring is coming... ;)
 
A

Arthur J. O'Dwyer

It has more tutorial material, to be sure, but it it also riddled
with inaccuracies and omissions. You'll want to supplement it
with a good reference.

Yes; that was the site where I found
http://www.cppreference.com/cppmultiset_details.html#insert
^^^^^^^^
The function insert() either:
* inserts val after the element at pos, and returns an iterator
to that element.
* inserts a range of elements from start to end.
* inserts val, but only if val doesn't already exist. The return ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
value is an iterator to the element inserted, and a boolean describing
whether an insertion took place.

That was a little disconcerting. :)

-Arthur
 
D

Dave Townsend

This odious piece of code is your trouble....
void TerminalList::push_back(const Terminal& t)
{
myVector.push_back(t);
Terminal* pt = &myVector.back();

printf("push_back(%04x,%s)",
pt->unino, pt->syllable.c_str());
fflush(stdout);

myStrHash.insert(pt);

printf(".\n");

It is not a good idea to take addresses of objects inside a vector for the
simple
reason as you insert more objects the objects are relocated in memory, and
the
pointer values you are holding onto no longer point to those objects. When
you
do the insert(pt), the compare function you provide will eventually hit one
of these
invalid pt values and you're a gonner sunshine.

dave



Arthur J. O'Dwyer said:
I'm seeing a bug at the moment that I can't track down. It's
part of a moderately large program, but here is a small program
that exhibits the bug on gcc. (The program code follows at the
bottom of the message.)
The symptom is this:

% g++ --version
g++ (GCC) 3.2.1
[...]
% g++ -W -Wall -pedantic test.cpp
% ./a.out
push_back(3402,tian).
push_back(3401,tian).
push_back(3405,wu)Segmentation fault
%

So it appears something is going wrong in multiset::insert.
I'm not very familiar with the STL, so I suppose I'm abusing
multiset in some way, but for the life of me I can't figure
out how, or how to fix it.
(The goal of the TerminalList class is to be a container
of Terminals, but to allow very fast access to the list of
Terminals containing a certain 'syllable' or 'unino'. I'm
using 'lower_bound' and 'upper_bound' to extract a vector
of Terminals with a given syllable from myStrHash; please
tell me if you can see a better way, but this is unrelated
to the segfault bug, which is why I snipped the code that
does all that from the program below.)

Thanks much,
-Arthur

==code begins==

#include <cstdio>
#include <cctype>
#include <vector>
#include <set>
#include <string>
using namespace std;

struct Terminal
{
int unino;
string syllable;
int d;

Terminal(int u, const string& y, int c):
unino(u), syllable(y), d(c)
{}
};

struct myStrPred {
bool operator() (Terminal* a, Terminal* b)
{ return (a->syllable < b->syllable); }
};

struct TerminalList
{
void push_back(const Terminal& t);

typedef multiset<Terminal*, myStrPred> StrHashT;
vector<Terminal> myVector;
StrHashT myStrHash;
};

void TerminalList::push_back(const Terminal& t)
{
myVector.push_back(t);
Terminal* pt = &myVector.back();

printf("push_back(%04x,%s)",
pt->unino, pt->syllable.c_str());
fflush(stdout);

myStrHash.insert(pt);

printf(".\n");
}

int main()
{
TerminalList characters;
characters.push_back(Terminal(0x3402, "tian", 1));
characters.push_back(Terminal(0x3401, "tian", 1));
characters.push_back(Terminal(0x3405, "wu", 3));
return 0;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top