Design problem - Stack unwinds to deleted

Christopher · Oct 11, 2011

First time I've run into this situation.

Listener creates connections and owns pointers to them in a collection
member.
When Listener is being shut down, directly or via its destructor:
It locks the collection
It calls close on each connection pointer
As the connections close, they call back the listener to inform it
that they are closed.
Listener has to be informed, so it can delete the allocated
connection!
Listener tries to get a lock on the collection
Listener tries to delete the pointer.

Problem 1) Deadlock because the same thread just tried to lock twice
Problem 2) Even if I get past the deadlock, I just had a method from
an instance make a call that deleted the instance!
So, when the notification handler returns, it has
nothing to return to?!

I can't for the life of me figure out a way around this. Even if I
move things back and forth between destructors and other threads, in
the end, the same thread that wants to close the connection is going
to be the thread that wants to delete it.

Any ideas on how to get around this situation?

Victor Bazarov · Oct 11, 2011

Usually a flag or an argument that would indicate that the closing of
the connection is originating in the listener itself, and that it
doesn't want to be notified (unless the closing fails, then it should
not be asynchronous anyway). With a flag - set by the listener in
itself - let the notification be made, and then the listener knows that
it itself is shutting the connections down, and doesn't do the locking
etc. With a parameter to the 'close' function: don't notify me, so the
notification is not being sent. The variation on the parameter is the
pointer of the "closer" - it's the object who should be skipped in all
the notifications the connection wants to send out.

What was your C++ language question, by the way?

V
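
The "closer" variation described above can be sketched roughly as follows. This is only an illustration under assumptions: CloseObserver, onClosed, accept, shutdown and the notification counter are hypothetical names, since the thread does not show the real interfaces, and the sketch is single-threaded on purpose.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class Connection;

// Hypothetical notification interface; the thread does not show the real one.
struct CloseObserver {
    virtual ~CloseObserver() {}
    virtual void onClosed(Connection& connection) = 0;
};

class Connection {
public:
    explicit Connection(CloseObserver* observer) : observer_(observer) {}

    // 'closer' is whoever initiated the close. It is skipped when the
    // notification goes out; pass nullptr to notify the observer as usual.
    void close(CloseObserver* closer = nullptr) {
        if (closed_) return;
        closed_ = true;
        if (observer_ != nullptr && observer_ != closer) {
            observer_->onClosed(*this);
        }
    }

    bool closed() const { return closed_; }

private:
    CloseObserver* observer_;
    bool closed_ = false;
};

class Listener : public CloseObserver {
public:
    Connection& accept() {
        connections_.push_back(std::unique_ptr<Connection>(new Connection(this)));
        return *connections_.back();
    }

    // Reached only for closes the listener did not start itself.
    // This sketch just counts them; it deliberately does not delete here.
    void onClosed(Connection&) override { ++notifications_; }

    // Listener-initiated close: no callback re-enters the listener, so the
    // collection can be walked and emptied without a second lock attempt.
    void shutdown() {
        for (auto& connection : connections_) connection->close(this);
        connections_.clear();
    }

    std::size_t size() const { return connections_.size(); }
    int notifications() const { return notifications_; }

private:
    std::vector<std::unique_ptr<Connection>> connections_;
    int notifications_ = 0;
};
```

With this shape, shutdown() never triggers onClosed(), so the connections are destroyed only after every close() call has returned. A close that originates elsewhere (for example, the peer disconnecting) still reaches the listener and needs its own answer to the "delete from inside the callback" problem.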

Goran · Oct 12, 2011

I used a combination of shared_ptr/weak_ptr with great success in a
similar situation. For me, locking the collection and destroying the
element was a constant source of deadlocks (and my mutex was
re-entrant, too!)

Here's what I did: collection stores shared_ptr-s of elements.
Collection modification is thread-safe. Collection gives out only
weak_ptr-s. In other words, they are transient. Caller calls
lock() on them, and if that gives out a non-null shared_ptr, it can
use it for a short amount of time (coding discipline needed here). If,
during that time, collection element is removed from the collection,
it becomes a "zombie" object and (possibly) starts refusing further
services (throws "zombie object here" exception; going from weak_ptr
to shared_ptr can do that, too). Users must take into account that
they might receive these and stop further processing using that
collection element (coding discipline needed here, too). Effectively,
there needs to be "logical" delineation of the operation where said
collection element is indispensable, and it needs to be wrapped in a
try-catch. If, during that process, an object becomes a "zombie",
operation terminates.

What is the secret sauce here? The fact that collection modification
and collection element destruction are not related with regard to
thread safety. What is the price to pay? See "coding discipline..."
above.

Goran.
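
A minimal sketch of the scheme described above, assuming C++11 std::shared_ptr and std::weak_ptr (the post does not say whether the standard or the Boost versions were used). Registry, Connection, markZombie and send are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical collection element.
class Connection {
public:
    void markZombie() { zombie_ = true; }

    // Any service call refuses to work once the element has been removed.
    int send(int bytes) {
        if (zombie_) throw std::runtime_error("zombie object here");
        return bytes;
    }

private:
    std::atomic<bool> zombie_{false};
};

class Registry {
public:
    int add() {
        std::lock_guard<std::mutex> lock(mutex_);
        const int id = nextId_++;
        items_[id] = std::make_shared<Connection>();
        return id;
    }

    // Only weak_ptr-s leave the collection.
    std::weak_ptr<Connection> find(int id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = items_.find(id);
        if (it == items_.end()) return std::weak_ptr<Connection>();
        return std::weak_ptr<Connection>(it->second);
    }

    void remove(int id) {
        std::shared_ptr<Connection> victim;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = items_.find(id);
            if (it == items_.end()) return;
            victim = std::move(it->second);
            items_.erase(it);
        }
        // The lock is already released here. The element is destroyed when
        // the last shared_ptr goes away, which may be in another thread.
        victim->markZombie();
    }

private:
    mutable std::mutex mutex_;
    std::map<int, std::shared_ptr<Connection>> items_;
    int nextId_ = 0;
};
```

A caller would do something like `if (auto c = registry.find(id).lock()) { c->send(n); }` inside a try-catch and keep the shared_ptr only for the duration of that one logical operation.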

Jorgen Grahn · Oct 12, 2011

Christopher wrote:
"Listener tries to get a lock on the collection"

Why would anyone else want access to that collection?

The question seems backwards somehow. Create a design which solves
your problem -- don't start with a design and try to wedge in a
solution to your problem later.

This is a TCP socket server, right? And threads are involved somehow.
Managing the lifetime of the threads, sockets and per-connection
objects is one of the most important jobs of your design. And the
exact properties of the application-level protocol probably have a
major impact, too.

There aren't enough details above for me to give any specific
comments. (Last time I designed a socket server it was based on the
knowledge that processing a command was "fast", and a response small
enough to be kept in memory. It was single-threaded and based on
non-blocking sockets.)

/Jorgen

Christopher · Oct 12, 2011

I think this problem can be simplified some. I think the source of it
is:

"I have a subscriber that owns its own publisher."

I think that in itself looks like it is breaking some unwritten rule.
Isn't it?
The subscriber wants to delete its own publisher, on a notification it
subscribed to! This can't be done without a crash, because the
publisher must exist in order to notify the subscriber, and it must
exist as the stack unwinds in the actual notification!

Werner · Oct 12, 2011

- Does the publisher have more than one subscriber?
- For this kind of thing I use a pattern that deletes the publisher
after the stack has unwound. All threads have callback queues. Just
call a function on the publisher (from the subscriber) that adds a
callback on the queue to delete it...after the current callback has
been handled...
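
The deferred-delete pattern above might look roughly like this. It is a single-threaded sketch under assumptions: CallbackQueue stands in for whatever per-thread queue is actually used, and Publisher, Subscriber and their member names are hypothetical.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical per-thread callback queue.
class CallbackQueue {
public:
    void post(std::function<void()> callback) { pending_.push(std::move(callback)); }

    // Runs callbacks until the queue is empty; callbacks may post more.
    void run() {
        while (!pending_.empty()) {
            std::function<void()> callback = std::move(pending_.front());
            pending_.pop();
            callback();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

class Subscriber;

class Publisher {
public:
    explicit Publisher(Subscriber& subscriber) : subscriber_(subscriber) {}
    void close();
    bool closed() const { return closed_; }

private:
    Subscriber& subscriber_;
    bool closed_ = false;
};

class Subscriber {
public:
    explicit Subscriber(CallbackQueue& queue)
        : queue_(queue), publisher_(new Publisher(*this)) {}

    void closePublisher() {
        if (publisher_) publisher_->close();
    }

    // Called from inside Publisher::close(), so the publisher must not be
    // destroyed here. Queue the destruction for after the stack has unwound.
    void onClosed() {
        queue_.post([this] { publisher_.reset(); });
    }

    bool ownsPublisher() const { return static_cast<bool>(publisher_); }

private:
    CallbackQueue& queue_;
    std::unique_ptr<Publisher> publisher_;
};

inline void Publisher::close() {
    subscriber_.onClosed();
    closed_ = true;  // safe: this object still exists after the notification
}
```

The subscriber has to outlive any callback it has queued, because the lambda captures `this`. In a multithreaded setup the queue would also need its own synchronization.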

Christopher · Oct 12, 2011

Just one subscriber.
Yeah, I thought about the delayed queue type of thing and that is
essentially what I am trying now.

It becomes problematic when boost::asio::io_service comes into the
picture, in that things that occur on one thread depend on things
being complete on another.

The delete is one of those. I've got counters and mutexes, locks, and
unlocks everywhere.
I've probably got 20 more race conditions to go, after solving about
20 deadlock situations.

Things like this occur:
delete calling close
close waiting for all reads and writes to finish
reads and writes waiting on a thread to become available to finish
close deadlocking

So I post the close instead, which in effect is delaying it.
When the posted close executes it still waits on reads and writes to
complete or abort.
The posted close is taking up one more worker thread.

It works for one connection.

Now when I bring in 100 or so, they easily get into a situation where
the close is taking a worker, the read complete from another
connection is taking a worker, so the read from the first never
completes, causing a deadlock in the first.

There simply has to be some multithreaded paradigm/pattern knowledge
that goes along with io_service/bind/callback type situations.
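
One shape that avoids a close that sits on a worker thread while waiting is sketched below. It is not something proposed in the thread, and it uses no Boost calls: the connection only counts its outstanding operations, and whichever completion handler finishes last reports that the connection is fully closed. All names are hypothetical, and the sketch assumes that calls for one connection are serialized (for example, never run concurrently by the worker pool).

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a connection driven by completion handlers.
class Connection {
public:
    explicit Connection(std::function<void()> onFullyClosed)
        : onFullyClosed_(std::move(onFullyClosed)) {}

    // Call when a read or write is started.
    void startOperation() { ++pending_; }

    // Call from the completion handler of a read or write, whether the
    // operation finished normally or was aborted.
    void operationDone() {
        --pending_;
        finishIfIdle();
    }

    // Never waits: records the intent to close and returns immediately.
    void requestClose() {
        closing_ = true;
        finishIfIdle();
    }

private:
    void finishIfIdle() {
        if (closing_ && pending_ == 0 && !reported_) {
            reported_ = true;
            onFullyClosed_();  // the owner can now schedule the delete
        }
    }

    std::function<void()> onFullyClosed_;
    int pending_ = 0;
    bool closing_ = false;
    bool reported_ = false;
};
```

Because requestClose() returns immediately, no worker is held while reads and writes drain, which is the starvation described above. The owner should still defer the actual delete until the callback has returned.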

Werner · Oct 13, 2011

I've written listeners that I've tested with more than a thousand
connections. In retrospect (on looking at the code), I can see
potential deadlocks if the servers do the wrong thing, but luckily my
servers and listeners interact within my control.

Your concerns aren't separated well in my opinion. My listener service
would accept the descriptors requiring accepting and publish/dispatch
them to the interested server associated with the listening
descriptor. It would then simply continue listening on the same
descriptor.

The server/subscriber would become responsible for the socket. In my
case I had a further one to many from Server to ServerSockets. The
listener was therefore opaque.

- You would create a server servicing a port/ip combination.
- You would create a socket and associate it with the applicable
server.
- The server would, upon receiving the accepted socket, create a
Connection object and hook it up with all the relevant services (such
as the service responsible for receiving data).
- It would then publish the Connection to interested/associated
servers.
- The first server "accepting" the connection (where accepting implies
an accept on application level, that might be based on an application
layer protocol) would steal ownership of the connection. Unaccepted
connections would be discarded/closed after some time by the server.

The server loop looked something to this effect...

void TcpSvrImpl::serviceConnections()
{
    std::list<IfToSock*>::iterator socket;
    std::list<TcpSvrConnection*>::iterator connection;

    for( socket = connectNotifyList_.begin();
         socket != connectNotifyList_.end();
         ++socket )
    {
        for( connection = connectionList_.begin();
             connection != connectionList_.end();
             ++connection )
        {
            if( (*socket)->processingConnection() )
            {
                //Current socket is processing some connection -
                // chances are almost 100% it will continue
                // doing so during this loop...This socket cannot
                // be serviced further as it is still busy
                // processing a connection
                break;
            }
            else if( (*socket)->hasAcceptedConnection() )
            {
                //Current Socket has accepted a connection
                // - remove from notification list
                (*socket)->resetSvr();

                //remove the connection itself from connection list as we
                // allow only one TcpSvrConnection to be assigned to one
                // TcpSvrSock
                if( (*socket)->isAssociatedWith( *connection ) )
                {
                    connection = connectionList_.erase( connection );
                }

                socket = connectNotifyList_.erase( socket );
                break;
            }
            else if( (*socket)->rejectedConnection( **connection ) )
            {
                //Current socket has rejected this connection previously,
                // move on to evaluate if this socket has processed the
                // next connection in loop
                continue;
            }
            else if( !(*connection)->hasBeenAccepted() )
            {
                //Attempt to associate this connection with the socket
                (*socket)->setConnection( **connection );
                break;
            }
        }
        //Handle the case where the only socket was erased, meaning socket
        // is pointing to end.
        if( socket == connectNotifyList_.end() )
        {
            break; //Breaking from outer loop.
        }
    }
}

I can mention that I had to handle the case of sockets being erased
while connections were being dispatched. This I did by using a concept
called a rendezvous. The context (or thread of execution) from where
sockets would dissociate themselves from the server would block until
the server would complete its loop. The code looked something like
this:

void TcpSvrImpl::detachFromConnectNotify( IfToSock& socket )
{
    //- Create an asynchronous callback with argument (socket)
    //  that will execute in the context of the thread associated
    //  with "cmdSvr_".
    //- Wait for this callback to complete (synchCmd...) before
    //  continuing this thread's execution.
    cmdRetArg1StoreArg(
        *this, &TcpSvrImpl::detachSockFromTcpSvrImpl, cmdSvr_ ).synchCmd(
            syncDetachFromConnNotify_ )( &socket );
}

.... and the asynchronous call made into the servicing thread...

void TcpSvrImpl::detachSockFromTcpSvrImpl( IfToSock* socket )
{
    if( RtsCurrent_TcpSvrImpl ==
        RtsTcpSvrImplTcpSvrImpl_States_Created )
    {
        socket->resetSvr();
        connectNotifyList_.remove( socket );
    }
    else
    {
        /* Event not handled */
    }
    RtsRunToCompletion();
}

Once the connection is the responsibility of the socket, things are
easier. You still require a way for the asynchronous reading service
to notify the socket when the peer disconnects though...

Werner · Oct 13, 2011

Corrections to the list in my previous post:

- The Connection is published to the associated sockets, not to
"interested/associated servers".
- It is the first socket, not the first server, that "accepts" the
connection and steals ownership of it.
- "Server" is correct in the last sentence: unaccepted connections
would be discarded/closed after some time by the server.

Hope it makes sense.

Werner

Jorgen Grahn · Oct 14, 2011

Christopher wrote, about a subscriber that owns its own publisher:
"I think that in itself looks like it is breaking some unwritten rule.
Isn't it?"

Yes, that was the feeling I got when I read your posting.

My gut reaction is "so why is there a subscriber--publisher
relationship anyway? get rid of it!". I have not seen any reason for
such a design in my socket programming ... but once again I don't know
anything about what problem you are trying to solve.

/Jorgen
