TCP close_wait problem


Grant

Hi Everyone,

A little background to begin with. At my company, I've been given the
task of resolving a disconnection issue between a TCP probe scripted
in Perl and a W2K app. I've barely touched Perl in the past, so it's
"right into the fire," so to speak.

The Perl script resides on a Solaris 8 box, using Perl 5.8.0. Since
I've done a bit of C coding in the past, I've been able to determine
what the code is doing. It's a multithreaded script that opens a
socket to each of a number of W2K servers and reads the data spit out
by an app on those servers (the app makes its data available on port
3000 of its own machine; the script connects to that port and picks
the data up). All threads dump their data into a shared Queue, which
is then piped into another probe for further processing.

The problem occurs when the app on the W2K box initiates a disconnect.
I installed WinDump on the W2K server and saw that the app does send
a FIN/ACK and does receive an ACK back from the Perl probe, but the
probe never sends its own FIN/ACK to the W2K server. As a result, the
W2K machine is stuck in FIN_WAIT_2 and the Solaris machine in
CLOSE_WAIT. These states remain until the Perl probe hits its own
10-minute inactivity timeout (this condition is explicitly checked
within the code). At that point the code explicitly closes the
socket, the connection is torn down, and the probe reconnects to the
W2K server.
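CLOSE_WAIT means the local kernel has received and ACKed the peer's FIN and is now waiting for the application to call close() (or shutdown()) on the socket; it will not send its own FIN until then. So the fix belongs in the read loop: treat a zero-byte read as "peer closed" and close the socket right away. A minimal sketch in Python for illustration (the same rule applies in Perl, where sysread returns 0 at end of file); `pump` and the `on_data` callback are hypothetical names, not taken from the original script.

```python
import socket
from typing import Callable


def pump(sock: socket.socket, on_data: Callable[[bytes], None],
         bufsize: int = 4096) -> None:
    """Read from sock until the peer closes its side, then close ours.

    A zero-length recv() means the peer's FIN has arrived and the socket
    is in CLOSE_WAIT. Closing it lets the kernel send our FIN, so the peer
    can leave FIN_WAIT_2 instead of waiting for an inactivity timeout.
    """
    try:
        while True:
            chunk = sock.recv(bufsize)   # blocks until data or EOF
            if not chunk:                # b"" == orderly shutdown by the peer
                break
            on_data(chunk)
    finally:
        sock.close()                     # sends our FIN; CLOSE_WAIT -> LAST_ACK
```

After `pump` returns, the caller can reconnect immediately rather than waiting for the 10-minute timer.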

Lowering the timeout value is not an option. The connection can
legitimately be idle (i.e., no data sent) for up to 7 minutes, which
is within normal operating parameters. The bottom line is that if I
lower it below that, I will get unnecessary inactivity alarms
throughout the day. But I need the probe to reconnect almost
immediately, as the data being sent is time sensitive. In case you
are wondering, the W2K app initiates a disconnect each day in order to
restart itself and compact a database it uses.
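The daily restart and a genuinely quiet link can be told apart without touching the 10-minute alarm: wait on the socket with select() using the idle limit as the timeout, and treat a readable socket that returns zero bytes as a disconnect. A sketch in Python, assuming the probe's read loop can be restructured this way (in Perl, IO::Select's can_read fills the role of select.select); `watch` and its "eof"/"idle" return values are illustrative names.

```python
import select
import socket
from typing import Callable

IDLE_LIMIT = 600.0  # seconds; mirrors the script's existing 10-minute alarm


def watch(sock: socket.socket, on_data: Callable[[bytes], None],
          idle_limit: float = IDLE_LIMIT, bufsize: int = 4096) -> str:
    """Pump data until the peer disconnects or the link goes idle.

    Returns "eof" the moment the peer closes (reconnect right away) or
    "idle" after idle_limit seconds with no data (raise the alarm). The
    socket is closed in both cases.
    """
    try:
        while True:
            readable, _, _ = select.select([sock], [], [], idle_limit)
            if not readable:
                return "idle"        # no data and no FIN within the limit
            chunk = sock.recv(bufsize)
            if not chunk:
                return "eof"         # FIN arrived: do not wait for the timer
            on_data(chunk)
    finally:
        sock.close()
```

Because each select() call restarts the wait, the idle limit measures time since the last data, so a 7-minute lull still passes quietly while a FIN is acted on at once.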

I'm fairly certain the problem is not on the W2K side of the
connection, as there currently is a C probe running on another Solaris
machine that essentially does the same thing, and the TCP teardown is
done properly.

After reading that there was a setsockopt function, I thought there
would be a way to query the state of a socket, but I can't find one.
I then thought I could make a system call: run netstat, grep for the
IP used by the thread in question, search for "CLOSE_WAIT", and if
it's there, close the socket. But I kept telling myself that there
has to be a more elegant way of dealing with this.
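There is no portable socket option that reports the TCP state, but the netstat check can be replaced by asking the socket itself: a zero-timeout select() followed by a recv() with MSG_PEEK returns an empty result only when the next thing in the stream is the peer's FIN. A Python sketch (Perl's recv also accepts MSG_PEEK from the Socket module); `peer_has_closed` is a hypothetical helper. One caveat: if unread data is queued ahead of the FIN, this returns False until that data has been read.

```python
import select
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer's FIN (or a reset) is waiting on sock.

    Never blocks and never consumes data: select() with a zero timeout
    reports whether anything is pending, and MSG_PEEK leaves it queued.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False                     # nothing pending: still open
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True                      # peer aborted with RST
```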

As mentioned earlier, the script is using Perl 5.8.0, along with the
threads module (and Thread::Queue, which uses threads). I'm not really
sure how to determine whether I'm running the proper threads module.
When I do a perl -V, I get this as part of the output:

usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define

Any and all help would be appreciated... I'm somewhat leery of posting
the code, as it wasn't my company that developed it (vendor support
isn't an option, which brings me here), but if it will shed more light
on this problem, I will.

Thanks a bunch in advance,


Grant
 

Bill

Grant said:
> The Perl script resides on a Solaris 8 box, using Perl 5.8.0. Since
> I've done a bit of C coding in the past, I've been able to determine
> what the code is doing. It's a multithreaded script that opens a
> socket to a number of W2K servers and reads data that is spit out from
> an app on the W2K servers (data is dumped to port 3000 on its own
> machine, a socket connection is made to this port, and the data is
> picked up). The data then is dumped into a Queue from all operating
> threads, then piped into another probe for further processing.
> The problem occurs when the app on the W2K box initiates a disconnect.

I think perhaps that this is where the problem lies. Maybe your perl
program needs to initiate the disconnect from the socket it opened itself.
> I installed Windump on the W2K server, and saw that the app does send
> a FIN and ACK, and does receive an ACK back from the Perl probe, but
> the FIN and ACK from the probe are never sent to the W2K server. Thus,
> the W2K machine is stuck in a FIN_WAIT_2 state, and the Solaris
> machine in a CLOSE_WAIT state. The states remain until the Perl probe

Does the Perl program know when it is done with the read, or can it use
an independent activity timeout routine for this read if it cannot
otherwise know?

If so, can the perl program initiate the disconnect, or even just exit,
without the Win32 box doing so first?
 

Grant

Bill said:
> I think perhaps that this is where the problem lies. Maybe your perl
> program needs to initiate the disconnect from the socket it opened itself.

This is what I think I have to do. I've even tried a hard shutdown of
the W2K app (which sends an RST packet), and the Perl script behaves no
differently. I'm thinking I need to initiate a disconnect on the Perl
end shortly after the W2K end has come back up. The only issue with
this is timing: the W2K app *begins* its restart at 7am each day, but
the restart is not instantaneous... it is usually fully shut down by
7:02am.
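Since the W2K app is usually back within a couple of minutes of 7am, the timing does not have to be guessed: once the probe has closed its end, it can simply retry the connection until the app accepts it again. A Python sketch; the retry count, delay, and connect timeout are assumptions to tune, and port 3000 is the only detail taken from this thread.

```python
import socket
import time
from typing import Optional


def reconnect(host: str, port: int = 3000, attempts: int = 60,
              delay: float = 5.0,
              connect_timeout: float = 10.0) -> Optional[socket.socket]:
    """Retry until the server accepts a connection; None if it never does.

    With the defaults a refused connection is retried for roughly five
    minutes, enough to ride out a restart that normally takes about two.
    """
    for attempt in range(attempts):
        try:
            sock = socket.create_connection((host, port),
                                            timeout=connect_timeout)
            sock.settimeout(None)        # back to blocking reads
            return sock
        except OSError:                  # refused, unreachable, timed out
            if attempt + 1 < attempts:
                time.sleep(delay)        # app still restarting; try again
    return None
```

Retrying on refusal rather than on a clock keeps the probe correct even on a day when the restart runs late.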
> Does the Perl program know when it is done with the read, or can it use
> an independent activity timeout routine for this read if it cannot
> otherwise know?
>
> If so, can the perl program initiate the disconnect, or even just exit,
> without the Win32 box doing so first?

Basically, the way I interpret the script, the program as it stands is
blind to the connection(s) it has open: it doesn't know when the W2K
app has dropped the TCP connection or come back up, and it doesn't
know when the read should be done. It only has a timeout value that is
used to determine when a Queue has not received data for a specified
length of time.
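To make each thread aware of its own connection, the reader can put a marker on the shared queue when it sees end-of-file, so whatever consumes the queue knows which server dropped and can reconnect immediately instead of inferring it from a silent queue. A Python sketch using threading and queue.Queue as stand-ins for Perl's threads and Thread::Queue; the (name, chunk) tuple format and the None marker are assumptions, not the original script's protocol.

```python
import queue
import socket
import threading


def reader(name: str, sock: socket.socket, q: "queue.Queue") -> None:
    """Per-connection thread body: forward data, then report the close.

    Items are (name, bytes) while data flows and (name, None) once the
    peer has closed and our end has been closed as well.
    """
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:                # peer's FIN: connection is over
                break
            q.put((name, chunk))
    finally:
        sock.close()                     # answer the FIN right away
        q.put((name, None))              # tell the consumer to reconnect


def start_reader(name: str, sock: socket.socket,
                 q: "queue.Queue") -> threading.Thread:
    """Run reader() for one connection in a background thread."""
    t = threading.Thread(target=reader, args=(name, sock, q), daemon=True)
    t.start()
    return t
```

A consumer that pulls `(name, None)` off the queue would reopen the connection to that server and start a new reader thread for it.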

But as you mentioned earlier, I'm thinking I have to initiate the
disconnect on the Perl side.

Thanks a bunch for the suggestion. I think I might've been going at
the solution the wrong way.


Grant
 
