How to reuse TCP listening socket immediately after it was connected at least once?


Igor Katson

I have written a socket server and some arbitrary clients. When I
shutdown the server, and do socket.close(), I cannot immediately start
it again cause it has some open sockets in TIME_WAIT state. It throws
address already in use exception at me. I have searched for that in
google but haven't found a way to solve that.

Tried
setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
but that does not help.

Is there a nice way to overcome this?
 
Lawrence D'Oliveiro

Igor Katson said:
I have written a socket server and some arbitrary clients. When I
shutdown the server, and do socket.close(), I cannot immediately start
it again cause it has some open sockets in TIME_WAIT state. It throws
address already in use exception at me.

There's a reason for that. It's to ensure that there are no leftover packets
floating around the Internet somewhere, that you might mistakenly receive
and think they were part of a new connection, when they were in fact part of
an old one.

The right thing to do is try to ensure that all your connections are
properly closed at shutdown. That may not be enough (if your server crashes
due to bugs), so the other thing you need to do is retry the socket open,
say, at 30-second intervals, until it succeeds.
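A minimal sketch of that retry idea, assuming Python 3 and a hypothetical helper name `bind_with_retry` (neither the name nor the attempt limit comes from this thread). It only waits out "address already in use" and re-raises any other error:

```python
import errno
import socket
import time


def bind_with_retry(host, port, interval=30.0, max_attempts=10):
    """Return a listening socket, retrying while the address is in use.

    `interval` and `max_attempts` are illustrative defaults, not
    recommended values.
    """
    for attempt in range(1, max_attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(5)
            return sock
        except OSError as exc:
            sock.close()
            # Only EADDRINUSE is worth waiting out; re-raise anything
            # else, and give up once the last attempt has failed.
            if exc.errno != errno.EADDRINUSE or attempt == max_attempts:
                raise
            time.sleep(interval)
```

Called as `bind_with_retry("0.0.0.0", 8000)`, this blocks for up to roughly `interval * (max_attempts - 1)` seconds before giving up, so a real server would probably log each failed attempt.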
 
Дамјан Георгиевски

I have written a socket server and some arbitrary clients. When I
shutdown the server, and do socket.close(), I cannot immediately start
it again cause it has some open sockets in TIME_WAIT state. It throws
address already in use exception at me. I have searched for that in
google but haven't found a way to solve that.

Tried
setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
but that does not help.

This should work, AFAIK you only need to do it before you call .bind(..)
on the accept-ing socket
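For illustration, a minimal sketch of that ordering in Python 3. The helper name `make_listener` and the backlog value are assumptions, not something from the original post:

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set on the listening socket *before* bind();
    # setting it afterwards does not affect a bind that already happened.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

If the server is built on the standard library's `socketserver.TCPServer`, setting the class attribute `allow_reuse_address = True` should have the same effect, since the server then sets the option itself before binding.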



--
дамјан ( http://softver.org.mk/damjan/ )

Give me the knowledge to change the code I do not accept,
the wisdom not to accept the code I cannot change,
and the freedom to choose my preference.
 
Roy Smith

Lawrence D'Oliveiro said:
There's a reason for that. It's to ensure that there are no leftover packets
floating around the Internet somewhere, that you might mistakenly receive
and think they were part of a new connection, when they were in fact part of
an old one.

In theory, that is indeed the reason for the TIME_WAIT state. In practice,
however, using SO_REUSEADDR is pretty safe, and common practice.

You've got several things working in your favor. First, late-delivery of
packets is pretty rare. Second, if some late packet were to arrive, the
chances of them having the same local and remote port numbers as an
existing connection is slim. And, finally, the TCP sequence number won't
line up.

One thing to be aware of is that SO_REUSEADDR isn't 100% portable. There
are some systems (ISTR HP-UX) which use SO_REUSEPORT instead of
SO_REUSEADDR. The original specifications weren't very clear, and some
implementers read them in strange ways. Some of that old code continues in
use today. I only mention this because if you try SO_REUSEADDR and it's
not doing what you expect, it's worth trying SO_REUSEPORT (or both) to see
what happens on your particular system.
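A hedged sketch of the "try both" approach in Python 3, using a hypothetical helper `set_reuse_options`. It guards against `SO_REUSEPORT` not being defined on the platform and reports which options were actually applied:

```python
import socket


def set_reuse_options(sock):
    """Set SO_REUSEADDR, plus SO_REUSEPORT where available.

    Returns the names of the options that were applied.
    """
    applied = []
    for name in ("SO_REUSEADDR", "SO_REUSEPORT"):
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not defined by this platform's socket module
        try:
            sock.setsockopt(socket.SOL_SOCKET, option, 1)
            applied.append(name)
        except OSError:
            pass  # constant exists but the system rejected it
    return applied
```

Call it on the listening socket before `bind()`. One caveat: on newer Linux kernels `SO_REUSEPORT` also lets several sockets listen on the same port at once, which may not be what you want, so it is probably best enabled only where `SO_REUSEADDR` alone does not help.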
Lawrence D'Oliveiro said:
The right thing to do is try to ensure that all your connections are
properly closed at shutdown. That may not be enough (if your server crashes
due to bugs), so the other thing you need to do is retry the socket open,
say, at 30-second intervals, until it succeeds.

That may be a reasonable thing to do for production code, but when you're
building and debugging a server, it's a real pain to not be able to restart
it quickly whenever you want (or need) to.
 
Igor Katson

Roy said:
In theory, that is indeed the reason for the TIME_WAIT state. In practice,
however, using SO_REUSEADDR is pretty safe, and common practice.

You've got several things working in your favor. First, late-delivery of
packets is pretty rare. Second, if some late packet were to arrive, the
chances of them having the same local and remote port numbers as an
existing connection is slim. And, finally, the TCP sequence number won't
line up.

One thing to be aware of is that SO_REUSEADDR isn't 100% portable. There
are some systems (ISTR HP-UX) which use SO_REUSEPORT instead of
SO_REUSEADDR. The original specifications weren't very clear, and some
implementers read them in strange ways. Some of that old code continues in
use today. I only mention this because if you try SO_REUSEADDR and it's
not doing what you expect, it's worth trying SO_REUSEPORT (or both) to see
what happens on your particular system.



That may be a reasonable thing to do for production code, but when you're
building and debugging a server, it's a real pain to not be able to restart
it quickly whenever you want (or need) to.

Thanks for a great answer, Roy!
 
Lawrence D'Oliveiro

That may be a reasonable thing to do for production code, but when you're
building and debugging a server, it's a real pain to not be able to
restart it quickly whenever you want (or need) to.

On the contrary, I run exactly the same logic--and that includes socket-
handling logic--in both test and production servers. How else can I be sure
it'll work properly in production?
 
Roy Smith

Lawrence D'Oliveiro said:
On the contrary, I run exactly the same logic--and that includes socket-
handling logic--in both test and production servers. How else can I be sure
it'll work properly in production?

If running without SO_REUSEADDR works for you, that's great. I was just
pointing out how it can be useful in cases such as the OP's, where he's
getting bind errors when he restarts his server.
 
Lawrence D'Oliveiro

I was just pointing out how it can be useful in cases such as the OP's,
where he's getting bind errors when he restarts his server.

And I was pointing out how important it was to make sure your code deals
gracefully with those errors.
 
Thomas Bellman

Roy Smith said:
That may be a reasonable thing to do for production code, but when you're
building and debugging a server, it's a real pain to not be able to restart
it quickly whenever you want (or need) to.

Speaking as a sysadmin, running applications for production,
programs not using SO_REUSEADDR should be taken out and shot.

You *can't* ensure that TCP connections are "properly closed".
For example, a *client* crashing, or otherwise becoming
unreachable, will leave TCP connections unclosed, no matter
what you do.

Not using SO_REUSEADDR means forcing a service interruption of
half an hour (IIRC) if for some reason the service must be
restarted, or having to reboot the entire machine. No thanks.
I have been in that situation.
 
Lawrence D'Oliveiro

Speaking as a sysadmin, running applications for production,
programs not using SO_REUSEADDR should be taken out and shot.

Not using SO_REUSEADDR means forcing a service interruption of
half an hour (IIRC) if for some reason the service must be
restarted, or having to reboot the entire machine.

No, you do not recall correctly. And anybody wanting to reboot a machine to
work around a "problem" like that should be taken out and shot.
 
Thomas Bellman

No, you do not recall correctly.

*Tests* It seems to be 100 seconds in Fedora 9 and 60 seconds in
Solaris 10. OK, that amount of time is not totally horrible, in
many cases just annoying. Still much longer for an interruption
of service that could have been just 1-2 seconds.
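A measurement along these lines could be sketched as follows. This is not the test actually used above, just one assumed way to do it in Python 3: stop the server, then poll `bind()` without SO_REUSEADDR and time how long the port stays unavailable. The function name and defaults are hypothetical:

```python
import errno
import socket
import time


def seconds_until_bindable(host, port, poll=1.0, timeout=300.0):
    """Poll bind() without SO_REUSEADDR until it succeeds.

    Returns the elapsed seconds, or None if `timeout` passes first.
    """
    start = time.monotonic()
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return time.monotonic() - start
        except OSError as exc:
            if exc.errno != errno.EADDRINUSE:
                raise  # some other problem; do not keep polling
        finally:
            sock.close()  # runs on success and on failure
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(poll)
```

The result is only as precise as the `poll` interval, and it depends on starting the measurement right after the server is shut down.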

However, I *have* used systems where it took much longer. It was
slightly more than ten years ago, under an earlier version of
Solaris 2, probably 2.4. It may be that it only took that long
under certain circumstances that the application we used always
triggered, but we did have to wait several tens of minutes. It
was way faster to reboot the machine than waiting for the sockets
to time out.

And anybody wanting to reboot a machine to
work around a "problem" like that should be taken out and shot.

We weren't exactly keen on rebooting the machine, but it was the
fastest way of getting out of that situation that we could figure
out. How *should* we have dealt with it in your opinion?
 
Lawrence D'Oliveiro

We weren't exactly keen on rebooting the machine, but it was the
fastest way of getting out of that situation that we could figure
out. How *should* we have dealt with it in your opinion?

Remember, the TIME_WAIT timeout is there for a reason, and trying to defeat
it could reduce the reliability of your application--that's why cutting
corners is a bad idea.

If you want to minimize the effect of the timeout, then just use different
ports, and have the clients find them via DNS SRV records.
 
