IO::Socket::INET on OSX or TCP stack problem

S

Stuart Gall

Hello,
I have written a script using IO::Socket::INET to read and write data to
modbus devices. Using blocking IO ->read and ->write.
This is running on OSX V 10.4.11
I have the latest IO::Socket Library from CPAN.

Using perl 5.8.6 from OSX and I built perl 5.10 which has the same issue
The script will work for a few loops, maybe doing 100 or so modbus
read/writes
Then the sockets are closed for no apparent reason.

I copied the same script to another OSX machine running 10.4.11 and I
have the exact same problem.

I copied the script to a machine running Mandrivia 2009 and it works
just fine!

I did a tcpdump, what seams to be happening is that when an error occurs
in a TCP packet, the OSX end gets stuck in a loop requesting and re
requesting the packet.
Until eventually the other end sends RST and force-ably closes the socket.

Any Ideas what might be going on ?


TCPDUMP (192.168.251.3 is the OSX machine)

10:17:19.508241 IP 192.168.251.3.52312 > 192.168.251.102.502: P
160:172(12) ack 90 win 65535 <nop,nop,timestamp 1360281049 65303>
10:17:19.514522 IP 192.168.251.102.502 > 192.168.251.3.52312: P
90:101(11) ack 172 win 5669 <timestamp 65461 1360281049,nop,nop>
10:17:19.514544 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281049 65461>
That was a request and reply acknowledged - Normal
10:17:19.747871 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281049 65461>
Request
10:17:19.752164 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 163 1360281049,nop,nop>
Reply
10:17:19.752183 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281050 65461>
Rejected
10:17:20.501290 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 912 1360281050,nop,nop>
Resent
10:17:20.501355 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281051 65461>
Rejected
10:17:20.750835 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281051 65461>
Resend request (even though it was acknowledged) Is that legal ??
10:17:20.762936 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 1174 1360281051,nop,nop>
Accept
10:17:21.929985 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 2341 1360281051,nop,nop>
Reply Again
10:17:21.930046 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281054 65461>
Rejected
10:17:21.931308 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 2342 1360281054,nop,nop>
Resent
10:17:21.931324 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281054 65461>
Rejected
10:17:21.932624 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 2343 1360281054,nop,nop>
?

And so on
10:17:22.751058 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281055 65461>
10:17:22.759845 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 3171 1360281055,nop,nop>
10:17:24.778286 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 5189 1360281055,nop,nop>
10:17:24.778318 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281060 65461>
10:17:24.779660 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 5190 1360281060,nop,nop>
10:17:26.751479 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281063 65461>
10:17:26.758087 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 7169 1360281063,nop,nop>
10:17:30.470519 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 10881 1360281063,nop,nop>
10:17:30.470583 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281071 65461>
10:17:30.471828 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 10882 1360281071,nop,nop>
10:17:34.752192 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281079 65461>
10:17:34.754071 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 15164 1360281079,nop,nop>
10:17:37.576828 IP 192.168.251.102.502 > 192.168.251.3.52312: P
101:113(12) ack 187 win 5654 <timestamp 17987 1360281079,nop,nop>
10:17:37.576875 IP 192.168.251.3.52312 > 192.168.251.102.502: . ack 101
win 65535 <nop,nop,timestamp 1360281085 65461>
10:17:37.578139 IP 192.168.251.102.502 > 192.168.251.3.52312: . ack 187
win 5654 <timestamp 17988 1360281085,nop,nop>
10:17:50.754009 IP 192.168.251.3.52312 > 192.168.251.102.502: P
172:187(15) ack 101 win 65535 <nop,nop,timestamp 1360281111 65461>
10:17:50.761068 IP 192.168.251.102.502 > 192.168.251.3.52312: R
249188453:249188453(0) win 0
Now I think 192.168.251.102 gets fed up and boots us off

TIA
Stuart.
 
D

derykus

Hello,
I have written a script using IO::Socket::INET to read and write data to
modbus devices. Using blocking IO ->read and ->write.
This is running on OSX V 10.4.11
I have the latest IO::Socket Library from CPAN.
....

No ideas about the problem but I'd recommend showing
some of your code. Seeing actual code often jogs
memories in ways other problem descriptions can't.
What options did you pass to the IO::Socket::INET constructor,
Reuseaddr, Timeout, Blocking, etc.?
 
S

Stuart Gall

No ideas about the problem but I'd recommend showing
some of your code. Seeing actual code often jogs
memories in ways other problem descriptions can't.
What options did you pass to the IO::Socket::INET constructor,
Reuseaddr, Timeout, Blocking, etc.?

Good point, my code is so basic that I am convinced there must be a
generic problem on OSX. I just cant believe no one else has come across
it. :-(

I tried Reuseaddr which did not make any diferance, I am only openeing
one socket per destination anyways.

I open the sockets with
sub OpenSocket() {
#Get Socket address for IP or open a new socket
my ($IP,$PORT,$S,$R,$I,@R);
$IP=shift;
$PORT=502;

if(!$SOCKETS{$IP}) {
$S = new IO::Socket::INET (PeerAddr => $IP,
PeerPort => $PORT,
Proto => "tcp",
Timeout =>10,
Type => SOCK_STREAM) or return(undef);
$SOCKETS{$IP}=$S;
$SELECT->add($SOCKETS{$IP});
$SOCKETS{$IP}->sockopt(TCP_NODELAY,1);
};

return ($SOCKETS{$IP});
};


This is then called from for example

sub Read_10_Integer($$) {
my (@R, $IP, $ADD, $PORT,$COMMAND, $q, $socket, $len, @Reply);
$IP = shift;
$ADD = shift;
$PORT=502;
$ADD--;

$COMMAND =
"\x00\x00\x00\x00\x00\x06\x01\x03".chr(int($ADD/256)).chr($ADD %
256)."\x00\x0A";

unless($socket= OpenSocket($IP)) {
print "Can't connect ($IP) : $@ [$!]\n";
return undef;
};

print $socket $COMMAND;
$socket->read($r,6); #5th byte is the length byte *****
This is one place that it hangs with repeated retries on the tcpdump
@R=split(//,$r);
return undef if(ord($R[0])!=0 or ord($R[1])!=0);
$len=ord($R[5]);
$socket->read($r,$len);
@R=split(//,$r);
Exception(ord($R[2])) if(ord($R[1]) > 0x80);

@Reply=();
for($q=0;$q<10;$q++) {
$Reply[$q]=ord($R[$q*2+3])*256+ord($R[$q*2+4]);
};


return @Reply;
};
 
U

Uri Guttman

SG> Good point, my code is so basic that I am convinced there must be a
SG> generic problem on OSX. I just cant believe no one else has come
SG> across it. :-(

your code could still use major improvements.


SG> I open the sockets with
SG> sub OpenSocket() {
SG> #Get Socket address for IP or open a new socket
SG> my ($IP,$PORT,$S,$R,$I,@R);
SG> $IP=shift;

my( $ip ) = @_ ;

SG> $PORT=502;

my $port = 502 ;

use lower case names for lexical vars. upper case is for constants (yes,
the port is a constant but still i like lower as do most perl coders)

SG> if(!$SOCKETS{$IP}) {

unless( $sockets{$ip} ) {

where is %sockets declared? it should be passed in via a hash ref
instead of being a global

SG> $S = new IO::Socket::INET (PeerAddr => $IP,
SG> PeerPort => $PORT,
SG> Proto => "tcp",
SG> Timeout =>10,
SG> Type => SOCK_STREAM) or return(undef);

my $sock = IO::Socket::INET->new( "$ip:$port" ) ;

there, isn't that much simpler? basic tcp socket connections can pass a
single host:port string. the type, proto are defaulted
correctly. timeout is fine with its default.

$sock or return ;

no need to return undef. plain return does that when called in scalar
context.

SG> $SOCKETS{$IP}=$S;
SG> $SELECT->add($SOCKETS{$IP});

what's with all the shouting? perl style is lower case

SG> $SOCKETS{$IP}->sockopt(TCP_NODELAY,1);

here is the big question. why do you do that? it is not needed except in
very special cases and your code is not one of them. it may even be the
cause of the socket dropouts.

SG> };

SG> return ($SOCKETS{$IP});

why do you return the socket from the global hash? you have it in $S (or
in my version $sock).

SG> };




SG> This is then called from for example

SG> sub Read_10_Integer($$) {
SG> my (@R, $IP, $ADD, $PORT,$COMMAND, $q, $socket, $len, @Reply);

don't declare vars before you need them. you can my each of these in the
assignments below
SG> $IP = shift;
SG> $ADD = shift;

my( $ip, $add ) = @_ ;

shifting @_ for args is poor style. it is useful in some cases but
assiging @_ to a list of my vars is standard perl style.

SG> $PORT=502;

why is this set again? if it is really a constant, move it outside the
subs or pass it in as an arg.

SG> $ADD--;

what is $ADD? use a better name for it.

SG> $COMMAND =
SG> "\x00\x00\x00\x00\x00\x06\x01\x03".chr(int($ADD/256)).chr($ADD %
SG> 256)."\x00\x0A";


SG> unless($socket= OpenSocket($IP)) {

since this is a boolean test, have OpenSocket just return 1 on success,
not the socket itself.

SG> print "Can't connect ($IP) : $@ [$!]\n";
SG> return undef;
SG> };

SG> print $socket $COMMAND;

$socket->print( $COMMAND ) ;

SG> $socket->read($r,6); #5th byte is the length byte

$socket->sysread(my $read_buf, 6);

use sysread on sockets. unless you are doing blocking line by line
protos, you shouldn't use stdio or perlio for sockets.

SG> This is one place that it hangs with repeated retries on the tcpdump
SG> @R=split(//,$r);

my @read_bytes = split(//,$r);

SG> return undef if(ord($R[0])!=0 or ord($R[1])!=0);
SG> $len=ord($R[5]);
SG> $socket->read($r,$len);

maybe the retries are caused by the NO_DELAY thing. it moves data out in
smaller packets and those are getting lost often. remove that unhelpful
option and see what happens.

SG> @R=split(//,$r);
SG> Exception(ord($R[2])) if(ord($R[1]) > 0x80);

SG> @Reply=();

my @reply ;

SG> for($q=0;$q<10;$q++) {
SG> $Reply[$q]=ord($R[$q*2+3])*256+ord($R[$q*2+4]);
SG> };

i am sure that can be done more cleanly but i can't fathom it right now.

uri
 
S

Stuart Gall

On 2009-05-09 01:18:45 +0300, Uri Guttman <[email protected]> said:

Hello uri,
Thank you for such a detailed reply.
I have placed some comments in line and included my new code at the
end, with most of your suggestions.
With your suggestions (I think mainly using sysread and syswrite) the
situation is much improved.
Before I would get a lockup after 4-10 loops now it goes for 17 - 20
That is a huge improvement, but I still eventualy get a bad packet
which is apparently not correctly handled by the TCP stack on OSX or by
perl on OSX I am not sure where the handoff is.
SG> Good point, my code is so basic that I am convinced there must be a
SG> generic problem on OSX. I just cant believe no one else has come
SG> across it. :-(

your code could still use major improvements.


SG> I open the sockets with
SG> sub OpenSocket() {
SG> #Get Socket address for IP or open a new socket
SG> my ($IP,$PORT,$S,$R,$I,@R);
SG> $IP=shift;

my( $ip ) = @_ ; Like it.

SG> $PORT=502;

my $port = 502 ; OK

use lower case names for lexical vars. upper case is for constants (yes,
the port is a constant but still i like lower as do most perl coders)

SG> if(!$SOCKETS{$IP}) {

unless( $sockets{$ip} ) { Yes

where is %sockets declared? it should be passed in via a hash ref
instead of being a global
I did not include
sub CloseAllSockets
sub CloseSocket
%SOCKETS is a mutual static variable for those subs in a BEGIN block.
SG> $S = new IO::Socket::INET (PeerAddr => $IP,
SG> PeerPort => $PORT,
SG> Proto => "tcp",
SG> Timeout =>10,
SG> Type => SOCK_STREAM) or return(undef);

my $sock = IO::Socket::INET->new( "$ip:$port" ) ;

there, isn't that much simpler? basic tcp socket connections can pass a
single host:port string. the type, proto are defaulted
correctly. timeout is fine with its default.
Perhaps, I prefer to specify specifically what I want, it is also
better when things are not working it is easier to tweak things.
$sock or return ;

no need to return undef. plain return does that when called in scalar
context.
OK No need but when I am returning a value and undef on error I prefer
to write it explicitly.
SG> $SOCKETS{$IP}=$S;
SG> $SELECT->add($SOCKETS{$IP});

what's with all the shouting? perl style is lower case
Well it wasn't working I thought if I shouted the computer might listen
to me a bit more :))
SG> $SOCKETS{$IP}->sockopt(TCP_NODELAY,1);

here is the big question. why do you do that? it is not needed except in
very special cases and your code is not one of them. it may even be the
cause of the socket dropouts.
Well, I am communicating with real time devices using modbus, which has
short messages and I need an immediate reply. I do not want the short
messages buffered.
This line does not make any diferance to the issue I have. If it is
commented out I still have the same problem. I originaly put the line
in to try and fix the issue.

Actually I am not realy sure that this is the correct syntax.
with perl -w I get
Argument "TCP_NODELAY" isn't numeric in setsockopt at
/System/Library/Perl/5.8.6/darwin-thread-multi-2level/IO/Socket.pm line
247

Also all the packets (with data) sent from the mac have the PUSH flag
set even without TCP_NODELAY
I thought that TCP_NODELAY disables local buffering but should also set
PUSH so that the receiver does not buffer. Perhaps this is a feature of
the auto flush in IO::Socket, it sets PUSH if the packet is small and
being flushed.
SG> };

SG> return ($SOCKETS{$IP});

why do you return the socket from the global hash? you have it in $S (or
in my version $sock).
Answered above $SOCKETS is local static.
SG> };




SG> This is then called from for example

SG> sub Read_10_Integer($$) {
SG> my (@R, $IP, $ADD, $PORT,$COMMAND, $q, $socket, $len, @Reply);

don't declare vars before you need them. you can my each of these in the
assignments below OK - better
SG> $IP = shift;
SG> $ADD = shift;

my( $ip, $add ) = @_ ;

shifting @_ for args is poor style. it is useful in some cases but
assiging @_ to a list of my vars is standard perl style.
This is MUCH nicer, I have never seen it though! It is obvious now.
SG> $PORT=502;

why is this set again? if it is really a constant, move it outside the
subs or pass it in as an arg.
It is left over from a time when I was opening the socket inside the
Read / write routine and immediate closing it.
I mentioned in my OP that I rewrote the code to keep the sockets open,
which then caused all my problems.
SG> $ADD--;

what is $ADD? use a better name for it.
OK $REGISTER
SG> $COMMAND =
SG> "\x00\x00\x00\x00\x00\x06\x01\x03".chr(int($ADD/256)).chr($ADD %
SG> 256)."\x00\x0A";


SG> unless($socket= OpenSocket($IP)) {

since this is a boolean test, have OpenSocket just return 1 on success,
not the socket itself.
I need the socket to read/write %SOCKET is static local
SG> print "Can't connect ($IP) : $@ [$!]\n";
SG> return undef;
SG> };

SG> print $socket $COMMAND;

$socket->print( $COMMAND ) ;

SG> $socket->read($r,6); #5th byte is the length byte

$socket->sysread(my $read_buf, 6);

use sysread on sockets. unless you are doing blocking line by line
protos, you shouldn't use stdio or perlio for sockets.
I did not know that, this seems to have helped alot.
SG> This is one place that it hangs with repeated retries on the tcpdump
SG> @R=split(//,$r);

my @read_bytes = split(//,$r);

SG> return undef if(ord($R[0])!=0 or ord($R[1])!=0);
SG> $len=ord($R[5]);
SG> $socket->read($r,$len);

maybe the retries are caused by the NO_DELAY thing. it moves data out in
smaller packets and those are getting lost often. remove that unhelpful
option and see what happens.
I put the option *IN* to try and correct the problem.
SG> @R=split(//,$r);
SG> Exception(ord($R[2])) if(ord($R[1]) > 0x80);

SG> @Reply=();

my @reply ;

SG> for($q=0;$q<10;$q++) {
SG> $Reply[$q]=ord($R[$q*2+3])*256+ord($R[$q*2+4]);
SG> };

i am sure that can be done more cleanly but i can't fathom it right now.
Probably, It is very C-ish and not very Perl-ish, but for sure it is
not the cause of the socket retry problems.


New code


BEGIN {
my %Sockets;

sub OpenSocket($) {
#Get Socket address for IP or open a new socket
my ($IP)=@_;
my $PORT=502;

unless ($Sockets{$IP}) {
my $S = new IO::Socket::INET (PeerAddr => $IP,
PeerPort => $PORT,
Proto => "tcp",
Timeout =>10,
Type => SOCK_STREAM) or return(undef);
$Sockets{$IP}=$S;
$SELECT->add($Sockets{$IP});
$Sockets{$IP}->sockopt(TCP_NODELAY,1);
};

return ($Sockets{$IP});
};

sub CloseAllSockets() {
foreach my $S (keys %Sockets) {
$Sockets{$S}->close();
delete $Sockets{$S};
};
$SELECT=IO::Select->new();
};

sub CloseSocket($) {
my($IP) = @_;
$SELECT->remove($SOCKETS{$IP});
$SOCKETS{$IP}->close();
delete $SOCKETS{$IP};
};

}# BEGIN BLOCK


And one of the calling functions

sub Read_10_Integer($$) {

my($IP,$Register) = @_;
$Register--;

#MBAP Header
# Bytes Meaning Value
# 2 Transaction ident - any number 00 00
# 2 Protocol Identifier - 0= Modbus 00 00
# 2 Length Number of following bytes 00 06
# 1 Unit Identifier 01
# 1 Function code 03 (Read Holding Register)
# 2 Register $REGISTER
# 2 Count 00 0A


my $COMMAND =
"\x00\x00\x00\x00\x00\x06\x01\x03".chr(int($Register/256)).chr($Register
% 256)."\x00\x0A";

my $socket = OpenSocket($IP);
unless($socket) {
print "Can't connect ($IP) : $@ [$!]\n";
return undef;
};

$socket->syswrite($COMMAND);
$socket->sysread(my $r,6); #5th byte is the length byte
my @R=split(//,$r);
return undef if(ord($R[0])!=0 or ord($R[1])!=0);
my $len=ord($R[5]);
$socket->sysread($r,$len);
@R=split(//,$r);
if(ord($R[1]) > 0x80) {
Exception(ord($R[2]),"READ 10 INTEGER IP=$IP ADD=$Register");
return(undef);
};
my @Reply=();
for(my $q=0;$q<10;$q++) {
$Reply[$q]=ord($R[$q*2+3])*256+ord($R[$q*2+4]);
};


return @Reply;
};

Stuart
 
U

Uri Guttman

SG> $S = new IO::Socket::INET (PeerAddr => $IP,
SG> PeerPort => $PORT,
SG> Proto => "tcp",
SG> Timeout =>10,
SG> Type => SOCK_STREAM) or return(undef);
SG> Perhaps, I prefer to specify specifically what I want, it is also
SG> better when things are not working it is easier to tweak things.

you won't need to tweak the connect call. almost no one ever does. as i
said, the protocol and type NEVER change if you want a tcp connection
and the timeout isn't critical to change.
SG> OK No need but when I am returning a value and undef on error I prefer
SG> to write it explicitly.

that is bad coding. if someone were to call your code in a list context,
instead of getting an empty list (which is false) they would get a
single element list containing undef (which is true). perl's return is
context sensitive and should be used with no arg for basic or empty returns.

SG> $SOCKETS{$IP}->sockopt(TCP_NODELAY,1);
SG> Well, I am communicating with real time devices using modbus, which
SG> has short messages and I need an immediate reply. I do not want the
SG> short messages buffered.
SG> This line does not make any diferance to the issue I have. If it is
SG> commented out I still have the same problem. I originaly put the line
SG> in to try and fix the issue.

it sounds like maybe your osx stack is crapola. but that isn't a perl
problem and i am sure many mac coders have written perl socket code for it.

SG> Actually I am not realy sure that this is the correct syntax.
SG> with perl -w I get
SG> Argument "TCP_NODELAY" isn't numeric in setsockopt at
SG> /System/Library/Perl/5.8.6/darwin-thread-multi-2level/IO/Socket.pm
SG> line 247

that is another problem. you didn't import the constant so it is being
passed as just a string. but i doubt you really need this.

SG> Also all the packets (with data) sent from the mac have the PUSH flag
SG> set even without TCP_NODELAY
SG> I thought that TCP_NODELAY disables local buffering but should also
SG> set PUSH so that the receiver does not buffer. Perhaps this is a
SG> feature of the auto flush in IO::Socket, it sets PUSH if the packet is
SG> small and being flushed.

autoflush has nothing to do with tcp. it is a stdio/perlio thing and if
you use syswrite it is ignored since that bypasses stdio.
SG> Answered above $SOCKETS is local static.

that doesn't answer why you return a hash lookup for the boolean
return. return 1 is simpler, faster and more accurate.
SG> print "Can't connect ($IP) : $@ [$!]\n";
SG> return undef;
SG> };SG> I did not know that, this seems to have helped alot.

and don't mix sysread/write with other types of i/o on the sockets. that
will be living hell. always use one style or the others on any given
socket. the choice is also dependent on the data (not tcp) stream
protocol, whether you want blocking or not, how the other side is
working, etc. socket programming isn't a one sided thing.
SG> I put the option *IN* to try and correct the problem.

and it didn't fix it so it isn't the cause. it would be unlikely to be
the cause since it is rarely used. and if your code really does work
fine on linux but not on osx, that points (but not prove) that the osx
stack is guilty. have you googled for osx and this error?


SG> BEGIN {

why the begin block? declaring empty vars subs in begin blocks does
nothing.

SG> my %Sockets;

SG> sub OpenSocket($) {

why are you using prototypes? they are meant for one thing alone
(changing how a sub call is parsed). they are not useful for arg
checking or stuff.

SG> #Get Socket address for IP or open a new socket

SG> my ($IP)=@_;
SG> my $PORT=502;

my $socket = $Sockets{$IP} ;
return $socket if $socket ;

the rest of the code is mainline, indented to the left and you save a
block. return as early when you can is a good style.

SG> unless ($Sockets{$IP}) {
SG> my $S = new IO::Socket::INET (PeerAddr => $IP,
SG> PeerPort => $PORT,
SG> Proto => "tcp",
SG> Timeout =>10,
SG> Type => SOCK_STREAM) or return(undef);
SG> $Sockets{$IP}=$S;
SG> $SELECT->add($Sockets{$IP});
SG> $Sockets{$IP}->sockopt(TCP_NODELAY,1);

you don't need that. and it isn't correct as you haven't defined the
constant. this means you aren't running strict/warnings clean. fix that
before you continue. let perl help you find bugs.


SG> sub CloseAllSockets() {
SG> foreach my $S (keys %Sockets) {
SG> $Sockets{$S}->close();
SG> delete $Sockets{$S};

values is useful there

foreach my $sock ( values %Sockets) {
$sock->close() ;
}

or even:

$_->close() for values %Sockets ;

SG> sub CloseSocket($) {
SG> my($IP) = @_;

SG> $SELECT->remove($SOCKETS{$IP});
SG> $SOCKETS{$IP}->close();
SG> delete $SOCKETS{$IP};

delete returns what it is deleting.

my $sock = delete $Sockets{$IP};
$SELECT->remove($sock);
$sock->close() ;

that saves you two hash lookups and is cleaner to read.

also you are using %SOCKETS and $Sockets which are not the same
var. strict would find that. use strict or die!!

SG> };

SG> }# BEGIN BLOCK

again, why the begin block? nothing i saw there needs to be done at
compile time.

SG> $socket->sysread(my $r,6); #5th byte is the length byte
SG> my @R=split(//,$r);
SG> return undef if(ord($R[0])!=0 or ord($R[1])!=0);
SG> my $len=ord($R[5]);
SG> $socket->sysread($r,$len);
SG> @R=split(//,$r);

SG> if(ord($R[1]) > 0x80) {
SG> Exception(ord($R[2]),"READ 10 INTEGER IP=$IP ADD=$Register");
SG> return(undef);
SG> };
SG> my @Reply=();
SG> for(my $q=0;$q<10;$q++) {

foreach my $q ( 0 .. 9 ) :

SG> $Reply[$q]=ord($R[$q*2+3])*256+ord($R[$q*2+4]);
SG> };

since you seem to need byte values use unpack to get them. much cleaner
than all those ord calls. same for building the command, pack is better
then chr calls. there is a doc called perlpacktut you can read about
pack/unpack.

uri
 
D

derykus

On 2009-05-07 23:03:16 +0300, (e-mail address removed) said:

...
        print $socket $COMMAND;
        $socket->read($r,6);    #5th byte is the length byte     *****
This is one place that it hangs with repeated retries

Any possibility of a timeout as a workaround:

eval { local $SIG{ALRM} = sub { die 'socket t/o';
alarm(...);
$socket->read($r, 6) };
alarm(0);
};
if ( $@ =~ m{socket t/o} and not $socket->connected ) {
... reopen socket etc.
}
 
D

Dr.Ruud

eval { local $SIG{ALRM} = sub { die 'socket t/o';
alarm(...);
$socket->read($r, 6) };
alarm(0);
};
if ( $@ =~ m{socket t/o} and not $socket->connected ) {
... reopen socket etc.
}

That template looks wrong. My go at it:

eval {
local $SIG{ALRM} = sub { die 'socket t/o' };
alarm 8;
$socket->read($r, 6);
alarm 0;
1; # success
}
or do {
my $err = $@ || "unknown";
alarm 0;
if ( $err =~ m{socket t/o} and not $socket->connected ) {
... reopen socket etc.
}
};
 
D

derykus

That template looks wrong. My go at it:

   eval {
       local $SIG{ALRM} = sub { die 'socket t/o' };
       alarm 8;
       $socket->read($r, 6);
       alarm 0;
       1;  # success
   }
   or do {
       my $err = $@ || "unknown";
       alarm 0;
       if ( $err =~ m{socket t/o} and not $socket->connected ) {
           ...  reopen socket etc.
       }
   };

Much better.

Perhaps a lower-level recv/send pair might be a further improvement to
get the RST directly:

unless ( $socket->recv($r, 6) ) {
if ( $! == ECONNRESET ) {
... re-open socket, etc.
}
...
 
U

Uri Guttman

d> Perhaps a lower-level recv/send pair might be a further improvement to
d> get the RST directly:

recv/send are just different apis from sysread/syswrite. they don't do
anything special underneath. they were intended for tcp to support data
packet boundaries. i forget the proto name but the flags are defined and
unsupported. recv/send would work with those boundaries but since they
aren't supported, they are just like sysread/write.

uri
 
D

derykus

  d> Perhaps a lower-level recv/send pair might be a further improvement to
  d> get the RST directly:

recv/send are just different apis from sysread/syswrite. they don't do
anything special underneath. they were intended for tcp to support data
packet boundaries. i forget the proto name but the flags are defined and
unsupported. recv/send would work with those boundaries but since they
aren't supported, they are just like sysread/write.

Ah, the usual sysread/syswrite should work then if
the TCP errors all propagate back:

unless ( $socket->sysread($r,6) ) {
if ( $! == ECONNRESET ) {
... re-open socket etc.
 
S

Stuart Gall

it sounds like maybe your osx stack is crapola. but that isn't a perl
problem and i am sure many mac coders have written perl socket code for it.

That is what I am worried about. But if this problem was generic i.e.
any bad tcp packet results in the system getting hung in a retry loop
OSX would be very broken.

So it must be either a perl/osx thing or there must be something odd
about the TCP packets coming from these modbus devices which is
upsetting OSX.
SG> Actually I am not realy sure that this is the correct syntax.
SG> with perl -w I get
SG> Argument "TCP_NODELAY" isn't numeric in setsockopt at
SG> /System/Library/Perl/5.8.6/darwin-thread-multi-2level/IO/Socket.pm
SG> line 247

that is another problem. you didn't import the constant so it is being
passed as just a string. but i doubt you really need this.

It imports as Socket::TCP_NODELAY
and you are right it made no diferance.
SG> Answered above $SOCKETS is local static.

that doesn't answer why you return a hash lookup for the boolean
return. return 1 is simpler, faster and more accurate.


SG> BEGIN {

why the begin block? declaring empty vars subs in begin blocks does
nothing.

It makes them static.
Is there another way to define a static variable ?
SG> my %Sockets;

SG> sub OpenSocket($) {

why are you using prototypes? they are meant for one thing alone
(changing how a sub call is parsed). they are not useful for arg
checking or stuff.

I am using perl -w - I dont usually, but while I am trying to figure
out why this script does not work I am.
perl -w complains if you don't use prototypes. Or is there some other
way around that ?
 
S

Stuart Gall

Any possibility of a timeout as a workaround:

eval { local $SIG{ALRM} = sub { die 'socket t/o';
alarm(...);
$socket->read($r, 6) };
alarm(0);
};
if ( $@ =~ m{socket t/o} and not $socket->connected ) {
... reopen socket etc.
}

I have sig alarm set anyway to catch the whole read/write loop.

OK Here is the realy odd thing
The first time a socket goes bad the read call will eventually return 0
bytes. and put the socket at eof.

I am using IO::Select to detect when a socket is READ_READY.
At the point that I call select no socket should be read ready so if it
is it means that it is at eof.
So this enables me to detect which socket has gone bad.

If I close and reopen that socket, the next socket that I attempt to do
IO hangs with the same issue, repeated TCP retries then RST. And one by
one they all fall down.

So my work around is that when I get a socket go eof I close all the
sockets and reopen them.
This then lets the code work for a few more loops.

Some how what ever is going wrong is corrupting all the sockets.

Incidentally I upgraded to 10.5.7 and amazingly I still have the same problem.

Stuart
 
U

Uri Guttman

SG> On 2009-05-10 19:29:47 +0300, Uri Guttman <[email protected]> said:

SG> So it must be either a perl/osx thing or there must be something odd
SG> about the TCP packets coming from these modbus devices which is
SG> upsetting OSX.

hard to say from here. i highly doubt it is perl. if you want to verify
this do the same thing in c or another language. or google for this
situation and see if others have seen it.

SG> It makes them static.
SG> Is there another way to define a static variable ?

just declaring vars outside a sub makes them static. the issue is more
about scoping than compile vs run time which is what BEGIN controls.

SG> sub OpenSocket($) {
SG> I am using perl -w - I dont usually, but while I am trying to figure
SG> out why this script does not work I am.
SG> perl -w complains if you don't use prototypes. Or is there some other
SG> way around that ?

huh?? perl doesn't complain if you don't use prototypes. it will complain if
you use wrong prototypes. don't put any () after the sub names. read
perldoc perlsub for more.

uri
 
C

CDeRykus

On 2009-05-11 02:12:46 +0300, (e-mail address removed) said:
...
I have sig alarm set anyway to catch the whole read/write loop.

Just a wild guess but you may want to try checking
sysread for failure since that's the first hint of
trouble. Also, that may be the best point to recover
since the alarm may be too late with errors cascading and getting even
more bizarre.


use Errno; # check Errno for portability issues
...
unless ( $socket->sysread($r,6) ) {
if ( $! == ECONNRESET ) {
... re-open socket etc


Otherwise, this is probably not a Perl issue and
you'll need to google.
 
S

Stuart Gall

EURIKA!!!!

The problem is the tcp timestamps.
OSX is adding tcp timestamps, mandrivia does not.

These little modbus devices only have a 16 bit time stamp counter. Once
this counter hits FFFF and goes back to 0 OSX thinks it has got an old
packet from a looped sequence number.

So either I have to turn off time stamps, or I have to disable PAWS on the OSX.

Any ideas how I do that?

TIA
Stuart
 
U

Uri Guttman

SG> EURIKA!!!!

eureka

SG> The problem is the tcp timestamps.
SG> OSX is adding tcp timestamps, mandrivia does not.

SG> These little modbus devices only have a 16 bit time stamp
SG> counter. Once this counter hits FFFF and goes back to 0 OSX thinks it
SG> has got an old packet from a looped sequence number.

SG> So either I have to turn off time stamps, or I have to disable
SG> PAWS on the OSX.

SG> Any ideas how I do that?

so this is not a perl problem as we forecast. maybe there is some option
for doing this to a socket but you would need to google to find out.

on linux man -s 7 socket shows this socket option:

SO_TIMESTAMP
Enable or disable the receiving of the SO_TIMESTAMP control mes]
sage. The timestamp control message is sent with level
SOL_SOCKET and the cmsg_data field is a struct timeval indicat]
ing the reception time of the last packet passed to the user in
this call. See cmsg(3) for details on control messages.

so it could be done if you have that constant in perl from socket.pm or
similar and you use the setsockopt call correctly. they can be tricky to
get done right in c but likely easier in perl.

uri
 
S

Stuart Gall

EURIKA!!!!

The problem is the tcp timestamps.
OSX is adding tcp timestamps, mandrivia does not.

These little modbus devices only have a 16 bit time stamp counter. Once
this counter hits FFFF and goes back to 0 OSX thinks it has got an old
packet from a looped sequence number.

So either I have to turn off time stamps, or I have to disable PAWS on the OSX.

Any ideas how I do that?

TIA
Stuart

OK Its fixed
sudo sysctl -w net.inet.tcp.rfc1323=0

or add the line

net.inet.tcp.rfc1323=0
to /etc/sysctl.conf'

This disables tcp window scaling also but that is not a issue for me.

I have also contacted the manufacture of the modbus devices to suggest
they inplement tcp timestamps properly or not at all.

Many thanks to all who posted, your coments were a big help.


Stuart.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top