Questions about (non-blocking) sockets

Discussion in 'Perl Misc' started by Holger, Jul 24, 2006.

  1. Holger

    Holger Guest

    Hi,

    After having read quite a few postings on the topic, it's still not
    entirely clear to me.

    I have working code for a client that runs forever and opens just one
    connection (with blocking reads) to a server. Generally, it's kind of
    a ping pong protocol, but the server also sends data at arbitrary
    times. I suppose that is why I haven't yet run into a deadlock, i.e.
    a situation where I don't send data often enough to trigger a
    response in return. Here's the code frame:

    #!/usr/bin/perl -w
    use strict;
    use IO::Socket;
    use IO::Select;

    # init snipped ...
    $SIG{__DIE__} = \&SigDieHandler;

    sub SigDieHandler
    {
        $socket->close if ( defined $socket );
        $SIG{__DIE__} = 'DEFAULT';
        exit( 0 );
    }

    my $socket = IO::Socket::INET->new( PeerAddr => $RemoteHost,
                                        PeerPort => $RemotePort,
                                        Proto    => 'tcp',
                                        Type     => SOCK_STREAM )
        or die "Could not connect to $RemoteHost:$RemotePort: $@\n";
    my $data;
    while ( defined $socket->recv( $data, 1028 ) ) {
        # while processing $data ... {
            print $socket $otherdatatosend;
            # error handling snipped ...
        # }
    }
    __END__

    I've read that recv or sysread should not be used together with print
    on the same socket. Does this apply to the entire socket or only per
    direction? After all, the above works.

    I'd like to convert the code into non-blocking call(s), but am a
    little confused about the exact details, since I've only found
    examples for the server side with listening sockets. Perhaps, I don't
    see the wood for all the trees. I merely have a single connection, not
    an entire pool.

    So far I've come up with the following (btw, needs to work on win32,
    too). But before changing the program I wanted to ask whether I'm on
    the right track.

    my $socket = IO::Socket::INET->new( PeerAddr => $RemoteHost,
                                        PeerPort => $RemotePort )
        or die "Could not connect to $RemoteHost:$RemotePort: $@\n";
    my $select = IO::Select->new( $socket );
    while ( 1 ) {
        if ( $select->can_read( $timeout ) ) {
            unless ( defined $socket->recv( $data, 1028 ) ) {
                last;    # leave the loop on a receive error
            }
            else {
                # process received $data
            }
        }
        # do stuff, send data
    }
    $select->remove( $socket );
    close $socket;
    __END__

    I also have a few questions:
    What exact effects do values undef, 0, and e.g. 5 for $timeout have?

    $select->can_read returns in this case either a list with one entry or
    an empty list (on timeout). Correct?

    Should I use sysread instead of recv?

    Is the $select->remove necessary for cleanup?

    Any thoughts on having the cleanup code in the die handler or not?

    Best regards,

    Holger
     
    Holger, Jul 24, 2006
    #1

  2. Holger

    xhoster Guest

    What does "ping pong protocol" mean? One query yields exactly one
    response? With what is shown, it is hard to say whether you have just
    been lucky so far or whether your code is really not susceptible to
    deadlocks. I'd have to see the code on the other end too, and know how
    you are aligning messages.

    This seems a little unusual. Usually the client initiates a connection,
    and then initiates the query on that connection. Here, this process
    initiates the connection, but then waits for other side to initiate
    the query. I don't know if that makes this process the client or the
    server.

    Also, does the other end always send records of size exactly 1028?

    I don't think that mixing recv/sysread and print on different legs of
    the same socket will cause problems.

    Why do you want to do it non-blocking? I can answer some of your
    technical questions without knowing the reason, but knowing why would
    help us help you see the forest for the trees.

    In your second version, you might want the "send data" to be inside
    the else. Do you really need to send data to the other side every
    $timeout seconds, even if the other side hasn't yet responded to your
    previous queries?

    On $timeout: undef means wait forever for the handle to become
    readable. 0 means wait for zero seconds (i.e. poll and return
    immediately). 5 means wait for at most 5 seconds.

    The tradition I've seen is to use recv only for UDP. So I would say
    yes, use sysread. I don't know if there is a basis beyond tradition
    for this.

    The $select->remove is not necessary as is, because the program
    terminates immediately afterwards anyway. If the program went on to
    use $select later, then you should remove the dead handle.

    As for having the cleanup code in the die handler, I don't see any
    point in it.
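
    The timeout cases and the can_read return value can be tried out with
    a small self-contained sketch. Assumptions for illustration only: the
    local listener on 127.0.0.1 stands in for the real server, and the
    "login: " text and the 1024-byte read size are made up. An undef
    timeout is not shown because it would simply block until data
    arrives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Local stand-in for the real server so the sketch runs on its own.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,              # let the OS pick a free port
    Listen    => 1,
) or die "listen failed: $@\n";

my $socket = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
) or die "connect failed: $@\n";
my $server_side = $listener->accept or die "accept failed: $!\n";

my $select = IO::Select->new($socket);

# Timeout 0: poll. Nothing has been sent yet, so the list is empty.
my @idle = $select->can_read(0);
print "poll with nothing pending: ", scalar(@idle), " handle(s)\n";

# Timeout 5: wait at most 5 seconds. Once the peer has written,
# can_read returns a list holding our single handle.
syswrite($server_side, "login: ");
my @ready = $select->can_read(5);
print "after peer wrote: ", scalar(@ready), " handle(s)\n";

my $n = sysread($socket, my $data, 1024);
print "read $n bytes: '$data'\n";

# When the peer closes, the handle becomes readable and sysread
# returns 0, which is how end-of-file shows up on a stream socket.
close $server_side;
my @at_eof = $select->can_read(5);
my $eof = sysread($socket, my $rest, 1024);
print "after peer closed: sysread returned $eof\n";
```

    Note that a readable handle does not guarantee data: a return value
    of 0 from sysread means the other end has closed the connection, so
    the read loop should check for it and stop.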

    Xho
     
    xhoster, Jul 25, 2006
    #2

  3. Holger

    Uri Guttman Guest

    x> This seems a little unusual. Usually the client initiates a connection,
    x> and then initiates the query on that connection. Here, this process
    x> initiates the connection, but then waits for other side to initiate
    x> the query. I don't know if that makes this process the client or the
    x> server.

    i try to teach this point to all when discussing sockets. only the
    connection is required to be oriented from client to running (listening)
    server. tcp is bidirectional after that. you can then run client/server
    in either direction or peer to peer or whatever you want. there is no
    association between the connection direction and the data's. now, most
    client connections are also client requestors (http being the classic
    now). but i do peer to peer with stem and only worry about the
    connection direction as needed.

    you can do whatever you want as long as you understand the issues
    involved. the problem is they are subtle and tricky so obeying those
    rules is good in general. and you can use buffered i/o in one direction
    unbuffered in the other since they are independent. you just have to
    make sure each side (for a given direction) knows how the other side
    will transmit/receive data. this is why i always use sysread/write for
    all my socket code as i want the control and not to have to worry about
    some strange buffering done by print and stdio.

    uri
     
    Uri Guttman, Jul 25, 2006
    #3
  4. Holger

    Ben Morrow Guest

    The only difference between sysread and recv is that recv allows you to
    determine where the packet was from. Since a TCP socket (indeed, a
    SOCK_STREAM socket in general) only has one 'other end', there's no need
    to use recv. OTOH, there's no real reason not to beyond preserving the
    analogy to pipes.

    Ben
     
    Ben Morrow, Jul 25, 2006
    #4
  5. Holger

    Holger Guest

    Hi Xho,

    Sorry for not responding sooner. Thank you, and also Uri and Ben, for
    your replies.

    The protocol is loosely based on Telnet, but without escape sequences.
    The server sends data more or less continuously and responds to
    commands the client sends. Most of the communication is line based,
    but unfortunately with some exceptions.

    I couldn't really post more code than the example because the posting
    got longer than I intended anyway. But I can gladly supply more
    information. And the other end isn't under my control.

    This server starts by sending a banner and a login prompt, which I
    need to wait for. Since my program (theoretically) runs forever, it
    could be viewed as kind of a server, too. Personally, I call something
    a server if it's either running longer than its communication partners
    or listen()ing for (possibly more than one at a time) new connections.

    No, the server doesn't always send records of exactly 1028 bytes. As I
    understood the parameter, it is the buffer length and therefore the
    maximum amount of data to be received by that call. I also omitted
    code in the processing part that adds the chunks together and scans
    for and then processes the lines, messages or records.

    Ok, thanks (regarding mixing recv/sysread and print).

    As for why non-blocking: in the current code I have a command queue
    from which I send one command per incoming and processed message
    (inner loop not shown in the example). There isn't a strict one to one
    relation, however. For some data it's not necessary to send anything,
    for other data I want to generate multiple commands. Since the server
    sends data on its own, which I can use as a heartbeat (despite being
    very irregular), the danger of getting into a situation where I don't
    get enough chances to send something shouldn't be high. (On second
    thought, this isn't really a deadlock, so I used the term wrongly.)

    As mentioned above, I'd like to be able to send commands independently
    of the server sending data. Perhaps I will insert code to send waiting
    data in both places.

    So undef is again blocking, and 0 results in polling, which I guess
    should be avoided.
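
    The chunk handling mentioned above (adding the chunks together and
    scanning for lines) might look roughly like the following. It is only
    a sketch of the line-based part of the protocol; the helper name
    extract_lines and the sample chunks are made up for illustration, and
    the non-line-based exceptions are not covered.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append a chunk to the buffer and return every complete line found so
# far, without its line ending. An incomplete trailing line stays in the
# buffer until a later chunk completes it.
sub extract_lines {
    my ($buf_ref, $chunk) = @_;
    $$buf_ref .= $chunk;
    my @lines;
    # Accept both "\n" and Telnet-style "\r\n" line endings.
    while ($$buf_ref =~ s/\A(.*?)\r?\n//s) {
        push @lines, $1;
    }
    return @lines;
}

my $buffer = '';
my @got;

# Chunks as a stream socket might deliver them: boundaries fall anywhere,
# not at the ends of lines.
for my $chunk ("login", ": ok\r\nmsg one\nmsg", " two\n") {
    push @got, extract_lines(\$buffer, $chunk);
}

print "line: $_\n" for @got;
print "left in buffer: '$buffer'\n";
```

    Keeping the buffer outside the helper means the same routine works
    whether the chunks come from recv, sysread, or a test list like the
    one above.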

    Thanks also for the other answers.

    Regards,

    Holger
     
    Holger, Jul 26, 2006
    #5
  6. Holger

    xhoster Guest

    There are two risks for deadlock here if you print to the socket in a
    blocking (and even more so blocking and buffered) way, but they depend on
    the unknown behavior of the other end (which I'll call the server). Since
    the server seems to have two streams of output, one in response to the
    client and one autonomous stream, how does it interweave them and are they
    blocking or not? First, if the server prints the autonomous stream in a
    blocking way, then it is possible that a big slug of autonomous data
    arrives after your code passes the recv and/or timeout but before your
    code finishes "do stuff; send data". Then you might block waiting for the
    server to read the data you are trying to send, and thus you will not be
    doing any "recv"ing, while the server is blocked waiting for you to recv
    the autonomous data, refusing to read more commands from you until you
    read from it. If your commands are small and you only send a command after
    receiving a reply to the previous command and the volume of autonomous
    data is low and evenly spread, this risk is low. But if your commands are
    large, or you send many of them asynchronously (which you probably are if
    you send after a timeout, in addition to sending after a successful read),
    or the autonomous data comes in large chunks, the risk is increased.

    Consider what happens if the server does send autonomous data unblockingly.
    What would it do if you just could never keep up with it? It would have to
    buffer that unsent data internally, and the internal buffer could grow
    without bound. Is that a risk the implementer of the server would likely
    take? The implementer probably took some steps to avoid this, but whether
    that would be to use blocking, or to drop overflowing data, or just abort,
    I don't know.


    Second, even if the server sends autonomous data without blocking, does
    it also send responsive data without blocking? It is quite plausible that
    the server will only read as much from your socket as it needs to get one
    complete command, and will then not read any more until it has processed
    that command and *you have read all the results of that command*. So
    again, you will block sending another command while it blocks waiting for
    you to read the output of the previous command, and you have a deadlock.

    So in short, if you don't know how the server handles its data, you might
    want to consider making the "send data" part in a nonblocking way using
    select and syswrite rather than print. Of course, it depends on your
    paranoia/laziness trade off.
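
    A minimal sketch of sending with select and syswrite instead of print
    follows. Assumptions for illustration only: the local listener stands
    in for the real server, and the "command" and "banner" strings are
    made up. Whether blocking(0) behaves the same way on win32 has not
    been verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Local stand-in peer so the sketch runs on its own.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 1,
) or die "listen failed: $@\n";
my $socket = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
) or die "connect failed: $@\n";
my $peer = $listener->accept or die "accept failed: $!\n";

$socket->blocking(0);            # syswrite returns instead of stalling

my $outbuf  = "command\n" x 3;   # hypothetical queued commands
my $inbuf   = '';
my $readers = IO::Select->new($socket);
my $writers = IO::Select->new($socket);

syswrite($peer, "banner\n");     # the peer volunteers some data

for my $round (1 .. 10) {
    last if $outbuf eq '' && $inbuf ne '';

    # Only ask about writability while something is waiting to be sent;
    # an idle socket is almost always writable and would spin the loop.
    my $wset = length($outbuf) ? $writers : undef;
    my ($r, $w) = IO::Select->select($readers, $wset, undef, 1);
    next unless $r;              # timeout: select returned an empty list

    if (@$r) {
        # Append whatever has arrived to the input buffer.
        my $n = sysread($socket, $inbuf, 4096, length $inbuf);
        last if defined $n && $n == 0;   # peer closed the connection
    }
    if (@$w) {
        my $n = syswrite($socket, $outbuf);
        # A partial write is normal; keep the unsent tail for next time.
        substr($outbuf, 0, $n, '') if defined $n;
    }
}

print "unsent bytes: ", length($outbuf), "\n";
print "received: $inbuf";
```

    Because reading and writing are driven by the same select call, a
    full send buffer never stops the loop from draining incoming data,
    which is what breaks the deadlock scenarios described above.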

    Xho
     
    xhoster, Jul 27, 2006
    #6
  7. Holger

    Holger Guest

    I don't know details about the interweaving; I can only guess from the
    data. There seem to be certain fixed data sequences, but in general I
    suppose this depends on the events the server receives, e.g. from
    other users. My guess is that it handles everything asynchronously,
    because answers don't relate to a particular command. It may even be
    the case that due to server load it doesn't answer. One thing that is
    assured, of course, is that each message is fully sent before the next
    one starts.

    I think the server doesn't block on read and would keep sending
    anyway. Hmm, how would I actually test this? Syswrite something
    without a record separator and then try to read?

    [scenarios snipped]
    Thanks, points taken. I'll have a closer look at that.

    Regards,

    Holger
     
    Holger, Jul 28, 2006
    #7
  8. Holger

    xhoster Guest

    I don't think that that is what I was trying to describe. The server is
    not blocking on read, it is blocking on write (waiting for the client to
    read). In other words, it is trying to keep sending, but the client won't
    read what it is writing and the buffer is full, so its attempt at sending
    blocks.

    Don't try to read at all, just keep writing. syswrite or print, doesn't
    matter but print is easier, and you may as well use a record separator
    (otherwise you would be testing something different). If the server uses
    blocking writes to you, then eventually the server will block on its write
    to you, and hence stop reading what you write to it, and then you will
    block on writing to it, resulting in a deadlock.

    while (1) {
        print $SOCKET $one_datachunk;
        warn "done with one more chunk ", $count++;
    }

    If the program freezes for a long time (stops printing the warnings) then
    it probably deadlocked.

    Of course, if you do this and the server doesn't use blocking writes but
    rather stores up all the stuff it wants to write but can't, then its memory
    usage will grow without limit. So if it doesn't block (keeps emitting the
    warnings), then either don't let this run for too long, or monitor the
    server process's size.
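
    A variant of the same test that reports a stall instead of freezing
    could use a non-blocking socket and can_write with a timeout. This is
    only a sketch under assumptions: the local peer that never reads
    stands in for a server that has stopped reading, and the 1 KB record,
    the 2-second timeout and the byte cap are arbitrary values chosen for
    illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Stand-in peer that accepts the connection but never reads. To test a
# real server, connect to it instead of this listener.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0, Listen => 1,
) or die "listen failed: $@\n";
my $socket = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
) or die "connect failed: $@\n";
my $peer = $listener->accept or die "accept failed: $!\n";

$socket->blocking(0);
my $select = IO::Select->new($socket);
my $chunk  = ('x' x 1023) . "\n";    # one hypothetical 1 KB record
my $limit  = 256 * 1024 * 1024;      # safety cap so the test always ends
my ($sent, $stalled) = (0, 0);

while ($sent < $limit) {
    # If the socket stays unwritable for 2 seconds, treat it as a stall.
    unless ($select->can_write(2)) {
        $stalled = 1;
        last;
    }
    # Partial writes are counted but not resumed, which is acceptable
    # for a load test and wrong for real protocol traffic.
    my $n = syswrite($socket, $chunk);
    if (defined $n) {
        $sent += $n;
    }
    elsif (!$!{EAGAIN} && !$!{EWOULDBLOCK}) {
        die "write failed: $!\n";
    }
}

print $stalled
    ? "stalled after $sent bytes: the peer has stopped reading\n"
    : "sent $sent bytes without stalling\n";
```

    A stall here corresponds to the point where a blocking print would
    have frozen; no stall within the cap suggests the other end keeps
    reading (or buffering) whatever it is sent.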

    Xho
     
    xhoster, Jul 28, 2006
    #8
  9. Holger

    Holger Guest

    Ah, ok. Misunderstood before.
    Thanks. I'll try this.

    Regards,

    Holger
     
    Holger, Jul 28, 2006
    #9