Simulating smaller MTU? ie sending small packets.

Discussion in 'Perl Misc' started by Ed W, Aug 16, 2005.

  1. Ed W

    Ed W Guest

    Hi, for various reasons I'm writing a little stress test app which tries
    to simulate the effects of varying sized TCP packets on the overall
    transfer speed.

    So I have written a little app which acts as a server, waits for a
    connection and then spews data in fixed sized chunks of your choice. I
    also turn off Nagle, turn on autoflush, and as far as I can tell ask for
    the data to go out immediately.
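
    A minimal sketch of the kind of sender described above, assuming a
    loopback connection for the demo. The helper name send_chunks and the
    chunk sizes are illustrative, not taken from the original program. Note
    that TCP is a byte stream: one syswrite() per chunk hands the kernel
    1000 bytes at a time, but nothing here obliges the kernel to put each
    write into its own segment.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP TCP_NODELAY);

# Write $count chunks of $size bytes each, one chunk at a time.
# syswrite() bypasses Perl's buffered I/O layer, so each call hands its
# data straight to the kernel. Returns the total number of bytes written.
sub send_chunks {
    my ($sock, $size, $count) = @_;
    my $chunk = 'x' x $size;
    my $total = 0;
    for my $i (1 .. $count) {
        my $off = 0;
        while ($off < $size) {    # syswrite may write less than asked
            my $n = syswrite($sock, $chunk, $size - $off, $off);
            die "syswrite failed: $!" unless defined $n;
            $off += $n;
        }
        $total += $off;
    }
    return $total;
}

# Loopback demo: LocalPort 0 lets the kernel pick a free port.
my $listener = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen failed: $!";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "connect failed: $!";
my $conn = $listener->accept or die "accept failed: $!";

# Disable Nagle on the sending side and check that the call succeeded.
setsockopt($conn, IPPROTO_TCP, TCP_NODELAY, 1)
    or die "setsockopt(TCP_NODELAY) failed: $!";

my $sent = send_chunks($conn, 1000, 5);
print "sent $sent bytes in 5 chunks of 1000 bytes\n";
```

    Comparing the write sizes with a packet trace is then the way to see
    how the kernel actually segmented the stream.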

    What I observe (using an ethernet dump) is that once the receiver is not
    keeping up with the speed the sender is spewing packets, the *sender*
    (which in this case is Linux 2.6.12) starts to coalesce the packets.

    So for example if I ask it to send 1000 byte packets I can see from the
    network trace that it starts to send lots of MTU sized packets instead
    (larger).

    This is not what I was expecting at all; in fact I had no idea that
    there was some clever process in Linux to coalesce small network
    packets. Am I tripping over some Perl buffering instead? Any thoughts
    on where to look?

    Note that it's not a mis-measurement problem at the receiving side. A
    network trace shows me that the packets are coming out MTU sized
    (in general, but with a smattering of packets the size I requested).

    If I slow down the sending rate, or speed up the receiver, then the
    packets go through at the correct size...

    Grateful for any help trying to work around this.

    Ed W
     
    Ed W, Aug 16, 2005
    #1

  2. Ed W

    Ed W Guest

    > The only work-around I can see is make sure that the readers operate
    > faster than the writers.


    I have nearly found a workaround. If I use the following:

    setsockopt($sock, &Socket::IPPROTO_TCP, &Socket::TCP_MAXSEG, 500);

    Then I can change the size of the MSS for the connection (which is the
    effect I'm basically after).

    The problem is that this only seems to work if I use it on the
    listening socket before I accept any connections. It doesn't seem to
    work if I call it on the accepted connection.

    Looking at the C docs, however, suggests that this *ought* to work at any
    time... Likewise, attempts to turn TCP_CORK on and off (for fun) aren't
    working on the open connection, and this is definitely supposed to be
    possible.
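
    A minimal sketch of the listening-socket workaround, assuming Linux and
    a loopback demo. It checks the return value of setsockopt() (so a
    rejected option is not mistaken for one that had no effect) and reads
    the MSS back with getsockopt(). The helper name current_mss and the
    fallback constant value are assumptions for illustration. Whether the
    same call takes effect on an already accepted socket is the open
    question in this thread; the sketch does not settle it.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP);

# TCP_MAXSEG is 2 on Linux; fall back to that if Socket.pm lacks it.
my $TCP_MAXSEG = eval { Socket::TCP_MAXSEG() } // 2;

# Read back the MSS the kernel currently reports for this socket.
sub current_mss {
    my ($sock) = @_;
    my $packed = getsockopt($sock, IPPROTO_TCP, $TCP_MAXSEG)
        or die "getsockopt(TCP_MAXSEG) failed: $!";
    return unpack('i', $packed);
}

my $listener = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen failed: $!";

# Set the option on the listening socket, before any accept().
setsockopt($listener, IPPROTO_TCP, $TCP_MAXSEG, 500)
    or die "setsockopt(TCP_MAXSEG) failed: $!";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "connect failed: $!";
my $conn = $listener->accept or die "accept failed: $!";

my $mss = current_mss($conn);
print "MSS reported on the accepted connection: $mss\n";
```

    The reported value can be somewhat below the requested 500 because
    the kernel subtracts space for TCP options such as timestamps.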

    Any ideas what I am missing?

    Thanks

    Ed W
     
    Ed W, Aug 17, 2005
    #2

  3. Also sprach Ed W:

    > Hi, for various reasons I'm writing a little stress test app which tries
    > to simulate the effects of varying sized TCP packets on the overall
    > transfer speed.


    This is probably a moot venture. The smaller the packets are, the lower
    the overall throughput is going to be. This is due to the fact that TCP
    packets have to be acknowledged. If no piggybacking is used (which is
    the case for a pure receiver), then an additional 40 bytes (IP +
    TCP minimum header size) need to be sent out on each acknowledgement.
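
    The overhead argument can be put into numbers with a small sketch. It
    assumes minimum 40-byte headers with no TCP options, ignores link-layer
    framing, and counts the pure ACKs coming back as part of the total
    traffic; the function name wire_efficiency and the sample sizes are
    made up for illustration.

```perl
use strict;
use warnings;

my $HEADER_BYTES = 40;    # minimum IPv4 (20) + TCP (20) headers, no options

# Fraction of all bytes on the wire that are application payload.
# $acks_per_segment is how many pure ACKs come back per data segment
# (1 = every segment acknowledged on its own, 0.5 = every second one).
sub wire_efficiency {
    my ($payload, $acks_per_segment) = @_;
    my $overhead = $HEADER_BYTES * (1 + $acks_per_segment);
    return $payload / ($payload + $overhead);
}

for my $size (100, 500, 1000, 1460) {
    printf "%4d-byte payload: %.1f%% of the traffic is data\n",
        $size, 100 * wire_efficiency($size, 1);
}
```

    This only models header overhead on a clean link; it says nothing
    about retransmissions on a lossy one.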

    The number of ACKs sent depends on the window size. Normally a receiver
    tries to minimize ACKs and window size update messages (Clark).

    > So I have written a little app which acts as a server, waits for a
    > connection and then spews data in fixed sized chunks of your choice. I
    > also turn off nagle, turn on autoflush, and as far as I can tell ask for
    > the data to go out immediately
    >
    > What I observe (using an ethernet dump) is that once the receiver is not
    > keeping up with the speed the sender is spewing packets, the *sender*
    > (which in this case is linux 2.6.12) is starting to coalesce the packets
    >
    > So for example if I ask it to send 1000 byte packets I can see from the
    > network trace that it starts to send lots of MTU sized packets instead
    > (larger).


    But it's probably going to send these larger packets at a lower rate.
    Did you also check the ACK packets from the receiver? The Nagle
    algorithm tells the sender never to send small packets. Turning it off
    means sending them immediately as long as the receiver's side can keep
    up. Now, if the receiver is congested, I would assume that the sender
    still buffers small packets and once an ACK packet arrives it sends out
    as much data as fits in the receive window.

    > This is not what I was expecting at all, in fact I had no idea that
    > there was some clever process in linux to coalesce small network
    > packets? Am I tripping over some perl buffering instead? Any thoughts
    > on where to look?


    No, you're tripping over a sane implementation of the TCP stack. TCP by
    nature is slow and has some overhead which is reduced by various means,
    most notably the Nagle (sender) and Clark (receiver) algorithms.
    Furthermore, in order to avoid clogging the subnet between sender and
    receiver, congestion control is carried out (see TCP slow start
    algorithm).

    > Note, that it's not a mis-measurement problem at the receiving side. A
    > Network trace is showing me that the packets are coming out at MTU sized
    > (in general, but with a smattering of packets the size I requested).
    >
    > If I slow down the sending rate, or speedup the receiver then the
    > packets go through at the correct size...


    In order to do your measurements, you should probably adjust parameters
    on the receiving side. If you want smaller packets, try to set the
    window size (TCP_WINDOW_CLAMP, I think). TCP_MAX_SEG also needs to be
    set there as the MSS is announced by the receiver during the
    three-way-handshake when the connection is established.
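
    A sketch of the receiver-side idea, assuming Linux and a loopback demo.
    The connecting side sets TCP_MAXSEG before connect() so that the limit
    goes out in its SYN. SO_RCVBUF stands in here for TCP_WINDOW_CLAMP as a
    more portable way to bound the window the receiver can advertise; that
    substitution, the helper name connect_with_limits and the sample values
    are assumptions, not something from the thread.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP SOL_SOCKET SO_RCVBUF
              pack_sockaddr_in inet_aton);

my $TCP_MAXSEG = eval { Socket::TCP_MAXSEG() } // 2;    # 2 on Linux

# Create a TCP socket, apply receiver-side limits, then connect.
# $mss caps the MSS announced in our SYN; $rcvbuf caps the receive
# buffer, which in turn bounds the window we can advertise.
sub connect_with_limits {
    my ($host, $port, $mss, $rcvbuf) = @_;
    socket(my $sock, PF_INET, SOCK_STREAM, IPPROTO_TCP)
        or die "socket failed: $!";
    setsockopt($sock, IPPROTO_TCP, $TCP_MAXSEG, $mss)
        or die "setsockopt(TCP_MAXSEG) failed: $!";
    setsockopt($sock, SOL_SOCKET, SO_RCVBUF, $rcvbuf)
        or die "setsockopt(SO_RCVBUF) failed: $!";
    connect($sock, pack_sockaddr_in($port, inet_aton($host)))
        or die "connect failed: $!";
    return $sock;
}

# Loopback demo: the "sender" is the accepting side.
my $listener = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen failed: $!";
my $receiver = connect_with_limits('127.0.0.1', $listener->sockport, 500, 8192);
my $sender   = $listener->accept or die "accept failed: $!";

my $sender_mss = unpack('i', getsockopt($sender, IPPROTO_TCP, $TCP_MAXSEG));
print "MSS the sender will use after the handshake: $sender_mss\n";
```

    Because the limit is announced in the handshake, the sending side
    needs no changes at all in this arrangement.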

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Aug 17, 2005
    #3
  4. Ed W

    Ed W Guest

    > This is probably a moot venture. The smaller the packets are, the lower
    > the overall throughput is going to be. This is due to the fact that TCP
    > packets have to be acknowledged.


    Be careful with your generalisation. The point of my experiment is to
    test an unreliable (and very slow) satellite network to determine
    whether faster speed would be achieved using a smaller MTU due to fewer
    retransmissions. 1500 bytes represents up to 7 seconds of transmission
    time...

    > In order to do your measurements, you should probably adjust parameters
    > on the receiving side. If you want smaller packets, try to set the
    > window size (TCP_WINDOW_CLAMP, I think). TCP_MAX_SEG also needs to be
    > set there as the MSS is announced by the receiver during the
    > three-way-handshake when the connection is established.


    I'm not sure I can see how window size affects things, but it's
    interesting to see that I can influence it on a per connection basis?

    I'm trying to change TCP_MAX_SEG and the docs imply it can be changed
    once the connection is established, but at least using perl this doesn't
    appear to work.

    If I change it on a listening socket then I observe that the subsequent
    tcp handshake uses the original max values, but that TCP then uses the
    smaller values for sending data (ie it does what I expect). It would
    just be useful to be able to change the MSS while the connection is
    operating

    It might for example be useful to change the MSS if we observe more
    corrupted tcp packets arriving, or other similar algorithm.


    Also, is it possible to observe how full the network buffers are?
    getsockopt(xxx)? Again, it might be useful to observe this value in the
    situation above and slow down sending when the buffers are filling up
    (for example with these huge latencies I might want to have more control
    over the amount of outstanding data)

    Any thoughts?

    Thanks

    Ed W
     
    Ed W, Aug 17, 2005
    #4
  5. Also sprach Ed W:

    >> This is probably a moot venture. The smaller the packets are, the lower
    >> the overall throughput is going to be. This is due to the fact that TCP
    >> packets have to be acknowledged.

    >
    > Be careful with your generalisation. The point of my experiment is to
    > test an unreliable (and very slow) satellite network to determine
    > whether faster speed would be achieved using smaller MTU due to less
    > retransmissions. 1500 bytes represents up to 7 seconds of transmission
    > time...


    Ah, there we are! TCP design was heavily based on the assumption of
    using wires on the physical layer (which means: the medium is fairly
    reliable). I made the same assumption. :)

    TCP over a wireless link is a completely different story. The current
    design still works but it can be horrendously inefficient, especially
    congestion control.

    >> In order to do your measurements, you should probably adjust parameters
    >> on the receiving side. If you want smaller packets, try to set the
    >> window size (TCP_WINDOW_CLAMP, I think). TCP_MAX_SEG also needs to be
    >> set there as the MSS is announced by the receiver during the
    >> three-way-handshake when the connection is established.

    >
    > I'm not sure I can see how window size affects things, but it's
    > interesting to see that I can influence it on a per connection basis?


    Yes, per connection. The window size is a parameter sent with each ACK
    message and it refers to the currently remaining size of the receiving
    buffer. This is normally handled by the TCP stack. It is not a global
    parameter you set once and then it remains there. But from the
    description in tcp(7) I assume that TCP_WINDOW_CLAMP is an upper bound
    and no sent packet may ever exceed it.

    > I'm trying to change TCP_MAX_SEG and the docs imply it can be changed
    > once the connection is established, but at least using perl this doesn't
    > appear to work.


    TCP_MAXSEG actually. To what value did you set it? If it exceeds the MTU
    of your interface, the value is ignored. Also, the minimum size is, I
    think, 556 which is a TCP requirement.

    > If I change it on a listening socket then I observe that the subsequent
    > tcp handshake uses the original max values, but that TCP then uses the
    > smaller values for sending data (ie it does what I expect). It would
    > just be useful to be able to change the MSS while the connection is
    > operating


    As far as I know this is not in the specifications of TCP. The segment
    size is agreed on during connection establishment (actually, both sides
    may use different values for the MSS).

    > It might for example be useful to change the MSS if we observe more
    > corrupted tcp packets arriving, or other similar algorithm.


    The only way to do that is to shutdown the connection and establish a
    new one with the updated values for the MSS.

    > Also, is it possible to observe how full the network buffers are?
    > getsockopt(xxx)? Again, it might be useful to observe this value in the
    > situation above and slow down sending when the buffers are filling up
    > (for example with these huge latencies I might want to have more control
    > over the amount of outstanding data)


    With a packet-sniffer, you certainly can. You find it in the window size
    field of the TCP header that is sent back as acknowledgement from the
    receiver. But as I said earlier, this is a dynamic value so I don't
    think you can find that value on a per-socket basis. There is TCP_INFO
    that returns some values. But the structure returned seems to have no
    information on the last state of the receiver's window.

    Since you are trying to optimize a TCP connection over a wireless link,
    are you sure at all that reducing the packet size is a good idea? The
    problem with wireless links is their lack of reliability (and latency).
    When data gets through corrupted (or not at all), one approach is to send
    again immediately in the hope that it gets through this time. Also,
    making packets smaller does not necessarily make the transmission more
    reliable. Wireless is really just send-and-pray.

    However, if you have problems with the buffer of the receiver filling up
    too quickly, wouldn't that mean that the data got through beautifully?
    If the receiver constantly has full buffers, it means the link is in
    fact quite reliable and thus making it look similar to a wired link. In
    this case you can just rely on the default behaviour of your TCP stacks
    as they work well for reliable links.

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Aug 17, 2005
    #5
  6. On Wed, 17 Aug 2005, Ed W wrote:

    > > This is probably a moot venture. The smaller the packets are, the
    > > lower the overall throughput is going to be. This is due to the
    > > fact that TCP packets have to be acknowledged.

    >
    > Be careful with your generalisation. The point of my experiment is
    > to test an unreliable (and very slow) satellite network to determine
    > whether faster speed would be achieved using smaller MTU due to less
    > retransmissions. 1500 bytes represents up to 7 seconds of
    > transmission time...


    Yes, but the acknowledgement doesn't have to be for every individual
    packet. Check the "window" parameter. You may however need some very
    large buffers if you hope to improve performance. One sees a similar
    effect without the satellite, if trying to get good bulk data
    throughput on a transatlantic cable link: despite having at least
    1Gbit/sec paths at all points between the hosts at each end, and the
    hosts themselves being adequate to the purpose, the throughput looks
    quite miserable unless some serious tuning of the TCP parameters is
    done.

    However, this would be better explored on a networking group, I think,
    than right here on c.l.p.misc.

    And google for * tcp tuning throughput * and similar combinations of
    terms. Anything recent-ish which comes back with LBL.gov and/or
    internet2 in the URL is likely to be worth a look.

    > I'm not sure I can see how window size affects things, but it's
    > interesting to see that I can influence it on a per connection
    > basis?


    The acknowledgements are serial numbered as to which packets they
    relate to, so you can be acknowledging a packet which was quite some
    time back while you have all the intervening packets "up the spout" or
    in transit, at least for the major part of the transfer (at the ends
    of course it sorts itself out).

    hope this helps.
     
    Alan J. Flavell, Aug 17, 2005
    #6
  7. Ed W

    Ed W Guest

    > You may however need some very
    > large buffers if you hope to improve performance. One sees a similar
    > effect without the satellite, if trying to get good bulk data
    > throughput on a transatlantic cable link: despite having at least
    > 1Gbit/sec paths at all points between the hosts at each end, and the
    > hosts themselves being adequate to the purpose, the throughput looks
    > quite miserable unless some serious tuning of the TCP parameters is
    > done.


    I think you overestimate the problem. I am using Iridium...

    On a clear day with no clouds, the satellite overhead and a following
    wind it can do 2400 baud...

    Retransmissions are what I need to reduce
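
    To put a number on how long one packet occupies such a link, here is
    a small sketch. It assumes 2400 baud means 2400 bit/s and 8 bits per
    byte with no framing, so any link-layer overhead only adds to these
    figures; the function name transmit_seconds is made up for
    illustration.

```perl
use strict;
use warnings;

# Seconds needed to clock $bytes onto a link running at $bits_per_sec.
sub transmit_seconds {
    my ($bytes, $bits_per_sec) = @_;
    return $bytes * 8 / $bits_per_sec;
}

for my $size (300, 500, 1000, 1500) {
    printf "%4d bytes at 2400 bit/s: %.2f s\n",
        $size, transmit_seconds($size, 2400);
}
```

    Every retransmission of a full-size packet costs that whole interval
    again, which is why the packet size matters so much at this speed.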

    Ed W
     
    Ed W, Aug 17, 2005
    #7
  8. Ed W

    Ed W Guest

    > TCP_MAXSEG actually. To what value did you set it? If it exceeds the MTU
    > of your interface, the value is ignored. Also, the minimum size is, I
    > think, 556 which is a TCP requirement.


    The docs say that you can use it to reduce the MSS below what was
    negotiated at the link establishment. This seems to be the case: I can
    see the MTU being established at 1500 using a packet sniffer, but
    setting TCP_MAXSEG then means packets go out at (say) 500 bytes.

    I haven't found an issue under Linux setting values from 300 bytes to
    1400 bytes, so I don't know if there is any limit.


    >>If I change it on a listening socket then I observe that the subsequent
    >>tcp handshake uses the original max values, but that TCP then uses the
    >>smaller values for sending data (ie it does what I expect). It would
    >>just be useful to be able to change the MSS while the connection is
    >>operating

    >
    >
    > As far as I know this is not in the specifications of TCP. The segment
    > size is agreed on during connection establishment (actually, both sides
    > may use different values for the MSS).


    It seems obvious though that there is no technical reason that we can't
    agree on 1400 bytes being the largest we are allowed to send to the
    remote and then send 700 byte packets instead.

    My understanding is that this param can be changed at runtime. Also
    there is TCP_CORK, which is supposed to optimally pack data into packets
    - again I can't seem to toggle this using the Perl code.
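
    A minimal sketch of toggling TCP_CORK on an open connection, assuming
    Linux (where the option exists and has the value 3) and a loopback
    demo. The helper name set_cork is illustrative. It dies with the OS
    error if the kernel rejects the option, which helps tell "rejected"
    apart from "accepted but with no visible effect".

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP);

# TCP_CORK is 3 on Linux; fall back to that if Socket.pm lacks it.
my $TCP_CORK = eval { Socket::TCP_CORK() } // 3;

# Turn TCP_CORK on or off for a connected socket.
sub set_cork {
    my ($sock, $on) = @_;
    setsockopt($sock, IPPROTO_TCP, $TCP_CORK, $on ? 1 : 0)
        or die "setsockopt(TCP_CORK) failed: $!";
    return 1;
}

my $listener = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen failed: $!";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "connect failed: $!";
my $conn = $listener->accept or die "accept failed: $!";

# Cork, queue several small writes, then uncork to release them.
set_cork($conn, 1);
for my $piece (qw(AAAA BBBB CCCC)) {
    defined(syswrite($conn, $piece)) or die "syswrite failed: $!";
}
set_cork($conn, 0);
print "wrote three small pieces while corked\n";
```

    A packet trace taken with and without the two set_cork() calls is the
    way to confirm whether the option changes what goes on the wire.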

    >>It might for example be useful to change the MSS if we observe more
    >>corrupted tcp packets arriving, or other similar algorithm.

    >
    >
    > The only way to do that is to shutdown the connection and establish a
    > new one with the updated values for the MSS.


    The docs imply not?


    > Since you are trying to optimize a TCP connection over a wireless link,
    > are you sure at all that reducing the packet size is a good idea? The
    > problem with wireless links is their lack of reliability (and latency).
    > When data gets through corrupted (or not at all), one approach is to send
    > again immediately in the hope that it gets through this time. Also,
    > making packets smaller does not necessarily make the transmission more
    > reliable. Wireless is really just send-and-pray.


    I observe several to a few dozen random errors per minute. If you do a
    little quick maths I think you can easily see that packets taking 7 secs
    each to send (and retransmit in case of error) are much less efficient
    than say 1 second packets.

    I have just done some testing and this is very much borne out in
    practice: the smaller packets (I tested at 300, 500 and 600 bytes) suffer
    only a very small slowdown and are much more robust in the case
    of retransmits. I think a quick model in Excel would also show that this
    is the case.
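
    The quick model could be sketched as follows. It assumes errors arrive
    as a Poisson process at a fixed rate, that any error during a packet's
    transmission forces the whole packet to be resent, a 40-byte header per
    packet, and a 2400 bit/s link. These are simplifying assumptions for
    illustration, not measurements, and link_efficiency is a made-up name.

```perl
use strict;
use warnings;

# Expected fraction of link time that delivers useful payload.
# A packet survives with probability exp(-rate * duration), so on
# average 1/p_ok attempts are needed per delivered packet.
sub link_efficiency {
    my (%arg) = @_;
    my $wire_bytes = $arg{payload} + $arg{header};
    my $seconds    = $wire_bytes * 8 / $arg{bits_per_sec};
    my $rate       = $arg{errors_per_min} / 60;    # errors per second
    my $p_ok       = exp(-$rate * $seconds);       # packet gets through
    return ($arg{payload} / $wire_bytes) * $p_ok;
}

my %link = (header => 40, bits_per_sec => 2400, errors_per_min => 12);
for my $size (300, 500, 600, 1500) {
    printf "%4d-byte payload: %.0f%% of link time is useful\n",
        $size, 100 * link_efficiency(%link, payload => $size);
}
```

    Under these assumptions the header overhead of small packets is
    outweighed by the much lower chance of losing each one.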


    > However, if you have problems with the buffer of the receiver filling up
    > too quickly, wouldn't that mean that the data got through beautifully?
    > If the receiver constantly has full buffers, it means the link is in
    > fact quite reliable and thus making it look similar to a wired link.


    Wrong buffer. I'm worried about the sending buffer at my end. I can't
    see how the remote buffer would ever be anything but empty since that
    would imply the application was sleeping on the job and not consuming
    the transmitted input

    In my case I want to keep the untransmitted data as small as possible.
    Over my satellite system if I transmit 65Kb, and then later some urgent
    data comes along it ends up on the tail of the other data. This means
    that some minutes will pass by before I can even start to get the urgent
    data out.
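
    One way to watch and limit the local send queue, sketched under the
    assumption of Linux on x86 or ARM, where the SIOCOUTQ ioctl has the
    value 0x5411 (other platforms differ). The helper names unsent_bytes
    and shrink_send_buffer are made up for illustration, and the demo runs
    over loopback, where the queue drains almost immediately.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(SOL_SOCKET SO_SNDBUF);

my $SIOCOUTQ = 0x5411;    # Linux x86/ARM value; an assumption elsewhere

# Bytes sitting in the kernel send queue that the peer has not yet
# acknowledged (Linux-specific ioctl).
sub unsent_bytes {
    my ($sock) = @_;
    my $buf = pack('i', 0);
    defined(ioctl($sock, $SIOCOUTQ, $buf))
        or die "ioctl(SIOCOUTQ) failed: $!";
    return unpack('i', $buf);
}

# Ask for a small send buffer and report what the kernel granted.
sub shrink_send_buffer {
    my ($sock, $bytes) = @_;
    setsockopt($sock, SOL_SOCKET, SO_SNDBUF, $bytes)
        or die "setsockopt(SO_SNDBUF) failed: $!";
    return unpack('i', getsockopt($sock, SOL_SOCKET, SO_SNDBUF));
}

my $listener = IO::Socket::INET->new(
    Listen => 1, LocalAddr => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
) or die "listen failed: $!";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "connect failed: $!";
my $conn = $listener->accept or die "accept failed: $!";

my $granted = shrink_send_buffer($conn, 4096);
defined(syswrite($conn, 'x' x 1000)) or die "syswrite failed: $!";
my $queued = unsent_bytes($conn);
printf "send buffer: %d bytes, still queued: %d bytes\n", $granted, $queued;
```

    A sender could poll unsent_bytes() and hold back bulk data while the
    queue is non-empty, keeping room for urgent data to go out sooner.
    This only covers the local socket; buffers at the remote ISP are out
    of its reach.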

    This is also compounded at the remote ISP end which might have several
    megabyte buffers and be queuing up tons of data to squeeze down this
    tiny pipe. Obviously I can't control any QOS settings at the remote end
    because it's not under my control

    Anyway, we are off track. What I really need to do now is figure out
    how to control the sending packet size effectively. Variable MSS will
    be a big help if it can be done?

    Ed W
     
    Ed W, Aug 17, 2005
    #8