ruby 1.8.6, threadpooling and blocking sockets - advice/help

Discussion in 'Ruby' started by Daniel Bush, Oct 19, 2009.

  1. Daniel Bush

    Daniel Bush Guest

    Hi,
    I think I'm running up against ruby 1.8.6's not so
    stellar threading system. Was hoping someone
    could confirm or otherwise point out some flaws.

    Note: I get reasonable performance when running on
    ruby 1.9; it's just 1.8.6 that hangs, as if
    deadlocked, when I start using too many threads in
    one of my test scripts. (My focus is actually
    on 1.9 and JRuby anyway.)

    To give you an idea:

    I might get a pool of 10 acceptor threads to run
    something like the following (each has their own
    version of this code):

    client, client_sockaddr = @socket.accept
    # Threads block on #accept.
    data = client.recvfrom( 40 )[0].chomp
    @mutex.synchronize do
      puts "#{Thread.current} received #{data}... "
    end
    client.close

    on @socket which was set up like this:

    @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    @socket.bind( @sockaddr )
    @socket.listen( 100 )

    I wanted to create a barrage of requests so next I
    create a pool of requester threads which each run
    something like this:

    socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
    socket.connect( sockaddr )
    socket.puts "request #{i}"
    socket.close

    All of this is in one script. If I have so much as
    2 requester threads in addition to the 10
    acceptors waiting to receive their requests, 1.8.6
    just seizes up before processing anything. If I
    use 2 acceptors and 2 requesters, it works. If I
    use 10 acceptors and 1 requester, it works. When it
    does work, however, it doesn't appear to schedule
    threads too well; it just seems to use one all the
    time - although this seems to happen only when
    using sockets, as opposed to a more general job
    queue.

    I haven't submitted the full code because it uses
    a threadpool library I'm still building/reviewing.
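    For reference, a stripped-down, self-contained version of the setup described above (without the threadpool library; thread counts reduced and the port chosen by the OS, so all of those details are illustrative, not the original code) might look like:

```ruby
require 'socket'
# (On 1.8/1.9 you would also `require 'thread'` for Mutex.)

include Socket::Constants   # provides AF_INET, SOCK_STREAM as in the snippets

@mutex = Mutex.new
received = []

@socket = Socket.new(AF_INET, SOCK_STREAM, 0)
@socket.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))  # port 0: OS picks a free one
@socket.listen(100)
port = @socket.local_address.ip_port

# Pool of acceptor threads, each blocking on #accept.
acceptors = 3.times.map do
  Thread.new do
    client, _client_sockaddr = @socket.accept
    data = client.recvfrom(40)[0].chomp
    @mutex.synchronize { received << data }
    client.close
  end
end

# Requester threads, one connection each.
requesters = 3.times.map do |i|
  Thread.new do
    s = Socket.new(AF_INET, SOCK_STREAM, 0)
    s.connect(Socket.pack_sockaddr_in(port, '127.0.0.1'))
    s.puts "request #{i}"
    s.close
  end
end

(acceptors + requesters).each(&:join)
puts "handled #{received.size} requests"
```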

    Regards,

    Daniel Bush
    --
    Posted via http://www.ruby-forum.com/.
    Daniel Bush, Oct 19, 2009
    #1

  2. On 10/19/2009 02:51 PM, Daniel Bush wrote:
    > Hi,
    > I think I'm running up against ruby 1.8.6's not so
    > stellar threading system. Was hoping someone
    > could confirm or otherwise point out some flaws.
    >
    > Note: I get reasonable performance when running on
    > ruby 1.9 it's just 1.8.6 that hangs like a
    > deadlock when I start using too many threads in
    > one of my test scripts. (My focus is actually
    > on 1.9 and jruby anyway).
    >
    > Give you an idea:
    >
    > I might get a pool of 10 acceptor threads to run
    > something like the following (each has their own
    > version of this code):
    >
    > client, client_sockaddr = @socket.accept
    > # Threads block on #accept.
    > data = client.recvfrom( 40 )[0].chomp
    > @mutex.synchronize do
    > puts "#{Thread.current} received #{data}... "
    > end
    > client.close
    >
    > on @socket which was set up like this:
    >
    > @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    > @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    > @socket.bind( @sockaddr )
    > @socket.listen( 100 )


    This won't work. You can have only 1 acceptor thread per server socket.
    Typically you dispatch processing *after* the accept to a thread
    (either newly created or taken from a pool).

    I have no idea what the interpreter is going to do if you have multiple
    threads trying to accept from the same socket. In the best case #accept
    is synchronized and only one thread gets to enter it. In worse
    scenarios, anything may happen.

    > I wanted to create a barrage of requests so next I
    > create a pool of requester threads which each run
    > something like this:
    >
    > socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    > sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
    > socket.connect( sockaddr )
    > socket.puts "request #{i}"
    > socket.close


    Btw, why don't you use TCPServer and TCPSocket?

    > All of this in one script. If I have so much as
    > 2 requester threads in addition to the 10
    > acceptors waiting to receive their requests, 1.8.6
    > just seizes up before processing anything. If I
    > use 2 acceptors and 2 requesters, it works. If I
    > use 10 acceptors, 1 requester it works. When it
    > does work however, it doesn't appear to schedule
    > threads too well; it just seems to use one all the
    > time - although this seems to happen only when
    > using sockets as opposed to a more general job
    > queue.


    See above.

    > I haven't submitted the full code because it uses
    > a threadpool library I'm still building/reviewing.


    I would rather do something like this (sketched):

    require 'thread'
    queue = Queue.new
    workers = (1..10).map do
      Thread.new queue do |q|
        until (cl = q.deq).equal? q
          # process data from / for client cl
          begin
            data = cl.gets.chomp
            @mutex.synchronize do
              puts "#{Thread.current} received #{data}..."
            end
          ensure
            cl.close
          end
        end
      end
    end

    server = TCPServer.new ...

    while client = server.accept
      queue.enq client
    end

    # elsewhere

    TCPSocket.open('localhost', 2200) do |sock|
      sock.puts "request"
    end
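    The sketch above elides the TCPServer arguments and the step that enqueues the queue into itself to stop the workers. Filled out with those pieces (counts, names and the OS-chosen port are illustrative; on modern Rubies Queue/Mutex need no require), a runnable version might look like:

```ruby
require 'socket'

queue = Queue.new
mutex = Mutex.new
received = []

workers = (1..3).map do
  Thread.new(queue) do |q|
    until (cl = q.deq).equal?(q)      # the queue itself serves as the stop signal
      begin
        data = cl.gets.chomp
        mutex.synchronize { received << data }
      ensure
        cl.close
      end
    end
  end
end

server = TCPServer.new('127.0.0.1', 0)  # port 0: OS picks a free one
port = server.addr[1]

clients = (1..5).map do |i|
  Thread.new do
    TCPSocket.open('127.0.0.1', port) { |sock| sock.puts "request #{i}" }
  end
end

# A single acceptor (here: the main thread) feeds the worker pool.
5.times { queue.enq(server.accept) }

# The omitted half: push the queue into itself once per worker to stop them.
workers.size.times { queue.enq(queue) }
(workers + clients).each(&:join)
```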

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Oct 19, 2009
    #2

  3. Daniel Bush

    Daniel Bush Guest

    Robert Klemme wrote:
    > On 10/19/2009 02:51 PM, Daniel Bush wrote:
    >>
    >> puts "#{Thread.current} received #{data}... "
    >> end
    >> client.close
    >>
    >> on @socket which was set up like this:
    >>
    >> @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    >> @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    >> @socket.bind( @sockaddr )
    >> @socket.listen( 100 )

    >
    > This won't work. You can have only 1 acceptor thread per server socket.
    > Typically you dispatch processing *after* the accept to a thread
    > (either newly created or taken from a pool).
    >
    > I have no idea what the interpreter is going to do if you have multiple
    > threads trying to accept from the same socket. In the best case #accept
    > is synchronized and only one thread gets to enter it. In worse
    > scenarios anything bad may happen.


    Ok, I wasn't sure if it was appropriate having >1 thread per socket
    instance. It *appears* to work ok on ruby 1.9 up to about 100 socket
    connections - not that that means anything when it comes to testing
    stuff with threads. Maybe if I do 100,000+ I might elicit some type of
    error.

    I was intending to process the result of accept in another pool but I
    was toying with the idea of having 2-3 threads waiting on #accept
    assuming no synchronisation issues. I didn't know if it really mattered
    or not. It might make a difference if you have a large number of
    connections coming in depending on what the acceptor is doing in
    addition; I wasn't sure.

    I guess I'll have to scupper that idea or exhaustively test it to prove
    it works and has benefit - both of which are questionable at this point.

    >
    >> I wanted to create a barrage of requests so next I
    >> create a pool of requester threads which each run
    >> something like this:
    >>
    >> socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    >> sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
    >> socket.connect( sockaddr )
    >> socket.puts "request #{i}"
    >> socket.close

    >
    > Btw, why don't you use TCPServer and TCPSocket?


    Yeah, I was going to; I was just going off some examples in the
    documentation, trying to cut my teeth on them and writing some tests.
    But I was heading that way.

    >
    >> queue.

    > See above.
    >
    >> I haven't submitted the full code because it uses
    >> a threadpool library I'm still building/reviewing.

    >
    > I would rather do something like this (sketched):
    >
    > require 'thread'
    > queue = Queue.new
    > workers = (1..10).map do
    >   Thread.new queue do |q|
    >     until (cl = q.deq).equal? q
    >       # process data from / for client cl
    >       begin
    >         data = cl.gets.chomp
    >         @mutex.synchronize do
    >           puts "#{Thread.current} received #{data}..."
    >         end
    >       ensure
    >         cl.close
    >       end
    >     end
    >   end
    > end
    >
    > server = TCPServer.new ...
    >
    > while client = server.accept
    >   queue.enq client
    > end
    >
    > # elsewhere
    >
    > TCPSocket.open('localhost', 2200) do |sock|
    >   sock.puts "request"
    > end


    Thanks for the example.
    I am scratching my head a little with this line:
    until (cl = q.deq).equal? q

    I'm familiar with Queue and its behaviour.

    Cheers,
    Daniel Bush
    Daniel Bush, Oct 20, 2009
    #3
  4. On 20.10.2009 02:31, Daniel Bush wrote:
    > Robert Klemme wrote:
    >> On 10/19/2009 02:51 PM, Daniel Bush wrote:
    >>> puts "#{Thread.current} received #{data}... "
    >>> end
    >>> client.close
    >>>
    >>> on @socket which was set up like this:
    >>>
    >>> @socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    >>> @sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost')
    >>> @socket.bind( @sockaddr )
    >>> @socket.listen( 100 )

    >> This won't work. You can have only 1 acceptor thread per server socket.
    >> Typically you dispatch processing *after* the accept to a thread
    >> (either newly created or taken from a pool).
    >>
    >> I have no idea what the interpreter is going to do if you have multiple
    >> threads trying to accept from the same socket. In the best case #accept
    >> is synchronized and only one thread gets to enter it. In worse
    >> scenarios anything bad may happen.

    >
    > Ok, I wasn't sure if it was appropriate having >1 thread per socket
    > instance. It *appears* to work ok on ruby 1.9 up to about 100 socket
    > connections - not that that means anything when it comes to testing
    > stuff with threads. Maybe if I do 100,000+ I might elicit some type of
    > error.
    >
    > I was intending to process the result of accept in another pool but I
    > was toying with the idea of having 2-3 threads waiting on #accept
    > assuming no synchronisation issues. I didn't know if it really mattered
    > or not. It might make a difference if you have a large number of
    > connections coming in depending on what the acceptor is doing in
    > addition; I wasn't sure.
    >
    > I guess I'll have to scupper that idea or exhaustively test it to prove
    > it works and has benefit - both of which are questionable at this point.


    Frankly, I wouldn't invest that effort: every example in all programming
    languages I have seen has just a single acceptor thread. Accepting
    socket connections is not an expensive operation, so as long as you
    refrain from further processing, a single thread is completely sufficient
    for handling accepts.

    >>> I wanted to create a barrage of requests so next I
    >>> create a pool of requester threads which each run
    >>> something like this:
    >>>
    >>> socket = Socket.new( AF_INET, SOCK_STREAM, 0 )
    >>> sockaddr = Socket.pack_sockaddr_in( 2200, 'localhost' )
    >>> socket.connect( sockaddr )
    >>> socket.puts "request #{i}"
    >>> socket.close

    >> Btw, why don't you use TCPServer and TCPSocket?

    >
    > yeah I was going to, I was just going off some examples in the
    > documentation and trying to cut my teeth on them and writing some tests.
    > But I was heading that way.
    >
    >>> queue.

    >> See above.
    >>
    >>> I haven't submitted the full code because it uses
    >>> a threadpool library I'm still building/reviewing.

    >> I would rather do something like this (sketched):
    >>
    >> require 'thread'
    >> queue = Queue.new
    >> workers = (1..10).map do
    >>   Thread.new queue do |q|
    >>     until (cl = q.deq).equal? q
    >>       # process data from / for client cl
    >>       begin
    >>         data = cl.gets.chomp
    >>         @mutex.synchronize do
    >>           puts "#{Thread.current} received #{data}..."
    >>         end
    >>       ensure
    >>         cl.close
    >>       end
    >>     end
    >>   end
    >> end
    >>
    >> server = TCPServer.new ...
    >>
    >> while client = server.accept
    >>   queue.enq client
    >> end
    >>
    >> # elsewhere
    >>
    >> TCPSocket.open('localhost', 2200) do |sock|
    >>   sock.puts "request"
    >> end

    >
    > Thanks for the example.
    > I am scratching my head a little with this line:
    > until (cl = q.deq).equal? q
    >
    > I'm familiar with Queue and its behaviour.


    That's the worker thread termination code, which basically works by
    checking whether the item fetched from the Queue is the Queue instance
    itself. Actually I omitted the other half of the code (the place which
    puts the queue instance into itself, once per worker) because I didn't
    want to make the code more complex, and the termination condition was
    unknown (it may be a signal, a number of handled connections, etc.).

    If you want to make termination more readable you can also do something
    like this

    QueueTermination = Object.new
    ....
    until QueueTermination.equal?(cl = q.deq)
      ...
    end

    or

    until QueueTermination == (cl = q.deq)
      ...
    end

    or

    until QueueTermination === (cl = q.deq)
      ...
    end

    The basic idea is to stuff something into the queue which is unambiguously
    identifiable as non-work content.
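    As a small runnable illustration of that sentinel idea (names and the stand-in "work" are illustrative, no sockets involved):

```ruby
# A dedicated sentinel object: nothing a producer enqueues as real work
# can be identical (#equal?) to it.
QueueTermination = Object.new

q = Queue.new
results = []

worker = Thread.new do
  until QueueTermination.equal?(item = q.deq)
    results << item.upcase      # stand-in for real per-item work
  end
end

%w[foo bar baz].each { |job| q.enq(job) }
q.enq(QueueTermination)          # unambiguously "stop", never work content
worker.join
# results now holds ["FOO", "BAR", "BAZ"]
```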

    Kind regards

    robert

    Robert Klemme, Oct 20, 2009
    #4
  5. Daniel Bush

    Daniel Bush Guest

    Robert Klemme wrote:
    > On 20.10.2009 02:31, Daniel Bush wrote:
    >>>> @socket.bind( @sockaddr )

    >> Ok, I wasn't sure if it was appropriate having >1 thread per socket
    >> addition; I wasn't sure.
    >>
    >> I guess I'll have to scupper that idea or exhaustively test it to prove
    >> it works and has benefit - both of which are questionable at this point.

    >
    > Frankly, I wouldn't invest that effort: every example in all programming
    > languages I have seen has just a single acceptor thread. Accepting
    > socket connections is not an expensive operation so as long as you
    > refrain from further processing a single thread is completely sufficient
    > for handling accepts.
    >
    >>
    >>>
    >>> end
    >>> queue.enq client

    >> I am scratching my head a little with this line:
    >> until (cl = q.deq).equal? q
    >>
    >> I'm familiar with Queue and its behaviour.

    >
    > That's the worker thread termination code which basically works by
    > checking whether the item fetched from the Queue is the Queue instance
    > itself. Actually I omitted the other half of the code (the place which
    > puts all those q instances in itself) because I didn't want to make the
    > code more complex and also termination condition was unknown (may be a
    > signal, a number of handled connections etc.).
    >


    Ok, that's cool. I was pushing termination jobs on the thing I was
    playing with although what you're doing there might be cleaner!

    Thanks for the advice.
    Cheers,

    Daniel Bush
    Daniel Bush, Oct 21, 2009
    #5
  6. Robert Klemme wrote:
    > Frankly, I wouldn't invest that effort: every example in all programming
    > languages I have seen has just a single acceptor thread.


    ...or else serializes them so that only one thread accept()s at a time.
    For a proper example look at Apache with preforked workers, and the
    AcceptMutex directive.
    http://httpd.apache.org/docs/2.0/mod/mpm_common.html

    You could try the same approach, and use a ruby Mutex to protect your
    socket#accept - but that could turn out to be more expensive than having
    a single accept thread which dispatches to your worker pool, if you're
    going to have a separate worker pool anyway.
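    A minimal sketch of that mutex-serialized accept (all names, counts and the OS-chosen port are illustrative; whether this beats a single acceptor thread is exactly the open question):

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

accept_mutex = Mutex.new         # plays the role of Apache's AcceptMutex
log_mutex = Mutex.new
received = []

# Two acceptor threads, but only one may sit in #accept at a time.
acceptors = 2.times.map do
  Thread.new do
    2.times do
      client = accept_mutex.synchronize { server.accept }
      data = client.gets.chomp   # processing happens outside the accept mutex
      log_mutex.synchronize { received << data }
      client.close
    end
  end
end

clients = (1..4).map do |i|
  Thread.new { TCPSocket.open('127.0.0.1', port) { |s| s.puts "req #{i}" } }
end

(acceptors + clients).each(&:join)
```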
    Brian Candler, Oct 21, 2009
    #6
  7. Daniel Bush

    Daniel Bush Guest

    Brian Candler wrote:
    > Robert Klemme wrote:
    >> Frankly, I wouldn't invest that effort: every example in all programming
    >> languages I have seen has just a single acceptor thread.

    >
    > ...or else serializes them so that only one thread accept()s at a time.
    > For a proper example look at Apache with preforked workers, and the
    > AcceptMutex directive.
    > http://httpd.apache.org/docs/2.0/mod/mpm_common.html
    >


    Cool. Didn't even think to look at what the big boys do.
    Thanks for the pointer.

    > You could try the same approach, and use a ruby Mutex to protect your
    > socket#accept - but that could turn out to be more expensive than having
    > a single accept thread which dispatches to your worker pool, if you're
    > going to have a separate worker pool anyway.


    Yeah, I have a worker pool. I was sort of extrapolating from that and
    having an acceptor pool based around the socket in addition to the
    worker pool.

    I don't have a lot of experience with heavy traffic, but the (naive)
    motivation for this whole thing was to have one acceptor thread
    receiving while the other was pushing onto the queue, then swapping
    over and over[1] -- at least to allow people to experiment with that
    sort of thing if they wanted to. But synchronisation issues with the
    extra thread might make things worse. I'm used to trying out duff ideas,
    so heck, maybe I'll take a look at it at some point - if only to get a
    better feel for what's going on at that level.

    Cheers,
    Daniel Bush

    [1] actually, I naively wanted all the threads to block on the socket
    just like they would on a queue. oh well.
    Daniel Bush, Oct 21, 2009
    #7
  8. On 21.10.2009 13:49, Daniel Bush wrote:
    > Brian Candler wrote:


    >> You could try the same approach, and use a ruby Mutex to protect your
    >> socket#accept - but that could turn out to be more expensive than having
    >> a single accept thread which dispatches to your worker pool, if you're
    >> going to have a separate worker pool anyway.

    >
    > Yeah, I have a worker pool. I was sort of extrapolating from that and
    > having an acceptor pool based around the socket in addition to the
    > worker pool.
    >
    > I don't have a lot of experience with heavy traffic; but the (naive)
    > motivation for this whole thing was to have one acceptor thread
    > receiving while the other was pushing on the queue and then swapping
    > over and over[1]


    You need to synchronize anyway (at least on the queue), so adding another
    synchronization point (at accept) won't gain you much, I guess. As Brian
    said, the effect can be the opposite - and nobody seems to do it anyway.
    As said, accepting connections is a pretty cheap operation.

    > [1] actually, I naively wanted all the threads to block on the socket
    > just like they would on a queue. oh well.


    You should also note that the network layer has its own queue at the
    socket (you can control its size as well). So even if a single thread
    were temporarily not sufficient, connection requests are not
    necessarily rejected. Basically you have:

    connect -> [network layer waiting queue] -> accept -> [ruby processing queue]
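    That kernel-side queue can be observed directly: client connects succeed before anyone has called #accept, because the completed connections wait in the listen backlog (a sketch; the port is OS-chosen and the counts arbitrary):

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
server.listen(100)               # enlarge the kernel's pending-connection backlog
port = server.addr[1]

# Connect five clients before anyone calls #accept: the network layer's
# queue holds the completed connections in the meantime.
clients = (1..5).map { TCPSocket.new('127.0.0.1', port) }

# Only now drain the kernel queue.
accepted = (1..5).map { server.accept }

(clients + accepted).each(&:close)
```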

    Kind regards

    robert

    Robert Klemme, Oct 21, 2009
    #8
  9. Tony Arcieri

    Tony Arcieri Guest


    On Wed, Oct 21, 2009 at 5:49 AM, Daniel Bush <> wrote:

    > I don't have a lot of experience with heavy traffic; but the (naive)
    > motivation for this whole thing was to have one acceptor thread
    > receiving while the other was pushing on the queue and then swapping
    > over and over[1] -- at least to allow people to experiment with that
    > sort of thing if they wanted to. But synchronisation issues with the
    > extra thread might make things worse. I'm used to trying out duff ideas
    > so heck maybe I might take a look at it at some point - if only to get a
    > better feel for what's going on at that level.
    >


    You might look at an event framework like EventMachine or my own Rev (
    http://rev.rubyforge.org/) as a less error-prone, higher-performance
    alternative to threads.

    The disadvantage of this approach is the need to invert control (event
    frameworks are asynchronous); however, it will resolve the synchronization
    issues.

    --
    Tony Arcieri
    Medioh/Nagravision
    Tony Arcieri, Oct 21, 2009
    #9
