Asynchronous http POST?


Ivan Shevanski

Hey everyone, I'm new to Ruby and to the mailing list, so go easy.
Basically, I have to POST to a certain URL, then wait for a response.
The catch is that I have to do this to two URLs at once. Both of them
may respond to me almost instantly, or they may take up to 10 seconds to
respond. I need to have a POST to both of these running at all times to
catch incoming events. I will also need to POST to other URLs at the
same time that these are running. So, I need to find a way to run these
two POSTs in the background constantly. From what I've read, Ruby
threads will hang on a call like this, since the interpreter does not
have control. Can anyone help (or understand) me?

Thanks,
Ivan
 

Ezra Zygmuntowicz


Ivan-

This is a perfect job for eventmachine and em-http-request. You can
run as many async HTTP requests as you want without blocking and
handle the results with callback blocks.

http://github.com/igrigorik/em-http-request/tree/master
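
Something along these lines (an untested sketch; the URLs and the body
parameters are placeholders) fires two POSTs concurrently and handles
each result in its callback:

require 'rubygems'
require 'eventmachine'
require 'em-http'   # em-http-request

urls = ['http://site-one.example/', 'http://site-two.example/']

EM.run do
  pending = urls.size
  urls.each do |url|
    http = EventMachine::HttpRequest.new(url).post :body => { 'event' => 'poll' }
    http.callback {
      p [url, http.response_header.status]
      pending -= 1
      EM.stop if pending.zero?   # stop the reactor once both responses are in
    }
    http.errback {
      p [url, 'request failed']
      pending -= 1
      EM.stop if pending.zero?
    }
  end
end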

Cheers-

Ezra Zygmuntowicz
(e-mail address removed)
 

Joel VanderWerf

Ezra said:
Ivan-

This is a perfect job for eventmachine and em-http-request. You can
run as many async http requests as you want without blocking and handle
the results with callback blocks.

In small scale cases (such as a simple client) is there any reason not
to use threads? EM just seems like overkill for a fairly simple client.
 

Ivan Shevanski

Joel said:
In small scale cases (such as a simple client) is there any reason not
to use threads? EM just seems like overkill for a fairly simple client.

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.
 

Ben Giddings

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

A quick test seems to show that isn't the case. I wrote a simple WEBrick
servlet that accepts a POST request and delays for a specified amount of
time (taken from the "delay" parameter of the POST), and a client with two
threads that POST to it and keep track of when things start and end:

delay_servlet.rb:
require 'webrick'
require 'time'

class DelayServlet < WEBrick::HTTPServlet::AbstractServlet
  def do_POST(request, response)
    start_time = Time.now
    delay = 0
    if request.query["delay"]
      delay = request.query["delay"].to_i
    end

    sleep(delay)

    end_time = Time.now
    response.body = "delayed for #{delay}s, started at " +
                    "#{start_time.iso8601}, ended at #{end_time.iso8601}\n"
  end
end

if __FILE__ == $0
  server = WEBrick::HTTPServer.new(:Port => 8000)
  server.mount("/", DelayServlet)

  trap("INT") { server.shutdown }
  server.start
end


delay_client.rb:
require 'net/http'
require 'time'

if __FILE__ == $0
  puts "Main thread start at #{Time.now.iso8601}"

  t1 = Thread.new do
    puts "Thread 1 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay' => '5'})
    puts "Response: " + res.body
    puts "Thread 1 end at #{Time.now.iso8601}"
  end

  t2 = Thread.new do
    puts "Thread 2 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay' => '7'})
    puts "Response: " + res.body
    puts "Thread 2 end at #{Time.now.iso8601}"
  end

  t1.join
  t2.join
  puts "Main thread end at #{Time.now.iso8601}"
end

Output:
Main thread start at 2009-09-10T16:46:17-04:00
Thread 1 start at 2009-09-10T16:46:17-04:00
Thread 2 start at 2009-09-10T16:46:17-04:00
Response: delayed for 5s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:22-04:00
Thread 1 end at 2009-09-10T16:46:22-04:00
Response: delayed for 7s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:24-04:00
Thread 2 end at 2009-09-10T16:46:24-04:00
Main thread end at 2009-09-10T16:46:24-04:00

So it sure looks like it isn't blocking all threads when waiting for an
HTTP response.

Ben
 

Ivan Shevanski

Ben said:
A quick test seems to show that isn't the case. [...] So it sure looks
like it isn't blocking all threads when waiting for an HTTP response.

Ben


Sure looks like you're right. Here's where I got that idea in my head:

http://www.rubycentral.com/pickaxe/tut_threads.html

"""
Multithreading

Often the simplest way to do two things at once is by using Ruby
threads. These are totally in-process, implemented within the Ruby
interpreter. That makes the Ruby threads completely portable---there is
no reliance on the operating system---but you don't get certain benefits
from having native threads. You may experience thread starvation (that's
where a low-priority thread doesn't get a chance to run). If you manage
to get your threads deadlocked, the whole process may grind to a halt.
(!!!) And if some thread happens to make a call to the operating system
that takes a long time to complete, all threads will hang until the
interpreter gets control back. (!!!) However, don't let these
potential problems put you off---Ruby threads are a lightweight and
efficient way to achieve parallelism in your code.
"""

(Sorry, I'm unsure if I'm allowed to use HTML tags or anything here, but
I think this will do. Looks like the FAQ link is broken.) Is this a
blatant lie? Maybe someone can explain to me what is actually being
referred to?

Thanks,
Ivan
 

Joel VanderWerf

Ivan said:
[...] And if some thread happens to make a call to the operating system
that takes a long time to complete, all threads will hang until the
interpreter gets control back. [...]

Here's my relatively naive understanding of the situation (for MRI, 1.8):

System calls will block all threads, except in a few cases. The
exceptions include:

1. Waiting on IO. Ruby's threads are really an abstraction over a single
native thread calling select() on all the file descriptors that the ruby
threads are waiting on. When a fd is ready for reading, say, the native
thread starts executing the ruby thread that was waiting for that fd.

2. Starting processes and waiting for them to finish. This is why

Thread.new { system "long-running process" }

is a useful idiom (and it even works on windows).

Still, if you expect a lot of threads, EM will probably be much more
efficient.

But many other system calls (#flock without the nonblock flag, for
example) will block all Ruby threads.
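
A quick way to see the difference (a small sketch, not from the pickaxe):
a sleeping or IO-waiting thread leaves the others running, while a
blocking flock stalls the whole interpreter:

ticker = Thread.new do
  10.times { puts "tick #{Time.now.strftime('%H:%M:%S')}"; sleep 1 }
end

sleep 3   # the ticker keeps printing during this wait

# If another process held the lock, this flock (no File::LOCK_NB) would block
# inside the system call and silence the ticker until it returned:
#   File.open("shared.lock", "w") { |f| f.flock(File::LOCK_EX) }

ticker.join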
 

Ivan Shevanski

Ezra said:
This is a perfect job for eventmachine and em-http-request. You can
run as many async HTTP requests as you want without blocking and
handle the results with callback blocks.


I couldn't seem to get this running with threads, so I'm trying
eventmachine. I can get a single POST to run fine with a callback, but
what do I have to do to get continuous POSTs running? I need to have a
POST to the site going at all times, while handling the responses.
Documentation/examples seem very hard to find. A decent em-http-request
tutorial would be great.
 

Clifford Heath

Ivan said:
I couldn't seem to get this running with threads, so I'm trying
eventmachine.

I think EM is overkill here. The following example uses PUT not POST,
but I'm sure you'll be able to adapt it.

require 'net/http'
require 'uri'

# Configuration variables:
THREAD_COUNT = 10
REQUESTS_PER_THREAD = 10
FILENAME = 'file_to_put'
URL = 'http://localhost/DropBox/file_to_put'

# Put a data string to the specified url:
def urlput(url, data)
  begin
    uri = URI.parse(url)
    response = nil
    value = nil
    Net::HTTP.start(uri.host) { |http|
      response, value = http.put(uri.path, data, nil)
    }
    p response.message if (response.code.to_i >= 300)
  rescue => e
    p e
  end
  value
end

# Read the file to put:
data = File.new(FILENAME).read

start = Time.now
$threads = []
(1..THREAD_COUNT).each { |thread|
  $threads << Thread.new(thread) { |thread_no|
    (1..REQUESTS_PER_THREAD).each {
      urlput(URL, data)
    }
  }
}
$threads.each { |aThread| aThread.join }
puts "#{THREAD_COUNT*REQUESTS_PER_THREAD} requests completed in #{Time.now - start} seconds"

Clifford Heath.
 

Tony Arcieri


Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that's my
understanding.

In MRI, you can do multiplexed I/O across threads; however, the code that
implements this will make your eyes bleed (eval.c).
 

Ezra Zygmuntowicz

I think EM is overkill here.

I disagree that EM is overkill here. EM is not a heavyweight library,
and it does a *much* better job of this kind of async HTTP work than
threads and net/http do; it really should be the preferred way of doing
something like this.

require 'rubygems'
require 'eventmachine'
require 'em-http'   # em-http-request provides EventMachine::HttpRequest

def make_request(site = 'http://www.website.com/', body = {})
  http = EventMachine::HttpRequest.new(site).post :body => body
  http.errback { p 'request failed' }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response
  }
end

EM.run do
  # make a request every 1 second
  EM.add_periodic_timer(1) do
    make_request "http://foo.com/", :param => 'hi', :param2 => 'there'
  end
end


# look ma, no threads but I still get full async network concurrent IO.
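
And if you want a request outstanding at all times rather than one per
second, a variation on the above (same requires, untested sketch,
placeholder URLs) kicks off the next POST from inside the callback:

def poll(site)
  http = EventMachine::HttpRequest.new(site).post :body => { :param => 'hi' }
  http.callback { p http.response_header.status; poll(site) }          # re-post immediately
  http.errback  { p 'request failed'; EM.add_timer(1) { poll(site) } } # retry after a short pause
end

EM.run do
  poll "http://foo.com/"
  poll "http://foo.com/other"
end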

Cheers-

Ezra Zygmuntowicz
(e-mail address removed)
 

Eric Jensen

I'm not sure why Ruby doesn't provide the ability to send the request
without reading the response, but it's fairly trivial to split the
Net::HTTP#request method into two halves to do so, as per below:

require 'net/http'

module Net
  class HTTP < Protocol
    # pasted first half of HTTP#request that writes the request to the server,
    # does not return HTTPResponse and does not take a block
    def request_async(req, body = nil)
      if proxy_user()
        unless use_ssl?
          req.proxy_basic_auth proxy_user(), proxy_pass()
        end
      end

      req.set_body_internal body
      begin_transport req
      req.exec @socket, @curr_http_version, edit_path(req.path)
    end

    # second half of HTTP#request that yields or returns the response
    def read_response(req, body = nil, &block) # :yield: +response+
      begin
        res = HTTPResponse.read_new(@socket)
      end while res.kind_of?(HTTPContinue)
      res.reading_body(@socket, req.response_body_permitted?) {
        yield res if block_given?
      }
      end_transport req, res

      res
    end
  end
end

# Example usage for a non-blocking GET without following redirects:
http = Net::HTTP.new('www.google.com')
req = Net::HTTP::Get.new('/')
http.start
begin
  http.request_async(req)
  # do other stuff
  res = http.read_response(req)
ensure
  http.finish
end
res.value # raise if error
p res.body
 

Jaap Haagmans

Eric, it's great that you thought about this, as I'm currently stuck on
it.

However, your solution won't work. The call to http.start invokes
Net::HTTP#start, which can take quite a while to complete. In fact, it
can take much longer than the actual request in cases where the host
wasn't queried for some time (and thus isn't cached).
 

Eric Jensen

Yeah, obviously this doesn't parallelize the connect, just the request.
Unless you're doing SSL, the only blocking thing Net::HTTP#connect does
is the underlying TCPSocket.open. If that is your bottleneck and you've
already set open_timeout as low as you can go, then you'd have to patch
deeper to get Net::HTTP to use connect_nonblock, as per
http://www.ruby-doc.org/core/classes/Socket.html#M002091, instead of
TCPSocket.open.
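
The pattern looks roughly like this (a standalone sketch of
connect_nonblock, not a patch to Net::HTTP; host and port are just for
illustration):

require 'socket'

addr = Socket.pack_sockaddr_in(80, 'www.google.com')
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
begin
  sock.connect_nonblock(addr)
rescue Errno::EINPROGRESS
  # the connect is underway; do other work here, then wait for writability
  IO.select(nil, [sock], nil, 10) or raise "connect timed out"
  begin
    sock.connect_nonblock(addr)   # check the outcome of the connect
  rescue Errno::EISCONN
    # connected -- this is the success case
  end
end
sock.write("GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n")
p sock.readline
sock.close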
 
