cURL in ruby? Faster than Net::HTTP?

Ben Johnson

I've found a couple of packages that claim to integrate the curl library
into ruby. Which one is the standard library?

Also, the reason I'm asking is that I ran some tests and found that curl
is quite a bit faster than Net::HTTP. Is this true? Maybe my tests were
distorted, but curl seemed to be quite a bit faster at initializing the
connection and downloading.

Would it be smart of me to switch from Net::HTTP to curl? Because a
tenth of a second is precious in my application.

Thanks for your help.
 
Corey Jewett

I can't speak to the speed of any curl library, but I can cite my
recent experience building a crawler-like app. I'm using non-blocking
sockets and therefore can't utilize Net::HTTP and am hand-coding HTTP
directly. Under OS X I found a lot of latency (around 100ms) for both
IPSocket.getaddress() and Socket.sockaddr_in(). Under Linux packing
sockaddr seems to incur a negligible cost. Under OS X I pack the
sockaddr manually (yeah, it's gross). To mitigate the host lookup
cost I maintain a cache (a Hash) of host => IP. (At least under OS
X, even resolving localhost takes 100ms, even on repeat calls.)

The point being that I would assume that Net::HTTP inherits the costs
of these two calls. Which would explain at least some of the
connection slowness. As for download speed, I could only make
guesses, and they'd be pretty uneducated. I would suspect C has
better I/O performance than Ruby, so a native library would probably
be faster.

Corey
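
The host => IP cache described above could look roughly like the following. This is a minimal sketch, not Corey's actual code: the HostCache class name and the injectable resolver block are assumptions made for illustration, and the default resolver is simply IPSocket.getaddress, the call mentioned in the post.

```ruby
require 'socket'

# Minimal host => IP cache (hypothetical helper, not from the thread).
# The resolver is injectable; by default it calls IPSocket.getaddress.
class HostCache
  def initialize(&resolver)
    @resolver = resolver || lambda { |host| IPSocket.getaddress(host) }
    @cache = {}
    @lock = Mutex.new
  end

  # Return the cached address for host, resolving it on first use.
  # The lock is held during resolution, so lookups are serialized.
  def lookup(host)
    @lock.synchronize do
      @cache[host] ||= @resolver.call(host)
    end
  end
end
```

Entries never expire in this sketch, so a long-running crawler would probably want some form of TTL on top of it.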
 
Ezra Zygmuntowicz

Ben said:
I've found a couple of packages that claim to integrate the curl library
into ruby. Which one is the standard library?

Also the reason I am asking is because I did some tests and came to find
out that curl is quite a bit faster than the HTTP library. Is this true,
maybe my tests were distorted, but curl seemed to be quite a bit faster
in initializing the connection and downloading.

Would it be smart of me to switch from Net::HTTP to curl? Because a
tenth of a second is precious in my application.

Thanks for your help.

Hey Ben-

I haven't used the libcurl bindings myself so I can't comment on
those. But you may want to look at Zed's rfuzz project[1]. It is for
testing web apps but he also says that it is a faster replacement for
net/http. Since its HTTP parser is written in C, the same parser that
Mongrel uses, it should be faster than net/http.

Cheers-
-Ezra

[1] http://www.zedshaw.com/projects/rfuzz/
 
why the lucky stiff

Also the reason I am asking is because I did some tests and came to find
out that curl is quite a bit faster than the HTTP library. Is this true,
maybe my tests were distorted, but curl seemed to be quite a bit faster
in initializing the connection and downloading.

The cURL library is indeed very fast, but it also suffers from a problem that
Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
process. To overcome that, you'll need c-ares[1], which will probably also need
to be wrapped as an extension.

In my experience, Net::HTTP actually performs much better when you use Ruby's
non-blocking DNS resolver:

require 'resolv-replace'

I wrote a cURL extension and benchmarked it against Net::HTTP with
resolv-replace and wasn't completely impressed with the speed difference,
so I abandoned the extension.

_why

[1] http://daniel.haxx.se/projects/c-ares/
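
A comparison along the lines _why describes can be sketched with the standard Benchmark module. The helper names (average_seconds, fetch, compare_resolvers) are made up for this example, and the numbers it produces depend heavily on the network and on OS-level DNS caching, so treat it as a rough harness rather than a proper benchmark.

```ruby
require 'benchmark'
require 'net/http'
require 'uri'

# Average wall-clock seconds per call of the given block.
def average_seconds(runs)
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Fetch a URL once with Net::HTTP and return the response body.
def fetch(url)
  Net::HTTP.get_response(URI.parse(url)).body
end

# Time `runs` fetches with the default resolver, then load resolv-replace
# and time them again. Returns [before, after] in seconds per request.
def compare_resolvers(url, runs = 5)
  before = average_seconds(runs) { fetch(url) }
  require 'resolv-replace' # swaps in Ruby's own resolver from here on
  after = average_seconds(runs) { fetch(url) }
  [before, after]
end
```

Calling compare_resolvers with a URL you control returns the two averages. Because the second measurement runs after the first, it may benefit from warm caches, so it is worth repeating the test in the opposite order in a separate process.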
 
Ben Johnson

What do you mean when you say the DNS lookup is not asynchronous and will
block my process? If I were to call curl directly from the command line
using `curl` in Ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual-processor
system. Am I correct? I planned on just using curl directly from the
command line unless there is a downside to this.
 
snacktime

Ben said:
What do you mean when you say the DNS lookup is not asynchronous and will
block my process? If I were to call curl directly from the command line
using `curl` in Ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual-processor
system. Am I correct? I planned on just using curl directly from the
command line unless there is a downside to this.

From my understanding DNS lookups block in Ruby, as in they stop the
whole program until the name is resolved. I can't imagine that forking
another process would be more efficient than using net/http.
 
Corey Jewett

Ben said:
What do you mean when you say the DNS lookup is not asynchronous and will
block my process? If I were to call curl directly from the command line
using `curl` in Ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual-processor
system. Am I correct? I planned on just using curl directly from the
command line unless there is a downside to this.

No. Kernel.`` does start a subprocess, but it blocks your current
process until that subprocess returns, so by itself it gives you no
concurrency. See Kernel.fork and Process.detach.

There are also some gems that could probably help you out. Ara T.
Howard's slave[1] comes to mind.

Corey

[1] http://codeforpeople.com/lib/ruby/slave/
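
To illustrate the difference between a blocking backtick call and child processes that actually run side by side, here is a small sketch. It uses Process.spawn with a pipe per child rather than the Kernel.fork and Process.detach calls named above, and run_concurrently is a hypothetical helper, not something from the thread.

```ruby
# Start every command as its own child process before reading any output,
# so the children run at the same time. Each command is an argv array.
def run_concurrently(commands)
  children = commands.map do |argv|
    reader, writer = IO.pipe
    pid = Process.spawn(*argv, out: writer) # returns immediately
    writer.close                            # parent keeps only the read end
    [pid, reader]
  end

  # Collect output in the original order, then reap each child.
  children.map do |pid, reader|
    output = reader.read
    reader.close
    Process.wait(pid)
    output
  end
end
```

For the case discussed here, each argv could be something like ["curl", "-s", "--max-time", "10", url], which also gives every request a timeout. Whether the extra process startup cost pays off is exactly the open question in this thread, so it is worth measuring.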
 
Ben Johnson

snacktime said:
From my understanding dns lookups block in ruby, as in they stop the
whole program until the dns is resolved. I can't imagine that forking
another process would be more efficient than using net/http.

In my program each curl request would be in its own thread. I also think
forking a new process by using `` would be quicker, mainly because I
am doing this on a dual-processor server. Having everything run under
one process doesn't take advantage of that. Lastly, curl has a timeout
option, so if for some reason the request didn't respond it would
time out. I also noticed that running curl and Net::HTTP side by side,
curl wins hands down. There is even a hitch of about 0.5 to 1 second
right before the request is made in Net::HTTP.

Am I wrong here?

What I'm probably going to do is implement the curl functionality in my
program and post the speed differences for future reference, unless
someone tells me I'm going about this all wrong.

Thanks a lot for everyone's help.
 
daniel.haxx

why said:
The cURL library is indeed very fast, but it also suffers from a problem that
Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
process.

libcurl offers an asynchronous API that does the name resolving
asynchronously if you've built libcurl to do so.
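
One way to see whether a given curl build has asynchronous name resolving is the Features line of `curl --version`, which lists AsynchDNS when it is enabled. The asynch_dns? helper below is a hypothetical name used for illustration; it only parses text that you pass in.

```ruby
# True if the given `curl --version` output lists AsynchDNS among its features.
def asynch_dns?(version_output)
  features = version_output.lines.find { |line| line.start_with?("Features:") }
  return false unless features

  features.sub("Features:", "").split.include?("AsynchDNS")
end
```

Passing it the captured output of `curl --version` on the machine in question tells you whether that build resolves names asynchronously.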
 
why the lucky stiff

libcurl offers an asynchronous API that does the name resolving
asynchronously if you've built libcurl to do so.

Does it use the native getaddrinfo()? The problem I've had on FreeBSD
is that getaddrinfo() will block.

_why
 
David Vallner

Ben said:
What do you mean when you say the DNS lookup is not asynchronous and will
block my process? If I were to call curl directly from the command line
using `curl` in Ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual-processor
system. Am I correct? I planned on just using curl directly from the
command line unless there is a downside to this.

Odds are the process startup would take up more time than you'd gain.
IMO that's NOT a good way to leverage a dual-core processor. Doing hacks
like this only makes sense in a CPU-intensive application (which curl
hardly is), and you want to split the work between two (or maybe more)
*threads* more or less equally. You also want these threads being
managed in a thread pool to avoid OS thread initialisation time. For
added hilarity, you need native threads for this, not green threads -
the OS can't schedule those on different cores.

Technically, you could do this using processes instead of threads.
Except once again, you want to outweigh the process initialisation time,
and the time it takes to transfer data between the processes, with the
added performance eliminating context switches brings. Which just might
not be all that easy.

David Vallner
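
The thread pool idea can be sketched in a few lines of Ruby. This assumes a Ruby with native threads (1.9 or later), where threads do not run Ruby code in parallel but do overlap while waiting on I/O; pooled_map is a hypothetical helper name.

```ruby
require 'thread'

# Run the block over every item using a fixed number of worker threads.
# Results come back in the same order as the input items.
def pooled_map(items, workers = 4, &work)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  results = Array.new(items.size)

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          item, index = jobs.pop(true) # non-blocking pop, raises when empty
        rescue ThreadError
          break                        # queue drained, this worker is done
        end
        results[index] = work.call(item)
      end
    end
  end

  threads.each(&:join)
  results
end
```

A call such as pooled_map(urls, 8) { |url| ... } reuses the same eight threads for all requests instead of creating one thread per request.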
 
Ben Johnson

Thanks for your response.

I just implemented curl using `curl`. I would say I have about 60 to 100
simultaneous requests going out. With the switch from Net::HTTP to
`curl` I noticed a speed increase of almost 3 times. Either Net::HTTP is
slow or Ruby is slow, but something in Net::HTTP is slowing it down quite
a bit.
 
ara.t.howard

Thanks for your response.

I just implemented curl using `curl`. I would say I have about 60 to 100
simultaneous requests going out. With the switch from Net::HTTP to
`curl` I noticed a speed increase of almost 3 times. Either Net::HTTP is
slow or Ruby is slow, but something in Net::HTTP is slowing it down quite
a bit.

search for http reverse dns.

-a
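
One setting that commonly comes up under that search term is Ruby's reverse DNS lookup on sockets. It is an assumption that this is what was meant here, and recent Ruby versions already default to skipping the lookup as far as I know, but on older versions it can be switched off explicitly:

```ruby
require 'socket'

# Skip the reverse DNS lookup Ruby may perform when reporting peer addresses.
# Assumption: this is the "reverse dns" issue the post above points to.
BasicSocket.do_not_reverse_lookup = true
```

Set it once at program start, before any connections are opened.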
 
Ben Johnson

ara said:
search for http reverse dns.

-a

Can you be a little more specific? Also, what if I was to connect to the
server via the ip address and not the domain name? Would that speed
things up a bit?
 
Francis Cianfrocca

Does it use the native getaddrinfo()? The problem I've had on FreeBSD
is that getaddrinfo() will block.

_why

Does it matter whether it blocks or not? Ruby can't schedule its green
threads while you're inside a system-library call unless the call
knows about Ruby's scheduler (which it doesn't). Right?
 
daniel.haxx

Does it matter whether it blocks or not? Ruby can't schedule its green
threads while you're inside a system-library call unless the call
knows about Ruby's scheduler (which it doesn't). Right?

You _could_ read up on the libcurl details in the libcurl docs, but
then what fun would that be? Let's continue making assumptions like
this...

No, it is _not_ asynchronous inside a system call and it is _not_ using
the native getaddrinfo() for asynchronous name resolves.
 
Francis Cianfrocca

You _could_ read up on the libcurl details in the libcurl docs, but
then what fun would that be? Let's continue making assumptions like
this...

No, it is _not_ asynchronous inside a system call and it is _not_ using
the native getaddrinfo() for asynchronous name resolves.

You may have misunderstood me. Even if libcurl or anything else
resolves names "asynchronously" (which can mean more than one thing),
then does that make it faster on a per-resolution basis, or just more
concurrent? If the former, then I'll read your code to see how you did
it. If the latter, then doesn't a Ruby program need to be written in a
special way in order to benefit from the concurrency?
 
Francis Cianfrocca

Never mind, I figured it out. As I suspected, curl just wrote their
own protocol handler for DNS lookups. They fit it into their
event-driven architecture so name lookups can be happening
simultaneously with other work. I didn't see any caching or anything
similar, but maybe I didn't look hard enough. As with other approaches,
there's no magic speedup: you still have to write your program in such
a way as to capture the concurrency.
 
