cURL in ruby? Faster than Net::HTTP?

Discussion in 'Ruby' started by Ben Johnson, Aug 23, 2006.

  1. Ben Johnson

    Ben Johnson Guest

    I've found a couple of packages that claim to integrate the curl library
    into ruby. Which one is the standard library?

    The reason I ask is that I ran some tests and found that curl is quite
    a bit faster than the HTTP library, both in initializing the connection
    and in downloading. Is this true, or were my tests distorted?

    Would it be smart of me to switch from Net::HTTP to curl? A tenth of a
    second is precious in my application.

    Thanks for your help.

    --
    Posted via http://www.ruby-forum.com/.
     
    Ben Johnson, Aug 23, 2006
    #1

  2. Corey Jewett

    Corey Jewett Guest

    I can't speak to the speed of any curl library, but I can cite my
    recent experience building a crawler-like app. I'm using non-blocking
    sockets, so I can't use Net::HTTP and am hand-coding HTTP
    directly. Under OS X I found a lot of latency (around 100ms) for both
    IPSocket.getaddress() and Socket.sockaddr_in(). Under Linux, packing
    the sockaddr seems to incur a negligible cost. Under OS X I pack the
    sockaddr manually (yeah, it's gross). To mitigate the host lookup
    cost I maintain a cache (a Hash) of host => IP. (At least under OS
    X, even resolving localhost takes 100ms, even on repeat calls.)

    My point is that I would assume Net::HTTP inherits the costs of
    these two calls, which would explain at least some of the
    connection slowness. As for download speed, I could only make
    guesses, and they'd be pretty uneducated. I would suspect C has
    better I/O performance than Ruby, so a native library would probably
    be faster.
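
    The host => IP cache described above could look roughly like the
    following. This is only a sketch: the HostCache class and its resolver
    parameter are hypothetical names, not code from this thread, and entries
    never expire, so a long-running crawler would want some eviction policy.

```ruby
require 'socket'

# Hypothetical helper (not from the thread): memoizes host => IP lookups in a Hash.
class HostCache
  # resolver is any callable that takes a hostname and returns an IP string.
  def initialize(resolver = IPSocket.method(:getaddress))
    @resolver = resolver
    @cache = {}
    @lock = Mutex.new
  end

  # Returns the cached IP for host, calling the resolver only on the first lookup.
  def lookup(host)
    @lock.synchronize { @cache[host] ||= @resolver.call(host) }
  end
end

# Usage:
#   cache = HostCache.new
#   cache.lookup('localhost')   # resolved once
#   cache.lookup('localhost')   # served from the Hash
```

    Holding the lock during the lookup keeps the sketch simple, but it also
    serializes lookups across threads.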

    Corey


    On Aug 22, 2006, at 8:17 PM, Ben Johnson wrote:

    > I've found a couple of packages that claim to integrate the curl
    > library
    > into ruby. Which one is the standard library?
    >
    > Also the reason I am asking is because I did some tests and came to
    > find
    > out that curl is quite a bit faster than the HTTP library. Is this
    > true,
    > maybe my tests were distorted, but curl seemed to be quite a bit
    > faster
    > in initializing the connection and downloading.
    >
    > Would it be smart of me to switch from Net::HTTP to curl? Because a
    > tenth of a second is precious in my application.
    >
    > Thanks for your help.
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
     
    Corey Jewett, Aug 23, 2006
    #2

  3. On Aug 22, 2006, at 8:17 PM, Ben Johnson wrote:

    > I've found a couple of packages that claim to integrate the curl
    > library
    > into ruby. Which one is the standard library?
    >
    > Also the reason I am asking is because I did some tests and came to
    > find
    > out that curl is quite a bit faster than the HTTP library. Is this
    > true,
    > maybe my tests were distorted, but curl seemed to be quite a bit
    > faster
    > in initializing the connection and downloading.
    >
    > Would it be smart of me to switch from Net::HTTP to curl? Because a
    > tenth of a second is precious in my application.
    >
    > Thanks for your help.
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >


    Hey Ben-

    I haven't used the libcurl bindings myself so I can't comment on
    those. But you may want to look at Zed's rfuzz project[1]. It is for
    testing web apps, but he also says that it is a faster replacement for
    net/http. Since its HTTP parser is written in C, using the same parser
    that Mongrel does, it should be faster than net/http.

    Cheers-
    -Ezra

    [1] http://www.zedshaw.com/projects/rfuzz/
     
    Ezra Zygmuntowicz, Aug 23, 2006
    #3
  4. On Wed, Aug 23, 2006 at 12:17:42PM +0900, Ben Johnson wrote:
    > Also the reason I am asking is because I did some tests and came to find
    > out that curl is quite a bit faster than the HTTP library. Is this true,
    > maybe my tests were distorted, but curl seemed to be quite a bit faster
    > in initializing the connection and downloading.


    The cURL library is indeed very fast, but it also suffers from a problem that
    Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
    process. To overcome that, you'll need c-ares[1], which will probably also need
    to be wrapped as an extension.

    In my experience, Net::HTTP actually performs much better when you use Ruby's
    non-blocking DNS resolver:

    require 'resolv-replace'

    I wrote a cURL extension and benchmarked it against Net::HTTP with
    resolv-replace and wasn't completely impressed with the speed difference,
    so I abandoned the extension.
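
    A comparison like the one described above could be timed roughly as
    follows. This is a sketch under assumptions: timed_get is a hypothetical
    helper name and the URL in the usage comment is a placeholder. It only
    measures wall-clock time for a single GET with resolv-replace loaded.

```ruby
require 'resolv-replace'  # makes the socket classes resolve names via Ruby's Resolv library
require 'net/http'
require 'benchmark'
require 'uri'

# Hypothetical helper: performs one GET and returns [elapsed_seconds, body].
def timed_get(url)
  uri = URI(url)
  body = nil
  seconds = Benchmark.realtime do
    # The third argument (nil) skips any proxy configured in the environment.
    Net::HTTP.start(uri.host, uri.port, nil) do |http|
      body = http.get(uri.request_uri).body
    end
  end
  [seconds, body]
end

# Usage (example.com is a placeholder):
#   seconds, body = timed_get('http://example.com/')
#   puts format('%.3f s, %d bytes', seconds, body.bytesize)
```

    Running the same helper with and without the first require, several
    times each, gives a rough idea of how much of the delay is name
    resolution.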

    _why

    [1] http://daniel.haxx.se/projects/c-ares/
     
    why the lucky stiff, Aug 23, 2006
    #4
  5. Ben Johnson

    Ben Johnson Guest

    why the lucky stiff wrote:
    > On Wed, Aug 23, 2006 at 12:17:42PM +0900, Ben Johnson wrote:
    >> Also the reason I am asking is because I did some tests and came to find
    >> out that curl is quite a bit faster than the HTTP library. Is this true,
    >> maybe my tests were distorted, but curl seemed to be quite a bit faster
    >> in initializing the connection and downloading.

    >
    > The cURL library is indeed very fast, but it also suffers from a problem
    > that
    > Net::HTTP suffers from: its DNS lookup is not asynchronous and will
    > block your
    > process. To overcome that, you'll need c-ares[1], which will probably
    > also need
    > to be wrapped as an extension.
    >
    > In my experience, Net::HTTP actually performs much better when you use
    > Ruby's
    > non-blocking DNS resolver:
    >
    > require 'resolv-replace'
    >
    > I wrote a cURL extension and benchmarked it against Net::HTTP with
    > resolv-replace and wasn't completely impressed with the speed
    > difference,
    > so I abandoned the extension.
    >
    > _why
    >
    > [1] http://daniel.haxx.se/projects/c-ares/


    What do you mean by "the DNS lookup is not asynchronous and will block
    my process"? If I were to call curl directly from the command line using
    `curl` in Ruby, wouldn't that be much faster? In that case it would
    get its own process and take better advantage of a dual-processor
    system. Am I correct? What I planned on doing was just using
    curl directly from the command line, unless there is a downside to this.

    --
    Posted via http://www.ruby-forum.com/.
     
    Ben Johnson, Aug 23, 2006
    #5
  6. snacktime

    snacktime Guest

    >
    > What do you mean by the DNS lookup is asynchronous and will block my
    > process? If I was to call curl directly from the command line using
    > `curl` in ruby wouldn't that be much faster. In this instance it would
    > get its own process and take better advantage of a dual processor
    > system. Am I correct, because what I planned on doing was just using
    > curl directly from the command line unless there is a downside to this.
    >


    From my understanding, DNS lookups block in Ruby: they stop the
    whole program until the name is resolved. I can't imagine that forking
    another process would be more efficient than using net/http.
     
    snacktime, Aug 23, 2006
    #6
  7. Corey Jewett

    Corey Jewett Guest

    On Aug 22, 2006, at 9:21 PM, Ben Johnson wrote:

    > why the lucky stiff wrote:
    >> On Wed, Aug 23, 2006 at 12:17:42PM +0900, Ben Johnson wrote:
    >>> Also the reason I am asking is because I did some tests and came
    >>> to find
    >>> out that curl is quite a bit faster than the HTTP library. Is
    >>> this true,
    >>> maybe my tests were distorted, but curl seemed to be quite a bit
    >>> faster
    >>> in initializing the connection and downloading.

    >>
    >> The cURL library is indeed very fast, but it also suffers from a
    >> problem
    >> that
    >> Net::HTTP suffers from: its DNS lookup is not asynchronous and will
    >> block your
    >> process. To overcome that, you'll need c-ares[1], which will
    >> probably
    >> also need
    >> to be wrapped as an extension.
    >>
    >> In my experience, Net::HTTP actually performs much better when you
    >> use
    >> Ruby's
    >> non-blocking DNS resolver:
    >>
    >> require 'resolv-replace'
    >>
    >> I wrote a cURL extension and benchmarked it against Net::HTTP with
    >> resolv-replace and wasn't completely impressed with the speed
    >> difference,
    >> so I abandoned the extension.
    >>
    >> _why
    >>
    >> [1] http://daniel.haxx.se/projects/c-ares/

    >
    > What do you mean by the DNS lookup is asynchronous and will block my
    > process? If I was to call curl directly from the command line using
    > `curl` in ruby wouldn't that be much faster. In this instance it
    > would
    > get its own process and take better advantage of a dual processor
    > system. Am I correct, because what I planned on doing was just using
    > curl directly from the command line unless there is a downside to
    > this.


    No, Kernel.`` doesn't give you a background process. It blocks your
    current process and waits for the subprocess to return. See Kernel.fork
    and Process.detach.
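
    The difference can be sketched like this. run_detached is a hypothetical
    helper name, the sleep command is only a stand-in for a longer-running
    curl call, and fork is not available on every platform (Windows, for
    example).

```ruby
# Backticks run a subprocess, but the caller waits for it and gets its stdout.
output = `echo hello`

# Hypothetical helper: starts a command without waiting for it to finish.
# Returns the thread created by Process.detach; its value is the child's Process::Status.
def run_detached(*command)
  pid = fork { exec(*command) }
  Process.detach(pid)
end

watcher = run_detached('sleep', '1')  # stand-in for a longer-running command such as curl
# ... the parent is free to do other work here ...
status = watcher.value                # waits only when the result is actually needed
```

    Note that this version does not capture the child's output; that would
    need a pipe or something like IO.popen.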

    Also, there are some gems that could probably help you out. Ara T.
    Howard's slave[1] comes to mind.

    Corey

    1. http://codeforpeople.com/lib/ruby/slave/
     
    Corey Jewett, Aug 23, 2006
    #7
  8. Ben Johnson

    Ben Johnson Guest

    snacktime wrote:
    >>
    >> What do you mean by the DNS lookup is asynchronous and will block my
    >> process? If I was to call curl directly from the command line using
    >> `curl` in ruby wouldn't that be much faster. In this instance it would
    >> get its own process and take better advantage of a dual processor
    >> system. Am I correct, because what I planned on doing was just using
    >> curl directly from the command line unless there is a downside to this.
    >>

    >
    > From my understanding dns lookups block in ruby, as in they stop the
    > whole program until the dns is resolved. I can't imagine that forking
    > another process would be more efficient than using net/http.


    In my program each curl request would be in its own thread. I also think
    that forking a new process by using `` would be quicker, mainly because I
    am doing this on a dual-processor server, and having everything run under
    one process doesn't take advantage of that. Lastly, curl has a timeout
    option, so if for some reason the request didn't respond it would
    time out. I also noticed that running curl and Net::HTTP side by side,
    curl wins hands down. There is even a hitch of about 0.5 to 1 second
    right before the request is made in Net::HTTP.
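
    One way to sketch the thread-per-request approach described above, with
    a timeout on each curl call. run_concurrently is a hypothetical helper
    name and the URLs in the usage comment are placeholders; curl's
    --max-time option caps each transfer in seconds.

```ruby
require 'open3'

# Hypothetical helper: runs each command (an argv array) in its own thread
# and returns the captured stdout strings in the same order as the input.
def run_concurrently(commands)
  threads = commands.map do |argv|
    Thread.new do
      stdout, _status = Open3.capture2(*argv)
      stdout
    end
  end
  threads.map(&:value)
end

# Usage with curl (placeholder URLs):
#   urls = ['http://example.com/a', 'http://example.com/b']
#   bodies = run_concurrently(urls.map { |u| ['curl', '--silent', '--max-time', '5', u] })
```

    Each command still costs one process startup, which is the trade-off
    being debated in this thread.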

    Am I wrong here?

    What I'm probably going to do is implement the curl functionality in my
    program and post the speed differences for future reference, unless
    someone tells me I'm going about this all wrong.

    Thanks a lot for everyone's help.

    --
    Posted via http://www.ruby-forum.com/.
     
    Ben Johnson, Aug 23, 2006
    #8
  9. Ben Johnson

    Guest

    why the lucky stiff wrote:

    > The cURL library is indeed very fast, but it also suffers from a problem that
    > Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
    > process.


    libcurl offers an asynchronous API that does the name resolving
    asynchronously if you've built libcurl to do so.
     
    , Aug 24, 2006
    #9
  10. On Thu, Aug 24, 2006 at 04:15:02PM +0900, wrote:
    > libcurl offers an asynchronous API that does the name resolving
    > asynchronously if you've built libcurl to do so.


    Does it use the native getaddrinfo()? The problem I've had on FreeBSD
    is that getaddrinfo() will block.

    _why
     
    why the lucky stiff, Aug 24, 2006
    #10
  11. Re: cURL in ruby? Faster than Net::HTTP? [OT]

    Ben Johnson wrote:
    > What do you mean by the DNS lookup is asynchronous and will block my
    > process? If I was to call curl directly from the command line using
    > `curl` in ruby wouldn't that be much faster. In this instance it would
    > get its own process and take better advantage of a dual processor
    > system. Am I correct, because what I planned on doing was just using
    > curl directly from the command line unless there is a downside to this.


    Odds are the process startup would take up more time than you'd gain.
    IMO that's NOT a good way to leverage a dual-core processor. Doing hacks
    like this only makes sense in a CPU-intensive application (which curl
    hardly is), and you want to split the work between two (or maybe more)
    *threads* more or less equally. You also want these threads being
    managed in a thread pool to avoid OS thread initialisation time. For
    added hilarity, you need native threads for this, not green threads -
    the OS can't schedule those on different cores.

    Technically, you could do this using processes instead of threads.
    Except once again, you want to outweigh the process initialisation time,
    and the time it takes to transfer data between the processes, with the
    added performance eliminating context switches brings. Which just might
    not be all that easy.
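
    A fixed-size thread pool of the kind mentioned above can be sketched
    with Ruby's Queue. process_with_pool is a hypothetical name, and this
    is an illustration only: on the standard Ruby interpreter the threads
    overlap while waiting on I/O, but Ruby code itself does not run in
    parallel across cores.

```ruby
# Hypothetical sketch: a fixed number of worker threads pull jobs from a Queue.
# Returns the block's result for each job, in the original job order.
def process_with_pool(jobs, workers: 4, &work)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [index, job] }
  workers.times { queue << :done }   # one stop marker per worker

  results = Array.new(jobs.size)
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop) != :done
        index, job = item
        results[index] = work.call(job)
      end
    end
  end
  threads.each(&:join)
  results
end

# Usage:
#   lengths = process_with_pool(%w[a bb ccc], workers: 2) { |s| s.length }
```

    The threads are created once and reused for every job, which is what
    avoids paying thread start-up cost per request.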

    David Vallner
     
    David Vallner, Aug 24, 2006
    #11
  12. Ben Johnson

    Ben Johnson Guest

    David Vallner wrote:
    > Ben Johnson wrote:
    >> What do you mean by the DNS lookup is asynchronous and will block my
    >> process? If I was to call curl directly from the command line using
    >> `curl` in ruby wouldn't that be much faster. In this instance it would
    >> get its own process and take better advantage of a dual processor
    >> system. Am I correct, because what I planned on doing was just using
    >> curl directly from the command line unless there is a downside to this.

    >
    > Odds are the process startup would take up more time than you'd gain.
    > IMO that's NOT a good way to leverage a dual-core processor. Doing hacks
    > like this only makes sense in a CPU-intensive application (which curl
    > hardly is), and you want to split the work between two (or maybe more)
    > *threads* more or less equally. You also want these threads being
    > managed in a thread pool to avoid OS thread initialisation time. For
    > added hilarity, you need native threads for this, not green threads -
    > the OS can't schedule those on different cores.
    >
    > Technically, you could do this using processes instead of threads.
    > Except once again, you want to outweigh the process initialisation time,
    > and the time it takes to transfer data between the processes, with the
    > added performance eliminating context switches brings. Which just might
    > not be all that easy.
    >
    > David Vallner


    Thanks for your response.

    I just implemented curl using `curl`. I would say I have about 60 - 100
    simultaneous requests going out. With the switch
    from Net::HTTP to `curl` I noticed a speed increase of almost 3
    times. Either Net::HTTP is slow or Ruby is slow, but something in
    Net::HTTP is slowing it down quite a bit.

    --
    Posted via http://www.ruby-forum.com/.
     
    Ben Johnson, Aug 24, 2006
    #12
  13. Ben Johnson

    Guest

    On Fri, 25 Aug 2006, Ben Johnson wrote:

    >
    > Thanks for your response.
    >
    > I just implemented curl using `curl`. I would say I have about 60 - 100
    > simultaneous requests going out at the same time. With the switch
    > between Net::HTTP to using `curl` I noticed a speed increase of almost 3
    > times. Either Net::HTTP is slow or ruby is slow, but something in
    > Net::HTTP is slowing it down quite a bit.


    search for http reverse dns.

    -a
    --
    to foster inner awareness, introspection, and reasoning is more efficient than
    meditation and prayer.
    - h.h. the 14th dalai lama
     
    , Aug 24, 2006
    #13
  14. Ben Johnson

    Ben Johnson Guest

    unknown wrote:
    > On Fri, 25 Aug 2006, Ben Johnson wrote:
    >
    >>
    >> Thanks for your response.
    >>
    >> I just implemented curl using `curl`. I would say I have about 60 - 100
    >> simultaneous requests going out at the same time. With the switch
    >> between Net::HTTP to using `curl` I noticed a speed increase of almost 3
    >> times. Either Net::HTTP is slow or ruby is slow, but something in
    >> Net::HTTP is slowing it down quite a bit.

    >
    > search for http reverse dns.
    >
    > -a


    Can you be a little more specific? Also, what if I was to connect to the
    server via the ip address and not the domain name? Would that speed
    things up a bit?
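
    For anyone wanting to try the IP-address idea, here is a minimal
    sketch. get_via_ip is a hypothetical helper name, and it assumes a Ruby
    version recent enough to have Net::HTTP#ipaddr=. The host name is still
    sent in the Host header, which name-based virtual hosts need. Whether it
    actually removes the delay discussed here is not established in this
    thread.

```ruby
require 'net/http'

# Hypothetical helper: GETs path from host, connecting to an already-resolved IP.
def get_via_ip(host, ip, path = '/', port = 80)
  http = Net::HTTP.new(host, port, nil)  # nil: ignore proxy settings from the environment
  http.ipaddr = ip                       # connect to this address instead of resolving host
  http.start { |conn| conn.get(path) }
end

# Usage (example.com is a placeholder):
#   require 'resolv'
#   ip = Resolv.getaddress('example.com')   # resolve once, reuse for later requests
#   response = get_via_ip('example.com', ip)
#   puts response.code
```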

    --
    Posted via http://www.ruby-forum.com/.
     
    Ben Johnson, Aug 24, 2006
    #14
  15. On 8/24/06, why the lucky stiff <> wrote:
    > On Thu, Aug 24, 2006 at 04:15:02PM +0900, wrote:
    > > libcurl offers an asynchronous API that does the name resolving
    > > asynchronously if you've built libcurl to do so.

    >
    > Does it use the native getaddrinfo()? The problem I've had on FreeBSD
    > is that getaddrinfo() will block.
    >
    > _why
    >
    >


    Does it matter whether it blocks or not? Ruby can't schedule its green
    threads while you're inside a system-library call unless the call
    knows about Ruby's scheduler (which it doesn't). Right?
     
    Francis Cianfrocca, Aug 24, 2006
    #15
  16. Ben Johnson

    Guest

    Francis Cianfrocca wrote:

    > > > libcurl offers an asynchronous API that does the name resolving
    > > > asynchronously if you've built libcurl to do so.

    > >
    > > Does it use the native getaddrinfo()? The problem I've had on FreeBSD
    > > is that getaddrinfo() will block.


    > Does it matter whether it blocks or not? Ruby can't schedule its green
    > threads while you're inside a system-library call unless the call
    > knows about Ruby's scheduler (which it doesn't). Right?


    You _could_ read up on the libcurl details in the libcurl docs, but
    then what fun would that be? Let's continue making assumptions like
    this...

    No, it is _not_ asynchronous inside a system call and it is _not_ using
    the native getaddrinfo() for asynchronous name resolves.
     
    , Aug 25, 2006
    #16
  17. On 8/25/06, <> wrote:
    > Francis Cianfrocca wrote:
    >
    > > > > libcurl offers an asynchronous API that does the name resolving
    > > > > asynchronously if you've built libcurl to do so.
    > > >
    > > > Does it use the native getaddrinfo()? The problem I've had on FreeBSD
    > > > is that getaddrinfo() will block.

    >
    > > Does it matter whether it blocks or not? Ruby can't schedule its green
    > > threads while you're inside a system-library call unless the call
    > > knows about Ruby's scheduler (which it doesn't). Right?

    >
    > You _could_ read up on the libcurl details in the libcurl docs, but
    > then what fun would that be? Let's continue making assumptions like
    > this...
    >
    > No, it is _not_ asynchronous inside a system call and it is _not_ using
    > the native getaddrinfo() for asynchronous name resolves.
    >
    >
    >


    You may have misunderstood me. Even if libcurl or anything else
    resolves names "asynchronously" (which can mean more than one thing),
    then does that make it faster on a per-resolution basis, or just more
    concurrent? If the former, then I'll read your code to see how you did
    it. If the latter, then doesn't a Ruby program need to be written in a
    special way in order to benefit from the concurrency?
     
    Francis Cianfrocca, Aug 25, 2006
    #17
  18. On 8/25/06, <> wrote:
    > You _could_ read up on the libcurl details in the libcurl docs, but
    > then what fun would that be? Let's continue making assumptions like
    > this...
    >
    > No, it is _not_ asynchronous inside a system call and it is _not_ using
    > the native getaddrinfo() for asynchronous name resolves.
    >



    Never mind, I figured it out. As I suspected, curl just wrote their
    own protocol handler for DNS lookups. They fit it into their
    event-driven architecture so name lookups can happen
    simultaneously with other work. I didn't see any caching or anything
    similar, but maybe I didn't look hard enough. As with other approaches,
    there's no magic speedup: you still have to write your program in such
    a way as to capture the concurrency.
     
    Francis Cianfrocca, Aug 25, 2006
    #18