Net::HTTP performance downloading large files

Discussion in 'Ruby' started by Chad Burt, Nov 2, 2006.

  1. Chad Burt

    Chad Burt Guest

    Hi folks,
    I'm working on my first ruby/rails project and have run into my first
    major problem.
    My application uses a web API for a scientific database, and I am trying
    to download a very large file (150 MB) using Net::HTTP.post_form. This
    operation is turning out to be very slow.

    Net::HTTP is at least 10x slower than using my browser to download this
    file on my local network. It also pegs one of my processors while doing
    it.

    It seems to have something to do with a "green thread" issue explained
    here:
    http://headius.blogspot.com/2006_06_01_archive.html#114996043877111235

    Are there any alternatives to using Net::HTTP to download files off the
    web with Ruby?


    ---------- Code in question -----------
    require 'net/http'
    require 'uri'
    require 'rexml/document'

    def Metacat.read(docid)
      # load URI set in environment.rb
      uri = URI.parse(Path_to_metacat)
      #uri.query = "action=read&qformat=xml&docid=#{docid}"
      response = Net::HTTP.post_form(uri, {
        'action'  => 'read',
        'qformat' => 'xml',
        'docid'   => docid
      })
      # this line will raise an exception if the POST failed
      response.value
      if(response.content_type == "text/xml")
        doc = REXML::Document.new(response.body)
        # check to see if Metacat is sending an error message or EML
        if(doc.root.name == 'error')
          nil
        else
          Eml.new(response.body)
        end
      elsif(response.content_type == "text/plain")
        DataTable.new(docid, response.body)
      end
    end
    --------------------------------------

    File I'm trying to download :
    http://data.piscoweb.org/catalog/me...=xml&docid=HMS001_020ADCP019R00_20060612.40.1

    --
    Posted via http://www.ruby-forum.com/.
    Chad Burt, Nov 2, 2006
    #1

  2. Craig Beck

    Craig Beck Guest

    How about just calling out to curl?
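    A minimal sketch of that suggestion, assuming curl is installed and on the PATH. Instead of reading the body back through a pipe, curl is told to write it straight to a file with --output, and Ruby's array form of system runs curl without a shell, so a docid needs no quoting. The method names (curl_post_args, curl_post_to_file) and the example URL are hypothetical; the form fields mirror the ones in Chad's Metacat.read.

```ruby
# Hypothetical helper: builds the argv for a curl form POST that writes the
# response body directly to out_path instead of returning it to Ruby.
def curl_post_args(url, params, out_path)
  args = ['curl', '--silent', '--show-error', '--fail', '--output', out_path]
  params.each do |name, value|
    # --data-urlencode "name=value" URL-encodes the value part and makes it a POST
    args << '--data-urlencode' << "#{name}=#{value}"
  end
  args << url # Array#<< returns the array itself
end

# Hypothetical wrapper: runs curl (no shell involved) and raises if it fails.
def curl_post_to_file(url, params, out_path)
  ok = system(*curl_post_args(url, params, out_path))
  raise "curl failed: #{$?.inspect}" unless ok
  out_path
end
```

    For example, curl_post_to_file('http://example.org/metacat', { 'action' => 'read', 'qformat' => 'xml', 'docid' => docid }, '/tmp/doc.xml') leaves the document on disk, where it can be opened as a File rather than held in memory as one large String.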

    --
    Craig Beck

    AIM: kreiggers
    Craig Beck, Nov 2, 2006
    #2

  3. Louis J Scoras

    Chad;

    This might be a stupid question, but it's always worth asking just in
    case =). Are you sure that the performance problem in the code above
    is in fetching the document? A 150 MB XML file can take a long time to
    parse into a REXML document.

    Actually, in a quick read of this:

    > if(response.content_type == "text/xml")
    > doc = REXML::Document.new(response.body)
    > #check to see if Metacat is sending an error message or EML
    > if(doc.root.name == 'error')
    > nil
    > else
    > Eml.new(response.body)
    > end
    > elsif(response.content_type == "text/plain")
    > DataTable.new(docid, response.body)
    > end
    > end


    It doesn't look like you're using 'doc' to do anything except check the
    root node. Meanwhile, REXML has to parse the entire document, because
    REXML::Document is a tree parser. You might want to give one of the
    streaming parsers a shot.
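    A minimal sketch of the streaming idea, assuming the body is already a String as in Chad's code. It uses REXML's pull parser and stops at the first start tag, so no tree is built for the remaining 150 MB. The helper name root_element_name is hypothetical; the pull parser is one of REXML's streaming options, not necessarily the one Lou had in mind.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: returns the root element's name by pulling parser
# events only until the first start tag, without building a document tree.
def root_element_name(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  while parser.has_next?
    event = parser.pull
    # for a :start_element event, event[0] is the element name
    return event[0] if event.start_element?
  end
  nil # the input contained no elements
end
```

    In Metacat.read, the REXML::Document.new call plus the doc.root.name check could then shrink to root_element_name(response.body) == 'error'. The body still has to be downloaded into memory, but only its first few tokens are parsed.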


    --
    Lou.
    Louis J Scoras, Nov 2, 2006
    #3
  4. Aaron Patterson

    On Thu, Nov 02, 2006 at 09:43:30AM +0900, Chad Burt wrote:
    > Hi folks,
    > I'm working on my first ruby/rails project and have run into my first
    > major problem.
    > My application uses a web api for a scientific database and I am trying
    > to download a very large file (150 MB) using Net::HTTP.post_form. This
    > operation is turning out to be very slow.
    >
    > Net::HTTP is at least 10x slower than using my browser to download this
    > file on my local network. It also pegs one of my processors while doing
    > it.
    >
    > It seems to have something to do with a "green thread" issue explained
    > here:
    > http://headius.blogspot.com/2006_06_01_archive.html#114996043877111235


    It probably has more to do with the buffer size used in Net::HTTP.
    Check out this thread:

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/202223

    Hope that helps!

    --
    Aaron Patterson
    http://tenderlovemaking.com/
    Aaron Patterson, Nov 2, 2006
    #4
  5. Chad Burt

    Chad Burt Guest


    > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/202223


    Thanks. Found this file: /usr/local/lib/ruby/1.8/net/protocol.rb
    and changed

    def rbuf_fill
      timeout(@read_timeout) {
        #changed by cburt
        @rbuf << @io.sysread(1024)
      }
    end

    to

    def rbuf_fill
      timeout(@read_timeout) {
        #changed by cburt
        @rbuf << @io.sysread(16384)
      }
    end

    Now downloading a 150 MB file takes 25 seconds, compared to 21 seconds
    for straight curl and 40-some seconds for curl called through popen.
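    Rough arithmetic helps explain the speedup: in the method above, each rbuf_fill call performs one sysread inside one timeout block, so the buffer size decides how many of those calls a download needs. The sketch below only counts sysread calls for two chunk sizes; count_reads is a hypothetical helper, and a StringIO stands in for the socket, so it says nothing about real network timing.

```ruby
require 'stringio'

# Hypothetical helper: counts how many sysread calls it takes to drain an IO
# when reading chunk_size bytes at a time.
def count_reads(io, chunk_size)
  reads = 0
  begin
    loop do
      io.sysread(chunk_size)
      reads += 1
    end
  rescue EOFError
    # sysread raises EOFError once the IO is exhausted
  end
  reads
end
```

    With one megabyte of data, count_reads gives 1024 calls at 1024 bytes and 64 calls at 16384 bytes. Taking 150 MB as 150 * 1024 * 1024 bytes, that is 153,600 reads versus 9,600, i.e. 16 times fewer timeout blocks for the same file.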

    The problem now is that I have a web API client that I was going to
    package into a ruby-gem that would have been easy to install. Now I have
    to tell people to start hacking the standard library if they want to use
    it. Uhhg!

    Chad Burt, Nov 14, 2006
    #5
  6. Chad Burt

    Chad Burt Guest

    Chad Burt, Nov 14, 2006
    #6
  7. Keith Fahlgren

    On 11/14/06, Chad Burt wrote:
    > Thanks. Found this file: /usr/local/lib/ruby/1.8/net/protocol.rb
    > and changed
    > ...
    > The problem now is that I have a web API client that I was going to
    > package into a ruby-gem that would have been easy to install. Now I have
    > to tell people to start hacking the standard library if they want to use
    > it. Uhhg!


    With Ruby's open classes, you shouldn't have to. At the top of your
    file/library/program, just open the class you'd like to modify, in
    this case the module Net and the class BufferedIO, and do what you
    want.

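    Something like:

    # load the stock Net::BufferedIO first, so this definition replaces it
    # rather than being overwritten by a later require
    require 'net/http'

    module Net
      class BufferedIO
        private # keep rbuf_fill private, as in the stock class

        def rbuf_fill
          timeout(@read_timeout) {
            #changed by cburt to a much larger buffer for speed
            @rbuf << @io.sysread(16384)
          }
        end
      end
    end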

    Note: Modifying the standard library is usually considered bad form,
    but if you know what you're doing and are explicit about it, it's
    usually OK.


    HTH,
    Keith
    Keith Fahlgren, Nov 14, 2006
    #7
  8. Chad Burt

    Chad Burt Guest

    Chad Burt, Nov 14, 2006
    #8
  9. Comfort Eagle, Nov 24, 2006
    #9

Similar Threads
  1. Replies:
    0
    Views:
    326
  2. mwt
    Replies:
    10
    Views:
    652
    Fuzzyman
    Feb 13, 2006
  3. Simon Clark
    Replies:
    0
    Views:
    124
    Simon Clark
    Feb 2, 2005
  4. Michael C. Gates

    Downloading Large Files

    Michael C. Gates, Jan 27, 2004, in forum: ASP General
    Replies:
    0
    Views:
    161
    Michael C. Gates
    Jan 27, 2004
  5. thehercman
    Replies:
    0
    Views:
    111
    thehercman
    Jan 20, 2006