Efficient file downloading

Discussion in 'Ruby' started by Kyle Hunter, Feb 22, 2008.

  1. Kyle Hunter

    Kyle Hunter Guest

    Hello,

    I'm using open-uri to download files through a buffer. It seems very
    inefficient in terms of resource usage (CPU usage is ~10-20%).

    If possible, I'd like some suggestions for downloading a file so that
    the output file is named after the URL, and nothing is actually written
    if the request comes back as a 404 (or some other exception hits).

    Current code:
    require 'open-uri'

    BUFFER_SIZE = 4096

    def download(url)
      # On Ruby 3.0+, use URI.open(url); Kernel#open no longer accepts URLs.
      from = open(url)
      if (buffer = from.read(BUFFER_SIZE))
        puts "Downloading #{url}"
        File.open(url.split('/').last, 'wb') do |file|
          begin
            file.write(buffer)
          end while (buffer = from.read(BUFFER_SIZE))
        end
      end
    end
    --
    Posted via http://www.ruby-forum.com/.
    Kyle Hunter, Feb 22, 2008
    #1

  2. Kyle Hunter

    Kyle Hunter Guest

    To clarify, I mean the file-name should be the same as it is on the web,
    not the same as the URL.
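
One way to get the name "as it is on the web" is to take the last segment of the URL's path rather than splitting the whole URL string. The sketch below is only illustrative: `filename_for` and the `index.html` fallback are hypothetical names chosen for this example, and it does not look at any `Content-Disposition` header the server might send.

```ruby
require 'uri'

# Derive a local file name from the last path segment of a URL.
# `filename_for` and the 'index.html' fallback are assumptions for this sketch.
def filename_for(url, fallback = 'index.html')
  path = URI.parse(url).path.to_s
  name = File.basename(path)
  # File.basename('/') returns '/', and an empty path gives ''.
  name.empty? || name == '/' ? fallback : name
end
```

Unlike `url.split('/').last`, this drops any query string, so `http://example.com/files/report.pdf?x=1` maps to `report.pdf` rather than `report.pdf?x=1`.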
    Kyle Hunter, Feb 22, 2008
    #2

  3. James Tucker

    James Tucker Guest

    On 22 Feb 2008, at 01:54, Kyle Hunter wrote:

    > Hello,
    >
    > I'm using open-uri to download files using a buffer. It seems very
    > inefficient in terms of resource usage (CPU is ~10-20% in usage).
    >
    > If possible, I'd like some suggestions for downloading a file which
    > names the outputted file the same as the URL, and does not actually
    > write if the file comes out to a 404 (or some other exception hits).
    >
    > Current code:
    > BUFFER_SIZE=4096


    Try making that a lot lot bigger.

    >
    > def download(url)
    > from = open(url)
    > if (buffer = from.read(BUFFER_SIZE))
    > puts "Downloading #{url}"
    > File.open(url.split('/').last, 'wb') do |file|
    > begin
    > file.write(buffer)
    > end while (buffer = from.read(BUFFER_SIZE))
    > end
    > end
    > end
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    James Tucker, Feb 22, 2008
    #3
  4. Kyle Hunter

    Kyle Hunter Guest

    James Tucker wrote:
    > On 22 Feb 2008, at 01:54, Kyle Hunter wrote:
    >
    >> BUFFER_SIZE=4096

    > Try making that a lot lot bigger.


    Doh! Thanks James. Brings it down to much more reasonable usage. I
    totally overlooked that very small buffer size that was set - thanks.
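
For reference, a chunked copy with a larger buffer might look like the sketch below. The 1 MiB size is an illustrative value, not a figure from this thread, and `copy_in_chunks` is a hypothetical helper that accepts any IO-like object responding to `read`.

```ruby
BUFFER_SIZE = 1024 * 1024 # 1 MiB; an assumed value, tune for your workload

# Copy everything from `source` (any IO-like object) into the file at `path`,
# reading up to BUFFER_SIZE bytes per call. Returns the number of bytes written.
def copy_in_chunks(source, path)
  written = 0
  File.open(path, 'wb') do |file|
    # read(n) returns nil at end of input, which ends the loop.
    while (buffer = source.read(BUFFER_SIZE))
      written += file.write(buffer)
    end
  end
  written
end
```

Fewer, larger reads mean fewer trips through the interpreter per megabyte copied. The standard library's `IO.copy_stream(source, path)` performs the same kind of copy without a hand-written loop.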
    Kyle Hunter, Feb 22, 2008
    #4
  5. fedzor

    fedzor Guest

    On Feb 21, 2008, at 8:54 PM, Kyle Hunter wrote:

    > Hello,
    >
    > I'm using open-uri to download files using a buffer. It seems very
    > inefficient in terms of resource usage (CPU is ~10-20% in usage).
    >
    > If possible, I'd like some suggestions for downloading a file which
    > names the outputted file the same as the URL, and does not actually
    > write if the file comes out to a 404 (or some other exception hits).
    >
    > Current code:
    > BUFFER_SIZE=4096
    > def download(url)
    > from = open(url)
    > if (buffer = from.read(BUFFER_SIZE))
    > puts "Downloading #{url}"
    > File.open(url.split('/').last, 'wb') do |file|
    > begin
    > file.write(buffer)
    > end while (buffer = from.read(BUFFER_SIZE))
    > end
    > end
    > end


    $ sudo gem install snoopy
    $ snoopy http://en.wikipedia.org/wiki/Main_Page
    => file Main_Page

    Ta dah! There's a lot of magic behind it right now, and torrents
    don't work yet (fixed on my machine, need to release it). It does
    segmented downloading, ideal for large files. For smaller ones, it
    still works fine.

    The problem with open-uri is this: it downloads the whole thing to
    your tmp directory first, so using the BUFFER_SIZE thing won't
    actually help.
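
If the goal is to write chunks to disk as they arrive, and to write nothing on a 404, one option is to use `Net::HTTP` directly instead of open-uri. This is a minimal sketch under stated assumptions, not the approach anyone in this thread used: `stream_download` and the `.part` suffix are hypothetical names, and redirects are not followed.

```ruby
require 'net/http'
require 'uri'

# Stream the body of `url` to `path` chunk by chunk.
# Returns true on a 2xx response; on any other status nothing is written.
# `stream_download` and the '.part' suffix are assumptions for this sketch.
def stream_download(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      return false unless response.is_a?(Net::HTTPSuccess)

      part = "#{path}.part"
      begin
        File.open(part, 'wb') do |file|
          # read_body with a block yields the body in pieces as they arrive.
          response.read_body { |chunk| file.write(chunk) }
        end
        File.rename(part, path) # only a complete download gets the final name
      ensure
        File.delete(part) if File.exist?(part) # clean up after a failed transfer
      end
    end
  end
  true
end
```

Writing to a temporary `.part` file and renaming at the end means an exception halfway through the transfer does not leave a truncated file under the final name.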

    snoopy won't not write the file if there's an error.

    -------------------------------------------------------|
    ~ Ari
    Some people want love
    Others want money
    Me... Well...
    I just want this code to compile
    fedzor, Feb 22, 2008
    #5
