*Could use some help here* Ruby script that downloads .png files that are ordered in sequence and sav

  • Thread starter patrick.anthony124

patrick.anthony124

Hey everyone. This is the first time I've written something in Ruby for myself; everything else has somehow been part of an assignment, a tutorial, or a walk-through. I want this script to download a series of .png files and save them locally in the same order. I have posted it below, but it just doesn't seem to work. Any help or suggestions would be greatly appreciated.

require "net/http"
remote_base_url = "https://path.to/the/folder"

start_page = 001
end_page = 281

# Images are named p001.png to p281.png.
(start_page..end_page).each do |it|
rpage = open(remote_base_url + "/" + "p" + it.to_s)

local_fname = "copy-of-" + it.to_s + ".png"
local_file = open(local_fname, "w")
local_file.write(rpage.read)
local_file.close
# Optional output line:
puts "Wrote file " + local_fname
sleep 1
end

# Write to the compiled file now:
compiled_file = open(start_page.to_s + "-" + end_page.to_s + ".png", "w")
(start_page..end_page).each do |it|
local_fname = "copy-of-" + it.to_s + ".png"
local_file = open(local_fname, "r")

compiled_file.write(local_file.read)
local_file.close
end

compiled_file.close
 
Robert Klemme

Hey everyone. This is the first time I've written something in Ruby for myself; everything else has somehow been part of an assignment, a tutorial, or a walk-through. I want this script to download a series of .png files and save them locally in the same order. I have posted it below, but it just doesn't seem to work. Any help or suggestions would be greatly appreciated.

require "net/http"
remote_base_url = "https://path.to/the/folder"

start_page = 001
end_page = 281

The reason is probably that you use integers here. Note:

irb(main):001:0> x = 001
=> 1
irb(main):002:0> puts x
1
=> nil

Leading zeros are removed.
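
A side note that may help here: in Ruby, a leading 0 on an integer literal marks it as octal, so the zeros are not simply dropped. 001 happens to equal 1, but a literal such as 009 would not even parse. A minimal sketch (the method names are made up for illustration):

```ruby
# An integer literal with a leading 0 is read as octal.
def octal_literal_examples
  [001, 010, 0777]
end

# Zero-padding belongs in the string, not in the number.
def pad_page(number)
  format("%03d", number)
end

p octal_literal_examples
puts pad_page(1)
```

Keeping the page number as a plain integer and padding it only when a string is built avoids the octal pitfall entirely.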

If you use that (and I would recommend using integers here) you must
ensure the zeros are added when creating URLs:
# Images are named p001.png to p281.png.
(start_page..end_page).each do |it|
rpage = open(remote_base_url + "/" + "p" + it.to_s)

# example:
rpage = open(sprintf("%s/p%03d.png", remote_base_url, it))
local_fname = "copy-of-" + it.to_s + ".png"

# other approach
local_fname = "copy-of-%03d.png" % it
local_file = open(local_fname, "w")
local_file.write(rpage.read)
local_file.close
# Optional output line:
puts "Wrote file " + local_fname
sleep 1

Why the sleep?
end

# Write to the compiled file now:
compiled_file = open(start_page.to_s + "-" + end_page.to_s + ".png", "w")
(start_page..end_page).each do |it|
local_fname = "copy-of-" + it.to_s + ".png"
local_file = open(local_fname, "r")

compiled_file.write(local_file.read)
local_file.close
end

compiled_file.close

And a general remark: you should use the block form of #open instead of
the explicit #close call. This is much more robust because it ensures
files are always closed. See my blog post for more details:

http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html
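
As a rough illustration of the block form, here is a small sketch. The helper names save_bytes and load_bytes are hypothetical, and the "wb"/"rb" binary modes are an assumption that seems sensible for image data:

```ruby
require "tmpdir"

# Writes binary data with the block form of File.open; the file is closed
# when the block exits, even if the write raises.
def save_bytes(path, data)
  File.open(path, "wb") { |file| file.write(data) }
end

# Reads the bytes back, again letting the block close the file.
def load_bytes(path)
  File.open(path, "rb") { |file| file.read }
end

# Usage: round-trip a few bytes through a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "copy-of-001.png")
  save_bytes(path, "\x89PNG".b)
  puts load_bytes(path).bytesize
end
```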

Kind regards

robert
 
Rick Johnson

Any help or suggestions would be greatly appreciated.

============================================================
SECTION 1:
============================================================
require "net/http"
remote_base_url = "https://path.to/the/folder"

I really don't like your naming convention for the URL. Since this
variable should NEVER change, it should instead be a constant:

BASE_URL = "https://path.to/the/folder"

....and by declaring this variable as a constant you gain the added
safety of Ruby warning you if you accidentally reassign the URL string
(the reassignment still happens, but not silently).

rb> BASE_URL = "hat"
(eval):33: warning: already initialized constant BASE_URL
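
To see what a constant does and does not protect against, here is a small sketch. Reassignment only produces the warning above; freezing the string is what makes in-place modification raise. The helper try_append is hypothetical and exists only for this demonstration:

```ruby
BASE_URL = "https://path.to/the/folder".freeze

# Attempts an in-place modification and reports what happened.
def try_append(string, suffix)
  string << suffix
  :modified
rescue FrozenError
  :frozen
end

p try_append(BASE_URL, "/extra")
p try_append(String.new("abc"), "d")
```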

============================================================
SECTION 2:
============================================================
start_page = 001
end_page = 281

Two glaring problems here!

PROBLEM 1. Bad idea to hard code the page numbers. Instead, declare a
single variable to hold the current page number (initialized to the
starting page number) and then (in a while loop) increment the page
number by 1 for each iteration.

PROBLEM 2. You are erroneously trying to create a "zero-padded"
integer to alpha-numerically tag your filename. What you /should/ be
doing is keeping a running count of page numbers (e.g. pageNum = 0),
and then applying a string format operation to the page number when
needed. Observe:

rb> for x in 0..5
... puts "%03d"%x
... end
000
001
002
003
004
005

============================================================
SECTION 3:
============================================================

Now on to your loop logic...
(start_page..end_page).each do |it|

Using a "for loop" for this purpose is not a good idea. Instead, use a
while loop:

pageNum = 0
maxpage = 281
while pageNum < maxpage+1
# do something here...
pageNum += 1
end

I also think the identifier "it" was a horrible choice. If you want to
convey "counter symbol" succinctly, then use "i" instead (think of "i"
as meaning: "generic integer"). But this detail is a non-starter if
you follow my advice and use a while loop.
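
For comparison, here is a sketch of the two loop styles side by side. Both visit the same page numbers, so the choice between them is largely a matter of taste. The method names are hypothetical:

```ruby
# Collects page numbers with a while loop and an explicit counter.
def pages_with_while(first, last)
  pages = []
  page_num = first
  while page_num <= last
    pages << page_num
    page_num += 1
  end
  pages
end

# Collects the same page numbers by iterating over a range.
def pages_with_range(first, last)
  (first..last).to_a
end

puts pages_with_while(1, 281) == pages_with_range(1, 281)
```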

============================================================
SECTION 4:
============================================================
rpage = open(remote_base_url + "/" + "p" + it.to_s)
local_fname = "copy-of-" + it.to_s + ".png"

This type of string concatenation is syntactically noisy and renders
the code unreadable. Don't feel bad, we all did this sort of thing in
the beginning, but soon you will find better alternatives:

rb> pageNum = 0
rb> BASE_URL = "https://path.to/the/folder"
rb> BASE_URL + "/p#{pageNum}"
https://path.to/the/folder/p0

I'll leave the onus on you to combine the "padded integer string
representation"[1] with "string formatting of a named symbol"[2].
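
One possible way to combine the two, as a sketch only. The helper names page_url and local_name are hypothetical, and the ".png" suffix on the remote name is assumed from the "p001.png to p281.png" comment in the original script:

```ruby
BASE_URL = "https://path.to/the/folder"

# Builds the remote URL for one page: padded number inside an interpolation.
def page_url(page_num)
  "#{BASE_URL}/p#{'%03d' % page_num}.png"
end

# Builds the matching local file name.
def local_name(page_num)
  "copy-of-#{'%03d' % page_num}.png"
end

puts page_url(7)
puts local_name(7)
```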

============================================================
CONCLUSION:
============================================================

Of course there are many other problems worth discussing, but i don't
want to fling too much poo in your direction at one time. :-D

============================================================
REFERENCES:
============================================================

[1] "%03d" % integer
[2] "#{symbol}"
 
Rick Johnson

Any help or suggestions would be greatly appreciated.

One more piece of good advice that i learned a long time ago.

Whenever you are writing code that operates on volatile data (like
files), be sure to debug file paths very carefully before running the
code. As an example: Your first draft of the program should start from
something as simple as this:

pageNum = 0
while pageNum < 100
puts pageNum
pageNum += 1
end

Then you should start expanding in *very* small increments. In the
next evolution i will create the incremental web page paths and then
print the paths to stdout (and that's it!):

BASE_URL = "https://path.to/the/folder"
pageNum = 0
while pageNum < 281+1
puts pageNum
xPath = BASE_URL + "/p#{pageNum}"
puts "Page Path is: #{xPath}"
pageNum += 1
end

In the next evolution of my code i will start reading web
pages in and printing the data to standard out (or maybe just a slice
of the data because the strings could be huge!).

Then in the 3rd evolution i will construct a path which will be the
destination of the web page data. At this time i will ONLY print the
path! We only read/write files AFTER we are 200% confident that the
file paths are pointing where we want them to point!
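
A "print the paths only" step along these lines could be sketched as follows. The method plan_downloads and the "pages" directory are hypothetical; nothing here touches the network or the disk:

```ruby
# Returns [url, local_path] pairs without downloading or writing anything.
def plan_downloads(base_url, first, last, dest_dir)
  (first..last).map do |page_num|
    padded = format("%03d", page_num)
    ["#{base_url}/p#{padded}.png", File.join(dest_dir, "copy-of-#{padded}.png")]
  end
end

# Usage: inspect the first few pairs before doing any real I/O.
plan_downloads("https://path.to/the/folder", 1, 3, "pages").each do |url, path|
  puts "#{url} -> #{path}"
end
```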

Now in the fourth iteration of my code evolution i will probably feel
comfortable enough to actually read/write the page data to local file.
If everything goes well, i can evolve to the fifth evolution and so
forth.
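
Once the paths look right, the read/write step might look like the sketch below. It calls Net::HTTP, which the original script requires but never uses; a bare open does not fetch URLs unless open-uri is loaded. The names fetch_body and download_pages and the fetcher: parameter are inventions for illustration. The parameter is there only so the loop can be tried with a stub before touching the network:

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetches one URL and returns the body, or raises.
def fetch_body(url)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Downloads pages first..last into dest_dir and returns the local paths.
# Any object that responds to call(url) can stand in for the fetcher.
def download_pages(base_url, first, last, dest_dir, fetcher: method(:fetch_body))
  (first..last).map do |page_num|
    padded = format("%03d", page_num)
    path = File.join(dest_dir, "copy-of-#{padded}.png")
    data = fetcher.call("#{base_url}/p#{padded}.png")
    # Block form closes the file; "wb" keeps the image bytes intact.
    File.open(path, "wb") { |file| file.write(data) }
    path
  end
end
```

A dry run with a stub such as fetcher: ->(url) { "fake bytes" } writes small placeholder files, which is a cheap way to check names and locations first.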

This is how you build complex code, in baby steps. I think your
problem is that you have tried to skip a few steps in an attempt to
speed along your evolution; well, now you see what happens huh? Bugs
bugs and more bugs.

Anytime you find yourself in a coding situation that you cannot seem
to "debug yourself out of", then you need to simplify the code until
you CAN debug yourself out of it. At this time you will be at a level
from which you can increment again.

Remember: Baby steps!
 
