Question about ruby syntax


Richard Hill

Hi

I've been reading about Ruby and have started learning it to replace the
work I was doing with Perl. I have a question about the code from this link.


http://www.adaruby.com/2008/01/11/scraping-gmail-with-mechanize-and-hpricot/

I don't understand this code. I can see that it is a block that runs over
an array of entries, each containing the HTML for a tr with a white
background.

#################################

page.search("//tr[@bgcolor='#ffffff']") do |row|
  from, subject = *row.search("//b/text()")
  url = page.uri.to_s.sub(/ui.*$/,
                          row.search("//a").first.attributes["href"])
  puts "From: #{from}\nSubject: #{subject}\nLink: #{url}\n\n"

  email = agent.get url
  # ..
end


##################################


But what does the line from, subject = *row.search("//b/text()") do?

How is *row different from row?


Finally what does this do? I can see a regex but don't understand the
line.

url = page.uri.to_s.sub(/ui.*$/,
row.search("//a").first.attributes["href"])

Thanks in advance for your help.

Regards Richard

#################################### FULL CODE #############################

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

page = agent.get 'http://www.gmail.com'
form = page.forms.first
form.Email = '***your gmail account***'
form.Passwd = '***your password***'
page = agent.submit form

page = agent.get page.search("//meta").first.attributes['href'].gsub(/'/,'')
page = agent.get page.uri.to_s.sub(/\?.*$/, "?ui=html&zy=n")
page.search("//tr[@bgcolor='#ffffff']") do |row|
  from, subject = *row.search("//b/text()")
  url = page.uri.to_s.sub(/ui.*$/,
                          row.search("//a").first.attributes["href"])
  puts "From: #{from}\nSubject: #{subject}\nLink: #{url}\n\n"

  email = agent.get url
  # ..
end
 

James Coglan

[Note: parts of this message were removed to make it a legal post.]
I don't understand this code. I can see that it is a block that returns
array of entries containing the html for every tr with a white
background

#################################

page.search("//tr[@bgcolor='#ffffff']") do |row|
  from, subject = *row.search("//b/text()")
  url = page.uri.to_s.sub(/ui.*$/,
                          row.search("//a").first.attributes["href"])
  puts "From: #{from}\nSubject: #{subject}\nLink: #{url}\n\n"

  email = agent.get url


##################################


But what does the line from, subject = *row.search("//b/text()") do?


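The `*` is Ruby's splat operator, and the assignment as a whole is what's
called a destructuring binding. Note that the `*` applies to the result of
`row.search("//b/text()")`, not to `row` itself. The expression takes the
array returned by `row.search("//b/text()")` and assigns one member to each
variable listed on the left-hand side. It's equivalent to: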

temp = row.search("//b/text()")
from = temp[0]
subject = temp[1]
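
If you want to try this without Mechanize or Hpricot, here is a minimal
sketch using a plain Array as a stand-in for the search result. The variable
names and strings are made up for illustration; I'm assuming the real result
behaves like an array here.

```ruby
# Stand-in for the value returned by row.search("//b/text()").
# These strings are invented for the example.
matches = ["Alice", "Lunch on Friday?", "extra"]

# Multiple assignment: one element per variable, surplus elements are dropped.
from, subject = *matches

# With a plain Array on the right-hand side the splat is optional.
from2, subject2 = matches

# If there are fewer elements than variables, the remaining ones become nil.
only, missing = *["just one"]

puts "From: #{from}"
puts "Subject: #{subject}"
puts "Missing: #{missing.inspect}"
```

So in the scraping code, `from` gets the first bold text node in the row and
`subject` gets the second.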


Finally what does this do? I can see a regex but don't understand the
line.

url = page.uri.to_s.sub(/ui.*$/,
row.search("//a").first.attributes["href"])


The expression `row.search("//a").first.attributes["href"]` finds the first
<a> element in the row and gets its `href` attribute, which I imagine will
be a string containing a URL. So this is basically just:

url = page.uri.to_s.sub(/ui.*$/, "SOME_URL")

So it takes the page's URI, converts it to a string (`to_s`) and replaces
the first match of the pattern /ui.*$/, that is, everything from "ui" to the
end of the line, with the `href` from the anchor tag.
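
To see the substitution on its own, here is a small sketch with hard-coded
strings. Both values are hypothetical: I'm guessing at what the page URI and
the `href` might look like, so treat the exact strings as assumptions.

```ruby
# Hypothetical values; in the real script these come from page.uri
# and from the first <a> tag in the row.
page_uri = "http://mail.google.com/mail/?ui=html&zy=n"
href     = "v=c&th=abc123"

# /ui.*$/ matches from the first "ui" through the end of the line,
# and sub replaces that one match with href.
url = page_uri.sub(/ui.*$/, href)

puts url
```

The part of the URI before "ui" is kept, so the result is the same base
address with the query string swapped for the one from the link.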
 
