Page crawling and URL grabbing

Patrick L. · Jan 27, 2009

Hey guys,
I'm trying to write an application that goes to a website (iStockphoto,
specifically), opens istockphoto.com/file_browse.php and grabs the
URLs of the photos that appear there.

It's my first time doing something like this. I'm reading some
documentation right now, but a hand would be greatly appreciated. I'm
not really sure how to run a regex over an HTML file, or even how to find
the right stuff within that file. I'm guessing it's something like:

require 'open-uri'

URI.open('http://www.istockphoto.com/file_browse.php') do |f|
  html = f.read  # ...then pull the photo URLs out of html somehow?
end

but I really have no idea. Any help would be great - thanks in advance!
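Since the question is specifically about running a regex over the HTML, here is a minimal sketch of that approach using only Ruby's standard library. It assumes (not verified against the actual page) that the photos appear as `<img>` tags whose quoted `src` ends in `.jpg` or `.jpeg`; the helper names `jpg_image_urls` and `fetch_jpg_urls` are hypothetical. Regexes break easily on real-world HTML, so treat this as a starting point rather than a robust scraper.

```ruby
require 'open-uri'

# Hypothetical helper: collect the src of every <img> tag whose URL ends in
# .jpg or .jpeg (case-insensitive), without duplicates. Only quoted src
# attributes are handled; a regex like this is brittle against arbitrary HTML.
def jpg_image_urls(html)
  html.scan(/<img\b[^>]*\bsrc\s*=\s*["']([^"']+\.jpe?g)["']/i).flatten.uniq
end

# Hypothetical helper: download a page with open-uri, then extract the URLs.
# This makes a real HTTP request; URI.open is open-uri's entry point in
# current Ruby versions.
def fetch_jpg_urls(page_url)
  html = URI.open(page_url, &:read)
  jpg_image_urls(html)
end
```

Usage would be something like `puts fetch_jpg_urls('http://www.istockphoto.com/file_browse.php')`. Note that the extracted `src` values are returned exactly as written in the page, so some may be relative paths.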

Jesús Gabriel y Galán · Jan 27, 2009

Miroslaw Niegowski · Jan 27, 2009

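Try Mechanize.
It's easy:

require 'mechanize'

agent = Mechanize.new  # WWW::Mechanize in releases before 1.0
agent.user_agent_alias = 'Mac Safari'
page = agent.get('http://www.istockphoto.com/file_browse.php')
# keep only the links whose URL mentions .jpg or .jpeg
jpg_links = page.links_with(href: /\.jpe?g/i)
jpg_links.each { |link| puts link.href }
# ...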
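However the URLs are scraped, values taken straight from the page (for example Mechanize's `Link#href`, which returns the attribute as written) may be relative, such as `/photo/123.jpg`. A small sketch that resolves them against the page address using only the standard `uri` library; `absolute_urls` is a hypothetical helper, not part of Mechanize.

```ruby
require 'uri'

# Hypothetical helper: resolve hrefs scraped from a page (which may be
# relative, e.g. "/photo/123.jpg" or "thumb.jpg") against that page's URL,
# returning absolute URL strings with duplicates removed.
# Note: an href containing characters URI can't parse (e.g. raw spaces)
# will raise URI::InvalidURIError.
def absolute_urls(page_url, hrefs)
  base = URI(page_url)
  hrefs.map { |href| base.merge(href).to_s }.uniq
end
```

For example, `absolute_urls('http://www.istockphoto.com/file_browse.php', ['/a/1.jpg'])` returns `["http://www.istockphoto.com/a/1.jpg"]`, while hrefs that are already absolute pass through unchanged.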

Patrick L. · Jan 27, 2009

That's great, or it sounds great. Is there any documentation aside from
blog posts and http://mechanize.rubyforge.org/mechanize/? What did you
use to learn it?

Tsunami Script · Jan 27, 2009

Mechanize is very easy and intuitive... you could basically learn to
use it just by playing with it in irb. Combine that with reading some
of the docs, and you're good to go.

Similar Threads

Coding going wrong	1	Oct 22, 2019
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
Ruby and E.V.E. Paradox	33	Jan 16, 2007
Javascript and IE? Javascript and C#?	6	Oct 5, 2007
Writing a PHP and Javascript generated page to a PHP variable...	3	Sep 3, 2006
Lists and Tuples and Much More	15	Apr 12, 2007
SiteMapProvider and userid	1	Jun 25, 2008
Creating a static ASP.NET page and a few other newbie questions	3	Jun 25, 2004
