Easy way for a newbie to get a link element from HTML source

Marcus Strube

Hi all.

I'm very new to Ruby and I'm not sure of the easiest way to do this.
I want to read the content of, e.g., "www.spiegel.de" and pick out just
this line:

<link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
als RSS-Feed" href="http://www.spiegel.de/schlagzeilen/rss/index.xml" />

and from this line, the "title" and the "href".

Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::parse.
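
For illustration, the attribute-order concern can be sidestepped by collecting every attribute of a tag into a hash first and only then looking up "title" and "href". The following is a minimal sketch using only the Ruby standard library (Ruby 2.7 or later); LINK_TAG, ATTRIBUTE and feed_links are hypothetical names made up for this example, and the sample HTML is a hand-written stand-in for the real page. Regular expressions are not a real HTML parser, so this is likely to miss unquoted attribute values, comments and other edge cases.

```ruby
# Sketch: read title/href from <link> tags regardless of attribute order.
# LINK_TAG, ATTRIBUTE and feed_links are hypothetical names, not a library API.

# Captures everything between "<link" and the closing ">".
LINK_TAG  = /<link\b([^>]*)>/i
# Captures name="value" or name='value' pairs.
ATTRIBUTE = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/

def feed_links(html)
  html.scan(LINK_TAG).filter_map do |(raw_attrs)|
    # Build a hash of attribute name => value, so order no longer matters.
    attrs = raw_attrs.scan(ATTRIBUTE).to_h do |name, double_quoted, single_quoted|
      # Collapse line breaks inside a value into single spaces.
      [name.downcase, (double_quoted || single_quoted).gsub(/\s+/, ' ')]
    end
    next unless attrs['rel'] == 'alternate'
    { title: attrs['title'], href: attrs['href'] }
  end
end

# Hand-written sample; the attributes are deliberately in a different order.
html = <<~HTML
  <html><head>
  <link rel="stylesheet" href="/css/main.css" />
  <link href="http://www.spiegel.de/schlagzeilen/rss/index.xml" title="SPIEGEL ONLINE
  als RSS-Feed" type="application/rss+xml" rel="alternate" />
  </head></html>
HTML

feed_links(html).each do |link|
  puts "#{link[:title]} -> #{link[:href]}"
end
```

To run this against the live page, the html string could be fetched with Net::HTTP.get(URI('http://www.spiegel.de/')) from the standard library. For anything beyond a quick script, a real HTML parser is the more robust choice.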
 
Kai Brust

How about Hpricot?

http://code.whytheluckystiff.net/hpricot/

- Kai Brust
 
Peter Szinek

Another possibility is scRUBYt!:

==========================================
require 'rubygems'
require 'scrubyt'

feed_data = Scrubyt::Extractor.define do
  fetch 'http://www.spiegel.de/'

  link "//link[@rel='alternate']" do
    title "title", :type => :attribute
    href "href", :type => :attribute
  end
end

puts feed_data.to_xml
==========================================

output:

==========================================
<root>
  <link>
    <title>SPIEGEL ONLINE als RSS-Feed</title>
    <href>http://www.spiegel.de/schlagzeilen/rss/index.xml</href>
  </link>
</root>
==========================================

Or, using to_hash instead of to_xml:

==========================================
[{:title=>"SPIEGEL ONLINE als RSS-Feed",
:href=>"http://www.spiegel.de/schlagzeilen/rss/index.xml"}]
==========================================
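
Since the to_hash result shown above is a plain Ruby array of hashes, the two values can be read with ordinary hash access. A small sketch, assuming exactly the shape printed above; the feeds variable and the describe_feeds helper are hypothetical names for this example, and no scRUBYt! call is involved:

```ruby
# Assumes the array-of-hashes shape shown in the to_hash output above.
feeds = [{ :title => "SPIEGEL ONLINE als RSS-Feed",
           :href  => "http://www.spiegel.de/schlagzeilen/rss/index.xml" }]

# Hypothetical helper: turns each hash into a "title: href" string.
def describe_feeds(feeds)
  feeds.map do |feed|
    # fetch raises KeyError if a key is missing, instead of returning nil.
    "#{feed.fetch(:title)}: #{feed.fetch(:href)}"
  end
end

puts describe_feeds(feeds)
```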

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
 
Marcus Strube

Another possibility is scRUBYt!:

That looks good. Thank you!
 
Peter Szinek

Marcus said:
That looks good. Thank you!

Hm, yeah, but the downside (as of the current version; it will be fixed
in the next one) is that the installation process is somewhat... hmm...
not that easy (mainly if you are on win32). If you still decide to go for
scRUBYt!, we can talk on #scrubyt @ irc.freenode.net, or you can ask your
questions in the forum (http://agora.scrubyt.org).

Cheers,
Peter
 
