Easy way for a newbie to get a link element from an HTML source

Marcus Strube · Nov 26, 2007

Hi all,

I'm very new to Ruby and I'm not sure what the easiest way to do this
is. I want to read the content of a page, e.g. "www.spiegel.de", and
pick out just this line:

<link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
als RSS-Feed" href="http://www.spiegel.de/schlagzeilen/rss/index.xml" />

and from this line the "title" and the "href".

Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parser.

Lee Jarvis · Nov 26, 2007

Marcus said:
Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parser.

Check out hpricot.

http://code.whytheluckystiff.net/hpricot/

Regards,
Lee

Kai Brust · Nov 26, 2007

Marcus said:
Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parser.

How about hpricot?

http://code.whytheluckystiff.net/hpricot/

- Kai Brust

Marcus Strube · Nov 26, 2007

Kai said:
How about hpricot?

http://code.whytheluckystiff.net/hpricot/

OK, hpricot then.

Is it just

gem install hpricot

or do I need to install this "ragel" thing too? (And if so, what is the
best way to do that?)

Peter Szinek · Nov 26, 2007

Marcus said:
Hi all,

I'm very new to Ruby and I'm not sure what the easiest way to do this
is. I want to read the content of a page, e.g. "www.spiegel.de", and
pick out just this line:

<link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
als RSS-Feed" href="http://www.spiegel.de/schlagzeilen/rss/index.xml" />

and from this line the "title" and the "href".

Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parser.

Another possibility is scRUBYt!:

==========================================
require 'rubygems'
require 'scrubyt'

feed_data = Scrubyt::Extractor.define do
  # Load the page to scrape.
  fetch 'http://www.spiegel.de/'

  # Select the feed <link> by XPath, then read two of its attributes.
  link "//link[@rel='alternate']" do
    title "title", :type => :attribute
    href "href", :type => :attribute
  end
end

puts feed_data.to_xml
==========================================

output:

==========================================
<root>
  <link>
    <title>SPIEGEL ONLINE als RSS-Feed</title>
    <href>http://www.spiegel.de/schlagzeilen/rss/index.xml</href>
  </link>
</root>
==========================================

or, to_hash:

==========================================
[{:title=>"SPIEGEL ONLINE als RSS-Feed",
:href=>"http://www.spiegel.de/schlagzeilen/rss/index.xml"}]
==========================================

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Marcus Strube · Nov 26, 2007

Peter said:
Another possibility is scRUBYt!:

That looks good. Thank you!

Peter Szinek · Nov 26, 2007

Marcus said:
That looks good. Thank you!

Hm, yeah, but the downside (as of the current version; it will be fixed
in the next one) is that the installation process is somewhat... hmm...
not that easy (mainly if you are on win32). If you still decide to go
for scRUBYt!, we can talk on #scrubyt @ irc.freenode.net, or you can ask
your questions in the forum (http://agora.scrubyt.org).

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
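
If installing a scraping gem turns out to be a hurdle, the attribute-order concern from the original question can also be worked around with the standard library alone. The following is only a rough sketch, not something proposed in the thread: the helper names (link_elements, feed_links, fetch_feed_links) are made up for illustration, and the regexps assume reasonably well-formed tags with quoted attribute values and no ">" inside a value. A real HTML parser such as the ones suggested above is more robust.

```ruby
require 'net/http'
require 'uri'

# Collect the attributes of every <link ...> tag into hashes. Each attribute
# is matched on its own, so their order inside the tag does not matter.
def link_elements(html)
  html.scan(/<link\b[^>]*>/i).map do |tag|
    attrs = {}
    # Match name="value" or name='value' pairs (quoted values only).
    tag.scan(/([\w:-]+)\s*=\s*(["'])(.*?)\2/m) do |name, _quote, value|
      # Collapse line breaks inside a value, e.g. a title wrapped over two lines.
      attrs[name.downcase] = value.gsub(/\s+/, ' ').strip
    end
    attrs
  end
end

# Keep only the links with rel="alternate" and return their title and href.
def feed_links(html)
  link_elements(html)
    .select { |attrs| attrs['rel'].to_s.downcase == 'alternate' }
    .map { |attrs| { :title => attrs['title'], :href => attrs['href'] } }
end

# Hypothetical convenience wrapper: download a page and extract its feed links.
# Net::HTTP.get does not follow redirects, so the URL must answer directly.
def fetch_feed_links(url)
  feed_links(Net::HTTP.get(URI.parse(url)))
end
```

Calling fetch_feed_links('http://www.spiegel.de/') would return an array of hashes shaped like the to_hash output shown earlier, provided the page is served directly at that address and still contains such a link element.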
