HTML parser using Hpricot

Cincin Robert · Jan 8, 2010

Hello folks,

I am new to RoR and am writing an HTML parser using Hpricot. I can
parse an HTML file with:
require 'hpricot'
doc = open("MyFileToParse.html") { |f| Hpricot(f) }
elements = doc.search("/html/body/table/tr/td/table/tr/td/font")
puts elements[13].inner_html

to get the following output:

Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91. [<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12140549&dopt=Abstract" target="_blank">PubMed</a>]

How can I get the following two results, (3) and (4)?
(3) Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91.

(4) http://www.ncbi.nlm.nih.gov/pubmed/12140549?dopt=Abstract
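For (3), here is a minimal sketch that works directly on the inner_html string printed above. It assumes the PubMed link is always the trailing "[<a ...>...</a>]" part. It uses plain regular expressions rather than Hpricot, so it runs without the gem, and `citation_text` is a hypothetical helper name.

```ruby
# Hypothetical helper: strip the trailing "[<a ...>PubMed</a>]" block from the
# inner_html string and return only the citation text.
# Assumption: the link is the last thing in the string and sits inside [ ].
def citation_text(html)
  # /m lets "." match newlines, in case the tag is split across lines.
  text = html.sub(/\s*\[\s*<a\b.*?<\/a>\s*\]\s*\z/m, "")
  # Drop any other tags that might remain, then collapse whitespace.
  text.gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip
end

html = 'Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91. [<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12140549&dopt=Abstract" target="_blank">PubMed</a>]'

puts citation_text(html)
# prints Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91.
```

In a real script you would pass `elements[13].inner_html` instead of the hard-coded string. A regex like this is fragile if the page layout changes, so treat it as a starting point.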

NOTE: to get (4), in addition to the "normal" HTML parsing, I need two
more steps during parsing: (5) replace "&" with "?" and (6) replace
"PubMed" with "pubmed" (this might be trivial, but how?).
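For (4), note that a plain `url.gsub("&", "?")` would turn every "&" into "?" and keep the `entrez/query.fcgi` path, so by itself it would not produce (4). The sketch below takes a different approach. It reads the query parameters with Ruby's standard `URI` library and rebuilds the URL. `String#downcase` (or `sub("PubMed", "pubmed")`) covers step (6). The helper names are hypothetical. The "/pubmed/<id>?dopt=..." layout is copied from (4) and is an assumption, not something taken from NCBI documentation.

```ruby
require 'uri'

# Hypothetical helper: pull the first href="..." value out of the inner_html
# string. Assumes a double-quoted attribute and turns "&amp;" back into "&".
def first_href(html)
  match = html.match(/href="([^"]+)"/)
  match && match[1].gsub("&amp;", "&")
end

# Hypothetical helper: rebuild the short URL shown in (4) from the Entrez URL.
# Assumes the article id is in "list_uids" and that the path segment is the
# lower-cased "db" value (true for db=PubMed in the example).
def pubmed_url(entrez_url)
  return nil if entrez_url.nil?
  uri = URI.parse(entrez_url)
  params = Hash[URI.decode_www_form(uri.query || "")]
  id = params["list_uids"]
  return nil unless id
  db = (params["db"] || "PubMed").downcase  # step (6): "PubMed" -> "pubmed"
  url = "http://www.ncbi.nlm.nih.gov/#{db}/#{id}"
  # Step (5): "dopt" becomes the only query parameter, so it follows a "?".
  url += "?" + URI.encode_www_form("dopt" => params["dopt"]) if params["dopt"]
  url
end

html = 'Giaever G, et al (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387-91. [<a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12140549&dopt=Abstract" target="_blank">PubMed</a>]'

puts pubmed_url(first_href(html))
# prints http://www.ncbi.nlm.nih.gov/pubmed/12140549?dopt=Abstract
```

Parsing the query string avoids relying on the order of the parameters. Reading the `href` through the HTML parser instead of a regex would also be more robust.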

Thanks a lot in advance.
Robert
