need help with Hpricot

Li Chen · Oct 8, 2008

Hi all,

I am trying to extract ONLY "Slang" and "A close companion or comrade."
from the following fragment of a web page with Hpricot. There is a lot of
JavaScript in it, and I don't know which path/tag to use for the target.

Thanks,

Li

<td><b>side·kick</b>  
<script type="text/javascript">
............................
.............................................
</script><noscript><a
href="http://dictionary.reference.com/audio.html/ahd4WAV/S0388900/sidekick"
target="_blank"><img src="http://cache.lexico.com/g/d/speaker.gif"
border="0" /></a></noscript>     (sīd'kĭk')  <a
href="http://cache.lexico.com/help/ahd4/pronkey.html" class="pronkey"
title="Click for guide to symbols." onclick="ahdpop();return
false;">Pronunciation Key</a> 
<br />


n.  

<i>Slang</i>
<br />


A close companion or comrade.
<br />

<br />
</td>

Mark Thomas · Oct 9, 2008

> I am trying to extract ONLY "Slang" and "A close companion or comrade."
> from the following fragment of a web page with Hpricot. There is a lot of
> JavaScript in it, and I don't know which path/tag to use for the target.

There's not a whole lot of HTML structure there. If you can
definitively target the <td> with Hpricot, you can use regular
expressions to find the appropriate comments and grab the following
text.
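
As a rough illustration of the regular-expression route, here is a minimal sketch in plain Ruby. It assumes the page contains HTML comments with the markers SUBHEAD and BOF_DEF just before the label and the definition; the fragment, the exact comment text, and the helper names subhead_after_comment and definition_after_comment are hypothetical, so adjust the patterns to the real markup.

```ruby
# Hypothetical fragment: the comment positions are an assumption inferred
# from the marker names SUBHEAD and BOF_DEF, not copied from the real page.
SAMPLE_TD = <<~HTML
  <td><b>sidekick</b>
  n.
  <!-- SUBHEAD -->
  <i>Slang</i>
  <br />
  <!-- BOF_DEF -->
  A close companion or comrade.
  <br />
  </td>
HTML

# Return the text of the first <i> element after the SUBHEAD comment,
# or nil when the marker is missing.
def subhead_after_comment(html)
  match = html.match(/<!--[^>]*SUBHEAD[^>]*-->.*?<i>(.*?)<\/i>/m)
  match && match[1].strip
end

# Return the text that directly follows the BOF_DEF comment,
# or nil when the marker is missing.
def definition_after_comment(html)
  match = html.match(/<!--[^>]*BOF_DEF[^>]*-->([^<]+)/m)
  match && match[1].strip
end

puts subhead_after_comment(SAMPLE_TD)
puts definition_after_comment(SAMPLE_TD)
```

Both helpers return nil rather than raising when a marker is absent, so a caller can skip entries that do not follow this layout.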

You can get a little more specific with XPath expressions. The
following sample code (requires libxml-ruby) extracts the two values
from your sample HTML:

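require 'xml'  # provided by libxml-ruby

html = %Q(your_html_here)
doc = XML::HTMLParser.string(html).parse

# Text of the first <i> element after the comment containing "SUBHEAD"
puts doc.find('//comment()[contains(.,"SUBHEAD")]/following::i/text()').first
# First text node after the comment containing "BOF_DEF"
puts doc.find('//comment()[contains(.,"BOF_DEF")]/following::text()').first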

Li Chen · Oct 9, 2008

Hi Mark Thomas:

Thank you for the suggestion.

I also searched the forum and found an earlier post that helped me get the
job done. The idea is to 1) use regular expressions to remove
unconventional HTML content such as JavaScript, and then 2) let Hpricot
handle what remains. It works pretty well for me.
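
The two steps can be sketched roughly as follows. This is not the code from the earlier post: the fragment is a trimmed copy of the one above with an invented script body, the helper names strip_scripts and label_and_definition are hypothetical, and a plain regular expression stands in for the Hpricot step so the example runs without any gems.

```ruby
# Trimmed copy of the fragment from the first post. The script body is
# invented for illustration; the original post elided it.
RAW_TD = <<~HTML
  <td><b>sidekick</b>
  <script type="text/javascript">
  document.write("<i>hypothetical script output</i>");
  </script><noscript><a href="#"><img src="speaker.gif" border="0" /></a></noscript>
  <a href="#" class="pronkey">Pronunciation Key</a>
  <br />
  n.
  <i>Slang</i>
  <br />
  A close companion or comrade.
  <br />
  <br />
  </td>
HTML

# Step 1: remove <script> and <noscript> blocks, including their contents.
def strip_scripts(html)
  html.gsub(/<(script|noscript)\b[^>]*>.*?<\/\1>/mi, "")
end

# Step 2 (stand-in for the parser): take the first <i> element as the label
# and the text between the following <br /> and the next <br /> as the
# definition.
def label_and_definition(html)
  cleaned = strip_scripts(html)
  label = cleaned[/<i>(.*?)<\/i>/m, 1]
  definition = cleaned[/<\/i>\s*<br\s*\/?>\s*([^<]+?)\s*<br/m, 1]
  [label, definition]
end

label, definition = label_and_definition(RAW_TD)
puts label
puts definition
```

With Hpricot installed, the string returned by strip_scripts could be handed to the parser instead of the regular expressions in step 2.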

Here is the title and author of that post/reply:

Re: HTML parser Hpricot? and how to get all text
Posted by SpringFlowers AutumnMoon (winterheat) on 03.11.2007 09:10

Li
