Extracting some text from HTML


Albert Schlef

It's a quick question (I hope).

I'm using Nokogiri.

I have the following HTML:

...
<div class="lead">
<div class="something"> blah blah blah </div>
<div class="another-something"> blah blah blah </div>

...some text I want to extract...

<div class="blah"> whatever </div>
</div>
...

How can I extract the text "...some text I want to extract..."?

My problem is that this text isn't wrapped in a DIV. It's a "text" node
(not an "element" node).

Can anybody figure out an XPath expression for it?
 

Albert Schlef


I solved the problem. I used the following code:

# text()[3] = third text-node child of div.lead (two whitespace runs come first)
the_text_i_want = doc.at_xpath('//div[@class="lead"]/text()[3]')
puts the_text_i_want.content
 

Dan Sr.

Hi, I want to do something similar to what you are doing.

Basically, I would like to go through a whole bunch of links and text on
a page, scrape only the text I want, and, when I find that text, scrape
the corresponding URL from the same table.

Here is my actual code so far:

require 'rubygems'   # only needed on old Rubies (1.8)
require 'nokogiri'
require 'open-uri'

# Certifications whose text I would like scraped
CERTS = %w[MCSE A+ MCITP MCDBA MCPD MCSA]

(1..100).each do |i|   # yay for page loop
  # pages scraped
  url = "http://www.hawaiicrcs.org/searchprog.asp?cat=&pg=#{i}"
  doc = Nokogiri::HTML(URI.open(url))   # URI.open: plain open() no longer fetches URLs in Ruby 3
  (1..100).each do |s|   # yay for table loop
    doc.css("tr:nth-child(#{s})").each do |var|   # rows at position s
      puts var
    end
  end
end

Hope I am being clear. For the href, I have gotten it to display with:
tts = doc.css("tr:nth-child(#{s})")[:href]

But I'm unsure how to get the href that goes with the text I compared
against. I'm thinking of an if/then statement or a case statement. Maybe
someone can help.

Something like:

if compared_data == true
  p doc.css("tr:nth-child(#{s})")[:href]
end

Or something of the sort. I'm a newbie, so forgive any errors in what I
type.
 
