Problem with xpath in scrubyt.

likhon · Jul 15, 2009

I wrote the code below using the scrubyt gem, but it returns a null
value. When I run the same code with the Firefox agent, it returns
perfect results. I do not want to open any agent. Could anyone tell me
what's wrong with the following code?

--------------------------------------------------------
def self.get_data_from_half_ebay(isbn)
  half_ebay_data = Scrubyt::Extractor.define do
    fetch "http://search.half.ebay.com/#{isbn}"
    henewbooks "//html/body/table[2]/tbody/tr/td/table[4]/tbody/tr/td[2]/table/tbody/tr/td/table[2]/tbody" do
      henb "//tr[@class = 'tr-border']" do
        henbprice "//span[@class = 'ItemPrice']"
        henbbuylink "//a[@class = 'MoreInfo']/@href"
      end
    end
  end
  @description = half_ebay_data.to_xml
  return @description
end
--------------------------------------------------------

Aaron Patterson · Jul 23, 2009

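Firefox adds the "tbody" nodes itself when it builds the DOM, so they
are most likely not in the raw HTML that scrubyt fetches without a
browser. Try your XPath again, but remove every "tbody" step.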
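
As a rough sketch of that suggestion, the "tbody" steps can be stripped from the copied path with plain Ruby. The helper name strip_tbody is made up for illustration and is not part of scrubyt. It also assumes the page source really contains no tbody tags; if the HTML does include them, the steps have to stay.

```ruby
# Hypothetical helper (not part of scrubyt): drops every "/tbody" or
# "/tbody[n]" step from an XPath that was copied out of a browser.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, "")
end

firebug_path = "//html/body/table[2]/tbody/tr/td/table[4]/tbody/tr/td[2]/table/tbody/tr/td/table[2]/tbody"

puts strip_tbody(firebug_path)
# => //html/body/table[2]/tr/td/table[4]/tr/td[2]/table/tr/td/table[2]
```

The cleaned path can then be pasted into the henewbooks line in place of the original one.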

Mark Thomas · Jul 24, 2009

fetch "http://search.half.ebay.com/#{isbn}"
henewbooks "//html/body/table[2]/tbody/tr/td/table[4]/tbody/tr/td
[2]/table/tbody/tr/td/table[2]/tbody" do
henb "//tr[@class = 'tr-border']" do

Positional XPaths like this are fragile. I don't recommend just taking
the path from Firebug or other DOM inspectors. What happens when
Half.com decides to put a notice in an additional table early in the
page? Your XPath will break.

Instead, think of how a human identifies the item. It's the table
immediately after the Brand New label, correct? A better XPath would
be:

//table[preceding-sibling::a[1][@name="itemlist_BRAND_NEW"]]//tr[@class="tr-border"]

This does the same thing, but is much less susceptible to unrelated
page changes. If you do a lot of scraping, I recommend really getting
familiar with all that XPath has to offer. DOM inspectors are not
smart enough.

-- Mark.
