Data extraction using Scrubyt

Discussion in 'Ruby' started by Vipin Vm, Dec 5, 2008.

  1. Vipin Vm

    Vipin Vm Guest

    Hi All,

    I need to fetch some information from http://www.ebay.in.
    The fields I need are: product name, image, price, and the link
    to the product.

    I am able to get the data using this method:

    require 'rubygems'
    require 'scrubyt'

    google_data = Scrubyt::Extractor.define do
      fetch 'http://www.ebay.in'
      fill_textfield 'satitle', 'ipod shuffle'
      submit
      record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
        name "/td[2]/div/a"
        price "/td[5]"
        image "/td/a/img" do
          url "src", :type => :attribute
        end
        link "/td[2]/div/a" do
          url "href", :type => :attribute
        end
      end
    end

    google_data.to_xml.write($stdout, 1)

    My problem is that it does not work properly for some products (the
    div structure may be different). Is there a better solution for this?

    Thanks in advance,
    Vipin
    --
    Posted via http://www.ruby-forum.com/.
     
    Vipin Vm, Dec 5, 2008
    #1

  2. Peter Szinek

    Peter Szinek Guest

    [Note: parts of this message were removed to make it a legal post.]

    You need to create smarter XPaths, relying on CSS id/class attributes
    or other properties rather than a full XPath from the root - for
    example:

    require 'rubygems'
    require 'scrubyt'

    ebay_data = Scrubyt::Extractor.define do
      fetch 'http://www.ebay.in/'
      fill_textfield 'satitle', 'ipod'
      submit

      record "//table[@class='nol']" do
        name "//td[@class='details']/div/a"
      end
    end

    puts ebay_data.to_xml

    etc.

    This way your scraper will be more robust and less prone to breaking
    when the page changes.
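
To illustrate why an attribute-based XPath tends to survive layout changes, here is a rough sketch that uses Ruby's REXML instead of scRUBYt, so it runs without network access. The markup, the PAGE_A/PAGE_B constants, the ABSOLUTE path and the extract helper are all made up for this example; they are not real eBay HTML and not part of scRUBYt.

```ruby
require 'rexml/document'

# Hypothetical, simplified results page (NOT real eBay markup).
PAGE_A = <<~HTML
  <html><body>
    <div id="header">Header</div>
    <div id="results">
      <table class="nol"><tr>
        <td class="pic"><a href="/item/1"><img src="/img/1.jpg"/></a></td>
        <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
      </tr></table>
    </div>
  </body></html>
HTML

# Same page after an assumed layout change: a banner <div> is added
# before the results, which shifts every positional index after it.
PAGE_B = PAGE_A.sub('<div id="results">',
                    '<div id="banner">Sale</div><div id="results">')

# Full path from the root (position-based) versus the class-based
# expression from the example above.
ABSOLUTE = '/html/body/div[2]/table/tr/td[2]/div/a'
ROBUST   = "//td[@class='details']/div/a"

# Return name/link pairs for every anchor matched by the XPath.
def extract(html, xpath)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, xpath).map do |a|
    { name: a.text, link: a.attributes['href'] }
  end
end

# Print how many anchors each expression finds on each page version.
[['A', PAGE_A], ['B', PAGE_B]].each do |label, page|
  puts "page #{label}: absolute=#{extract(page, ABSOLUTE).length} " \
       "robust=#{extract(page, ROBUST).length}"
end
```

In this sketch the absolute path stops matching once the banner shifts the div positions, while the class-based expression still finds the same anchor. Real pages can of course rename classes too, so this only reduces the breakage, it does not rule it out.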

    HTH,
    Peter
    ___
    http://www.rubyrailways.com
    http://scrubyt.org


     
    Peter Szinek, Dec 5, 2008
    #2

  3. Vipin Vm

    Vipin Vm Guest

    Hi Peter,

    Thanks for the help... it's working fine :)

    Vipin

     
    Vipin Vm, Dec 6, 2008
    #3
  4. Peter Szinek

    Peter Szinek Guest



    On 2008.12.06., at 4:46, Vipin Vm wrote:

    > Hi Peter,
    >
    > Thanks for the Help... its working fine :)


    Glad that I could help. I am just working on a new release btw, so
    stay tuned!

    Cheers,
    Peter
    ___
    http://www.rubyrailways.com
    http://scrubyt.org
     
    Peter Szinek, Dec 6, 2008
    #4