How to extract URLs from the HTML source of a Google search result

Discussion in 'Ruby' started by sujeet kumar, Jun 11, 2005.

  1. sujeet kumar

    sujeet kumar Guest

    Hi,
    I want to make a Tk window where you enter an input string; it
    searches for that string on Google and prints the web addresses (HTTP URLs) of the
    results in the TkFrame of that window. My program
    connects to the net and gets the HTML source through the function "http.get".
    Now, from the HTML source, how can I find the URLs of the search results? Can I
    do it with a regular expression or some other way?
    Any suggestions are welcome.
    Thanks,
    sujeet
    sujeet kumar, Jun 11, 2005
    #1

  2. On Sun, Jun 12, 2005 at 03:44:03AM +0900, sujeet kumar wrote:
    > I want to make a Tk window where you give some input string and it
    > search that on google and prints the web address (http url) of the
    > result found on google in the TkFrame of that window. My program
    > connects to net and get the html source through function "http.get".
    > Now from html source , how can I find the url's of the search. Can i
    > do it by regular expression or any other way.
    > Give me any suggestion.


    The URI.extract method from the uri library can extract an array of URIs from
    a string:

    require 'uri'
    URI.extract('My favorite site is http://google.com')
    # => ["http://google.com"]

    An optional second argument can limit the schemes that it will match against
    and return:

    URI.extract('Why do people use mailto: links?')
    # => ["mailto:"]
    URI.extract('Why do people use mailto: links?', 'http')
    # => []
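
    Applied to the original question, the same call can be pointed at the
    fetched page source. This is only a minimal sketch: it assumes the HTML has
    already been downloaded into a string, the `html` sample below is made up
    for illustration (it is not real Google output), and the link attributes
    are assumed to use double quotes. Newer Ruby versions mark URI.extract as
    obsolete, so treat it as illustrative rather than definitive:

    ```ruby
    require 'uri'

    # Made-up sample standing in for the page source returned by http.get.
    html = <<~HTML
      <a href="http://example.com/a">First</a>
      <a href="mailto:someone@example.com">Mail</a>
      <a href="https://example.org/b?q=1">Second</a>
    HTML

    # Keep only http and https URIs; other schemes such as mailto are skipped.
    links = URI.extract(html, ['http', 'https'])
    puts links
    ```

    Passing the schemes as an array keeps mailto: and similar links out of the
    result, which is usually what you want when listing search hits.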

    marcel
    --
    Marcel Molina Jr. <>
    Marcel Molina Jr., Jun 12, 2005
    #2

  3. Marcel Molina Jr. wrote:

    > On Sun, Jun 12, 2005 at 03:44:03AM +0900, sujeet kumar wrote:
    >> how can I find the url's of the search. Can i
    >> do it by regular expression or any other way.
    >
    > The URI.extract method from the uri library can extract an array of uri's from
    > a string:

    A universal regexp that finds URIs in arbitrary text is a
    complicated thing, indeed. Besides, it can produce false positives
    (finding things that look like URIs, but aren't).

    If you are sure that the page is well-formed XHTML (I'm not sure
    whether that's the case with Google), you might instead parse it with
    REXML and use XPath to retrieve the href attributes of all <a>..</a>
    elements, selecting only those that start with "http://" (there may also
    be mailto:, ftp:, JavaScript calls, etc.).
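
    The REXML approach could look roughly like the sketch below. It assumes
    the input really is well-formed and has no XML namespace declaration; the
    `xhtml` sample is invented for illustration, and a real search page may
    well fail to parse:

    ```ruby
    require 'rexml/document'

    # Made-up, well-formed sample; real search pages may not parse as XML.
    xhtml = <<~XHTML
      <html>
        <body>
          <a href="http://example.com/">Example</a>
          <a href="mailto:someone@example.com">Mail</a>
          <a href="javascript:void(0)">Script</a>
          <a name="top">Anchor without href</a>
          <a href="http://example.org/page">Another</a>
        </body>
      </html>
    XHTML

    doc = REXML::Document.new(xhtml)

    # Find every <a> element with XPath and read its href (nil if absent).
    hrefs = REXML::XPath.match(doc, '//a').map { |a| a.attributes['href'] }.compact

    # Keep only links that start with "http://".
    http_links = hrefs.select { |href| href.start_with?('http://') }
    puts http_links
    ```

    Unlike a regexp over raw text, this only looks at real href attributes, so
    URL-like strings elsewhere in the page are not picked up.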

    Best regards,
    Alexey Verkhovsky
    Alexey Verkhovsky, Jun 12, 2005
    #3
  4. Eric Hodel

    Eric Hodel Guest

    On 11 Jun 2005, at 11:44, sujeet kumar wrote:

    > hi
    > I want to make a Tk window where you give some input string and it
    > search that on google and prints the web address (http url) of the
    > result found on google in the TkFrame of that window. My program
    > connects to net and get the html source through function "http.get".
    > Now from html source , how can I find the url's of the search. Can i
    > do it by regular expression or any other way.


    Why not use the Google API?

    --
    Eric Hodel - - http://segment7.net
    FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04
    Eric Hodel, Jun 12, 2005
    #4