Pattern matching problem.

Discussion in 'Perl Misc' started by demolitionz@gmail.com, Jul 24, 2005.

  1. Guest

    I'm working on a project to try and write a program in perl which will
    connect to google, search for a specified keyword and return the URLs
    found. My problem is that I can only get the program to return the
    first URL found, and despite spending a good few hours playing around
    with it and searching the web for answers, I can't seem to solve the
    problem. Here's the code I've written so far...

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::UserAgent;

    # Require a keyword; checking @ARGV first avoids an "uninitialized" warning.
    if (!@ARGV or $ARGV[0] eq '') {
        print "Script called incorrectly.\nFormat: google.pl keyword\n";
        exit;
    }

    my $browser  = LWP::UserAgent->new;
    my $response = $browser->get("http://www.google.com/search?q=$ARGV[0]");
    if ($response->is_success) {
        if ($response->content =~ m{<font color=#008000>(.*?)</font>}i) {
            print "$1\n";
        }
        else { print "Could not connect"; }
    }
    exit;

    Now I personally assumed the solution would have been as easy as
    changing $1 to $2 to get the second URL, but it doesn't seem so. That
    being the case I assume this script will need a total rework, but have
    no idea where to even begin. Can anyone help?
     
    , Jul 24, 2005
    #1

  2. * wrote:

    > I'm working on a project to try and write a program in perl which will
    > connect to google, search for a specified keyword and return the URLs
    > found.


    Google provides an API for this, so you don't have to parse any
    web page; you can just request the data you want. Have a look at

    http://www.google.com/apis/

    to create the free Google account you need in order to use it. To use
    Google's API from Perl, look on CPAN. I suggest starting with the module

    WWW::Search::Google

    If that doesn't meet your needs, have a look at Net::Google.
    This module is the basis for the first one, and lets you do more
    specific things.

    >
    > My problem is that I can only get the program to return the
    > first URL found, and despite spending a good few hours playing around
    > with it and searching the web for answers, I can't seem to solve the
    > problem. Here's the code I've written so far...


    > if ($response->content =~ m{<font color=#008000>(.*?)</font>}i)
    > {
    > print "$1\n";
    > }
    > else { print "Could not connect"; }
    > }
    > exit;
    >
    > Now I personally assumed the solution would have been as easy as
    > changing $1 to $2 to get the second URL, but it doesn't seem so. That
    > being the case I assume this script will need a total rework, but have
    > no idea where to even begin. Can anyone help?


    To match more than once you could use a loop. Untested:

    while ( $response->content =~ m{<font color=#008000>(.*?)</font>}ig ) {
    print "$1\n";
    }

    Note the g modifier after the regex.

    regards,
    fabian
     
    Fabian Pilkowski, Jul 24, 2005
    #2

  3. Guest

    Klaus Eichner wrote:
    > while ($response->content =~ m{<font color=#008000>(.*?)</font>}ig)
    > {
    > print "$1\n";
    > }


    Thanks for the responses. I gave the quoted bit of code a go but
    unfortunately it just went into an infinite loop repeating the first
    result. I also tried replacing "while" with "foreach", but that didn't
    work either. I've been playing around with the original idea some more
    and have finally got it to work through a very messy bit of code.
    Unfortunately, due to Google's formatting, I get <b> HTML tags in the
    middle of some of my results, but that can be ironed out later. I've
    attached the working code below just in case people were
    interested/want to make it a bit less messy ;) PS this code was
    written in haste and ever increasing frustration, so the names of the
    arrays etc are rather random!

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Require a keyword; checking @ARGV first avoids an "uninitialized" warning.
    if (!@ARGV or $ARGV[0] eq '') {
        print "Script called incorrectly.\nFormat: google.pl keyword\n";
        exit;
    }

    my $browser  = LWP::UserAgent->new;
    my $response = $browser->get("http://www.google.com/search?q=$ARGV[0]");
    if ($response->is_success) {
        my $content    = $response->content;
        my @broken     = split("<br>", $content);
        my $searchterm = "<font color=#008000>(.*?)</font>";
        my @found      = grep(/$searchterm/i, @broken);
        foreach (@found) {
            if ($_ =~ m{<font color=#008000>(.*?)</font>}ig) {
                # Keep only the first whitespace-separated word (the URL).
                my ($url) = split(' ', $1);
                print "$url\n";
            }
        }
    }
    exit;
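
    On the stray <b> tags mentioned above: one simple way to iron them
    out is a substitution that deletes the opening and closing bold tags
    from each captured string. A minimal sketch follows; the helper name
    strip_bold and the sample strings are made up for illustration, and
    this is not a general HTML cleaner.

```perl
use strict;
use warnings;

# Hypothetical helper: remove <b> and </b> (any case) from a string.
sub strip_bold {
    my ($text) = @_;
    $text =~ s{</?b>}{}gi;
    return $text;
}

# Made-up sample results, shaped like the captured text in the script above.
my @results = (
    'www.<b>perl</b>.org/ - 12k',
    'www.example.com/<B>perl</B>/index.html - 8k',
);

print strip_bold($_), "\n" for @results;
```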
     
    , Jul 24, 2005
    #3
  4. <> wrote in message
    news:...

    [snip]

    > My problem is that I can only get the program to return the
    > first URL found


    > if ($response->content =~ m{<font color=#008000>(.*?)</font>}i)
    > {
    > print "$1\n";
    > }


    > Now I personally assumed the solution would have been as easy as
    > changing $1 to $2 to get the second URL, but it doesn't seem so.


    No need to change $1, just add the g option at the end of the regular
    expression "m{...}i" and make it a while-loop rather than a simple if. That
    should do the trick. (see also "perldoc perlop", paragraph "Regexp
    Quote-Like Operators")

    while ($response->content =~ m{<font color=#008000>(.*?)</font>}ig)
    {
    print "$1\n";
    }
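
    As for why changing $1 to $2 did not help: the numbered variables
    refer to capture groups within one match, not to successive matches.
    The pattern in this thread has a single pair of parentheses, so $2 is
    never set. A small sketch with made-up sample text:

```perl
use strict;
use warnings;

# Made-up sample text containing two "results".
my $html = '<font color=#008000>first</font><font color=#008000>second</font>';

# One pair of parentheses: only $1 is set, and it holds the first match.
if ($html =~ m{<font color=#008000>(.*?)</font>}i) {
    print "one group:  \$1 = $1\n";
    print "            \$2 is ", (defined $2 ? $2 : 'undefined'), "\n";
}

# Two pairs of parentheses in ONE pattern: now $2 is the second group.
if ($html =~ m{<font color=#008000>(.*?)</font><font color=#008000>(.*?)</font>}i) {
    print "two groups: \$1 = $1, \$2 = $2\n";
}
```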


    --
    Klaus
     
    Klaus Eichner, Jul 24, 2005
    #4
  5. Eric Amick Guest

    On 24 Jul 2005 11:49:04 -0700, wrote:

    >Klaus Eichner wrote:
    >> while ($response->content =~ m{<font color=#008000>(.*?)</font>}ig)
    >> {
    >> print "$1\n";
    >> }

    >
    >Thanks for the responses. I gave the quoted bit of code a go but
    >unfortunately it just went into an infinite loop repeating the first
    >result.


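    $response->content is a method call that hands back a fresh copy of
    the string on every pass through the loop. A //g match in scalar
    context remembers where it stopped (see pos) on the scalar it was
    matching, so with a new scalar each time the search restarts from the
    beginning and the first result repeats forever. Try

    my $content = $response->content;
    while ($content =~ m{<font color=#008000>(.*?)</font>}ig) {
        print "$1\n";
    }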

    instead.
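
    To see the effect without a network connection, here is a small
    sketch. The function fresh_copy is a hypothetical stand-in for
    $response->content: like the method call, it returns a new string
    each time. The first loop is capped at three passes so that it
    terminates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->content: a new scalar per call.
sub fresh_copy { return 'a=1 a=2 a=3' }

# Matching against the call result: the match position is lost on every
# pass, so the first match repeats. Capped at 3 passes to avoid looping forever.
my @from_call;
while (fresh_copy() =~ /a=(\d)/g) {
    push @from_call, $1;
    last if @from_call == 3;
}
print "call result: @from_call\n";

# Matching against a variable: pos() advances on the same scalar.
my $content = fresh_copy();
my @from_var;
while ($content =~ /a=(\d)/g) {
    push @from_var, $1;
}
print "variable:    @from_var\n";
```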
    --
    Eric Amick
    Columbia, MD
     
    Eric Amick, Jul 24, 2005
    #5
  6. <> wrote in message
    news:...
    > Klaus Eichner wrote:
    > > while ($response->content =~ m{<font color=#008000>(.*?)</font>}ig)
    > > {
    > > print "$1\n";
    > > }

    >
    > Thanks for the responses. I gave the quoted bit of code a go but
    > unfortunately it just went into an infinite loop repeating the first
    > result.


    I don't think that the "while (...m{...}ig)" is directly responsible for the
    infinite loop.

    Here is a small, but complete example to demonstrate the principle of "while
    (...m{...}ig)":
    ============================================
    use strict;
    use warnings;

    my $resp = q{
    <html>
    <body bgcolor="#ffffff">
    <title>xxx</title>
    <font color=#008000>item 1</font><br>
    <font color=#008000>item 2</font><br>
    <font color=#008000>item 3</font><br>
    <font color=#008000>item 4</font><br>
    </body>
    </html>
    };

    while ($resp =~ m{<font color=#008000>(.*?)</font>}ig)
    {
    print "$1\n";
    }
    ============================================

    The output of that program is:
    ======================
    item 1
    item 2
    item 3
    item 4
    ======================
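
    A related idiom: in list context, a //g match returns all captured
    substrings at once, so no loop and no position bookkeeping is needed.
    A sketch using a shortened version of the sample data above; the
    variable names are illustrative.

```perl
use strict;
use warnings;

my $resp = q{
<font color=#008000>item 1</font><br>
<font color=#008000>item 2</font><br>
<font color=#008000>item 3</font><br>
};

# List context: every capture from every match is returned, in order.
my @items = $resp =~ m{<font color=#008000>(.*?)</font>}ig;

print scalar(@items), " matches\n";
print "$_\n" for @items;
```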


    > ...have finally got it to work.


    I am happy that you finally succeeded.

    [snip]

    > my $response =
    > $browser->get("http://www.google.com/search?q=$ARGV[0]");
    > if ($response->is_success) {
    > my $content = $response->content;
    > my @broken = split("<br>",$content);
    > my $searchterm = "<font color=#008000>(.*?)</font>";
    > my @found = grep(/$searchterm/i, @broken);
    > foreach (@found) { if ($_ =~ m{<font color=#008000>(.*?)</font>}ig) {
    > @_ = split(' ',$1); print "$_[0]\n"; } }
    > }
    > exit;


    --
    Klaus
     
    Klaus Eichner, Jul 24, 2005
    #6