Pattern matching problem.

demolitionz · Jul 24, 2005

I'm working on a project to try and write a program in perl which will
connect to google, search for a specified keyword and return the URLs
found. My problem is that I can only get the program to return the
first URL found, and despite spending a good few hours playing around
with it and searching the web for answers, I can't seem to solve the
problem. Here's the code I've written so far...

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;

# Require a search keyword on the command line.
unless (@ARGV and $ARGV[0] ne '') {
    print "Script called incorrectly.\nFormat: google.pl keyword\n";
    exit;
}

my $browser  = LWP::UserAgent->new;
my $response = $browser->get("http://www.google.com/search?q=$ARGV[0]");

if ($response->is_success) {
    # Only the first match is printed here.
    if ($response->content =~ m{(.*?)}i) {
        print "$1\n";
    }
}
else {
    print "Could not connect\n";
}
exit;

Now I personally assumed the solution would have been as easy as
changing $1 to $2 to get the second URL, but it doesn't seem so. That
being the case I assume this script will need a total rework, but have
no idea where to even begin. Can anyone help?

Fabian Pilkowski · Jul 24, 2005

* demolitionz said:
I'm working on a project to try and write a program in perl which will
connect to google, search for a specified keyword and return the URLs
found.

Nice. Google provides an API for this, so you don't have to parse any
web page; you just request the data you want. Have a look at

http://www.google.com/apis/

to create the free Google Account you need in order to use it. To use
Google's API from Perl, ask CPAN for some help. I suggest starting with
the module

WWW::Search::Google

If that doesn't meet your needs, take a look at Net::Google. That
module is the basis for the first one and lets you do more specific
things.

My problem is that I can only get the program to return the
first URL found, and despite spending a good few hours playing around
with it and searching the web for answers, I can't seem to solve the
problem. Here's the code I've written so far...

if ($response->content =~ m{(.*?)}i)
{
print "$1\n";
}
else { print "Could not connect"; }
}
exit;

Now I personally assumed the solution would have been as easy as
changing $1 to $2 to get the second URL, but it doesn't seem so. That
being the case I assume this script will need a total rework, but have
no idea where to even begin. Can anyone help?

To match more than once you could use a loop. Untested:

while ( $response->content =~ m{(.*?)}ig ) {
print "$1\n";
}

Note the g modifier after the regex.

regards,
fabian

demolitionz · Jul 24, 2005

Klaus said:
while ($response->content =~ m{(.*?)}ig)
{
print "$1\n";
}

Thanks for the responses. I gave the quoted bit of code a go but
unfortunately it just went into an infinite loop repeating the first
result. I also tried replacing "while" with "foreach", but that didn't
work either. I've been playing around with the original idea some more
and have finally got it to work through a very messy bit of code.
Unfortunately, due to Google's formatting I get HTML tags in the
middle of some of my results, but that can be ironed out later. I've
attached the working code below in case people are interested or want
to make it a bit less messy.

PS: this code was written in haste and ever-increasing frustration, so
the names of the arrays etc. are rather random!

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Require a search keyword on the command line.
unless (@ARGV and $ARGV[0] ne '') {
    print "Script called incorrectly.\nFormat: google.pl keyword\n";
    exit;
}

my $browser  = LWP::UserAgent->new;
my $response = $browser->get("http://www.google.com/search?q=$ARGV[0]");

if ($response->is_success) {
    my $content = $response->content;
    # Split the page on whitespace and keep the pieces that match.
    my @broken     = split(" ", $content);
    my $searchterm = "(.*?)";
    my @found      = grep(/$searchterm/i, @broken);
    foreach (@found) {
        if ($_ =~ m{(.*?)}i) {
            # Print the first whitespace-separated word of the capture.
            my @words = split(' ', $1);
            print "$words[0]\n";
        }
    }
}
exit;

Klaus Eichner · Jul 24, 2005

[snip]

My problem is that I can only get the program to return the
first URL found

if ($response->content =~ m{(.*?)}i)
{
print "$1\n";
}

Now I personally assumed the solution would have been as easy as
changing $1 to $2 to get the second URL, but it doesn't seem so.

No need to change $1, just add the g option at the end of the regular
expression "m{...}i" and make it a while-loop rather than a simple if. That
should do the trick. (see also "perldoc perlop", paragraph "Regexp
Quote-Like Operators")

while ($response->content =~ m{(.*?)}ig)
{
print "$1\n";
}
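
As an aside on the same idea: in list context, a match with the g option returns all of its captures at once, so the loop can be skipped when only the list is needed. The following is a minimal sketch, assuming the page text is already held in a variable. The sample markup and the href pattern are invented for illustration and are not the pattern from the original script.

```perl
use strict;
use warnings;

# Stand-in for the downloaded page; this markup is made up for the example.
my $content = q{
<a href="http://example.com/one">One</a>
<A HREF="http://example.com/two">Two</A>
<a href="http://example.com/three">Three</a>
};

# In list context, a /g match returns every capture in a single step.
# The href pattern below is an assumption, not the thread's pattern.
my @urls = $content =~ m{<a\s+href="(.*?)"}ig;

print "$_\n" for @urls;
```

The while loop is still the better fit when each match needs extra work as it is found; the list form is handy when the URLs are simply collected.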

Eric Amick · Jul 24, 2005

Thanks for the responses. I gave the quoted bit of code a go but
unfortunately it just went into an infinite loop repeating the first
result.

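$response->content is a method call, so every pass of the loop matches
against a freshly returned string. A //g match in scalar context keeps
its position on the scalar it was applied to, so it only advances when
the same variable is matched on each pass. Try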

my $content = $response->content;
while ($content =~ m{(.*?)}ig)

instead.
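
A small sketch of the difference, using only core Perl. Here fetch_content is a hypothetical stand-in for $response->content, and the sample markup and href pattern are assumptions made for the example. The first loop is capped at three passes because it would otherwise never end.

```perl
use strict;
use warnings;

my $page = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>';

# Hypothetical stand-in for $response->content: each call hands back
# a new copy of the page text.
sub fetch_content {
    return $page;
}

# Matching against the call directly: every pass sees a new string with
# no stored match position, so the search restarts from the beginning.
my @restarted;
my $passes = 0;
while (fetch_content() =~ m{href="(.*?)"}ig) {
    push @restarted, $1;
    last if ++$passes >= 3;    # safety cap for the demonstration
}

# Matching against a variable: the position is kept on $content between
# passes, so each pass continues after the previous match.
my $content = fetch_content();
my @found;
while ($content =~ m{href="(.*?)"}ig) {
    push @found, $1;
}

print "direct call:  @restarted\n";
print "via variable: @found\n";
```

The first loop keeps capturing the first URL, while the second visits each URL once and then stops. The stored position can be inspected with pos($content) inside the second loop.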

Klaus Eichner · Jul 24, 2005

Thanks for the responses. I gave the quoted bit of code a go but
unfortunately it just went into an infinite loop repeating the first
result.

I don't think that the "while (...m{...}ig)" is directly responsible for the
infinite loop.

Here is a small, but complete example to demonstrate the principle of "while
(...m{...}ig)":
============================================
use strict;
use warnings;

my $resp = q{
<html>
<body bgcolor="#ffffff">
<title>xxx</title>
item 1 
item 2 
item 3 
item 4 
</body>
</html>
};

while ($resp =~ m{(.*?)}ig)
{
print "$1\n";
}
============================================

The output of that program is:
======================
item 1
item 2
item 3
item 4
======================

...have finally got it to work.

I am happy that you finally succeeded.

[snip]

my $response =
$browser->get("http://www.google.com/search?q=$ARGV[0]");
if ($response->is_success) {
my $content = $response->content;
my @broken = split(" ",$content);
my $searchterm = "(.*?)";
my @found = grep(/$searchterm/i, @broken);
foreach (@found) { if ($_ =~ m{(.*?)}ig) {
@_ = split(' ',$1); print "$_[0]\n"; } }
}
exit;
