From an HTML file, I'd like to extract the href value of an <a> tag which
contains an <img> tag whose src value I'm searching on.
Basically (but there's more!):
<a href="IwantThis.html"><img src="importantimage.gif"></a>
(Un)Interesting part:
I first match a line from the HTML file containing importantimage.gif,
then I try to find my href value on this line.
But this line contains multiple <a> tags (which have href values and
might also have an <img> tag with an associated src value). Also, all of
the <a> tags and <img> tags have more than one attribute.
So the line actually looks something like this:
<a class="red" href="uninteresting.html" target="_new">Not so exciting
text</a><a href="equallyboring.html" class = "blue">yawn</a><a
class="green" href="IwantThis.html"><img border="0"
src="importantimage.gif" alt="MeMe"></a>
My code:
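use warnings;
use strict;
# Open the input file and stop with a message if that fails.
open(my $fh, "<", "4body.html") or die "Cannot open 4body.html: $!";
while (my $line = <$fh>)
{
    # Only look at lines that mention the image at all.
    if ($line =~ /importantimage\.gif/i)
    {
        if ($line =~ /<a.+?href="(.+?)".+?src="importantimage\.gif".+?><\/a>/)
        {
            print $1."\n";
        }
    }
}
close($fh);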
which results in:
uninteresting.html
I think I understand why it gets this value (the match starts at the
first <a on the line), but I can't get the value I want.
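
For reference, here is a minimal sketch of one way the pattern might be
tightened. The non-greedy .+? still begins matching at the leftmost <a,
so the idea is to use negated character classes ([^>]* and [^"]*) so that
no part of the match can run past the end of a tag. The helper name
find_href is hypothetical, and the sketch assumes double-quoted attribute
values and an <img> that sits directly inside the <a> with nothing but
optional whitespace around it. For arbitrary real-world HTML a proper
HTML parser module is likely to be more robust than any regex.

```perl
use strict;
use warnings;

# The sample line from the post, joined back onto a single line.
my $line = '<a class="red" href="uninteresting.html" target="_new">Not so exciting text</a>'
         . '<a href="equallyboring.html" class = "blue">yawn</a>'
         . '<a class="green" href="IwantThis.html"><img border="0" src="importantimage.gif" alt="MeMe"></a>';

# find_href is a hypothetical helper: it returns the href of the <a> tag
# that directly wraps an <img> whose src equals $image, or undef if none.
sub find_href {
    my ($html, $image) = @_;
    return $html =~ m{
        <a\b [^>]*? \bhref="([^"]*)" [^>]* >        # one opening <a> tag, href captured
        \s*
        <img\b [^>]*? \bsrc="\Q$image\E" [^>]* >    # the <img> tag with the wanted src
        \s* </a>                                    # closing tag right after the image
    }xi ? $1 : undef;
}

print find_href($line, 'importantimage.gif'), "\n";   # prints IwantThis.html
```

Because [^>]* cannot cross a ">", an attempt that starts at the first or
second <a fails as soon as the text after that tag is not an <img>, and
the regex engine moves on to the next <a on the line.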