function modification


abraxas

Hello,
I'm just a beginner with Perl and I have a little problem.
I have an HTML page with several links and a function that searches for the
first of those links. The function is the following:
sub search_first_link {
    my $cadena = $_[0];
    # Find the first anchor tag, then its href attribute and the closing '>'.
    my $p = index($cadena, '<a');
    return -1 if $p < 0;
    my $p1 = index($cadena, 'href', $p);
    return -1 if $p1 < 0;
    my $p2 = index($cadena, '>', $p1);
    return -1 if $p2 < 0;
    my $subcadena = substr($cadena, $p1, $p2 - $p1);
    # Capture the double-quoted value of href.
    return $1 if $subcadena =~ /href="([^"]+)"/s;
    return -1;
}

It takes the HTML code as a parameter and returns the URL of the first link
that it finds. For example, if the first link that the function finds is the
following:
<a href="http://www.page.com">
My Page
</a>

it returns "http://www.page.com".

My problem is that I also want the function to return the text of the link,
"My Page".
How can I modify the function to do that?

Thanks, and sorry for my awful English! :)
 

Jürgen Exner

abraxas said:
I have an HTML page with several links and a function that searches for
the first of those links. The function is the following:
[home-cooked code based on REs snipped]
It takes the HTML code as a parameter and returns the URL of the
first link that it finds. For example, if the first link that the
function finds is the following:
<a href="http://www.page.com">
My Page
</a>

And it fails on a myriad of legal HTML code, e.g. as soon as the tag name
is in upper case. While this particular problem may be easy to fix, there are
gazillions more 'easy-to-fix' issues which, taken as a whole, make it a
waste of time to even start fixing them when you consider that there are
ready-made modules that do the job perfectly.
My problem is that I also want the function to return the text of the
link, "My Page".

Actually, no. Your problem is that you are trying to parse HTML using regular
expressions.
As explained in 'perldoc -q HTML' ("How do I remove HTML from a string?"), a
simple RE-based approach may work on simple HTML code where you have full
control of the source code. But nobody in their right mind would attempt to
write a general HTML parser using REs.

You will be way better off using one of the HTML parser modules from CPAN.
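
For illustration, here is a minimal sketch of the module-based approach, assuming HTML::TokeParser (part of the HTML-Parser distribution on CPAN) is installed. The function name is reused from the original post; returning an empty list instead of -1 when nothing is found is a choice made for this sketch, not something prescribed by the module.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

# Return (url, text) for the first <a> tag that has an href attribute,
# or an empty list if the document contains no such link.
sub search_first_link {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    # get_tag('a') skips ahead to the next <a ...> start tag, regardless
    # of upper/lower case or how the attributes are quoted.
    while (my $tag = $parser->get_tag('a')) {
        my $url = $tag->[1]{href};    # attribute names arrive lower-cased
        next unless defined $url;     # e.g. <a name="..."> has no href

        # Collect the text up to the closing </a>, with whitespace trimmed.
        my $text = $parser->get_trimmed_text('/a');
        return ($url, $text);
    }
    return;
}

my $html = qq{<p>Intro</p>\n<A HREF="http://www.page.com">\nMy Page\n</A>\n};
my ($url, $text) = search_first_link($html);
print defined $url ? "$url | $text\n" : "no link found\n";
```

With the sample document above, this should print the URL and "My Page" even though the tag and attribute names are in upper case.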

jue
 

Gunnar Hjalmarsson

abraxas said:
I'm just a beginner with Perl and I have a little problem. I have
an HTML page with several links and a function that searches for the
first of those links. The function is the following:
sub search_first_link {
$cadena = $_[0];
$p = index ($cadena, '<a');
return -1 if ($p < 0);
$p1 = index ($cadena, 'href', $p);
return -1 if ($p1 < 0);
$p2 = index ($cadena, '>', $p1);
return -1 if ($p2 < 0);
$subcadena = substr ($cadena, $p1, $p2-$p1);
$reg_exp = "href=\"([^\"]+)\"";
return ($1) if ($subcadena =~ /$reg_exp/s);
return (-1);
}

From where did you copy that function?
My problem is that I also want the function to return the text of
the link, "My Page".
How can I modify the function to do that?

This may work, but only under certain conditions:

sub search_first_link {
my $reg_exp = 'href="([^"]+)"[^>]*>([^<]+)<';
return $1, $2 if shift =~ /$reg_exp/i;
return -1;
}
my ($url, $text) = search_first_link($html);

But you'd better follow Jürgen's advice and use a module.
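
To make "certain conditions" concrete, the sketch below runs the same pattern against a few inputs. The helper name first_link_regex and the sample strings are made up for this illustration, and the helper returns an empty list rather than -1 so that the caller can simply test whether a URL came back.

```perl
use strict;
use warnings;

# Same pattern as in the function above, wrapped in a hypothetical helper.
sub first_link_regex {
    my ($html) = @_;
    my $reg_exp = 'href="([^"]+)"[^>]*>([^<]+)<';
    return ($1, $2) if $html =~ /$reg_exp/i;
    return;    # empty list when nothing matches
}

my @samples = (
    # Matches; the captured text still contains the surrounding newlines.
    qq{<a href="http://www.page.com">\nMy Page\n</a>},
    # Single-quoted attribute value: no match.
    qq{<a href='http://www.page.com'>My Page</a>},
    # Link text wrapped in another tag: no match.
    qq{<a href="http://www.page.com"><b>My Page</b></a>},
    # An earlier non-anchor tag with href wins: wrong URL, whitespace-only text.
    qq{<link href="style.css">\n<a href="http://www.page.com">My Page</a>},
);

for my $html (@samples) {
    my ($url, $text) = first_link_regex($html);
    if (defined $url) {
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        print "url=$url text=$text\n";
    }
    else {
        print "no match\n";
    }
}
```

Only the first sample yields the expected pair, which is the practical argument for a real parser.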
 
