HTML::LinkExtor or me ?

Discussion in 'Perl Misc' started by Saya, May 5, 2004.

  1. Saya

    Saya Guest

    Hi,

    This is the code:

    sub Escape{
    $item = shift;

    use HTML::LinkExtor;

    $p = HTML::LinkExtor->new(\&replaceURL, "");
    $p->parse($item);

    return $item;
    }


    sub replaceURL {

    my(@links) = @_;


    my $makeSubstitution = false;
    my $newLink;

    foreach my $link (@links) {
    #$link =~ s/\/$//i;
    $makeSubstitution = compareValues($link);

    if ($makeSubstitution eq true) {
    if($link =~ /http|www/) {
    if ($link !~ /http/) {
    $newLink = "http://" . $link;
    }
    else {
    $newLink = $link;
    }
    $item =~ s/href=\"$link/href=\"\/redirect.asp?forwardURL=$newLink/i;
    }
    }
    else {
    if($link =~ /http|www/) {
    if ($link !~ /http/) {
    $item=~ s/href=\"$link/href=\"http:\/\/$link/i;
    }
    }
    }
    }
    }

    sub compareValues {
    my $link = shift;

    my @safeLinkArr;
    @safeLinkArr = getSafeSites();
    my $sizeOfArray = @safeLinkArr;
    my $result = true;

    if($sizeOfArray eq 0) {
    return $result;
    }

    foreach my $safeLink (@safeLinkArr) {

    if ( (0 <= (index($link, $safeLink))) or (0 <= (index($safeLink,
    $link))) ) {
    $result = false;
    last;
    }
    else {
    $result = true;
    }
    }

    return $result;
    }



    sub getSafeSites {
    use XML::DOM;

    my $count;
    my $WAPath;
    my @linkArr;


    foreach $arg (@ARGV)
    {
    if ($ARGV[$count] eq '-iw_include-location')
    {
    $WAPath = $ARGV[$count + 1];
    }
    $count++;
    }

    my $nonRedirectList = $WAPath . "/include/nonRedirectList.xml";

    # --- Parsing the XML file ---
    my $parser = XML::DOM::parser->new();
    my $doc = $parser->parsefile($nonRedirectList);

    # --- get all tags ---
    my $links = $doc->getElementsByTagName('Link');
    my $link;

    for my $i (0..$links->getLength()-1) {
    $link = $links->item($i);

    if ($link->getFirstChild->getNodeValue) {
    @linkArr[$i] = $link->getFirstChild->getNodeValue;
    }
    $i++;
    }

    $doc->dispose;

    return @linkArr;
    }

    Escape($item);

    $item = is real scenario is text + <a> + text <a> etc.

    For some reason that I do not understand some links are not parsed
    correctly. Does anyone have a reason for why this might be happening ?

    I have looked at this problem for 2 days now, and can not find the
    problem, so any help will be greatly appreciated :)

    /Saya
    Saya, May 5, 2004
    #1
    1. Advertising

  2. Saya

    Gisle Aas Guest

    (Saya) writes:

    > This is the code:
    >
    > sub Escape{
    > $item = shift;
    >
    > use HTML::LinkExtor;
    >
    > $p = HTML::LinkExtor->new(\&replaceURL, "");
    > $p->parse($item);
    >
    > return $item;
    > }


    What are you actually trying to do? Please describe that and remove
    unrelated details from your example program before you post.

    If you want to do substitutions on links in an HTML document, then
    this example program might be a good start.

    http://search.cpan.org/src/GAAS/HTML-Parser-3.36/eg/hrefsub

    --
    Gisle Aas
    Gisle Aas, May 6, 2004
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mark Kamoski
    Replies:
    1
    Views:
    7,077
  2. Mitchua
    Replies:
    1
    Views:
    7,048
    Ice Demon
    Jul 15, 2003
  3. Laura
    Replies:
    1
    Views:
    518
    Gunnar Hjalmarsson
    Jun 5, 2004
  4. Adam Akhtar
    Replies:
    9
    Views:
    508
    Florian Gilcher
    Aug 16, 2008
  5. Helmut Blass

    Problem with HTML::LinkExtor

    Helmut Blass, Oct 1, 2004, in forum: Perl Misc
    Replies:
    0
    Views:
    66
    Helmut Blass
    Oct 1, 2004
Loading...

Share This Page