removing some text

Discussion in 'Perl Misc' started by Tony W, Jul 30, 2003.

  1. Tony W

    Tony W Guest

    Hello,

    I know nothing about perl but have a perl script that I need to
    modify.

    The line below removes all web links but leaves the text in the file.

    # Remove existing glossary links
    $newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

    eg. Starts with this:
    <a href="JavaScript:popup('104',380,460);" class="results">rent in
    advance</a>

    ends with this:
    rent in advance

    I don't really understand the code but I know that all current links
    are in the format that I've shown in the example above.

    I want a line that will remove any a href link, except for the term
    (eg landlords - as below)

    <a href="/privrent/landlordresps-360-Een-f0.cfm"
    class="results">Landlords</a>
    Tony W, Jul 30, 2003
    #1

  2. (Tony W) writes:

    > I know nothing about perl but have a perl script that I need to
    > modify.


    The standard reply is that you need to either learn Perl or hire a
    Perl programmer.

    > The line below removes all web links but leaves the text in the file.


    Not very reliably. See FAQ: "How do I remove HTML from a string?"
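
To illustrate why the pattern is fragile, here is a minimal sketch. The helper name strip_glossary_links and both sample strings are made up for this example; only the substitution itself comes from the original post. The capture group [^<]* cannot cross a nested tag, so a link whose text contains markup is silently left in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The substitution from the original post, wrapped in a helper.
# strip_glossary_links is a hypothetical name used only for this sketch.
sub strip_glossary_links {
    my ($html) = @_;
    $html =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;
    return $html;
}

# Made-up sample inputs: one plain link, one with a nested <b> tag.
my $plain  = q(<a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>);
my $nested = q(<a href="JavaScript:popup('104',380,460);" class="results"><b>rent</b> in advance</a>);

print strip_glossary_links($plain),  "\n";   # link removed, text kept
print strip_glossary_links($nested), "\n";   # unchanged: [^<]* stops at <b>
```

If every link really is in the plain format, this limitation does not bite, but any markup inside the link text will slip through.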

    > # Remove existing glossary links
    > $newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;
    >
    > eg. Starts with this:
    > <a href="JavaScript:popup('104',380,460);" class="results">rent in
    > advance</a>
    >
    > ends with this:
    > rent in advance
    >
    > I don't really understand the code but I know that all current links
    > are in the format that I've shown in the example above.
    >
    > I want a line that will remove any a href link, except for the term
    > (eg landlords - as below)
    >
    > <a href="/privrent/landlordresps-360-Een-f0.cfm"
    > class="results">Landlords</a>


    A simple HTML::Filter should do that.

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley, Jul 30, 2003
    #2

  3. John J. Trammell, Jul 30, 2003
    #3
  4. Tony W <> wrote:

    > The line below removes all web links but leaves the text in the file.



    > # Remove existing glossary links
    > $newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

                            ^           ^
                            ^           ^

    Neither of those backslashes is needed, making the experience
    level of whoever wrote this code questionable...
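
A quick way to check this claim, as a sketch: run the substitution with and without the two backslashes on the sample link from the original post and compare the results. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $link = q(<a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>);

# Original pattern, with the backslashes before the quote and the colon.
(my $with = $link) =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

# Same pattern without them. The backslash before "(" stays (it makes the
# parenthesis literal), and so does the one before "/" (the delimiter).
(my $without = $link) =~ s/<A HREF="Javascript:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

print $with eq $without ? "same result: $with\n" : "results differ\n";
```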


    > eg. Starts with this:
    ><a href="JavaScript:popup('104',380,460);" class="results">rent in
    > advance</a>



    > I know that all current links
    > are in the format that I've shown in the example above.



    That is a profoundly important caveat.

    It is the one that lets you ignore the usual response to your
    "remove HTML" FAQ.


    > I want a line that will remove any a href link, except for the term
    > (eg landlords - as below)
    >
    ><a href="/privrent/landlordresps-360-Een-f0.cfm"

    ^^^^^
    ^^^^^ where is the "Javascript:popup" part?

    > class="results">Landlords</a>



    So, you want to strip the <a> tags from links formatted like that
    first one, and you don't care if it is easily broken by legal HTML?

    With all of that out of the way, then you might try doing
    it with a regex.

    But your problem specification is wrong somewhere: the pattern above
    should *already* be leaving that Landlords one alone, since it does
    not have "JavaScript" in it...


    ---------------------------------------------
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = q(
    <a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>
    <a href="JavaScript:popup('104',380,460);" class="results">Landlords</a>
    <a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>
    );


    s/(<A HREF="Javascript:popup\('[0-9]+',[^>]*>([^<]*)<\/A>)/
    $2 eq 'Landlords' ? $1 : $2
    /ige;

    print;
    ---------------------------------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jul 30, 2003
    #4
  5. Tony W

    Tony W Guest

    Apologies. I don't think I explained it properly.

    The perl script is part of a process that nightly removes old html
    links and then adds new ones. The current links are in the format:

    <a href="JavaScript:popup('104',380,460);" class="results">rent in
    advance</a>

    this is used to run a javascript function that opens a small window
    showing a glossary definition of the term 'rent in advance'. But now
    it is required to work differently. Now it is just going to be a
    straightforward link to another page such as:

    <a href="/privrent/landlord-360-Een-f0.cfm"
    class="results">Landlords</a>
    <a href="/homeless/index-1292-Een-f0.cfm" class="results">tenure</a>

    There will be no javascript:popup text. Therefore what I require is
    some code to get rid of the html anchor, for example, change:

    <a href="/privrent/landlordresps-360-Een-f0.cfm"
    class="results">Landlords</a>

    to:

    Landlords

    ----
    thanks
    Tony W, Jul 31, 2003
    #5
  6. Tony W <> wrote:
    > (Tad McClellan) wrote in message news:<>...
    >> Tony W <> wrote:



    >> > I want a line that will remove any a href link, except for the term
    >> > (eg landlords - as below)



    >> s/(<A HREF="Javascript:popup\('[0-9]+',[^>]*>([^<]*)<\/A>)/
    >> $2 eq 'Landlords' ? $1 : $2
    >> /ige;



    > There will be no javascript:popup text. Therefore what I require is
    > some code to get rid of the html anchor,



    You already have code to get rid of the html anchor. Modify
    the code I gave you so that it does not require the Javascript part.
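
One way that modification might look, as a sketch rather than a definitive answer: drop the Javascript:popup(...) requirement and the Landlords special case, and accept any href value. It assumes, as stated earlier in the thread, that every link has href as its first attribute and plain text (no nested tags) between the tags. The \s+ between A and HREF is a small generalization not in the original pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample links in the new format, taken from the follow-up post.
$_ = q(
<a href="/privrent/landlordresps-360-Een-f0.cfm"
class="results">Landlords</a>
<a href="/homeless/index-1292-Een-f0.cfm" class="results">tenure</a>
);

# Match any <a href=...> ... </a> pair and keep only the link text.
# [^>]* also matches newlines, so an opening tag split across lines is fine.
s/<A\s+HREF=[^>]*>([^<]*)<\/A>/$1/ig;

print;
```

Because the href value is no longer constrained, this also strips the old JavaScript:popup links, which suits a nightly job that removes all existing links before adding new ones.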


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jul 31, 2003
    #6
