removing some text

Tony W

Hello,

I know nothing about Perl but have a Perl script that I need to
modify.

The line below removes all web links but leaves the text in the file.

# Remove existing glossary links
$newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

eg. Starts with this:
<a href="JavaScript:popup('104',380,460);" class="results">rent in
advance</a>

ends with this:
rent in advance

I don't really understand the code but I know that all current links
are in the format that I've shown in the example above.

I want a line that will remove any a href link, leaving only the term
itself (e.g. Landlords, as below)

<a href="/privrent/landlordresps-360-Een-f0.cfm"
class="results">Landlords</a>
 
Brian McCauley

I know nothing about perl but have a perl script that I need to
modify.

The standard reply is that you need to either learn Perl or hire a Perl
programmer.

The line below removes all web links but leaves the text in the file.

Not very reliably. See the FAQ: "How do I remove HTML from a string?"

# Remove existing glossary links
$newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;

eg. Starts with this:
<a href="JavaScript:popup('104',380,460);" class="results">rent in
advance</a>

ends with this:
rent in advance

I don't really understand the code but I know that all current links
are in the format that I've shown in the example above.

I want a line that will remove any a href link, except for the term
(eg landlords - as below)

<a href="/privrent/landlordresps-360-Een-f0.cfm"
class="results">Landlords</a>

A simple HTML::Filter should do that.

 
Tad McClellan

Tony W said:
The line below removes all web links but leaves the text in the file.

# Remove existing glossary links
$newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig;
                      ^           ^
                      ^           ^

Neither of those backslashes is needed, which makes the experience
level of whoever wrote this code questionable...
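
For reference, here is a rough sketch of the same substitution with the two unneeded backslashes dropped and the pattern spread out under the /x flag, so that each piece can carry a comment. The variable name $newbody comes from the original script; the sample string is the example link from the question, joined onto one line. It is only meant as an aid to reading the pattern and should behave the same as the original line.

```perl
use strict;
use warnings;

# Sample input: the example link from the question, on one line.
my $newbody = q(<a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>);

# Same pattern as the original, minus the two unneeded backslashes.
# Under /x, whitespace in the pattern is ignored, so the literal
# space after "<A" has to be escaped.
$newbody =~ s{
    <A\ HREF="Javascript:popup\('   # start of the opening tag, up to the first quote
    [0-9]+',                        # numeric glossary id, closing quote, comma
    [^>]*>                          # whatever is left of the opening tag
    ([^<]*)                         # the link text, captured as group 1
    </A>                            # the closing tag
}{$1}xig;

print "$newbody\n";   # prints: rent in advance
```

The /i flag is what lets the upper-case pattern match the lower-case "<a href" in the real pages.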

eg. Starts with this:
<a href="JavaScript:popup('104',380,460);" class="results">rent in
advance</a>

I know that all current links
are in the format that I've shown in the example above.


That is a profoundly important caveat.

It is the one that lets you ignore the usual response to your
"remove HTML" FAQ.

I want a line that will remove any a href link, except for the term
(eg landlords - as below)

<a href="/privrent/landlordresps-360-Een-f0.cfm"
         ^^^^^
         ^^^^^ where is the "Javascript:popup" part?
class="results">Landlords</a>


So, you want to remove the <a> tags only from links formatted like that
first one, and you don't care if it is easily broken by legal HTML?

With all of that out of the way, then you might try doing
it with a regex.

But your problem specification is wrong somewhere: the pattern above
should *already* be leaving that Landlords one alone, since it does not
have "JavaScript" in it...
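
A quick way to check that claim is to run the original substitution against the Landlords link and see whether anything changes. This is only a sketch: the variable names $link and $count are made up for the test, and the link is the one from the question joined onto one line.

```perl
use strict;
use warnings;

# The new-style link from the question; it has no "JavaScript:popup" in it.
my $link = q(<a href="/privrent/landlordresps-360-Een-f0.cfm" class="results">Landlords</a>);

my $newbody = $link;

# The substitution from the original script, unchanged.
# s/// returns the number of substitutions made, or false if there were none.
my $count = ($newbody =~ s/<A HREF=\"Javascript\:popup\('[0-9]+',[^>]*>([^<]*)<\/A>/$1/ig);

print $count ? "changed\n" : "left alone\n";   # prints: left alone
```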


---------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

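# Sample input: three popup links, one of them for "Landlords".
$_ = q(
<a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>
<a href="JavaScript:popup('104',380,460);" class="results">Landlords</a>
<a href="JavaScript:popup('104',380,460);" class="results">rent in advance</a>
);

# $1 is the whole link, $2 is just the link text. Keep the whole link
# when the text is "Landlords", otherwise keep only the text.
s/(<A HREF="Javascript:popup\('[0-9]+',[^>]*>([^<]*)<\/A>)/
$2 eq 'Landlords' ? $1 : $2
/ige;

print;
---------------------------------------------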
 
Tony W

Apologies. I think I might not have explained it properly.

The Perl script is part of a process that nightly removes old HTML
links and then adds new ones. The current links are in the format:

<a href="JavaScript:popup('104',380,460);" class="results">rent in
advance</a>

This is used to run a JavaScript function that opens a small window
showing a glossary definition of the term 'rent in advance'. But now
it is required to work differently: each term is just going to be a
straightforward link to another page, such as:

<a href="/privrent/landlord-360-Een-f0.cfm"
class="results">Landlords</a>
<a href="/homeless/index-1292-Een-f0.cfm" class="results">tenure</a>

There will be no javascript:popup text. Therefore what I require is
some code to get rid of the HTML anchor, for example, to change:

<a href="/privrent/landlordresps-360-Een-f0.cfm"
class="results">Landlords</a>

to:

Landlords
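
One possible line for that, as a sketch only: it leans on the caveat discussed earlier in the thread, namely that every link has plain text between the tags and no ">" inside any attribute value. The variable name $newbody is from the original script; the sample sentence around the two links is made up for illustration.

```perl
use strict;
use warnings;

# Sample input built from the two new-style links in the post above.
my $newbody = q(See <a href="/privrent/landlordresps-360-Een-f0.cfm"
class="results">Landlords</a> and <a href="/homeless/index-1292-Een-f0.cfm" class="results">tenure</a>.);

# Remove any <a ...>text</a> pair, keeping only the text between the tags.
# [^>]* also matches a newline, so a tag split across two lines is handled.
$newbody =~ s{<a\s[^>]*>([^<]*)</a>}{$1}gi;

print "$newbody\n";   # prints: See Landlords and tenure.
```

Because the pattern no longer looks for "Javascript:popup", it also strips the old-style popup links, and any other <a> tag for that matter, so it is only safe if every link in these files is a glossary link. Link text containing other tags (for example <b>) is left untouched.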
 
