reg exp

Ken Chesak

My Perl script formats text for an HTML page. It changes things like
& to &amp; but should not change &nbsp;. It uses \ as an escape
character, so \&nbsp; becomes &nbsp;. The final results are
correct, but is there a better way to do this?

Input file test.txt
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w

1st change
1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \" w\\\\\\\w
2nd changes
1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; " w\\\w

#!/usr/local/bin/perl5
#
%encode = ( '&' => '&amp;',
'"' => '&quot;',
'\'' => '\'\'' );

$data = `cat test.txt`;
print "Oa= $data\n";
$data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
print "1a= $data\n";
$data =~ s/(\\)(.)/$2/g;
print "1b= $data\n";


This is perl, v5.8.0 built for PA-RISC2.0 On HP-Unix.
 
Gunnar Hjalmarsson

Ken said:
My Perl script formats text for an HTML page. It changes things like
& to &amp; but should not change &nbsp;. It uses \ as an escape
character, so \&nbsp; becomes &nbsp;. The final results are
correct, but is there a better way to do this?

Input file test.txt
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w

1st change
1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \" w\\\\\\\w
2nd changes
1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; " w\\\w

#!/usr/local/bin/perl5
#
%encode = ( '&' => '&amp;',
'"' => '&quot;',
'\'' => '\'\'' );

$data = `cat test.txt`;
print "Oa= $data\n";
$data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
print "1a= $data\n";
$data =~ s/(\\)(.)/$2/g;
print "1b= $data\n";

Don't know about better, but this does it with one substitution, and
does not require escaping of HTML entities in the original text:

$data =~ s{(&#?\w+;)|\\(.)|([&"'])}
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

Another thing is that I'm a bit confused about the wider purpose with
the exercise...
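
For anyone who wants to try the one-pass version on the sample line from the first post, here is a minimal, self-contained sketch. The sub name encode_text and the here-doc input are additions for illustration only and are not part of the original script; the substitution itself is the one shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same mapping as in the original post.
my %encode = (
    '&' => '&amp;',
    '"' => '&quot;',
    "'" => "''",
);

# Hypothetical helper: applies the single substitution to a string.
#   $1 = something that already looks like an entity (kept as is)
#   $2 = the character after a backslash escape (backslash dropped)
#   $3 = a bare & " or ' (replaced via %encode)
sub encode_text {
    my ($data) = @_;
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
              { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;
    return $data;
}

# Single-quoted here-doc: backslashes are taken literally.
my $input = <<'END_INPUT';
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
END_INPUT
chomp $input;

print encode_text($input), "\n";
# prints: HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; " w\\\w
```

The single-quoted here-doc is there so the backslashes in the sample reach the regex untouched. One caveat: the first alternative treats anything shaped like &word; as an entity, so text such as "AT&T;" passes through unencoded.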
 
Joe Smith

Ken said:
My Perl script formats text for an HTML page. It changes things like
& to &amp; but should not change &nbsp;.

You've got bad or inconsistent input data.
Whatever process created the "&nbsp;" items is responsible for making
sure that all the other & occurrences are set to "&amp;". You should
fix the upstream process instead of doing post-processing.
-Joe
 
Ken Chesak

Gunnar,

Thanks, that works nicely. I had not thought of using the ";" to
anchor the HTML entities.

I have one question: what do the ? and : do in the following line?
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

The purpose of the script is to format the text for HTML. It was
originally changing every & to &amp;. So when they started putting &nbsp;
in, that was being changed to &amp;nbsp;, which no longer works as a
non-breaking space.

Thanks again,
Ken
 
Gunnar Hjalmarsson

Ken said:
I have one question: what do the ? and : do in the following line?
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

It's called the conditional operator, and is a shorter way of writing

if ($1) {
    $1
} elsif ($2) {
    $2
} else {
    $encode{$3}
}

See "perldoc perlop".
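
To see that the two forms pick the same value, here is a small sketch that wraps each one in a sub. The names pick_ternary and pick_if are made up for this example, and the three arguments stand in for $1, $2 and $encode{$3}.

```perl
use strict;
use warnings;

# Chained conditional operator. It groups right to left, so this reads as
# $entity ? $entity : ($escaped ? $escaped : $encoded).
sub pick_ternary {
    my ($entity, $escaped, $encoded) = @_;
    return $entity ? $entity : $escaped ? $escaped : $encoded;
}

# The same decision written out with if/elsif/else.
sub pick_if {
    my ($entity, $escaped, $encoded) = @_;
    if ($entity) {
        return $entity;
    } elsif ($escaped) {
        return $escaped;
    } else {
        return $encoded;
    }
}

# Each row mimics one way the regex can match: only one value is set.
for my $case ( [ '&nbsp;', undef, undef   ],
               [ undef,    'H',   undef   ],
               [ undef,    undef, '&amp;' ] ) {
    printf "%s / %s\n", pick_ternary(@$case), pick_if(@$case);
}
```

Both columns of the output should agree on every row, since the two subs make the same truth tests in the same order.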
 
nobull

Gunnar Hjalmarsson said:
It's called the conditional operator, and is a shorter way of writing

if ($1) {
    $1
} elsif ($2) {
    $2
} else {
    $encode{$3}
}

Or a longer way of writing...

$1 || $2 || $encode{$3}

....depending on your point of view.
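
One caveat applies to both the ?: chain and the || chain: they test for truth, not for whether a group matched, so an escaped zero (\0) is dropped from the output. Below is a hedged sketch comparing the || form with a defined()-based variant. The sub names are invented for illustration, and defined() is used rather than the // operator because // needs Perl 5.10 while the original poster is on 5.8.0.

```perl
use strict;
use warnings;

my %encode = ( '&' => '&amp;', '"' => '&quot;', "'" => "''" );

# Truth-based version, using the || chain from the post above.
sub encode_or {
    my ($data) = @_;
    # When $2 is "0" the chain falls through to $encode{$3} with $3 undef;
    # silence the resulting warnings so the behaviour is easy to observe.
    no warnings 'uninitialized';
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}{ $1 || $2 || $encode{$3} }eg;
    return $data;
}

# Variant that asks which group took part in the match instead.
sub encode_defined {
    my ($data) = @_;
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
              { defined($1) ? $1 : defined($2) ? $2 : $encode{$3} }eg;
    return $data;
}

print encode_or('10\0 & 2'), "\n";        # prints: 10 &amp; 2
print encode_defined('10\0 & 2'), "\n";   # prints: 100 &amp; 2
```

For the sample line in the first post the two subs give the same result; they only differ when the escaped character is a literal 0.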
 
