reg exp

Discussion in 'Perl' started by Ken Chesak, Aug 30, 2004.

  1. Ken Chesak

    Ken Chesak Guest

    My Perl script formats text for an HTML page. It changes characters
    such as & to &amp; but should leave entities such as &nbsp; alone.
    It uses \ as an escape character, so \&nbsp; becomes &nbsp;. The
    final results are correct, but is there a better way to do this?

    Input file test.txt
    \HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w

    1st change
    1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \"
    w\\\\\\\w
    2nd changes
    1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; "
    w\\\w

    #!/usr/local/bin/perl5
    #
    %encode = ( '&' => '&amp;',
    '"' => '&quot;',
    '\'' => '\'\'' );

    $data = `cat test.txt`;
    print "Oa= $data\n";
    $data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
    print "1a= $data\n";
    $data =~ s/(\\)(.)/$2/g;
    print "1b= $data\n";


    This is perl, v5.8.0 built for PA-RISC2.0 On HP-Unix.
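
For reference, the two passes above can be reproduced without test.txt. This is a minimal sketch, assuming the sample line is held in a string rather than read with cat; the sub name two_pass is made up for illustration. The last line shows one input the lookbehind approach arguably mishandles: an escaped backslash followed by a bare &.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same table as in the post: characters to encode and their replacements.
my %encode = ( '&' => '&amp;', '"' => '&quot;', "'" => "''" );

# two_pass() is a hypothetical wrapper around the two substitutions above.
sub two_pass {
    my ($data) = @_;
    # Pass 1: encode any mapped character that is not preceded by a backslash.
    $data =~ s/(?<!\\)(.)/defined($encode{$1}) ? $encode{$1} : $1/eg;
    # Pass 2: drop each escaping backslash, keeping the character after it.
    $data =~ s/\\(.)/$1/g;
    return $data;
}

# Sample line from test.txt (a single-quoted heredoc keeps backslashes literal).
my $input = <<'END';
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
END

print two_pass($input);

# The string below is a, two backslashes, &, b. The & is left unencoded
# because the lookbehind only sees a backslash directly in front of it.
print two_pass('a\\\\&b'), "\n";    # a\&b
```

If an escaped backslash followed by & is meant to produce a literal backslash and an encoded ampersand, the two-pass version does not deliver that.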
     
    Ken Chesak, Aug 30, 2004
    #1

  2. Gunnar Hjalmarsson

    Ken Chesak wrote:
    > Perl scipt is formatting text for HTML page. It changes things like
    > an & to &amp. But should not change &nbsp. It uses \ as an escape
    > character. So \&nbsp will become &nbsp. The final results are
    > correct, but is there a better way to do this?
    >
    > Input file test.txt
    > \HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
    >
    > 1st change
    > 1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \"
    > w\\\\\\\w
    > 2nd changes
    > 1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; "
    > w\\\w
    >
    > #!/usr/local/bin/perl5
    > #
    > %encode = ( '&' => '&amp;',
    > '"' => '&quot;',
    > '\'' => '\'\'' );
    >
    > $data = `cat test.txt`;
    > print "Oa= $data\n";
    > $data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
    > print "1a= $data\n";
    > $data =~ s/(\\)(.)/$2/g;
    > print "1b= $data\n";


    Don't know about better, but this does it with one substitution, and
    does not require escaping of HTML entities in the original text:

    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
    { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

    Another thing is that I'm a bit confused about the wider purpose with
    the exercise...
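
To try the single substitution on the sample line, here is a minimal self-contained sketch. It assumes the same %encode table as the original script; the sub name encode_once and the second test string are invented for illustration.

```perl
use strict;
use warnings;

my %encode = ( '&' => '&amp;', '"' => '&quot;', "'" => "''" );

# encode_once() is a hypothetical wrapper around the one-pass substitution.
# Alternatives are tried left to right at each position:
#   $1  an existing entity such as &nbsp; or &#169;  -> kept as is
#   $2  the character after an escaping backslash    -> kept, backslash dropped
#   $3  a bare &, " or '                             -> looked up in %encode
sub encode_once {
    my ($data) = @_;
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
              { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;
    return $data;
}

# Part of the sample line from the original post.
print encode_once(q{\HOME & \&nbsp; BORN \& FREE ' \' HELP " \"}), "\n";

# Entities no longer need a backslash in front of them.
print encode_once('Tom & Jerry&nbsp;&#169; 2004'), "\n";
```

The first line prints the same text as the 1b= output for that part of the input; the second prints Tom &amp; Jerry&nbsp;&#169; 2004.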

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 30, 2004
    #2

  3. Joe Smith

    Joe Smith Guest

    Ken Chesak wrote:

    > Perl scipt is formatting text for HTML page. It changes things like
    > an & to &amp. But should not change &nbsp.


    You've got bad or inconsistent input data.
    Whatever process created the "&nbsp;" items is responsible for making
    sure that all the other & occurrences are set to "&amp;". You should
    fix the upstream process instead of doing post-processing.
    -Joe
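
One way to read this advice: escape the raw text at the point where markup is added, so nothing downstream has to guess which & is already part of an entity. A minimal sketch, assuming the upstream step is also written in Perl; html_escape and the sample fields are hypothetical and only stand in for whatever process builds the text.

```perl
use strict;
use warnings;

# html_escape() is a hypothetical helper: it encodes raw text only,
# before any entities such as &nbsp; are added around it.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-encode the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Raw fields are escaped one by one, then joined with markup.
my @fields = ( 'Born & Free', 'R&D "lab"' );
my $html   = join '&nbsp;', map { html_escape($_) } @fields;

print "$html\n";    # Born &amp; Free&nbsp;R&amp;D &quot;lab&quot;
```

With this ordering the &nbsp; separators are never passed through the escaping step, so no backslash convention is needed.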
     
    Joe Smith, Aug 30, 2004
    #3
  4. Ken Chesak

    Ken Chesak Guest

    Gunnar Hjalmarsson <> wrote in message news:<ehJYc.102165$>...
    > Don't know about better, but this does it with one substitution, and
    > does not require escaping of HTML entities in the original text:
    >
    > $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
    > { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;
    >
    > Another thing is that I'm a bit confused about the wider purpose with
    > the exercise...


    Gunnar,

    Thanks, that works nicely. I had not thought of using the ";" to
    anchor the HTML entities.

    I have one question: what do the ? and : do on the following line?
    { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

    The purpose of the script is to format the text for HTML. It was
    originally changing every & to &amp;. So when people started putting
    &nbsp; in the text, it was changed to &amp;nbsp;, which does not mean
    anything to HTML.
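
The failure described here is easy to reproduce. A minimal sketch, assuming the earlier script did a blanket s/&/&amp;/g (the thread does not show that version); the lookahead variant is one possible alternative, not the code from the thread, and the sample text is made up.

```perl
use strict;
use warnings;

my $text = 'BORN&nbsp;FREE & HOME';

# Blanket replacement: the & that starts &nbsp; is encoded as well.
(my $naive = $text) =~ s/&/&amp;/g;
print "$naive\n";      # BORN&amp;nbsp;FREE &amp; HOME

# Skip any & that already starts something shaped like an entity
# (&name; or &#123;), using a negative lookahead.
(my $careful = $text) =~ s/&(?!#?\w+;)/&amp;/g;
print "$careful\n";    # BORN&nbsp;FREE &amp; HOME
```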

    Thanks again,
    Ken
     
    Ken Chesak, Aug 31, 2004
    #4
  5. Gunnar Hjalmarsson

    Ken Chesak wrote:
    > I had one question, what does the ? and : do on the following line,
    > { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;


    It's called the conditional operator, and is a shorter way of writing

    if ($1) {
        $1
    } elsif ($2) {
        $2
    } else {
        $encode{$3}
    }

    See "perldoc perlop".
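
A small standalone sketch of the same chaining, outside a regex. The sub names pick_ternary and pick_if are invented for illustration; both return the first true argument and fall back to the third.

```perl
use strict;
use warnings;

# Chained conditional operator: COND ? THEN : ELSE, nested in the ELSE branch.
sub pick_ternary {
    my ($x, $y, $z) = @_;
    return $x ? $x : $y ? $y : $z;
}

# The same logic spelled out with if/elsif/else.
sub pick_if {
    my ($x, $y, $z) = @_;
    if    ($x) { return $x }
    elsif ($y) { return $y }
    else       { return $z }
}

# Each row mimics one alternative of the regex having matched.
for my $args ( [ 'a', 'b', 'c' ], [ undef, 'b', 'c' ], [ undef, undef, 'c' ] ) {
    print pick_ternary(@$args), ' ', pick_if(@$args), "\n";
}
```

In the substitution, $1, $2 and $3 play the roles of $x, $y and $z: the capture groups of alternatives that did not match are undefined, and undef is false.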

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Aug 31, 2004
    #5
  6. Ken Chesak

    Guest

    Gunnar Hjalmarsson <> wrote in message news:<917Zc.102309$>...
    > Ken Chesak wrote:
    > > I had one question, what does the ? and : do on the following line,
    > > { $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

    >
    > It's called the conditional operator, and is a shorter way of writing
    >
    > if ($1) {
    > $1
    > } elsif ($2) {
    > $2
    > } else {
    > $encode{$3}
    > }


    Or a longer way of writing...

    $1 || $2 || $encode{$3}

    ...depending on your point of view.
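
One caveat applies to both spellings: they test for truth, not definedness, so an escaped 0 in the input (\0) counts as false and falls through to $encode{$3}. A minimal sketch of the difference, assuming Perl 5.10 or later for the defined-or operator // (the 5.8.0 mentioned at the top of the thread does not have it); the sub names are hypothetical.

```perl
use strict;
use warnings;

my %encode = ( '&' => '&amp;', '"' => '&quot;', "'" => "''" );

# Truth-based version, as suggested in the thread.
sub encode_or {
    my ($data) = @_;
    no warnings 'uninitialized';    # $3 is undef when \0 falls through
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}{ $1 || $2 || $encode{$3} }eg;
    return $data;
}

# Definedness-based version: keeps whichever group actually matched.
sub encode_dor {
    my ($data) = @_;
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}{ $1 // $2 // $encode{$3} }eg;
    return $data;
}

# The input is: a, backslash, 0, b, space, &, space, c.
print encode_or('a\0b & c'),  "\n";    # ab &amp; c
print encode_dor('a\0b & c'), "\n";    # a0b &amp; c
```

On a Perl without //, the same effect can be had with defined($1) ? $1 : defined($2) ? $2 : $encode{$3}.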
     
    , Sep 1, 2004
    #6