'+' messing up regular expression

Discussion in 'Perl Misc' started by Chris Johnson, Sep 16, 2005.

  1. I've written a CGI script that basically emulates the Apache default
    page, but with more customizations. One of these is the addition of
    content above the file list, and I've decided to use Wikipedia-esque
    shorthand.

    I've got it pretty much working. Except there are some problems with
    the link conversion. (In case you've never seen it,
    [[http://www.google.com|Google]] translates to <a
    href="http://www.google.com">Google</a>)

    I've found that if there's a '+' in the string to be replaced, it
    simply won't be replaced. Here's the code, which works in almost every
    situation:

    while(/\[\[(.*?)\]\]/g){
        $new = $1;
        if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
            s/\[\[$1\|$2\]\]/$new/g;
        }
    }

    The specific input that's having trouble is

    [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]

    but the peculiar thing is that if I remove the +'s, it makes the
    replacement fine (except for the fact that the link is no longer
    valid). So does anyone see why this is happening?

    Thanks for your time,
    Chris
     
    Chris Johnson, Sep 16, 2005
    #1

  2. "Chris Johnson" wrote:

    > while(/\[\[(.*?)\]\]/g){
    > $new = $1;
    > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
    > s/\[\[$1\|$2\]\]/$new/g;
    > }
    > }
    >
    > The specific input that's having trouble is
    >
    > [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]


    #!/usr/bin/perl

    use strict;
    use warnings;

    my $s = '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]';

    if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
        print qq{<a href="$1">$2</a>\n};
    }

    __END__

    D:\Home\asu1\UseNet\clpmisc> c
    <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Sep 16, 2005
    #2

  3. A. Sinan Unur wrote:
    > "Chris Johnson" wrote:
    >
    > > while(/\[\[(.*?)\]\]/g){
    > > $new = $1;
    > > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
    > > s/\[\[$1\|$2\]\]/$new/g;
    > > }
    > > }
    > >
    > > The specific input that's having trouble is
    > >
    > > [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]

    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > my $s = '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]';
    >
    > if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
    > print qq{<a href="$1">$2</a>\n};
    > }
    >
    > __END__
    >
    > D:\Home\asu1\UseNet\clpmisc> c
    > <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>


    I should clarify, it seems. The input is a text file. I do not simply
    want to print the matched patterns; I want to replace the text, and
    then print the entire contents of the file. What I'm curious about is
    why it won't run the s/$old/$new/g if there's a '+' in $old.

    Incidentally, if I change the code to:

    while(/\[\[(.*?)\]\]/g){
        $new = $1;
        if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
            $old = "[[$1|$2]]";
            s/$old/$new/g;
        }
    }

    I get the following error:

    Invalid [] range "w-t" in regex; marked by <-- HERE in
    m/[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-t <-- HERE
    ools]]/ at index.cgi line 89.
     
    Chris Johnson, Sep 16, 2005
    #3
  4. "Chris Johnson" wrote:

    > A. Sinan Unur wrote:
    >> "Chris Johnson" wrote:
    >>
    >> > while(/\[\[(.*?)\]\]/g){
    >> > $new = $1;
    >> > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
    >> > s/\[\[$1\|$2\]\]/$new/g;
    >> > }
    >> > }
    >> >
    >> > The specific input that's having trouble is
    >> >
    >> > [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]

    >>
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> my $s =
    >> '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]';
    >>
    >> if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
    >> print qq{<a href="$1">$2</a>\n};
    >> }
    >>
    >> __END__
    >>
    >> D:\Home\asu1\UseNet\clpmisc> c
    >> <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>

    >
    > I should clarify, it seems. The input is a text file. I do not simply
    > want to print the matched patterns; I want to replace the text, and
    > then print the entire contents of the file. What I'm curious about is
    > why it won't run the s/$old/$new/g if there's a '+' in $old.


    Because + and - are special in regexes.

    It seems like you need to read the docs.

    From perldoc perlop:

    \Q quote non-word characters till \E

    So, for example:

    use strict;
    use warnings;

    my $test = 'Sinan+Unur';
    my $old = '+';
    my $new = ' ';

    $test =~ s/$old/$new/g;

    print "$test\n";


    __END__

    D:\Home\asu1\UseNet\clpmisc> c
    Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
    / at D:\Home\asu1\UseNet\clpmisc\c.pl line 8.

    Whereas:

    use strict;
    use warnings;

    my $test = 'Sinan+Unur';
    my $old = '+';
    my $new = ' ';

    $test =~ s/\Q$old\E/$new/g;

    print "$test\n";

    __END__

    D:\Home\asu1\UseNet\clpmisc> c
    Sinan Unur

     
    A. Sinan Unur, Sep 16, 2005
    #4
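As a minimal sketch, the \Q...\E escape shown above could be applied to the original loop roughly as follows. The sub name `convert_links` and the variables `$url` and `$label` are hypothetical names chosen for illustration; the thread's own code works on `$_` directly. The captured pieces are quoted before being interpolated back into a pattern, so '+' and other metacharacters are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [[url|label]] markers into HTML links.
sub convert_links {
    my ($text) = @_;

    # Collect every [[...]] body first, so the text is not modified
    # while it is still being scanned.
    my @links = $text =~ /\[\[(.*?)\]\]/g;

    for my $link (@links) {
        # Only markers of the form url|label are converted.
        if ($link =~ /^(.*)\|(.*)$/) {
            my ($url, $label) = ($1, $2);
            my $new = qq{<a href="$url">$label</a>};

            # \Q...\E makes the interpolated text match literally.
            $text =~ s/\[\[\Q$url\E\|\Q$label\E\]\]/$new/g;
        }
    }
    return $text;
}

print convert_links(
    "See [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]] here.\n"
);
```

This keeps the two-step structure of the original loop; it is not necessarily the simplest way to do the conversion.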
  6. Thank you. I was under the impression that those characters only made a
    difference if they were typed explicitly, but not if they were part of
    a variable.
     
    Chris Johnson, Sep 16, 2005
    #6
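A small illustration of that point, with made-up sample strings: a variable interpolated into a pattern is parsed as regex source, exactly as if its contents had been typed there, unless it is quoted with \Q...\E or quotemeta.

```perl
use strict;
use warnings;

# Sample data invented for this illustration.
my $text    = 'dvd+rw-tools and dvdrw-tools';
my $literal = 'dvd+rw';

# Interpolated as-is: '+' acts as a quantifier, so the pattern means
# "dv", one or more "d", then "rw". It matches "dvdrw", not "dvd+rw".
(my $unquoted = $text) =~ s/$literal/X/g;

# Quoted: every character of $literal is matched literally.
(my $quoted = $text) =~ s/\Q$literal\E/X/g;

# quotemeta returns the escaped string that \Q...\E would produce.
my $escaped = quotemeta $literal;

print "$unquoted\n";   # dvd+rw-tools and X-tools
print "$quoted\n";     # X-tools and dvdrw-tools
print "$escaped\n";    # dvd\+rw
```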
  7. Chris Johnson wrote:
    [...]
    > then print the entire contents of the file. What I'm curious about is
    > why it won't run the s/$old/$new/g if there's a '+' in $old.


    Well, it does, but you probably didn't mean to use the '+' sign to
    indicate one or more instances of the preceding unit in the RE.
    For example, /a+/ matches any non-empty sequence of the letter 'a'.

    > Incidentally, if I change the code to:



    > I get the following error:
    >
    > Invalid [] range "w-t" in regex;


    Well, yeah, how many characters are there between 'w' and 't'? Note: I
    didn't ask for characters between 't' and 'w'.

    I strongly recommend you familiarize yourself with regular expressions.
    "perldoc perlretut" is a reasonably good introduction.

    jue
     
    Jürgen Exner, Sep 16, 2005
    #7
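The range error can be reproduced in isolation. The sketch below uses a hypothetical helper, `compile_error`, that tries to compile a pattern string and returns the error text, or an empty string on success. In the unescaped `$old` string the leading '[' opens a character class, so "w-t" inside "rw-tools" is read as a range whose end comes before its start.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern given as a string and return
# the error message, or '' if it compiles.
sub compile_error {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? '' : $@;
}

# 't' to 'w' is a valid ascending range.
print "[t-w] compiles\n" if compile_error('[t-w]') eq '';

# 'w' to 't' runs backwards, so compilation fails.
print compile_error('[w-t]');

# The unescaped string from the post fails for the same reason.
print compile_error('[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]');

# Quoted with quotemeta, the same string compiles.
print "quoted string compiles\n"
    if compile_error(quotemeta '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]') eq '';
```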
  8. A. Sinan Unur wrote:

    > Because + and - are special in regexes.



    Hyphen (-) is not meta in a regular expression, while plus (+) is meta.

    Hyphen (-) is meta in a character class, while plus (+) is not meta.


    We must peel our "language onion" to know what funny characters are funny.

    We have a language inside of a language inside of a language. The
    teeny-tiny character class language is inside of the larger regular
    expression language which is inside of big ol' Perl.

    So we must identify which language we are currently in before we
    know what metacharacters apply.

    eg:

    Hyphen (-):

    Perl: subtraction
    RE: not meta
    CC: range

    Caret (^):

    Perl: bitwise exclusive or
    RE: beginning of string
    CC: negates the class


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Sep 16, 2005
    #8
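The layers described above can be checked with a few one-line matches. This is only an illustrative sketch; the sample strings and variable names are invented for it.

```perl
use strict;
use warnings;

# Regex level: '-' is an ordinary character, '+' is a quantifier.
my $hyphen_literal  = ('dvd-rw' =~ /^dvd-rw$/) ? 1 : 0;  # matches
my $plus_quantifier = ('dvd+rw' =~ /^dvd+rw$/) ? 1 : 0;  # no match: "d+" means one or more "d"

# Character-class level: '-' forms a range, '+' is an ordinary character.
my $hyphen_range  = ('u' =~ /^[t-w]$/) ? 1 : 0;  # matches: 'u' lies between 't' and 'w'
my $plus_in_class = ('+' =~ /^[+]$/)   ? 1 : 0;  # matches a literal '+'

# Caret: anchor at regex level, negation at the start of a class.
my $caret_anchor = ('xa' =~ /^a/)     ? 1 : 0;  # no match: string does not start with 'a'
my $caret_negate = ('x'  =~ /^[^a]$/) ? 1 : 0;  # matches: 'x' is not 'a'

print "hyphen literal in RE:  $hyphen_literal\n";
print "plus literal in RE:    $plus_quantifier\n";
print "hyphen range in CC:    $hyphen_range\n";
print "plus literal in CC:    $plus_in_class\n";
print "caret anchors in RE:   $caret_anchor\n";
print "caret negates in CC:   $caret_negate\n";
```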
  9. Chris Johnson wrote:
    [snip early description]
    >
    > while(/\[\[(.*?)\]\]/g){
    > $new = $1;
    > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
    > s/\[\[$1\|$2\]\]/$new/g;
    > }
    > }
    >
    > The specific input that's having trouble is
    >
    > [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]
    >
    > but the peculiar thing is that if I remove the +'s, it makes the
    > replacement fine (except for the fact that the link is no longer
    > valid). So does anyone see why this is happening?
    >


    Everyone's pointed out why it's happening; here's some code to get
    around it. The trick is to do the whole conversion in a single
    substitution instead of feeding the captured text back into a second
    one (Sinan alluded to this in his first post, but this might be a
    more usable version for you).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $input =
    q{[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]
    other text
    [[http://www.google.com|google]] Final text};

    $input =~ s/\[\[(.*?)\|(.*?)\]\]/<a href="$1">$2<\/a>/sg;

    print "Output:\n$input\n";

    ../test.pl
    Output:
    <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>
    other text
    <a href="http://www.google.com">google</a> Final text


    --T Beck
     
    T Beck, Sep 16, 2005
    #9
  10. Tad McClellan wrote:

    > A. Sinan Unur wrote:
    >
    >> Because + and - are special in regexes.

    >
    >
    > Hyphen (-) is not meta in a regular expression, while plus (+) is
    > meta.
    >
    > Hyphen (-) is meta in a character class, while plus (+) is not meta.
    >
    >
    > We must peel our "language onion" to know what funny characters are
    > funny.


    Absolutely. Thank you for the clarification.

    Sinan

     
    A. Sinan Unur, Sep 17, 2005
    #10