'+' messing up regular expression

Chris Johnson · Sep 16, 2005

I've written a CGI script that basically emulates the Apache default
page, but with more customizations. One of these is the addition of
content above the file list, and I've decided to use Wikipedia-esque
shorthand.

I've got it pretty much working, except that there are some problems with
the link conversion. (In case you've never seen it,
[[http://www.google.com|Google]] translates to <a href="http://www.google.com">Google</a>.)

I've found that if there's a '+' in the string to be replaced, it
simply won't be replaced. Here's the code, which works in almost every
situation:

while(/\[\[(.*?)\]\]/g){
    $new = $1;
    if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
        s/\[\[$1\|$2\]\]/$new/g;
    }
}

The specific input that's having trouble is

[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]

but the peculiar thing is that if I remove the +'s, it makes the
replacement fine (except for the fact that the link is no longer
valid). So does anyone see why this is happening?

Thanks for your time,
Chris

A. Sinan Unur · Sep 16, 2005

#!/usr/bin/perl

use strict;
use warnings;

my $s = '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]';

if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
print qq{<a href="$1">$2</a>\n};
}

__END__

D:\Home\asu1\UseNet\clpmisc> c
<a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>

Chris Johnson · Sep 16, 2005

I should clarify, it seems. The input is a text file. I do not simply
want to print the matched patterns; I want to replace the text, and
then print the entire contents of the file. What I'm curious about is
why it won't run the s/$old/$new/g if there's a '+' in $old.

Incidentally, if I change the code to:

while(/\[\[(.*?)\]\]/g){
    $new = $1;
    if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
        $old = "[[$1|$2]]";
        s/$old/$new/g;
    }
}

I get the following error:

Invalid [] range "w-t" in regex; marked by <-- HERE in
m/[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-t <-- HERE
ools]]/ at index.cgi line 89.

A. Sinan Unur · Sep 16, 2005

Chris Johnson said:

I should clarify, it seems. The input is a text file. I do not simply
want to print the matched patterns; I want to replace the text, and
then print the entire contents of the file. What I'm curious about is
why it won't run the s/$old/$new/g if there's a '+' in $old.

Because + and - are special in regexes.

It seems like you need to read the docs.

From perldoc perlop:

\Q quote non-word characters till \E

So, for example:

use strict ;
use warnings;

my $test = 'Sinan+Unur';
my $old = '+';
my $new = ' ';

$test =~ s/$old/$new/g;

print "$test\n";

__END__

D:\Home\asu1\UseNet\clpmisc> c
Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
/ at D:\Home\asu1\UseNet\clpmisc\c.pl line 8.

Whereas:

use strict ;
use warnings;

my $test = 'Sinan+Unur';
my $old = '+';
my $new = ' ';

$test =~ s/\Q$old\E/$new/g;

print "$test\n";

__END__

D:\Home\asu1\UseNet\clpmisc> c
Sinan Unur
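Applied to Chris's second version of the loop (the one that builds $old), the same \Q...\E fix might look like the sketch below. It is a minimal sketch, not code from the thread: the sample text stands in for the file contents, and it collects the [[...]] bodies into a list first instead of substituting inside the while(//g) loop, so the string is not edited while that loop is still scanning it. quotemeta($old) would work equally well in place of \Q$old\E.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text standing in for the contents of the input file.
my $text = "See [[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]] "
         . "and [[http://www.google.com|Google]].\n";

# Collect every [[...]] body first, then substitute each one.
my @links = $text =~ /\[\[(.*?)\]\]/g;

for my $body (@links) {
    my ($url, $label) = split /\|/, $body, 2;
    next unless defined $label;    # no pipe: leave the text alone

    my $new = qq{<a href="$url">$label</a>};
    my $old = "[[$body]]";

    # \Q...\E escapes [, ], |, +, . and friends in $old, so the
    # pattern matches those characters literally.
    $text =~ s/\Q$old\E/$new/g;
}

print $text;
```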

Chris Johnson · Sep 16, 2005

Thank you. I was under the impression that those characters only made a
difference if they were typed explicitly, but not if they were part of
a variable.
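Chris's impression is a common one, so a small demonstration may help. In this sketch (not from the thread) the same characters, DVD+RW, are used once as an interpolated pattern and once inside \Q...\E. Interpolated text becomes part of the regex source, so the + acts as a quantifier and the plain version silently fails to match the literal string, which is why the original s/// simply did nothing.

```perl
use strict;
use warnings;

my $text    = 'DVD+RW';
my $pattern = 'DVD+RW';    # the same characters, held in a variable

# Interpolated as-is, "+" is a quantifier: the pattern means
# "DV, one or more D, then RW", so it matches DVDRW but not DVD+RW.
my $raw_literal = ($text   =~ /$pattern/) ? 1 : 0;
my $raw_dvdrw   = ('DVDRW' =~ /$pattern/) ? 1 : 0;

# With \Q...\E (equivalently quotemeta) the "+" is matched literally.
my $quoted_literal = ($text =~ /\Q$pattern\E/) ? 1 : 0;

print "raw pattern vs '$text': $raw_literal\n";
print "raw pattern vs 'DVDRW': $raw_dvdrw\n";
print "quoted pattern vs '$text': $quoted_literal\n";
print "quotemeta gives: ", quotemeta($pattern), "\n";
```

Run as is, it should report 0 for the raw pattern against DVD+RW, 1 against DVDRW, 1 for the quoted pattern, and show quotemeta's result as DVD\+RW.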

Jürgen Exner · Sep 16, 2005

Chris Johnson wrote:
[...]

then print the entire contents of the file. What I'm curious about is
why it won't run the s/$old/$new/g if there's a '+' in $old.

Well, it does, but you probably didn't mean to use the '+' sign to indicate
one or more instances of the preceding unit in the RE.
For example, /a+/ matches any non-empty sequence of the letter 'a'.

Incidentally, if I change the code to:

I get the following error:

Invalid [] range "w-t" in regex;

Well, yeah, how many characters are there between 'w' and 't'? Note: I
didn't ask for characters between 't' and 'w'.
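To see Jürgen's point concretely, and why Chris's second version died while the first merely failed to match, here is a small sketch (not from the thread) that compiles three patterns from strings at run time inside eval: [w-t], [t-w], and Chris's unescaped $old. Without backslashes, the leading [[ opens a character class, and the hyphen in rw-tools then forms the backwards range w-t.

```perl
use strict;
use warnings;

# Chris's $old string, with the "[[" left unescaped.
my $old = '[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]';

my %compiles;
for my $pattern ('[w-t]', '[t-w]', $old) {
    # Compiling from a variable happens at run time, so eval can catch
    # "Invalid [] range" instead of the whole script failing to compile.
    my $re = eval { qr/$pattern/ };
    $compiles{$pattern} = defined $re ? 1 : 0;
    print defined $re ? "ok:     $pattern\n" : "failed: $pattern\n  $@";
}

# Escaped with \Q...\E, the brackets and the hyphen lose their meaning.
my $escaped_ok = defined(eval { qr/\Q$old\E/ }) ? 1 : 0;
print "escaped: ", ($escaped_ok ? "ok" : "failed"), "\n";
```

Only [t-w] and the escaped form of $old should compile; quotemeta turns every bracket and hyphen into a literal character.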

I strongly recommend you familiarize yourself with regular expressions.
"perldoc perlretut" is a reasonably good introduction.

jue

Tad McClellan · Sep 16, 2005

A. Sinan Unur said:
Because + and - are special in regexes.

Hyphen (-) is not meta in a regular expression, while plus (+) is meta.

Hyphen (-) is meta in a character class, while plus (+) is not meta.

We must peel our "language onion" to know what funny characters are funny.

We have a language inside of a language inside of a language. The
teeny-tiny character class language is inside of the larger regular
expression language which is inside of big ol' Perl.

So we must identify which language we are currently in before we
know what metacharacters apply.

eg:

Hyphen (-):

Perl: subtraction
RE: not meta
CC: range

Caret (^):

Perl: bitwise exclusive or
RE: beginning of string
CC: negates the class
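Tad's table can be checked directly. This sketch (not from the thread) evaluates each of the three meanings for the hyphen and the caret; the last caret case, a caret that is not first in a class, goes beyond Tad's table and shows the caret as a plain literal there.

```perl
use strict;
use warnings;

# Hyphen (-)
my $perl_minus = 5 - 3;                        # Perl: subtraction
my $re_hyphen  = ('a-b' =~ /a-b/)   ? 1 : 0;   # RE: just a literal hyphen
my $cc_range   = ('c'   =~ /[a-e]/) ? 1 : 0;   # CC: range a..e
my $cc_dash    = ('-'   =~ /[a-e]/) ? 1 : 0;   # "-" itself is not in the class

# Caret (^)
my $perl_xor   = 6 ^ 3;                        # Perl: bitwise exclusive or
my $re_anchor  = ('abc' =~ /^a/)     ? 1 : 0;  # RE: beginning of string
my $cc_negate  = ('x'   =~ /[^abc]/) ? 1 : 0;  # CC: first char negates the class
my $cc_literal = ('^'   =~ /[a^]/)   ? 1 : 0;  # CC, not first: literal caret

print "hyphen: perl=$perl_minus re=$re_hyphen cc_range=$cc_range cc_dash=$cc_dash\n";
print "caret:  perl=$perl_xor re=$re_anchor cc_negate=$cc_negate cc_literal=$cc_literal\n";
```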

T Beck · Sep 16, 2005

Chris Johnson wrote:
[snip early description]

while(/\[\[(.*?)\]\]/g){
$new = $1;
if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
s/\[\[$1\|$2\]\]/$new/g;
}
}

The specific input that's having trouble is

[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]

but the peculiar thing is that if I remove the +'s, it makes the
replacement fine (except for the fact that the link is no longer
valid). So does anyone see why this is happening?

Everyone's pointed out how it's happening... here's some code to get
around it. The trick is not to use what you captured to drive an entire
second substitution. (Sinan alluded to this with his first post, but
this might be a more usable version for you.)

#!/usr/bin/perl
use strict;
use warnings;

my $input =
q{[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]
other text
[[http://www.google.com|google]] Final text};

$input =~ s/\[\[(.*?)\|(.*?)\]\]/<a href="$1">$2<\/a>/sg;

print "Output:\n$input\n";

../test.pl
Output:
<a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>
other text
<a href="http://www.google.com">google</a> Final text
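If the text can also contain [[...]] spans without a pipe, the .*? in the pattern above, combined with the /s flag, could let such a span run on into the next link. A variant using negated character classes avoids that, since each capture stops at the first | or ]. The helper name convert_links and the sample text are assumptions for illustration, not part of T Beck's post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert every [[url|label]] in one pass.
# [^|\]]+ cannot cross a "]" or "|", so "[[not a link]]" is left alone
# instead of swallowing text up to the next link's pipe.
sub convert_links {
    my ($text) = @_;
    $text =~ s{\[\[([^|\]]+)\|([^\]]+)\]\]}{<a href="$1">$2</a>}g;
    return $text;
}

# Sample text standing in for the file contents.
my $input = <<'END_TEXT';
[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-tools]]
[[not a link]] other text
[[http://www.google.com|google]] Final text
END_TEXT

my $output = convert_links($input);
print $output;
```

To process a whole file as Chris wanted, the sample string could be replaced by slurping the input, for example my $input = do { local $/; <> };, and the result printed the same way.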

--T Beck

A. Sinan Unur · Sep 17, 2005

Tad McClellan said:
Hyphen (-) is not meta in a regular expression, while plus (+) is
meta.

Hyphen (-) is meta in a character class, while plus (+) is not meta.

We must peel our "language onion" to know what funny characters are
funny.

Absolutely. Thank you for the clarification.

Sinan

Only one table shows up with the information	2	Mar 29, 2023
Working on mobile css menu with plenty of frustration!	2	Dec 29, 2022
fork messing up parent filehandle	6	Jun 20, 2006
Requesting regular expression help	12	Feb 26, 2010
FAQ 6.5 I put a regular expression into $/ but it didn't work. What's wrong?	0	Jan 28, 2011
Problem from complex string messing up	1	Aug 24, 2007
Regular expression help	4	May 29, 2006
Help with my responsive home page	2	Dec 14, 2022
