weird matching variable behavior

juliani.moon

I am using perl v5.8.5.

I made a simple construct to make email addresses hyper-linkable:
if ($_ =~ / ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g) {
s!$1!<a href="mailto:$1">$1</a>!g;
}
However, it effectively just removes the email address, as if only
the first "$1" matched and the "$1" in the 2nd and 3rd places became
empty: it replaces an email address with only:
<a href="mailto:"></a>

Then I tried to use "$&" in place of "$1", as in:
if ($_ =~ / ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g) {
s!$&!<a href="mailto:$&">$&</a>!g;
}
This time the email addresses are found and replaced with HTML
like:
<a href="mailto: (e-mail address removed) "> (e-mail address removed) </a>
-- but "$&" carries a space on both sides of the email address.

I can work around it like this, although the extra step seems trivial and unnecessary:
if ($_ =~ / ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g) {
$emem = $1;
s!$emem!<a href="mailto:$emem">$emem</a>!g;
}
My questions are: (1) I saw examples where "$1" was used as in my first
example. Why does it fail to work there? (2) Does "$&" carry spaces on
both sides of its content by default? (To "clean" it, I would have to
introduce a new variable, as "$&" is a "read-only value".) Things
shouldn't be that complicated, should they?

Joe
 
Tim Greer

The above logic won't work.

I assume you meant something like (albeit this isn't a good
solution/example):
s/\b([a-zA-Z\.]+\@[a-zA-Z\.]+)\b/<a href="mailto:$1">$1<\/a>/g;

Are you sure there are white spaces at the start and end of the email
addresses in that string?

Do you assume the email addresses won't have any numbers (digits)
anywhere?

Just so you're aware, the above will match invalid email addresses, and
not match valid ones.
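
As a rough sketch of that one-step substitution (the sample line and its addresses are made up, and the character classes are the original poster's, so the caveats above about digits and invalid addresses still apply):

```perl
use strict;
use warnings;

# Hypothetical sample line; the addresses are invented for illustration.
my $line = 'Contact joe.smith@example.com or ann@example.org for details.';

# A single s///g does both the matching and the replacing, so $1 in the
# replacement refers to the group captured by this same pattern.
(my $linked = $line) =~
    s{\b([a-zA-Z.]+\@[a-zA-Z.]+)\b}{<a href="mailto:$1">$1</a>}g;

print "$linked\n";
```

Because the match and the replacement happen in the same operation, there is no separate "if" test whose $1 could be overwritten before it is used.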
 
Dr.Ruud

perldoc -f split
perldoc -f join
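
One possible reading of that hint, as a minimal sketch: split the line on spaces, wrap the words that look like addresses, and join the words back together. The sample line and the variable names are hypothetical, and the pattern is the original poster's simplified one, not a real email validator.

```perl
use strict;
use warnings;

# Hypothetical sample line; the addresses are invented for illustration.
my $line = 'Write to joe.smith@example.com or ann@example.org today';

# Break the line into words on single spaces.
my @words = split / /, $line;

for my $word (@words) {
    # $word is an alias for the array element, so assigning to it
    # changes @words in place. The pattern is anchored to the whole word.
    if ($word =~ /^[a-zA-Z.]+\@[a-zA-Z.]+$/) {
        $word = qq{<a href="mailto:$word">$word</a>};
    }
}

my $linked = join ' ', @words;
print "$linked\n";
```

Since each word is tested on its own, the spaces never become part of the match, so nothing like the "$&" padding can occur.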
 
sln

Joe,

Understand some regular expression basics first.
Don't depend on the documentation for complete knowledge.
Test and retest samples that are redundant with minor variations.
After a while, regular expression behaviour will become ingrained
into your subconscious.

-sln
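
For reference, a smaller case of the kind of test meant here. It is a sketch with made-up text and hypothetical variable names, showing what "$&", "$1" and pos() hold after a scalar-context /g match, and what happens to pos() when the next attempt fails:

```perl
use strict;
use warnings;

# Made-up sample text for illustration.
my $text = 'mail ann@example.org now';
my $re   = qr/ ([a-zA-Z.]+\@[a-zA-Z.]+) /;

my ($whole, $group, $pos);
if ($text =~ /$re/g) {
    # $& is everything the pattern matched, including the two literal
    # spaces in the pattern; $1 is only what the parentheses captured.
    ($whole, $group, $pos) = ($&, $1, pos($text));
}
print "whole=[$whole] group=[$group] pos=$pos\n";

# A second scalar-context /g match starts at pos($text), finds nothing,
# and resets pos($text) to undef.
my $matched_again  = ($text =~ /$re/g) ? 1 : 0;
my $pos_after_fail = pos($text);
print "matched again: $matched_again, pos is ",
      defined $pos_after_fail ? $pos_after_fail : 'undef', "\n";
```

So the spaces in "$&" are not a default of "$&"; they are there because the pattern itself matches a space on each side of the address.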

------------------------------------------
## jjj.pl
use strict;
use warnings;

$_ = join '', <DATA>;

# Non - Global /g
# ---------------------
print "\nNon - global, position reset even with match (/g):\n";
print "-"x30,"\n";
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /; print $1,"\n" if defined $1;
print "\n";

# Global /g
# ---------------------
print "\nGlobal, position reset on non-match only (/g):\n";
print "-"x30,"\n";
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;
print " ^^^ did not match, position was reset, but \$1 retained its value\n";
/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g; print $1,"\n" if defined $1;


# If, with Global /g, Capture $1, use $1 in substitution
# ----------------------------------------------------
print "\nIf, with Global /g, Capture \$1, use $1 in substitution:\n";
print "-"x30,"\n";
if (/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g)
{
    print "$1\n";
    s!($1)!<a href="mailto:$1">$1</a>!g;
    print $1,"\n";
    print $_,"\n";
}

# If, with Global /g, Non Capture $1, use $1 in substitution
# ----------------------------------------------------
print "\nIf, with Global /g, NON Capture \$1, use $1 in substitution:\n";
print "-"x30,"\n";
if (/ ([a-zA-Z\.]+\@[a-zA-Z\.]+) /g)
{
    print "$1\n";
    s!$1!<a href="mailto:$1">$1</a>!g;
    print $1,"\n";
    print $_,"\n";
}

__DATA__
(e-mail address removed) (e-mail address removed) (e-mail address removed) (e-mail address removed)

=========================================
Output:

c:\temp>perl jjj.pl

Non - global, position reset even with match (/g):
------------------------------
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)


Global, position reset on non-match only (/g):
------------------------------
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
^^^ did not match, position was reset, but $1 retained its value
(e-mail address removed)

If, with Global /g, Capture $1, use (e-mail address removed) in substitution:
------------------------------
(e-mail address removed)
(e-mail address removed)
(e-mail address removed) <a href="mailto:[email protected]">[email protected]</a> SmithC@Biz.net (e-mail address removed)


If, with Global /g, NON Capture $1, use (e-mail address removed) in substitution:
------------------------------
(e-mail address removed)
Use of uninitialized value in concatenation (.) or string at jjj.pl line 48, <DATA> line 1.
Use of uninitialized value in concatenation (.) or string at jjj.pl line 48, <DATA> line 1.
Use of uninitialized value in print at jjj.pl line 49, <DATA> line 1.

<a href="mailto:"></a> <a href="mailto:[email protected]">[email protected]</a>
(e-mail address removed) (e-mail address removed)


c:\temp>
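
What the last two sections show can be reduced to a smaller case. This is a sketch with made-up text, not part of jjj.pl. The pattern side of s/// interpolates $1 before matching, but a successful match then resets the capture variables, and since the interpolated pattern has no parentheses of its own, $1 is undefined by the time the replacement side is built. Copying $1 into a lexical first (here the hypothetical $addr) avoids that, and \Q...\E keeps the dots in the address from acting as wildcards.

```perl
use strict;
use warnings;

# Made-up sample text for illustration.
my $text = ' ann@example.org and again ann@example.org ';
my $re   = qr/ ([a-zA-Z.]+\@[a-zA-Z.]+) /;

# Broken: $1 is interpolated into the pattern, but the successful
# s/// match has no capture group, so $1 is undef in the replacement.
my $broken = $text;
if ($broken =~ $re) {
    no warnings 'uninitialized';   # silence the warnings seen above
    $broken =~ s/$1/<a href="mailto:$1">$1<\/a>/g;
}
print "broken: $broken\n";

# Working: save the capture in a lexical before substituting.
my $fixed = $text;
if ($fixed =~ $re) {
    my $addr = $1;
    $fixed =~ s/\Q$addr\E/<a href="mailto:$addr">$addr<\/a>/g;
}
print "fixed:  $fixed\n";
```

The first section of the script works for the same reason: s!($1)!...! puts parentheses around the interpolated text, so the substitution's own match sets $1 again.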
 
juliani.moon

Thank you for all the replies. I've learnt more than I expected.

Appreciate it!

Joe
 
