Regex backref returns extra data

Dan Rawson

I'm attempting to extract a single word from a line which looks like:

TARGET = file1 file2 file3 .....

I need 'file1' from this line. Using:

s/.*?\=\s*([a-zA-Z_\.0-9]+).*/$1/ (Note the "+" following the char. class)

works fine as long as file1 is present. However, if I have a line that ends after the "=" sign, it returns the entire line instead of an empty string.

If I change the pattern to:

s/.*?\=\s*([a-zA-Z_\.0-9]*).*/$1/ (Note the "*" following the char. class)

it works; I get 'file1' back if it's there, and an empty string if not.
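A minimal sketch that reproduces the two behaviors described above. The sub names `with_plus` and `with_star` are illustrative only; each applies one of the two substitutions to a copy of the line. With "+" the pattern needs at least one file-name character after "=", so on "TARGET =" it fails and s/// leaves the line unchanged. With "*" the pattern still matches (with an empty capture), so the whole line is replaced by ''.

```perl
use strict;
use warnings;

# Hypothetical helpers: apply each of the two substitutions to a copy
# of the line and return the result.
sub with_plus {
    my ($line) = @_;
    $line =~ s/.*?\=\s*([a-zA-Z_\.0-9]+).*/$1/;    # "+" needs at least one char
    return $line;
}

sub with_star {
    my ($line) = @_;
    $line =~ s/.*?\=\s*([a-zA-Z_\.0-9]*).*/$1/;    # "*" also accepts zero chars
    return $line;
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    printf "%-28s plus: '%s'  star: '%s'\n",
        "'$line'", with_plus($line), with_star($line);
}
```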

Questions:
1. My understanding was that the "+" would cause it to match one or more of the char. class (which would make it a legal file name in this case). Why does it return the entire line if there's no match?
2. Why does the second pattern work differently than the first one if there's nothing after the "=" sign?

TIA . . . .

Dan
 
Stefan

Hi Dan

I think you are using a substitution (s///) where you actually want a
match (m//). If your substitution regex doesn't match, nothing will be
replaced and you will get the whole line back. If you want to extract
the first file name, you would be better off writing...

'TARGET = file1 file2 file3' =~ m/=\s*([\w.]+)/;

....after which $1 will contain 'file1'.
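A sketch of the match-based approach, assuming file names use the same characters as Dan's class (`[\w.]` covers `[a-zA-Z_\.0-9]` for ASCII input). The helper name `first_target` is hypothetical; it returns undef when nothing follows "=", so the caller can tell the two cases apart instead of getting the whole line back.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first file name after "=", or undef
# if there is none. The match runs in scalar context, so $1 is only
# read when the match actually succeeded.
sub first_target {
    my ($line) = @_;
    return $line =~ m/=\s*([\w.]+)/ ? $1 : undef;
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    my $file = first_target($line);
    print defined $file ? "found '$file'\n" : "no file name in '$line'\n";
}
```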

Hope this helps.


Stefan
 
Dan Rawson

Christian said:
But be careful with $1, as it will not
be reset after an unsuccessful match.
e.g.:

my $xx = "a b c";
$xx =~ /^(a)/; # matches
print "\$1 = $1\n";
$xx =~ /^(b)/; # does not match
print "\$1 = $1\n";

which prints out:
$1 = a
$1 = a
and may not be what one expects.

So one should either use capturing (see my reply
to the OP) or add a check like
if ( /=\s*([\w.]+)/ ) { ... }   # only use $1 inside this block
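Christian doesn't spell out the "capturing" alternative here; one common form is assigning the match result to a list. A minimal sketch, reusing his `$xx` example and assuming that is what he means: a failed match in list context yields an empty list, so `$got` ends up undef while `$1` still holds the stale value from the earlier match.

```perl
use strict;
use warnings;

my $xx = "a b c";

$xx =~ /^(a)/;                  # succeeds, so $1 is set to "a"

# List-context assignment: a failed match returns an empty list,
# so $got becomes undef rather than inheriting the old capture.
my ($got) = $xx =~ /^(b)/;

print "\$1  = $1\n";                                    # still "a" (stale)
print "got = ", (defined $got ? $got : 'undef'), "\n";  # undef
```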

-Christian

Thanks to everyone for helping me shake out the cobwebs; I guess I should have had another cup of coffee before posting this morning!!

I modified it so it checks for the match, then does the substitution if the match succeeds (duh!).

Thanks again . . . .

Dan
 
Uri Guttman

DR> I modified it so it checks for the match, then does the
DR> substitution if the match succeeds (duh!).

that sounds redundant. just do the s/// and check if it succeeds. if it
doesn't, your original string is untouched. i have seen newbie code that
looks like:

if ( /match/ ) {
    s/match/replace/ ;
}

which is the same as just the s/// by itself.
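A minimal sketch of Uri's suggestion: s/// itself returns a true value (the number of substitutions made) when it matches and a false value otherwise, so its result can be tested directly with no separate m// first. `extract` is a hypothetical helper that works on a copy of the line and returns the success flag together with the (possibly unchanged) text.

```perl
use strict;
use warnings;

# Hypothetical helper: try to replace the line with its first file name.
# Returns (1, new_text) on success, or (0, original_text) if s/// fails.
sub extract {
    my ($line) = @_;    # copy, so the caller's string is never modified
    my $ok = $line =~ s/.*?=\s*([\w.]+).*/$1/;
    return ($ok ? 1 : 0, $line);
}

for my $line ('TARGET = file1 file2 file3', 'TARGET =') {
    my ($ok, $result) = extract($line);
    if ($ok) {
        print "extracted '$result'\n";
    } else {
        print "no match, line unchanged: '$result'\n";
    }
}
```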

uri
 
