using the result of a variable regular expression

leifwessman

Hi!

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

$regexp = "(\d)(\w)(\d)";
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

Thanks for any input!

Leif
 
Brian McCauley

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

$regexp = "(\d)(\w)(\d)";
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

What you are trying to do is use something called a symbolic ref:

print $$numb; # works - print value of $3

But you have to be careful using symrefs...

{
    # Untaint and check $numb - don't clobber $1 etc
    die 'Not a number' unless do { ($numb) = $numb =~ /(^\d+$)/ };
    no strict 'refs';
    print $$numb;
}

That said, I wouldn't use one myself, because I never use $1 etc. (other
than in the RHS of s/// or in while (//g)).

if (my @captures = $data =~ /$regexp/) {
    print $captures[$numb-1];
}
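
For reference, here is a minimal self-contained sketch of the list-context approach above. The sample $data value and the helper name nth_capture are assumptions made for illustration and do not come from the original posts. Because $numb is submitted by the user, the sketch also rejects anything that is not a positive integer.

```perl
use strict;
use warnings;

# Hypothetical sample input; in the original question $numb comes from the user.
my $data   = 'abc 8t4 xyz';
my $regexp = '(\d)(\w)(\d)';
my $numb   = 3;

# Return capture group $n (1-based), or undef if $n is not a positive
# integer, the pattern does not match, or the group does not exist.
sub nth_capture {
    my ($text, $pattern, $n) = @_;
    return undef unless defined $n && $n =~ /\A[1-9]\d*\z/;

    # A match in list context returns the captured groups; the list
    # assignment evaluates to their count, which is 0 on failure.
    my @captures = $text =~ /$pattern/ or return undef;
    return $captures[ $n - 1 ];
}

my $value = nth_capture($data, $regexp, $numb);
print defined $value ? "capture $numb: $value\n" : "no such capture\n";
```

No symbolic references or $1-style variables are involved, so this runs unchanged under strict.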
 
Gunnar Hjalmarsson

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

Your problem starts before that code: You have not enabled strictures
and warnings!

use strict;
use warnings;
$regexp = "(\d)(\w)(\d)";

There is your second problem: $regexp gets the value '(d)(w)(d)', which
is not what you want.

my $regexp = '(\d)(\w)(\d)';
-------------^------------^

1) Please copy and paste code that you post, do not retype it!

2) Warnings would have told you that something was wrong.
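
To illustrate the quoting point, here is a small sketch comparing the three ways of writing the pattern. The variable names are made up for this example. Warnings are switched off only around the double-quoted string so the sketch runs quietly; with warnings on, perl reports each unrecognized escape at compile time.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes, so the regex engine sees \d and \w.
my $single = '(\d)(\w)(\d)';

# In double quotes \d and \w are not recognized string escapes, so the
# backslash is dropped and the pattern matches the literal letters d, w, d.
my $double = do { no warnings; "(\d)(\w)(\d)" };

# qr// compiles the pattern once and avoids the quoting question entirely.
my $compiled = qr/(\d)(\w)(\d)/;

print "single quoted: $single\n";
print "double quoted: $double\n";
print "'8t4' matches single-quoted pattern\n" if '8t4' =~ /$single/;
print "'8t4' matches qr// pattern\n"          if '8t4' =~ $compiled;
print "'8t4' does not match double-quoted pattern\n" unless '8t4' =~ /$double/;
```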
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

You can do:

if ( my @capt = $data =~ /$regexp/ ) {
    print $capt[$numb-1];
}
 
Anno Siegel

Gunnar Hjalmarsson said:
I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

Your problem starts before that code: You have not enabled strictures
and warnings!

use strict;
use warnings;
$regexp = "(\d)(\w)(\d)";

There is your second problem: $regexp gets the value '(d)(w)(d)', which
is not what you want.

my $regexp = '(\d)(\w)(\d)';
-------------^------------^

1) Please copy and paste code that you post, do not retype it!

2) Warnings would have told you that something was wrong.
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

You can do:

if ( my @capt = $data =~ /$regexp/ ) {
print $capt[$numb-1];
}

Or, without an auxiliary variable:

defined and print for ( $data =~ /$regexp/ )[ $numb - 1 ];

Anno
 
Tore Aursand

Some code to illustrate my problem:
[...]

Your code won't run. Please copy-and-paste working code, instead of
retyping it.

You should also add these:

use strict;
use warnings;
$regexp = "(\d)(\w)(\d)";
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

You can match into an array:

if ( my @match = $data =~ /$regexp/ ) {
    print $match[$numb-1];
}
 
John W. Krahn

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

$regexp = "(\d)(\w)(\d)";
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

You are extracting single characters. How about substr()?

print substr( $data, $numb - 1, 1 )


Why not define your regexp based on the submitted value?

my @fields = ( '\d', '\w', '\d' );
$fields[ $numb - 1 ] = '(' . $fields[ $numb - 1 ] . ')';
my $regexp = join '', @fields;
if ( $data =~ /$regexp/ ) {
    print $1;
}


Or you could use the @+ and @- arrays:

my $regexp = '(\d)(\w)(\d)';
if ( $data =~ /$regexp/ ) {
    print substr( $data, $-[ $numb ], $+[ $numb ] - $-[ $numb ] );
}
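
The @- and @+ variant can be sketched with a few guards added. The helper name capture_by_offset and the sample strings are assumptions for illustration. The index is validated before the real match, because the validation regex would otherwise overwrite @- and @+.

```perl
use strict;
use warnings;

# Return the text of capture group $n using the offset arrays @- and @+,
# or undef if $n is invalid, the pattern fails, or the group did not take part.
sub capture_by_offset {
    my ($text, $pattern, $n) = @_;

    # Validate first: this match would otherwise clobber @- and @+.
    return undef unless defined $n && $n =~ /\A[1-9]\d*\z/;
    return undef unless $text =~ /$pattern/;

    # $#+ is the number of groups in the last successfully matched pattern.
    my $groups = $#+;
    return undef if $n > $groups;

    # $-[$n] is undefined when group $n did not participate in the match.
    return undef unless defined $-[$n];
    return substr( $text, $-[$n], $+[$n] - $-[$n] );
}

my $found = capture_by_offset('abc 8t4 xyz', '(\d)(\w)(\d)', 2);
print defined $found ? "group 2: $found\n" : "group 2 not available\n";
```

One advantage of the offsets is that they also say where in $data the group matched, which the list-context approach does not.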



John
 
Brian McCauley

Anno said:
Gunnar Hjalmarsson said:
You can do:

if ( my @capt = $data =~ /$regexp/ ) {
print $capt[$numb-1];
}


Or, without an auxiliary variable:

defined and print for ( $data =~ /$regexp/ )[ $numb - 1 ];

In this particular case it is probably safe to assume that we want to
treat the case where the ${numb}th capture didn't capture to be
equivalent to the case where /$regexp/ didn't match.

However it is important to be aware that you are making such an assumption.
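
If the two cases do need to be told apart, a sketch along these lines would do it. The helper name capture_status, its return strings, and the sample pattern with an optional group are all assumptions made for this example. Note that a pattern with no groups at all returns (1) in list context, which this sketch does not handle specially.

```perl
use strict;
use warnings;

# Report which of the distinct outcomes occurred for group $n (1-based).
sub capture_status {
    my ($text, $pattern, $n) = @_;

    # An empty list from the match means the pattern itself failed.
    my @captures = $text =~ /$pattern/ or return 'no match';

    return 'no such group' if $n < 1 || $n > @captures;

    # The pattern matched, but this group may not have taken part.
    my $value = $captures[ $n - 1 ];
    return defined $value ? "captured '" . $value . "'" : 'group did not participate';
}

# The second group is optional here, so it can be unset on a successful match.
my $pattern = '(\d)(\w)?(\d)';
print capture_status('8t4', $pattern, 2), "\n";
print capture_status('84',  $pattern, 2), "\n";
print capture_status('xx',  $pattern, 2), "\n";
```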
 
Anno Siegel

Brian McCauley said:
Anno said:
Gunnar Hjalmarsson said:
You can do:

if ( my @capt = $data =~ /$regexp/ ) {
print $capt[$numb-1];
}


Or, without an auxiliary variable:

defined and print for ( $data =~ /$regexp/ )[ $numb - 1 ];

In this particular case it is probably safe to assume that we want to
treat the case where the ${numb}th capture didn't capture to be
equivalent to the case where /$regexp/ didn't match.

However it is important to be aware that you are making such an assumption.

You are right. I thought about explaining how it is okay (under this
assumption) to use the regex without explicitly checking if it matched,
but decided to let it slip. Thanks for pointing it out.

Anno
 
Sara

Hi!

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

$regexp = "(\d)(\w)(\d)";
$numb = 3; # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

# is there another way?
}

Thanks for any input!

Leif

Hi there Leif:

Interesting question. As pointed out, $$numb will work nicely for you.
The odd thing being that this LOOKS like a scalar dereference, which
it really isn't since 2 isn't the memory location of the value. Seems
like there is an ambiguity in there somewhere but I can't pinpoint it.

Thanks for posting.

G
 
Brian McCauley

Following said:
Interesting question. As pointed out, $$numb will work nicely for you.

For certain values of "nice".
The odd thing being that this LOOKS like a scalar dereference, which
it really isn't

Yes it is. It's a scalar dereference of a _symbolic_ reference.
since 2 isn't the memory location of the value.

If it were a _hard_ scalar reference then its numeric value would be the
address in memory.
Seems like there is an ambiguity in there somewhere but I can't pinpoint it.

No ambiguity. Perl's scalar values can contain either ordinary
strings/numbers or they can contain hard references. It is possible to
convert a hard reference[1] into an address in memory simply by using it
in a numeric context. It is not possible to go the other way[4]. If
you use a non-reference in a reference context then it will never be
treated as a memory address - it will be converted into a string and
looked up in the symbol table - i.e. it will be a symbolic reference.
Of course most of the time one has "strict qw(refs)" in effect which
causes symbolic references to be disallowed except in a few special
cases[2].

[1] (other than one to an overloaded type object)
[2] To do with symbolic CODErefs[3].
[3] And due to a bug any symrefs resolved at compile-time.
[4] In Perl - you can of course do anything you want by dropping down
into C.
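
The distinction can be seen in a short sketch. The variable names below are made up for illustration. It shows a hard reference yielding its address in numeric context, and a plain string being refused as a reference under strict refs but looked up in the symbol table once strict refs is switched off.

```perl
use strict;
use warnings;

my $value = 42;
my $hard  = \$value;

# A hard reference used in numeric context yields the referent's address,
# the same number that appears in its string form SCALAR(0x...).
my $address = $hard + 0;
printf "%s numifies to 0x%x\n", $hard, $address;

# A package variable that a symbolic reference can find by name.
our $name_target = 'found via the symbol table';
my $name = 'name_target';

# Under strict refs, using a plain string as a reference dies at run time.
my $strict_error = eval { my $unused = ${ $name }; 1 } ? '' : $@;
print "strict refs says: $strict_error";

# With strict refs off in a small scope, the string is looked up by name.
my $symbolic = do { no strict 'refs'; ${ $name } };
print "symbolic lookup: $symbolic\n";
```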
 
Brian McCauley

bowsayge said:
Brian McCauley said to us:

For certain values of "nice".

[...]

Bowsayge hopes that this is a better value of "nice":

'8 t 4' =~ /(\d) (\w) (\d)/;
my $numb = 3;
print "matched: ", eval("\$$numb"), "\n";

There is no need to enable sym-refs.

All the reasons symrefs are bad also apply to eval(STRING), only
more so.
 
Anno Siegel

bowsayge said:
Brian McCauley said to us:
For certain values of "nice".
[...]

Bowsayge hopes that this is a better value of "nice":

Not really.
'8 t 4' =~ /(\d) (\w) (\d)/;
my $numb = 3;
print "matched: ", eval("\$$numb"), "\n";

There is no need to enable sym-refs.

Sure. You can rewrite any symref using eval like that, so string
eval is the more general mechanism. It also allows Perl to break its own
rules in more ways than mere symrefs do, so it's higher in the hierarchy
of nastiness, not lower.

It is also ugly because it's disproportionate, in the way it would be
ugly to start a sawmill to make a toothpick from a twig. You are running
another Perl interpreter to interpret a program that reads "$1" or "$5" or
something.

That said, your solution is, of course, perfectly valid. The symref
solution needs to unexpectedly talk about "strict", and may need a
bare block to limit the effect. So "eval" is shorter and more to the
point, and it's arguably as readable. Since the string you eval is
entirely defined in the program text (as opposed to an external source),
there is no additional risk in "eval".

But "nicer", no.

Anno
 
Ben Morrow

Quoth Brian McCauley said:
It is possible to
convert a hard reference[1] into an address in memory simply by using it
in a numeric context. It is not possible to go the other way[4]. If

[1] NMF
[4] In Perl - you can of course do anything you want by dropping down
into C.

You don't need C: unpack 'P' will work nicely... :)

Ben
 
