Regex gurus question

Joe Cosby

I have a string which will always contain a letter followed by
numbers, eg "x12345"

I want to take the numbers and put them in another variable.

So I do this:

$test = "x12345";

$_ = $test;
m/(\d+)/;
$justTheNumbers = $1;

print "justTheNumbers is $justTheNumbers\n";


and that works, $justTheNumbers is "12345", which is what I want.

The three-line way I do it seems clumsy to me though. Is there a
simpler way to do it?

I mean, something that would occur more or less naturally to somebody
more familiar with Perl and Regex? I suppose there's always some
simpler way to do anything ... it seems like I must be missing
something obvious though.



--
Joe Cosby
http://joecosby.com/
YOU can be more like "Bob" than you are now!

http://www.subgenius.com
 
Gunnar Hjalmarsson

Joe said:
I have a string which will always contain a letter followed by
numbers, eg "x12345"

I want to take the numbers and put them in another variable.

So I do this:

$test = "x12345";

$_ = $test;
m/(\d+)/;
$justTheNumbers = $1;

print "justTheNumbers is $justTheNumbers\n";

and that works, $justTheNumbers is "12345", which is what I want.

The three-line way I do it seems clumsy to me though. Is there a
simpler way to do it?

You can do it in one step:

($justTheNumbers) = $test =~ /(\d+)/;

Or, if the only thing you need to do is cut off the first character:

$justTheNumbers = substr $test, 1;
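For comparison, here is a minimal sketch of the two approaches side by side. The sub names are hypothetical wrappers added for illustration; the thread only shows the bare statements. It assumes the input looks like the example, and the last line shows where the two diverge when it does not.

```perl
use strict;
use warnings;

# Hypothetical helper names; not part of the original posts.
sub digits_by_match {
    my ($text) = @_;
    # In list context, m// returns the captured groups, so $1 is not needed.
    my ($digits) = $text =~ /(\d+)/;
    return $digits;    # undef if the match failed
}

sub digits_by_substr {
    my ($text) = @_;
    # Everything from offset 1 onward, whatever it contains.
    return substr $text, 1;
}

print digits_by_match("x12345"), "\n";     # 12345
print digits_by_substr("x12345"), "\n";    # 12345
print digits_by_substr("xy123"), "\n";     # y123 (substr does not check for digits)
```

The match version looks at the content, while substr only trusts the position, so which one fits depends on how much you trust the input.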
 
Tad McClellan

Joe Cosby said:
Subject: Regex gurus question
               ^^^^^

Many people will take offense at your attempt to trick them into
reading your article.

Some people will never see your post: so many others have tricked
them by claiming to need a guru, when all they needed was
straightforward code, that they have killfiled every article
that contains "guru".

Have you seen the Posting Guidelines that are posted here frequently?

I want to take the numbers and put them in another variable.


You can do that by putting the m// in list context, read up on
the m// operator in

perldoc perlop

m/(\d+)/;
$justTheNumbers = $1;


You should never use the dollar-digit variables unless you have
first ensured that the match *succeeded*.

if ( m/(\d+)/ ) {
    $justTheNumbers = $1;
}
 
Eric Bohlman

I have a string which will always contain a letter followed by
numbers, eg "x12345"

s/will/is supposed to/;

Good programmers never say "always" or "never." hehehe. Unless you do
only trivial programming, Mr. Murphy is going to pay you several visits.
You need to get in the habit of making your habitat unattractive to him.

I want to take the numbers and put them in another variable.

So I do this:

$test = "x12345";

$_ = $test;
m/(\d+)/;
$justTheNumbers = $1;

The big problem here (other than the awkwardness of the code) is that if
Mr. Murphy does show up at your door, $justTheNumbers will contain a value
that might *look* OK, but really isn't. For example, if you got a string
with *no* numbers in it, $justTheNumbers would get the numbers from the
last good string instead. If you got a string like "x12a345" it would get
"12" which is misleading.

A general rule is that if you're using capturing parentheses, do *not* make
use of the $digit variables unless you've actually tested to make sure the
match succeeded. Note that a list assignment from a match, as in Gunnar's
first solution, takes care of this; if the match fails, the result will be
undef rather than junk.
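A small sketch of the stale-capture problem described above, using the same unchecked style as the original code. The variable names and the second input string "oops" are made up for illustration.

```perl
use strict;
use warnings;

# First string: the match succeeds and sets $1.
$_ = "x12345";
m/(\d+)/;
my $first = $1;                          # "12345"

# Second string has no digits: the match fails and $1 keeps its old value.
$_ = "oops";
m/(\d+)/;
my $unchecked = $1;                      # still "12345", silently wrong

# Checking the match result avoids picking up the stale capture.
my $checked = m/(\d+)/ ? $1 : undef;     # undef

# A list assignment gives the same protection without touching $1.
my ($from_list) = $_ =~ /(\d+)/;         # undef

printf "unchecked=%s checked=%s list=%s\n",
    $unchecked,
    defined $checked   ? $checked   : 'undef',
    defined $from_list ? $from_list : 'undef';
```

Only the unchecked assignment ends up holding digits that never appeared in the second string.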

The philosophy of "defensive programming" suggests that you should write

($justTheNumbers) = $test =~ /^[[:alpha:]](\d+)$/
    or die "unexpected format in \$test: [$test]";

It may *look* like a lot of extra effort, but scores of programmers have
found that the few extra minutes of coding that such techniques entail
saved them many *hours* of time wasted tracking down subtle bugs.
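To see which inputs an anchored pattern of this kind accepts, here is a rough sketch with the digit class written as \d. The sub name extract_digits and the sample strings are hypothetical; eval is used only so the demonstration can continue past each die.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the anchored match; the name is illustrative.
sub extract_digits {
    my ($text) = @_;
    # One letter, then only digits, up to the end of the string.
    my ($digits) = $text =~ /^[[:alpha:]](\d+)$/
        or die "unexpected format: [$text]\n";
    return $digits;
}

for my $candidate ("x12345", "x12a345", "12345", "xy123") {
    # eval traps the die so the loop can report every candidate.
    my $digits = eval { extract_digits($candidate) };
    print(defined($digits) ? "$candidate -> $digits\n" : "$candidate rejected\n");
}
```

Of the four samples, only "x12345" passes. Note that $ also matches just before a trailing newline, so unchomped input such as "x12345\n" would be accepted as well.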

I'll admit, though, that Gunnar's second solution (sloppier, but at least
it will result in an empty string if there are no digits) was the first
thing that came to my mind.
 
Joe Cosby

You can do it in one step:

($justTheNumbers) = $test =~ /(\d+)/;

Or, if the only thing you need to do is cut off the first character:

$justTheNumbers = substr $test, 1;

Thanks, much appreciated.
 
Tore Aursand

I have a string which will always contain a letter followed by
numbers, eg "x12345"

I want to take the numbers and put them in another variable.

So I do this:

$test = "x12345";

$_ = $test;
m/(\d+)/;
$justTheNumbers = $1;

print "justTheNumbers is $justTheNumbers\n";

and that works, $justTheNumbers is "12345", which is what I want.

The three-line way I do it seems clumsy to me though. Is there a
simpler way to do it?

You could easily write what you have written in a single line:

( $justTheNumbers ) = $test =~ m/(\d+)/;

However, why use a regular expression at all? If you are _sure_ that the
string will always begin with exactly one letter, you could use 'substr':

$justTheNumbers = substr( $test, 1 );
 
Shawn Corey

Eric said:
The philosophy of "defensive programming" suggests that you should write

($justTheNumbers) = $test =~ /^[[:alpha:]](\d+)$/
    or die "unexpected format in \$test: [$test]";

It may *look* like a lot of extra effort, but scores of programmers have
found that the few extra minutes of coding that such techniques entail
saved them many *hours* of time wasted tracking down subtle bugs.

I guess I'm a sloppy programmer. I would have written it as:

( my $just_digits = $test ) =~ s/\D//g;

I would put the test for correct format, or for any input validation
immediately after the input is received. Extra validation during
processing then becomes redundant. Having said that, validation for such
things as range will have to be done after this statement. So there is
an exception to every guideline.

--- Shawn
 
John W. Krahn

Shawn said:
Eric said:
The philosophy of "defensive programming" suggests that you should write

($justTheNumbers) = $test =~ /^[[:alpha:]](\d+)$/
    or die "unexpected format in \$test: [$test]";

It may *look* like a lot of extra effort, but scores of programmers
have found that the few extra minutes of coding that such techniques
entail saved them many *hours* of time wasted tracking down subtle bugs.

I guess I'm a sloppy programmer. I would have written it as:

( my $just_digits = $test ) =~ s/\D//g;

That will work fine if the OP's actual data is the same as the example he
presented ($test = "x12345";). However, if the actual data looks something
like "x12345 y12345 z12345", then it will not produce a correct result.
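A quick sketch of what happens with that multi-token string. The global match shown second is only one possible alternative, not something proposed in the thread, and the array name @numbers is made up for illustration.

```perl
use strict;
use warnings;

my $test = "x12345 y12345 z12345";

# The one-liner from the previous post: copy, then delete every non-digit.
( my $just_digits = $test ) =~ s/\D//g;
print "$just_digits\n";            # 123451234512345

# One possible alternative: a global match in list context returns each capture.
my @numbers = $test =~ /[[:alpha:]](\d+)/g;
print "@numbers\n";                # 12345 12345 12345
```

Deleting non-digits runs the three numbers together, while the global match keeps them as separate list elements.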


John
 
Shawn Corey

John said:
That will work fine if the OP's actual data is the same as the example
he presented ($test = "x12345";). However, if the actual data looks
something like "x12345 y12345 z12345", then it will not produce a
correct result.


John

But "x12345a" could also create problems. As I said, input validation
would be done before processing. By this point in the program, $test
will have valid data.

--- Shawn
 
