noob question: Trying to extract part of a string in a variable to another variable

cayenne

Hello all,
I'm a perl noob...and just can't quite figure out how to do something
that should be pretty simple.

Here's an example.

I have $mail_address = 'fred jones <[email protected]>'

I want to use regular expressions to just parse out the userid here of
fred_jones

I'm trying things like this:

$mail_address =~ /\w+@/;

But, doesn't seem to work. I'm a little hazy on exactly how the =~
works...through examples I've successfully used it for substitutions
like x =~ s/tom/joe/g; but, I'm just wanting to match a regular
expression and extract it to the variable...or even to another
variable leaving $mail_address unchanged.

I've looked in books at the substr() function, but, I don't know how
to use regular expressions to find the offset point, etc.

Can someone give me an example...or pointers to a good reference on
this type of thing?

Thanks in advance,

chilecayenne
 
gnari

cayenne said:
I'm trying things like this:

$mail_address =~ /\w+@/;

But, doesn't seem to work.

'doesn't seem to work' does not tell us anything
except that you expected it to do something other
than what it does. many of us have negligible PSI
powers, so it helps us not a lot.

on the other hand, maybe what you want is:

my ($id)= $mail_address =~ /(\w+)@/;
I've looked in books at the substr() function, but, I don't know how
to use regular expressions to find the offset point, etc.
Can someone give me an example...or pointers to a good reference on
this type of thing?


take a look at the perl documentation:
perldoc perlop
perldoc perlre

gnari
 
Jürgen Exner

cayenne said:
Here's an example.

I have $mail_address = 'fred jones <[email protected]>'

I want to use regular expressions to just parse out the userid here of
fred_jones

I'm trying things like this:

$mail_address =~ /\w+@/;

But, doesn't seem to work.

Please define "doesn't seem to work". What exactly do you expect that
statement to do and what do you observe instead? Like, what do you mean by
"parse out"? Do you want to remove the userid from the string? Or do you
want to capture the userid in a different variable?
I'm a little hazy on exactly how the =~
works...

It is the binding operator. When it is used, the substitution or match is
applied to the variable on its left side instead of to the default $_.
through examples I've successfully used it for substitutions
like x =~ s/tom/joe/g; but, I'm just wanting to match a regular
expression and extract it to the variable...or even to another
variable leaving $mail_address unchanged.

Well, Perl regular expressions do that automatically. Just use grouping:

my $mail_address = 'fred jones <[email protected]>';
$mail_address =~ /(\w+)@/;
print $1;

For further details see "perldoc perlretut" or, for the advanced part,
"perldoc perlre".
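
To make the point about the binding operator concrete, here is a minimal sketch. The address is a hypothetical stand-in (the real one is obscured in this thread), and the variable names are made up for illustration. It applies the same pattern once through =~ and once to the default $_.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

# With =~ the match is applied to the variable on the left.
my $bound_userid;
if ( $mail_address =~ /(\w+)@/ ) {
    $bound_userid = $1;
}

# Without =~ the same match is applied to the default $_.
my $default_userid;
for ($mail_address) {    # aliases $_ to $mail_address
    if (/(\w+)@/) {
        $default_userid = $1;
    }
}

print "bound:   $bound_userid\n";
print "default: $default_userid\n";
```

Neither form modifies $mail_address; a plain match only reads the string.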

However, I hope you are aware that '\w' does not even begin to cover the
full set of possible email aliases.
Please see "perldoc -q valid", third paragraph for further information.
I've looked in books at the substr() function, but, I don't know how
to use regular expressions to find the offset point, etc.

You don't. You would use index() to find the position of a character or
string in a text.
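
A rough sketch of the index() and substr() route, for comparison only. It assumes the userid always sits between the '<' and the following '@', and it uses a hypothetical address because the real one is obscured in this thread.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

# index() returns the position of the substring, or -1 if absent.
my $open = index( $mail_address, '<' );
my $at   = $open >= 0 ? index( $mail_address, '@', $open ) : -1;

my $userid_by_index;
if ( $open >= 0 && $at > $open ) {
    # Take everything between '<' and '@'.
    $userid_by_index = substr( $mail_address, $open + 1, $at - $open - 1 );
}

print defined $userid_by_index ? "$userid_by_index\n" : "not found\n";
```

This works for the one fixed layout assumed above; the regex with a capture group is shorter and copes with more variation.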

jue
 
Bob Walton

cayenne wrote:

....

I have $mail_address = 'fred jones <[email protected]>'

I want to use regular expressions to just parse out the userid here of
fred_jones ....


Can someone give me an example...or pointers to a good reference on
this type of thing? ....
chilecayenne

Try:

my($userid)=$mail_address=~/(\w+)@/;

References:

perldoc perlre
perldoc perlretut
perldoc perlop

The books: "Learning Perl (3rd edition)", "Programming Perl (3rd
edition)" and "Mastering Regular Expressions (2nd edition)".

Online: learn.perl.org, www.perl.com, www.perldoc.com
 
Milo Minderbinder

Hi,

you have to mark the part you want to get.

$mail_address =~ m/(\w+?)@/;
$name = $1;

Take brackets to mark what you want. You will find the result in $1. If
you mark more than one part, you will find the second one in $2. The
question mark inside the brackets makes the match non-greedy, so that
you do not get as much as possible into your result (if there are two or
more @).
Another way to get the result is:

my @result = $mail_address =~ m/(\w+?)@/;

In $result[0] you will then find the name.
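
A small sketch of two capture groups, using a hypothetical address (the real one is obscured in this thread) and a host pattern that is only an illustrative assumption. It also checks the greedy and non-greedy forms side by side; for this pattern they appear to capture the same text, because \w can never match '@'.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

# Two groups: $1 is the part before '@', $2 the part after it.
# The '@' is escaped here just to rule out array interpolation.
my ( $user, $host );
if ( $mail_address =~ m/(\w+)\@([\w.-]+)/ ) {
    ( $user, $host ) = ( $1, $2 );
}

# In list context the match returns the captures directly.
my @parts = $mail_address =~ m/(\w+)\@([\w.-]+)/;

# Greedy and non-greedy versions of the original pattern.
my ($greedy)     = $mail_address =~ m/(\w+)@/;
my ($non_greedy) = $mail_address =~ m/(\w+?)@/;

print "user=$user host=$host\n";
print "parts: @parts\n";
print "same capture\n" if $greedy eq $non_greedy;
```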

Milo
 
Web Surfer

[This followup was posted to comp.lang.perl.misc]


#!/usr/bin/perl -w

use strict;

my ( $mail_address , $userid );

$mail_address = 'fred jones <[email protected]>';
$mail_address =~ /(\w+)@/;

$userid = $1;

print "Userid = [$userid]\n";

exit 0;
 
Tad McClellan

Jürgen Exner said:
Just use grouping:

my $mail_address = 'fred jones <[email protected]>';
$mail_address =~ /(\w+)@/;
print $1;


But don't use it like that!

You should never use the dollar-digit variables without first ensuring
that the match *succeeded*.

if ( $mail_address =~ /(\w+)@/ ) {
    print $1;
}
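
As a sketch of why the guard matters, the loop below runs the same pattern over several inputs and only touches $1 when the match succeeded. The addresses and array names are hypothetical, chosen just for this illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs; the second one contains no '@' at all.
my @addresses = (
    'fred jones <fred_jones@example.com>',
    'no address here',
    'alice <alice@example.org>',
);

my ( @userids, @skipped );
for my $address (@addresses) {
    if ( $address =~ /(\w+)@/ ) {
        push @userids, $1;          # safe: the match succeeded
    }
    else {
        push @skipped, $address;    # $1 is not trustworthy here
    }
}

print "userids: @userids\n";
print "skipped: ", scalar(@skipped), "\n";
```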
 
Tad McClellan

[ snip full-quote, please don't do that]
you have to mark the part you want to get.

$mail_address =~ m/(\w+?)@/;
$name = $1;

Take brackets to mark what you want. You will find the result in $1.
^^^^
^^^^

No, you *might* find the result in $1.

If you've tested that the match *succeeded*,
_then_ you will find the result in $1.
 
Sherm Pendley

Robin said:
Regular expressions are not the right way to find the offset unless you
want to use $1 and $2 and $3...etc, and then use index, it still isn't an
optimal way to find the offset point.

Darn right it's not. If your pattern has subexpressions, then on a match the
offset of each subexpression appears in the @- array. That is, the offset
of $1 is in $-[0], $2 is in $-[1], and so forth.

Note that offsets, no matter how they're found, are irrelevant to the
original question anyway. All he wanted was the value of the matched
substring, not its position. He was thinking he might need to offset to get
the substring, but he was barking in the wrong forest with that idea.

So tell me Robin, when are you going to stop posting nonsense answers to
questions you don't understand?

sherm--
 
Robin

Regular expressions are not the right way to find the offset unless you want
to use $1 and $2 and $3...etc, and then use index, it still isn't an optimal
way to find the offset point. Just change your regular expression to look
like the other code. Man, I'm so tired.
-Robin
 
Joe Smith

Sherm said:
If your pattern has subexpressions, then on a match the
offset of each subexpression appears in the @- array. That is, the offset
of $1 is in $-[0], $2 is in $-[1], and so forth.

Incorrect. The offset of $& is in $-[0], the offset of $1 is in $-[1], etc.
-Joe
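
A short sketch of the @- and @+ arrays, following the indexing described in the correction above. The address is a hypothetical stand-in for the obscured one, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

my ( $whole_start, $cap_start, $cap_end, $by_offset );
if ( $mail_address =~ /(\w+)@/ ) {
    $whole_start = $-[0];    # start of the whole match ($&)
    $cap_start   = $-[1];    # start of $1
    $cap_end     = $+[1];    # offset just past the end of $1

    # Rebuild $1 from the offsets with substr().
    $by_offset = substr( $mail_address, $cap_start, $cap_end - $cap_start );
}

print "whole match starts at $whole_start\n";
print "capture spans $cap_start to $cap_end: $by_offset\n";
```

For the original question the offsets are unnecessary, since $1 already holds the text.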
 
Anno Siegel

Jürgen Exner said:
cayenne wrote:
[...]
I've looked in books at the substr() function, but, I don't know how
to use regular expressions to find the offset point, etc.

You don't.

Ah, but you do, though not in this case. The @- and @+ arrays are
there to support it.

Anno
 
Richard Morse

I have $mail_address = 'fred jones <[email protected]>'

I want to use regular expressions to just parse out the userid here of
fred_jones

I'm trying things like this:

$mail_address =~ /\w+@/;

What you seem to be asking for is this:

my ($user_id) = ($mail_address =~ m/(\w+)@/);

However, please note that \w doesn't really have the complete set of
valid characters to prefix the '@' sign in an email address.

Just off the top of my head, I know that '.', '-', '?', '=', and more
are valid. Possibly any Unicode character other than whitespace and '@'
is valid. It might even be valid to have '<' in an email address.

At the very least, you probably want

my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);

HTH,
Ricky
 
Glenn Jackman

Richard Morse said:
At the very least, you probably want

my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);


Be careful where you use '-' inside a range:
Invalid [] range ".-+" before HERE mark in regex m/([\w.-+ << HERE =]+)@/

Put the hyphen last: [\w.+=-]
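
A minimal sketch of the corrected character class, with the hyphen moved to the end so that it is taken literally. The address is hypothetical and was picked to contain '.' and '+' before the '@'.

```perl
use strict;
use warnings;

# Hypothetical address with '.' and '+' in the part before '@'.
my $mail_address = 'fred jones <fred.jones+lists@example.com>';

# Hyphen last in the class, so it is a literal '-' and not a range.
my ($user_id) = $mail_address =~ m/([\w.+=-]+)@/;

print defined $user_id ? "$user_id\n" : "no match\n";
```

This class is still only a small subset of what an address may contain, as noted earlier in the thread.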
 
cayenne

Richard Morse said:
I have $mail_address = 'fred jones <[email protected]>'

I want to use regular expressions to just parse out the userid here of
fred_jones

I'm trying things like this:

$mail_address =~ /\w+@/;

What you seem to be asking for is this:

my ($user_id) = ($mail_address =~ m/(\w+)@/);

However, please note that \w doesn't really have the complete set of
valid characters to prefix the '@' sign in an email address.

Just off the top of my head, I know that '.', '-', '?', '=', and more
are valid. Possibly any unicode character other than whitespace and '@'
are valid. It might even be valid to have '<' in an email address.

At the very least, you probably want

my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);

HTH,
Ricky

Just quickly, can you explain the extensive use of parens here? I
understand the () in the regular expression, to keep those parts the
match...but, what is the function of the () around $user_id and the
entire part after the = sign?

Thanks in advance,

CC
 
Richard Morse

my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);

Just quickly, can you explain the extensive use of parens here? I
understand the () in the regular expression, to keep those parts the
match...but, what is the function of the () around $user_id and the
entire part after the = sign?

Parens around $user_id force the match to happen in a list context. A
match in a scalar context returns true or false (1 or the empty string),
while in a list context it returns the captured substrings.

my $user_id = ($mail_address =~ m/.../);

would have $user_id be the value 1 (because the match succeeded), not
the captured text.

The parens around the match are there because it makes it easier for me
to read it. I've never not put them there, although a quick test I just
did seems to indicate that they aren't necessary.
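
Here is a small sketch of the context difference, using a hypothetical address (the real one is obscured in this thread) and made-up variable names.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

# List context: the parens on the left make the match return captures.
my ($captured) = $mail_address =~ m/(\w+)@/;

# Scalar context: the match returns only success or failure.
my $matched = $mail_address =~ m/(\w+)@/;
my $failed  = 'no address here' =~ m/(\w+)@/;

print "list:   [$captured]\n";
print "scalar: [$matched]\n";
print "failed: [$failed]\n";
```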

HTH,
Ricky
 
Paul Lalli

Richard Morse said:
my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);
Just quickly, can you explain the extensive use of parens here? I
understand the () in the regular expression, to keep those parts the
match...but, what is the function of the () around $user_id and the
entire part after the = sign?

The parens around $user_id force the binding operation of =~ to be
evaluated in list context. This is done because a pattern match in list
context returns a list of all of the captured matches (ie, the things that
go into $1, $2, etc). This is a shorthand way of writing the two
statements:

$mail_address =~ m/([\w.-+=]+)@/;
my $user_id = $1;

The parens around the whole pattern match here are actually unnecessary.
This is because the =~ operator has a higher precedence than the =
operator. They are likely used here just for clarity, to make sure the
readers of the code are aware that ($user_id) is being assigned the
return value of the pattern match, rather than the alternate
interpretation, in which $mail_address is assigned to $user_id and the
result is then matched against the pattern. That would be written like so:
(my $user_id = $mail_address) =~ m/([\w.-+=]+)@/;
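
A brief sketch of the two readings, under some assumptions: the address is a hypothetical stand-in, the character class uses the hyphen-last form suggested earlier in the thread, and a substitution is used for the second reading because that is where the copy-then-bind form is usually seen.

```perl
use strict;
use warnings;

# Hypothetical address; the one in the thread is obscured.
my $mail_address = 'fred jones <fred_jones@example.com>';

# =~ binds tighter than =, so no outer parens are needed here.
my ($user_id) = $mail_address =~ m/([\w.+=-]+)@/;

# Parens around the assignment: copy first, then bind to the copy.
# The substitution changes $display_name and leaves the original alone.
( my $display_name = $mail_address ) =~ s/\s*<.*>//;

print "user_id:      $user_id\n";
print "display_name: $display_name\n";
print "original:     $mail_address\n";
```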

Please let me know if this is not clear enough.

Paul Lalli
 
John W. Krahn

Paul said:
Richard Morse said:
my ($user_id) = ($mail_address =~ m/([\w.-+=]+)@/);

Just quickly, can you explain the extensive use of parens here? I
understand the () in the regular expression, to keep those parts the
match...but, what is the function of the () around $user_id and the
entire part after the = sign?

The parens around $user_id force the binding operation of =~ to be
evaluated in list context. This is done because a pattern match in list
context returns a list of all of the captured matches (ie, the things that
go into $1, $2, etc). This is a shorthand way of writing the two
statements:

$mail_address =~ m/([\w.-+=]+)@/
my $user_id = $1;

They are not the same at all. If the match fails the first will set
$user_id to undef but your version will set $user_id to the contents of
a previously successful match's capturing parentheses, or to undef if
there was no earlier successful match.
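
The difference can be sketched as follows. The strings and variable names are hypothetical, and the earlier successful match is staged deliberately so that $1 holds a stale value.

```perl
use strict;
use warnings;

# Hypothetical inputs; the second one cannot match.
my $good = 'fred jones <fred_jones@example.com>';
my $bad  = 'no address here';

# An earlier successful match leaves 'fred_jones' in $1.
$good =~ m/(\w+)@/ or die "expected a match";

# List assignment: a failed match returns an empty list, so undef.
my ($list_form) = $bad =~ m/(\w+)@/;

# Two statements: the failed match leaves the old $1 in place.
$bad =~ m/(\w+)@/;
my $two_step = $1;

print "list form: ", ( defined $list_form ? $list_form : 'undef' ), "\n";
print "two step:  ", ( defined $two_step  ? $two_step  : 'undef' ), "\n";
```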




John
 
