/^From:.*?([\w.-]+@[\w.-]+)/

April · Nov 2, 2008

We know that you are new to Perl. What people want to see is some
indication that you are putting out some effort yourself, and that you
don't expect us to spoon feed it to you. If the answer to your question
is in the docs, hopefully you have made some effort to find the answer
there.

Thanks .. I had a piece of a program that was not working, and what I was
expecting was some reassurance so I could rule out certain things and
focus on the others.

In my original post I was trying to ask specific questions, and then
in the following posts I asked for reassurance, which is important
as it gives me confidence.

I'm not expecting lengthy responses (though I appreciate them), just
some quick confirmation and a short explanation; direction would be a
plus.

Tad J McClellan · Nov 2, 2008

Scott Bryce said:
The regulars here expect a certain posting style because it communicates
effectively in this medium.

What the regulars expect is that you will quote enough to provide
necessary background, and that YOU will skip the unnecessary parts by
trimming them from your post.

Which illustrates the effectiveness mentioned above.

Q: How many people spend time on the unnecessary parts if the
poster trims them?
A: One

Q: How many people spend time on the unnecessary parts if the
poster leaves them in?
A: Dozens or even hundreds.

Shifting the work from one person to hundreds of other people will
likely annoy hundreds of other people.

Jürgen Exner · Nov 2, 2008

April said:
April said:

if test: elsif ($header && /^From:.*?([\w.-]+@[\w.-]+)/)

as && requires the matching needs to be done in the header ($header)

what I meant was as $header also needs to be evaluated true (&&), so

And this is exactly the main problem that I personally have with your
postings. Computers are the most notorious nitpicking bastards you can
possibly imagine. One character in the wrong place and your program may
do something completely different from what you intended it to do.
Very often your articles and questions are vague and ambiguous or at the
very least confusing. Please, if you want help then at least try to be
as exact as possible.

Above you wrote verbatim "the matching needs to be done in the header".
So Scott correctly pointed out that code doesn't do that, instead it
matches in $_. And now you are changing your story and saying that's not
what you meant. Well, thank you for leading Scott and everyone else on a
wild goose chase.

Now, having said that, things like this happen to everyone every now and
then. However in your case it happens very, very often. And that makes
it very frustrating for your audience.
..

this test must be done in the header section which is what $header
stands for.

And that is the second main problem I personally have with your
postings: the information you provide is incomplete.
How are we supposed to know what $header stands for in your program? By
all typical programming patterns and naming conventions it's most likely
a string containing the (or a) header from whatever your program is
processing. There is no way to guess that you are using it as some sort
of flag.
We do not, cannot, and don't want to know all of your program. But those
parts that you post need to be self-contained, such that other people
can understand what is going on. There is a very good reason why the
posting guidelines strongly suggest a minimal self-contained program
instead of an incomplete excerpt from a larger program.

jue

April · Nov 2, 2008

April said:
April said:

April wrote:
if test: elsif ($header && /^From:.*?([\w.-]+@[\w.-]+)/)
as && requires the matching needs to be done in the header ($header)

what I meant was as $header also needs to be evaluated true (&&), so

And this is exactly the main problem that I personally have with your
postings. Computers are the most notorious nitpicking bastards you can
possibly imagine. One character in the wrong place and your program may
do something completely different from what you intended it to do.
Very often your articles and questions are vague and ambiguous or at the
very least confusing. Please, if you want help then at least try to be
as exact as possible.

Above you wrote verbatim "the matching needs to be done in the header".
So Scott correctly pointed out that code doesn't do that, instead it
matches in $_. And now you are changing your story and saying that's not
what you meant. Well, thank you for leading Scott and everyone else on a
wild goose chase.

Now, having said that, things like this happen to everyone every now and
then. However in your case it happens very, very often. And that makes
it very frustrating for your audience.
.

this test must be done in the header section which is what $header
stands for.

And that is the second main problem I personally have with your
postings: the information you provide is incomplete.
How are we supposed to know what $header stands for in your program? By
all typical programming patterns and naming conventions it's most likely
a string containing the (or a) header from whatever your program is
processing. There is no way to guess that you are using it as some sort
of flag.
We do not, cannot, and don't want to know all of your program. But those
parts that you post need to be self-contained, such that other people
can understand what is going on. There is a very good reason why the
posting guidelines strongly suggest a minimal self-contained program
instead of an incomplete excerpt from a larger program.

jue

I'm not changing my story at all .. the reason I mentioned the header
part is actually to make the line I mentioned self-contained:

"The reason I wasn't sure is that the following cannot be picked up by
an if test:

From (e-mail address removed) Tue Apr 24 11:02:41 2002

if test: elsif ($header && /^From:.*?([\w.-]+@[\w.-]+)/)

I modified From: to From in the above and also checked it is still in the
header section."

I still don't understand where your accusations come from, why you are
so upset, and what exactly you are expecting ..

Jürgen Exner · Nov 2, 2008

April said:
I still don't understand where your accusations come from, why you are
so upset, and what exactly you are expecting ..

I guess our communication is running on very very different frequencies.
I recognize and appreciate that you are _NOT_ acting like a troll or
some of the morons we had here over the years. I believe you are a
decent person and are trying to fit in.

However, I also realize that our communication styles are vastly
different and incompatible. I just can't make much sense of your
postings, sorry, my weakness. But because of this there isn't much
benefit to either of us if I continue to read your postings, therefore I
will stop doing so.

Good luck

jue

Tad J McClellan · Nov 2, 2008

April said:
what exactly you are expecting ..

A short and complete program *that we can run*.

No full-quoting of articles.

No quoting of .sigs.

Have you seen the Posting Guidelines that are posted here frequently?

They spell out in quite some detail what we are expecting.
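
As an illustration of what such a post might contain, here is a minimal sketch built around the line under discussion. It assumes, as April described, that $header is a true/false flag that stays set while the loop is still inside the message header, and that the pattern is applied to the current line in $_. The sample message and the sub name extract_sender are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the address from the From: header line,
# or undef if the header has no such line.
sub extract_sender {
    my ($message) = @_;
    my $header = 1;    # assumed flag: true while we are still in the header
    for (split /\n/, $message) {
        if (/^\s*$/) {
            $header = 0;    # the first blank line ends the header
        }
        elsif ($header && /^From:.*?([\w.-]+\@[\w.-]+)/) {
            return $1;      # the regex was matched against $_, not $header
        }
    }
    return;
}

# Invented sample data: an mbox-style "From " line, two header lines,
# a blank line, and a body line that also starts with "From:".
my $message = <<'END_MESSAGE';
From sender@example.com Tue Apr 24 11:02:41 2002
Subject: test
From: "A. Sender" <sender@example.com>

From: not.a.header@example.org appears in the body
END_MESSAGE

my $sender = extract_sender($message);
print defined $sender ? "$sender\n" : "no match\n";
```

Run as is, this should print sender@example.com. The first line is skipped because it has no colon after From, and the body line is skipped because $header is false by then.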

April · Nov 2, 2008

A short and complete program *that we can run*.

No full-quoting of articles.

No quoting of .sigs.

Have you seen the Posting Guidelines that are posted here frequently?

this is very helpful ..

sln · Nov 3, 2008

[snip]

Thanks Tim, Tad and Jue .. now it's much clear to me! One thing left
is whether .*? simply means anything appears before (...)? April

Since you are asking this question, it is not clear to you at all April.

Look at that expression. '[\w.-]+@' is a hard anchor. That must be satisfied
first, especially the '@'.

The fact is '[\w.-]+' can be satisfied with a single character.
The other fact is '.*?' can get by with one character but it is in the bottom
of precedence.
However '.*' wants to take as much of the string as possible, but it is
subservient to '[\w.-]+@'.

Here is the hierarchy from top down:

1 - '@' is GOD
2 - '[\w.-]+' is CHRIST
3 - '.*' is the greedy HOLY GHOST
4 - '.*?' is the single ANGEL

use strict;
use warnings;

my $email = "From: -2ame\@yahoo.com";

if ($email =~ /^From:.*?([\w.-]+@[\w.-]+)/)
{
    print "$1\n";
}
if ($email =~ /^From:.*([\w.-]+@[\w.-]+)/)
{
    print "$1\n";
}
__END__

church services produce:

-2ame@yahoo.com
e@yahoo.com

Please notice the regexp distinctions that produced these results.
As well, notice that '.*?' was put in as a trap to filter out extraneous
characters that are not alphanumeric.

Let us pray.
sln

April · Nov 3, 2008

Since you are asking this question, it is not clear to you at all April.

you really know me, however with your inspiration, I'm pretty sure
I'll be getting better sooner.

Look at that expression. '[\w.-]+@' is a hard anchor. That must be satisfied
first, especially the '@'.

not sure I agree with this and the following Church ranking thing ...

The fact is '[\w.-]+' can be satisfied with a single character.
agree.

The other fact is '.*?' can get by with one character but it is in the bottom
of precedence.

believe '.*?' can get by with 0 character too.

However '.*' wants to take as much of the string as possible,

agree, but will still check to see whether that will allow the
following [\w.-] to be satisfied.

Here is the hierarchy from top down:

1 - '@' is GOD
2 - '[\w.-]+' is CHRIST
3 - '.*' is the greedy HOLY GHOST
4 - '.*?' is the single ANGEL

use strict;
use warnings;

my $email = "From: -2ame\@yahoo.com";

if ($email =~ /^From:.*?([\w.-]+@[\w.-]+)/)
{
    print "$1\n";
}

if ($email =~ /^From:.*([\w.-]+@[\w.-]+)/)
{
    print "$1\n";
}

__END__

church services produce:

-2ame@yahoo.com
e@yahoo.com

Please notice the regexp distinctions that produced these results.
As well, notice that '.*?' was put in as a trap to filter out extraneous
characters that are not alphanumeric.

Let us pray.

Just found and read "Regular Expression Tutorial Part 5: Greedy and
Non-Greedy Quantification" by Andrew Johnson (which can be found on the
Internet by searching). Andrew provides a pretty convincing
explanation of how '.*?' works. I believe the use of '.*?' will take
care of no space, or one or more other characters, including space, tab,
etc., that appear before the real email address but are not matched
by [\w.-].

I've started to love this place and you guys ..
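
April's reading can be checked with a small sketch. An extra pair of parentheses around '.*?' (added here only for inspection; it is not in the original pattern) shows how much the non-greedy part had to consume before the address could match. The sub name and the sample lines are made up.

```perl
use strict;
use warnings;

# Returns (skipped text, address), or an empty list if there is no match.
# The first capture group exists only to expose what '.*?' consumed.
sub skipped_and_address {
    my ($line) = @_;
    return $line =~ /^From:(.*?)([\w.-]+\@[\w.-]+)/;
}

for my $line (
    'From:user@example.com',          # nothing to skip
    'From: user@example.com',         # one space
    "From:\t <user\@example.com>",    # tab, space, angle bracket
) {
    my ($skipped, $address) = skipped_and_address($line);
    printf "skipped %d character(s), address %s\n", length($skipped), $address;
}
```

The three lines should report 0, 1 and 3 skipped characters, with the same address each time: '.*?' takes nothing when it can, and only as much as it must otherwise.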

Scott Bryce · Nov 3, 2008

April said:
I still don't understand where your accusations come from, why you
are so upset, and what exactly you are expecting ..

He is not making accusations, and he is not upset. He is trying to help
you communicate your questions in a way that makes it easier for the
regulars here to answer them.

For example, in this line of code:

elsif ($header && /^From:.*?([\w.-]+@[\w.-]+)/)

We have no way of knowing what $header is used for, or what data it
might contain.

We have no way of knowing what $_ might contain.

We don't know what you mean by "header section." Of an email? Or is your
program divided up into "sections" that we are unaware of?

Without knowing that we cannot know if $header contains a value that is
different from what you think it holds, if $_ contains a different value
than you think it does, or if the regex is wrong for what you are trying
to accomplish.

If you haven't already, take a look at the posting guidelines for this
group. See if they help you ask your questions in a better way in the
future. There is a lot of Perl talent here. (I normally don't respond to
posts here, since there are others who give much better answers than I
do.) If you are willing to work within their guidelines, there is a lot
you can learn here.

Tim Greer · Nov 3, 2008

April said:
Just found and read "Regular Expression Tutorial Part 5: Greedy and
Non-Greedy Quantification" by Andrew Johnson (which can be found on
the Internet by searching). Andrew provides a pretty convincing
explanation of how '.*?' works. I believe the use of '.*?' will take
care of no space, or one or more other characters, including space, tab,
etc., that appear before the real email address but are not matched
by [\w.-].

I've started to love this place and you guys ..

BTW, if you know it'll only be white space (space, tabs, etc.) between
the ^From:? and email@address, then \s+ would probably be a better
idea... unless you suspect other non \w, ., and - characters will exist
between it and don't want to try and predict them.

April · Nov 3, 2008

BTW, if you know it'll only be white space (space, tabs, etc.) between
the ^From:? and email@address, then \s+ would probably be a better
idea... unless you suspect other non \w, ., and - characters will exist
between it and don't want to try and predict them.

you mean '^From:\s+?', how about '^From:\s*?', to also cover the case
no white space or anything at all?

April · Nov 3, 2008

He is not making accusations, and he is not upset. He is trying to help
you communicate your questions in a way that makes it easier for the
regulars here to answer them.

I assume that's the case now ..

For example, in this line of code:

elsif ($header && /^From:.*?([\w.-]+@[\w.-]+)/)

We have no way of knowing what $header is used for, or what data it
might contain.

We have no way of knowing what $_ might contain.

We don't know what you mean by "header section." Of an email? Or is your
program divided up into "sections" that we are unaware of?

Without knowing that we cannot know if $header contains a value that is
different from what you think it holds, if $_ contains a different value
than you think it does, or if the regex is wrong for what you are trying
to accomplish.

I guess I could have done better .. I indicated this is an email
matching issue in the original post, I remembered to use a
meaningful variable name, and the && is a logical and, so I
assumed people would have the same thing in mind and would understand I
was trying to take out any doubt related to $header, so they could focus on
anything wrong with the part after &&.

If you haven't already, take a look at the posting guidelines for this
group. See if they help you ask your questions in a better way in the
future. There is a lot of Perl talent here. (I normally don't respond to
posts here, since there are others who give much better answers than I
do.) If you are willing to work within their guidelines, there is a lot
you can learn here.

I have taken note, thanks Scott.

Tim Greer · Nov 3, 2008

April said:
you mean '^From:\s+?', how about '^From:\s*?', to also cover the case
no white space or anything at all?

Yes, if it might have white space or might have none at all, then \s*
for zero or more is what you want. \s*? isn't necessary here, since
\s* is already zero or more, so making it an optional match doesn't
matter, since if it doesn't exist, it's already "zero". Be sure to
make : optional on From though, since your examples don't have it each
time.

John W. Krahn · Nov 3, 2008

Tim said:
Yes, if it might have white space or might have none at all, then \s*
for zero or more is what you want. \s*? isn't necessary here, since
\s* is already zero or more, so making it an optional match doesn't
matter,

The ? on the end of \s*? changes \s* to non-greedy, the * makes it optional.

since if it doesn't exist, it's already "zero". Be sure to
make : optional on From though, since your examples don't have it each
time.

John

Tim Greer · Nov 3, 2008

John said:
The ? on the end of \s*? changes \s* to non-greedy, the * makes it
optional.

Right, I know. However, they'll never want to capture any of the white
space between From:? and ([\w.-]+@[\w.-]+), so \s* should suffice and
doesn't require non greedy to function as expected.

Tim Greer · Nov 3, 2008

Tim said:
John said:

The ? on the end of \s*? changes \s* to non-greedy, the * makes it
optional.

Right, I know. However, they'll never want to capture any of the
white space between From:? and ([\w.-]+@[\w.-]+), so \s* should
suffice and doesn't require non greedy to function as expected.

Pardon, to be more specific, I misused the word capture (that's
obvious). I simply mean that they appear to want to match all white
space (zero or as many as there is), and don't need to use a non greedy
match there. Not that it would matter, but it's not necessary to use
from what I can see.

sln · Nov 3, 2008

Since you are asking this question, it is not clear to you at all April.

you really know me, however with your inspiration, I'm pretty sure
I'll be getting better sooner.

Look at that expression. '[\w.-]+@' is a hard anchor. That must be satisfied
first, especially the '@'.

not sure I agree with this and the following Church ranking thing ...

The fact is '[\w.-]+' can be satisfied with a single character.
agree.

The other fact is '.*?' can get by with one character but it is in the bottom
of precedence.

believe '.*?' can get by with 0 character too.

Yes, it will take no characters. It's a filter, but it's a one character
at a time filter.

However '.*' wants to take as much of the string as possible,

agree, but will still check to see whether that will allow the
following [\w.-] to be satisfied.

Yes, but '[\w.-]+' can be satisfied with a single character.
Thus '.*' will grab all before that single character in a greedy fashion.
This will take as long as the non-greedy but will not get the right results.

Here is the hierarchy from top down:

1 - '@' is GOD
2 - '[\w.-]+' is CHRIST
3 - '.*' is the greedy HOLY GHOST
4 - '.*?' is the single ANGEL

[snip]

Just found and read "Regular Expression Tutorial Part 5: Greedy and
Non-Greedy Quantification" by Andrew Johnson (which can be found on the
Internet by searching). Andrew provides a pretty convincing
explanation of how '.*?' works. I believe the use of '.*?' will take
care of no space, or one or more other characters, including space, tab,
etc., that appear before the real email address but are not matched
by [\w.-].

I never read that book. It's probably good.

In my experience, negative greed is one of the most useful concepts.
I always look to add negative greed to expressions.

In terms of greed, once the engine knows what not to look for, it will
grab all it can up to that point. Then it will look at the next term
in the regex expression.

This is the same as non-greedy, but the greedy one grabs a chunk of
matched data at a time, whereas the non-greedy will grab one occurrence
at a time. They both then check the next term for a match.

Knowing this, you can shorten the time the data takes to process.

Example:
$data = "From: ]]]][[[[*****\\ -2ame\@yahoo.com";
$data =~ /^From:[^\w.-]*([\w.-]+@[\w.-]+)/
is about 130% faster (2-3x faster) than this
$data =~ /^From:.*?([\w.-]+@[\w.-]+)/

The reason is that the engine grabs the greedy chunk first.
It just so happens we stopped the greed at a boundary where
the next character satisfies the next term '[\w.-]+'.

Non-greedy will only get one character at a time, checking each time whether the
next character will satisfy the next term '[\w.-]+'. The repeated iteration
consumes a very large chunk of processing time.

The more a non-greedy term has to process the longer it takes. It could be
non-linear as well, not sure.

If the above were '$data = "From: -2ame\@yahoo.com";', the processing times
would be equal. The more characters '.*?' has to skip, the longer it takes.

There are times when you don't know where exactly to stop the greed,
but by all means possible, try to let the greed be there. You just have
to think about it and test all possible scenarios.
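
One scenario worth including in that testing: the two patterns are not interchangeable on every input. '[^\w.-]*' can only skip characters outside the address class, so a display name before the address makes it fail, while '.*?' skips over it. A small sketch, with made-up sample lines and a hypothetical helper name:

```perl
use strict;
use warnings;

my %pattern = (
    lazy    => qr/^From:.*?([\w.-]+\@[\w.-]+)/,
    negated => qr/^From:[^\w.-]*([\w.-]+\@[\w.-]+)/,
);

# Returns the captured address, or undef when the pattern does not match.
sub address_with {
    my ($name, $line) = @_;
    return $line =~ $pattern{$name} ? $1 : undef;
}

for my $line ('From: ]]]][[[[ user@example.com',
              'From: John Doe <jdoe@example.com>') {
    for my $name (sort keys %pattern) {
        my $address = address_with($name, $line);
        printf "%-7s %-36s => %s\n", $name, $line,
            defined $address ? $address : 'no match';
    }
}
```

On the first line both patterns capture the address. On the second only the lazy one does, because the negated class cannot step over "John Doe".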

In a looping scenario, say a parser, where everything is processed in a
repeating fashion, there is usually a sink/filter that picks up waste/comments
or formatting data, and it typically takes the '.*?' form. This gives
the patterns a chance to match on the next character.

If there are 1, 2 or 3 characters that start out the pattern matches, a greedy
(negative) term can take you up to them quickly, giving the patterns a chance
to match without checking at every character. In that case you can use negative
greed and simply have to know how to get past those characters in case the
patterns don't match.
Typically:

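my $lcbpos = 0;   # position just past the last '<' we backed up from
while (/<($pat1|$pat2|$pat3)>|([^<]*)(<?)/g) {
    if (defined $2) {
        # Filler text. If it ended at a '<', back up one character so the
        # patterns get one try starting at that '<'.
        if (length($3) && $lcbpos != pos($_)) {
            $lcbpos = pos($_);
            pos($_) = $lcbpos - 1;
        }
        next;
    }
    # found pattern
}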

So negative greed is a good thing indeed. It's advisable to always try to be greedy.
But this is unavoidable sometimes: /ANCHOR's.*?AWAY/

Here are some benchmarks concerning greed and your email regexp.
--------------------------
use strict;
use warnings;
use Benchmark ':hireswallclock';

my $email = "From: ]]]][[[[*****\\ -2ame\@yahoo.com";
my ($t0, $t1, $tdif);

### Non-Greedy '.*?'
$t0 = Benchmark->new;
for (1 .. 10000)
{
    $email =~ /^From:.*?([\w.-]+@[\w.-]+)/;
}
$t1 = Benchmark->new;
$tdif = timediff($t1, $t0);
print "\nNon-greedy '.*?' --\n the code took:",timestr($tdif),"\n";

### Greedy '[^\w.-]*'
$t0 = Benchmark->new;
for (1 .. 10000)
{
    $email =~ /^From:[^\w.-]*([\w.-]+@[\w.-]+)/;
}
$t1 = Benchmark->new;
$tdif = timediff($t1, $t0);
print "\nGreedy '[^\\w.-]*' --\n the code took:",timestr($tdif),"\n";

__END__

Non-greedy '.*?' --
the code took:0.03332 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)

Greedy '[^\w.-]*' --
the code took:0.016902 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
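
The same comparison can also be sketched with cmpthese from the Benchmark module, which runs each snippet a set number of times and prints a relative-speed table. The iteration count below is an arbitrary choice, and the figures will vary by machine and Perl version, so no output is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $email = "From: ]]]][[[[*****\\ -2ame\@yahoo.com";

# Each entry returns the captured address, or undef if there is no match.
my %match = (
    lazy    => sub { $email =~ /^From:.*?([\w.-]+\@[\w.-]+)/      ? $1 : undef },
    negated => sub { $email =~ /^From:[^\w.-]*([\w.-]+\@[\w.-]+)/ ? $1 : undef },
);

# Check that both patterns capture the same address before timing them.
print "$_: ", $match{$_}->(), "\n" for sort keys %match;

# 200_000 iterations each; an arbitrary count chosen for this sketch.
cmpthese(200_000, \%match);
```

Checking the captures first matters because a faster pattern is only useful if it matches the same thing on the data at hand.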

John W. Krahn · Nov 3, 2008

Tim said:
Tim said:

John said:

Tim Greer wrote:

Yes, if it might have white space or might have none at all, then
\s*
for zero or more is what you want. \s*? isn't necessary here, since
\s* is already zero or more, so making it an optional match doesn't
matter,

The ? on the end of \s*? changes \s* to non-greedy, the * makes it
optional.

Right, I know. However, they'll never want to capture any of the
white space between From:? and ([\w.-]+@[\w.-]+), so \s* should
suffice and doesn't require non greedy to function as expected.

Pardon, to be more specific, I misused the word capture (that's
obvious). I simply mean that they appear to want to match all white
space (zero or as many as there is), and don't need to use a non greedy
match there. Not that it would matter, but it's not necessary to use
from what I can see.

Right, because the (optional) whitespace is anchored by 'm:?' on the
left and '[\w.-]+' on the right the greediness is irrelevant.

John
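
John's point can be checked directly. In this sketch (sample lines invented, including an mbox-style "From " line without the colon), '\s*' and '\s*?' capture the same address on every input, because whitespace can never be part of '[\w.-]+' and so there is only one place the capture can start.

```perl
use strict;
use warnings;

my $greedy = qr/^From:?\s*([\w.-]+\@[\w.-]+)/;
my $lazy   = qr/^From:?\s*?([\w.-]+\@[\w.-]+)/;

# Returns the capture from each pattern (undef where it does not match).
sub both_captures {
    my ($line) = @_;
    my ($from_greedy) = $line =~ $greedy;
    my ($from_lazy)   = $line =~ $lazy;
    return ($from_greedy, $from_lazy);
}

for my $line (
    'From:user@example.com',                           # no whitespace
    "From: \t user\@example.com",                      # mixed whitespace
    'From user@example.com Tue Apr 24 11:02:41 2002',  # no colon
) {
    my ($from_greedy, $from_lazy) = both_captures($line);
    print "greedy: $from_greedy, lazy: $from_lazy\n";
}
```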

April · Nov 4, 2008

Yes, if it might have white space or might have none at all, then \s*
for zero or more is what you want. \s*? isn't necessary here, since
\s* is already zero or more, so making it an optional match doesn't
matter, since if it doesn't exist, it's already "zero". Be sure to
make : optional on From though, since your examples don't have it each
time.

That's right, greediness makes no difference here, as the matching is
only against white space.

By the way, using ? to make : optional is also good thinking.

? seems a pretty interesting quantifier in regexes. It relates to both
optional and non-greedy.
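
A small sketch of those two roles, using invented sample lines and a hypothetical helper: after a single item such as ':' the '?' means "optional", while after another quantifier such as '*' it switches that quantifier to non-greedy matching.

```perl
use strict;
use warnings;

# '?' after ':' makes the colon optional, so both lines match.
for my $line ('From: bob@example.com', 'From bob@example.com') {
    print "matches: $line\n" if $line =~ /^From:?\s/;
}

# Returns the first capture group, or undef if the pattern does not match.
sub first_capture {
    my ($regex, $line) = @_;
    my ($captured) = $line =~ $regex;
    return $captured;
}

# '?' after '*' makes '.*' non-greedy, which changes what is captured.
my $line = 'From: bob@example.com';
print first_capture(qr/^From:.*?([\w.-]+\@[\w.-]+)/, $line), "\n";  # bob@example.com
print first_capture(qr/^From:.*([\w.-]+\@[\w.-]+)/,  $line), "\n";  # b@example.com
```

The greedy version loses most of the local part because '.*' gives back only as many characters as '[\w.-]+' needs, and one is enough.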
