positive/negative lookahead issue. greedy = problems?

vbgunz

/*
* BEGIN EXAMPLES
*/

var text = 'A Cats Catalog of Cat Catastrophes and Calamities';

/***
* EXAMPLE 1: negative lookahead assertion logic
***/

var newString = text.split(/\s/);
for (var i = 0; i < newString.length; i++) {
  var word = newString[i];
  if (
    // we'll replace the word Cat under these conditions:
    word.search(/Cat/) == 0 && // *if* word begins with Cat
    word != 'Catalog' && // *but* is not Catalog
    word != 'Catastrophes' // *and* is not Catastrophes
  ) {
    // we'll replace the word Cat with Human
    newString[i] = word.replace('Cat', 'Human');
  }
}
alert(newString.join(' '));
// -> A Humans Catalog of Human Catastrophes and Calamities


/***
* EXAMPLE 2: the simpler version
***/
var pattern = /(Cat(?!alog|astrophes))/g;
alert(text.replace(pattern, 'Human'));
// -> A Humans Catalog of Human Catastrophes and Calamities

/*
* END EXAMPLES
*/


example 1 === example 2, so it may seem safe to assume that my
understanding of a negative lookahead is correct. problem is, I feel
my understanding may be flawed. here is why...

I always knew about positive/negative lookaheads but didn't truly
understand them. Wrox Professional JavaScript sort of cleared it up
for me *but* upon deciding to sharpen my skills on my own, I came
across a peculiar problem.

Do greedy quantifiers work with them?

I created this problem entirely on my own. I didn't mean to, it
just ended up that way. my idea was to match \d+\.\d+ but not if it
was followed by \.\d+ .

here's an example:

var nla = /(\d+\.\d+(?!\.\d))/; // my negative look ahead
var txt = 'euphoria 72.21.330';
alert(nla.exec(txt)[1]); // -> 72.2 (completely unexpected)

what did I expect? nothing. null, nada. I am not at all interested in
the dozens of other possible solutions. I am most interested in
understanding this problem. I need enlightenment and very much
appreciate any insight on it!
 

Thomas 'PointedEars' Lahn

vbgunz said:
var nla = /(\d+\.\d+(?!\.\d))/; // my negative look ahead
var txt = 'euphoria 72.21.330';
alert(nla.exec(txt)[1]); // -> 72.2 (completely unexpected)

what did I expect? nothing. null, nada. I am not at all interested in
the dozens of other possible solutions. I am most interested in
understanding this problem. I need enlightenment and very much
appreciate any insight on it!

Subexpression    | Match | Not matched
-----------------+-------+------------
\d+              | 72    | .21.330
\d+\.            | 72.   | 21.330
\d+\.\d+         | 72.21 | .330
\d+\.\d+(?!\.\d) | 72.2  | 1.330
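
The rows of the table can be reproduced directly. This is only a minimal sketch: the helper name `matchAndRest` is made up for illustration, and `console.log` stands in for `alert`.

```javascript
// Hypothetical helper: run a pattern against the text and report
// what it matched and what was left over after the match.
function matchAndRest(pattern, text) {
  var m = pattern.exec(text);
  if (m === null) return null;
  return { match: m[0], rest: text.slice(m.index + m[0].length) };
}

var txt = 'euphoria 72.21.330';

// One pattern per table row, each a longer prefix of the full expression.
var rows = [
  /\d+/,
  /\d+\./,
  /\d+\.\d+/,
  /\d+\.\d+(?!\.\d)/
].map(function (re) { return matchAndRest(re, txt); });

rows.forEach(function (row) {
  console.log(row.match + ' | ' + row.rest);
});
// 72 | .21.330
// 72. | 21.330
// 72.21 | .330
// 72.2 | 1.330
```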


HTH

PointedEars
 
vbgunz

Thomas said:
Subexpression    | Match | Not matched
-----------------+-------+------------
\d+              | 72    | .21.330
\d+\.            | 72.   | 21.330
\d+\.\d+         | 72.21 | .330
\d+\.\d+(?!\.\d) | 72.2  | 1.330

it makes no sense to me. I read up a bit more on positive/negative
lookaheads and my issue still makes no sense to me. please, read this.

for text replacement operations I find these lookaheads invaluable. it
is awesome to know I can check if a pattern is either followed or not
followed by another pattern (without consuming it). seriously awesome.
what I do not understand is what is happening in my issue. I know you
posted a table to help but the table makes no sense to me at all when
it comes to *why* anything is returned :(

rows 1 through 3 of the table make perfect sense. row 4 throws my baby
out of the window with the bathwater. I am thinking no match should be
made. seriously. no match should be made. I cannot understand why any
match is made.

(123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!
(\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A PERFECT WTF!?

I think this is why I never really used lookaheads. I thought I
understood them as of yesterday and using primitive examples they're
very useful *but* when I came up with the problem above, I got thrown
into a world of hurt. I have no idea why a match is made. no match
should be made. so. why is there a match being made?

In the end I could even see 456.789 making sense if it was the only
thing returned. I am just completely thrown off here. any detailed
enlightenment would be very much appreciated!
 
Thomas 'PointedEars' Lahn

vbgunz said:
it makes no sense to me. I read up a bit more on positive/negative
lookaheads and my issue still makes no sense to me.

You have asked for an explanation (and explicitly not for a solution), and
I provided it.

To be more verbose, the reason for the observed result is that with greedy
matching (which is the default) the longest possible match for any given
subexpression is tried first. `72.21' is rejected because of the further
restriction you have imposed on the match (that it must not "be followed by
a dot followed by a decimal digit"), so the matcher backtracks, the second
`\d+' gives up one digit, and the second longest possible match wins. (As
you can observe in the last row, `72.2' does match `\d+\.\d+' and `1.' does
not match `\.\d'.)
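
The backtracking described above can be walked through by hand. This is a deliberately simplified sketch, not how a real regex engine is implemented: it only shrinks the second `\d+`, and all variable names are made up for illustration.

```javascript
var txt = '72.21.330';
var lookahead = /^\.\d/;          // the part inside (?! ... ), tested at the current position
var start = txt.indexOf('.') + 1; // position right after "72."

// All digits the second \d+ could take, i.e. "21".
var digits = /^\d+/.exec(txt.slice(start))[0];

var tried = [];
var winner = null;
// Greedy order: try the longest candidate first, then give back one digit at a time.
for (var len = digits.length; len >= 1 && winner === null; len--) {
  var candidate = txt.slice(0, start + len);
  var rest = txt.slice(start + len);
  var rejected = lookahead.test(rest); // negative lookahead fails if \.\d follows
  tried.push(candidate + (rejected ? ' rejected' : ' accepted'));
  if (!rejected) winner = candidate;
}

console.log(tried.join(', ')); // 72.21 rejected, 72.2 accepted
console.log(winner);           // 72.2
```

The hand-rolled result agrees with what `/\d+\.\d+(?!\.\d)/.exec(txt)[0]` returns for the same string.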

vbgunz said:
(123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!

Because you have explicitly requested that a match for "123.456" be not
followed by ".789" and so you have excluded the one and only possible match.

vbgunz said:
(\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A PERFECT WTF!?

See above. There was more than one possible match because of the `+'
quantifiers, and the longest possible one won, given the restrictions
imposed. Since you have used capturing parentheses, there are two elements
in the array returned by RegExp.prototype.exec(): one for the matched
substring and one for the match for the first (and here only) capturing
group -- which are of course equal here.
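
A quick sketch of the two things "2 matches" could mean here: the two elements of the `exec()` array, or two separate matches from a global search. The second reading is an assumption about how the pattern was tested, and the variable names are illustrative.

```javascript
// Reading 1: one match, reported twice because of the capturing parentheses.
var nla = /(\d+\.\d+(?!\.\d))/;
var result = nla.exec('euphoria 72.21.330');
console.log(result.length); // 2
console.log(result[0]);     // 72.2  (whole match)
console.log(result[1]);     // 72.2  (first capturing group)

// Reading 2 (assumed): a global search, which finds two distinct matches.
var all = '123.456.789'.match(/\d+\.\d+(?!\.\d+)/g);
console.log(all.join('/')); // 123.45/6.789
```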


PointedEars
 
pr

vbgunz said:
(123\.456(?!\.789)) -> no match on 123.456.789 -> PERFECT!
(\d+\.\d+(?!\.\d+)) -> 2 matches on 123.456.789 -> 123.45/6.789 -> A PERFECT WTF!?

It looks like you're thinking of a lookahead as "x followed by y", and
might find it makes more sense to understand it as "x *immediately*
followed by y".

'123.45' is indeed the longest match for digit(s) dot digit(s) not
immediately followed by dot digit(s). '123.45' is immediately followed
instead by '6'.
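
To see the "immediately followed by" point in action, it may help to print the character that sits right after each match. A minimal sketch, with `matchWithNextChar` as a hypothetical helper name:

```javascript
// Hypothetical helper: return the match plus the single character right after it.
function matchWithNextChar(pattern, text) {
  var m = pattern.exec(text);
  if (m === null) return null;
  var end = m.index + m[0].length;
  return { match: m[0], next: text.charAt(end) };
}

var subject = '123.456.789';

// Fixed digits: the only candidate is "123.456", and ".789" follows it, so no match.
var fixed = matchWithNextChar(/123\.456(?!\.789)/, subject);
console.log(fixed); // null

// \d+ can shrink: "123.45" is immediately followed by "6", not by a dot.
var flexible = matchWithNextChar(/\d+\.\d+(?!\.\d+)/, subject);
console.log(flexible.match, flexible.next); // 123.45 6
```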

If you want digit(s) dot digit(s) not followed *anywhere* by dot
digit(s), you're looking at something like:

/(\d+\.\d+(?!.*\.\d+))/

Note the '.*' in the lookahead. That will match '456.789'.
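
A short check of that pattern against both strings from this thread (a sketch only; variable names are illustrative). Note that it still returns a match for the original 'euphoria 72.21.330' rather than null, because the last two number groups have nothing after them.

```javascript
var anywhere = /(\d+\.\d+(?!.*\.\d+))/;

// Every attempt that starts inside "123." still has ".789" somewhere ahead,
// so the first attempt that can succeed starts at "456".
var a = anywhere.exec('123.456.789')[1];
console.log(a); // 456.789

// Same reasoning: ".330" lies ahead of every attempt until "21.330".
var b = anywhere.exec('euphoria 72.21.330')[1];
console.log(b); // 21.330
```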

This scenario, from the perspective of Perl regular expressions, is well
described at

http://perldoc.perl.org/perlre.html#Backtracking
 
vbgunz

pr said:
It looks like you're thinking of a lookahead as "x followed by y", and
might find it makes more sense to understand it as "x *immediately*
followed by y".

'123.45' is indeed the longest match for digit(s) dot digit(s) not
immediately followed by dot digit(s). '123.45' is immediately followed
instead by '6'.

If you want digit(s) dot digit(s) not followed *anywhere* by dot
digit(s), you're looking at something like:

/(\d+\.\d+(?!.*\.\d+))/

Note the '.*' in the lookahead. That will match '456.789'.

This scenario, from the perspective of Perl regular expressions, is well
described at

http://perldoc.perl.org/perlre.html#Backtracking

I'd like to thank you both (PointedEars and pr) for trying to help.
it makes a little more sense now than it did when I first encountered
the problem, and the perldoc resource looks like a great opportunity
to clear matters up even more. every link posted in reply to one of my
questions is like a gift, and clicking one is like unwrapping it.
I thank you both again for your time :)
 
