regex question: extended [^...] concept?

Werner Lemberg · Mar 23, 2009

Folks,

consider this input:

foo ... foo ... bar

where `...' doesn't contain the word `foo'. How can I write a regular
expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
it a single character as in

f ... f ... bar

I could write

/f[^f]*bar/

but how can I do something similar for a word? In other words, I search an
extension of the [^.] concept which covers a sequence of characters.

I've looked into both the `perlre' and `perlretut' manual pages (of perl
5.10.0), but they contain nothing relevant to this problem.

Werner

smallpond · Mar 23, 2009

print "OK" if ($source =~ /foo.*bar/ and $source !~ /foo.*foo.*bar/);
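A minimal sketch of the two-regex check above, wrapped in a helper so it can be tried on several inputs. The sub name `has_single_foo_span` and the sample strings are made up for illustration. Note that it only answers yes or no for the whole string and does not return the matched span.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $source contains a "foo ... bar" span
# but no "foo ... foo ... bar" span anywhere.
sub has_single_foo_span {
    my ($source) = @_;
    return ( $source =~ /foo.*bar/ && $source !~ /foo.*foo.*bar/ ) ? 1 : 0;
}

for my $source ( 'foo ... bar', 'foo ... foo ... bar', 'no match here' ) {
    printf "%-20s => %s\n", $source,
        has_single_foo_span($source) ? 'OK' : 'rejected';
}
```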

Willem · Mar 23, 2009

Werner Lemberg wrote:
)
) Folks,
)
)
) consider this input:
)
) foo ... foo ... bar
)
) where `...' doesn't contain the word `foo'. How can I write a regular
) expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
) it a single character as in
)
) f ... f ... bar
)
) I could write
)
) /f[^f]*bar/
)
) but how can I do something similar for a word? In other words, I search an
) extension of the [^.] concept which covers a sequence of characters.

That's quite difficult and complicated to do in a single regexp.
You basically have to cover all cases.

This might work, but I'm not sure I got all cases right:

/foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

You can see that using two regexes one after the other (as mentioned
crossthread) is a lot easier.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Werner Lemberg · Mar 23, 2009

Willem said:
) consider this input:
)
) foo ... foo ... bar
)
) where `...' doesn't contain the word `foo'. How can I write a regular
) expression which matches `foo ... bar' but not `foo ... foo ... bar'?

That's quite difficult and complicated to do in a single regexp.
You basically have to cover all cases.

This might work, but I'm not sure I got all cases right:

/foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

You can see that using two regexes one after the other (as mentioned
crossthread) is a lot easier.

Thanks for the answers. I'm really surprised that there are so many regex
extensions in Perl but not a single one which covers this. Is this
difficult to handle in a regex machine, or is there no need normally for
it?

Especially in combination with the (?PARNO) stuff (as described in the
perlre man page) this could be quite handy for recursively parsing nested
expressions.

I also wonder why there is no callback mechanism within regular
expressions. The (?{ code }) construct allows execution of arbitrary Perl
code but within the regex it always evaluates to true. I would like to have
a similar construct, say, (!{ code }), which evaluates to true or false
depending on `code'. Then I could implement my above request by myself,
simply checking the passed subgroup whether it contains the given string.

Werner
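One way to approximate the wished-for `(!{ code })` is to use a code block as the condition of a conditional group, with the always-failing `(?!)` as its yes-branch. This is only a sketch, assuming Perl 5.10 or later; the helper name `match_span` is hypothetical. It uses index() rather than a nested match because running a regex inside a code block was not safe in older perls.

```perl
use strict;
use warnings;

# (?{ ... }) serves as the *condition* of (?(cond)yes): when the code
# returns true, the yes-branch (?!) is tried and always fails, which
# forces the engine to backtrack. $^N is the text of the most recently
# closed capture group, here the lazy (.*?) between "foo" and "bar".
my $re = qr/(foo(.*?)(?(?{ index($^N, 'foo') >= 0 })(?!))bar)/;

# Hypothetical helper: returns the matched span, or undef.
sub match_span {
    my ($text) = @_;
    my ($span) = $text =~ $re;
    return $span;
}

for my $text ( 'foo ... foo ... bar', 'foo ... baz' ) {
    my $span = match_span($text);
    print "$text => ", ( defined $span ? $span : '(no match)' ), "\n";
}
```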

Jürgen Exner · Mar 23, 2009

Werner Lemberg said:
consider this input:

foo ... foo ... bar

where `...' doesn't contain the word `foo'. How can I write a regular
expression which matches `foo ... bar' but not `foo ... foo ... bar'?

reverse() the text and match non-greedy /^rab...oof/, then reverse() the
found match again.

jue
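A small sketch of the reverse() trick, assuming (as the `^` anchor implies) that the text ends in "bar". The sub name `last_foo_to_bar` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the text, match lazily from "rab" to the
# first "oof", then reverse the match back.
sub last_foo_to_bar {
    my ($text) = @_;
    my $reversed = reverse $text;    # scalar context reverses the string
    return unless $reversed =~ /^(rab.*?oof)/;
    return scalar reverse $1;
}

print last_foo_to_bar('foo ... foo ... bar'), "\n";
```

The lazy `.*?` stops at the first "oof" in the reversed text, which is the last "foo" before "bar" in the original.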

smallpond · Mar 23, 2009

Willem wrote:
) This might work, but I'm not sure I got all cases right:
)
) /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

$s="fooafoosdbar";
print "OK" if ($s =~ /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/);

OK

It won't work unless you prevent backtracking, I think. The initial
foo in your pattern can match the second one in the string.
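To see where the match actually begins, the pattern can be wrapped in an outer capture group. The sketch below also tries an anchored variant; the `^` is an assumption about where the span has to start, not part of the original pattern.

```perl
use strict;
use warnings;

my $s = 'fooafoosdbar';

# The same pattern inside an outer group, so $1 is the whole match.
my ($unanchored) = $s =~ /(foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar)/;

# Anchored at the start of the string: only the first "foo" may begin it.
my $anchored_ok = $s =~ /^foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/ ? 1 : 0;

print "unanchored match: $unanchored\n";
print "anchored match:   ", ( $anchored_ok ? 'yes' : 'no' ), "\n";
```

The attempt at the first "foo" fails, and the unanchored regex then simply retries at later start positions until it reaches the second "foo".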

Werner Lemberg · Mar 24, 2009

Jürgen Exner said:
reverse() the text and match non-greedy /^rab...oof/, then reverse() the
found match again.

Thank you. While this is a solution for my concrete example, it
unfortunately leads to nothing if you want to generalize the [^.] concept.

Werner

Werner Lemberg · Mar 24, 2009

/foo (?: (?!foo) . )* bar/x

Aah. I've already thought of a negative lookahead, but I haven't had the
idea of using `(.)*' to provide a `moving anchor' for it. Thanks for the
idea.
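The same idea can be sketched for arbitrary delimiters, which is roughly the generalized `[^...]` asked about at the start of the thread. The helper `span_without` is hypothetical; quotemeta() is used so that the delimiters are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: first span running from $start to $end that has
# no further $start in between. Each "." is only consumed at a position
# where $start does not begin.
sub span_without {
    my ( $text, $start, $end ) = @_;
    my ( $s, $e ) = ( quotemeta $start, quotemeta $end );
    return $text =~ /($s(?:(?!$s).)*$e)/s ? $1 : undef;
}

print span_without( 'foo ... foo ... bar', 'foo', 'bar' ), "\n";
```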

If a suffix of "foo" matches a prefix of "bar" you may end up with false
negatives, depending on what you wanted. That is,

"foo...fobar" =~ /foo (?: (?!fob) . )* bar/x

is false, even though the "fob" you don't want to match is part of the
"bar" you do. It is possible to correct this with yet another negative
lookahead:

/foo (?: (?! fob (?!ar) ) . )* bar/x
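A quick check of both patterns, plus one extra input (an assumption added here) in which "fob" is not the start of "bar" and should still block the match:

```perl
use strict;
use warnings;

my $text = 'foo...fobar';

# Plain negative lookahead: the "fob" that overlaps "bar" blocks the match.
my $plain = $text =~ /foo (?: (?!fob) . )* bar/x ? 1 : 0;

# Nested lookahead: "fob" is only forbidden when it is not followed by "ar".
my $nested = $text =~ /foo (?: (?! fob (?!ar) ) . )* bar/x ? 1 : 0;

# A "fob" that is not part of "bar" is still rejected.
my $real_fob = 'foo...fob...bar' =~ /foo (?: (?! fob (?!ar) ) . )* bar/x ? 1 : 0;

print "plain=$plain nested=$nested real_fob=$real_fob\n";
```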

Uuh, a negative lookahead *within* another negative lookahead. How exactly
is this defined? Is it equivalent to

(?! fob ) (?! ar )

?

Werner
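The two forms can be compared directly. As far as the semantics of negative lookahead go, the nested form reads "not (fob that is not followed by ar)", while the sequence reads "not fob here, and not ar here", so they should differ. A small sketch with made-up sub names and test words:

```perl
use strict;
use warnings;

# Hypothetical helpers: test each assertion at the start of the string.
sub nested_ok   { my ($t) = @_; return $t =~ /^(?! fob (?!ar) )/x   ? 1 : 0 }
sub sequence_ok { my ($t) = @_; return $t =~ /^(?! fob ) (?! ar )/x ? 1 : 0 }

for my $text (qw( fobar fobxx arxx zzz )) {
    printf "%-6s nested=%d sequence=%d\n",
        $text, nested_ok($text), sequence_ok($text);
}
```

On "fobar" the nested form succeeds and the sequence fails, so the two are not interchangeable.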

Ilya Zakharevich · Mar 24, 2009

foo ... foo ... bar

where `...' doesn't contain the word `foo'. How can I write a regular
expression which matches `foo ... bar' but not `foo ... foo ... bar'?

This does not make sense, since `foo ... bar' is a substring of `foo
... foo ... bar'.

I assume you want to allow the REX to match this substring, but not
the larger string. Then the simplest solution would be to fasttrack to
the LATEST occurrence of foo which is followed by bar:

/^ (?> .* (?=foo .* bar) ) (foo .* bar) /x; # add \b where needed

or just (depending on the needs)

/ ^ .* (foo .* bar) /x;
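Both variants can be tried on a sample string; the input below is made up for illustration. Under /x the spaces in the patterns are ignored, so `foo .* bar` means `foo.*bar`.

```perl
use strict;
use warnings;

my $text = 'foo 1 foo 2 bar';

# Atomic group: skip ahead to the latest "foo" that is followed by "bar".
my ($atomic) = $text =~ /^ (?> .* (?=foo .* bar) ) (foo .* bar) /x;

# Simpler form: the greedy .* gives characters back until "foo.*bar" fits.
my ($greedy) = $text =~ / ^ .* (foo .* bar) /x;

print "$atomic\n$greedy\n";
```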

If you want to disallow ANY match which contains foo foo bar, then it
may be as simple as

/ ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x
or
/ ^ (?! (?> (?> .*? foo) .*? foo) .* bar) .*? ( foo .* bar )/x
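A sketch of the first of these two patterns in use; the sub name `clean_span` and the inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the "foo ... bar" span, but only if the
# string as a whole contains no "foo ... foo ... bar".
sub clean_span {
    my ($text) = @_;
    return $text =~ / ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x
        ? $1
        : undef;
}

for my $text ( 'xx foo ... bar', 'foo ... foo ... bar' ) {
    my $span = clean_span($text);
    print "$text => ", ( defined $span ? $span : '(rejected)' ), "\n";
}
```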

However, the problem becomes much trickier if you prohibit using ^...

Hope this helps,
Ilya

P.S. Of course, with "onion rings" implemented (google for it) there
would be no problem whatsoever...

sln · Mar 24, 2009

I've always thought these are good ways.

-sln

---------------------------------------
use strict;
use warnings;

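# Greedy: runs to the last "bar" after the last "foo".
if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) * bar)/x )
{
    print "$1\n";
}
## or

# Non-greedy: stops at the first "bar" after the last "foo".
if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) *? bar)/x )
{
    print "$1\n";
}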

__END__

foo ... bar ... bar
foo ... bar
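One detail worth checking is the order of the dot and the lookahead. With `. (?!foo)` the test is applied after each consumed character, so a "foo" that directly follows the opening "foo" seems to slip through. A small comparison on a made-up input:

```perl
use strict;
use warnings;

my $text = 'foofoo bar';

# Lookahead after the dot: nothing checks the position right after the
# opening "foo", so the second "foo" is consumed.
my ($after) = $text =~ /(foo (?: . (?!foo) )* bar)/x;

# Lookahead before the dot: every consumed character is checked first.
my ($before) = $text =~ /(foo (?: (?!foo) . )* bar)/x;

print "dot first:       $after\n";
print "lookahead first: $before\n";
```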

Eric Pozharski · Mar 24, 2009

On 2009-03-23, Werner Lemberg said:
Thanks for the answers. I'm really surprised that there are so many regex
extensions in Perl but not a single one which covers this. Is this
difficult to handle in a regex machine, or is there no need normally for
it?

Watch what you say. Those aren't 'regex extensions in Perl'. Those are
Perl regexes (or 'perlre', for short).

*CUT*

Eric Pozharski · Mar 24, 2009

On 2009-03-24 said:
as being something like

(?{ if (/pattern/) { fail } })

where 'fail' is a hypothetical builtin that causes the surrounding match
to fail. (Of course $_ would have to have the appropriate value, as
well, which it doesn't.) OTOH, it may not...

I once considered the approach of intentionally failing a match within
perlre itself (preprocessing wasn't an option).

I would come up with something like this:

perl -Mstrict -wle '
my $x = qr[(??{ substr($`, -1, 1) eq q|f| ? qr/(?<=f)/ : qr/(?!.|$)/
})];
foreach ( q||, qw| x f fx xf | ) {
print m{$x};
print qq|<$&>|;
print qq|$_\n|; };
print q|FIN|'

Use of uninitialized value $& in concatenation (.) or string at -e line 5.
<>

Use of uninitialized value $& in concatenation (.) or string at -e line 5.
<>
x

1
<>
f

1
<>
fx

1
<>
xf

FIN

But isn't that way too heavy (I'm not even talking about C<$`>)?

*CUT*

Werner Lemberg · Mar 24, 2009

Ilya Zakharevich said:
Hope this helps,

Thanks a lot to all the posters who presented further ideas and comments.

P.S. Of course, with "onion rings" implemented (google for it) there
would be no problem whatsoever...

Yeah. I've looked at

http://dev.perl.org/perl6/rfc/198.html

and this is exactly what I would like to have. In the same document there
is also

(?*{code})

which I would like to see too.

Werner

sln · Mar 25, 2009

Werner Lemberg said:
In the same document there is also

(?*{code})

which I would like to see too.

Note that (?{ code }) always passes; it has no real effect on the regex except
that you can tweak special variables, pos(), etc.

Hopefully, it won't come down to all that.

-sln
