regex question: extended [^...] concept?

Werner Lemberg

Folks,


consider this input:

foo ... foo ... bar

where `...' doesn't contain the word `foo'. How can I write a regular
expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
it a single character as in

f ... f ... bar

I could write

/f[^f]*bar/

but how can I do something similar for a word? In other words, I am looking
for an extension of the [^.] concept which covers a sequence of characters.
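For reference, a minimal Perl sketch of the single-character case described above, using the test string from the question: because [^f]* cannot step over another `f', the attempt starting at the first `f' fails and the engine retries at the second one.

```perl
use strict;
use warnings;

# Single-character case: [^f]* refuses to cross another 'f', so the match
# attempt at the first 'f' fails and the engine retries at the second one.
my $single = 'f ... f ... bar';
if ($single =~ /(f[^f]*bar)/) {
    print "$1\n";    # prints "f ... bar"
}
```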

I've looked into both the `perlre' and `perlretut' manual pages (of perl
5.10.0), but found nothing relevant to this problem.


Werner
 
smallpond

print "OK" if ($source =~ /foo.*bar/ and $source !~ /foo.*foo.*bar/);
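The two-test idea above as a small runnable sketch (the helper name ok_two_pass is made up for illustration). Note that it only gives a yes/no answer for the whole string; it does not extract the `foo ... bar' part.

```perl
use strict;
use warnings;

# Accept the string if "foo...bar" occurs at all, but reject it if a
# second "foo" appears before the "bar".
# ok_two_pass is a hypothetical helper name, not from the original post.
sub ok_two_pass {
    my ($source) = @_;
    return $source =~ /foo.*bar/ && $source !~ /foo.*foo.*bar/;
}

for my $s ('foo ... bar', 'foo ... foo ... bar') {
    printf "%-20s %s\n", $s, ok_two_pass($s) ? 'OK' : 'rejected';
}
```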
 
Willem

Werner Lemberg wrote:
)
) Folks,
)
)
) consider this input:
)
) foo ... foo ... bar
)
) where `...' doesn't contain the word `foo'. How can I write a regular
) expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
) it a single character as in
)
) f ... f ... bar
)
) I could write
)
) /f[^f]*bar/
)
) but how can I do something similar for a word? In other words, I search an
) extension of the [^.] concept which covers a sequence of characters.

That's quite difficult and complicated to do in a single regexp.
You basically have to cover all cases.

This might work, but I'm not sure I got all cases right:

/foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

You can see that using two regexes one after the other (as mentioned
crossthread) is a lot easier.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Werner Lemberg

Willem said:
) That's quite difficult and complicated to do in a single regexp.
) You basically have to cover all cases.
) This might work, but I'm not sure I got all cases right:
) /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/
)
) You can see that using two regexes one after the other (as mentioned
) crossthread) is a lot easier.

Thanks for the answers. I'm really surprised that there are so many regex
extensions in Perl but not a single one which covers this. Is this
difficult to handle in a regex machine, or is there no need normally for
it?

Especially in combination with the (?PARNO) stuff (as described in the
perlre man page) this could be quite handy for recursively parsing nested
expressions.

I also wonder why there is no callback mechanism within regular
expressions. The (?{ code }) construct allows execution of arbitrary Perl
code, but within the regex it always evaluates to true. I would like to have
a similar construct, say, (!{ code }), which evaluates to true or false
depending on `code'. Then I could implement my above request myself,
simply by checking whether the matched subgroup contains the given string.


Werner
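perl 5.10 can already get fairly close to the wished-for (!{ code }): perlre allows a (?{ code }) block as the condition of a (?(condition)yes-pattern) switch, and the (*FAIL) verb introduced in 5.10 makes that branch fail. A minimal sketch, under the assumption that the excluded word is the fixed string `foo'; $^N holds the text of the most recently closed group, here the candidate gap between `foo' and `bar'.

```perl
use strict;
use warnings;

# Conditional whose condition is a code block: if the gap captured so far
# contains "foo", the yes-branch (*FAIL) forces the engine to backtrack.
my $re = qr/
    ( foo
      (.*?)                                      # candidate gap
      (?(?{ index($^N, 'foo') >= 0 })(*FAIL))    # reject a gap containing "foo"
      bar )
/x;

for my $s ('foo ... foo ... bar', 'foo ... bar') {
    print "$s => $1\n" if $s =~ $re;
}
```

The code block is re-run for every length the non-greedy group tries, so this is quadratic on long inputs; it trades speed for the freedom to use an arbitrary Perl test.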
 
Jürgen Exner

Werner Lemberg said:
) consider this input:
)
) foo ... foo ... bar
)
) where `...' doesn't contain the word `foo'. How can I write a regular
) expression which matches `foo ... bar' but not `foo ... foo ... bar'?

reverse() the text and match non-greedy /^rab...oof/, then reverse() the
found match again.

jue
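A sketch of the reverse() idea, assuming, like the /^rab...oof/ pattern, that the text ends in `bar'; last_foo_bar is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Reversing the text turns "the nearest foo before bar" into "the nearest
# oof after rab", which a non-greedy .*? finds directly.
# last_foo_bar is a hypothetical helper; it assumes the text ends with "bar".
sub last_foo_bar {
    my ($text) = @_;
    my $reversed = reverse $text;            # scalar context: reverse the string
    return unless $reversed =~ /^(rab.*?oof)/s;
    return scalar reverse $1;                # undo the reversal on the match
}

my $found = last_foo_bar('foo ... foo ... bar');
print defined $found ? "$found\n" : "no match\n";
```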
 
smallpond

Willem wrote:
) /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

$s="fooafoosdbar";
print "OK" if ($s =~ /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/);

OK

It won't work unless you prevent backtracking, I think. The initial
foo in your pattern can match the second one in the string.
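The effect described above can be made visible with @-: $-[0] is the offset where the successful match begins. A small sketch using the same test string; the \A-anchored variant is only meaningful under the assumption that the input must start with `foo'.

```perl
use strict;
use warnings;

# Willem's pattern and the test string from the post above.
my $re = qr/foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/;
my $s  = 'fooafoosdbar';

if ($s =~ $re) {
    # $& is the matched text; $-[0] is the offset where the match started.
    print "matched '$&' at offset $-[0]\n";   # the second "foo" is at offset 4
}

# Assuming the input must start with "foo", anchoring with \A removes the
# retry at later start positions, and this string is rejected.
my $anchored = $s =~ /\A$re/ ? 'match' : 'no match';
print "anchored: $anchored\n";
```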
 
Werner Lemberg

Jürgen Exner said:
) reverse() the text and match non-greedy /^rab...oof/, then reverse() the
) found match again.

Thank you. While this is a solution for my concrete example, it
unfortunately doesn't help if you want to generalize the [^.] concept.


Werner
 
Werner Lemberg

) /foo (?: (?!foo) . )* bar/x

Aah. I had already thought of a negative lookahead, but I hadn't had the
idea of using `(.)*' to provide a `moving anchor' for it. Thanks for the
idea.

) If a suffix of "foo" matches a prefix of "bar" you may end up with false
) negatives, depending on what you wanted. That is,
) "foo...fobar" =~ /foo (?: (?!fob) . )* bar/x
) is false, even though the "fob" you don't want to match is part of the
) "bar" you do. It is possible to correct this with yet another negative
) lookahead:
) /foo (?: (?! fob (?!ar) ) . )* bar/x

Uuh, a negative lookahead *within* another negative lookahead. How exactly
is this defined? Is it equivalent to

(?! fob ) (?! ar )

?


Werner
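A small sketch checking the constructs in this post. On the question itself: the two forms are not equivalent. (?! fob (?!ar) ) fails only where a `fob' starts that is not followed by `ar', whereas (?! fob ) (?! ar ) fails wherever either `fob' or `ar' starts. The test strings are made up for illustration.

```perl
use strict;
use warnings;

# The "moving anchor": (?!foo) is checked before every character the group
# consumes, so no "foo" can begin anywhere between the outer foo and bar.
my $tempered = qr/foo (?: (?!foo) . )* bar/x;
my ($found)  = 'foo ... foo ... bar' =~ /($tempered)/;
print "tempered: $found\n";    # foo ... bar

# The two lookahead variants, with "fob" as the word to exclude
# (its last letter "b" overlaps the first letter of "bar").
my $nested   = qr/foo (?: (?! fob (?!ar) ) . )* bar/x;   # excludes "fob" unless "ar" follows
my $separate = qr/foo (?: (?! fob ) (?! ar ) . )* bar/x; # excludes "fob" and any "ar"

for my $str ('foo...fobar', 'foo...fob...bar', 'foo xar bar') {
    printf "%-16s nested=%s separate=%s\n", $str,
        ($str =~ $nested   ? 'match' : 'no match'),
        ($str =~ $separate ? 'match' : 'no match');
}
```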
 
Ilya Zakharevich

) foo ... foo ... bar
)
) where `...' doesn't contain the word `foo'. How can I write a regular
) expression which matches `foo ... bar' but not `foo ... foo ... bar'?

This does not make sense, since `foo ... bar' is a substring of
`foo ... foo ... bar'.

I assume you want to allow the REX to match this substring, but not
the larger string. Then the simplest solution would be to fast-track to
the LATEST occurrence of foo which is followed by bar:

/^ (?> .* (?=foo .* bar) ) (foo .* bar) /x; # add \b where needed

or just (depending on the needs)

/ ^ .* (foo .* bar) /x;

If you want to disallow ANY match which contains foo foo bar, then it
may be as simple as

/ ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x
or
/ ^ (?! (?> (?> .*? foo) .*? foo) .* bar) .*? ( foo .* bar )/x

However, the problem becomes much trickier if you prohibit using ^...

Hope this helps,
Ilya

P.S. Of course, with "onion rings" implemented (google for it) there
would be no problem whatsoever...
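A sketch running two of the patterns above on the example strings (with \b omitted, as in the originals).

```perl
use strict;
use warnings;

# "Fast-track" version: the atomic group (?> ...) lets .* back off only until
# the lookahead sees the LAST "foo" still followed by "bar", then commits.
my $latest = qr/^ (?> .* (?=foo .* bar) ) (foo .* bar) /x;

# Stricter version: refuse any string in which two "foo"s precede a "bar".
my $strict = qr/ ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x;

for my $s ('foo ... foo ... bar', 'foo ... bar') {
    my ($latest_m) = $s =~ $latest;
    my ($strict_m) = $s =~ $strict;
    printf "%-20s latest=%s strict=%s\n", $s,
        $latest_m // '(none)', $strict_m // '(none)';
}
```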
 
sln

I've always thought these are good ways.

-sln

---------------------------------------
use strict;
use warnings;

if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) * bar)/x )
{
print "$1\n";
}
## or

if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) *? bar)/x )
{
print "$1\n";
}

__END__

foo ... bar ... bar
foo ... bar
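One difference between this form and the (?: (?!foo) . )* form discussed elsewhere in the thread: here the lookahead is tested after each character is consumed, so the position right after the opening `foo' is never checked. A sketch with a made-up input where that matters:

```perl
use strict;
use warnings;

# Lookahead after the dot (as above) versus before it (the "tempered" form).
my $lookahead_after  = qr/(foo (?: . (?! foo) )*? bar)/x;
my $lookahead_before = qr/(foo (?: (?! foo) . )*? bar)/x;

my $s = 'foofoo bar';
my ($after)  = $s =~ $lookahead_after;    # the inner "foo" slips through
my ($before) = $s =~ $lookahead_before;   # match starts at the second "foo"
print "after:  $after\n";
print "before: $before\n";
```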
 
Eric Pozharski

On 2009-03-23 said:
) Thanks for the answers. I'm really surprised that there are so many regex
) extensions in Perl but not a single one which covers this. Is this
) difficult to handle in a regex machine, or is there no need normally for
) it?

Watch what you say. Those aren't 'regex extensions in Perl'. Those are
Perl regexes (or 'perlre', for short).

*CUT*
 
Eric Pozharski

On 2009-03-24 said:
) as being something like
)
) (?{ if (/pattern/) { fail } })
)
) where 'fail' is a hypothetical builtin that causes the surrounding match
) to fail. (Of course $_ would have to have the appropriate value, as
) well, which it doesn't.) OTOH, it may not... :)

I once considered the approach of intentionally failing a match within
perlre itself (preprocessing wasn't an option).

I came up with something like this:

perl -Mstrict -wle '
my $x = qr[(??{ substr($`, -1, 1) eq q|f| ? qr/(?<=f)/ : qr/(?!.|$)/
})];
foreach ( q||, qw| x f fx xf | ) {
print m{$x};
print qq|<$&>|;
print qq|$_\n|; };
print q|FIN|'

Use of uninitialized value $& in concatenation (.) or string at -e line 5.
<>



Use of uninitialized value $& in concatenation (.) or string at -e line 5.
<>
x

1
<>
f

1
<>
fx

1
<>
xf

FIN

But isn't that way too heavy (and I'm not even talking about C<$`>)?

*CUT*
 
sln

) Thanks a lot to all the posters who presented further ideas and comments.
)
) Yeah. I've looked at
)
) http://dev.perl.org/perl6/rfc/198.html
)
) and this is exactly what I would like to have. In the same document there
) is also
)
) (?*{code})
)
) which I would like to see too.
)
) Werner

Note that (?{ code }) always passes; it has no real effect on the regex,
except that you can tweak special variables, pos(), etc.

Hopefully, it won't come down to all that.

-sln
 
