Regular expression guaranteed to fail

Des Small · Aug 20, 2004

I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

Example code:

"""
import sets

def str2set(s): return sets.Set(s.split())

cor = str2set("N D T") # Coronal articulation
vel = str2set("K G") # Velar articulation
voi = str2set("N D G") # Voiced

def set2re(s):
    if s: return "|".join([e for e in s])
    else: return "(?!.*)"
"""

So we can get a regexp (string) that matches symbols corresponding to
coronal and voiced sounds:
"""=> 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
"""=> Set([])
"""
and this maps to the Regexp Which Matches Nothing:
"""=> '(?!.*)'
"""

This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.

Des
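
For readers on a current interpreter, here is a minimal sketch of the same idea in Python 3, where the built-in set replaces the old sets module. The use of sorted() and re.escape() is an assumption added here for stable output and safety with metacharacter symbols; neither is in the original post.

```python
import re

def str2set(s):
    """Split a whitespace-separated string of symbols into a set."""
    return set(s.split())

cor = str2set("N D T")  # coronal articulation
vel = str2set("K G")    # velar articulation
voi = str2set("N D G")  # voiced

def set2re(symbols):
    """Return a pattern string matching any one symbol in the set."""
    if symbols:
        # sorted() fixes the order of alternatives; re.escape() protects
        # symbols that happen to be regex metacharacters (both are additions).
        return "|".join(re.escape(sym) for sym in sorted(symbols))
    return "(?!.*)"  # the provisional never-matching pattern from the post

print(set2re(cor & voi))                       # coronal and voiced: D|N
print(cor & vel)                               # empty intersection: set()
print(set2re(cor & vel))                       # (?!.*)
print(re.search(set2re(cor & vel), "NDTKG"))   # None
```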

Eric Brunel · Aug 20, 2004

Des said:
I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

I also looked for a never-matching re just a few days ago and ended up with
"^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee
tad more readable (for a regular expression, I mean...): it's quite clear that
it requests a string start not followed by a string end and followed by a string
end, which is guaranteed to never happen. Yours is a bit harder to explain. Mine
may also be more efficient for very long strings, but I could be wrong here.

See what other people think...

Jeremy Bowers · Aug 20, 2004

The question is, what regular expression should correspond to the
empty set?

I would return compiled RE objects instead of strings, and in the empty
case, return a class you write that matches the interface of a compiled RE
but returns what you like. Something like:

class NeverMatch(object):
    def match(self, *args, **kwargs):
        return None

def set2re(s):
    if s: return re.compile("|".join([e for e in s]))
    else: return NeverMatch()
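
A slightly fuller sketch of this approach, assuming callers may also use search, fullmatch and findall on the returned object. Which methods to mirror is a guess at what the calling code needs; only match appears in the post above, and re.escape() and sorted() are likewise additions.

```python
import re

class NeverMatch:
    """Stand-in for a compiled pattern that never matches anything."""

    def match(self, string, *args, **kwargs):
        return None

    def search(self, string, *args, **kwargs):
        return None

    def fullmatch(self, string, *args, **kwargs):
        return None

    def findall(self, string, *args, **kwargs):
        return []

def set2re(symbols):
    """Compile a set of symbols, or return NeverMatch() for the empty set."""
    if symbols:
        return re.compile("|".join(re.escape(s) for s in sorted(symbols)))
    return NeverMatch()

print(set2re({"K", "G"}).search("xGx").group())  # G
print(set2re(set()).search("xGx"))               # None
```

Because no regex engine runs for the empty set, the cost does not depend on the length of the string being searched.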

Hallvard B Furuseth · Aug 22, 2004

Eric said:
I also looked for a never-matching re just a few days ago and ended up
with "^(?!$)$". It's certainly not more "standard" than yours, but I
find it a wee tad more readable (for a regular expression, I mean...):

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...

Greg Chapman · Aug 24, 2004

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...

Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).
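
The patterns proposed so far can be spot-checked with a short sketch like the one below. The sample strings and the helper name never_matches are assumptions chosen for illustration; a handful of samples is evidence, not a proof.

```python
import re

# Every never-matching pattern suggested in this thread so far.
CANDIDATES = [r"(?!.*)", r"^(?!$)$", r"\Zx", r"x\A", r"(?!)"]

# Includes the empty string, strings containing 'x', and newlines.
SAMPLES = ["", "x", "xx", "abc", "x\n", "\nx", "s" * 50]

def never_matches(pattern, samples=SAMPLES):
    """True if match, search and fullmatch all fail on every sample."""
    rx = re.compile(pattern)
    return all(
        rx.match(s) is None
        and rx.search(s) is None
        and rx.fullmatch(s) is None
        for s in samples
    )

for pattern in CANDIDATES:
    print(repr(pattern), never_matches(pattern))
```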

Des Small · Aug 24, 2004

Greg Chapman said:
Why not just "(?!)": this always fails immediately (since an empty
pattern matches any string, the negation of an empty pattern match
always fails).

I think we have a winner!

Des
thanks all the persons who contributed, of course.

Hallvard B Furuseth · Aug 24, 2004

Greg said:
Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).

It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
by a factor of 20 when searching 's'*10000.

Eric Brunel · Aug 24, 2004

Hallvard said:
It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
by a factor of 20 when searching 's'*10000.

And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...

Hallvard B Furuseth · Aug 24, 2004

Eric said:
And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...

Heh. You are right: That's almost as slow as the others. A bit
slower than \Zx and \Ax\A, but still faster than the other alternatives.
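
The timings discussed above can be rechecked with a sketch along these lines. The helper name time_search and the repeat count are assumptions, and figures from 2004 need not carry over to a current interpreter, so no expected numbers are given.

```python
import re
import timeit

PATTERNS = [r"(?!.*)", r"^(?!$)$", r"\Zx", r"x\A", r"(?!)", r"\A(?!)", r"\Ax\A"]

# The two subject strings mentioned in the thread.
SUBJECTS = {"'s'*10000": "s" * 10000, "'x'*10000": "x" * 10000}

def time_search(pattern, text, number=50):
    """Total seconds for `number` calls of search() on text."""
    rx = re.compile(pattern)
    return timeit.timeit(lambda: rx.search(text), number=number)

for label, text in SUBJECTS.items():
    for pattern in PATTERNS:
        seconds = time_search(pattern, text)
        print(f"{label:12} {pattern!r:12} {seconds:.5f}s")
```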
