Regular expression guaranteed to fail

D

Des Small

I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

Example code:

"""
import sets

def str2set(s): return sets.Set(s.split())

cor = str2set("N D T") # Coronal articulation
vel = str2set("K G") # Velar articulation
voi = str2set("N D G") # Voiced

def set2re(s):
if s: return "|".join([e for e in s])
else: return "(?!.*)"
"""

So we can get a regexp (string) that matches symbols corresponding to
velar and voiced sounds:
"""=> 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
"""=> Set([])
"""
and this maps to the Regexp Which Matches Nothing:
"""=> '(?!.*)'
"""

This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.

Des
 
E

Eric Brunel

Des said:
I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

I also looked for a never-matching re just a few days ago and ended up with
"^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee
tad more readable (for a regular expression, I mean...): it's quite clear that
it requests a string start not followed by a string end and followed by a string
end, which is guaranteed to never happen. Yours is a bit harder to explain. Mine
may also be more efficient for very long strings, but I can be wrong here.

See what other people think...
 
J

Jeremy Bowers

The question is, what regular expression should correspond to the
empty set?

I would return compiled RE objects instead of strings, and in the empty
case, return a class you write that matches the interface of a compiled RE
but returns what you like. Something like:

def NeverMatch(object):
def match(*args, **kwargs):
return None

def set2re(s):
if s: return re.compile("|".join([e for e in s]))
else: return NeverMatch()
 
H

Hallvard B Furuseth

Eric said:
I also looked for a never-matching re just a few days ago and ended up
with "^(?!$)$". It's certainly not more "standard" than yours, but I
find it a wee tad more readable (for a regular expression, I mean...):

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...
 
G

Greg Chapman

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...

Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).
 
D

Des Small

Greg Chapman said:
Why not just "(?!)": this always fails immediately (since an empty
pattern matches any string, the negation of an empty pattern match
always fails).

I think we have a winner!

Des
thanks all the persons who contributed, of course.
 
H

Hallvard B Furuseth

Greg said:
Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).

It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.
 
E

Eric Brunel

Hallvard said:
It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.

And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...
 
H

Hallvard B Furuseth

Eric said:
And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...

Heh. You are right: That's about almost as slow as the others. A bit
slower than \Zx and \Ax\A, but still faster than the other alternatives.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top