Des Small
I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.
The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?
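For what it's worth, here is a quick sanity check of the pattern. It is only a minimal sketch using the standard `re` module, and the sample strings are made up for illustration. It suggests the pattern fails at every position, including on the empty string:

```python
import re

# The candidate "matches nothing" pattern: a negative look-ahead on ".*".
# Since ".*" can always match (even zero characters), the look-ahead
# always fails, so the pattern as a whole can never match.
NOTHING = "(?!.*)"

# Illustrative sample strings (assumptions, not real data).
samples = ["", "N", "K G", "anything at all\nwith a newline"]

# re.search tries every start position; None means no match anywhere.
results = [re.search(NOTHING, s) for s in samples]
print(results)
```

Every entry in `results` should be `None`.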
Example code:
"""
import sets
def str2set(s): return sets.Set(s.split())
cor = str2set("N D T") # Coronal articulation
vel = str2set("K G") # Velar articulation
voi = str2set("N D G") # Voiced
def set2re(s):
    if s: return "|".join([e for e in s])
    else: return "(?!.*)"
"""
So we can get a regexp (string) that matches symbols corresponding to
coronal and voiced sounds:
"""
set2re(cor & voi)
=> 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
"""
vel & cor
=> Set([])
"""
and this maps to the Regexp Which Matches Nothing:
"""
set2re(vel & cor)
=> '(?!.*)'
"""
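To show the generated patterns actually being applied to a string, here is a rough sketch of the same idea. It makes a few assumptions that are not in the code above: it uses the built-in `set` type instead of the `sets` module, sorts the symbols so the alternation order is predictable, and passes each symbol through `re.escape` in case a symbol is a regex metacharacter. The test string `word` is invented for illustration.

```python
import re

EMPTY = "(?!.*)"  # the provisional "matches nothing" pattern

def str2set(s):
    # Built-in set used here (an assumption; the original uses sets.Set).
    return set(s.split())

cor = str2set("N D T")  # Coronal articulation
vel = str2set("K G")    # Velar articulation
voi = str2set("N D G")  # Voiced

def set2re(s):
    # sorted() gives a deterministic alternation order;
    # re.escape() protects symbols that are regex metacharacters.
    if s:
        return "|".join(re.escape(e) for e in sorted(s))
    return EMPTY

coronal_voiced = re.compile(set2re(cor & voi))
velar_coronal = re.compile(set2re(vel & cor))

word = "K N D T G"  # made-up input string
found = coronal_voiced.findall(word)
none_found = velar_coronal.findall(word)
print(set2re(cor & voi), found, none_found)
```

With this input, `found` picks out the coronal voiced symbols and `none_found` stays empty, so code that consumes the regexps needs no special case for an empty intersection.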
This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.
Des