Regular expression guaranteed to fail

Discussion in 'Python' started by Des Small, Aug 20, 2004.

  1. Des Small

    Des Small Guest

    I want to use sets and regular expressions to implement some
    linguistic ideas. Representing sounds by symbols, and properties
    (coronal or velar articulation; voicedness) by sets of symbols with
    those properties, it is natural to then map these sets, and
    intersections of them, to regular expressions to apply to strings.

    The question is, what regular expression should correspond to the
    empty set? I've provisionally gone with "(?!.*)", i.e., the negation
    of a look-ahead which matches anything. Is there an established idiom
    for this, and is that it? And if there isn't, does this seem
    reasonable?

    Example code:

    """
    import sets

    def str2set(s): return sets.Set(s.split())

    cor = str2set("N D T") # Coronal articulation
    vel = str2set("K G") # Velar articulation
    voi = str2set("N D G") # Voiced

    def set2re(s):
    if s: return "|".join([e for e in s])
    else: return "(?!.*)"
    """

    So we can get a regexp (string) that matches symbols corresponding to
    velar and voiced sounds:
    """
    >>> set2re(cor & voi)

    => 'D|N'
    """
    But nothing can be (in this model at least) velar and coronal:
    """
    >>> cor & vel

    => Set([])
    """
    and this maps to the Regexp Which Matches Nothing:
    """
    >>> set2re(cor & vel)

    => '(?!.*)'
    """

    This seems quite elegant to me, but there is such a fine line between
    elegance and utter weirdness and I'd be glad to know which side other
    persons think I'm on here.

    Des
    --
    "[T]he structural trend in linguistics which took root with the
    International Congresses of the twenties and early thirties [...] had
    close and effective connections with phenomenology in its Husserlian
    and Hegelian versions." -- Roman Jakobson
     
    Des Small, Aug 20, 2004
    #1
    1. Advertising

  2. Des Small

    Eric Brunel Guest

    Des Small wrote:
    > I want to use sets and regular expressions to implement some
    > linguistic ideas. Representing sounds by symbols, and properties
    > (coronal or velar articulation; voicedness) by sets of symbols with
    > those properties, it is natural to then map these sets, and
    > intersections of them, to regular expressions to apply to strings.
    >
    > The question is, what regular expression should correspond to the
    > empty set? I've provisionally gone with "(?!.*)", i.e., the negation
    > of a look-ahead which matches anything. Is there an established idiom
    > for this, and is that it? And if there isn't, does this seem
    > reasonable?


    I also looked for a never-matching re just a few days ago and ended up with
    "^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee
    tad more readable (for a regular expression, I mean...): it's quite clear that
    it requests a string start not followed by a string end and followed by a string
    end, which is guaranteed to never happen. Yours is a bit harder to explain. Mine
    may also be more efficient for very long strings, but I can be wrong here.

    See what other people think...
    --
    - Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
    PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com
     
    Eric Brunel, Aug 20, 2004
    #2
    1. Advertising

  3. On Fri, 20 Aug 2004 10:35:18 +0000, Des Small wrote:
    > The question is, what regular expression should correspond to the
    > empty set?


    I would return compiled RE objects instead of strings, and in the empty
    case, return a class you write that matches the interface of a compiled RE
    but returns what you like. Something like:

    def NeverMatch(object):
    def match(*args, **kwargs):
    return None

    def set2re(s):
    if s: return re.compile("|".join([e for e in s]))
    else: return NeverMatch()
     
    Jeremy Bowers, Aug 20, 2004
    #3
  4. Eric Brunel wrote:

    > I also looked for a never-matching re just a few days ago and ended up
    > with "^(?!$)$". It's certainly not more "standard" than yours, but I
    > find it a wee tad more readable (for a regular expression, I mean...):


    I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
    latter, but perhaps that causes Python to locate every 'x' in the string
    and then check if the string starts at the next character...

    --
    Hallvard
     
    Hallvard B Furuseth, Aug 22, 2004
    #4
  5. Des Small

    Greg Chapman Guest

    On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <>
    wrote:

    >Eric Brunel wrote:
    >
    >> I also looked for a never-matching re just a few days ago and ended up
    >> with "^(?!$)$". It's certainly not more "standard" than yours, but I
    >> find it a wee tad more readable (for a regular expression, I mean...):

    >
    >I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
    >latter, but perhaps that causes Python to locate every 'x' in the string
    >and then check if the string starts at the next character...


    Why not just "(?!)": this always fails immediately (since an empty pattern
    matches any string, the negation of an empty pattern match always fails).

    ---
    Greg Chapman
     
    Greg Chapman, Aug 24, 2004
    #5
  6. Des Small

    Des Small Guest

    Greg Chapman <> writes:

    > On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <>
    > wrote:
    >
    > >Eric Brunel wrote:
    > >
    > >> I also looked for a never-matching re just a few days ago and
    > >> ended up with "^(?!$)$". It's certainly not more "standard" than
    > >> yours, but I find it a wee tad more readable (for a regular
    > >> expression, I mean...):

    > >
    > >I think e.g. r'\Zx' and r'x\A' are more readable. In particular
    > >the latter, but perhaps that causes Python to locate every 'x' in
    > >the string and then check if the string starts at the next
    > >character...

    >
    > Why not just "(?!)": this always fails immediately (since an empty
    > pattern matches any string, the negation of an empty pattern match
    > always fails).


    I think we have a winner!

    Des
    thanks all the persons who contributed, of course.
    --
    "[T]he structural trend in linguistics which took root with the
    International Congresses of the twenties and early thirties [...] had
    close and effective connections with phenomenology in its Husserlian
    and Hegelian versions." -- Roman Jakobson
     
    Des Small, Aug 24, 2004
    #6
  7. Greg Chapman wrote:
    > Why not just "(?!)": this always fails immediately (since an empty pattern
    > matches any string, the negation of an empty pattern match always fails).


    It's fine for re.match.

    'Why not?': Because I'd expect re.search to walk through the entire
    string and check if each position in the string matches that regexp.
    Unfortunately, a little timing shows that that happens with _every_
    regexp suggested so far. Long strings take longer for each of them.
    (Except Jeremy's solution, of course, which avoids the whole problem.)
    r'\A(?!)' or r'\Ax\A' didn't work either.

    Anyway, I note that r'x\A' beats all the other regexps suggested so far
    with a factor of 20 when searching 's'*10000.

    --
    Hallvard
     
    Hallvard B Furuseth, Aug 24, 2004
    #7
  8. Des Small

    Eric Brunel Guest

    Hallvard B Furuseth wrote:
    > Greg Chapman wrote:
    >
    >>Why not just "(?!)": this always fails immediately (since an empty pattern
    >>matches any string, the negation of an empty pattern match always fails).

    >
    >
    > It's fine for re.match.
    >
    > 'Why not?': Because I'd expect re.search to walk through the entire
    > string and check if each position in the string matches that regexp.
    > Unfortunately, a little timing shows that that happens with _every_
    > regexp suggested so far. Long strings take longer for each of them.
    > (Except Jeremy's solution, of course, which avoids the whole problem.)
    > r'\A(?!)' or r'\Ax\A' didn't work either.
    >
    > Anyway, I note that r'x\A' beats all the other regexps suggested so far
    > with a factor of 20 when searching 's'*10000.


    And when searching 'x'*10000? Since there is an 'x' in the re, it may change
    things a lot...
    --
    - Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
    PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com
     
    Eric Brunel, Aug 24, 2004
    #8
  9. Eric Brunel wrote:
    >Hallvard B Furuseth wrote:
    >> Anyway, I note that r'x\A' beats all the other regexps suggested so far
    >> with a factor of 20 when searching 's'*10000.

    >
    > And when searching 'x'*10000? Since there is an 'x' in the re, it may change
    > things a lot...


    Heh. You are right: That's about almost as slow as the others. A bit
    slower than \Zx and \Ax\A, but still faster than the other alternatives.

    --
    Hallvard
     
    Hallvard B Furuseth, Aug 24, 2004
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VSK
    Replies:
    2
    Views:
    2,334
  2. =?iso-8859-1?B?bW9vcJk=?=

    Matching abitrary expression in a regular expression

    =?iso-8859-1?B?bW9vcJk=?=, Dec 1, 2005, in forum: Java
    Replies:
    8
    Views:
    862
    Alan Moore
    Dec 2, 2005
  3. Wenjie

    if (f() != FAIL) or if (FAIL != f())?

    Wenjie, Jul 28, 2003, in forum: C Programming
    Replies:
    3
    Views:
    465
    E. Robert Tisdale
    Jul 31, 2003
  4. joes
    Replies:
    2
    Views:
    1,033
    Daniel Pitts
    May 25, 2007
  5. Replies:
    12
    Views:
    634
Loading...

Share This Page