Construction of a non-regexable subset of the set of all strings

Discussion in 'Perl Misc' started by Dominic van der Zypen, Jun 23, 2006.

  1. Hello,

    I have a question which I am afraid might seem "overly scientific" (not
    to say nutty), but which I find is quite natural after all.

    If we look at the set S of all finite strings, then each regex
    represents some subset of S. For instance, h(a|e)ndel is the set
    consisting of handel and hendel; and the regex .+ represents the set
    of all non-empty strings not containing \n.

    I wondered whether there is a "non-regexable" subset N of S (i.e. a
    subset N for which there is no regex representing N) and came up with
    a (non-constructive) positive answer from set theory: there are
    uncountably many subsets of S, but there are only countably many
    regular expressions, so there are non-regexable subsets.

    What I would like to know is whether one can construct a subset N of S
    which is non-regexable (and whether it is hard to prove that N has the
    desired property).

    Many thanks in advance and best wishes, Dominic
     
    Dominic van der Zypen, Jun 23, 2006
    #1
    1. Advertising

  2. Dominic van der Zypen wrote:
    >
    > I have a question which I am afraid might seem "overly scientific" (not
    > to say nutty), but which I find is quite natural after all.
    >
    > If we look at the set S of all finite strings, then each regex
    > represents some subset of S. For instance, h(a|e)ndel is the set
    > consisting of handel and hendel; and the regex .+ represents the set
    > of all non-empty strings not containing \n.
    >
    > I wondered whether there is a "non-regexable" subset N of S (i.e. a
    > subset N for which there is no regex representing N) and came up with
    > a (non-constructive) positive answer from set theory: there are
    > uncountably many subsets of S, but there are only countably many
    > regular expressions, so there are non-regexable subsets.
    >
    > What I would like to know is whether one can construct a subset N of S
    > which is non-regexable (and whether it is hard to prove that N has the
    > desired property).


    Are you talking about regular expressions in general or only Perl's regular
    expressions? If this is a generic regular expression question you may want to
    ask in a group like comp.programming instead.


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jun 23, 2006
    #2
    1. Advertising

  3. Dominic van der Zypen

    David Squire Guest

    Glenn Jackman wrote:
    > At 2006-06-23 08:45AM, Dominic van der Zypen <> wrote:
    >> Hello,
    >>
    >> I have a question which I am afraid might seem "overly scientific" (not
    >> to say nutty), but which I find is quite natural after all.
    >>
    >> If we look at the set S of all finite strings, then each regex
    >> represents some subset of S. For instance, h(a|e)ndel is the set
    >> consisting of handel and hendel; and the regex .+ represents the set
    >> of all non-empty strings not containing \n.
    >>
    >> I wondered whether there is a "non-regexable" subset N of S (i.e. a
    >> subset N for which there is no regex representing N) and came up with
    >> a (non-constructive) positive answer from set theory: there are
    >> uncountably many subsets of S, but there are only countably many
    >> regular expressions, so there are non-regexable subsets.
    >>
    >> What I would like to know is whether one can construct a subset N of S
    >> which is non-regexable (and whether it is hard to prove that N has the
    >> desired property).

    >
    > It seems to me that, for every string x in S, one can construct at least
    > one regular expression to match it, i.e. ^x$
    >


    Yes, but the set of all subsets of S is not the same as the set of
    single strings in S (i.e. S).

    DS
     
    David Squire, Jun 23, 2006
    #3
  4. Dominic van der Zypen

    Ted Zlatanov Guest

    On 23 Jun 2006, wrote:

    > I wondered whether there is a "non-regexable" subset N of S (i.e. a
    > subset N for which there is no regex representing N) and came up with
    > a (non-constructive) positive answer from set theory: there are
    > uncountably many subsets of S, but there are only countably many
    > regular expressions, so there are non-regexable subsets.


    > What I would like to know is whether one can construct a subset N of S
    > which is non-regexable (and whether it is hard to prove that N has the
    > desired property).


    (S: set of all finite strings)

    I assume by "regexable" you mean "uniquely regexable", meaning that
    the regular expression would not match S-N.

    Any finite subset N of S would be regexable, since you can construct a
    regular expression that does an OR of all the finite strings in N.

    Some infinite subsets (e.g. all the strings that begin with "a") could
    be regexable, but generally you can't uniquely match a random infinite
    collection of strings with a regular expression, since there's nothing
    for the regular expression to "latch" onto. I'm sure there's a more
    rigorous analysis, perhaps using the Huffman distance, but I would say
    that if a set is infinite and constructed without a specific set of
    rules, it's not possible in finite time to *build* a regular
    expression to match only that set.

    Ted
     
    Ted Zlatanov, Jun 23, 2006
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    15
    Views:
    583
    Dirk Thierbach
    Mar 19, 2005
  2. Ben

    Strings, Strings and Damned Strings

    Ben, Jun 22, 2006, in forum: C Programming
    Replies:
    14
    Views:
    767
    Malcolm
    Jun 24, 2006
  3. Ook
    Replies:
    10
    Views:
    559
  4. 2Barter08

    A Non-profit in Construction is here...

    2Barter08, Apr 19, 2008, in forum: Javascript
    Replies:
    0
    Views:
    78
    2Barter08
    Apr 19, 2008
  5. Replies:
    11
    Views:
    338
    J. Clarke
    Feb 17, 2014
Loading...

Share This Page