Construction of a non-regexable subset of the set of all strings

  • Thread starter Dominic van der Zypen
  • Start date
D

Dominic van der Zypen

Hello,

I have a question which I am afraid might seem "overly scientific" (not
to say nutty), but which I find is quite natural after all.

If we look at the set S of all finite strings, then each regex
represents some subset of S. For instance, h(a|e)ndel is the set
consisting of handel and hendel; and the regex .+ represents the set
of all non-empty strings not containing \n.

I wondered whether there is a "non-regexable" subset N of S (i.e. a
subset N for which there is no regex representing N) and came up with
a (non-constructive) positive answer from set theory: there are
uncountably many subsets of S, but there are only countably many
regular expressions, so there are non-regexable subsets.

What I would like to know is whether one can construct a subset N of S
which is non-regexable (and whether it is hard to prove that N has the
desired property).

Many thanks in advance and best wishes, Dominic
 
J

John W. Krahn

Dominic said:
I have a question which I am afraid might seem "overly scientific" (not
to say nutty), but which I find is quite natural after all.

If we look at the set S of all finite strings, then each regex
represents some subset of S. For instance, h(a|e)ndel is the set
consisting of handel and hendel; and the regex .+ represents the set
of all non-empty strings not containing \n.

I wondered whether there is a "non-regexable" subset N of S (i.e. a
subset N for which there is no regex representing N) and came up with
a (non-constructive) positive answer from set theory: there are
uncountably many subsets of S, but there are only countably many
regular expressions, so there are non-regexable subsets.

What I would like to know is whether one can construct a subset N of S
which is non-regexable (and whether it is hard to prove that N has the
desired property).

Are you talking about regular expressions in general or only Perl's regular
expressions? If this is a generic regular expression question you may want to
ask in a group like comp.programming instead.


John
 
D

David Squire

Glenn said:
It seems to me that, for every string x in S, one can construct at least
one regular expression to match it, i.e. ^x$

Yes, but the set of all subsets of S is not the same as the set of
single strings in S (i.e. S).

DS
 
T

Ted Zlatanov

I wondered whether there is a "non-regexable" subset N of S (i.e. a
subset N for which there is no regex representing N) and came up with
a (non-constructive) positive answer from set theory: there are
uncountably many subsets of S, but there are only countably many
regular expressions, so there are non-regexable subsets.
What I would like to know is whether one can construct a subset N of S
which is non-regexable (and whether it is hard to prove that N has the
desired property).

(S: set of all finite strings)

I assume by "regexable" you mean "uniquely regexable", meaning that
the regular expression would not match S-N.

Any finite subset N of S would be regexable, since you can construct a
regular expression that does an OR of all the finite strings in N.

Some infinite subsets (e.g. all the strings that begin with "a") could
be regexable, but generally you can't uniquely match a random infinite
collection of strings with a regular expression, since there's nothing
for the regular expression to "latch" onto. I'm sure there's a more
rigorous analysis, perhaps using the Huffman distance, but I would say
that if a set is infinite and constructed without a specific set of
rules, it's not possible in finite time to *build* a regular
expression to match only that set.

Ted
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top