Get all strings matching given RegExp

Discussion in 'Python' started by Alex9968, Apr 3, 2008.

  1. Alex9968

    Alex9968 Guest

    Can I get sequence of all strings that can match a given regular
    expression? For example, for expression '(a|b)|(x|y)' it would be ['ax',
    'ay', 'bx', 'by']

    It would be useful for example to pass these strings to a search engine
    not supporting RegExp (therefore adding such support to it). A program
    can also let user specify sequence of strings using RegExp (filenames to
    process, etc.). If there are other types of expressions for these
    purposes, please let me know.

    I know that for some expressions there would be infinite amount of
    matching strings, but these aren't the cases I'm considering. It'd still
    be possible if string length is limited (there might be large but finite
    number of matching strings).

    Thanks
     
    Alex9968, Apr 3, 2008
    #1
    1. Advertising

  2. Alex9968 <> wrote:
    > Can I get sequence of all strings that can match a given regular
    > expression? For example, for expression '(a|b)|(x|y)' it would be ['ax',
    > 'ay', 'bx', 'by']
    >
    > It would be useful for example to pass these strings to a search engine
    > not supporting RegExp (therefore adding such support to it). A program
    > can also let user specify sequence of strings using RegExp (filenames to
    > process, etc.). If there are other types of expressions for these
    > purposes, please let me know.
    >
    > I know that for some expressions there would be infinite amount of
    > matching strings, but these aren't the cases I'm considering. It'd still
    > be possible if string length is limited (there might be large but finite
    > number of matching strings).


    This will give you all (byte-)strings upto a given length which match a
    given regular expression. But beware, it can be slow ;)

    import re

    all_chars = [chr(i) for i in xrange(256)]

    def gen_strings(length, alphabet=all_chars):
    if length == 1:
    for c in alphabet:
    yield c
    else:
    for i in alphabet:
    yield c
    for s in gen_strings(length - 1, alphabet):
    yield c + s

    def regex_matches(regex, max_length, alphabet=all_chars):
    r = re.compile('(' + regex + r')\Z')
    return (s for s in gen_strings(max_length, alphabet) if r.match(s))

    Marc
     
    Marc Christiansen, Apr 3, 2008
    #2
    1. Advertising

  3. Alex9968

    Jeff Guest

    I don't think there is any built in way. Regular expressions are
    compiled into an expanded pattern internally, but I don't think that
    it is anything that would be useful for you to directly access.

    If you are interested in a lot of work, you could do something with
    PLY and write an re parser that would expand it into a series of
    possible textual matches :)
     
    Jeff, Apr 3, 2008
    #3
  4. Alex9968

    Paul McGuire Guest

    On Apr 3, 4:50 am, Alex9968 <> wrote:
    > Can I get sequence of all strings that can match a given regular
    > expression? For example, for expression '(a|b)|(x|y)' it would be ['ax',
    > 'ay', 'bx', 'by']
    >


    Actually, this regex will only match 'a', 'b', 'x', or 'y' (assuming
    you meant to bracket it with leading '^' and trailing '$'). To get
    the set you listed, you would need to invert '(a|b)(x|y)'.

    The thread that Gabriel Genellina referenced includes a link to a
    pyparsing regex inverter (http://pyparsing-public.wikispaces.com/space/
    showimage/invRegex.py), which I used to test your regex - I've added
    both regexes in the test cases of that script. Some of the other test
    cases include regexes for all US time zones, and all chemical symbols.

    -- Paul
     
    Paul McGuire, Apr 3, 2008
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. chiara
    Replies:
    6
    Views:
    478
    Barry Schwarz
    Oct 6, 2005
  2. anonym
    Replies:
    1
    Views:
    1,023
    Knute Johnson
    Jan 15, 2009
  3. Karin Lagesen

    matching strings in a large set of strings

    Karin Lagesen, Apr 29, 2010, in forum: Python
    Replies:
    13
    Views:
    460
    Bryan
    May 3, 2010
  4. Helmut Jarausch
    Replies:
    3
    Views:
    327
    Dave Angel
    Apr 30, 2010
  5. Joao Silva
    Replies:
    16
    Views:
    366
    7stud --
    Aug 21, 2009
Loading...

Share This Page