Regular expressions

Discussion in 'Python' started by Kill Bill, Oct 25, 2003.

  1. Kill Bill

    Kill Bill Guest

    I'm trying to find all combinations of a string.
    I found that [blah]* gives it to me, but it uses the same letter multiple
    times, which I don't want.
    Kill Bill, Oct 25, 2003
    #1

  2. Kill Bill

    Kill Bill Guest

    Also, if yipee is a variable, what happens when I do?

    re.search("[yipee]*", line):

    This will treat yipee as a string correct? I want it to treat yipee as a
    variable and perform the different combinations on whatever is inside the
    variable.

    Kill Bill, Oct 26, 2003
    #2

  3. Kill Bill

    William Park Guest

    Kill Bill <> wrote:
    > Also, if yipee is a variable, what happens when I do?
    >
    > re.search("[yipee]*", line):
    >
    > This will treat yipee as a string correct? I want it to treat yipee as a
    > variable and perform the different combinations on whatever is inside the
    > variable.


    Use round brackets (i.e. parentheses) instead of square ones, i.e.
    (yipee)*




    --
    William Park, Open Geometry Consulting, <>
    Linux solution for data management and processing.
    William Park, Oct 26, 2003
    #3
  4. > re.search("[yipee]*", line):
    >
    > This will treat yipee as a string correct? I want it to treat yipee as a
    > variable and perform the different combinations on whatever is inside the
    > variable.

    You don't want square brackets; you want parentheses. But this is not the
    only problem. Simply replacing the square brackets with parentheses does
    not treat yipee as a variable. The

    "%s" % variable

    idiom will do this.

    Also, it is not clear why the sample line ends with a colon. I took it off.

    The "*" will also match zero occurrences, and I suspect you want to match
    one or more, so I replaced "*" with "+".

    I think you want:

    re.search("(%s)+" % yipee, line)
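To make the difference between the two forms concrete, here is a small sketch in Python 3. The values of yipee and line are made up for illustration and do not come from the original question.

```python
import re

yipee = "abc"          # hypothetical contents of the variable
line = "xxabcabcxx"    # hypothetical input line

# "[yipee]*" is a literal pattern: a character class holding y, i, p and e.
# Because of "*", it happily succeeds with an empty match at position 0.
literal_class = re.search("[yipee]*", line)

# "(%s)+" interpolates the variable, then matches its contents as a whole,
# one or more times in a row.
group = re.search("(%s)+" % yipee, line)

print(repr(literal_class.group(0)))
print(group.group(0))
```

Note that the group form matches the exact sequence held in the variable, not rearrangements of its letters.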

    --

    Dennis Reinhardt

    http://www.spamai.com?ng_python
    Dennis Reinhardt, Oct 26, 2003
    #4
  5. Kill Bill wrote:

    > Also, if yipee is a variable, what happens when I do?
    >
    > re.search("[yipee]*", line):
    >
    > This will treat yipee as a string correct?


    Yes, a string literal is a string literal is a string literal.

    > I want it to treat yipee as a variable and perform the different
    > combinations on whatever is inside the variable.


    Then you want "[" + yipee + "]*" or "[%s]*" % yipee.
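A minimal sketch of the interpolation in Python 3, assuming the variable may contain characters that are special inside a character class. The re.escape call and the sample value are additions for illustration, not part of the suggestion above.

```python
import re

yipee = "a-c]"   # hypothetical contents containing regex metacharacters

# Without escaping, "[a-c]]*" would mean "a through c, followed by ]".
# re.escape makes every character of the variable stand for itself.
pattern = "[%s]*" % re.escape(yipee)

print(re.fullmatch(pattern, "c-a]") is not None)  # letters from yipee only
print(re.fullmatch(pattern, "b") is not None)     # "b" is not in yipee
```

This is still a character class, so each letter may be repeated any number of times.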

    --
    Erik Max Francis && && http://www.alcyone.com/max/
    __ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
    / \ Defeat is a school in which truth always grows strong.
    \__/ Henry Ward Beecher
    Erik Max Francis, Oct 26, 2003
    #5
  6. Kill Bill

    Kill Bill Guest

    That's not what I want. If I have yipee as "kdasfjh", then I want to
    compare to all combinations, so that there will be a match when it compares
    it with say when line is "fas" because line has all those letters in yipee.
    But "fass" will not match because yipee does not have two "s"'s.

    I'm new to python but I'm finding these regular expressions quite confusing.


    "Dennis Reinhardt" <> wrote in message
    news:5fEmb.3338$...
    > > re.search("[yipee]*", line):
    > >
    > > This will treat yipee as a string correct? I want it to treat yipee as

    a
    > > variable and perform the different combinations on whatever is inside

    the
    > > variable.

    > You don't want square bracket. You want parenthesis. But this is not the
    > only problem. Simply replacing the square brackets with parenthesis does
    > not treat yipee as a variable. The
    >
    > "%s" % variable
    >
    > idiom will do this
    >
    > Also, it is not clear why the sample line ends with a colon. I took it

    off
    >
    > The "*" will match on no occurrences and I suspect you want to match on

    one
    > or more. I replaced "*" with "+".
    >
    > I think you want:
    >
    > re.search("(%s)+" % yipee, line)
    >
    > --
    >
    > Dennis Reinhardt
    >
    > http://www.spamai.com?ng_python
    >
    >
    Kill Bill, Oct 26, 2003
    #6
  7. Kill Bill

    Gary Herron Guest

    On Saturday 25 October 2003 03:41 pm, Kill Bill wrote:
    > I'm trying to find all combinations of a string.
    > I found that [blah]* gives it to me, but it uses the same letter multiple
    > times, which I don't want.


    Several people have answered (correctly) one of your questions, that
    being how to get the contents of a variable into a string.

    However, I think your other question remains unanswered; perhaps it was
    not very well worded. By saying you don't want it to use "the same
    letter multiple times", I guess you're asking about permutations of
    a given string. For instance, the permutations of "abc" are

    abc
    acb
    bca
    bac
    cab
    cba

    and not things like

    aac

    Is this correct?

    If so, you are a bit out of luck. I don't think regular expressions
    can do this in any straightforward way. (However as you say regular
    expressions are complex, so I won't claim that this is not possible.)

    Perhaps you would be satisfied with something like this:

    "abc|acb|bca|bac|cab|cba"

    and if you were clever enough to build a list of all permutations of a
    given string

    listOfPermutations = Permutations("abc") # e.g., ['abc', 'acb', ...]

    then the regular expression could be gotten by

    '|'.join(listOfPermutations) # e.g., "abc|acb|..."
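One way to build such a list in Python 3 is itertools.permutations from the standard library. The sketch below is only an illustration: permutation_regex is a hypothetical helper name, and the number of alternatives grows factorially with the length of the string, so it is practical only for short strings.

```python
import re
from itertools import permutations

def permutation_regex(s):
    """Return an alternation that matches any ordering of the characters in s."""
    # set() drops the duplicates that appear when s has repeated letters
    words = sorted(set("".join(p) for p in permutations(s)))
    return "|".join(re.escape(w) for w in words)

pattern = permutation_regex("abc")
print(pattern)
print(re.fullmatch(pattern, "cab") is not None)
print(re.fullmatch(pattern, "aac") is not None)
```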


    Hope that helps,
    Gary Herron
    Gary Herron, Oct 26, 2003
    #7
  8. Quoting Kill Bill ():
    > That's not what I want. If I have yipee as "kdasfjh", then I want
    > to compare to all combinations, so that there will be a match when
    > it compares it with say when line is "fas" because line has all
    > those letters in yipee. But "fass" will not match because yipee does
    > not have two "s"'s.


    The easiest way to do this with regular expressions is to do it with
    several regular expressions rather than a single regular expression.
    This still isn't easy.

    To employ this solution, you would need to generate a regular
    expression for each letter in your target. For each letter, the
    pattern would look something like:

    [<all other letters>]*<this letter>[<all other letters>]*

    Then you would match against the full suite of patterns. This gets
    MORE complicated when your target has two of the same letter. Ouch.
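A rough sketch of that suite of patterns in Python 3, assuming the target has no repeated letters (the complication mentioned above is not handled). The function names are hypothetical.

```python
import re

def per_letter_patterns(target):
    """One pattern per distinct letter: that letter exactly once, others freely."""
    letters = sorted(set(target))
    patterns = []
    for letter in letters:
        others = "".join(re.escape(c) for c in letters if c != letter)
        # an empty character class "[]" is not valid, so skip it when unused
        filler = "[%s]*" % others if others else ""
        patterns.append(re.compile(filler + re.escape(letter) + filler))
    return patterns

def matches_every_pattern(check, target):
    """True only if check passes every per-letter pattern in full."""
    return all(p.fullmatch(check) for p in per_letter_patterns(target))

print(matches_every_pattern("bca", "abc"))
print(matches_every_pattern("aab", "abc"))
```

Because every letter must appear exactly once, this accepts full rearrangements of the target and rejects anything shorter.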

    My advice would be to not use regular expressions. The pattern you're
    looking for can be pretty easily expressed in the following bit of
    code:

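    def is_a_permutation(check, yipee):
        # work on a copy of the letters so each one can be used only once
        list_of_letters = list(yipee)
        for letter in check:
            if letter in list_of_letters:
                list_of_letters.remove(letter)
            else:
                # check uses a letter more often than yipee provides
                return False
        # True only when every letter of yipee was used up
        if list_of_letters == []:
            return True
        else:
            return False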

    It has the added advantage of being pretty clear.
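If the goal is the looser test from the question ("fas" should match "kdasfjh", "fass" should not), a variant with collections.Counter is one option. This is a sketch in Python 3, and the function name is made up for illustration.

```python
from collections import Counter

def uses_only_available_letters(check, yipee):
    """True if check can be spelled from the letters of yipee, using each
    letter at most as often as it appears there."""
    # Counter subtraction drops zero and negative counts, so anything left
    # over is a letter that check needs more often than yipee provides.
    return not (Counter(check) - Counter(yipee))

print(uses_only_available_letters("fas", "kdasfjh"))
print(uses_only_available_letters("fass", "kdasfjh"))
```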

    Luck,
    --G.

    --
    Geoff Gerrietts <geoff at gerrietts dot net> http://www.gerrietts.net/
    "Politics, as a practice, whatever its professions, has always been the
    systematic organization of hatreds." --Henry Adams
    Geoff Gerrietts, Oct 26, 2003
    #8
  9. Kill Bill

    Peter Otten Guest

    Kill Bill wrote:

    > That's not what I want. If I have yipee as "kdasfjh", then I want to
    > compare to all combinations, so that there will be a match when it
    > compares it with say when line is "fas" because line has all those letters
    > in yipee. But "fass" will not match because yipee does not have two "s"'s.
    >
    > I'm new to python but I'm finding these regular expressions quite
    > confusing.


    Regexes predate Python, so don't blame it on the snake :)

    Here's an approach that does not use regular expressions. Perhaps you can
    extend it to something useful:

    class Matcher:
        def __init__(self, s):
            self.search = list(s)
            self.search.sort()
        def __call__(self, s):
            if len(s) != len(self.search):
                return False
            t = list(s)
            t.sort()
            return t == self.search

    m = Matcher("peter")

    for s in "peter retep treep preet".split():
        assert m(s)
        print s, "->", m(s)

    for s in "netto pete trppe Preet".split():
        assert not m(s)
        print s, "->", m(s)

    However, it has the overhead of string to list conversion.
    Another drawback is that you have to split your input into chunks before
    feeding it to the matcher. Depending on your problem these might overlap
    and make the above simplistic matcher very inefficient.
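For the overlapping-chunks case, one possible sketch (Python 3) keeps a running letter count over a sliding window instead of re-sorting every chunk. The function name and sample text are hypothetical, and this is only one way to avoid the repeated work.

```python
from collections import Counter

def find_anagram_windows(text, target):
    """Yield start indexes of substrings of text that rearrange target."""
    n = len(target)
    if n == 0 or n > len(text):
        return
    wanted = Counter(target)
    window = Counter(text[:n])
    if window == wanted:
        yield 0
    for start in range(1, len(text) - n + 1):
        window[text[start + n - 1]] += 1   # letter entering the window
        leaving = text[start - 1]          # letter dropping out of it
        window[leaving] -= 1
        if window[leaving] == 0:
            del window[leaving]            # keep the comparison exact
        if window == wanted:
            yield start

print(list(find_anagram_windows("xxretepxpeter", "peter")))
```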

    But if I'm reading you correctly you always want to match the complete line:

    m = Matcher("whatever")
    for line in file("source"):
        if m(line.strip()):
            print line.strip()


    Peter
    Peter Otten, Oct 26, 2003
    #9
  10. > That's not what I want. If I have yipee as "kdasfjh", then I want to
    > compare to all combinations, so that there will be a match when it
    > compares it with say when line is "fas" because line has all those
    > letters in yipee.
    > But "fass" will not match because yipee does not have two "s"'s.


    Were you to alphabetize both your pattern and the line, the question
    becomes whether "afs" or "afss" fits within "adfhjks". Turn the sorted
    line into the regex "a.*f.*s" or "a.*f.*s.*s" and search "adfhjks" with
    it: the first matches and the second does not.

    Notice that the regex must be a string (not compiled) *and* this trick runs
    the regex backwards, matching the line (with embedded ".*") against the
    pattern and not the normal direction. There may be more straightforward
    ways to do this but this is the regex solution which occurs to me.
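A sketch of this reversed trick in Python 3, with the sorted line turned into the regex and the sorted pattern used as the text being searched. The function name line_fits is made up for illustration.

```python
import re

def line_fits(line, yipee):
    """Reversed trick: the sorted line becomes the regex, sorted yipee the subject."""
    # "fas" -> "a.*f.*s"; re.escape keeps odd characters literal
    regex = ".*".join(re.escape(c) for c in sorted(line))
    # "kdasfjh" -> "adfhjks"
    subject = "".join(sorted(yipee))
    return re.search(regex, subject) is not None

print(line_fits("fas", "kdasfjh"))
print(line_fits("fass", "kdasfjh"))
```

The regex has to be rebuilt for every line, which is why it cannot be compiled once up front.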

    On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
    the alphabetized line "afs" (from "fas") but not "afss" (from "fass").

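The optional-group regex can be generated from the variable rather than typed by hand. The following Python 3 sketch assumes the line is sorted before matching; both helper names are hypothetical.

```python
import re

def optional_letters_regex(yipee):
    """Build \\A(a|)(d|)...\\Z with one optional group per letter of yipee."""
    parts = ["(%s|)" % re.escape(c) for c in sorted(yipee)]
    return r"\A" + "".join(parts) + r"\Z"

def sorted_line_matches(line, yipee):
    """Match the alphabetized line against the optional-group regex."""
    pattern = optional_letters_regex(yipee)
    return re.match(pattern, "".join(sorted(line))) is not None

print(optional_letters_regex("kdasfjh"))
print(sorted_line_matches("fas", "kdasfjh"))
print(sorted_line_matches("fass", "kdasfjh"))
```

A repeated letter in yipee simply produces two optional groups in a row, so duplicates need no special handling here.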
    --
    Dennis Reinhardt
    http://www.spamai.com?ng_python
    Dennis Reinhardt, Oct 28, 2003
    #10
  11. Kill Bill

    David C. Fox Guest

    Dennis Reinhardt wrote:

    >> That's not what I want. If I have yipee as "kdasfjh", then I want to
    >> compare to all combinations, so that there will be a match when it
    >> compares it with say when line is "fas" because line has all those
    >> letters in yipee.
    >> But "fass" will not match because yipee does not have two "s"'s.
    >
    > Were you to alphabetize both your pattern and the line, the question
    > becomes whether "afs" or "afss" fits within "adfhjks". Turn the sorted
    > line into the regex "a.*f.*s" or "a.*f.*s.*s" and search "adfhjks" with
    > it: the first matches and the second does not.
    >
    > Notice that the regex must be a string (not compiled) *and* this trick runs
    > the regex backwards, matching the line (with embedded ".*") against the
    > pattern and not the normal direction. There may be more straightforward
    > ways to do this but this is the regex solution which occurs to me.
    >
    > On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
    > the alphabetized line "afs" (from "fas") but not "afss" (from "fass").


    Dennis is correct: alphabetizing both the pattern and the target strings
    is a way to do this. I would use a slightly different regex, though,
    constructed as follows:

    import re

    def char_counts(s):
        """returns a dictionary indicating how many times a given character
        appears in the string s
        """
        d = {}
        for char in s:
            d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
        return d

    def char_subset(s):
        """given a string s, returns a regular expression which matches a
        sorted character string containing a subset of the same characters
        (and with no more occurrences of each character than in the original
        string)
        """
        counts = char_counts(s)
        l = []
        chars = counts.keys()
        chars.sort()
        for char in chars:
            # regex fragment matching count or fewer occurrences of that char;
            # re.escape guards against regex metacharacters in s
            r = '%s{0,%d}' % (re.escape(char), counts[char])
            l.append(r)
        l.append('$') # make sure that we match against the full string
        return ''.join(l)

    def sorted_string(s):
        """given a string s, return a new one with all the characters
        sorted
        """
        l = list(s)
        l.sort()
        return ''.join(l)

    Then, you can compare a given target string, t, against the string in
    yipee with:

    r = char_subset(yipee)
    t_sorted = sorted_string(t)
    if re.match(r, t_sorted):
        print 'found a match: %s' %t
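For readers on Python 3, the same idea can be sketched more compactly with collections.Counter and re.fullmatch. This is an adaptation for illustration rather than the code as posted, and is_subset is a hypothetical helper name.

```python
import re
from collections import Counter

def char_subset(s):
    """Regex matching a sorted string that uses each character of s
    no more often than s itself does."""
    counts = Counter(s)
    return "".join("%s{0,%d}" % (re.escape(c), counts[c]) for c in sorted(counts))

def is_subset(t, yipee):
    """Sort t, then require the whole sorted string to match the regex."""
    return re.fullmatch(char_subset(yipee), "".join(sorted(t))) is not None

print(char_subset("kdasfjh"))
print(is_subset("fas", "kdasfjh"))
print(is_subset("fass", "kdasfjh"))
```

Using re.fullmatch removes the need for the trailing '$' anchor.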


    David
    David C. Fox, Oct 28, 2003
    #11
