Dennis said:
Were you to alphabetize both your regex and line, then the question is can
"afs" or "afss" match the pattern "adfhjks". By manipulating the lines
strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
instance matches the regex and the second does not.
Notice that the regex must be a string (not compiled) *and* this trick runs
the regex backwards, matching the line (with embedded ".*") against the
pattern and not the normal direction. There may be more straightforward
ways to do this but this is the regex solution which occurs to me.
On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
"fas" but not "fass".
Dennis is correct: alphabetizing both the pattern and the target strings
is way to do this. I would use a slightly different regex, though,
constructed as follows:
import re
def char_counts(s):
"""returns a dictionary indicating how many times a given character
appears in the string s
"""
d = {}
for char in s:
d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
return d
def char_subset(s):
"""given a string s, returns a regular expression which matches a
sorted character string containing a subset of the same characters
(and with no more occurences of each character than in the original
string)
"""
counts = char_counts(s)
l = []
chars = counts.keys()
chars.sort()
for char in chars:
r = '%s{0,%d}' % (char, counts[char])
l.append(r)
# regex fragment matching count or fewer occurances os that many
l.append('$') # make sure that we match against the full string
return ''.join(l)
def sorted_string(s):
"""given a string s, return a new one which all the characters
sorted
"""
l = list(s)
l.sort()
return ''.join(l)
Then, you can compare a given target string, t, against the string in
yipee with:
r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
print 'found a match: %s' %t
David