Regular expressions

Kill Bill

I'm trying to find all combinations of a string.
I found that [blah]* gives it to me but it uses the same letter multiple
times, which I don't want.
 
Kill Bill

Also, if yipee is a variable, what happens when I do?

re.search("[yipee]*", line):

This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.
 
William Park

Kill Bill said:
Also, if yipee is a variable, what happens when I do?

re.search("[yipee]*", line):

This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.

Use round brackets (i.e. parentheses) instead of square ones, i.e.
(yipee)*

Kill Bill said:
I'm trying to find all combinations of a string.
I found that [blah]* gives it to me but it uses the same letter multiple
times, which I don't want.
 
Dennis Reinhardt

re.search("[yipee]*", line):
This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.
You don't want square brackets. You want parentheses. But this is not the
only problem. Simply replacing the square brackets with parentheses does
not treat yipee as a variable. The

"%s" % variable

idiom will do this.

Also, it is not clear why the sample line ends with a colon. I took it off.

The "*" will match on no occurrences and I suspect you want to match on one
or more. I replaced "*" with "+".

I think you want:

re.search("(%s)+" % yipee, line)
 
Erik Max Francis

Kill said:
Also, if yipee is a variable, what happens when I do?

re.search("[yipee]*", line):

This will treat yipee as a string correct?

Yes, a string literal is a string literal is a string literal.

I want it to treat yipee as a variable and perform the different
combinations on whatever is inside the variable.

Then you want "[" + yipee + "]*" or "[%s]*" % yipee.
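A quick sketch of the difference between the two forms, assuming an example value for yipee. The re.escape call is an extra precaution that is not part of the reply above; it only matters if the variable holds characters that are special inside a character class.

```python
import re

yipee = "kdasfjh"  # example value; any string would do

# The literal pattern: a character class of the letters y, i, p, e.
literal = "[yipee]*"

# The interpolated pattern: a character class built from the variable's
# contents. re.escape is an added precaution (an assumption, not part of
# the original reply) for values containing characters such as ']' or '^'.
from_variable = "[%s]*" % re.escape(yipee)

print(from_variable)

# Either way, a character class lets a letter repeat freely, so this
# alone does not limit how often each letter may be used.
print(bool(re.fullmatch(from_variable, "fas")))
print(bool(re.fullmatch(from_variable, "fass")))
```

Both calls succeed, so the character class still needs something extra to reject a doubled letter.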
 
Kill Bill

That's not what I want. If I have yipee as "kdasfjh", then I want to
compare to all combinations, so that there will be a match when it compares
it with say when line is "fas" because line has all those letters in yipee.
But "fass" will not match because yipee does not have two "s"'s.

I'm new to python but I'm finding these regular expressions quite confusing.


 
Gary Herron

I'm trying to find all combinations of a string.
I found that [blah]* gives it to me but it uses the same letter multiple
times, which I don't want.

Several people have answered (correctly) one of your questions, that
being how to get the contents of a variable into a string.

However, I think your other question remains unanswered; perhaps it is
not very well worded. By saying you don't want one that "uses the
same letter multiple times", I guess you're asking about permutations of
a given string. For instance, the permutations of "abc" are

abc
acb
bca
bac
cab
cba

and not things like

aac

Is this correct?

If so, you are a bit out of luck. I don't think regular expressions
can do this in any straightforward way. (However as you say regular
expressions are complex, so I won't claim that this is not possible.)

Perhaps you would be satisfied with something like this:

"abc|acb|bca|bac|cab|cba"

and if you were clever enough to build a list of all permutations of a
given string

listOfPermutations = Permutations("abc") # e.g., ['abc', 'acb', ...]

then the regular expression could be gotten by

'|'.join(listOfPermutations) # e.g., "abc|acb|..."
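One way to sketch this in current Python is with itertools.permutations from the standard library, standing in for the hypothetical Permutations function above. The helper name permutation_pattern is made up for this example, and re.escape is an added precaution.

```python
import itertools
import re

def permutation_pattern(s):
    """Return a regex alternation of every distinct permutation of s."""
    # A set removes duplicates when s contains repeated letters.
    perms = sorted({"".join(p) for p in itertools.permutations(s)})
    return "|".join(re.escape(p) for p in perms)

pattern = permutation_pattern("abc")
print(pattern)  # abc|acb|bac|bca|cab|cba

print(bool(re.fullmatch(pattern, "cab")))  # True
print(bool(re.fullmatch(pattern, "aac")))  # False
```

Keep in mind that the number of permutations grows factorially with the length of the string, so this is only practical for short strings.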


Hope that helps,
Gary Herron
 
Geoff Gerrietts

Quoting Kill Bill:
That's not what I want. If I have yipee as "kdasfjh", then I want
to compare to all combinations, so that there will be a match when
it compares it with say when line is "fas" because line has all
those letters in yipee. But "fass" will not match because yipee does
not have two "s"'s.

The easiest way to do this with regular expressions is to do it with
several regular expressions rather than a single regular expression.
This still isn't easy.

To employ this solution, you would need to generate a regular
expression for each letter in your target. For each letter, the
pattern would look something like:

[<all other letters>]*<this letter>[<all other letters>]*

Then you would match against the full suite of patterns. This gets
MORE complicated when your target has two of the same letter. Ouch.

My advice would be to not use regular expressions. The pattern you're
looking for can be pretty easily expressed in the following bit of
code:

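def is_a_permutation(check, yipee):
    # Work on a copy of the letters so each one can be used only once.
    list_of_letters = list(yipee)
    for letter in check:
        if letter in list_of_letters:
            list_of_letters.remove(letter)
        else:
            # check uses a letter that yipee lacks (or has run out of)
            return False
    # True only when every letter of yipee has been used up
    return list_of_letters == []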

It has the added advantage of being pretty clear.
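Note that is_a_permutation accepts only strings that use up every letter of yipee, so "fas" would be rejected against "kdasfjh". If the goal is the looser test from the question (each letter used no more often than it appears in yipee), a minimal sketch with collections.Counter might look like this. The function name uses_only_letters_of is made up for this example.

```python
from collections import Counter

def uses_only_letters_of(check, yipee):
    """True if every letter of check occurs in yipee at least as often."""
    available = Counter(yipee)  # how many of each letter yipee offers
    needed = Counter(check)     # how many of each letter check requires
    # A Counter returns 0 for missing letters, so no KeyError here.
    return all(available[letter] >= count
               for letter, count in needed.items())

print(uses_only_letters_of("fas", "kdasfjh"))   # True
print(uses_only_letters_of("fass", "kdasfjh"))  # False
```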

Luck,
--G.
 
Peter Otten

Kill said:
That's not what I want. If I have yipee as "kdasfjh", then I want to
compare to all combinations, so that there will be a match when it
compares it with say when line is "fas" because line has all those letters
in yipee. But "fass" will not match because yipee does not have two "s"'s.

I'm new to python but I'm finding these regular expressions quite
confusing.

Regexes predate Python, so don't blame it on the snake :)

Here's an approach that does not use regular expressions. Perhaps you can
extend it to something useful:

class Matcher:
    def __init__(self, s):
        # Store the search string as a sorted list of characters.
        self.search = list(s)
        self.search.sort()
    def __call__(self, s):
        # Strings of a different length can never be a rearrangement.
        if len(s) != len(self.search):
            return False
        t = list(s)
        t.sort()
        return t == self.search

m = Matcher("peter")

for s in "peter retep treep preet".split():
    assert m(s)
    print(s, "->", m(s))

for s in "netto pete trppe Preet".split():
    assert not m(s)
    print(s, "->", m(s))

However, it has the overhead of string to list conversion.
Another drawback is that you have to split your input into chunks before
feeding it to the matcher. Depending on your problem these might overlap
and make the above simplistic matcher very inefficient.

But if I'm reading you correctly you always want to match the complete line:

m = Matcher("whatever")
for line in open("source"):
    if m(line.strip()):
        print(line.strip())


Peter
 
Dennis Reinhardt

That's not what I want. If I have yipee as "kdasfjh", then I want to
compare to all combinations, so that there will be a match when it compares
it with say when line is "fas" because line has all those letters in yipee.
But "fass" will not match because yipee does not have two "s"'s.

Were you to alphabetize both your regex and line, then the question is can
"afs" or "afss" match the pattern "adfhjks". By turning the lines into the
strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
instance matches the regex and the second does not.

Notice that the regex must be a string (not compiled) *and* this trick runs
the regex backwards, matching the line (with embedded ".*") against the
pattern and not the normal direction. There may be more straightforward
ways to do this but this is the regex solution which occurs to me.
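A minimal sketch of this backwards trick, assuming neither string contains newlines. The helper name letters_available is made up for this example, and re.escape is an added precaution.

```python
import re

def letters_available(line, yipee):
    """Check line against yipee by running the regex 'backwards'."""
    # Alphabetize the line and join its letters with ".*",
    # e.g. "fas" becomes the pattern "a.*f.*s".
    pattern = ".*".join(re.escape(c) for c in sorted(line))
    # Search for that pattern in the alphabetized yipee, e.g. "adfhjks".
    return re.search(pattern, "".join(sorted(yipee))) is not None

print(letters_available("fas", "kdasfjh"))   # True
print(letters_available("fass", "kdasfjh"))  # False
```

Because both strings are sorted, each letter in the pattern has to consume its own letter from yipee, which is what rules out the second "s".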

On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
"fas" but not "fass".
 
David C. Fox

Dennis said:
Were you to alphabetize both your regex and line, then the question is can
"afs" or "afss" match the pattern "adfhjks". By turning the lines into the
strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
instance matches the regex and the second does not.

Notice that the regex must be a string (not compiled) *and* this trick runs
the regex backwards, matching the line (with embedded ".*") against the
pattern and not the normal direction. There may be more straightforward
ways to do this but this is the regex solution which occurs to me.

On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
"fas" but not "fass".

Dennis is correct: alphabetizing both the pattern and the target strings
is a way to do this. I would use a slightly different regex, though,
constructed as follows:

import re

def char_counts(s):
    """returns a dictionary indicating how many times a given character
    appears in the string s
    """
    d = {}
    for char in s:
        d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
    return d

def char_subset(s):
    """given a string s, returns a regular expression which matches a
    sorted character string containing a subset of the same characters
    (and with no more occurrences of each character than in the original
    string)
    """
    counts = char_counts(s)
    l = []
    chars = sorted(counts)
    for char in chars:
        # regex fragment matching that many or fewer occurrences of char;
        # re.escape guards against characters that are special in a regex
        r = '%s{0,%d}' % (re.escape(char), counts[char])
        l.append(r)
    l.append('$') # make sure that we match against the full string
    return ''.join(l)

def sorted_string(s):
    """given a string s, return a new one with all the characters
    sorted
    """
    l = list(s)
    l.sort()
    return ''.join(l)

Then, you can compare a given target string, t, against the string in
yipee with:

r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
    print('found a match: %s' % t)


David
 
