Regular expressions

Kill Bill · Oct 25, 2003

I'm trying to find all combinations of a string.
I found that [blah]* gives it to me, but it uses the same letter multiple
times, which I don't want.

Kill Bill · Oct 26, 2003

Also, if yipee is a variable, what happens when I do this?

re.search("[yipee]*", line):

This will treat yipee as a string, correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.

William Park · Oct 26, 2003

Kill Bill said:
Also, if yipee is a variable, what happens when I do?

re.search("[yipee]*", line):

This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.

Use round brackets (i.e. parentheses) instead of square ones, i.e.
(yipee)*

Kill Bill said:

I'm trying to find all combinations of the a string.
I found that [blah]* gives it to me but it uses the same letter multiple
times which I dont' want.


Dennis Reinhardt · Oct 26, 2003

re.search("[yipee]*", line):

This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.

You don't want square brackets. You want parentheses. But this is not the
only problem. Simply replacing the square brackets with parentheses does
not treat yipee as a variable. The

"%s" % variable

idiom will do this.

Also, it is not clear why the sample line ends with a colon. I took it off.

The "*" will match on no occurrences, and I suspect you want to match on one
or more. I replaced "*" with "+".

I think you want:

re.search("(%s)+" % yipee, line)
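
To see what the switch from "*" to "+" changes, here is a minimal sketch (Python 3). The values of yipee and line are made up for illustration and are not from the original post:

```python
import re

yipee = "abc"        # hypothetical value standing in for the poster's variable
line = "xyzabcabc"   # hypothetical input line

# "*" allows zero repetitions, so the search succeeds right away
# with an empty match at position 0.
star = re.search("(%s)*" % yipee, line)

# "+" needs at least one full copy of yipee, so the match starts at "abc".
plus = re.search("(%s)+" % yipee, line)

print(repr(star.group(0)))   # ''
print(plus.group(0))         # abcabc
```

Note that the parenthesized group matches the letters of yipee in that exact order, as a unit.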

Erik Max Francis · Oct 26, 2003

Kill said:
Also, if yipee is a variable, what happens when I do?

re.search("[yipee]*", line):

This will treat yipee as a string correct?

Yes, a string literal is a string literal is a string literal.

I want it to treat yipee
as a
variable and perform the different combinations on whatever is inside
the
variable.

Then you want "[" + yipee + "]*" or "[%s]*" % yipee.
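
A small sketch of that interpolation (Python 3), assuming the sample value "kdasfjh" that the original poster mentions for yipee. The re.escape call and the use of re.fullmatch are additions for safety and anchoring, not part of the suggestion above:

```python
import re

yipee = "kdasfjh"   # sample value mentioned by the original poster

# re.escape guards against characters such as "]" or "^" that are
# special inside a character class.
pattern = "[%s]*" % re.escape(yipee)

# fullmatch anchors the character class to the whole line.
print(bool(re.fullmatch(pattern, "fas")))    # True
print(bool(re.fullmatch(pattern, "fass")))   # True: a class does not count letters
print(bool(re.fullmatch(pattern, "fax")))    # False: "x" is not in yipee
```

The second result shows the limitation: a character class checks membership only, so it cannot cap how many times a letter is used.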

Kill Bill · Oct 26, 2003

That's not what I want. If yipee is "kdasfjh", then I want to compare
against all combinations, so that there is a match when line is, say,
"fas", because every letter of line is in yipee.
But "fass" will not match because yipee does not have two "s"'s.

I'm new to Python, but I'm finding these regular expressions quite confusing.

Dennis Reinhardt said:
re.search("[yipee]*", line):

This will treat yipee as a string correct? I want it to treat yipee as a
variable and perform the different combinations on whatever is inside the
variable.


You don't want square bracket. You want parenthesis. But this is not the
only problem. Simply replacing the square brackets with parenthesis does
not treat yipee as a variable. The

"%s" % variable

idiom will do this

Also, it is not clear why the sample line ends with a colon. I took it off

The "*" will match on no occurrences and I suspect you want to match on one
or more. I replaced "*" with "+".

I think you want:

re.search("(%s)+" % yipee, line)

--

Dennis Reinhardt

(e-mail address removed) http://www.spamai.com?ng_python

Gary Herron · Oct 26, 2003

I'm trying to find all combinations of the a string.
I found that [blah]* gives it to me but it uses the same letter multiple
times which I dont' want.

Several people have answered (correctly) one of your questions, that
being how to get the contents of a variable into a string.

However, I think your other question remains unanswered; perhaps it is
not very well worded. By saying you don't want it to "use the
same letter multiple times", I guess you're asking about permutations of
a given string. For instance, the permutations of "abc" are

abc
acb
bca
bac
cab
cba

and not things like

aac

Is this correct?

If so, you are a bit out of luck. I don't think regular expressions
can do this in any straightforward way. (However as you say regular
expressions are complex, so I won't claim that this is not possible.)

Perhaps you would be satisfied with something like this:

"abc|acb|bca|bac|cab|cba"

and if you were clever enough to build a list of all permutations of a
given string

listOfPermutations = Permutations("abc") # e.g., ['abc', 'acb', ...]

then the regular expression could be gotten by

'|'.join(listOfPermutations) # e.g., "abc|acb|..."
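
One way to build that list is with itertools.permutations from the standard library. This is only a sketch (Python 3): permutation_regex is an illustrative name, and the number of alternatives grows factorially with the length of the string, so it is practical only for short inputs:

```python
import re
from itertools import permutations

def permutation_regex(s):
    """Return an alternation that matches any ordering of the characters in s."""
    # The set drops duplicate orderings when s has repeated letters.
    words = sorted({"".join(p) for p in permutations(s)})
    return "|".join(re.escape(w) for w in words)

pattern = permutation_regex("abc")
print(pattern)                              # abc|acb|bac|bca|cab|cba
print(bool(re.fullmatch(pattern, "cab")))   # True
print(bool(re.fullmatch(pattern, "aac")))   # False
```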

Hope that helps,
Gary Herron

Geoff Gerrietts · Oct 26, 2003

Quoting Kill Bill:

That's not what I want. If I have yipee as "kdasfjh", then I want
to compare to all combinations, so that there will be a match when
it compares it with say when line is "fas" because line has all
those letters in yipee. But "fass" will not match because yipee does
not have two "s"'s.

The easiest way to do this with regular expressions is to do it with
several regular expressions rather than a single regular expression.
This still isn't easy.

To employ this solution, you would need to generate a regular
expression for each letter in your target. For each letter, the
pattern would look something like:

[<all other letters>]*<this letter>[<all other letters>]*

Then you would match against the full suite of patterns. This gets
MORE complicated when your target has two of the same letter. Ouch.

My advice would be to not use regular expressions. The pattern you're
looking for can be pretty easily expressed in the following bit of
code:

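def is_a_permutation(check, yipee):
    # yipee is passed in explicitly; the default "pattern" was never defined
    list_of_letters = list(yipee)
    for letter in check:
        if letter in list_of_letters:
            list_of_letters.remove(letter)  # each letter can be used only once
        else:
            return False
    # leftover letters mean check used only some of yipee's letters
    if list_of_letters == []:
        return True
    else:
        return False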

It has the added advantage of being pretty clear.
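
The function above returns True only when every letter of yipee is used up. For the looser requirement in the quoted post ("fas" should match "kdasfjh", "fass" should not), a sketch using collections.Counter could look like the following (Python 3). The name uses_only_letters_of is made up for illustration:

```python
from collections import Counter

def uses_only_letters_of(check, yipee):
    """True if check can be spelled from yipee's letters, each used at most once."""
    needed = Counter(check)      # how many of each letter check requires
    available = Counter(yipee)   # how many of each letter yipee offers
    # Counter returns 0 for missing letters, so unknown letters fail the test.
    return all(available[ch] >= n for ch, n in needed.items())

print(uses_only_letters_of("fas", "kdasfjh"))    # True
print(uses_only_letters_of("fass", "kdasfjh"))   # False
```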

Luck,
--G.

Peter Otten · Oct 26, 2003

Kill said:
That's not what I want. If I have yipee as "kdasfjh", then I want to
compare to all combinations, so that there will be a match when it
compares it with say when line is "fas" because line has all those letters
in yipee. But "fass" will not match because yipee does not have two "s"'s.

I'm new to python but I'm finding these regular expressions quite
confusing.

Regexes predate Python, so don't blame it on the snake.

Here's an approach that does not use regular expressions. Perhaps you can
extend it to something useful:

class Matcher:
    def __init__(self, s):
        # keep the search letters as a sorted list
        self.search = list(s)
        self.search.sort()
    def __call__(self, s):
        if len(s) != len(self.search):
            return False
        t = list(s)
        t.sort()
        return t == self.search

m = Matcher("peter")

for s in "peter retep treep preet".split():
    assert m(s)
    print(s, "->", m(s))

for s in "netto pete trppe Preet".split():
    assert not m(s)
    print(s, "->", m(s))

However, it has the overhead of string to list conversion.
Another drawback is that you have to split your input into chunks before
feeding it to the matcher. Depending on your problem these might overlap
and make the above simplistic matcher very inefficient.
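
If the chunks do overlap, one way to avoid re-sorting every chunk is to slide a window of letter counts along the text. This is a sketch under that assumption (Python 3); anagram_positions is an illustrative name and Counter is swapped in for the sorted lists used above:

```python
from collections import Counter

def anagram_positions(word, text):
    """Return the start index of every window of text that is an anagram of word."""
    n = len(word)
    if n == 0 or n > len(text):
        return []
    target = Counter(word)
    window = Counter(text[:n])
    hits = [0] if window == target else []
    for i in range(n, len(text)):
        window[text[i]] += 1      # letter entering the window
        old = text[i - n]
        window[old] -= 1          # letter leaving the window
        if window[old] == 0:
            del window[old]       # keep zero counts out of the comparison
        if window == target:
            hits.append(i - n + 1)
    return hits

print(anagram_positions("peter", "xxretepyy"))   # [2]
print(anagram_positions("ab", "aba"))            # [0, 1]
```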

But if I'm reading you correctly you always want to match the complete line:

m = Matcher("whatever")
for line in open("source"):
    if m(line.strip()):
        print(line.strip())

Peter

Dennis Reinhardt · Oct 28, 2003

That's not what I want. If I have yipee as "kdasfjh", then I want to
compare to all combinations, so that there will be a match when it compares
it with say when line is "fas" because line has all those letters in yipee.
But "fass" will not match because yipee does not have two "s"'s.

Were you to alphabetize both your pattern and your line, then the question is
whether "afs" or "afss" can match the pattern "adfhjks". By turning the lines
into the regexes "a.*f.*s" and "a.*f.*s.*s", you will see that the first line
matches the pattern and the second does not.

Notice that the pattern must be a plain string (not a compiled regex) *and*
this trick runs the regex backwards, matching the line (with embedded ".*")
against the pattern and not in the normal direction. There may be more
straightforward ways to do this, but this is the regex solution that occurs
to me.

On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
"fas" but not "fass", once each line is alphabetized (to "afs" and "afss").

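Both tricks from this post can be tried out with a short sketch (Python 3). The function names are made up for illustration, and the re.escape calls are an addition so that punctuation in either string stays literal:

```python
import re

yipee = "kdasfjh"   # sample value from the thread

def optional_letters_regex(s):
    # Builds the anchored regex with one optional group per sorted letter of s.
    return r"\A" + "".join("(%s|)" % re.escape(c) for c in sorted(s)) + r"\Z"

def matches_subset(line, s):
    # The line must be alphabetized too, or "fas" would fail on letter order.
    return re.search(optional_letters_regex(s), "".join(sorted(line))) is not None

def matches_backwards(line, s):
    # Reverse direction: turn the sorted line into "a.*f.*s" and
    # search for it inside the sorted pattern string.
    probe = ".*".join(re.escape(c) for c in sorted(line))
    return re.search(probe, "".join(sorted(s))) is not None

print(matches_subset("fas", yipee), matches_subset("fass", yipee))         # True False
print(matches_backwards("fas", yipee), matches_backwards("fass", yipee))   # True False
```
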
David C. Fox · Oct 28, 2003

Dennis said:
Were you to alphabetize both your regex and line, then the question is can
"afs" or "afss" match the pattern "adfhjks". By manipulating the lines
strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
instance matches the regex and the second does not.

Notice that the regex must be a string (not compiled) *and* this trick runs
the regex backwards, matching the line (with embedded ".*") against the
pattern and not the normal direction. There may be more straightforward
ways to do this but this is the regex solution which occurs to me.

On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
"fas" but not "fass".

Dennis is correct: alphabetizing both the pattern and the target strings
is a way to do this. I would use a slightly different regex, though,
constructed as follows:

import re

def char_counts(s):
    """returns a dictionary indicating how many times a given character
    appears in the string s
    """
    d = {}
    for char in s:
        d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
    return d

def char_subset(s):
    """given a string s, returns a regular expression which matches a
    sorted character string containing a subset of the same characters
    (and with no more occurrences of each character than in the original
    string)
    """
    counts = char_counts(s)
    l = []
    chars = sorted(counts) # the distinct characters, in sorted order
    for char in chars:
        # regex fragment matching count or fewer occurrences of that char;
        # re.escape keeps characters such as "." or "*" literal
        r = '%s{0,%d}' % (re.escape(char), counts[char])
        l.append(r)
    l.append('$') # make sure that we match against the full string
    return ''.join(l)

def sorted_string(s):
    """given a string s, return a new one with all the characters
    sorted
    """
    l = list(s)
    l.sort()
    return ''.join(l)

Then, you can compare a given target string, t, against the string in
yipee with:

r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
    print('found a match: %s' % t)
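
To see what kind of regex this approach generates, here is a compact sketch of the same idea (Python 3). The names char_subset_regex and is_subset are illustrative, and collections.Counter and re.escape are swapped in for the hand-written counting:

```python
import re
from collections import Counter

def char_subset_regex(s):
    # One "{0,n}" fragment per distinct character, in sorted order.
    counts = Counter(s)
    return "".join("%s{0,%d}" % (re.escape(c), counts[c]) for c in sorted(counts)) + "$"

def is_subset(t, s):
    # Sort the target so its letters line up with the sorted fragments.
    return re.match(char_subset_regex(s), "".join(sorted(t))) is not None

print(char_subset_regex("peter"))     # e{0,2}p{0,1}r{0,1}t{0,1}$
print(is_subset("fas", "kdasfjh"))    # True
print(is_subset("fass", "kdasfjh"))   # False
```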

David
