packing things back to regular expression

Amit Gupta

Hi

I wonder if Python has a function to pack things back into a regexp
that has group names.

e.g.:
exp = "(?P<name1>[a-z]+)"
compiledexp = re.compile(exp)

Now, I have a dictionary: mytable = {"a": "myname"}

Is there a way in the re module, or elsewhere, where I can have it match
the contents of the dictionary against the regular expression (checking
that they match the rules) and then return the substituted string?

e.gERROR



Thanks
A
 
Gary Herron

I'm not following what you're asking for until I get to the last two
words. The re module does have functions to do string substitution.
One or more occurrences of a pattern matched by an re can be replaced
with a given string. See sub and subn. Perhaps you can make one of
those do whatever it is you are trying to do.
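
For readers unfamiliar with the two functions mentioned here, a minimal sketch of `sub` and `subn` follows. The sample input string and the variable names are made up for illustration; only the pattern comes from the question.

```python
import re

pattern = re.compile(r"(?P<name1>[a-z]+)")

# sub() returns the new string; count=1 limits it to the first match.
first_only = pattern.sub("X", "abc 123 def", count=1)

# subn() returns a tuple: (new_string, number_of_substitutions).
replaced, n = pattern.subn("X", "abc 123 def")

# A replacement template can refer back to a named group with \g<name>.
wrapped = pattern.sub(r"<\g<name1>>", "abc 123 def")

print(first_only)   # X 123 def
print(replaced, n)  # X 123 X 2
print(wrapped)      # <abc> 123 <def>
```

Note that both functions replace text that the pattern matches; neither fills a pattern's groups from a dictionary, which is what the question turns out to be about.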

Gary Herron
 
Tim Chase

mytable = {"a" : "myname"}

How does SomeNewFunc know to pull "a" as opposed to any other key?

You could do something like one of the following 3 functions:

import re
ERROR = 'ERROR'

def some_new_func(table, regex):
    "Return processed results for values matching regex"
    result = {}
    for k, v in table.items():
        m = regex.match(v)
        if m:
            result[k] = m.group(1)
        else:
            result[k] = ERROR
    return result

def some_new_func2(table, regex, key):
    "Get value (if matches regex) or ERROR based on key"
    m = regex.match(table[key])
    if m:
        return m.group(0)
    return ERROR

def some_new_func3(table, regex):
    "Sniff the desired key from the regexp (inefficient)"
    for k, v in table.items():
        m = regex.match(v)
        if m:
            # take the first (here: only) named group of the match
            groupname, match = next(iter(m.groupdict().items()))
            if groupname == k:
                return match
    return ERROR

if __name__ == "__main__":
    NAME = 'name1'
    mytable = {
        'a': 'myname',
        'b': '1',
        NAME: 'foo',
    }
    regexp = '(?P<%s>[a-z]+)' % NAME
    print('Using regex:')
    print(regexp)
    print('='*10)

    r = re.compile(regexp)
    results = some_new_func(mytable, r)
    print('a: ', results['a'])
    print('b: ', results['b'])
    print('='*10)
    print('a: ', some_new_func2(mytable, r, 'a'))
    print('b: ', some_new_func2(mytable, r, 'b'))
    print('='*10)
    print('%s: %s' % (NAME, some_new_func3(mytable, r)))

Function #2 is the optimal solution for single hits, whereas
Function #1 is best if you plan to repeatedly extract keys from
one set of processed results (the function only gets called
once). Function #3 is just ugly, and generally indicates that you
need to change your tactics ;)

-tkc
 
Amit Gupta

Before I read the message: I screwed up.

Let me write again
x = re.compile("CL(?P<name1>[a-z]+)")
# The group name "name1" is attached to the match of a lowercase
# alphabetic string.
# Now I have a dictionary saying {"name1": "iamgood"}.
# I would like a function that takes x and my dictionary and returns
# "CLiamgood".
# If my dictionary instead has {"name1": "123"}, it gives an error on
# processing it.
#
# In general, I have a regular expression where every non-trivial match
# has a group name. I want to do the reverse of a regexp match: the
# function takes the regexp and replaces the group matches with values
# from the dictionary.
# I hope this makes it clear.
 
Steven D'Aprano

Clear as mud. But I'm going to take a guess.

Are you trying to validate the data against the regular expression as
well as substitute values? That means your function needs to do something
like this:

(1) Take the regular expression object, and extract the string it was
made from. That way at least you know the regular expression was valid.

x = re.compile("CL(?P<name1>[a-z]+)") # validate the regex
x.pattern # an attribute, not a method

=> "CL(?P<name1>[a-z]+)"


(2) Split the string into sets of three pieces:

split("CL(?P<name1>[a-z]+)") # you need to write this function

=> ("CL", "(?P<name1>", "[a-z]+)")


(3) Mangle the first two pieces:

mangle("CL", "(?P<name1>") # you need to write this function

=> "CL%(name1)s"

(4) Validate the value in the dictionary:

d = {"name1": "123"}
validate("[a-z]+)", d)

=> raise exception

d = {"name1": "iamgood"}
validate("[a-z]+)", d)

=> return True


(5) If the validation step succeeded, then do the replacement:

"CL%(name1)s" % d

=> "CLiamgood"
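
The five steps above can be sketched in a few lines for the simplest case. This is only an illustration under stated assumptions: the names `FLAT_GROUP` and `fill_regex` are hypothetical, the sketch handles only named groups whose body contains no parentheses at all, and it assumes everything outside the groups is literal text. It also validates with `re.fullmatch`, which is a stricter choice than a prefix match.

```python
import re

# Matches a named group whose body contains no parentheses,
# e.g. (?P<name1>[a-z]+). Nested groups and escaped or bracketed
# parentheses are deliberately out of scope for this sketch.
FLAT_GROUP = re.compile(r"\(\?P<(?P<name>\w+)>(?P<body>[^()]*)\)")

def fill_regex(compiled, values):
    """Replace each flat named group in compiled.pattern with values[name].

    Raises KeyError if a name is missing and ValueError if a value does
    not fully match its group's sub-pattern. Text outside the groups is
    copied verbatim, so it should be literal text, not regex syntax.
    """
    def substitute(group):
        name, body = group.group("name"), group.group("body")
        value = values[name]                   # KeyError if absent
        if re.fullmatch(body, value) is None:  # step (4): validate
            raise ValueError(f"value {value!r} does not match {body!r}")
        return value                           # step (5): substitute

    # steps (1)-(3): read the source pattern and rewrite each group
    return FLAT_GROUP.sub(substitute, compiled.pattern)

x = re.compile("CL(?P<name1>[a-z]+)")
print(fill_regex(x, {"name1": "iamgood"}))  # CLiamgood
```

Calling `fill_regex(x, {"name1": "123"})` raises `ValueError`, which corresponds to the "raise exception" branch in step (4).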


Step (2), the splitter, will be the hardest because you essentially need
to parse the regular expression. You will need to decide how to handle
regexes with multiple "bits", including *nested* expressions, e.g.:

"CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"


Good luck.
 
MRAB

If you want the string that matched the regex then you can use
group(0) (or just group()):
>>> x = re.compile("CL(?P<name1>[a-z]+)")
>>> m = x.search("something CLiamgood!something else")
>>> m.group()
'CLiamgood'
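
As a small extension of the session above, the same match object also exposes the named group on its own. The variable names `whole`, `by_name`, and `as_dict` are just for illustration.

```python
import re

x = re.compile("CL(?P<name1>[a-z]+)")
m = x.search("something CLiamgood!something else")

whole = m.group()           # the full matched text
by_name = m.group("name1")  # only the text captured by the named group
as_dict = m.groupdict()     # every named group, as a dict

print(whole)    # CLiamgood
print(by_name)  # iamgood
print(as_dict)  # {'name1': 'iamgood'}
```

This is the forward direction (string to group values); the original question asks for the reverse (group values to string).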
 
Paul McGuire

Steven D'Aprano said: "Good luck."

Oh, pshaw! Try this pyparsing ditty.

-- Paul
http://pyparsing.wikispaces.com



from pyparsing import *
import re

# replace patterns of (?P<name>xxx) with dict
# values iff value matches 'xxx' as re

LPAR,RPAR,LT,GT = map(Suppress,"()<>")
nameFlag = Suppress("?P")
rechars = printables.replace(")","").replace("(","")+" "
regex = Forward()("fld_re")
namedField = (nameFlag + \
    LT + Word(alphas,alphanums+"_")("fld_name") + GT + \
    regex )
regex << Combine(OneOrMore(Word(rechars) |
    r"\(" | r"\)" |
    nestedExpr(LPAR, RPAR, namedField |
        regex,
        ignoreExpr=None ) ))

def fillRE(reString, nameDict):
    def fieldPA(tokens):
        fieldRE = tokens.fld_re
        fieldName = tokens.fld_name
        if fieldName not in nameDict:
            raise ParseFatalException(
                "name '%s' not defined in name dict" %
                (fieldName,) )
        fieldTranslation = nameDict[fieldName]
        if (re.match(fieldRE, fieldTranslation)):
            return fieldTranslation
        else:
            raise ParseFatalException(
                "value '%s' does not match re '%s'" %
                (fieldTranslation, fieldRE) )
    namedField.setParseAction(fieldPA)
    try:
        return (LPAR + namedField + RPAR).transformString(reString)
    except ParseBaseException as pe:
        return pe.msg

# tests start here
testRE = r"CL(?P<name1>[a-z]+)"

# a simple test
test1 = { "name1" : "iamgood" }
print(fillRE(testRE, test1))

# harder test, including nested names (have to be careful in
# constructing the names dict)
testRE = \
    r"CL(?P<name1>[a-z]+)XY(?P<name4>(:?AB)[aeiou]+)" \
    r"(?P<name2>CD(?P<name3>..)\?EF)"
test3 = { "name1" : "iamgoodZ",
          "name2" : "CD@@?EF",
          "name3" : "@@",
          "name4" : "ABeieio",
        }
print(fillRE(testRE, test3))

# test a non-conforming field
test2 = { "name1" : "123" }
print(fillRE(testRE, test2))


Prints:

CLiamgood
CLiamgoodZXYABeieioCD@@?EF
value '123' does not match re '[a-z]+'
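
One way to sanity-check a filled-in string, sketched here as an assumption rather than as part of the pyparsing solution, is to run the original regex over the result and compare the captured groups with the supplied dictionary. The helper name `round_trips` is hypothetical.

```python
import re

def round_trips(pattern, filled, values):
    """Return True if `filled` fully matches `pattern` and every named
    group captures exactly the value that was supplied for it."""
    m = re.fullmatch(pattern, filled)
    return m is not None and m.groupdict() == values

simple = r"CL(?P<name1>[a-z]+)"
print(round_trips(simple, "CLiamgood", {"name1": "iamgood"}))    # True
print(round_trips(simple, "CL123", {"name1": "123"}))            # False
print(round_trips(simple, "CLiamgoodZ", {"name1": "iamgoodZ"}))  # False
```

The last case is worth noting: `re.match` only anchors at the start of the value, so a per-field check based on it accepts "iamgoodZ" for `[a-z]+`, whereas a full-string round trip rejects it.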
 
