packing things back to regular expression

Discussion in 'Python' started by Amit Gupta, Feb 20, 2008.

  1. Amit Gupta

    Hi

    I wonder if Python has a function to pack things back into a regexp
    that has group names.

    e.g.:
    exp = "(?P<name1>[a-z]+)"
    compiledexp = re.compile(exp)

    Now, I have a dictionary: mytable = {"a": "myname"}

    Is there a way, in the re module or elsewhere, to match the contents of
    the dictionary against the regular expression (checking that they
    match its rules) and then return the substituted string?

    e.g.
    >>> re.SomeNewFunc(compiledexp, mytable)
    "myname"
    >>> mytable = {"a": "1"}
    >>> re.SomeNewFunc(compiledexp, mytable)
    ERROR



    Thanks
    A
    Amit Gupta, Feb 20, 2008

  2. Gary Herron

    Amit Gupta wrote:
    > Hi
    >
    > I wonder if Python has a function to pack things back into a regexp
    > that has group names.
    >
    > e.g.:
    > exp = "(?P<name1>[a-z]+)"
    > compiledexp = re.compile(exp)
    >
    > Now, I have a dictionary: mytable = {"a": "myname"}
    >
    > Is there a way, in the re module or elsewhere, to match the contents of
    > the dictionary against the regular expression (checking that they
    > match its rules) and then return the substituted string?
    >

    I'm not following what you're asking for until I get to the last two
    words. The re module does have functions to do string substitution.
    One or more occurrences of a pattern matched by an re can be replaced
    with a given string. See sub and subn. Perhaps you can make one of
    those do whatever it is you are trying to do.
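A minimal sketch of the two functions Gary mentions, written for Python 3 (the thread predates it). `sub` replaces every match of a compiled pattern; `subn` does the same and also returns the number of replacements. The sample text is made up for illustration. Note that this rewrites text that already *matches* the pattern, which is not quite the reverse operation being asked for:

```python
import re

# The named-group pattern used throughout the thread
pattern = re.compile(r"CL(?P<name1>[a-z]+)")
text = "xx CLfoo yy CLbar"  # made-up sample input

# sub(): replace every match with a fixed string
replaced = pattern.sub("CLiamgood", text)
print(replaced)  # xx CLiamgood yy CLiamgood

# subn(): same, but also report how many replacements were made;
# \g<name1> in the template refers back to what the named group captured
wrapped, count = pattern.subn(r"CL<\g<name1>>", text)
print(wrapped, count)  # xx CL<foo> yy CL<bar> 2
```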

    Gary Herron

    > e.g.
    > >>> re.SomeNewFunc(compiledexp, mytable)
    > "myname"
    > >>> mytable = {"a": "1"}
    > >>> re.SomeNewFunc(compiledexp, mytable)
    > ERROR
    >
    > Thanks
    > A
    >
    Gary Herron, Feb 20, 2008

  3. Tim Chase

    > mytable = {"a": "myname"}
    > >>> re.SomeNewFunc(compiledexp, mytable)
    > "myname"

    How does SomeNewFunc know to pull "a" as opposed to any other key?

    > >>> mytable = {"a": "1"}
    > >>> re.SomeNewFunc(compiledexp, mytable)
    > ERROR


    You could do something like one of the following 3 functions:

    import re

    ERROR = 'ERROR'

    def some_new_func(table, regex):
        "Return processed results for values matching regex"
        result = {}
        for k, v in table.items():
            m = regex.match(v)
            if m:
                result[k] = m.group(1)
            else:
                result[k] = ERROR
        return result

    def some_new_func2(table, regex, key):
        "Get value (if matches regex) or ERROR based on key"
        m = regex.match(table[key])
        if m:
            return m.group(0)
        return ERROR

    def some_new_func3(table, regex):
        "Sniff the desired key from the regexp (inefficient)"
        for k, v in table.items():
            m = regex.match(v)
            if m:
                # The regex has a single named group: take its (name, value)
                groupname, match = next(iter(m.groupdict().items()))
                if groupname == k:
                    return match
        return ERROR

    if __name__ == "__main__":
        NAME = 'name1'
        mytable = {
            'a': 'myname',
            'b': '1',
            NAME: 'foo',
        }
        regexp = '(?P<%s>[a-z]+)' % NAME
        print('Using regex:')
        print(regexp)
        print('=' * 10)

        r = re.compile(regexp)
        results = some_new_func(mytable, r)
        print('a: ', results['a'])
        print('b: ', results['b'])
        print('=' * 10)
        print('a: ', some_new_func2(mytable, r, 'a'))
        print('b: ', some_new_func2(mytable, r, 'b'))
        print('=' * 10)
        print('%s: %s' % (NAME, some_new_func3(mytable, r)))

    Function #2 is the best choice for single lookups, whereas function #1
    is best if you plan to repeatedly extract keys from one set of
    processed results (the function only gets called once). Function #3
    is just ugly, and generally indicates that you need to change your
    tactic ;)

    -tkc
    Tim Chase, Feb 20, 2008
  4. Amit Gupta

    Before I read the message: I screwed up.

    Let me write it again.

    >>> x = re.compile("CL(?P<name1>[a-z]+)")

    # The group name "name1" is attached to the match of a lowercase
    # alphabetic string.
    # Now I have a dictionary: {"name1": "iamgood"}
    # I would like a function that takes x and my dictionary and returns
    # "CLiamgood".
    # If my dictionary instead has {"name1": "123"}, it should give an
    # error while processing it.
    #
    # In general, I have a regular expression where every non-trivial match
    # has a group name. I want to do the reverse of a regexp match: the
    # function takes the regexp and replaces the group matches with values
    # from the dictionary.
    # I hope this makes it clear.
    Amit Gupta, Feb 20, 2008
  5. Steven D'Aprano

    On Wed, 20 Feb 2008 11:36:20 -0800, Amit Gupta wrote:

    > Before I read the message: I screwed up.
    >
    > Let me write it again.
    >
    > >>> x = re.compile("CL(?P<name1>[a-z]+)")
    >
    > # The group name "name1" is attached to the match of a lowercase
    > # alphabetic string.
    > # Now I have a dictionary: {"name1": "iamgood"}
    > # I would like a function that takes x and my dictionary and returns
    > # "CLiamgood".
    > # If my dictionary instead has {"name1": "123"}, it should give an
    > # error while processing it.
    > #
    > # In general, I have a regular expression where every non-trivial match
    > # has a group name. I want to do the reverse of a regexp match: the
    > # function takes the regexp and replaces the group matches with values
    > # from the dictionary.
    > # I hope this makes it clear.



    Clear as mud. But I'm going to take a guess.

    Are you trying to validate the data against the regular expression as
    well as substitute values? That means your function needs to do something
    like this:

    (1) Take the regular expression object, and extract the string it was
    made from. That way at least you know the regular expression was valid.

    x = re.compile("CL(?P<name1>[a-z]+)") # validate the regex
    x.pattern

    => "CL(?P<name1>[a-z]+)"
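In Python, `pattern` is a plain attribute of a compiled regex, not a method. A compiled regex also carries `groupindex`, a mapping from each group name to its group number, which is a cheap way to see which dictionary keys a pattern expects. A short check (Python 3):

```python
import re

x = re.compile("CL(?P<name1>[a-z]+)")  # compiling validates the regex

# .pattern holds the original source string
source = x.pattern
print(source)  # CL(?P<name1>[a-z]+)

# .groupindex maps group names to group numbers
names = dict(x.groupindex)
print(names)  # {'name1': 1}
```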


    (2) Split the string into sets of three pieces:

    split("CL(?P<name1>[a-z]+)") # you need to write this function

    => ("CL", "(?P<name1>", "[a-z]+)")
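A rough sketch of the splitter for the easy case only, assuming (hypothetically) that named groups are not nested and contain no parentheses of their own. `split_named` is a made-up name, and unlike the three-string split above it returns one `(prefix, name, body)` triple per named group, plus whatever literal text trails the last group:

```python
import re

# Matches "(?P<name>body)" where body contains no parentheses.
# Assumption: no nested or escaped parens inside a named group.
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>([^()]*)\)")


def split_named(pattern):
    """Split a regex source string into (prefix, name, body) triples.

    Returns (triples, tail), where tail is the text after the last group.
    """
    triples = []
    pos = 0
    for m in NAMED_GROUP.finditer(pattern):
        prefix = pattern[pos:m.start()]
        triples.append((prefix, m.group(1), m.group(2)))
        pos = m.end()
    return triples, pattern[pos:]


print(split_named("CL(?P<name1>[a-z]+)"))
# ([('CL', 'name1', '[a-z]+')], '')
```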


    (3) Mangle the first two pieces:

    mangle("CL", "(?P<name1>") # you need to write this function

    => "CL%(name1)s"
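Step (3) could look like the sketch below. It assumes the prefix is plain literal text (regex escapes such as `\.` are not unescaped here), and the hypothetical `mangle` takes the bare group name rather than the `(?P<name1>` piece. The one subtlety is that a literal `%` in the prefix must be doubled so that the later `%` formatting step leaves it alone:

```python
def mangle(prefix, name):
    """Turn literal prefix text plus a group name into a %-format template."""
    # Escape any literal '%' so that '%' formatting does not consume it.
    return prefix.replace("%", "%%") + "%(" + name + ")s"


template = mangle("CL", "name1")
print(template)                          # CL%(name1)s
print(template % {"name1": "iamgood"})   # CLiamgood
print(mangle("50%", "n") % {"n": "x"})   # 50%x
```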

    (4) Validate the value in the dictionary:

    d = {"name1": "123"}
    validate("[a-z]+)", d)

    => raise exception

    d = {"name1": "iamgood"}
    validate("[a-z]+)", d)

    => return True
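One detail matters for step (4): `re.match` only anchors at the start of the string, so a value such as "iamgoodZ" would pass a `[a-z]+` check. `re.fullmatch` (Python 3.4+) requires the whole value to match. A sketch, where the signature `validate(body, name, values)` is an assumption (it takes the group body without the trailing `)` plus the group name, and raises `ValueError`):

```python
import re


def validate(body, name, values):
    """Raise ValueError unless values[name] matches the regex body exactly."""
    if name not in values:
        raise ValueError("no value supplied for group %r" % name)
    value = values[name]
    if re.fullmatch(body, value) is None:
        raise ValueError("value %r does not match %r" % (value, body))
    return True


print(validate("[a-z]+", "name1", {"name1": "iamgood"}))  # True
print(re.match("[a-z]+", "iamgoodZ") is not None)        # True: prefix match only
try:
    validate("[a-z]+", "name1", {"name1": "123"})
except ValueError as exc:
    print(exc)  # value '123' does not match '[a-z]+'
```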


    (5) If the validation step succeeded, then do the replacement:

    "CL%(name1)s" % d

    => "CLiamgood"
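Putting steps (2) to (5) together for the non-nested case gives something like the sketch below. It is an illustration under stated assumptions, not Steven's code: `fill_pattern` is a hypothetical name, it only handles named groups without inner parentheses, and it treats all text outside the named groups as literal. Instead of building a `%` template it substitutes directly with `re.sub` and a callback, then checks that the result matches the original regex, which catches patterns whose literal parts contain regex syntax:

```python
import re

# A named group whose body contains no parentheses (no nesting).
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>([^()]*)\)")


def fill_pattern(regex, values):
    """Replace each (?P<name>body) in regex.pattern with values[name].

    Each value must fully match its group's body, else ValueError.
    """
    def replace(m):
        name, body = m.group(1), m.group(2)
        if name not in values:
            raise ValueError("no value for group %r" % name)
        value = values[name]
        if re.fullmatch(body, value) is None:
            raise ValueError("value %r does not match %r" % (value, body))
        return value  # a callback's return value is inserted literally

    filled = NAMED_GROUP.sub(replace, regex.pattern)
    # Sanity check: the filled-in string should match the original regex.
    if regex.fullmatch(filled) is None:
        raise ValueError("result %r does not match the pattern" % filled)
    return filled


x = re.compile("CL(?P<name1>[a-z]+)")
print(fill_pattern(x, {"name1": "iamgood"}))  # CLiamgood
try:
    fill_pattern(x, {"name1": "123"})
except ValueError as exc:
    print(exc)  # value '123' does not match '[a-z]+'
```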


    Step (2), the splitter, will be the hardest because you essentially need
    to parse the regular expression. You will need to decide how to handle
    regexes with multiple "bits", including *nested* expressions, e.g.:

    "CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"
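The hard part here is finding where each named group ends when groups nest. One way to sketch it is a single left-to-right scan with a stack of open parentheses. This assumes a valid pattern and handles only the common syntax: backslash escapes and `[...]` character classes are skipped so their parentheses are not counted. `outer_named_groups` is a made-up name, and it reports only the outermost named groups; text outside them (such as `XY(?:AB)[aeiou]+` above) still has no single dictionary value to fill in:

```python
def outer_named_groups(pattern):
    """Return (start, end, name, body) for each outermost (?P<name>...) group.

    Scans once, tracking parenthesis depth; escaped characters and
    character classes ([...]) are skipped so their parens are ignored.
    """
    results = []
    stack = []  # start index of each currently open '('
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the escaped character
            continue
        if c == "[":
            # Skip a character class; a ']' right after '[' or '[^' is literal.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            i = j + 1
            continue
        if c == "(":
            stack.append(i)
        elif c == ")":
            start = stack.pop()
            # Only report named groups that are not inside another group
            if not stack and pattern.startswith("(?P<", start):
                close = pattern.index(">", start)
                name = pattern[start + 4:close]
                results.append((start, i + 1, name, pattern[close + 1:i]))
        i += 1
    return results


pat = r"CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"
for start, end, name, body in outer_named_groups(pat):
    print(name, body)
# name1 [a-z]+
# name2 CD(?P<name3>..)\?EF
```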


    Good luck.


    --
    Steven
    Steven D'Aprano, Feb 21, 2008
  6. MRAB

    On Feb 20, 7:36 pm, Amit Gupta wrote:
    > Before I read the message: I screwed up.
    >
    > Let me write it again.
    >
    > >>> x = re.compile("CL(?P<name1>[a-z]+)")
    >
    > # The group name "name1" is attached to the match of a lowercase
    > # alphabetic string.
    > # Now I have a dictionary: {"name1": "iamgood"}
    > # I would like a function that takes x and my dictionary and returns
    > # "CLiamgood".
    > # If my dictionary instead has {"name1": "123"}, it should give an
    > # error while processing it.
    > #
    > # In general, I have a regular expression where every non-trivial match
    > # has a group name. I want to do the reverse of a regexp match: the
    > # function takes the regexp and replaces the group matches with values
    > # from the dictionary.
    > # I hope this makes it clear.


    If you want the string that matched the regex then you can use
    group(0) (or just group()):

    >>> x = re.compile("CL(?P<name1>[a-z]+)")
    >>> m = x.search("something CLiamgood!something else")
    >>> m.group()
    'CLiamgood'
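Going the other way, `groupdict()` returns the named groups of a match as a dictionary, which is exactly the shape of dictionary that the thread wants to feed back in. A short Python 3 illustration reusing MRAB's example:

```python
import re

x = re.compile("CL(?P<name1>[a-z]+)")
m = x.search("something CLiamgood!something else")

whole = m.group()       # the entire match, same as m.group(0)
named = m.groupdict()   # named groups only, keyed by group name
print(whole)  # CLiamgood
print(named)  # {'name1': 'iamgood'}
```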
    MRAB, Feb 21, 2008
  7. Paul McGuire

    On Feb 20, 6:29 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
    > On Wed, 20 Feb 2008 11:36:20 -0800, Amit Gupta wrote:
    > > Before I read the message: I screwed up.
    > >
    > > Let me write it again.
    > >
    > > >>> x = re.compile("CL(?P<name1>[a-z]+)")
    > >
    > > # The group name "name1" is attached to the match of a lowercase
    > > # alphabetic string.
    > > # Now I have a dictionary: {"name1": "iamgood"}
    > > # I would like a function that takes x and my dictionary and returns
    > > # "CLiamgood".
    > > # If my dictionary instead has {"name1": "123"}, it should give an
    > > # error while processing it.
    > > #
    > > # In general, I have a regular expression where every non-trivial match
    > > # has a group name. I want to do the reverse of a regexp match: the
    > > # function takes the regexp and replaces the group matches with values
    > > # from the dictionary.
    > > # I hope this makes it clear.
    >
    <snip>
    >
    > Good luck.
    >
    > --
    > Steven


    Oh, pshaw! Try this pyparsing ditty.

    -- Paul
    http://pyparsing.wikispaces.com



    from pyparsing import *
    import re

    # replace patterns of (?P<name>xxx) with dict
    # values iff value matches 'xxx' as re

    LPAR, RPAR, LT, GT = map(Suppress, "()<>")
    nameFlag = Suppress("?P")
    rechars = printables.replace(")", "").replace("(", "") + " "
    regex = Forward()("fld_re")
    namedField = (nameFlag +
                  LT + Word(alphas, alphanums + "_")("fld_name") + GT +
                  regex)
    regex << Combine(OneOrMore(Word(rechars) |
                               r"\(" | r"\)" |
                               nestedExpr(LPAR, RPAR, namedField | regex,
                                          ignoreExpr=None)))

    def fillRE(reString, nameDict):
        def fieldPA(tokens):
            fieldRE = tokens.fld_re
            fieldName = tokens.fld_name
            if fieldName not in nameDict:
                raise ParseFatalException(
                    "name '%s' not defined in name dict" % (fieldName,))
            fieldTranslation = nameDict[fieldName]
            if re.match(fieldRE, fieldTranslation):
                return fieldTranslation
            else:
                raise ParseFatalException(
                    "value '%s' does not match re '%s'" %
                    (fieldTranslation, fieldRE))
        namedField.setParseAction(fieldPA)
        try:
            return (LPAR + namedField + RPAR).transformString(reString)
        except ParseBaseException as pe:
            return pe.msg

    # tests start here
    testRE = r"CL(?P<name1>[a-z]+)"

    # a simple test
    test1 = {"name1": "iamgood"}
    print(fillRE(testRE, test1))

    # harder test, including nested names (have to be careful in
    # constructing the names dict)
    testRE = (r"CL(?P<name1>[a-z]+)XY(?P<name4>(?:AB)[aeiou]+)"
              r"(?P<name2>CD(?P<name3>..)\?EF)")
    test3 = {"name1": "iamgoodZ",
             "name2": "CD@@?EF",
             "name3": "@@",
             "name4": "ABeieio",
             }
    print(fillRE(testRE, test3))

    # test a non-conforming field
    test2 = {"name1": "123"}
    print(fillRE(testRE, test2))


    Prints:

    CLiamgood
    CLiamgoodZXYABeieioCD@@?EF
    value '123' does not match re '[a-z]+'
    Paul McGuire, Feb 21, 2008
  8. Amit Gupta


    > "CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"
    >
    > Good luck.
    >
    > --
    > Steven


    This is what I did in the end (in principle). Thanks.

    A
    Amit Gupta, Feb 24, 2008
