Regular Expressions in Python

Discussion in 'Python' started by fossil_blue, Mar 1, 2004.

  1. fossil_blue

    fossil_blue Guest

    Dear Gurus,

    I am trying to find out how to write an effective regular expression
    in Python for the following scenario:

    "any number of leading spaces at the beginning of a line", "followed
    by a string", "there may be a string that starts with *"

    for example:

    END *This is a comment

    but I don't want to match this:

    END e * This is a line with an error (e)

    thanks,
    Noel
    fossil_blue, Mar 1, 2004
    #1

  2. Jeff Epler

    Jeff Epler Guest

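    import re

    opt_spaces = r" *"
    identifier = r"[A-Za-z_][A-Za-z0-9_]+"
    comment = r"\*.*"  # a literal "*" followed by the rest of the line
    opt_comment = r"(%s)?" % comment

    # Optional spaces, an identifier, optional spaces, an optional comment.
    pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

    for test in (
            " END *This is a comment",
            " END e * This is a line with an error (e)"):
        # Prints a match object for the first line and None for the second.
        print(test, pat.match(test))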

    Jeff
    Jeff Epler, Mar 1, 2004
    #2

  3. Paul McGuire

    Paul McGuire Guest

    "Jeff Epler" <> wrote in message
    news:...
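    > import re
    >
    > opt_spaces = r" *"
    > identifier = r"[A-Za-z_][A-Za-z0-9_]+"
    > comment = r"\*.*"  # a literal "*" followed by the rest of the line
    > opt_comment = r"(%s)?" % comment
    >
    > # Optional spaces, an identifier, optional spaces, an optional comment.
    > pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")
    >
    > for test in (
    >         " END *This is a comment",
    >         " END e * This is a line with an error (e)"):
    >     # Prints a match object for the first line and None for the second.
    >     print(test, pat.match(test))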
    >
    > Jeff
    >

    Assuming you're more interested in the identifier than in the comment,
    change identifier to "([A-Za-z_][A-Za-z0-9_]+)" so that the keyword is
    captured as a group and is returned by pat.match(test).groups().
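
    A minimal sketch of that change, assuming Python 3 and the same pattern
    pieces as in the quoted post. The helper name keyword_of is hypothetical,
    added here only for illustration; it is not part of the original code.

```python
import re

# Same building blocks as in the quoted post; only `identifier` gains
# parentheses so that the keyword is captured as group 1.
opt_spaces = r" *"
identifier = r"([A-Za-z_][A-Za-z0-9_]+)"
comment = r"\*.*"
opt_comment = r"(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")


def keyword_of(line):
    """Return the captured keyword, or None if the line does not match.

    Hypothetical helper, named here for illustration only.
    """
    m = pat.match(line)
    return m.group(1) if m else None


for line in (" END *This is a comment",
             " END e * This is a line with an error (e)"):
    print(repr(line), "->", keyword_of(line))
```

    Because opt_comment is also parenthesized, groups() returns two items: the
    keyword, then the comment (or None when the line has no comment).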

    -- Paul
    Paul McGuire, Mar 1, 2004
    #3
  4. Paul McGuire

    Paul McGuire Guest

    "fossil_blue" <> wrote in message
    news:...
    > Dear Gurus,
    >
    > I am trying to find out how to write an effective regular expression
    > in Python for the following scenario:
    >
    > "any number of leading spaces at the beginning of a line", "followed
    > by a string", "there may be a string that starts with *"
    >
    > for example:
    >
    > END *This is a comment
    >
    > but I don't want to match this:
    >
    > END e * This is a line with an error (e)
    >
    > thanks,
    > Noel


    Here's an example with sample code using both re and pyparsing. Note that
    the single .ignore() call takes care of skipping comments in all contained
    grammar constructs, and non-significant whitespace is skipped implicitly (so
    there is no need to litter your matching expressions with opt_spaces-style
    content).

    -- Paul
    ========================
    from pyparsing import (Word, alphas, alphanums, restOfLine, LineEnd,
                           ParseException)

    testdata = """
    END *This is a comment
     END*This is a comment (but the next line has no comment)
     END
     END e * This is a line with an error (e)"""
    enquote = lambda st: '"%s"' % st

    print("test with pyparsing")
    grammar = Word(alphas, alphanums).setName("keyword") + LineEnd()
    comment = "*" + restOfLine
    grammar.ignore(comment)

    for test in testdata.split("\n"):
        try:
            print(enquote(test), "\n->", end=" ")
            print(grammar.parseString(test))
        except ParseException as pe:
            print(pe)

    print()

    import re
    print("test with re")
    opt_spaces = r" *"
    # identifier = "[A-Za-z_][A-Za-z0-9_]+" - I'm guessing this regexp should
    # have ()'s for accessing content as a group
    identifier = r"([A-Za-z_][A-Za-z0-9_]+)"
    comment = r"\*.*"
    opt_comment = r"(%s)?" % comment

    pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

    for test in testdata.split("\n"):
        print(enquote(test), "\n->", end=" ")
        match = pat.match(test)
        if match:
            print(match.groups())
        else:
            print("Bad text")

    ========================
    Gives this output:

    test with pyparsing
    ""
    -> Expected keyword (0), (1,1)
    "END *This is a comment"
    -> ['END']
    " END*This is a comment (but the next line has no comment)"
    -> ['END']
    " END"
    -> ['END']
    " END e * This is a line with an error (e)"
    -> Expected end of line (8), (1,9)

    test with re
    ""
    -> Bad text
    "END *This is a comment"
    -> ('END', '*This is a comment')
    " END*This is a comment (but the next line has no comment)"
    -> ('END', '*This is a comment (but the next line has no comment)')
    " END"
    -> ('END', None)
    " END e * This is a line with an error (e)"
    -> Bad text
    Paul McGuire, Mar 1, 2004
    #4