Regular Expressions in Python

fossil_blue · Mar 1, 2004

Dear Gurus,

I am trying to find out how to write an effective regular expression
in Python for the following scenario:

"any number of leading spaces at the beginning of a line" "followed
by a string" "there may be a string that starts with *"

for example:

END *This is a comment

but I don't want to match this:

END e * This is a line with an error (e)

thanks,
Noel

Jeff Epler · Mar 1, 2004

import re

opt_spaces = " *"
identifier = "[A-Za-z_][A-Za-z0-9_]+"
comment = r"\*.*"  # a literal '*' followed by the rest of the line
opt_comment = "(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in (
        " END *This is a comment",
        " END e * This is a line with an error (e)"):
    print(test, pat.match(test))

Jeff
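
To see why the second line is rejected, it can help to print the assembled pattern and try it on both lines. A minimal sketch, assuming Python 3; the helper name matches() is made up for illustration and is not part of the post above:

```python
import re

opt_spaces = " *"
identifier = "[A-Za-z_][A-Za-z0-9_]+"
comment = r"\*.*"
opt_comment = "(%s)?" % comment

# The pieces are plain strings, so the final pattern is just their concatenation.
pattern = opt_spaces + identifier + opt_spaces + opt_comment + "$"
pat = re.compile(pattern)


def matches(line):
    """Hypothetical helper: True if the line fits the pattern from its start."""
    return pat.match(line) is not None


print(pattern)
print(matches(" END *This is a comment"))
print(matches(" END e * This is a line with an error (e)"))
```

After the identifier and optional spaces, the only things the pattern allows are a comment starting with * or the end of the line, so the stray "e" makes the match fail.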

Paul McGuire · Mar 1, 2004

Assuming you're more interested in the identifier than in the comment,
change identifier to "([A-Za-z_][A-Za-z0-9_]+)" so that the keyword is
captured and shows up in the tuple returned by pat.match(test).groups().

-- Paul
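
As a variation on the grouping suggestion, named groups let the keyword be fetched by name instead of by position. This is only a sketch, assuming Python 3; the group names "keyword" and "comment" are illustrative choices, not something from the posts above:

```python
import re

# Same pattern as in the thread, written as one raw string with named groups.
pat = re.compile(r" *(?P<keyword>[A-Za-z_][A-Za-z0-9_]+) *(?P<comment>\*.*)?$")

m = pat.match(" END *This is a comment")
print(m.groups())          # positional access still works
print(m.group("keyword"))  # access by name
print(m.group("comment"))
```

When the line has no comment, the comment group is None rather than an empty string.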

Paul McGuire · Mar 1, 2004

Here's an example with sample code using both re's and pyparsing. Note that
the single .ignore() call takes care of ignoring comments on all contained
grammar constructs, and non-significant whitespace is implicitly ignored (so
no need to litter your matching expressions with lots of opt_spaces-type
content).

-- Paul
========================
from pyparsing import (Word, alphas, alphanums, restOfLine, LineEnd,
                       ParseException)

testdata = """
END *This is a comment
 END*This is a comment (but the next line has no comment)
 END
 END e * This is a line with an error (e)"""
enquote = lambda st : ( '"%s"' % st )

print("test with pyparsing")
# A keyword followed by end of line; comments are skipped via ignore()
grammar = Word( alphas, alphanums ).setName("keyword") + LineEnd()
comment = "*" + restOfLine
grammar.ignore( comment )

for test in testdata.split("\n"):
    try:
        print(enquote(test), "\n->", end=" ")
        print(grammar.parseString( test ))
    except ParseException as pe:
        print(pe)

print()

import re
print("test with re")
opt_spaces = " *"
# identifier = "[A-Za-z_][A-Za-z0-9_]+" - I'm guessing this regexp should
# have ()'s for accessing content as a group
identifier = "([A-Za-z_][A-Za-z0-9_]+)"
comment = r"\*.*"
opt_comment = "(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in testdata.split("\n"):
    print(enquote(test), "\n->", end=" ")
    m = pat.match(test)  # match once and reuse the result
    if m:
        print(m.groups())
    else:
        print("Bad text")

========================
Gives this output:

test with pyparsing
""
-> Expected keyword (0), (1,1)
"END *This is a comment"
-> ['END']
" END*This is a comment (but the next line has no comment)"
-> ['END']
" END"
-> ['END']
" END e * This is a line with an error (e)"
-> Expected end of line (8), (1,9)

test with re
""
-> Bad text
"END *This is a comment"
-> ('END', '*This is a comment')
" END*This is a comment (but the next line has no comment)"
-> ('END', '*This is a comment (but the next line has no comment)')
" END"
-> ('END', None)
" END e * This is a line with an error (e)"
-> Bad text
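
One difference between the two approaches that the test data does not exercise: the regex identifier ends in +, so it needs at least two characters, while Word( alphas, alphanums ) also accepts a single-character keyword. A minimal sketch of the regex side, assuming Python 3; the variant ending in * is an assumption about what a one-character-friendly pattern could look like, not something taken from the posts:

```python
import re

tail = r" *(\*.*)?$"
# Identifier exactly as posted in the thread: first char plus one or more.
two_or_more = re.compile(r" *([A-Za-z_][A-Za-z0-9_]+)" + tail)
# Assumed variant: first char plus zero or more, so "X" alone is accepted.
one_or_more = re.compile(r" *([A-Za-z_][A-Za-z0-9_]*)" + tail)

for line in (" X *short keyword", " END *This is a comment"):
    print(repr(line), bool(two_or_more.match(line)), bool(one_or_more.match(line)))
```

Whether single-character keywords should be legal depends on the input format, so treat the variant as an option rather than a fix.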
