Regular Expressions in Python

fossil_blue

Dear Gurus,

I am trying to find out how to write an effective regular expression
in Python for the following scenario:

"any number of leading spaces at the beginning of a line" "followed
by a string" "there may be a string that starts with *"

for example:

END *This is a comment

but I don't want to match this:

END e * This is a line with an error (e)

thanks,
Noel
 
Jeff Epler

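import re

opt_spaces = r" *"
# Note: the trailing + means an identifier needs at least two characters.
identifier = r"[A-Za-z_][A-Za-z0-9_]+"
comment = r"\*.*"
opt_comment = r"(%s)?" % comment

# The trailing "$" rejects anything left over after the optional comment.
pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in (
        " END *This is a comment",
        " END e * This is a line with an error (e)"):
    print(test, pat.match(test))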

Jeff
 
Paul McGuire

Assuming you're more interested in the identifier than in the comment,
change identifier to "([A-Za-z_][A-Za-z0-9_]+)" so that the keyword is
captured as a group and shows up in the tuple returned by
pat.match(test).groups().
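
To make the effect of the extra parentheses concrete, here is a minimal sketch in Python 3 syntax. The variable names keyword, trailing, no_comment and rejected are only illustrative, and the test strings are adapted from the ones used earlier in the thread.

```python
import re

# Same building blocks as above; the only change is the parentheses
# around the identifier, which turn it into capturing group 1.
opt_spaces = r" *"
identifier = r"([A-Za-z_][A-Za-z0-9_]+)"
comment = r"\*.*"
opt_comment = r"(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

m = pat.match("  END *This is a comment")
keyword = m.group(1)    # the identifier
trailing = m.group(2)   # the optional comment, or None if it is absent
print(keyword, trailing)

# A line without a comment still matches; group 2 is simply None.
no_comment = pat.match("  END")
print(no_comment.groups())

# The stray "e" cannot be consumed before "$", so there is no match at all.
rejected = pat.match("  END e * This is a line with an error (e)")
print(rejected)
```

Because opt_comment is itself wrapped in parentheses, the comment is available as the second group whenever it is present.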

-- Paul
 
Paul McGuire

Here's an example with sample code using both the re module and pyparsing.
Note that the single .ignore() call takes care of skipping comments in every
contained grammar construct, and that pyparsing skips non-significant
whitespace implicitly, so there is no need to litter your matching
expressions with lots of opt_spaces-type content.

-- Paul
========================
from pyparsing import (Word, alphas, alphanums, restOfLine, LineEnd,
                       ParseException)

testdata = """
END *This is a comment
END*This is a comment (but the next line has no comment)
END
END e * This is a line with an error (e)"""
enquote = lambda st: '"%s"' % st

print("test with pyparsing")
# A keyword followed by end of line; whitespace between tokens is skipped.
grammar = Word(alphas, alphanums).setName("keyword") + LineEnd()
comment = "*" + restOfLine
# Ignore comments wherever they appear inside the grammar.
grammar.ignore(comment)

for test in testdata.split("\n"):
    try:
        print(enquote(test), "\n->", end=" ")
        print(grammar.parseString(test))
    except ParseException as pe:
        print(pe)

print()

import re
print("test with re")
opt_spaces = r" *"
# identifier = r"[A-Za-z_][A-Za-z0-9_]+" - I'm guessing this regexp should
# have ()'s for accessing content as a group
identifier = r"([A-Za-z_][A-Za-z0-9_]+)"
comment = r"\*.*"
opt_comment = r"(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in testdata.split("\n"):
    print(enquote(test), "\n->", end=" ")
    m = pat.match(test)  # match once and reuse the result
    if m:
        print(m.groups())
    else:
        print("Bad text")

========================
Gives this output:

test with pyparsing
""
-> Expected keyword (0), (1,1)
"END *This is a comment"
-> ['END']
" END*This is a comment (but the next line has no comment)"
-> ['END']
" END"
-> ['END']
" END e * This is a line with an error (e)"
-> Expected end of line (8), (1,9)

test with re
""
-> Bad text
"END *This is a comment"
-> ('END', '*This is a comment')
" END*This is a comment (but the next line has no comment)"
-> ('END', '*This is a comment (but the next line has no comment)')
" END"
-> ('END', None)
" END e * This is a line with an error (e)"
-> Bad text
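
If the re route is the one you keep, the same pattern can be wrapped in a small helper. The following is only a sketch in Python 3 syntax: LINE_RE, parse_line, lines and results are made-up names, and the named groups and re.VERBOSE layout are a stylistic choice rather than anything the thread requires.

```python
import re

# Hypothetical helper built from the same pieces as the pattern above.
# With re.VERBOSE, whitespace in the pattern is ignored, so a literal
# space has to be written as [ ].
LINE_RE = re.compile(
    r"""
    [ ]*                                  # optional leading spaces
    (?P<keyword>[A-Za-z_][A-Za-z0-9_]+)   # keyword, two or more characters
    [ ]*                                  # optional spaces after the keyword
    (?P<comment>\*.*)?                    # optional comment starting with an asterisk
    $                                     # nothing else may follow
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Return (keyword, comment) for a valid line, or None for bad text."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("keyword"), m.group("comment")


lines = [
    "",
    "END *This is a comment",
    "  END",
    "END e * This is a line with an error (e)",
]
results = [parse_line(line) for line in lines]
for line, result in zip(lines, results):
    print(repr(line), "->", result)
```

Returning None for a rejected line keeps the caller's check as simple as the "Bad text" branch in the loop above.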
 
