Negative look-behind

Bhargava

Hello,

I am a newbie to Python and need some help.

I am looking at doing a batch search/replace on some of my source
code. The goal is to find all literal strings and wrap them in
a macro, say MC. For example, var = "somestring" would become var =
MC("somestring"). Literal strings can contain escaped " and \.

But there are two cases where this replacement should not happen:
1. Literal strings that have already been wrapped, like
   MC("somestring").
2. Directives like #include "header.h" and #extern "C".

I tried to use a negative look-behind assertion for this purpose. The
expression I use for matching a literal string is
"((\\")|[^"(\\")])+". This works fine. But as soon as I start prepending
look-behind patterns, things go wrong. My question is whether
the pattern in a negative look-behind can contain alternation. In
other words, can I write a regexp that says "match pattern x
only if it is not preceded by any one of pattern a, pattern b, or
pattern c"?

I tried the following expression to take the two constraints
mentioned above into account:

(?<![(#include )(#extern )(MC\()])"((\\")|[^"(\\")])+"

Can someone point out the mistakes in it?

Thanks,
Bhargava
 
Josh Gilbert

Hi.

It would have been nice if you had simplified your example. For instance,
since you said that your base pattern matches properly, you could have
replaced it with a literal. But no matter.

I think that your problem is that you're trying to use grouping in a
character class (set). [(1 )(2 )] matches any single one of the characters
'1', '2', ' ', '(' and ')'. My proof:
>>> re.sub('[(1 )(2 )]','a','1 2 ( ) ')
'aaaaaaaa'

So you should just need to ditch the '[' and ']'.

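I think what you meant by the set was alternation, i.e.:
(#include |#extern |MC\()
so that any one of the three prefixes counts. One catch: Python's re
module only accepts fixed-width patterns inside a look-behind, and these
three prefixes have different lengths, so each one needs its own assertion:
(?<!#include )(?<!#extern )(?<!MC\()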
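
A minimal sketch of the look-behind approach, assuming Python's re module, whose documentation requires a look-behind to match a fixed width; that is why each forbidden prefix gets its own assertion here. The names STRING, GUARDED and wrap are made up for illustration, and STRING is a deliberately simplified stand-in for the real string-literal pattern (no escape handling, single line only).

```python
import re

# Simplified stand-in for a string literal: no escape handling, one line only.
STRING = r'"[^"\n]*"'

# A single look-behind whose alternatives differ in width is normally
# rejected by re; the attempt is wrapped so the script keeps running either way.
try:
    re.compile(r'(?<!#include |#extern |MC\()' + STRING)
except re.error as exc:
    print("rejected:", exc)

# One fixed-width negative look-behind per forbidden prefix.
GUARDED = re.compile(r'(?<!#include )(?<!#extern )(?<!MC\()' + STRING)


def wrap(source):
    """Wrap each matched literal in MC(...)."""
    return GUARDED.sub(lambda m: "MC(" + m.group(0) + ")", source)


print(wrap('var = "somestring";'))
print(wrap('#include "header.h"'))
print(wrap('x = MC("done");'))
```

One limitation worth knowing: a look-behind only inspects the text before a candidate quote. On a line such as MC("a") + "b", the closing quote of "a" can be taken for an opening quote, so this sketch is only dependable when a line holds at most one string literal.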

This is not a Python-specific question; it is just plain regular
expressions. In the future you may wish to consult a good reference site
such as http://www.regular-expressions.info/
or the O'Reilly book http://www.oreilly.com/catalog/regex/.
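
The character-class point also applies to the base pattern from the question: [^"(\\")] excludes parentheses as well as quotes and backslashes, so a literal such as "y (z)" is not matched. Here is a sketch of one alternative, not necessarily what the poster ended up using: match the forms that must be left alone first, and wrap only what is matched as a bare literal. STRING, PATTERN and wrap_literals are hypothetical names, and the sketch assumes no string-like text inside comments or character constants.

```python
import re

# The pattern from the question, kept only for comparison.
ORIGINAL = r'"((\\")|[^"(\\")])+"'

# A double-quoted literal: any escaped character, or anything except
# a quote, a backslash, or a newline.
STRING = r'"(?:\\.|[^"\\\n])*"'

# Forms to leave alone are matched first, so their quotes are consumed
# and cannot be mistaken for the start of a new literal.
PATTERN = re.compile(
    r'(?P<skip>#(?:include|extern)[ \t]*' + STRING
    + r'|\bMC\([ \t]*' + STRING + r')'
    + r'|(?P<literal>' + STRING + r')'
)


def wrap_literals(source):
    """Wrap every unguarded string literal in MC(...)."""
    def replace(match):
        literal = match.group("literal")
        if literal is None:          # a skipped form: return it unchanged
            return match.group(0)
        return "MC(" + literal + ")"
    return PATTERN.sub(replace, source)


sample = '#include "header.h"\na = MC("x") + "y (z)";\nb = "say \\"hi\\"";\n'
print(re.search(ORIGINAL, '"y (z)"'))
print(wrap_literals(sample))
```

Because every literal is consumed by one branch or the other, several literals on one line are handled without relying on look-behind at all.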

Josh Gilbert.
 
Paul McGuire

Please check out the latest beta release of pyparsing, at
http://pyparsing.sourceforge.net . Your post inspired me to add the
transformString() method to pyparsing; look at the included scanExamples.py
program for some search-and-replace examples similar to the ones you give in
your post.

Sincerely,
-- Paul McGuire
 
Bhargava

Hi,

I downloaded version 1.2beta3 from SourceForge, but could not find the
scanExamples.py program. I will go through the documentation and examples
provided and give it a try.

Thanks,
Bhargava
 
Paul McGuire

Well, I think I messed up the 'setup.py sdist' step. Here is
scanExamples.py - it works through some simple scan/transform passes on some
hokey sample C code.

-- Paul

-------------------------------------------
#
# scanExamples.py
#
# Illustration of using pyparsing's scanString and transformString methods
#
# Copyright (c) 2004, Paul McGuire
#
from pyparsing import (Word, alphas, alphanums, Literal, restOfLine,
                       OneOrMore, Empty)

# simulate some C++ code
testData = """
#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;

A::assignA( a );
A2::A1::printA( a );

CORBA::initORB("xyzzy", USERNAME, PASSWORD );

"""

#################
print("Example of an extractor")
print("----------------------")

# simple grammar to match #define's
ident = Word(alphas, alphanums+"_")
macroDef = (Literal("#define") + ident.setResultsName("name") + "=" +
            restOfLine.setResultsName("value"))
for t,s,e in macroDef.scanString( testData ):
    print(t.name, ":", t.value)

# or a quick way to make a dictionary of the names and values (need to
# suppress output of all tokens, other than the name and the value)
macroDef = (Literal("#define").suppress() + ident +
            Literal("=").suppress() + Empty() + restOfLine)
macros = dict([t for t,s,e in macroDef.scanString(testData)])
print("macros =", macros)
print()


#################
print("Examples of a transformer")
print("----------------------")

# convert C++ namespaces to mangled C-compatible names
scopedIdent = ident + OneOrMore( Literal("::").suppress() + ident )
scopedIdent.setParseAction(lambda s,l,t: "_".join(t))

print("(replace namespace-scoped names with C-compatible names)")
print(scopedIdent.transformString( testData ))


# or a crude pre-processor (use parse actions to replace matching text)
def substituteMacro(s,l,t):
    # returning nothing leaves identifiers that are not macros unchanged
    if t[0] in macros:
        return macros[t[0]]
ident.setParseAction( substituteMacro )
ident.ignore(macroDef)

print("(simulate #define pre-processor)")
print(ident.transformString( testData ))



#################
print("Example of a stripper")
print("----------------------")

from pyparsing import dblQuotedString, LineStart

# remove all string macro definitions (after extracting to a string resource
# table?)
ident.setParseAction( None )
stringMacroDef = (Literal("#define") + ident + "=" + dblQuotedString +
                  LineStart())
stringMacroDef.setParseAction( lambda s,l,t: [] )

print(stringMacroDef.transformString( testData ))
 
