Negative look-behind

Bhargava · Jun 1, 2004

Hello,

I am a newbie to Python and need some help.

I am looking at doing some batch search/replace for some of my source
code. The criterion is to find all literal strings and wrap them in
some macro, say MC. For example, var = "somestring" would become var =
MC("somestring"). Literal strings can contain escaped " and \.

But there are two cases where this replacement should not happen:
1. Literal strings which have already been wrapped, like
   MC("somestring").
2. Directives like #include "header.h" and #extern "C".

I tried to use a negative look-behind assertion for this purpose. The
expression I use for matching a literal string is
"((\\")|[^"(\\")])+". This works fine. But as I start prepending
look-behind patterns, things go wrong. The question I have is whether
the pattern in the negative look-behind part can contain alternation. In
other words, can I make up a regexp which says "match this pattern x
only if it is not preceded by any one of pattern a, pattern b, or
pattern c"?

I tried the following expression to take into account the two
constraints mentioned above:

(?<![(#include )(#extern )(MC\()])"((\\")|[^"(\\")])+"

Can someone point out the mistakes in this?

Thanks,
Bhargava

Josh Gilbert · Jun 1, 2004

Hi.

It would have been nice if you had simplified your example. Since you said
that your base pattern matched properly, you could (for example) have let
that be a literal. But no matter.

I think that your problem is that you're trying to use grouping in a
character class (set). [(1 )(2 )] matches any one of '1', '2', ' ', '('
and ')'. My proof:

>>> import re
>>> re.sub('[(1 )(2 )]','a','1 2 ( ) ')
'aaaaaaaa'

So you should just need to ditch the '[' and ']'.

I think what you meant by the set was question marks, i.e.:
(#include )?(#extern )?(MC\()?
So each prefix may occur, though none has to.
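
One caveat before putting either form inside a look-behind: Python's re module only accepts fixed-width look-behind patterns, so optional groups, or alternatives of different lengths, are rejected when the pattern is compiled. A minimal sketch of one workaround follows, chaining one negative look-behind per prefix. The names STRING, GUARDED and wrap_line are made up for illustration, and the string-literal pattern is a common simplified form rather than the one from the original post.

```python
import re

# A string literal: opening quote, then either an escaped character or
# anything that is not a quote or backslash, then the closing quote.
STRING = r'"(?:\\.|[^"\\])*"'

# Alternatives of different lengths inside a single look-behind are
# rejected by Python's re module at compile time.
try:
    re.compile(r'(?<!#include |#extern |MC\()' + STRING)
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# One fixed-width negative look-behind per excluded prefix.
# All three assertions must hold at the opening quote.
GUARDED = re.compile(r'(?<!#include )(?<!#extern )(?<!MC\()' + STRING)


def wrap_line(line):
    """Wrap an unguarded string literal on a single line in MC(...)."""
    return GUARDED.sub(lambda m: 'MC(' + m.group(0) + ')', line)


print(wrap_line('var = "somestring";'))
print(wrap_line('#include "header.h"'))
print(wrap_line('x = MC("somestring");'))
```

This only behaves on lines that hold a single literal. The look-behind inspects the text before a quote and nothing else, so the closing quote of a skipped literal can be mistaken for an opening quote when another literal follows, as in MC("a") + "b".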

This is not a Python-specific question; it is just plain regular
expressions. You may wish to consult a good reference site such as
http://www.regular-expressions.info/
or the O'Reilly book http://www.oreilly.com/catalog/regex/ in the future.
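
Staying with plain regular expressions, here is a sketch of an approach that avoids look-behind altogether: match the protected forms first so that they consume their own string literal, and decide in a replacement callback whether to wrap. The names PATTERN, _wrap and wrap_strings are hypothetical, the prefixes assume exactly one space after #include and #extern, and comments or character literals that contain quotes are not handled.

```python
import re

# A string literal: opening quote, escaped characters or non-quote,
# non-backslash characters, closing quote.
STRING = r'"(?:\\.|[^"\\])*"'

# The "keep" alternative is tried first, so a protected prefix together
# with its literal is consumed as one match and left untouched.
PATTERN = re.compile(
    r'(?P<keep>(?:#include |#extern |MC\()' + STRING + r')'
    r'|(?P<lit>' + STRING + r')'
)


def _wrap(match):
    """Return protected text unchanged; wrap bare literals in MC(...)."""
    if match.group('keep') is not None:
        return match.group('keep')
    return 'MC(' + match.group('lit') + ')'


def wrap_strings(source):
    """Wrap every unprotected string literal in source with MC(...)."""
    return PATTERN.sub(_wrap, source)


sample = '#include "header.h"\nx = MC("a") + "b";\ny = "c \\"d\\"";\n'
print(wrap_strings(sample))
```

Because nothing sits inside a look-behind here, the prefixes could be loosened to variable-width forms such as optional whitespace if the source code needs it.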

Josh Gilbert.

Paul McGuire · Jun 5, 2004

Please check out the latest beta release of pyparsing, at
http://pyparsing.sourceforge.net . Your post inspired me to add the
transformString() method to pyparsing; look at the included scanExamples.py
program for some search-and-replace examples similar to the ones you give in
your post.

Sincerely,
-- Paul McGuire

Bhargava · Jun 7, 2004

Hi,

I downloaded version 1.2beta3 from sourceforge, but could not find the
scanExamples.py program. I will go through the documentation/examples
provided and try.

Thanks,
Bhargava

Paul McGuire · Jun 7, 2004

Well, I think I messed up the 'setup.py sdist' step. Here is
scanExamples.py - it works through some simple scan/transform passes on some
hokey sample C code.

-- Paul

-------------------------------------------
#
# scanExamples.py
#
# Illustration of using pyparsing's scanString and transformString methods
#
# Copyright (c) 2004, Paul McGuire
#
from pyparsing import Word, alphas, alphanums, Literal, restOfLine, \
    OneOrMore, Empty

# simulate some C++ code
testData = """
#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;

A::assignA( a );
A2::A1::printA( a );

CORBA::initORB("xyzzy", USERNAME, PASSWORD );

"""

#################
print "Example of an extractor"
print "----------------------"

# simple grammar to match #define's
ident = Word(alphas, alphanums+"_")
macroDef = ( Literal("#define") + ident.setResultsName("name") + "=" +
             restOfLine.setResultsName("value") )
for t,s,e in macroDef.scanString( testData ):
    print t.name,":", t.value

# or a quick way to make a dictionary of the names and values (need to
# suppress output of all tokens, other than the name and the value)
macroDef = ( Literal("#define").suppress() + ident + Literal("=").suppress() +
             Empty() + restOfLine )
macros = dict([t for t,s,e in macroDef.scanString(testData)])
print "macros =", macros
print

#################
print "Examples of a transformer"
print "----------------------"

# convert C++ namespaces to mangled C-compatible names
scopedIdent = ident + OneOrMore( Literal("::").suppress() + ident )
scopedIdent.setParseAction(lambda s,l,t: "_".join(t))

print "(replace namespace-scoped names with C-compatible names)"
print scopedIdent.transformString( testData )

# or a crude pre-processor (use parse actions to replace matching text)
def substituteMacro(s,l,t):
    if t[0] in macros:
        return macros[t[0]]
ident.setParseAction( substituteMacro )
ident.ignore(macroDef)

print "(simulate #define pre-processor)"
print ident.transformString( testData )

#################
print "Example of a stripper"
print "----------------------"

from pyparsing import dblQuotedString, LineStart

# remove all string macro definitions (after extracting to a string resource
# table?)
ident.setParseAction( None )
stringMacroDef = ( Literal("#define") + ident + "=" + dblQuotedString +
                   LineStart() )
stringMacroDef.setParseAction( lambda s,l,t: [] )

print stringMacroDef.transformString( testData )
