Regexp Neg. set of chars HowTo?

durumdara · Dec 20, 2006

Hi!

I want to replace some sequences in an HTML document.
For example:
a-
b
should become: ab

but:
xxx -
b
must stay unchanged, because it is not a split word.

I want to search and replace with re, but I don't know how to negate this
set: ['\ \n\t'].

For now I use the full set without these chars, but a negated set would be
better and shorter.

OK, I can use [^\s], but I want to know how to negate a set of chars.
sNorm1= '([^[\ \t\n]]{1})\-\<br\ \/\>\n' - this is not working.

Thanks for the help:
dd

import re
import sys

sNorm1 = r'([%s]{1})\-\<br\ \/\>\n'
# Build the set of every byte value except space, CR, LF and tab.
c = list(range(0, 256))
c.remove(32)
c.remove(13)
c.remove(10)
c.remove(9)
# Write each value as a two-digit \xNN escape for the character class.
s = ["\\x%02x" % v for v in c]
sNorm1 = sNorm1 % ("".join(s))
print(sNorm1)

def Normalize(Text):
    rx = re.compile(sNorm1)
    def replacer(match):
        return match.group(1)
    return rx.sub(replacer, Text)

print(Normalize('a - \nb'))
print(Normalize('a- \nb'))
sys.exit()
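
A note on the pattern that "is not working": a negated set takes a single pair of brackets with ^ as the first character inside, so the set above would be written [^ \t\n]. In '[^[\ \t\n]]' the set ends at the first ], which leaves a literal ] that the text must then contain. The following is only a minimal sketch of the de-hyphenation with a negated set. It assumes the break is written exactly as "<br />" followed by a newline, as in the pattern above; the names DEHYPHENATE and normalize are made up for illustration.

```python
import re

# [^ \t\n] is a negated set: any one character except space, tab or newline.
# Assumption: the line break appears exactly as "<br />" plus "\n".
DEHYPHENATE = re.compile(r'([^ \t\n])-<br />\n')

def normalize(text):
    # Keep the character before the hyphen and drop "-<br />\n".
    return DEHYPHENATE.sub(r'\1', text)

joined = normalize('a-<br />\nb')         # hyphen directly after a letter
untouched = normalize('xxx -<br />\nb')   # space before the hyphen
print(repr(joined))
print(repr(untouched))
```

The second string comes back unchanged because the character before the hyphen is a space, which the negated set excludes.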

Paul McGuire · Dec 21, 2006

It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution, it uses pyparsing instead. But
I've added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags,Literal,Word,alphas,Suppress

brTag,brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace() # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
           + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the tag
collapse.setParseAction(lambda toks: toks[0]+toks[1])

print(collapse.transformString('a - \nb'))
print(collapse.transformString('a- \nb'))
print(collapse.transformString('a- \nb'))
print(collapse.transformString('a- \nb'))
print(collapse.transformString('a- \nb'))

durumdara · Dec 22, 2006

Hi!

Thanks for this! I'll use that!

I found a solution to my question the regexp way too:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I search for all text that is not lion and tion.

dd

Marc 'BlackJack' Rintsch · Dec 22, 2006

durumdara said:
I found a solution to my question the regexp way too:
import re
testtext = " minion battalion nation dion sion wion alion"
m = re.compile("[^t^l]ion")
print(m.findall(testtext))

I search for all text that is not lion and tion.

And ^ion. The first ^ in that character group "negates" that group, the
second is a literal ^, so I guess you meant "[^tl]ion".

Ciao,
Marc 'BlackJack' Rintsch
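
To make the difference between the two sets visible, here is a small sketch that runs both patterns over the test text from the thread. The trailing "^ion" is an addition for illustration and was not in the original test text; the variable names are likewise made up.

```python
import re

# Test text from the thread, plus "^ion" (added here for illustration only).
testtext = " minion battalion nation dion sion wion alion ^ion"

# "[^t^l]" excludes three characters: t, l and a literal ^.
with_extra_caret = re.findall("[^t^l]ion", testtext)

# "[^tl]" excludes only t and l, so "^ion" is matched as well.
without_extra_caret = re.findall("[^tl]ion", testtext)

print(with_extra_caret)     # ['nion', 'dion', 'sion', 'wion']
print(without_extra_caret)  # ['nion', 'dion', 'sion', 'wion', '^ion']
```

Only the first ^ after the opening bracket negates the set; any later ^ is an ordinary character.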
