regexp substitution - a lot of work!

Lukas Holcik

Hi Python crazies!:))
There is a problem to be solved: I have some text and I have to parse
it using a lot of regular expressions. In (lin)u(ni)x I could write in
bash:

cat file | sed 's/../../' | sed 's/../../' .. .. .. > parsed_file

I am writing a parser in Python, and what I have to do is:

regexp = re.compile(..)
result = regexp.search(data)
while result:
    data = data[:result.start()] + .. result.group(..) + \
           data[result.end():]
    result = regexp.search(data)

... for one regexp substitution

instead of just: s/../../
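To see what that loop does, here is a runnable version with a made-up pattern, `(\d+)px`, and a hand-spelled replacement; the function name `substitute_by_hand` and the sample text are hypothetical. Its output matches a single `regexp.sub` call, which is the Python counterpart of `s/../../`.

```python
import re

def substitute_by_hand(regexp, data):
    """The search-and-splice loop from the question, with a concrete replacement."""
    result = regexp.search(data)
    while result:
        # Rebuild the string around the match; '\1 pixels' is spelled out by hand here.
        data = data[:result.start()] + result.group(1) + ' pixels' + data[result.end():]
        # Rescan from the beginning of the modified string.
        result = regexp.search(data)
    return data

regexp = re.compile(r'(\d+)px')  # hypothetical pattern, for illustration only
text = '10px and 20px'

print(substitute_by_hand(regexp, text))  # 10 pixels and 20 pixels
print(regexp.sub(r'\1 pixels', text))    # 10 pixels and 20 pixels
```

One difference worth knowing: the hand-written loop restarts its search at the start of the string after every splice, so a replacement that itself matches the pattern would loop forever. `sub` makes a single left-to-right pass and never re-examines text it has already replaced.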

That is quite a lot of work! Do you know of a better, easier way?
Thanks in advance,

---------------------------------------_.)--
| Lukas Holcik ([email protected]) (\=)*
----------------------------------------''--
 
Eddie Corns

Lukas Holcik said:
> Hi Python crazies!:))
> There is a problem to be solved. I have a text and I have to parse
> it using a lot of regular expressions. In (lin)u(ni)x I could write in
> bash:
> cat file | sed 's/../../' | sed 's/../../' .. .. .. > parsed_file

In Unix you would actually do:

$ sed 's/pat1/rep1/; s/pat2/rep2/; ...' <infile >outfile

to do the replacements in one pass. (you will know now anyway :)
> I write a parser in python and what I must do is:
> regexp = re.compile(..)
> result = regexp.search(data)
> while result:
>     data = data[:result.start()] + .. result.group(..) + \
>            data[result.end():]
>     result = regexp.search(data)
> ... for one regexp substitution
> instead of just: s/../../

> That is quite a lot of work! Don't you know some better and easier way?
> Thanks in advance,

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330
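The linked recipe is, if memory serves, about replacing several strings in a single pass. A minimal sketch of that technique (not the recipe's own code), assuming the things to replace are fixed strings held in a dictionary; `multiple_replace` and the sample data are illustrative names.

```python
import re

def multiple_replace(replacements, text):
    """Replace every key of `replacements` in `text` in one left-to-right scan."""
    if not replacements:
        return text
    # One alternation built from all keys, longest first, so that a key which is a
    # prefix of another (e.g. 'a' vs 'ab') cannot shadow the longer one.
    keys = sorted(replacements, key=len, reverse=True)
    regex = re.compile('|'.join(map(re.escape, keys)))
    # The callback maps whichever key matched to its replacement.
    return regex.sub(lambda match: replacements[match.group(0)], text)

swaps = {'cat': 'dog', 'dog': 'cat'}
print(multiple_replace(swaps, 'cat chases dog'))  # dog chases cat
```

Chaining two separate substitutions (cat to dog, then dog to cat) would instead give 'cat chases cat', because the second pass also rewrites the output of the first.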

Eddie
 
Tuure Laurinolli

Lukas Holcik wrote:


> That is quite a lot of work! Don't you know some better and easier way?
> Thanks in advance,

Why not use re.sub?
> regexp = re.compile(..)
> result = regexp.search(data)
> while result:
>     data = data[:result.start()] + .. result.group(..) + \
>            data[result.end():]
>     result = regexp.search(data)
>
> ... for one regexp substitution


regexp = re.compile(a_pattern)
result = regexp.sub(a_replacement, data)

Or use a callback if you need to modify the match:

def add_one(match):
    return match.group(1) + '1'

regexp = re.compile(a_pattern)
result = regexp.sub(add_one, data)
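A runnable version of the callback example, with `a_pattern` and `data` filled in with made-up values; the lookahead `(?=\.txt)` belongs to this illustrative pattern only. The callback receives a match object for each hit, and its return value replaces the matched text.

```python
import re

def add_one(match):
    # Called once per match; whatever it returns replaces the matched text.
    return match.group(1) + '1'

# Hypothetical pattern: capture "file" only when ".txt" follows it.
a_pattern = r'(file)(?=\.txt)'
data = 'file.txt and file.csv'

regexp = re.compile(a_pattern)
result = regexp.sub(add_one, data)
print(result)  # file1.txt and file.csv
```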
 
Lukas Holcik

Yes, sorry, I was in such a hurry that I didn't find it in the documentation.
But when I want to use a lot of very different expressions with a lot of
different groupings, which would be easy to write as s/(..)/x\1x/,
isn't it quite awkward to have to use re.sub()?

For example, these Perl-style expressions:
's/^<i>.*?</i>\s*//'
's/<b>\s*</b><br>.*//s'
's/^(?:<[^>]>)?\t(.*?)<br>/<p>\1</p>/'
's/\t+| {2,}/ /'
's/(<[^/][^>]*>)([^\n])/\1\n\1/'
's/(?<!\n)(</[^>]*>)/\n\1/'
's/ /\n/'
Wouldn't it be easier to call an external perl (or sed)? Thanks,
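The expressions above map almost one-for-one onto `re.sub`. A sketch, assuming each rule is applied once, in order, to the whole text held in one string (not line by line, as `perl -pe` or `sed` would). Perl's `s///` without `/g` replaces only the first match, hence `count=1`; the `/s` modifier corresponds to `re.DOTALL`. The names `RULES`, `COMPILED` and `apply_rules` are illustrative, and the sample inputs are made up. Because the patterns are ordinary Python strings, the `/` in `</i>` and `</b>` needs no escaping; in real Perl those slashes would end the pattern unless escaped or another delimiter were used.

```python
import re

# The posted substitutions as (pattern, replacement, flags) triples.
RULES = [
    (r'^<i>.*?</i>\s*', '', 0),
    (r'<b>\s*</b><br>.*', '', re.DOTALL),               # Perl's /s flag
    (r'^(?:<[^>]>)?\t(.*?)<br>', r'<p>\1</p>', 0),
    (r'\t+| {2,}', ' ', 0),
    (r'(<[^/][^>]*>)([^\n])', r'\1\n\1', 0),            # copied as posted
    (r'(?<!\n)(</[^>]*>)', r'\n\1', 0),
    (r' ', r'\n', 0),
]

# Compile once so the rule list can be reused on many documents.
COMPILED = [(re.compile(pattern, flags), repl) for pattern, repl, flags in RULES]

def apply_rules(text):
    """Apply every rule in order, like a script with several s/// lines."""
    for regex, repl in COMPILED:
        # count=1 mimics s/// without /g: only the first match is replaced.
        text = regex.sub(repl, text, count=1)
    return text

print(repr(apply_rules('<i>note</i> keep<b></b><br>junk')))  # 'keep'
```

Dropping `count=1` gives the `/g` behaviour. The fifth rule is kept exactly as posted (`\1\n\1`); as written it discards the character captured by group 2, so `\1\n\2` may have been what was meant.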

Duncan Booth

> Yes, sorry, I was in such a hurry I didn't find it in the
> documentation, but when I want to use a lot of very different
> expressions using a lot of different grouping, which would be easy to
> implement using s/(..)/x\1x/ then it is quite problematic having to
> use re.sub(), isn't it?

I don't understand your point. The Python equivalent is:

re.sub('(..)', r'x\1x', s)

or using a precompiled pattern:

pat.sub(r'x\1x', s)
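A few concrete calls showing how group references work in the replacement string; the sample strings are made up. The `r` prefix matters: in a plain string literal, `'\1'` is the control character chr(1), not a group reference.

```python
import re

s = 'abcdef'

# The one-liner from the reply: each pair of characters is wrapped in x...x.
print(re.sub('(..)', r'x\1x', s))  # xabxxcdxxefx

# The same with a precompiled pattern.
pat = re.compile('(..)')
print(pat.sub(r'x\1x', s))  # xabxxcdxxefx

# \g<1> is another spelling of \1; it is needed when a digit follows the
# reference, because r'\10' would be read as a reference to group 10.
print(re.sub(r'(\d+)', r'\g<1>0', 'price 5'))  # price 50

# Named groups can be referenced as \g<name>.
print(re.sub(r'(?P<key>\w+)=(?P<val>\w+)', r'\g<val>=\g<key>', 'a=b'))  # b=a
```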
 
Lukas Holcik

Is it really that easy? Now I see Python is simply the best:)))!

I just didn't know how to use groups other than through
MatchObject.group(..). You have already answered that, thanks!:)

Fredrik Lundh

Lukas Holcik quoted someone writing:

footnote: you can use a callback instead of the replacement pattern.
callbacks are often faster, and can lead to more readable code:

http://effbot.org/zone/re-sub.htm#callbacks

(as the other examples on that page show, you can do a lot of weird
stuff with re.sub callbacks...)
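A small sketch of the callback style mentioned above: the replacement depends on a dictionary lookup, which a fixed template like `r'x\1x'` cannot express. The lookup table, the `$word` placeholder syntax and the name `expand` are illustrative assumptions, not taken from the linked page.

```python
import re

# Hypothetical lookup table for the placeholders.
values = {'name': 'Lukas', 'lang': 'Python'}

def expand(match):
    # Replace $word with its value; leave unknown placeholders as they were.
    return values.get(match.group(1), match.group(0))

template = 'Hi $name, welcome to $lang ($other stays)'
result = re.sub(r'\$(\w+)', expand, template)
print(result)  # Hi Lukas, welcome to Python ($other stays)
```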

</F>
 
