Help with regex search-and-replace (Perl to Python)

Schif Schaf · Feb 7, 2010

Hi,

I've got some text that looks like this:

Lorem [ipsum] dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut [labore] et [dolore] magna aliqua.

and I want to make it look like this:

Lorem {ipsum} dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut {labore} et {dolore} magna aliqua.

(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

Thanks.
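
For readers following along, here is a minimal sketch of what the attempt above appears to do wrong. It uses one line of the sample text; the variable names `whole`, `inner` and `attempt` are only for illustration. The replacement string is assembled from the first match before `sub()` runs, and `group(0)` is the whole match including the brackets.

```python
import re

withbracks = re.compile(r'\[(.+?)\]')
line = "incididunt ut [labore] et [dolore] magna aliqua."

mat = withbracks.search(line)     # search() stops at the FIRST match only
whole = mat.group(0)              # '[labore]' : entire match, brackets included
inner = mat.group(1)              # 'labore'   : just the captured text

# The replacement string is built once, before sub() runs, so every
# match on the line is replaced by the same text.
attempt = withbracks.sub('{' + whole + '}', line)
print(attempt)   # incididunt ut {[labore]} et {[labore]} magna aliqua.
```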

Dennis Lee Bieber · Feb 7, 2010

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

Step one -- delete all your knowledge of regular expressions...

Step two -- study the reference documents on methods which apply to
strings...

>>> intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut [labore] et [dolore] magna aliqua."""
>>> outtext = "{".join("}".join(intext.split("]")).split("["))
>>> outtext
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut {labore} et {dolore} magna aliqua.'

No doubt there is some even better method using .translate(), but
setting up the definition takes some time...

From the help file:
"""
translate( s, table[, deletechars])

Delete all characters from s that are in deletechars (if present), and
then translate the characters using table, which must be a 256-character
string giving the translation for each character value, indexed by its
ordinal.
"""
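
The help text quoted above describes the Python 2 `string.translate()` function with its 256-character table. As a rough sketch of the same idea, assuming Python 3 instead, `str.maketrans()` can build the table from two equal-length strings. The shortened sample sentence is only for illustration.

```python
# Assumes Python 3, where str.maketrans() builds the table for str.translate().
intext = "Lorem [ipsum] dolor sit amet, ut [labore] et [dolore] magna aliqua."

# Two equal-length strings: each character in the first maps to the
# character at the same position in the second.
table = str.maketrans("[]", "{}")
outtext = intext.translate(table)
print(outtext)
```

The table is just a dict from code point to code point, so setting it up takes one line here.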

Alf P. Steinbach · Feb 7, 2010

* Schif Schaf:

Hi,

I've got some text that looks like this:

Lorem [ipsum] dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut [labore] et [dolore] magna aliqua.

and I want to make it look like this:

Lorem {ipsum} dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut {labore} et {dolore} magna aliqua.

(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

I haven't used regexps in Python before, but what I did was (1) look in the
documentation, (2) check that it worked.

<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Python's equivalent of the Perl snippet seems to be the same number of lines,
and more clear.

Cheers & hth.,

- Alf

Schif Schaf · Feb 7, 2010

I haven't used regexps in Python before, but what I did was (1) look in the
documentation,

Hm. I checked in the repl, running `import re; help(re)` and the docs
on the `sub()` method didn't say anything about using back-refs in the
replacement string. Neat feature though.

(2) check that it worked.

<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.

Thanks.
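
On the "magic": as far as the `re` documentation describes it, `sub()` walks over each non-overlapping match in turn and expands the replacement template against that particular match, so `\1` means "group 1 of the match being replaced right now". The sketch below spells that out with a replacement function; `to_braces` is a hypothetical helper name, not something from the posts above.

```python
import re

withbracks = re.compile(r'\[(.+?)\]')
line = "incididunt ut [labore] et [dolore] magna aliqua."

def to_braces(match):
    # Called once per match; match.group(1) is the text between the brackets.
    return '{' + match.group(1) + '}'

via_template = withbracks.sub(r'{\1}', line)
via_function = withbracks.sub(to_braces, line)

# Expanding the template by hand against each match gives the same pieces.
expanded = [m.expand(r'{\1}') for m in withbracks.finditer(line)]

print(via_template)
print(expanded)
```

Both calls to `sub()` produce the same line, which suggests there is no hidden "most recent match" state on the compiled pattern: each match object carries its own groups.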

Dennis Lee Bieber · Feb 7, 2010

intext.replace('[', '{').replace(']', '}')

<heh> Looks like I need to take my own advice re: references to
string methods...

Tim Chase · Feb 7, 2010

Schif said:
I haven't used regexps in Python before, but what I did was (1) look in the
documentation, [snip]
<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.

I suspect Alf's rustiness with regexps caused him to miss the
simpler rendition of

print(withbracks.sub(r'{\1}', line))

And to answer those who are reaching for other non-regex (whether
string translations or .replace(), or pyparsing) solutions, it
depends on what you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.

-tkc
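
To make those differences concrete, here is a small sketch that runs the regex from this thread and the chained `.replace()` over the same pathological string. Applying the regex to the whole string at once, rather than line by line, is a choice made for this sketch only.

```python
import re

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""

withbracks = re.compile(r'\[(.+?)\]')
by_regex = withbracks.sub(r'{\1}', s)
by_replace = s.replace('[', '{').replace(']', '}')

# Show the three versions of every line side by side.
for before, rx, rp in zip(s.splitlines(),
                          by_regex.splitlines(),
                          by_replace.splitlines()):
    print(f"{before!r}\n  regex:   {rx!r}\n  replace: {rp!r}")
```

The regex leaves dangling and line-straddling brackets alone (`.` does not match a newline by default) and pairs the first `[` with the nearest `]`, so `[[nested]]` comes out as `{[nested}]`. The `.replace()` chain converts every bracket regardless of context.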

Steve Holden · Feb 7, 2010

@ Rocteur CC said:
Here is one simple solution :

>>> intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut [labore] et [dolore] magna aliqua."""
>>> intext.replace('[', '{').replace(']', '}')
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut {labore} et {dolore} magna aliqua.'

/Some people, when confronted with a problem, think "I know, I’ll use
regular expressions." Now they have two problems./ -- Jamie Zawinski
<http://jwz.livejournal.com> in comp.lang.emacs.

That is because regular expressions are what we learned in programming
the shell from sed to awk and ksh and zsh and of course Perl and we've
read the two books by Jeffrey and much much more!!!

How do we rethink and relearn how we do things and should we ?

What is the solution ?

A rigorous focus on programming simplicity.

regards
Steve

Anssi Saari · Feb 7, 2010

Schif Schaf said:
(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

Just curious, but since this is just a character-for-character
translation, why not simply use tr/[]/{}/? I.e. why use a regular
expression at all for this?

In Python you would do this with

for line in text:
    print(line.replace('[', '{').replace(']', '}'))

Steve Holden · Feb 7, 2010

Tim said:
Schif said:

I haven't used regexps in Python before, but what I did was (1) look in the
documentation, [snip]
<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.

I suspect Alf's rustiness with regexps caused him to miss the simpler
rendition of

print(withbracks.sub(r'{\1}', line))

And to answer those who are reaching for other non-regex (whether string
translations or .replace(), or pyparsing) solutions, it depends on what
you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.

Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

regards
Steve

Tim Chase · Feb 7, 2010

Steve said:
Tim said:

And to answer those who are reaching for other non-regex (whether string
translations or .replace(), or pyparsing) solutions, it depends on what
you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.

Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

Failure is only defined in the clarified context of what the OP
wants.

Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise). But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.

-tkc
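
If "some form of parity" is read as "only convert brackets that pair up", neither the regex nor `.replace()` does quite that. Below is one possible sketch of that reading, using a stack of open-bracket positions. The function name `brace_balanced` and the choice to let pairs span line breaks are assumptions made for illustration, not anything the OP asked for.

```python
def brace_balanced(text):
    """Replace only brackets that pair up; leave dangling ones alone."""
    chars = list(text)
    open_positions = []               # indexes of '[' still waiting for a ']'
    for i, ch in enumerate(text):
        if ch == '[':
            open_positions.append(i)
        elif ch == ']' and open_positions:
            j = open_positions.pop()  # most recent unmatched '['
            chars[j], chars[i] = '{', '}'
    return ''.join(chars)

sample = "Dangling closing] and [[nested]] and dangling [opening"
print(brace_balanced(sample))
```

A closing bracket with nothing open before it, and an opening bracket that is never closed, both pass through unchanged.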

Schif Schaf · Feb 8, 2010

Tim said:
Steve said:

Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

Failure is only defined in the clarified context of what the OP
wants. Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise). But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.

I need to do the usual "pipe text through and do various search/
replace" thing fairly often. The above case of having to replace
brackets with braces is only one example. Simple string methods run
out of steam pretty quickly and much of my work relies on using
regular expressions. Yes, I try to keep focused on simplicity, and
often regexes are the simplest solution for my day-to-day needs.
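
As a rough sketch of that "pipe text through" workflow in Python: keep the substitutions as a list of (pattern, replacement) pairs and apply them to each line in order. The names `RULES` and `filter_lines`, and the second rule, are hypothetical; only the bracket rule comes from this thread.

```python
import re

# Hypothetical rule set: (compiled pattern, replacement template) pairs,
# applied in order. Only the first rule comes from the thread.
RULES = [
    (re.compile(r'\[(.+?)\]'), r'{\1}'),   # [word] -> {word}
    (re.compile(r' {2,}'), ' '),           # collapse runs of spaces (illustrative)
]

def filter_lines(lines, rules=RULES):
    """Yield each line after applying every rule in sequence."""
    for line in lines:
        for pattern, replacement in rules:
            line = pattern.sub(replacement, line)
        yield line

sample = ["Lorem [ipsum]  dolor\n", "ut [labore] et   [dolore]\n"]
result = list(filter_lines(sample))
print(result)
```

In a script, `sys.stdout.writelines(filter_lines(sys.stdin))` would play roughly the role of the Perl `for (<>)` loop.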

Anthra Norell · Feb 8, 2010

Schif said:
Steve Holden wrote:

Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

Failure is only defined in the clarified context of what the OP
wants. Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise). But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.

I need to do the usual "pipe text through and do various search/
replace" thing fairly often. The above case of having to replace
brackets with braces is only one example. Simple string methods run
out of steam pretty quickly and much of my work relies on using
regular expressions. Yes, I try to keep focused on simplicity, and
often regexes are the simplest solution for my day-to-day needs.

Could you post a complex case? It's a kindness to your helpers to
simplify your case, but if the simplification doesn't cover the full
scope of your problem you can't expect the suggestions to cover it.

Frederic
