Help with regex search-and-replace (Perl to Python)

Schif Schaf

Hi,

I've got some text that looks like this:


Lorem [ipsum] dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut [labore] et [dolore] magna aliqua.

and I want to make it look like this:


Lorem {ipsum} dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut {labore} et {dolore} magna aliqua.

(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

Thanks.
 
Dennis Lee Bieber

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

Step one -- delete all your knowledge of regular expressions...

Step two -- study the reference documents on methods which apply to
strings...

>>> intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut [labore] et [dolore] magna aliqua."""
>>> outtext = "{".join("}".join(intext.split("]")).split("["))
>>> outtext
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut {labore} et {dolore} magna aliqua.'

No doubt there is some even better method using .translate(), but
setting up the definition takes some time...
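
For anyone reading this on Python 3: setting up the table is short there, since `str.maketrans` builds it from two equal-length strings. This is only a sketch under that assumption (the help text quoted below describes the older 256-character table form), and the function name `brackets_to_braces` is made up for illustration.

```python
# Build a translation table that maps each bracket to the matching brace.
# Assumes Python 3, where str.maketrans and str.translate work this way.
BRACKETS_TO_BRACES = str.maketrans("[]", "{}")


def brackets_to_braces(text):
    """Replace every '[' with '{' and every ']' with '}'."""
    return text.translate(BRACKETS_TO_BRACES)


intext = "incididunt ut [labore] et [dolore] magna aliqua."
print(brackets_to_braces(intext))
```

Like the split/join version, this changes every bracket it sees, whether or not it has a partner.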

From the help file:
"""
translate( s, table[, deletechars])

Delete all characters from s that are in deletechars (if present), and
then translate the characters using table, which must be a 256-character
string giving the translation for each character value, indexed by its
ordinal.
"""
 
Alf P. Steinbach

* Schif Schaf:
Hi,

I've got some text that looks like this:


Lorem [ipsum] dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut [labore] et [dolore] magna aliqua.

and I want to make it look like this:


Lorem {ipsum} dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut {labore} et {dolore} magna aliqua.

(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

but am not able to figure out how to do it with Python. I start out
trying something like:

~~~~
import re, sys
withbracks = re.compile(r'\[(.+?)\]')
for line in sys.stdin:
    mat = withbracks.search(line)
    if mat:
        # Well, this line has at least one.
        # Should be able to use withbracks.sub()
        # and mat.group() maybe ... ?
        line = withbracks.sub('{' + mat.group(0) + '}', line)
        # No, that's not working right.

    sys.stdout.write(line)
~~~~

but then am not sure where to go with that.

How would you do it?

I haven't used regexps in Python before, but what I did was (1) look in the
documentation, (2) check that it worked.


<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>


Python's equivalent of the Perl snippet seems to be the same number of lines,
and more clear. :)
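
Since the original Perl reads its input as a filter, here is a rough sketch of the same idea working on a stream of lines. The function name `filter_stream` is invented for this example, and the demonstration uses an in-memory stream so it can be run as-is; a real filter would presumably pass `sys.stdin` and `sys.stdout` instead.

```python
import io
import re

# Same pattern as above: a '[', a lazy run of characters, then a ']'.
WITHBRACKS = re.compile(r'\[(.+?)\]')


def filter_stream(instream, outstream):
    """Copy instream to outstream, rewriting [text] as {text} on each line."""
    for line in instream:
        outstream.write(WITHBRACKS.sub(r'{\1}', line))


# Demonstration on in-memory streams; for a command-line filter one would
# call filter_stream(sys.stdin, sys.stdout) after importing sys.
demo_in = io.StringIO("Lorem [ipsum] dolor\nut [labore] et [dolore] magna\n")
demo_out = io.StringIO()
filter_stream(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```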


Cheers & hth.,

- Alf
 
Schif Schaf

I haven't used regexps in Python before, but what I did was (1) look in the
documentation,

Hm. I checked in the repl, running `import re; help(re)` and the docs
on the `sub()` method didn't say anything about using back-refs in the
replacement string. Neat feature though.
(2) check that it worked.

<code>
import re

text = (
     "Lorem [ipsum] dolor sit amet, consectetur",
     "adipisicing elit, sed do eiusmod tempor",
     "incididunt ut [labore] et [dolore] magna aliqua."
     )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
     print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.
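
For later readers, the apparent magic is less mysterious than it looks. As far as the `re` documentation describes it, `sub()` walks over the non-overlapping matches in turn and expands a backreference such as `\1` against each individual match, so nothing is remembered between calls. The sketch below compares that with my first attempt, which built one fixed replacement string from the first match (brackets included) and pasted it over every match. The helper name `brace` is made up for illustration.

```python
import re

withbracks = re.compile(r'\[(.+?)\]')
line = "incididunt ut [labore] et [dolore] magna aliqua."

# The first attempt: one fixed replacement string built from the first match.
mat = withbracks.search(line)
broken = withbracks.sub('{' + mat.group(0) + '}', line)

# Backreference: \1 is expanded separately for each match.
with_backref = withbracks.sub(r'{\1}', line)

# \g<1> is a more explicit spelling of the same backreference.
with_g_syntax = withbracks.sub(r'{\g<1>}', line)


# A callable replacement receives the match object for each match in turn.
def brace(match):  # hypothetical helper name
    return '{' + match.group(1) + '}'


with_callable = withbracks.sub(brace, line)

print(broken)        # every match replaced by the same text
print(with_backref)  # each match replaced by its own group 1
```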

Thanks.
 
Tim Chase

Schif said:
I haven't used regexps in Python before, but what I did was (1) look in the
documentation, [snip]
<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.

I suspect Alf's rustiness with regexps caused him to miss the
simpler rendition of

print(withbracks.sub(r'{\1}', line))

And to answer those who are reaching for other non-regex (whether
string translations or .replace(), or pyparsing) solutions, it
depends on what you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.
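
To make that concrete, here is a small sketch that runs both approaches over the string above. It assumes the pattern from earlier in the thread and the default regex flags, so `.` does not match a newline.

```python
import re

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""

withbracks = re.compile(r'\[(.+?)\]')

# Regex: rewrites a '[' only when a ']' follows on the same line, pairing it
# with the nearest such ']' (the match is lazy and '.' stops at a newline).
regex_out = withbracks.sub(r'{\1}', s)

# Plain replacement: rewrites every bracket, paired or not.
replace_out = s.replace('[', '{').replace(']', '}')

print(regex_out)
print(replace_out)
```

The regex version leaves the dangling and line-straddling brackets alone and pairs nested ones somewhat oddly, while the replace version converts everything.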

-tkc
 
Steve Holden

@ Rocteur CC said:
Here is one simple solution :
intext = """Lorem [ipsum] dolor sit amet, consectetur adipisicing
elit, sed do eiusmod tempor incididunt ut [labore] et [dolore] magna
aliqua."""
intext.replace('[', '{').replace(']', '}')
'Lorem {ipsum} dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut {labore} et {dolore} magna aliqua.'

/Some people, when confronted with a problem, think "I know, I’ll use
regular expressions." Now they have two problems./ — Jamie Zawinski
<http://jwz.livejournal.com> in comp.lang.emacs.

That is because regular expressions are what we learned when programming
the shell, from sed to awk, ksh, zsh, and of course Perl, and we've
read the two books by Jeffrey and much, much more!

How do we rethink and relearn how we do things, and should we?

What is the solution?

A rigorous focus on programming simplicity.

regards
Steve
 
Anssi Saari

Schif Schaf said:
(brackets replaced by braces). I can do that with Perl pretty easily:

~~~~
for (<>) {
    s/\[(.+?)\]/\{$1\}/g;
    print;
}
~~~~

Just curious, but since this is just a transliteration, why not simply
use tr/[]/{}/? I.e., why use a regular expression at all for this?

In Python you would do this with

for line in text:
    print(line.replace('[', '{').replace(']', '}'))
 
Steve Holden

Tim said:
Schif said:
I haven't used regexps in Python before, but what I did was (1) look
in the
documentation, [snip]
<code>
import re

text = (
    "Lorem [ipsum] dolor sit amet, consectetur",
    "adipisicing elit, sed do eiusmod tempor",
    "incididunt ut [labore] et [dolore] magna aliqua."
    )

withbracks = re.compile( r'\[(.+?)\]' )
for line in text:
    print( re.sub( withbracks, r'{\1}', line) )
</code>

Seems like there's magic happening here. There's the `withbracks`
regex that applies itself to `line`. But then when `re.sub()` does the
replacement operation, it appears to consult the `withbracks` regex on
the most recent match it just had.

I suspect Alf's rustiness with regexps caused him to miss the simpler
rendition of

print(withbracks.sub(r'{\1}', line))

And to answer those who are reaching for other non-regex (whether string
translations or .replace(), or pyparsing) solutions, it depends on what
you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.
Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

regards
Steve
 
Tim Chase

Steve said:
Tim said:
And to answer those who are reaching for other non-regex (whether string
translations or .replace(), or pyparsing) solutions, it depends on what
you want to happen in pathological cases like

s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""
where you'll begin to see the differences.
Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

Failure is only defined in the clarified context of what the OP
wants :) Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise). But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.
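
If the parity reading is what is wanted, neither the regex nor plain replace quite does it. One possible sketch is a small stack-based pass that converts only brackets having a partner. This is an assumption about what "parity" should mean (here pairs may span lines), and the function name `brace_matched_brackets` is invented for illustration.

```python
def brace_matched_brackets(text):
    """Convert only those '[' and ']' that pair up; leave strays untouched."""
    chars = list(text)
    open_positions = []  # stack of indices of '[' still waiting for a ']'
    for i, ch in enumerate(text):
        if ch == '[':
            open_positions.append(i)
        elif ch == ']' and open_positions:
            # Pair this ']' with the most recent unmatched '['.
            chars[open_positions.pop()] = '{'
            chars[i] = '}'
    return ''.join(chars)


s = """Dangling closing]
with properly [[nested]] and
complex [properly [nested] text]
and [improperly [nested] text
and with some text [straddling
lines] and with
dangling opening [brackets
"""

print(brace_matched_brackets(s))
```

With this definition the dangling closing bracket, the unclosed "[improperly" and the final "[brackets" stay as they are, while the nested and line-straddling pairs become braces.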

-tkc
 
Schif Schaf

Steve said:
Really? Under what circumstances does a simple one-for-one character
replacement operation fail?

Failure is only defined in the clarified context of what the OP
wants :)  Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise).  But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.

I need to do the usual "pipe text through and do various search/
replace" thing fairly often. The above case of having to replace
brackets with braces is only one example. Simple string methods run
out of steam pretty quickly and much of my work relies on using
regular expressions. Yes, I try to keep focused on simplicity, and
often regexes are the simplest solution for my day-to-day needs.
 
Anthra Norell

Schif said:
Steve Holden wrote:

Really? Under what circumstances does a simple one-for-one character
replacement operation fail?
Failure is only defined in the clarified context of what the OP
wants :) Replacement operations only fail if the OP's desired
output from the above mess doesn't change *all* of the ]/[
characters, but only those with some form of parity (nested or
otherwise). But if the OP *does* want all of the ]/[ characters
replaced regardless of contextual nature, then yes, replace is a
much better solution than regexps.

I need to do the usual "pipe text through and do various search/
replace" thing fairly often. The above case of having to replace
brackets with braces is only one example. Simple string methods run
out of steam pretty quickly and much of my work relies on using
regular expressions. Yes, I try to keep focused on simplicity, and
often regexes are the simplest solution for my day-to-day needs.
Could you post a complex case? It's a kindness to your helpers to
simplify your case, but if the simplification doesn't cover the full
scope of your problem you can't expect the suggestions to cover it.

Frederic
 
