F: How can I make re.sub() replace patterns across newlines

  • Thread starter Viktor Rosenfeld

Viktor Rosenfeld

Hi,

I want to strip a Java file of /* */ style comments. Unfortunately, the
simple regexp "\/\*.*\*\/" only works on comments that are on one line.
Is there a simple way to remove comments that span several lines with
Python regexps? I tried re.M to no avail.

Thanks,
Viktor
 
Karl Pflästerer

You must use re.S

,----[ Python lib reference ]
| `S'
|
| `DOTALL'
| Make the `.' special character match any character at all,
| including a newline; without this flag, `.' will match anything
| _except_ a newline.
`----
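To make the difference between the two flags concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only `re.sub`, `re.M` and `re.S` come from the standard library.

```python
import re

# Hypothetical sample: one block comment that spans two lines.
text = "a /* one\ntwo */ b"
pattern = r"/\*.*\*/"

# Without flags, "." stops at the newline, so the comment is never matched.
no_flag = re.sub(pattern, "", text)

# re.M only changes how "^" and "$" behave; "." still stops at newlines.
with_m = re.sub(pattern, "", text, flags=re.M)

# re.S (DOTALL) lets "." match the newline, so the whole comment is removed.
with_s = re.sub(pattern, "", text, flags=re.S)

print(repr(no_flag))
print(repr(with_m))
print(repr(with_s))
```

This is why re.M had no effect in the original attempt: the pattern contains no "^" or "$" anchors for it to act on.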


KP
 
Josiah Carlson

Viktor,

Supply the DOTALL flag during the regular expression compile as
described here: http://www.python.org/doc/current/lib/re-syntax.html

You will also want to make the regular expression non-greedy...the
reasons are quite evident.
... /* this is a
... multi-line comment */
...
... /* this is a single-line comment */
...
... /* this /* has multiple
... starts */
... """
#non-greedy matching
['/* this is a\nmulti-line comment */',
'/* this is a single-line comment */',
'/* this /* has multiple\nstarts */']

#greedy matching
['/* this is a\nmulti-line comment */\n\n/* this is a single-line comment */\n\n/* this /* has multiple\nstarts */']
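The session above lost its input lines, so here is one way it could be reproduced as a script. The variable names and the use of `re.findall` are assumptions, not necessarily what was originally typed; the sample text is the one shown in the post.

```python
import re

# Sample text taken from the post above.
s = """
/* this is a
multi-line comment */

/* this is a single-line comment */

/* this /* has multiple
starts */
"""

# Non-greedy: ".*?" stops at the first "*/" after each "/*".
non_greedy = re.findall(r"/\*.*?\*/", s, re.DOTALL)

# Greedy: ".*" runs on to the last "*/" in the whole string.
greedy = re.findall(r"/\*.*\*/", s, re.DOTALL)

print(len(non_greedy))  # three separate comments
print(len(greedy))      # one match swallowing everything in between
```

With the greedy pattern, any code sitting between the first and last comment of a file would be deleted along with the comments.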

- Josiah
 
Hans Nowak

Something like:

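import re
# Raw string keeps the backslashes intact; DOTALL lets "." match newlines.
pattern = re.compile(r"/\*.*?\*/", re.MULTILINE | re.DOTALL)
stripped_data = pattern.sub("", data)  # data holds the Java source text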

Note that I added a ? to the regex, so it won't be "greedy".
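For a version that can be run as-is, the same idea could be wrapped in a small helper. The function name `strip_block_comments` and the sample Java source are hypothetical, chosen only for illustration.

```python
import re

# Non-greedy match from "/*" to the nearest "*/"; DOTALL lets "." span lines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(source: str) -> str:
    """Return source with every /* ... */ block removed."""
    return BLOCK_COMMENT.sub("", source)


# Hypothetical Java snippet with a multi-line and an inline comment.
java_source = """/* File header
 * spanning lines */
class Hello {
    int x = 1; /* inline */
    int y = 2;
}
"""

cleaned = strip_block_comments(java_source)
print(cleaned)
```

One caveat with a plain regex: it knows nothing about Java syntax, so a "/*" that appears inside a string literal would be treated as the start of a comment too.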

HTH,
 
Viktor Rosenfeld

Thanks to all who were quick to answer; using re.DOTALL indeed solves the
problem. I was too tired to read the documentation correctly.

Ciao,
Viktor
 
