how to remove c++ comments from a cpp file?

Frank Potter · Jan 26, 2007

I only want to remove the comments which begin with "//".
I tried it like this, but it doesn't work.

r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
f=file.open("mycpp.cpp","r")
f=unicode(f,"utf8")
r.sub(ur"",f)

Will somebody show me the right way?
Thanks~~

Gary Herron · Jan 26, 2007

If you expect help with a problem, it would be nice if you told us what
the problem is. What error did you get?

But even without that I see lots of errors:

You must import re before you use it:
import re

Open a file with open(...) not file.open(...).

Once you open the file you must *read* the contents and operate on that:
data = f.read()

Then you ought to close the file:
f.close()

Now you can do your sub on the string in data -- but note, THIS WON'T
CHANGE data, but rather returns a new string which you must assign to
something:

new_data = r.sub(ur"", data)

Then do something with the new string.

Also I fear your regular expression is incorrect.

Cheers,
Gary Herron
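
A minimal sketch pulling the steps above together, assuming Python 3 (the thread itself uses Python 2). The function name read_and_strip and the pattern are assumptions for illustration, not something from the posts; the pattern simply matches from // to the end of the line and ignores string literals.

```python
import re

# Assumed pattern: '//' and everything up to (not including) the line break.
LINE_COMMENT = re.compile(r"//[^\r\n]*")

def read_and_strip(path):
    """Return the file's text with '//' comments removed (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:  # open(), not file.open()
        data = f.read()                           # read the contents first
    # The with-block has closed the file at this point.
    # sub() does not modify data; it returns a new string.
    new_data = LINE_COMMENT.sub("", data)
    return new_data
```

The caller still has to do something with the returned string, for example write it to a second file.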

Frank Potter · Jan 26, 2007

Thank you.
I'm very sorry, I was in a hurry when I posted this thread.
I'll post my code again here:

Code:

import re

f=open("show_btchina.user.js","r").read()
f=unicode(f,"utf8")

r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
f_new=r.sub(ur"",f)

open("modified.js","w").write(f_new.encode("utf8"))

And, the problem is, it seems that only the last comment is removed.
How can I remove all of the comments, please?

Gabriel Genellina · Jan 26, 2007

Note that it's not as easy as simply deleting from // to end of line,
because those characters might be inside a string literal. But if you
can afford the risk, this is a simple way without re:

f = open("show_btchina.user.js","r")
modf = open("modified.js","w")
for line in f:
    uline=unicode(line,"utf8")
    idx = uline.find("//")
    if idx==0:
        continue
    elif idx>0:
        uline = uline[:idx]+'\n'
    modf.write(uline.encode("utf8"))
modf.close()
f.close()

--
Gabriel Genellina
Softlab SRL
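
To make the risk mentioned above concrete, here is a small sketch (Python 3 assumed; naive_strip is a hypothetical name) that applies the same find("//") logic to a single line. A URL inside a string literal is enough to break it.

```python
def naive_strip(line):
    """Cut a line at the first '//', exactly like the loop above does."""
    idx = line.find("//")
    if idx == 0:
        return ""                  # whole-line comment: drop the line
    if idx > 0:
        return line[:idx] + "\n"   # trailing comment: keep the code part
    return line                    # no '//' at all: unchanged

# The '//' inside the string literal is treated as a comment start.
print(repr(naive_strip('var url = "http://example.com";\n')))
# -> 'var url = "http:\n'
```

If the input never contains // inside strings, the simple approach is fine; otherwise the output is silently corrupted.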

Frank Potter · Jan 26, 2007

Thank you!

Laurent Rahuel · Jan 26, 2007

And using the codecs module

Code:

import codecs

f = codecs.open("show_btchina.user.js","r","utf-8")
modf = codecs.open("modified.js","w","utf-8")
for line in f:
     idx = line.find(u"//")
     if idx==0:
         continue
     elif idx>0:
         line = line[:idx]+u'\n'
     modf.write(line)
modf.close()
f.close()
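
For readers on Python 3 (an assumption; the thread predates it), the built-in open() accepts an encoding argument, so the same loop can be sketched without codecs. The function name strip_file is hypothetical, and the string-literal caveat still applies.

```python
def strip_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping '//' comments line by line."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            idx = line.find("//")
            if idx == 0:
                continue                   # skip whole-line comments
            if idx > 0:
                line = line[:idx] + "\n"   # cut the trailing comment
            dst.write(line)
    # Both files are closed when the with-block ends.
```

A call such as strip_file("show_btchina.user.js", "modified.js") would mirror the file names used in the thread.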

Peter Otten · Jan 26, 2007

Laurent said:
And using the codecs module

Why would you de/encode at all?

Peter

Paul McGuire · Jan 26, 2007

Here's a pyparsing version that will stay clear of '//' inside quoted
strings. (untested)

-- Paul

from pyparsing import Suppress, javaStyleComment, dblQuotedString

f=open("show_btchina.user.js","r").read()
f=unicode(f,"utf8")

commentFilter = Suppress( javaStyleComment ).ignore( dblQuotedString )
f_new= commentFilter.transformString(f)

open("modified.js","w").write(f_new.encode("utf8"))
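
If installing pyparsing is not an option, the same idea (leave quoted strings alone, drop comments) can be sketched with the standard re module. This is a different technique from the pyparsing code above, not a translation of it: one pattern matches either a double-quoted string or a // comment, and a callback keeps the strings. Python 3 is assumed, and the names TOKEN and strip_comments are hypothetical. Single-quoted strings, regex literals and /* */ comments are not handled.

```python
import re

# Either a double-quoted string (backslash escapes allowed) or a '//' comment.
TOKEN = re.compile(r'"(?:\\.|[^"\\\n])*"|//[^\r\n]*')

def strip_comments(text):
    """Remove '//' comments but leave '//' inside double-quoted strings."""
    def keep_strings(match):
        token = match.group(0)
        # Strings are put back unchanged; comments are replaced by nothing.
        return token if token.startswith('"') else ""
    return TOKEN.sub(keep_strings, text)

print(strip_comments('var u = "http://x"; // note\n'))
```

Because the regex scans left to right, a string literal is always consumed before any // inside it can be seen as a comment.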

Gabriel Genellina · Jan 26, 2007

Peter Otten said:
Why would you de/encode at all?

I'd say the opposite: why not? This is the recommended practice: decode
inputs as soon as possible, work on Unicode, encode only when you write the
output.
In this particular case, it's not necessary and you get the same results,
only because these two conditions are met:

- the encoding used is utf-8
- we're looking for '//', and no unicode character contains '/' in its
representation using that encoding apart from '/' itself

Looking for the byte sequence '//' in data encoded with a different
encoding (like utf-16 or ucs-2) could give false positives. And looking for
other things (like '¡¡') in utf-8 could give false positives too.
The same applies if one wants to skip string literals looking for '"' and
'\\"'.
Anyway for a toy script like this, perhaps it does not make any sense at
all - but one should be aware of the potential problems.
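
A small sketch of the utf-16 case, assuming Python 3 bytes and str. The character U+2F2F (a Kangxi radical, chosen here only because of its code point) is not a slash, yet its UTF-16 bytes are exactly b"//". A real "//" encoded as UTF-16, on the other hand, is not found by a byte search at all.

```python
radical = "\u2f2f"   # one CJK character, contains no slash
slashes = "//"       # an actual comment marker

# Byte-level search on UTF-16 data: wrong in both directions.
print(b"//" in radical.encode("utf-16-le"))   # True  (false positive)
print(b"//" in slashes.encode("utf-16-le"))   # False (real '//' is missed)

# Byte-level search on UTF-8 data happens to be safe for '/'.
print(b"//" in radical.encode("utf-8"))       # False
print(b"//" in slashes.encode("utf-8"))       # True

# Searching the decoded text is correct whatever the file encoding was.
print("//" in radical, "//" in slashes)       # False True
```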

Toby · Jan 27, 2007

Frank said:
r=re.compile(ur"//[^\r\n]+$", re.UNICODE|re.VERBOSE)
f_new=r.sub(ur"",f)

From the documentation:

re.MULTILINE
When specified [...] the pattern character "$" matches at the
end of the string and at the end of each line (immediately
preceding each newline). By default [...] "$" matches only at
the end of the string.

re.DOTALL
[...] without this flag, "." will match anything except a newline.

So a simple solution to your problem would be:

r = re.compile("//.*")
f_new = r.sub("", f)

Toby
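
The effect of the flags quoted above can be checked with a short sketch (Python 3 assumed; the sample text and variable names are made up for illustration). It shows why only the last comment disappeared and two ways to get all of them.

```python
import re

src = "a(); // one\nb(); // two\n"

# Original pattern: without re.MULTILINE, "$" only matches at the very end
# of the string (or just before a final newline), so only the last comment goes.
original = re.sub(r"//[^\r\n]+$", "", src)

# Same pattern with re.MULTILINE: "$" now matches at the end of every line.
multiline = re.sub(r"//[^\r\n]+$", "", src, flags=re.MULTILINE)

# The simpler pattern: "." never matches a newline unless re.DOTALL is set,
# so ".*" stops at the end of each line by itself.
dot = re.sub(r"//.*", "", src)

print(repr(original))    # 'a(); // one\nb(); \n'
print(repr(multiline))   # 'a(); \nb(); \n'
print(repr(dot))         # 'a(); \nb(); \n'
```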
