comments across platforms

  • Thread starter Hendrik Wendler

Hendrik Wendler

Hi everybody,

this may be a stupid question:
i want to strip comments from a .cpp
file.

cpp comments look like:

// (two slashes) .... comment until newline -->|

but how do i catch newlines in different os
( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',
and the compilers will take care of it? or do i have to check
the file type in advance? how can i assure i strip the comments
of a MacOS file correctly under Win32?

best regards + many thanks,
hendrik
 
Victor Bazarov

Hendrik said:
> Hi everybody,
>
> this may be a stupid question:
> i want to strip comments from a .cpp
> file.
>
> cpp comments look like:
>
> // (two slashes) .... comment until newline -->|
>
> but how do i catch newlines in different os
> ( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',
> and the compilers will take care of it?

That depends on (a) how you open the file and (b) what platform the file
was written on: one of the challenges is to read, say, a UNIX file on
Windows, or vice versa.

> or do i have to check
> the file type in advance?

Usually not. Besides, there is no sure way.

> how can i assure i strip the comments
> of a MacOS file correctly under Win32?

The three platforms that have the line breaks differently all have the \n
symbol in there somewhere. Our approach was always to look for that (BTW,
that's probably why 'std::getline' has '\n' as the default value for the
terminator symbol), and always weed out \r from the string obtained. Of
course, that requires opening the file as _binary_, not "text".

V
 
Mike Wahler

Hendrik Wendler said:
> Hi everybody,
>
> this may be a stupid question:
> i want to strip comments from a .cpp
> file.
>
> cpp comments look like:
>
> // (two slashes) .... comment until newline -->|
>
> but how do i catch newlines in different os
> ( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',

Yes, within your program, newline characters are expressed
as '\n', regardless of host platform.

> and the compilers will take care of it?

Yes, in 'text' mode (the default for iostreams). In 'binary'
mode, no translation occurs.
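
To illustrate the binary-mode half of that statement, here is a small sketch. The helper names 'write_binary' and 'read_binary' are made up for the example. Bytes written through a binary-mode stream come back unchanged through a binary-mode stream, so "\r\n", "\r" and "\n" all survive exactly as stored. In text mode the library may translate the host platform's own line ending to and from '\n', which is why reading a foreign file in text mode gives platform-dependent results.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: write bytes with no newline translation.
void write_binary(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}   // the stream is flushed and closed when 'out' goes out of scope

// Hypothetical helper: return the file's bytes exactly as stored on disk.
// Returns an empty string if the file cannot be opened.
std::string read_binary(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Dropping std::ios::binary from either call gives the text-mode behaviour, which differs between platforms.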

> or do i have to check
> the file type in advance? how can i assure i strip the comments
> of a MacOS file correctly under Win32?

It's not as simple as you might imagine (regardless of the
newline issue). Comments in C++ can also be expressed within
the 'C-style' delimiters /* and */. You'll need to keep track
of those and make sure each 'start' delimiter is matched by
exactly one 'end' delimiter.
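
One way to keep track of the delimiters is a small state machine, sketched below. The function name 'strip_comments' and the state names are hypothetical, and several choices are assumptions rather than anything stated in this thread: a // comment ends at either '\r' or '\n', a /* */ comment is replaced by a single space, and string and character literals are skipped so that a "//" inside quotes is left alone. Raw string literals, digit separators such as 1'000, and a backslash continuing a // comment onto the next line are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove // and /* */ comments from C++ source text.
std::string strip_comments(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') {
                state = State::LineComment;
                ++i;                      // skip the second '/'
            } else if (c == '/' && next == '*') {
                state = State::BlockComment;
                ++i;                      // skip the '*'
            } else {
                if (c == '"') state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;
        case State::LineComment:
            // Either line-ending character closes the comment and is kept.
            if (c == '\r' || c == '\n') {
                state = State::Code;
                out += c;
            }
            break;
        case State::BlockComment:
            // The first "*/" closes the comment; block comments do not nest.
            if (c == '*' && next == '/') {
                state = State::Code;
                ++i;                      // skip the '/'
                out += ' ';
            }
            break;
        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += next;              // copy the escaped character as-is
                ++i;
            } else if ((state == State::String && c == '"') ||
                       (state == State::Char && c == '\'')) {
                state = State::Code;
            }
            break;
        }
    }
    return out;
}
```

Because the line-ending bytes themselves are copied through untouched, the output keeps whatever newline convention the input file used.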


-Mike
 
Vijai Kalyan

Is that all you want to do or is it part of something bigger and more
complex? For the former, although it is an interesting exercise to do
it in C++ (you _do_ want to do it in C++, no? :) I would suggest Perl.
For the latter, if you are writing a more complex program parser or
something of that sort, then a lexical analyzer generator is better (a
lexical analyzer will usually have states. So for example, you can look
for a \r followed by a \n or a plain \n and so on).

-vijai.
 
Ben Pope

Victor said:
> The three platforms that have the line breaks differently all have the \n
> symbol in there somewhere. Our approach was always to look for that (BTW,
> that's probably why 'std::getline' has '\n' as the default value for the
> terminator symbol), and always weed out \r from the string obtained. Of
> course, that requires opening the file as _binary_, not "text".

I thought:
Windows: "\r\n"
*nix: "\n"
mac: "\r"

A positive PITA.

Ben
 
Alf P. Steinbach

* Hendrik Wendler:
> this may be a stupid question:
> i want to strip comments from a .cpp
> file.
>
> cpp comments look like:
>
> // (two slashes) .... comment until newline -->|
>
> but how do i catch newlines in different os
> ( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',
> and the compilers will take care of it?

The compiler's associated standard library takes care of it, using that
particular platform's convention.

> or do i have to check
> the file type in advance? how can i assure i strip the comments
> of a MacOS file correctly under Win32?

In that scenario you'll need to open the file in binary mode, and check for
either '\r' (Mac), '\n' (Unix), or '\r' followed by '\n' (Windows). One simple
algorithm for your cross-platform application is to simply regard any of
'\r' or '\n' as end-of-line, and copy the characters faithfully. Of course
that may not work for some obscure platform where, say, files are
record-oriented with fixed length lines, no end-of-line character (e.g., the
HP3000 under MPE I think it was called, early eighties...).



OT extra note: I now checked that MPE thing via Google, and found to my
astonishment (Wikipedia) that the HP3000 series, introduced in 1973, was
still sold up till 2003 (!), with service available until 2007! Ouch. It
must hurt to use those old beasties, not to mention _buying_ them -- I
wonder if anyone's running old PDP-11s, and perhaps even buying them? Must
be some pointy-haired bosses doing this. It's really scary.
 
Pete Becker

Hendrik said:
> Hi everybody,
>
> this may be a stupid question:
> i want to strip comments from a .cpp
> file.
>
> cpp comments look like:
>
> // (two slashes) .... comment until newline -->|
>
> but how do i catch newlines in different os
> ( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',
> and the compilers will take care of it? or do i have to check
> the file type in advance? how can i assure i strip the comments
> of a MacOS file correctly under Win32?

If you're transferring files between systems use ftp in ascii mode. It
will translate line endings, except on certain brain-damaged versions of
Linux. Once you've transferred a file, use

sed -e 's!//.*$!!' < source-file > target-file

I think that's the right script command, but you should probably check.
The quotation marks around the script keep the shell from interpreting it.
 
Victor Bazarov

Ben said:
> I thought:
> Windows: "\r\n"
> *nix: "\n"
> mac: "\r"
>
> A positive PITA.

You're right. Up to Mac OS 9 it used \r only. From OS X onward,
they switched to the "normal" UNIX \n.

V
 
Jack Klein

> Hi everybody,
>
> this may be a stupid question:
> i want to strip comments from a .cpp
> file.
>
> cpp comments look like:
>
> // (two slashes) .... comment until newline -->|
>
> but how do i catch newlines in different os
> ( UNIX / Mac / Win -> CR, LF, CR+LF)? are they always defined as '\n',
> and the compilers will take care of it? or do i have to check
> the file type in advance? how can i assure i strip the comments
> of a MacOS file correctly under Win32?
>
> best regards + many thanks,
> hendrik

There is nothing guaranteed by the C++ standard, but there's a
technique I have used for more than 20 years in C that will handle the
three platforms you mention. It is a little more work than just using
std::getline(), however.

Open your file in binary mode. Read the file character by character,
or read it in chunks into a buffer and step through it character by
character.

A '\r' character is always considered an end of line token. A '\n'
character is also considered an end of line token, EXCEPT when it is
immediately preceded by a '\r'.

This will work for *nix, Windows/MS-DOS, and Mac.
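
The rule above can be sketched as follows. The function name 'split_lines' is hypothetical, and the sketch assumes the whole file has already been read into a string in binary mode. A '\r' always ends a line; a '\n' ends a line unless the previous character was '\r'.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split raw file contents into lines, accepting
// "\n", "\r\n" and "\r" as line endings.
std::vector<std::string> split_lines(const std::string& data) {
    std::vector<std::string> lines;
    std::string current;
    char prev = '\0';
    for (char c : data) {
        if (c == '\r') {
            // A CR always terminates the current line.
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            // An LF terminates a line only if it is not the tail of CR+LF.
            if (prev != '\r') {
                lines.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
        prev = c;
    }
    // Keep a final line that has no terminator.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Empty lines are preserved, so "a\n\nb" yields three entries. As noted below, a file using "\n\r" would be seen as having an extra empty line after each real one.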

Where it won't work is for platforms where end of line is not
indicated by a character, i.e., fixed block text files, and there are
probably still a few dinosaurs like that out there. It also
won't work if some pathological file system stores text files with
"\n\r" in that order.
 
Ben Pope

Victor said:
> You're right. Up to Mac OS 9 it used \r only. From OS X onward,
> they switched to the "normal" UNIX \n.

Yeah, I was wondering that with a colleague today. We presumed that since it was basically *nix (BSD) it would be \n in OSX.

Ben
 
