getline() and newlines

barcaroller

I have a text file with mixed carriage returns ('\n' and '\r\n').

On Linux, both the std::string getline() global function and the
std::iostream getline() member function are keeping some of the newlines in
the result (I suspect they look only for the '\n').

* Is there a quick way I can tell either function to gobble up both
Windows-style and Unix-style newlines?

* If not, what would be an efficient way of getting rid of them? Currently
I use string::find_last_of("\n\r") + string::erase() but this is not very
efficient.
 
Obnoxious User

A simple and quick solution, adjust it to your own needs:

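#include <iostream>
#include <sstream>
#include <string>

// Reads characters into `out` up to the next '\n'; a '\r' that directly
// follows the '\n' is swallowed as well.
std::istream & getline(std::istream & in, std::string & out) {
    char c;
    while(in.get(c).good()) {
        if(c == '\n') {
            // Look at the next character without extracting it.
            c = in.peek();
            if(in.good()) {
                if(c == '\r') {
                    in.ignore();
                }
            }
            break;
        }
        out.append(1,c);
    }
    return in;
}

int main() {
    std::istringstream strm("alpha\nbeta\n\r...\n\romega\n\n");
    // Print each line prefixed with its index.
    for(int i = 0; strm.good(); ++i) {
        std::string line;
        getline(strm,line);
        std::cout<<i<<"\t"<<line<<std::endl;
    }
    return 0;
}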
 
Obnoxious User

I realized after posting that I reversed the sequence: the code looks for
'\n' followed by '\r' rather than '\r' followed by '\n', so it is flawed for
your needs, although easily fixed. Ignore it.
 
Erik Wikström

> I have a text file with mixed carriage returns ('\n' and '\r\n').
>
> On Linux, both the std::string getline() global function and the
> std::iostream getline() member function are keeping some of the newlines in
> the result (I suspect they look only for the '\n').
>
> * Is there a quick way I can tell either function to gobble up both
> Windows-style and Unix-style newlines?

While you can specify the delimiting character you can only specify one
character.

> * If not, what would be an efficient way of getting rid of them? Currently
> I use string::find_last_of("\n\r") + string::erase() but this is not very
> efficient.

Since the Windows sequence is \r\n and getline() uses \n as its delimiter,
any line with a Windows line break will end with \r. Use this knowledge
to reduce the work required:

std::string str;
std::getline(file, str);

if (str[str.size() - 1] == '\r')
    str.resize(str.size() - 1);
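
The effect described here (one delimiter character only, so a Windows line keeps its '\r') can be seen in a small sketch. The helper name split_lines is hypothetical and only serves to show what std::getline returns for mixed input.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` with std::getline, which accepts
// exactly one delimiter character (default '\n').
std::vector<std::string> split_lines(const std::string& text, char delim = '\n') {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line, delim)) {
        lines.push_back(line);
    }
    return lines;
}
```

Called on "one\r\ntwo\n", this yields two strings, and the first one still carries the trailing '\r' because only the '\n' was treated as the delimiter.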
 
James Kanze

> I have a text file with mixed carriage returns ('\n' and '\r\n').
> On Linux, both the std::string getline() global function and
> the std::iostream getline() member function are keeping some
> of the newlines in the result (I suspect they look only for
> the '\n').

Technically, it's implementation defined. Typically, however,
yes: Unix implementations treat a single 0x0A in the stream as a
newline; Windows implementations treat either a single 0x0A or
the sequence 0x0D, 0x0A as a newline.

Most of the time, this should not be a problem. In all of the
usual encodings (at least outside of the mainframe world), the
0x0D will result in an '\r' under Unix (and probably also under
Windows, if it isn't immediately followed by a 0x0A). In the
"C" locale, and probably in all other locales, '\r' is
whitespace. So it ends up ignored with the rest of the trailing
whitespace. (The one exception is C and C++ source code; for
some reason, the standard doesn't consider '\r' as whitespace in
source code.)

> * Is there a quick way I can tell either function to gobble
> up both Windows-style and Unix-style newlines?

Is there ever a need to?

> * If not, what would be an efficient way of getting rid of
> them? Currently I use string::find_last_of("\n\r") +
> string::erase() but this is not very efficient.

I'd use an external program (e.g. tr). In practice, if a file
is on a shared file system, and thus being read by both Windows
and Unix, it's generally best (pragmatically, at least) to stick
with the Unix conventions.
 
Antoine Mathys

> std::string str;
> std::getline(file, str);
>
> if (str[str.size() - 1] == '\r')
>     str.resize(str.size() - 1);

With an empty line that has a Unix end of line, str is empty, so the index
str.size() - 1 is out of range -> SEGFAULT (undefined behaviour).

The code fragment should be:

if ((str.size() > 0) && (str[str.size() - 1] == '\r'))
    str.resize(str.size() - 1);
 
