getline() and newlines

Discussion in 'C++' started by barcaroller, Apr 5, 2008.

  1. barcaroller

    barcaroller Guest

    I have a text file with mixed carriage returns ('\n' and '\r\n').

    On Linux, both the std::string getline() global function and the
    std::iostream getline() member function are keeping some of the newlines in
    the result (I suspect they look only for the '\n').

    * Is there a quick way I can tell either function to gobble up both
    Windows-style and Unix-style newlines?

    * If not, what would be an efficient way of getting rid of them? Currently
    I use string::find_last_of("\n\r") + string::erase() but this is not very
    efficient.
     
    barcaroller, Apr 5, 2008
    #1

  2. On Sat, 05 Apr 2008 09:25:11 -0400, barcaroller wrote:

    > I have a text file with mixed carriage returns ('\n' and '\r\n').
    >
    > On Linux, both the std::string getline() global function and the
    > std::iostream getline() member function are keeping some of the newlines
    > in the result (I suspect they look only for the '\n').
    >
    > * Is there a quick way I can tell either function to gobble up both
    > Windows-style and Unix-style newlines?
    >
    > * If not, what would be an efficient way of getting rid of them?
    > Currently
    > I use string::find_last_of("\n\r") + string::erase() but this is not
    > very efficient.


    A simple and quick solution, adjust it to your own needs:

    #include <iostream>
    #include <sstream>
    #include <string>

    std::istream & getline(std::istream & in, std::string & out) {
        char c;
        while(in.get(c).good()) {
            if(c == '\n') {
                c = in.peek();
                if(in.good()) {
                    if(c == '\r') {
                        in.ignore();
                    }
                }
                break;
            }
            out.append(1,c);
        }
        return in;
    }

    int main() {
        std::istringstream strm("alpha\nbeta\n\r...\n\romega\n\n");
        for(int i = 0; strm.good(); ++i) {
            std::string line;
            getline(strm,line);
            std::cout<<i<<"\t"<<line<<std::endl;
        }
        return 0;
    }

    --
    OU
     
    Obnoxious User, Apr 5, 2008
    #2

  3. On Sat, 05 Apr 2008 13:45:39 +0000, Obnoxious User wrote:

    > On Sat, 05 Apr 2008 09:25:11 -0400, barcaroller wrote:
    >
    >> I have a text file with mixed carriage returns ('\n' and '\r\n').
    >>
    >> On Linux, both the std::string getline() global function and the
    >> std::iostream getline() member function are keeping some of the
    >> newlines in the result (I suspect they look only for the '\n').
    >>
    >> * Is there a quick way I can tell either function to gobble up both
    >> Windows-style and Unix-style newlines?
    >>
    >> * If not, what would be an efficient way of getting rid of them?
    >> Currently
    >> I use string::find_last_of("\n\r") + string::erase() but this is not
    >> very efficient.

    >
    > A simple and quick solution, adjust it to your own needs:
    >
    > #include <iostream>
    > #include <sstream>
    >
    > std::istream & getline(std::istream & in, std::string & out) {
    > char c;
    > while(in.get(c).good()) {
    > if(c == '\n') {
    > c = in.peek();
    > if(in.good()) {
    > if(c == '\r') {
    > in.ignore();
    > }
    > }
    > break;
    > }
    > out.append(1,c);
    > }
    > return in;
    > }
    >
    > int main() {
    > std::istringstream strm("alpha\nbeta\n\r...\n\romega\n\n");
    > for(int i = 0; strm.good(); ++i) {
    > std::string line;
    > getline(strm,line);
    > std::cout<<i<<"\t"<<line<<std::endl;
    > }
    > return 0;
    > }


    I realized after posting that I reversed the sequence (the code looks
    for "\n\r" rather than "\r\n"), so it is flawed for your needs, although
    easily fixed. Ignore it.
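
    For reference, one way the reversed sequence could be fixed. This is
    only a sketch, not the poster's own correction: the names getline_any
    and split_lines are made up for illustration, and treating a lone '\r'
    as a line ending is an extra assumption (so "\n\r" yields an extra
    empty line).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption, not from the thread):
// reads one line, accepting "\n", "\r\n", or a lone "\r" as the terminator.
std::istream& getline_any(std::istream& in, std::string& out) {
    out.clear();
    char c;
    while (in.get(c)) {
        if (c == '\n') {
            return in;              // Unix line ending
        }
        if (c == '\r') {
            if (in.peek() == '\n') {
                in.ignore();        // second half of a Windows "\r\n"
            }
            return in;
        }
        out.push_back(c);
    }
    // End of input: a final line without a terminator still counts as read.
    if (!out.empty()) {
        in.clear(in.rdstate() & ~std::ios_base::failbit);
    }
    return in;
}

// Usage sketch: split an in-memory text into lines.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_any(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

    Unlike the posted code, this clears the output string on every call and
    can be used directly as a loop condition.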

    --
    OU
     
    Obnoxious User, Apr 5, 2008
    #3
  4. On 2008-04-05 15:25, barcaroller wrote:
    > I have a text file with mixed carriage returns ('\n' and '\r\n').
    >
    > On Linux, both the std::string getline() global function and the
    > std::iostream getline() member function are keeping some of the newlines in
    > the result (I suspect they look only for the '\n').
    >
    > * Is there a quick way I can tell either function to gobble up both
    > Windows-style and Unix-style newlines?


    No. You can specify the delimiting character, but only a single
    character.
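
    To illustrate the single-delimiter limit, here is a small sketch using
    the three-argument std::getline overload. The helper name split_on is
    an assumption made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on exactly one delimiter character
// using the three-argument std::getline overload.
std::vector<std::string> split_on(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(in, part, delim)) {
        parts.push_back(part);
    }
    return parts;
}
```

    Calling split_on("x\r\ny\nz", '\n') leaves the '\r' attached to the
    first piece, which is the behaviour described in the question.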

    > * If not, what would be an efficient way of getting rid of them? Currently
    > I use string::find_last_of("\n\r") + string::erase() but this is not very
    > efficient.


    Since the Windows sequence is \r\n and getline() uses \n as the
    delimiter, any line with a Windows line break will end with \r. Use
    this knowledge to reduce the work required:

    std::string str;
    std::getline(file, str);

    if (str[str.size() - 1] == '\r')
        str.resize(str.size() - 1);

    --
    Erik Wikström
     
    Erik Wikström, Apr 5, 2008
    #4
  5. James Kanze

    James Kanze Guest

    On Apr 5, 15:25, "barcaroller" wrote:
    > I have a text file with mixed carriage returns ('\n' and '\r\n').


    > On Linux, both the std::string getline() global function and
    > the std::iostream getline() member function are keeping some
    > of the newlines in the result (I suspect they look only for
    > the '\n').


    Technically, it's implementation-defined. Typically, however,
    yes: Unix implementations treat a single 0x0A in the stream as a
    newline; Windows implementations treat either a single 0x0A or
    the sequence 0x0D, 0x0A as a newline.

    Most of the time, this should not be a problem. In all of the
    usual encodings (at least outside of the mainframe world), the
    0x0D will result in an '\r' under Unix (and probably also under
    Windows, if it isn't immediately followed by a 0x0A). In the
    "C" locale, and probably in all other locales, '\r' is
    whitespace. So it ends up ignored with the rest of the trailing
    whitespace. (The one exception is C and C++ source code; for
    some reason, the standard doesn't consider '\r' as whitespace in
    source code.)
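
    A minimal sketch of the "trailing whitespace" point, assuming the
    default "C" locale and C++11 (for back() and pop_back()). The helper
    name rtrim is made up for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: removes trailing whitespace as classified by
// std::isspace in the current C locale. In the "C" locale that includes
// '\r', so a leftover carriage return goes away with spaces and tabs.
std::string rtrim(std::string s) {
    // The cast avoids undefined behavior for negative char values.
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
        s.pop_back();
    }
    return s;
}
```

    Note that this also strips trailing spaces and tabs, which may or may
    not be wanted.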

    > * Is there a quick way I can tell either function to gobble
    > up both Windows-style and Unix-style newlines?


    Is there ever a need to?

    > * If not, what would be an efficient way of getting rid of
    > them? Currently I use string::find_last_of("\n\r") +
    > string::erase() but this is not very efficient.


    I'd use an external program (e.g. tr). In practice, if a file
    is on a shared file system, and thus being read by both Windows
    and Unix, it's generally best (pragmatically, at least) to stick
    with the Unix conventions.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 5, 2008
    #5
  6. > std::string str;
    > std::getline(file, str);
    >
    > if (str[str.size() - 1] == '\r')
    > str.resize(str.size() - 1);


    And with an empty line with a Unix end of line, str is empty, so
    str[str.size() - 1] is out of bounds (strictly, undefined behavior)
    -> SEGFAULT

    The code fragment should be:
    if ((str.size() > 0) && (str[str.size() - 1] == '\r'))
        str.resize(str.size() - 1);
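
    Put together as a reusable function, the guarded fragment might look
    like the sketch below. The names getline_crlf and read_lines are
    assumptions for illustration, not something from the standard library.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: std::getline plus the guarded '\r' strip from
// the fragment above. Empty lines are left untouched.
std::istream& getline_crlf(std::istream& in, std::string& line) {
    if (std::getline(in, line) && !line.empty()
        && line[line.size() - 1] == '\r') {
        line.resize(line.size() - 1);
    }
    return in;
}

// Usage sketch: collect the lines of an in-memory text with mixed endings.
std::vector<std::string> read_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_crlf(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

    Only one character is inspected per line, so this stays cheaper than
    find_last_of() + erase().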
     
    Antoine Mathys, Apr 7, 2008
    #6
