Why does IO.readlines() keep newlines?

Discussion in 'Ruby' started by Just Another Victim of the Ambient Morality, Nov 19, 2007.

  1. At the very least, the win32 implementation of Ruby's IO.readlines()
    method keeps the newline character on each string in the array. Considering
    that it is the newline that defines a "line," it would not be wholly
    unreasonable to omit it from the returned array. I would have imagined
    that it was implemented using String.split(), which omits the splitting
    character. On a purely practical note, I'm sure the former is more popular
    than the latter in the following:


    out = File.open('file.txt', 'r'){|file| file.readlines.collect{|line| line.chomp}}
    out = File.open('file.txt', 'r'){|file| file.readlines}


    ...in that rarely do people actually want newlines in their strings.
    Interestingly enough, I discovered this behaviour from a bug in a
    program which was hidden by another peculiar function, puts(). Can you
    imagine my surprise that puts() not only appends a newline to a string
    printed to stdout but, if a newline already exists, it doesn't bother
    appending one! So, printing strings with puts() can hide whether strings
    have a newline or not. Weird...
    So, who thinks my suggested change is a good idea? How do I go about
    popularizing my opinion?
    Thank you...
     
    Just Another Victim of the Ambient Morality, Nov 19, 2007
    #1
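
    To make the behaviour described in this post concrete, here is a minimal
    sketch. The temporary file and its contents are made up for the
    demonstration, and the chomp: keyword assumes Ruby 2.4 or later, which
    postdates this thread. StringIO stands in for stdout so the output of
    puts can be inspected.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical sample file, created only for this demonstration.
sample = Tempfile.new('readlines_demo')
sample.write("alpha\nbeta\n")
sample.close

# readlines keeps the line terminator on every element.
with_newlines = File.readlines(sample.path)

# Two ways to drop it: chomp each line, or ask readlines to do it (Ruby 2.4+).
chomped    = File.readlines(sample.path).map(&:chomp)
chomped_kw = File.readlines(sample.path, chomp: true)

# puts appends a newline only when the string does not already end with one,
# so both calls below write exactly the same bytes.
buffer = StringIO.new
buffer.puts("alpha")
buffer.puts("alpha\n")
puts_output = buffer.string

p with_newlines
p chomped
p puts_output
```

    Printing with p (or inspect) rather than puts is one way to see whether
    a string really carries a trailing newline.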


  2. I'm going to speculate that readlines does this because of operating
    system differences in line endings.
    For compatibility between most systems, it would have to remove line
    feeds (\x0A) or carriage-return/line-feed combinations (\x0D\x0A).

    I personally rather prefer the current behavior of readlines. I don't
    think the puts behavior matters, and it is certainly not worth changing.
    I'm aware of their behavior and if it matters, I code accordingly.

    humbly,
    Daniel Brumbaugh Keeney
     
    Daniel Brumbaugh Keeney, Nov 19, 2007
    #2

  3. FWIW, I never use readlines for this exact reason. I find its
    preservation of line endings entirely annoying. I always
    IO.read().split when I can.

    As much as I'd personally like it changed, and know that such a change
    would not affect any of my scripts, I'm concerned that such a change
    must fall into the category of "not backwards compatible", and thus
    unlikely to be effected without very strong support.

    Discuss the issue here as you are doing. If you don't get a large
    vocal outcry against the proposal, or are not swayed by any arguments
    that come against it, file an RCR[1] (preferably with a source code
    patch attached) and hope that Matz accepts your change into the core.

    [1] http://rcrchive.net/
     
    Phrogz, Nov 20, 2007
    #3
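
    As a rough comparison of the read-then-split approach with chomped line
    reading, the sketch below uses a made-up string in place of file
    contents. It assumes Ruby 2.4 or later for the chomp: keyword. The point
    it illustrates is that split drops trailing empty fields, so the two
    approaches can disagree about blank lines at the end of the input.

```ruby
# Made-up sample text; note the blank line at the end.
text = "first\n\nsecond\n\n"

# split removes the separator, but also discards trailing empty strings.
split_lines = text.split("\n")

# each_line with chomp: true removes the terminator and keeps every line.
chomped_lines = text.each_line(chomp: true).to_a

p split_lines
p chomped_lines
```

    Whether the trailing blank line matters depends on the script, so
    neither form is a drop-in replacement for the other.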
  4. Indeed that's not the case.

    On CRLF platforms the I/O layer handles newlines in text mode so that
    the programmer *always* works with "\n"; no CRLF ever reaches the Ruby
    level on Windows. Nor do you need to print CRLFs by hand at the Ruby
    level. At the Ruby level a newline is always == "\n" and always has
    length 1.

    The string "\n" is the logical newline in Ruby, meaning it is portable
    and the I/O layer takes care of its actual representation on disk
    according to the runtime platform. In Java, for example, this works
    differently: "\n" is not portable, and to write a portable newline in
    Java you invoke some println().

    This article explains how newlines work in C-based languages. It is
    Perl-based but in general it applies to Ruby except that in Ruby
    there's no platform where "\n" == "\015". In Ruby "\n" == "\012"
    everywhere and that simplifies things a bit. The I/O layer in MRI is
    C's stdio instead of PerlIO, but the explained newline mangling in and
    out is analogous:

    http://www.onlamp.com/pub/a/onlamp/2006/08/17/understanding-newlines.html

    I am the author but that doesn't matter.

    -- fxn
     
    Xavier Noria, Nov 20, 2007
    #4
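
    The claims above can be checked with a short sketch. The file name
    prefix and contents are invented for the example. File.binread is used
    to look at the bytes on disk without any newline translation; what it
    returns depends on the platform, so no particular output is claimed for
    it here.

```ruby
require 'tempfile'

# At the Ruby level the logical newline is a single character, octal 012.
logical_newline   = "\n"
same_as_octal_012 = (logical_newline == "\012")
newline_length    = logical_newline.length

# Write one line in the default (text) mode to a throwaway file.
file = Tempfile.new('newline_demo')
file.write("line one\n")
file.close

# Bytes actually stored on disk: expected to end in "\n" on UNIX-like
# systems and "\r\n" on Windows, where the I/O layer translates on write.
raw_bytes = File.binread(file.path)

# Reading back in text mode should give the logical newline again.
text_read = File.read(file.path)

p same_as_octal_012
p newline_length
p raw_bytes
p text_read
```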
  5. But then you could do
    readlines/(\n\r?)/,
    as default behavior I find it most annoying too.

    Robert
     
    Robert Dober, Nov 20, 2007
    #5
  6. Unfortunately, files created on one platform inevitably make their way
    to another. When an IO containing \r\n is read on a UNIX system, the
    carriage return is preserved.

    Daniel Brumbaugh Keeney
     
    Daniel Brumbaugh Keeney, Nov 20, 2007
    #6
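
    A small sketch of the cross-platform case: the string below is an
    invented stand-in for a file written with CRLF endings, wrapped in a
    StringIO so that no platform newline translation takes place. It shows
    the carriage return surviving readlines and a plain split on "\n", and
    two ways of dealing with it.

```ruby
require 'stringio'

# Made-up data with Windows-style line endings.
crlf_data = "one\r\ntwo\r\n"

# Without translation, readlines keeps both the CR and the LF.
raw_lines = StringIO.new(crlf_data).readlines

# chomp with its default separator removes "\n", "\r\n" or "\r".
chomped = raw_lines.map(&:chomp)

# Splitting on "\n" alone leaves a stray "\r" on each line...
naive_split = crlf_data.split("\n")

# ...whereas a pattern with an optional CR handles both conventions.
regex_split = crlf_data.split(/\r?\n/)

p raw_lines
p chomped
p naive_split
p regex_split
```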
