Reasons to use a buffer in IO::read?

Discussion in 'Ruby' started by Steve Midgley, Dec 6, 2007.

  1. Hi Ruby people,

    I'm wondering what the functional and performance differences might be
    between the two statements below? Assume 'io' is an IO instance with
    gobs of data in it. Assume 'file' is an open file instance with write
    access:

    until io.eof? do
      file.write(io.read(10485760))
    end

    buffer = ''
    until io.eof? do
      buffer = io.read(10485760)
      file.write(buffer)
    end

    I see that Ruby provides for a buffer and I'm wondering what the
    reason is? I read this article but am still not clear on the benefit
    of a buffer at all:

    http://rcoder.net/content/fast-ruby-io

    I'm wondering if providing a buffer might reduce malloc issues and
    speed things up? I can't see any other reason to use one..

    Thanks in advance for any information!

    Steve
    Steve Midgley, Dec 6, 2007
    #1

  2. Steve Midgley

    MonkeeSage Guest

    $ ri IO#read
    ----------------------------------------------------------------
    IO#read
    ios.read([length [, buffer]]) => string, buffer, or nil
    ------------------------------------------------------------------------
    Reads at most _length_ bytes from the I/O stream, or to the end of
    file if _length_ is omitted or is +nil+. _length_ must be a
    non-negative integer or nil. If the optional _buffer_ argument is
    present, it must reference a String, which will receive the data.

    At end of file, it returns +nil+ or +""+ depend on _length_.
    +_ios_.read()+ and +_ios_.read(nil)+ returns +""+.
    +_ios_.read(_positive-integer_)+ returns nil.

    f = File.new("testfile")
    f.read(16) #=> "This is line one"

    So...

    buffer = ""
    file.write(io.read(nil, buffer))
    print "I read this stuff ", buffer, "\n"
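
    A minimal sketch of what the documentation above describes, assuming
    a StringIO as a stand-in for 'io' (the variable names are made up for
    illustration). It checks that read hands back the very String that
    was passed in, and what happens at end of file with a positive length:

```ruby
require "stringio"

# StringIO stands in for the thread's `io`; it is only an assumption
# made so the example runs without a real file.
io = StringIO.new("This is line one\n")

buffer = String.new            # mutable, empty String that receives the data
result = io.read(16, buffer)   # read at most 16 bytes into buffer

same_object = result.equal?(buffer)  # is the return value the buffer itself?
first_chunk = buffer.dup             # copy, because the next read overwrites buffer

io.read(16, buffer)            # consumes the remaining "\n"
at_eof = io.read(16, buffer)   # positive length at end of file

puts same_object
puts first_chunk.inspect
puts at_eof.inspect
puts buffer.inspect
```

    The dup matters: the buffer is overwritten by every later read, so
    anything that must outlive the next call needs its own copy.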

    Regards,
    Jordan
    MonkeeSage, Dec 6, 2007
    #2

  3. On Dec 5, 5:56 pm, MonkeeSage <> wrote:
    > buffer = ""
    > file.write(io.read(nil, buffer))
    > print "I read this stuff ", buffer, "\n"
    >
    > Regards,
    > Jordan


    Thanks Jordan. How is your code different (if at all) from:

    buffer = io.read
    file.write(buffer)
    print "I read this stuff ", buffer, "\n"

    Am I missing something? I just don't see why buffer is useful - is it
    a performance benefit or some kind of syntax improvement that I'm
    missing? The only thing I can see is that it has some kind of low
    level malloc optimization if the same string size is passed in
    repeatedly during partial writes.

    Steve
    Steve Midgley, Dec 7, 2007
    #3
  4. Steve Midgley

    MonkeeSage Guest

    I don't know if there is any optimization in the back end, but it lets
    you pass the results of io.read to another method and also put them in
    buffer at the same time. But since you can do that with assignment, I
    don't really see any point to it (I was just trying to give an example
    as the docs describe). To me, unless, as you say, there is some
    optimization going on in the back end, this code...

    buffer = ""
    file.write(io.read(nil, buffer))
    print "I read this stuff ", buffer, "\n"

    ...looks the same as this code...

    file.write(buffer = io.read)
    print "I read this stuff ", buffer, "\n"

    Regards,
    Jordan
    MonkeeSage, Dec 7, 2007
    #4
  5. 2007/12/7, Steve Midgley <>:
    > > > buffer = ''


    This line above is completely superfluous.

    > > > until io.eof? do
    > > > buffer = io.read(10485760)
    > > > file.write(buffer)
    > > > end

    > Am I missing something? I just don't see why buffer is useful - is it
    > a performance benefit or some kind of syntax improvement that I'm
    > missing?


    Yes, the string referenced by buffer is reused. This leads to
    improved performance for the typical application which is like this:

    buffer = ""
    while io.read(1024, buffer)
      file.write buffer
    end
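
    The loop above can be sketched as a self-contained script. This is
    only an illustration under stated assumptions: StringIO objects stand
    in for 'io' and 'file', and copy_in_chunks and CHUNK_SIZE are
    hypothetical names, not anything from the standard library.

```ruby
require "stringio"

CHUNK_SIZE = 1024  # chunk size taken from the loop above; tune as needed

# Copy everything from src to dst, reusing one String for every chunk.
# Returns the number of bytes copied. (Hypothetical helper for illustration.)
def copy_in_chunks(src, dst, chunk_size = CHUNK_SIZE)
  buffer = String.new
  copied = 0
  # read returns nil at end of file, which ends the loop
  while src.read(chunk_size, buffer)
    dst.write(buffer)
    copied += buffer.bytesize
  end
  copied
end

source = StringIO.new("x" * 2500)  # stand-in for `io`
target = StringIO.new              # stand-in for `file`
bytes  = copy_in_chunks(source, target)
puts bytes
```

    For a plain stream-to-stream copy, Ruby 1.9 and later also ship
    IO.copy_stream(src, dst), which may be simpler than a hand-written
    loop.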

    > The only thing I can see is that it has some kind of low
    > level malloc optimization if the same string size is passed in
    > repeatedly during partial writes.


    Exactly (see above). Note that it is very inefficient to read with
    such a large chunk size as you use in your original posting. If you
    want to read the whole file you can simply do io.read.

    Kind regards

    robert

    --
    use.inject do |as, often| as.you_can - without end
    Robert Klemme, Dec 7, 2007
    #5
  6. Steve Midgley

    Jano Svitok Guest

    On Dec 7, 2007 6:25 AM, MonkeeSage <> wrote:
    > I don't know if there is any optimization in the back end, but it lets
    > you pass the results of io.read to another method and also put them in
    > buffer at the same time. But since you can do that with assignment, I
    > don't really see any point to it (I was just trying to give an example
    > as the docs describe). To me, unless as you say, there is some
    > optimization going on in the backend, this code...
    >
    > buffer = ""
    > file.write(io.read(nil, buffer))
    > print "I read this stuff ", buffer, "\n"
    >
    > ...looks the same as this code...
    >
    > file.write(buffer = io.read)
    >
    > print "I read this stuff ", buffer, "\n"
    >
    > Regards,
    > Jordan


    I'd *assume* the former saves you a bunch of allocations when looping
    through a file (I assume the buffer is reused instead of allocating a
    new one for each iteration).

    i.e.
    buffer = ""
    File.open('xxx','r') do |f|
      while f.read(1024, buffer) do
        process(buffer)
      end
    end

    vs.

    File.open('xxx','r') do |f|
      while true do
        buffer = f.read(1024)
        break if buffer.nil?  # read(1024) returns nil, not "", at end of file
        process(buffer)
      end
    end
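
    One way to see the difference between the two loops is to compare
    object identity. The sketch below assumes StringIO in place of a real
    file, and the helper names are invented for illustration:

```ruby
require "stringio"

SAMPLE = "abcdefghij" * 3  # 30 bytes of sample input

# Reused buffer: every iteration sees the same String object.
def chunks_with_buffer(io, size)
  buffer = String.new
  seen = []
  seen << buffer while io.read(size, buffer)
  seen
end

# Fresh strings: each read returns a newly allocated String.
def chunks_without_buffer(io, size)
  seen = []
  while (chunk = io.read(size))
    seen << chunk
  end
  seen
end

reused = chunks_with_buffer(StringIO.new(SAMPLE), 10)
fresh  = chunks_without_buffer(StringIO.new(SAMPLE), 10)

puts reused.map(&:object_id).uniq.size  # distinct objects seen with a buffer
puts fresh.map(&:object_id).uniq.size   # distinct objects seen without one
```

    The flip side of reuse is that every element collected in the first
    helper is the same object, so its contents change on each read; keep
    a dup of any chunk you need later.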
    Jano Svitok, Dec 7, 2007
    #6
  7. Steve Midgley

    MonkeeSage Guest

    On Dec 7, 3:29 am, Jano Svitok <> wrote:
    > I'd *assume* the former saves you a bunch of allocations when looping
    > through a file
    > (I assume the buffer is reused instead of allocating a new one for
    > each iteration).


    I'm not the smartest C programmer (or the smartest anything
    programmer), but I'm not seeing any optimization in the actual C code.
    Please correct me if I'm wrong.

    First, io_read() is the function called in the backend from IO#read.
    The relevant lines are:

    ====
    rb_scan_args(argc, argv, "02", &length, &str);

    if (NIL_P(length)) {
        if (!NIL_P(str)) StringValue(str);
        GetOpenFile(io, fptr);
        rb_io_check_readable(fptr);
        return read_all(fptr, remain_size(fptr), str);
    }
    len = NUM2LONG(length);
    if (len < 0) {
        rb_raise(rb_eArgError, "negative length %ld given", len);
    }

    if (NIL_P(str)) {
        str = rb_tainted_str_new(0, len);
    }
    else {
        StringValue(str);
        rb_str_modify(str);
        rb_str_resize(str,len);
    }
    ====

    So we see that we get a new string from rb_tainted_str_new if buffer
    is not passed in to IO#read; otherwise str is used and we call
    StringValue on it.

    So what is StringValue? A macro defined in ruby.h:

    ====
    #define StringValue(v) rb_string_value(&(v))
    ====

    And what is rb_string_value()? A function from string.c:

    ====
    static char *null_str = "";

    VALUE
    rb_string_value(ptr)
        volatile VALUE *ptr;
    {
        VALUE s = *ptr;
        if (TYPE(s) != T_STRING) {
            s = rb_str_to_str(s);
            *ptr = s;
        }
        if (!RSTRING(s)->ptr) {
            FL_SET(s, ELTS_SHARED);
            RSTRING(s)->ptr = null_str;
        }
        return s;
    }
    ====

    So if it's not a string, we convert it to one; and if its data pointer
    is NULL, we point it at a shared empty string.

    But the interesting lines are back up in io_read():

    ====
    rb_str_modify(str);
    rb_str_resize(str,len);
    ====

    Now rb_str_modify() (string.c) is called with our string. If the string
    does not own its data (for example, it is shared), it in turn calls
    str_make_independent():

    ====
    static void
    str_make_independent(str)
        VALUE str;
    {
        char *ptr;

        ptr = ALLOC_N(char, RSTRING(str)->len+1);
        if (RSTRING(str)->ptr) {
            memcpy(ptr, RSTRING(str)->ptr, RSTRING(str)->len);
        }
        ptr[RSTRING(str)->len] = 0;
        RSTRING(str)->ptr = ptr;
        RSTRING(str)->aux.capa = RSTRING(str)->len;
        FL_UNSET(str, STR_NOCAPA);
    }
    ====

    And finally, rb_str_resize is called:

    ====
    VALUE
    rb_str_resize(str, len)
        VALUE str;
        long len;
    {
        if (len < 0) {
            rb_raise(rb_eArgError, "negative string size (or size too big)");
        }

        rb_str_modify(str);
        if (len != RSTRING(str)->len) {
            if (RSTRING(str)->len < len || RSTRING(str)->len - len > 1024) {
                REALLOC_N(RSTRING(str)->ptr, char, len+1);
                if (!FL_TEST(str, STR_NOCAPA)) {
                    RSTRING(str)->aux.capa = len;
                }
            }
            RSTRING(str)->len = len;
            RSTRING(str)->ptr[len] = '\0'; /* sentinel */
        }
        return str;
    }
    ====

    Now, like I said, I'm not the greatest C programmer...but I fail to
    see how, if I'm reading the code above correctly, passing in a buffer
    string to IO#read is any more optimal than creating a new string (even
    when looping many times), since it appears to me to be doing the same
    thing (compare str_new from string.c, which is what rb_tainted_str_new
    calls).
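
    Seen from the Ruby side, the modify-and-resize path above shows up as
    the same String object coming back with new contents and a new
    length. A small sketch, assuming StringIO in place of a real IO (the
    C code quoted above is from io.c, so this only illustrates the
    visible behaviour, not that exact code path):

```ruby
require "stringio"

io = StringIO.new("abcdefgh")  # stand-in for a real IO object

# Start with a buffer that already holds unrelated, longer contents.
buffer = String.new("previous contents that are longer than the chunk")
id_before = buffer.object_id

io.read(4, buffer)  # overwrite buffer with the next 4 bytes

puts buffer.inspect                  # old contents are gone
puts buffer.bytesize                 # length now matches the bytes read
puts buffer.object_id == id_before   # still the same String object
```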

    Regards,
    Jordan

    ----
    References:

    http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/io.c
    http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/ruby.h
    http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/string.c
    MonkeeSage, Dec 7, 2007
    #7
  8. Steve Midgley

    MonkeeSage Guest

    Oh...wait...I'm completely dense. Duh! io_read() is going to create or
    re-initialize a string anyway to put its results in. So if I create
    a new string independently to store the return value of IO#read, then
    I'm causing an extra allocation and copy. Sorry for wasting space.
    Have pity on mentally handicapped people like me. :p

    Regards,
    Jordan
    MonkeeSage, Dec 7, 2007
    #8
  9. 2007/12/7, MonkeeSage <>:
    > Oh...wait...I'm completely dense. Duh! io_read() is going to create /
    > re-initialize new string anyway to put its results in. So If I create
    > a new string independently to store the return value of IO#read, then
    > I'm causing an extra allocation and copy. Sorry for wasting space.
    > Have pity on mentally handicapped people like me. :p


    LOL

    Also, allocating a String instance involves not only the raw malloc of
    the memory but also the bookkeeping needed for GC, so it is more
    expensive than a simple resize. Note also that if you loop with code
    like what I showed, the length of the string instance is adjusted
    only *once*, because all chunks have the same length or are shorter
    (potentially the last one).
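
    A rough way to observe the allocation difference is to count objects
    around each loop. This is a sketch only: it assumes CRuby (the
    GC.stat counter is implementation-specific), uses StringIO instead of
    a file, and the allocations helper is a made-up name. Exact numbers
    vary by Ruby version, so treat them as indicative.

```ruby
require "stringio"

CHUNKS  = 10_000
PAYLOAD = "x" * (CHUNKS * 16)  # 10,000 chunks of 16 bytes each

# Count objects allocated while the block runs (CRuby-specific counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

io = StringIO.new(PAYLOAD)
with_buffer = allocations do
  buffer = String.new
  nil while io.read(16, buffer)   # every read fills the same String
end

io = StringIO.new(PAYLOAD)
without_buffer = allocations do
  nil while io.read(16)           # every read allocates a new String
end

puts "with buffer:    #{with_buffer} objects"
puts "without buffer: #{without_buffer} objects"
```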

    Kind regards

    robert


    --
    use.inject do |as, often| as.you_can - without end
    Robert Klemme, Dec 7, 2007
    #9