A few questions of function and style from a newbie

Discussion in 'Ruby' started by Sven Johansson, Dec 30, 2005.

  1. Hi, good people of clr,

    I just dipped into the goodness that is ruby for the first time
    yesterday, and while this group and the online docs proved useful, I'm
    left somewhat bewildered by a few things. Environment: Win XP SP2,
    one-click-install 1.8.2 ruby.

    1) Current working directories:
    I currently use

    f = __FILE__
    len = -f.length
    my_dir = File::expand_path(f)[0...len]

    To find the script's current working directory. Snappier alternatives
    such as

    my_dir = File.dirname(__FILE__)

    just report back with ".", which, while true, isn't exactly helpful.

    Problem: this only works if the script is invoked from the command line
    as "ruby this.rb". Trying to invoke it by double-clicking on the script
    in the windows explorer makes the above function return an empty
    string. Is there any way, short of embedding the call to ruby in a bat
    file, to make ruby read its current working directory even if invoked
    by double-clicking?

    2) MD5 hashes and file handles:
    I currently use something like

    Dir['*'].each {|f| print Digest::MD5.hexdigest(open(f, 'rb').read), ' ', f, "\n"}

    I tried stuff like

    Dir['*'].each {|f| print f, " "; puts Digest::MD5.hexdigest(File.read(f))}
    or
    dig=Digest::MD5.new
    dig.update(file)

    and they both seem to suffer from some sort of buffer on the directory
    reading; that is, they'll produce the same hash for several files when
    scanning a large directory. The first line above bypasses this, I
    suppose by the 'rb' reading mode on the file handle. Is there any way
    to unbuffer the directory file handle stream (akin to Perl's $|=1)?

    3) Finally, I submit my very first ruby script for merciless
    criticism. What here could have been done otherwise? What screams for a
    better ruby solution? I'm aware that I should probably look into
    split instead of relying so much on regexps for splitting and I was
    trying to set up a structure like hash[key]=[a,b], but I found I could
    not access hash.each_pair { |key,value] puts key, value(0), value (1)
    }.

    ------------------------------------------------------------------
    require 'digest/md5'
    require 'fileutils'

    # Variables to set manually
    global_digest_index='C:/srfctrl/indexfile/globalindex.txt'
    global_temp_directory='C:/srfctrl/tempstore/'
    global_collide_directory='C:/srfctrl/collide/'

    # Begin program
    f = __FILE__
    len = -f.length
    my_dir = File::expand_path(f)[0...len]
    my_dirname = my_dir.sub(/^.+\/(\w+?)\/$/,'\1')

    puts my_dir
    puts my_dirname

    digest_map_name={}
    digest_map_directory={}

    IO.foreach(global_digest_index) { |line|
      th_dige=line.sub(/^.+?\:(.+?)\:.+?$/,'\1').chomp
      th_fnam=line.sub(/^.+?\:.+?\:(.+?)$/,'\1').chomp
      th_dir=line.sub(/^(.+?)\:.+?\:.+?$/,'\1').chomp
      digest_map_name[th_dige] = th_fnam
      digest_map_directory[th_dige] = th_dir
    }

    filecnt = filesuc = 0
    outfile = File.new(global_digest_index, "a")
    Dir['*'].each do |file_name|
      next unless (file_name =~ /\.mp3$|\.ogg$/i)
      filecnt += 1
      hex = Digest::MD5.hexdigest(open(file_name, 'rb').read)
      if digest_map_name.has_key?(hex) then
        collfilestrip = digest_map_name[hex].sub(/\.mp3$|\.ogg$/i,'')
        id_name = global_collide_directory + digest_map_directory[hex].to_s +
          '_' + collfilestrip + '_' + file_name
        FileUtils.cp(file_name,id_name)
      else
        filesuc +=1
        digest_map_name[hex] = file_name
        digest_map_directory[hex] = my_dirname
        outfile.puts my_dirname + ':' + hex + ':' + file_name
        id_name = global_temp_directory + file_name
        FileUtils.cp(digest_map_name[hex],id_name)
      end
    end
    outfile.close

    puts "Processed " + filecnt.to_s + " files, out of which " +
      filesuc.to_s + " were not duplicates."
    ----------------------------------------------
    Sven Johansson, Dec 30, 2005
    #1

  2. Sven Johansson <> wrote:
    > Hi, good people of clr,
    >
    > I just dipped into the goodness that is ruby for the first time
    > yesterday, and while this group and the online docs proved useful, I'm
    > left somewhat bewildered by a few things. Environment: Win XP SP2,
    > one-click-install 1.8.2 ruby.
    >
    > 1) Current working directories:
    > I currently use
    >
    > f = __FILE__
    > len = -f.length
    > my_dir = File::expand_path(f)[0...len]
    >
    > To find the script's current working directory.


    No, you get the script's path - although this will incidentally match with
    the working directory when run in Windows (because the working directory
    defaults to the script directory).

    > Snappier alternatives
    > such as
    >
    > my_dir = File.dirname(__FILE__)
    >
    > just report back with ".", which, while true, isn't exactly helpful.


    You want File.expand_path like in

    >> File.expand_path('.')

    => "/home/Robert"

    Now:

    working_dir = File.expand_path( Dir.getwd )
    script_dir = File.expand_path( File.dirname(__FILE__) )
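To make the distinction concrete, here is a minimal sketch. The helper name `script_dir_of` is made up for illustration and is not part of Ruby. It prints both directories; they only coincide when the script happens to be started from its own directory.

```ruby
# Hypothetical helper: absolute path of the directory containing a script file.
def script_dir_of(script_path)
  File.expand_path(File.dirname(script_path))
end

working_dir = File.expand_path(Dir.getwd) # where the process was started
script_dir  = script_dir_of(__FILE__)     # where this file lives

puts "working dir: #{working_dir}"
puts "script dir:  #{script_dir}"
puts "same directory? #{working_dir == script_dir}"
```

Running it once from the script's own directory and once from somewhere else should show the last line flip.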

    > Problem: this only works if the script is invoked from the command
    > line as "ruby this.rb". Trying to invoke it by double-clicking on the
    > script in the windows explorer makes the above function return an
    > empty string. Is there any way, short of embedding the call to ruby
    > in a bat file, to make ruby read its current working directory even
    > if invoked by double-clicking?


    See above.

    > 2) MD5 hashes and file handles:
    > I currently use something like
    >
    > Dir['*'].each {|f| print Digest::MD5.hexdigest(open(f, 'rb').read), '
    > ', f, "\n"}
    >
    > I tried stuff like
    >
    > Dir['*'].each {|f|print f, " "; puts
    > Digest::MD5.hexdigest(File.read(f))}
    > or
    > dig=Digest::MD5.new
    > dig.update(file)
    >
    > and they both seem to suffer from some sort of buffer on the directory
    > reading; that is, they'll produce the same hash for several files when
    > scanning a large directory. The first line above bypasses this, I
    > suppose by the 'rb' reading mode on the file handle. Is there any way
    > to unbuffer the directory file handle stream (akin to Perl's $|=1)?


    Your code in the first line has at least these problems:

    1) You don't check for directories, i.e., you'll try to create MD5 of
    directories as well.

    2) You don't close files properly. You should use the block form of
    File.open - that way file handles are always closed properly and timely.
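Point 2 can be seen directly with a small sketch. The scratch file and variable names here are invented for the demonstration; only `File.open`, `IO#closed?` and `Tempfile` are real library calls.

```ruby
require 'tempfile'

# Scratch file so the example is self-contained.
scratch = Tempfile.new('handle-demo')
scratch.write('some bytes')
scratch.close

# Non-block form: the handle stays open until it is closed explicitly
# (or until the garbage collector gets around to it).
leaked = File.open(scratch.path, 'rb')
leaked.read
still_open = !leaked.closed?
puts "non-block form still open after read? #{still_open}"
leaked.close

# Block form: the handle is closed when the block exits, even on an exception.
kept = nil
data = File.open(scratch.path, 'rb') { |io| kept = io; io.read }
puts "block form closed after block? #{kept.closed?}"

scratch.unlink
```

In a loop over a large directory, the non-block form can leave many handles open at once, which is why the block form is the safer default.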

    Alternatives

    Dir['*'].each {|f| File.open(f,'rb') {|io| print f, " ",
    Digest::MD5.hexdigest(io.read), "\n" } if File.file? f}

    Dir['*'].each {|f| print f, " ", Digest::MD5.hexdigest(File.open(f,'rb')
    {|io| io.read}), "\n" if File.file? f}


    I can't reproduce the problem you state (identical digests) with the other
    lines of code. I tried

    Dir['*'].each {|f|print f, " "; puts Digest::MD5.hexdigest(File.read(f)) if
    File.file? f}

    But the problem here is that the file is not opened in binary mode which is
    a must for this to work.

    > 3) Finally, I submit my very first ruby script for merciless
    > criticism. What here could have been done otherwise? What screams for
    > a better ruby solution? I'm aware that I should probably look into
    > split instead of relying so much on regexps for splitting and I was
    > trying to set up a structure like hash[key]=[a,b], but I found I could
    > not access hash.each_pair { |key,value] puts key, value(0), value (1)
    > }.
    >
    > ------------------------------------------------------------------
    > require 'digest/md5'
    > require 'fileutils'
    >
    > # Variables to set manually
    > global_digest_index='C:/srfctrl/indexfile/globalindex.txt'
    > global_temp_directory='C:/srfctrl/tempstore/'
    > global_collide_directory='C:/srfctrl/collide/'
    >
    > # Begin program
    > f = __FILE__
    > len = -f.length
    > my_dir = File::expand_path(f)[0...len]
    > my_dirname = my_dir.sub(/^.+\/(\w+?)\/$/,'\1')
    >
    > puts my_dir
    > puts my_dirname
    >
    > digest_map_name={}
    > digest_map_directory={}
    >
    > IO.foreach(global_digest_index) { |line|
    > th_dige=line.sub(/^.+?\:(.+?)\:.+?$/,'\1').chomp
    > th_fnam=line.sub(/^.+?\:.+?\:(.+?)$/,'\1').chomp
    > th_dir=line.sub(/^(.+?)\:.+?\:.+?$/,'\1').chomp
    > digest_map_name[th_dige] = th_fnam
    > digest_map_directory[th_dige] = th_dir
    > }
    >
    > filecnt = filesuc = 0
    > outfile = File.new(global_digest_index, "a")
    > Dir['*'].each do |file_name|
    > next unless (file_name =~ /\.mp3$|\.ogg$/i)
    > filecnt += 1
    > hex = Digest::MD5.hexdigest(open(file_name, 'rb').read)
    > if digest_map_name.has_key?(hex) then
    > collfilestrip = digest_map_name[hex].sub(/\.mp3$|\.ogg$/i,'')
    > id_name = global_collide_directory + digest_map_directory[hex].to_s +
    > '_' + collfilestrip + '_' + file_name
    > FileUtils.cp(file_name,id_name)
    > else
    > filesuc +=1
    > digest_map_name[hex] = file_name
    > digest_map_directory[hex] = my_dirname
    > outfile.puts my_dirname + ':' + hex + ':' + file_name
    > id_name = global_temp_directory + file_name
    > FileUtils.cp(digest_map_name[hex],id_name)
    > end
    > end
    > outfile.close
    >
    > puts "Processed " + filecnt.to_s + " files, out of which " +
    > filesuc.to_s + " were not duplicates."
    > ----------------------------------------------


    It's not completely clear to me what you want to do here. Apparently you
    check a number of audio files and shove them somewhere else based on some
    criterion. What's the aim of doing this?

    Kind regards

    robert
    Robert Klemme, Dec 30, 2005
    #2

  3. Robert Klemme wrote:

    Thank you for your response! Quick and clarifying at the same time.

    > Sven Johansson <> wrote:


    > No, you get the script's path - although this will incidentally match with
    > the working directory when run in Windows (because the working directory
    > defaults to the script directory).
    >
    > You want File.expand_path like in
    >
    > >> File.expand_path('.')

    > => "/home/Robert"
    >
    > Now:
    >
    > working_dir = File.expand_path( Dir.getwd )
    > script_dir = File.expand_path( File.dirname(__FILE__) )


    Yes, indeed. All those work as advertised, even from the explorer
    shell. Thanks!

    > Your code in the first line has at least these problems:
    >
    > 1) You don't check for directories, i.e., you'll try to create MD5 of
    > directories as well.
    >
    > 2) You don't close files properly. You should use the block form of
    > File.open - that way file handles are always closed properly and timely.
    >
    > Alternatives
    >
    > Dir['*'].each {|f| File.open(f,'rb') {|io| print f, " ",
    > Digest::MD5.hexdigest(io.read), "\n" } if File.file? f}
    >
    > Dir['*'].each {|f| print f, " ", Digest::MD5.hexdigest(File.open(f,'rb')
    > {|io| io.read}), "\n" if File.file? f}
    >
    > I can't reproduce the problem you state (identical digests) with the other
    > lines of code. I tried
    >
    > Dir['*'].each {|f|print f, " "; puts Digest::MD5.hexdigest(File.read(f)) if
    > File.file? f}


    Using:
    require 'digest/md5'
    Dir['*'].each {|f| print f, " "; puts Digest::MD5.hexdigest(File.read(f)) if File.file? f}

    gives

    001.mp3 6ce4ad47bfa79b6c0e48636040c1dfb9
    002.mp3 6ce4ad47bfa79b6c0e48636040c1dfb9
    0022-042.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-043.ogg 5947035093bbfa22a9e7cf6e69b82a4e
    0022-044.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-045.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-046.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-047.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-048.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-049.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-050.ogg 5947035093bbfa22a9e7cf6e69b82a4e
    0022-057.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-058.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-059.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-061.ogg a7d6f03e275d69b363b9771c9d88e681
    0022-062.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-069.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-070.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-071.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-072.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-073.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-074.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-077.ogg 4cac5ea5e666942920aff937aa9b3ee5
    0022-078.ogg 5947035093bbfa22a9e7cf6e69b82a4e
    [snip]

    which clearly isn't good. However, both your suggested alternatives
    above work just fine. It would seem that binary mode really is a must
    on Win32 - exchanging 'rb' for 'r' in those suggestions gives me the
    hash repeat problem again. Good to know.

    > But the problem here is that the file is not opened in binary mode which is
    > a must for this to work.


    Yes, so it would seem.

    > It's not completely clear to me what you want to do here. Apparently you
    > check a number of audio files and shove them somewhere else based on some
    > criterion. What's the aim of doing this?


    Oh, it works as it's supposed to do, so I'm not really trying to debug
    it. It takes the hashes of all the files in a directory, compares them
    to a global list of hashes, appends the new unique hashes to that list
    and moves the corresponding files someplace, moves files that already
    have "their" hashes in the list someplace else. The rest is just
    morphing file names.

    I was looking for more input along the line of "that's not how we do it
    in ruby - this is how we would express this particular sort of
    statement".

    I realise that the first thing I should do is probably to read the
    files by block instead of slurping them in wholesale, and that I would
    be far better off maintaining the global list of hashes in a DB
    instead of in a text file. I'll try my hand at the first, now that
    I've gotten the hash and filehandle issue resolved above... as for the
    second, taking a peek at this group reveals that making ruby talk with
    mysql on Win32 isn't for the faint of heart, so I'll let that be for
    now.

    Thanks again!
    /Sven
    Sven Johansson, Dec 30, 2005
    #3
  4. Sven Johansson <> wrote:
    > Robert Klemme wrote:
    >
    > Thank you for your response! Quick and clarifying at the same time.


    You're welcome!

    <snip/>

    > which clearly isn't good. However, both your suggested alternatives
    > above work just fine. It would seem that binary mode really is a must
    > on Win32 - exchanging 'rb' for 'r' in those suggestions gives me the
    > hash repeat problem again. Good to know.


    When calculating the hash digest of a file, binary mode is really the only
    reasonable way to do it. Guess you just found another reason. :)
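Combining binary mode with the "read by block" idea raised earlier in the thread could look roughly like this. It is a sketch, not code from the thread: `md5_of_file` and `CHUNK_SIZE` are hypothetical names, and the chunk size is an arbitrary assumption.

```ruby
require 'digest/md5'

CHUNK_SIZE = 64 * 1024 # assumed chunk size, tune as needed

# Hypothetical helper: MD5 of a file, read in binary mode one chunk at a time,
# so memory use stays flat no matter how large the file is.
def md5_of_file(path, chunk_size = CHUNK_SIZE)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Usage: ruby md5_files.rb file1 file2 ...
ARGV.each { |path| puts "#{md5_of_file(path)} #{path}" if File.file?(path) }
```

The result should match `Digest::MD5.hexdigest` over the whole file contents; the only difference is that the file is never held in memory in one piece.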

    >> It's not completely clear to me what you want to do here.
    >> Apparently you check a number of audio files and shove them
    >> somewhere else based on some criterion. What's the aim of doing
    >> this?

    >
    > Oh, it works as it's supposed to do, so I'm not really trying to debug
    > it. It takes the hashes of all the files in a directory, compares them
    > to a global list of hashes, appends the new unique hashes to that list
    > and moves the corresponding files someplace, moves files that already
    > have "their" hashes in the list someplace else. The rest is just
    > morphing file names.
    >
    > I was looking for more input along the line of "that's not how we do
    > it in ruby - this is how we would express this particular sort of
    > statement".


    Yes, I was aware of that. I just wanted to know the purpose of the code so
    I might be able to make more appropriate statements. :)

    > I realise that the first thing I should do is probably to read the
    > files by block instead of slurping them in wholesale, and that I would
    > be far better off maintaining the global list of hashes in a DB
    > instead of in a text file. I'll try my hand at the first, now that
    > I've gotten the hash and filehandle issue resolved above... as for the
    > second, taking a peek at this group reveals that making ruby talk with
    > mysql on Win32 isn't for the faint of heart, so I'll let that be for
    > now.


    The easiest way to store some arbitrary Ruby structure is to use YAML or
    Marshal. I'd probably do something like this:

    require 'digest/md5'
    require 'fileutils'

    REPO_FILE = "repo.bin".freeze

    class Repository
      attr_accessor :main_dir, :duplicate_dir, :extensions

      def initialize(extensions = %w{mp3 ogg})
        @extensions = extensions
        @repository = {}
      end

      def process_dir(dir)
        # find all files with the extensions we support;
        # Dir[] already returns paths that include dir
        Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
          process_file(f)
        end
      end

      def process_file(file)
        digest = digest(file)
        name = @repository[digest]

        if name
          target = duplicate_dir
          # ...
        else
          target = main_dir
          # ...
        end

        FileUtils.cp( file, File.join( target, File.basename( file ) ) )
      end

      def digest(file)
        Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
      end

      def self.load(file)
        File.open(file, 'rb') {|io| Marshal.load(io)}
      end

      def save(file)
        File.open(file, 'wb') {|io| Marshal.dump(self, io)}
      end
    end


    repo = begin
      Repository.load( REPO_FILE )
    rescue Exception => e
      # not there => create
      r = Repository.new
      r.main_dir = "foo"
      r.duplicate_dir = "bar"
      r
    end

    ARGV.each {|dir| repo.process_dir(dir)}

    repo.save( REPO_FILE )

    The main point here is to encapsulate certain functionality into methods
    of their own. This greatly increases readability and reusability.
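In the same spirit, the index parsing from the original script could become a method of its own. The sketch below is an illustration, not code from the thread: `parse_index` is a hypothetical name, and the directory name "albums" in the sample line is made up. It uses `split` instead of three regexps and stores a two-element array per digest, with elements reached by square brackets (`value[0]`, not `value(0)`).

```ruby
# Hypothetical helper: turn "dir:digest:filename" lines into
# { digest => [dir, filename] }.
def parse_index(lines)
  index = {}
  lines.each do |line|
    # Limit of 3 keeps any further colons inside the file name.
    dir, digest, name = line.chomp.split(':', 3)
    next unless name # skip lines that do not have all three fields
    index[digest] = [dir, name]
  end
  index
end

index = parse_index(["albums:6ce4ad47bfa79b6c0e48636040c1dfb9:001.mp3\n"])

# Array values are indexed with [] ...
index.each_pair { |key, value| puts key, value[0], value[1] }

# ... or unpacked directly in the block parameters.
index.each_pair { |digest, (dir, name)| puts "#{digest} #{dir} #{name}" }
```

With the real index file, something like `parse_index(File.readlines(path))` would feed it, where `path` stands in for the index location.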

    Kind regards

    robert
    Robert Klemme, Dec 30, 2005
    #4
  5. Robert Klemme wrote:

    > The easiest way to store some arbitrary Ruby structure is to use YAML or
    > Marshal. I'd probably do something like this:
    >
    > REPO_FILE = "repo.bin".freeze
    >
    > class Repository
    > attr_accessor :main_dir, :duplicate_dir, :extensions
    >
    > def initialize(extensions = %w{mp3 ogg})
    > @extensions = extensions
    > @repository = {}
    > end
    >
    > def process_dir(dir)
    > # find all files with the extensions we support
    > Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
    > process_file(f)
    > end
    > end
    >
    > def process_file(file)
    > digest = digest(file)
    > name = @repository[digest]
    >
    > if name
    > target = duplicate_dir
    > # ...
    > else
    > target = main_dir
    > # ...
    > end
    >
    > FileUtils.cp( file, File.join( target, File.basename( file ) ) )
    > end
    >
    > def digest(file)
    > Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
    > end
    >
    > def self.load(file)
    > File.open(file, 'rb') {|io| Marshal.load(io)}
    > end
    >
    > def save(file)
    > File.open(file, 'wb') {|io| Marshal.dump(self, io)}
    > end
    > end
    >
    >
    > repo = begin
    > Repository.load( REPO_FILE )
    > rescue Exception => e
    > # not there => create
    > r = Repository.new
    > r.main_dir = "foo"
    > r.duplicate_dir = "bar"
    > r
    > end
    >
    > ARGV.each {|dir| repo.process_dir(dir)}
    >
    > repo.save( REPO_FILE )
    >
    > The main point being here to encapsulate certain functionality into methods
    > of their own. This greatly increases readability and reusability.


    Very informative indeed, if perhaps more than a bit humbling! Thank you
    again.

    One last question, then - while the style above is easily more readable
    and quite... enjoyable, for lack of a better word, to read, how does
    Ruby measure up when it comes to passing all those variables around to
    functions (method calls) all the time? Do I lose significant
    performance by having method calls in inner loops? And no, I can hear
    it already; "Dude, you traverse big directories, do calculations on a
    big number of big files and push the filesystem to its limits copying
    them like there was no tomorrow already..." Obviously, it doesn't
    matter here. But would it matter if one was writing, say, a port
    listener or some other reasonably performance critical application?
    Sven Johansson, Dec 31, 2005
    #5
  6. Sven Johansson <> wrote:

    <snip/>

    > One last question, then - while the style above is easily more
    > readable and quite... enjoyable, for lack of a better word, to read,
    > how does Ruby measure up when it comes to passing all those variables
    > around to functions (method calls) all the time? Do I lose significant
    > performance by having method calls in inner loops? And no, I can hear
    > it already; "Dude, you traverse big directories, do calculations on a
    > big number of big files and push the filesystem to its limits copying
    > them like there was no tomorrow already..." Obviously, it doesn't
    > matter here. But would it matter if one was writing, say, a port
    > listener or some other reasonably performance critical application?


    There are two ways to answer this: reasoning and testing. You'll get the
    definitive answer only by measuring performance of a real application. On
    the theoretical side we can state this: first, Ruby uses call by value, but
    the values are object references. Objects are not copied, as they are with
    call by value in C++; caller and callee each hold their own reference to the
    same object, so assigning to a parameter inside the method does not affect
    the calling environment. This has rather low overhead compared to a real
    call by value, where objects must be copied. Second, every method call has
    a certain overhead attached to it (unless a runtime system such as the Java
    VM inlines it at run time).
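The reference-passing behaviour described above can be checked with a few lines. The method names `reassign` and `mutate` are invented for this illustration.

```ruby
# Rebinding the parameter only changes the method's own reference;
# the caller's variable still points at the original object.
def reassign(text)
  text = "replaced"
  text
end

# Mutating the object is visible to the caller: both references
# point at the same object, nothing was copied on the way in.
def mutate(text)
  text << " world"
end

greeting = String.new("hello")

reassign(greeting)
puts greeting # still "hello"

mutate(greeting)
puts greeting # now "hello world"
```

So the cost of passing an argument is the cost of passing one reference, regardless of how large the object behind it is.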

    A simple test shows that there is indeed significant overhead attached to
    method invocations - if methods perform simple tasks. The relative overhead
    of course depends on the work the method performs. I for my part would
    always start with a modularized version and only inline methods if this is
    actually a cure for a performance problem. There is a famous quote about
    premature optimization...

    #! /usr/bin/env ruby

    require 'benchmark'

    REP = 1_000_000

    def foo(n) 0 + n end

    Benchmark.bmbm(10) do |bm|
      bm.report("direct") do
        REP.times { x = 0 + 1 }
      end

      bm.report("method") do
        REP.times { x = foo(1) }
      end
    end


    Rehearsal ---------------------------------------------
    direct      1.188000   0.000000   1.188000 (  1.201000)
    method      2.156000   0.000000   2.156000 (  2.166000)
    ------------------------------------ total: 3.344000sec

                    user     system      total        real
    direct      1.187000   0.000000   1.187000 (  1.217000)
    method      2.172000   0.000000   2.172000 (  2.234000)

    $ ruby -e 'puts 2.172000 / 1.187000'
    1.82982308340354

    Happy new year!

    robert
    Robert Klemme, Dec 31, 2005
    #6
  7. Tim Hammerquist wrote:

    > Just for our edification, would you run this following code on
    > those same files?
    >
    > require 'digest/md5'
    >
    > files = Dir['*'].select { |f| File.file?(f) }
    >
    > files.each { |filename|
    > fs_size = File.size(filename) # get size of file from OS
    >
    > data = File.read(filename) # read the file
    > data_size = data.length # get the size of the data read
    >
    > hash = Digest::MD5.hexdigest(data) # calculate hash
    >
    > # compare amount of data on filesystem
    > # with amount of data read
    > puts "#{hash} - #{filename}: #{data_size}/#{fs_size}"
    > }
    >


    Sure. Here it is:

    6ce4ad47bfa79b6c0e48636040c1dfb9 - 001.mp3: 52/50344
    6ce4ad47bfa79b6c0e48636040c1dfb9 - 002.mp3: 52/52468
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-042.ogg: 335/141226
    5947035093bbfa22a9e7cf6e69b82a4e - 0022-043.ogg: 335/118208
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-044.ogg: 335/178869
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-045.ogg: 335/181622
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-046.ogg: 335/154218
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-047.ogg: 335/161483
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-048.ogg: 335/147162
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-049.ogg: 335/145142
    5947035093bbfa22a9e7cf6e69b82a4e - 0022-050.ogg: 335/149968
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-057.ogg: 335/161358
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-058.ogg: 335/156026
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-059.ogg: 335/176575
    a7d6f03e275d69b363b9771c9d88e681 - 0022-061.ogg: 335/148704
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-062.ogg: 335/186715
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-069.ogg: 335/173036
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-070.ogg: 335/173752
    4cac5ea5e666942920aff937aa9b3ee5 - 0022-071.ogg: 335/173581
    [snip]

    Which... hmm... does this mean that File.read(filename) will only read
    as far as the first perceived end of line in the binary file? Here I
    thought that would slurp up the entire file no matter what, even if it
    played havoc with the "lines" of the file. Given that it seems to read
    as much per file for each file type, it would seem it just reads and
    hashes the file header before it encounters something that it considers
    to be an end of line. But then again, shouldn't all the hashes be
    identical for the same header - if they are not, you'd think it'd read
    somewhat more or less of the file?
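One plausible explanation, not confirmed anywhere in this thread: in text mode the Windows C runtime treats a Ctrl-Z byte (0x1A) as end of file, so a text-mode read stops at the first such byte rather than at an end of line. Under that assumption, the sketch below reports where the first 0x1A sits in each file; if the theory holds, the offset should match the short byte counts in the listing above. The helper name `first_ctrl_z_offset` is hypothetical.

```ruby
# Hypothetical helper: offset of the first Ctrl-Z (0x1A) byte in a file,
# or nil if the file contains none. Reads in binary mode so nothing is cut off.
def first_ctrl_z_offset(path)
  data = File.open(path, 'rb') { |io| io.read }
  data.index("\x1A")
end

# Usage: ruby ctrl_z.rb file1 file2 ...
ARGV.each do |path|
  next unless File.file?(path)
  offset = first_ctrl_z_offset(path)
  puts "#{path}: first 0x1A at #{offset.inspect} of #{File.size(path)} bytes"
end
```

Note that two files truncated to the same length need not contain the same bytes, which would be consistent with seeing a few different hashes among files that all report the same short length.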
    Sven Johansson, Jan 1, 2006
    #7
  8. Robert Klemme wrote:

    > A simple test shows that there is indeed significant overhead attached to
    > method invocations - if methods perform simple tasks. The relative overhead
    > of course depends on the work the method performs. I for my part would
    > always start with a modularized version and only inline methods if this is
    > actually a cure for a performance problem. There is a famous quote about
    > premature optimization...


    Heh. But how can we know if it is premature unless we have an idea about
    how inefficient method calls are? :) Nevertheless, your point is well
    taken.

    > #! /usr/bin/env ruby
    >
    > require 'benchmark'
    >
    > REP = 1_000_000
    >
    > def foo(n) 0 + n end
    >
    > Benchmark.bmbm(10) do |bm|
    > bm.report("direct") do
    > REP.times { x = 0 + 1 }
    > end
    >
    > bm.report("method") do
    > REP.times { x = foo(1) }
    > end
    > end
    >
    >
    > Rehearsal ---------------------------------------------
    > direct 1.188000 0.000000 1.188000 ( 1.201000)
    > method 2.156000 0.000000 2.156000 ( 2.166000)
    > ------------------------------------ total: 3.344000sec
    >
    > user system total real
    > direct 1.187000 0.000000 1.187000 ( 1.217000)
    > method 2.172000 0.000000 2.172000 ( 2.234000)
    >
    > $ ruby -e 'puts 2.172000 / 1.187000'
    > 1.82982308340354


    Ah, interesting. And a fine example of how to use the internal
    benchmarking support for us Ruby newbies. More to the point, it shows
    that it isn't so bad - I was thinking of order-of-magnitude losses, and
    here is merely a factor two or so, and that's for essentially empty
    method bodies... it will do. :)

    > Happy new year!


    To you as well!
    /Sven
    Sven Johansson, Jan 1, 2006
    #8
  9. Sven Johansson

    Ryan Davis Guest

    On Jan 1, 2006, at 8:12 AM, Sven Johansson wrote:

    >> require 'benchmark'
    >>
    >> REP = 1_000_000
    >>
    >> def foo(n) 0 + n end
    >>
    >> Benchmark.bmbm(10) do |bm|
    >> bm.report("direct") do
    >> REP.times { x = 0 + 1 }
    >> end
    >>
    >> bm.report("method") do
    >> REP.times { x = foo(1) }
    >> end
    >> end

    >
    > Ah, interesting. And a fine example of how to use the internal
    > benchmarking support for us Ruby newbies. More to the point, it shows
    > that it isn't so bad - I was thinking of order-of-magnitude losses,
    > and
    > here is merely a factor two or so, and that's for essentially empty
    > method bodies... it will do. :)


    No, you discovered the difference between 1 method invocation
    (Fixnum.+) and 2 (Kernel.foo and Fixnum.+). If you are worried about times,
    I'd look at using a good profiler instead of the benchmarks so you
    can get insight on where your time is actually being spent (it sure
    isn't on Fixnum.+). Don't use the standard profiler, use zenprofiler
    or shugo's profiler.

    --
    - http://blog.zenspider.com/
    http://rubyforge.org/projects/ruby2c/
    http://rubyforge.org/projects/rubyinline/
    Ryan Davis, Jan 2, 2006
    #9
  10. Sven Johansson

    Ryan Davis Guest

    On Jan 1, 2006, at 8:12 AM, Sven Johansson wrote:


    >> require 'benchmark'
    >>
    >> REP = 1_000_000
    >>
    >> def foo(n) 0 + n end
    >>
    >> Benchmark.bmbm(10) do |bm|
    >> bm.report("direct") do
    >> REP.times { x = 0 + 1 }
    >> end
    >>
    >> bm.report("method") do
    >> REP.times { x = foo(1) }
    >> end
    >> end
    >>

    >
    > Ah, interesting. And a fine example of how to use the internal
    > benchmarking support for us Ruby newbies. More to the point, it shows
    > that it isn't so bad - I was thinking of order-of-magnitude losses,
    > and
    > here is merely a factor two or so, and that's for essentially empty
    > method bodies... it will do. :)
    >


    No, you discovered the difference between 1 method invocation
    (Fixnum.+) and 2 (Kernel.foo and Fixnum.+). "0 + 1" is a method invocation
    just like any other. You can see that by using ParseTree's
    parse_tree_show utility:

    [:call, [:lit, 0], :+, [:array, [:lit, 1]]]

    If you are worried about times, I'd look at using a good profiler
    instead of the benchmarks so you can get insight on where your time
    is actually being spent (it sure isn't on Fixnum.+). Don't use the
    standard profiler, use zenprofiler or shugo's profiler.
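The claim that "0 + 1" is an ordinary method call can also be checked without ParseTree, using only core methods. This is a small sketch; the class printed on the last line depends on the Ruby version (Fixnum on the 1.8 series discussed here, where `Method#owner` needs 1.8.7 or later, and Integer on current releases).

```ruby
# Three spellings of the same call: operator syntax, explicit method
# syntax, and dynamic dispatch by name.
puts 0 + 1
puts 0.+(1)
puts 0.send(:+, 1)

# The receiver really does respond to a method named :+ ...
puts 0.respond_to?(:+)

# ... and this prints the class that defines it.
puts 0.method(:+).owner
```

So the "direct" case in the benchmark is itself paying for one method dispatch, and the "method" case pays for two.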

    --
    - http://blog.zenspider.com/
    http://rubyforge.org/projects/ruby2c/
    http://rubyforge.org/projects/rubyinline/
    Ryan Davis, Jan 2, 2006
    #10