Bill Kelly said:
Hi - does the file really contain text lines, or is it a file
full of binary data? If it's a binary file, there may be no
guarantee the whole thing isn't one very long "line". In that
case I'd recommend reading it in chunks.
Untested:
require 'digest/md5'

md5 = Digest::MD5.new
File.open(file, 'rb') do |io|
  # read the file in 4 KB chunks instead of line by line
  while (buf = io.read(4096)) && buf.length > 0
    md5.update(buf)
  end
end
io.read(4096) returns nil at EOF, so your test for positive length is
redundant. Also, for the sake of error handling I'd create the digest inside
the block, because then the digest is never created if the file cannot be
opened:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  while (buf = io.read(4096))
    dig.update(buf)
  end
  dig
end
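A quick way to check the EOF behaviour is the sketch below. It uses a StringIO as a stand-in for the opened file (my assumption, just to keep it self-contained); the variable names are made up for illustration:

```ruby
require 'stringio'

# StringIO stands in for a real file here (assumption for the demo).
io = StringIO.new("abcdef")

chunks = []
# read(4) returns nil once EOF is reached, which ends the loop
while (buf = io.read(4))
  chunks << buf
end

at_eof_with_length    = io.read(4)  # nil at EOF when a length is given
at_eof_without_length = io.read     # "" at EOF when no length is given

p chunks
p at_eof_with_length
p at_eof_without_length
```

Note the difference: only the form with an explicit length returns nil at EOF, so the loop condition relies on passing a length.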
If you want to increase efficiency, you can pass a buffer to io.read. It is
then reused for every chunk, so no new string is allocated on each read:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  while io.read(4096, buf)
    dig.update(buf)
  end
  dig
end
Here's another nice variant:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  dig.update(buf) while io.read(4096, buf)
  dig
end
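For completeness, here is a self-contained sketch of the buffer-reusing variant wrapped in a method. The method name md5_of_file and the chunk_size parameter are hypothetical, and I use String.new instead of "" for the buffer on the assumption that the script might run with frozen string literals enabled. It returns the hex digest rather than the digest object:

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: the name and the default chunk size are assumptions.
def md5_of_file(path, chunk_size = 4096)
  File.open(path, 'rb') do |io|
    dig = Digest::MD5.new
    buf = String.new  # mutable buffer, reused for every read
    dig.update(buf) while io.read(chunk_size, buf)
    dig.hexdigest
  end
end

# Usage: compare the chunked digest with hashing the whole file at once.
Tempfile.create('md5-demo') do |tmp|
  tmp.binmode
  tmp.write("\x00\xFFbinary data\n" * 10_000)
  tmp.flush
  whole = Digest::MD5.hexdigest(File.binread(tmp.path))
  puts md5_of_file(tmp.path) == whole
end
```

Reading the whole file with File.binread is only done here for the comparison; for large files the chunked loop is the point of the exercise.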
Kind regards
robert