IO#sysread on Windows

pihentagy · Jun 14, 2006

Hi!

I tried to write a file dupe finder. For this to work, I created an
improved File::Stat, like this:

require 'digest/sha1'

class File::StatWithSha < File::Stat
  attr_reader :filename, :read

  def initialize(fn)
    @filename = File.expand_path(fn)
    @read = 0
    super(fn)
  end

  def sha1sum
    return @sha1sum if @sha1sum ||= nil
    warn "Calculating sha1sum for #@filename"
    chunk = nil
    fs = 0
    d = Digest::SHA1.new
    File.open(filename) {|f|
      begin
        while chunk = f.sysread(1048576)
          fs += chunk.length
          d.update(chunk)
        end
      rescue EOFError
        warn "\nResult is #{d} #{fs} <=> #{self.size}"
        return @sha1sum = d
      rescue => e
        warn "Holy shit! #{e}"
      end
    }
    warn "Oh my god!"
    exit
  end

  def inspect; @filename; end
end

Under Windows, it fails with both Ruby 1.8.2 and Ruby 1.8.4:

irb(main):006:0> fws.sha1sum
Calculating sha1sum for F:/private/prg/ruby/g2.rb
Chunk is 2113

Result is c75de1a39ce389e7e198c97345ffad52b074e5e9 2113 <=> 2210
=> c75de1a39ce389e7e198c97345ffad52b074e5e9

Under Linux it works fine.
Anyway, how should I calculate the sha1sum of a BIG file using just
Ruby?

Tim Hunter · Jun 14, 2006

pihentagy said:
Hi!

I tried to write a file dupe finder. For this to work, I created an
improved File::Stat, like this:

require 'digest/sha1'

class File::StatWithSha < File::Stat
  attr_reader :filename, :read

  def initialize(fn)
    @filename = File.expand_path(fn)
    @read = 0
    super(fn)
  end

  def sha1sum
    return @sha1sum if @sha1sum ||= nil
    warn "Calculating sha1sum for #@filename"
    chunk = nil
    fs = 0
    d = Digest::SHA1.new
    File.open(filename) {|f|
      begin
        while chunk = f.sysread(1048576)
          fs += chunk.length
          d.update(chunk)
        end
      rescue EOFError
        warn "\nResult is #{d} #{fs} <=> #{self.size}"
        return @sha1sum = d
      rescue => e
        warn "Holy shit! #{e}"
      end
    }
    warn "Oh my god!"
    exit
  end

  def inspect; @filename; end
end

Under Windows, it fails with both Ruby 1.8.2 and Ruby 1.8.4:

irb(main):006:0> fws.sha1sum
Calculating sha1sum for F:/private/prg/ruby/g2.rb
Chunk is 2113

Result is c75de1a39ce389e7e198c97345ffad52b074e5e9 2113 <=> 2210
=> c75de1a39ce389e7e198c97345ffad52b074e5e9

Under Linux it works fine.

Probably you should open the files with "rb" instead of letting it
default to "r".
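
A minimal sketch of that suggestion, which also covers hashing a big
file without loading it all at once. The method name sha1_of and the
1 MB default chunk size are illustrative choices, not anything from the
original code; it uses read rather than sysread, so there is no
EOFError to rescue.

```ruby
require 'digest/sha1'

# Hypothetical helper: hash a file in fixed-size chunks so that a big
# file never has to fit in memory.
def sha1_of(path, chunk_size = 1 << 20)
  digest = Digest::SHA1.new
  # "rb" opens the file in binary mode, which disables the
  # \r\n -> \n translation that text mode applies on Windows.
  File.open(path, 'rb') do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```

With binary mode the number of bytes hashed should match File.size
on every platform.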

Anyway, how should I calculate the sha1sum of a BIG file, just using
ruby?

For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.
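
For reference, the same comparison is available directly as
FileUtils.compare_file. A small sketch, where the wrapper name
same_content? and the up-front size check are my own additions:

```ruby
require 'fileutils'

# Hypothetical wrapper: true when both files hold identical bytes.
def same_content?(path_a, path_b)
  # Files of different sizes can never match, so skip reading them.
  return false unless File.size(path_a) == File.size(path_b)
  # compare_file reads both files and reports whether their contents are equal.
  FileUtils.compare_file(path_a, path_b)
end
```

This is cheapest when only a pair of files needs checking, since it
can stop at the first difference.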

Robert Klemme · Jun 14, 2006

Tim said:
For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.

It depends. If you want to find duplicates in a set of files then using
the digest as hash key can make finding duplicates much faster. OTOH if
you can detect candidates by looking at other attributes (size,
mtime...) then the additional overhead for the checksum calculation
might slow things down. It depends - as always.

Btw, I don't see a reason to use sysread in this scenario. read will do.
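
To illustrate the digest-as-hash-key idea, here is a rough sketch. The
method name group_by_digest is made up for the example, and it assumes
a Ruby recent enough to provide Digest::SHA1.file, which reads the file
for you:

```ruby
require 'digest/sha1'

# Hypothetical helper: map each digest to the list of paths sharing it.
def group_by_digest(paths)
  groups = Hash.new { |hash, key| hash[key] = [] }
  paths.each do |path|
    # The hex digest is the hash key, so files with equal digests
    # land in the same bucket after one pass over the file set.
    groups[Digest::SHA1.file(path).hexdigest] << path
  end
  # Keep only the digests that were seen more than once.
  groups.select { |_digest, files| files.size > 1 }
end
```

Each file is read exactly once, however many candidates there are.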

Kind regards

robert

pihentagy · Jun 14, 2006

Tim said:
Probably you should open the files with "rb" instead of letting it
default to "r".

Holy s**t! I had only tried it on text files (and it failed), so I
didn't see why the mode would matter.
Ah, it's that damned \r\n -> \n translation, I guess.

For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.

Well, first I'd like to partition the files by size, and only then
compare them.
If more than 2 files share the same size, it's better to calculate the
sha1sum once for each of the files involved. And if you'd like to be
on the safe side, you can compare the contents of the files that have
the same sha1sum.
You can also improve on this by caching the sha1sums (say, in a file
in every directory).
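
Roughly, the plan above could look like this. It is only a sketch:
find_duplicates is a made-up name, it assumes Digest::SHA1.file is
available, and it leaves out both the final content check for files
with equal sha1sums and the caching.

```ruby
require 'digest/sha1'
require 'fileutils'

# Hypothetical helper: returns an array of groups of duplicate paths.
def find_duplicates(paths)
  duplicates = []
  # Step 1: partition by size; a file with a unique size has no duplicate.
  paths.group_by { |path| File.size(path) }.each_value do |same_size|
    next if same_size.size < 2
    if same_size.size == 2
      # Exactly two candidates: compare their contents directly.
      duplicates << same_size if FileUtils.compare_file(*same_size)
      next
    end
    # Step 2: more than two candidates, so hash each file once.
    by_sha = same_size.group_by { |path| Digest::SHA1.file(path).hexdigest }
    by_sha.each_value do |same_sha|
      duplicates << same_sha if same_sha.size > 1
    end
  end
  duplicates
end
```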
