IO#sysread on Windows


pihentagy

Hi!

I tried to write a file dupe finder. For this to work, I created an
improved File::Stat, like this:

require 'digest/sha1'

class File::StatWithSha < File::Stat
  attr_reader :filename, :read
  def initialize fn
    @filename=File.expand_path fn
    @read = 0
    super fn
  end
  def sha1sum
    return @sha1sum if @sha1sum ||= nil
    warn "Calculating sha1sum for #@filename"
    chunk = nil
    fs = 0
    d = Digest::SHA1.new
    File.open(filename) {|f|
      begin
        while chunk = f.sysread(1048576)
          fs += chunk.length
          d.update(chunk)
        end
      rescue EOFError
        warn "\nResult is #{d} #{fs} <=> #{self.size}"
        return @sha1sum = d
      rescue => e
        warn "Holy shit! #{e}"
      end
    }
    warn "Oh my god!"
    exit
  end
  def inspect; @filename;end
end


Under Windows, it fails with both Ruby 1.8.2 and Ruby 1.8.4:

irb(main):006:0> fws.sha1sum
Calculating sha1sum for F:/private/prg/ruby/g2.rb
Chunk is 2113

Result is c75de1a39ce389e7e198c97345ffad52b074e5e9 2113 <=> 2210
=> c75de1a39ce389e7e198c97345ffad52b074e5e9

Under Linux it works fine.
Anyway, how should I calculate the sha1sum of a BIG file, just using
Ruby?
 

Tim Hunter

pihentagy said:
Under Windows, it fails with both Ruby 1.8.2 and Ruby 1.8.4
[...]
Under Linux it works fine.


Probably you should open the files with "rb" instead of letting it
default to "r".
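
As a rough illustration of that suggestion, here is a minimal sketch that hashes a file in fixed-size chunks after opening it in binary mode. The helper name `sha1_of_file` and the 1 MiB default chunk size are assumptions made for this example, not something from the original post.

```ruby
require 'digest/sha1'

# Hypothetical helper (not from the thread): hash a file chunk by chunk
# so that even a very large file never has to fit in memory.
# Returns [hex digest, number of bytes read].
def sha1_of_file(path, chunk_size = 1 << 20)
  digest = Digest::SHA1.new
  bytes_read = 0
  # "rb" = read-only, binary mode: no newline translation on Windows.
  File.open(path, "rb") do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      bytes_read += chunk.bytesize
      digest.update(chunk)
    end
  end
  [digest.hexdigest, bytes_read]
end
```

With binary mode, the byte count returned here should match `File.size(path)`, which is the check the original `warn` line was doing. Newer Ruby versions also ship `Digest::SHA1.file(path)`, which may be all that is needed.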

pihentagy said:
Anyway, how should I calculate the sha1sum of a BIG file, just using
Ruby?

For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.
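
A direct comparison along those lines can be sketched with `FileUtils.compare_file`, which reads both files and stops at the first difference. The wrapper name `same_content?` and the up-front size check are assumptions added for illustration.

```ruby
require 'fileutils'

# Hypothetical helper: true when both paths hold identical bytes.
def same_content?(path_a, path_b)
  # A size mismatch settles the question without reading either file.
  return false unless File.size(path_a) == File.size(path_b)
  # compare_file walks both files and returns false at the first difference.
  FileUtils.compare_file(path_a, path_b)
end
```

For a single pair of files this avoids computing any digest at all.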
 

Robert Klemme

Tim said:
For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.

It depends. If you want to find duplicates in a set of files then using
the digest as hash key can make finding duplicates much faster. OTOH if
you can detect candidates by looking at other attributes (size,
mtime...) then the additional overhead for the checksum calculation
might slow things down. It depends - as always. :)
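
To make the "digest as hash key" idea concrete, here is a small sketch. The helper name `group_by_digest` is hypothetical, and it assumes a Ruby version that provides `Digest::SHA1.file`.

```ruby
require 'digest'
require 'digest/sha1'

# Hypothetical helper: map each SHA1 hex digest to the paths that share it,
# keeping only digests seen more than once.
def group_by_digest(paths)
  groups = Hash.new { |hash, key| hash[key] = [] }
  paths.each do |path|
    # The digest is the hash key, so each file is read exactly once.
    groups[Digest::SHA1.file(path).hexdigest] << path
  end
  groups.select { |_digest, files| files.size > 1 }
end
```

Every file is read once, instead of each pair of files being compared against each other.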

Btw, I don't see a reason to use sysread in this scenario. read will do.

Kind regards

robert
 

pihentagy

Tim said:
Probably you should open the files with "rb" instead of letting it
default to "r".

Holy s**t! Since I tried it on text files and it failed, I didn't see
why the mode would matter anyway.
Ah, that damned \r\n -> \n translation, I guess.
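
The effect can be emulated on any platform. The sketch below does not perform a real text-mode read; the hypothetical helper `text_mode_view` just collapses each \r\n to \n, which is the assumed behaviour of a text-mode read on Windows. The gap in the output above (2210 - 2113 = 97 bytes) would be consistent with 97 CRLF line endings in g2.rb, though that is only a guess.

```ruby
require 'digest/sha1'

# Hypothetical helper: emulate what a text-mode read on Windows hands back,
# with every \r\n collapsed to a single \n.
def text_mode_view(raw_bytes)
  raw_bytes.gsub("\r\n", "\n")
end

raw = "first line\r\nsecond line\r\n"  # what is actually on disk
seen = text_mode_view(raw)             # what the program would see

puts raw.bytesize   # 25
puts seen.bytesize  # 23, one byte fewer per line
# Different bytes in, different digest out.
puts Digest::SHA1.hexdigest(raw) == Digest::SHA1.hexdigest(seen)  # false
```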

Tim said:
For finding dups, I wonder if it's useful to compare checksums unless
you've already computed them in advance. I notice that Ruby's own
FileUtils.install checks filea == fileb by simply comparing the files
until it finds a difference or gets to EOF.

Well, first I'd like to partition the files by size, and only then
compare them.
If more than 2 files share the same size, it's better to calculate
the sha1sum once for each of the files involved. And, if you'd like
to be on the safe side, you can compare the contents of the files that
have the same sha1sum.
You can also improve on this by caching the sha1sums (say, in a file in
every directory).
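
The steps described above (partition by size, hash within each size group, then confirm by content) might be sketched like this. The name `find_duplicates` is hypothetical, the sketch hashes every same-size group of two or more files, and the sha1sum cache is left out.

```ruby
require 'digest'
require 'digest/sha1'
require 'fileutils'

# Hypothetical helper: returns an array of groups, each group being an
# array of paths whose contents are identical.
def find_duplicates(paths)
  duplicates = []
  # Step 1: partition by size; a file with a unique size has no duplicate.
  paths.group_by { |path| File.size(path) }.each_value do |same_size|
    next if same_size.size < 2
    # Step 2: hash each remaining file once and group by digest.
    by_sha = same_size.group_by { |path| Digest::SHA1.file(path).hexdigest }
    by_sha.each_value do |same_sha|
      next if same_sha.size < 2
      # Step 3: to be on the safe side, confirm byte for byte against the
      # first file. Simplification: files that fail this check are dropped
      # rather than compared with each other.
      first, *rest = same_sha
      confirmed = [first] + rest.select { |path| FileUtils.compare_file(first, path) }
      duplicates << confirmed if confirmed.size > 1
    end
  end
  duplicates
end
```

Something like `find_duplicates(Dir.glob("**/*").select { |p| File.file?(p) })` would run it over a directory tree.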
 
