David King Landrith
Hello everyone,
I'm having an IO problem. The code below is designed to read a series
of text files (most of these are about 8k) and output them into a
single text file separated by form feed characters. This works very
quickly with sets of files that are in directories containing (say) 90K
documents or fewer. But when there are 200k+ documents in the
directories, it begins to take a substantial amount of time.
The main problem is that sysread appears to be painfully slow
with files in very large directories. At this point, it would be
faster to do the following:
system("cat #{input_file} >> #{outputFile}")
system("echo \f >> #{outputFile}")
It seems to me that there is no way that shelling out to cat should be
faster than doing a sysread.
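For comparison, here is a rough sketch of the same concatenation done with
plain buffered reads instead of sysread (it assumes the same docInfo/pageInfo
wrapper and chunk size as the code below; I haven't tested it against the
large directories):

chunk = 10240 * 2 # 20k, same chunk size as below
File.open(docInfo.outputFile, "w", 0664) do |output|
  docInfo.each do |pageInfo|
    File.open(pageInfo.inputFile) do |input|
      # read at most chunk bytes per call; read returns nil at EOF
      while (buf = input.read(chunk))
        output.write(buf)
      end
    end
    output.write("\f") # form feed separator between documents
  end
end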
Any help would be appreciated.
Best,
Dave
-- begin code (This has been cleaned a bit and changed to protect the
innocent)
# docInfo object is a wrapper for pages array with some additional info
outputFile = docInfo.outputFile
output = nil
isOpen = false
chunk = (10240 * 2) # 20k
fmode = File::CREAT|File::TRUNC|File::WRONLY

begin
  docInfo.each do |pageInfo|
    pageNo = pageInfo.pageNo
    start = 0
    count = chunk
    begin
      # open source
      input = File.open(pageInfo.inputFile)
      fileSize = input.stat.size
      # open destination if not already open
      unless isOpen
        output = File.open(outputFile, fmode, 0664)
        isOpen = true
      end
      # loop so that no read goes past the end of the file
      while start < fileSize
        count = (fileSize - start) if (start + chunk) > fileSize
        output.syswrite(input.sysread(count))
        start += count
      end
      output.syswrite("\f")
    ensure
      begin
        input.close
      rescue Exception => err
        STDERR << "WARNING: couldn't close #{pageInfo.inputFile}\n"
      end
    end
  end
ensure
  begin
    output.close if isOpen
  rescue Exception
    STDERR << "WARNING: couldn't close #{outputFile}\n"
  end
end
--end code
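In case it helps anyone reproduce the difference, this is roughly how I've
been timing the two approaches (the directory and output paths here are
placeholders, not the real data set):

require 'benchmark'

pages = Dir.glob("/path/to/huge_dir/*.txt") # placeholder directory
Benchmark.bm(8) do |bm|
  bm.report("sysread") do
    File.open("/tmp/out_sysread", "w") do |out|
      pages.each do |path|
        File.open(path) do |f|
          size = f.stat.size
          # files are ~8k, so a single sysread per file is enough here
          out.syswrite(f.sysread(size)) if size > 0
        end
        out.syswrite("\f")
      end
    end
  end
  bm.report("cat") do
    pages.each do |path|
      system("cat #{path} >> /tmp/out_cat")
      system("echo \f >> /tmp/out_cat")
    end
  end
end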