Windows has block sizes too, but File.stat(...).blksize returns nil.
(Tested with Windows XP Pro, Ruby 1.8.2.)
But you can hardcode the block size: find or create a small file (a few
hundred bytes or less) and select 'Properties' in Windows Explorer. It
says something like "Size: 112 bytes. Size on disk: 4096 bytes". For a
file that small, 'Size on disk' is the block size.
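
A minimal sketch of that fallback, assuming the 4096 bytes read from Explorer above. The names DEFAULT_BLOCK_SIZE and block_size_for are made up for illustration; adjust the constant to whatever your own volume reports.

```ruby
# Hypothetical fallback value: take the real one from Explorer's
# "Size on disk" for a small file on the volume in question.
DEFAULT_BLOCK_SIZE = 4096

# Returns the block size reported by the OS, or the hardcoded fallback
# when File::Stat#blksize is nil (as observed on Windows).
def block_size_for(path)
  File.stat(path).blksize || DEFAULT_BLOCK_SIZE
end
```

Where blksize is supported, the reported value wins, so the same script should run unchanged on both kinds of system.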
Sysread/write on my system seems to benefit from using a block-sized
buffer: it is slower with a buffer twice or half the size of the
block. Thanks for the tip.
Plain buffered read/write appears to be less sensitive to buffer size,
but performs best with a buffer about twice the block size. There seems
to be little to distinguish buffered standard read/write from buffered
sysread/write.
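
For reference, the kind of sysread/syswrite loop being timed might look like the sketch below. It is not the exact script used for the timings; block_copy is a made-up name, and the default of 4096 is just the block size on my system.

```ruby
# Copies src to dst with unbuffered sysread/syswrite, one buffer at a time.
# buffer_size defaults to 4096, which is an assumption about the block size.
def block_copy(src, dst, buffer_size = 4096)
  File.open(src, "rb") do |input|
    File.open(dst, "wb") do |output|
      begin
        loop do
          chunk = input.sysread(buffer_size)
          # syswrite may write fewer bytes than given, so loop until done.
          until chunk.empty?
            written = output.syswrite(chunk)
            chunk = chunk.byteslice(written..-1)
          end
        end
      rescue EOFError
        # sysread signals end of file by raising EOFError.
      end
    end
  end
end
```

Passing 2048 or 8192 as the third argument is an easy way to repeat the half-size and double-size comparison on your own machine.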
Timings on a laptop for read/write of a 1.2 GB file:
- 3.5 min: buffered plain read/write (buffer 8192), buffered
sysread/write (buffer 4096)
- 17 min: File#each
(The File#each timing is included in case the records are not fixed
size.)
jf
BTW: There are two more scenarios:
- If there is only one record to remove each time the file is opened,
it may be possible to use read/write mode (r+; append mode a+ would
send every write to the end of the file) and update the file in place:
use IO#seek to go to the entry, move all blocks following the deleted
entry forward, and truncate the file to its new length. On average you
save the writing of half the file. If there are dozens of records to
remove, there is little gain, because the first one is likely to be
relatively close to the start of the file.
- Use 'lazy delete': Merely overwrite the record(s) with blanks,
nulls, newlines, whatever, or mark them as deleted in some other
fashion. The file keeps the same size, and all entries have the same
file position as before. Repackage the file once in a while, removing
the blank entries, when they start to take up a significant proportion
of the file size.
This is clearly the best-performing solution, but other programs
using the file may need to be updated to recognize the 'this entry is
deleted' marking.
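
Both scenarios can be sketched roughly as follows, assuming you already know the byte offset and length of the record to remove. The method names and the blank filler are made up for illustration; how deleted entries are marked is up to your file format.

```ruby
# In-place removal: shift everything after the record forward by `length`
# bytes, then cut the leftover tail off. "r+b" opens for reading and
# writing without truncating and without forcing writes to the end.
def remove_in_place(path, offset, length, buffer_size = 4096)
  File.open(path, "r+b") do |f|
    read_pos = offset + length
    write_pos = offset
    loop do
      f.seek(read_pos)
      chunk = f.read(buffer_size)
      break if chunk.nil? # nil means end of file
      f.seek(write_pos)
      f.write(chunk)
      read_pos += chunk.bytesize
      write_pos += chunk.bytesize
    end
    f.truncate(write_pos)
  end
end

# Lazy delete: overwrite the record with a filler byte. The file size and
# the position of every other record stay the same.
def lazy_delete(path, offset, length, filler = " ")
  File.open(path, "r+b") do |f|
    f.seek(offset)
    f.write(filler * length)
  end
end
```

For example, with a file containing "aaaa\nbbbb\ncccc\n", remove_in_place(path, 5, 5) leaves "aaaa\ncccc\n", while lazy_delete(path, 5, 4) would instead leave the second line as four blanks.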