I was thinking how you might go about writing a sort that could handle
more data than could fit in RAM. I handled the problem in Abundance
by checkpointing the app to disk to free up maximum RAM, then spawning
a copy of Opt-Tech sort. My records were roughly like DataOutputStream
would produce, so I could automatically generate the command script
to sort the fields in any way I wanted.

I thought you might pull it off in Java this way.

1. You write a comparator as if you were going to sort Objects in an
ArrayList.

2. The external sort has an add method that also takes collections.

It accepts a "chunk" of records, and sorts them using Sun's sort.

Then it writes them out as SERIALISED objects in a heavily buffered
stream. There may be some way to do a partial reset after each object
to speed it up.

Then you repeat collecting, sorting and writing another batch to
another file.

When you have created N files, you cycle back to the first file,
appending each new chunk. (Optimal N to be determined by experiment.)
Ideally each file would be on a different physical drive.

Then when all the records have been added, you start merging chunks
into longer chunks, and writing out the longer chunks. Each N-way
merge cuts the number of chunks by a factor of N and increases the
length of the chunks N times.

The final merge pass does not happen until the user invokes the
Iterator to hand over the resulting records.

Another way it might be done is to require that the records to be
sorted be byte arrays, chunks effectively produced by
DataOutputStream. For each sort key you specify offset, length and
key type, e.g. int, byte, short, float, double, String.

This would require a detailed knowledge of the bit structure of the
records, the way you did in the olden days of assembler and C.

This would be clumsier to use, but would avoid the overhead of
pickling and reconstituting records on every pass.

Then of course, there is the possibility someone has already solved
this and done it well.

The universe has a sneaky habit. Problems start out small, and it
looks like a purely in-RAM solution is perfectly adequate. Then they
grow bit by bit and start pushing the limits of the RAM solution.
Suddenly you are faced with a major redesign.
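
To make the first scheme (comparator plus serialised chunks) concrete,
here is a rough sketch, not a tuned implementation. The names
ExternalSort, Run, add and iterator are made up for illustration. It
simplifies the plan in two ways: each sorted chunk goes to its own temp
file rather than being appended to one of N recycled files, and there
are no intermediate merge passes, just a single lazy merge that runs
when the Iterator is requested. It assumes the records implement
Serializable and that one chunk fits in RAM.

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch: sorts Serializable records via sorted temp files. */
class ExternalSort<T extends Serializable> {
    private final Comparator<? super T> order;
    private final List<File> chunks = new ArrayList<>();

    ExternalSort(Comparator<? super T> order) { this.order = order; }

    /** Sorts one chunk in RAM, then writes it to its own temp file. */
    void add(Collection<? extends T> records) {
        List<T> chunk = new ArrayList<>(records);
        chunk.sort(order);
        try {
            File f = File.createTempFile("chunk", ".ser");
            f.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f), 1 << 16))) {
                out.writeInt(chunk.size());
                for (T rec : chunk) {
                    out.writeObject(rec);
                    // Drop back-references so the stream's handle table does
                    // not grow with every record written.
                    out.reset();
                }
            }
            chunks.add(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One open chunk file plus the record currently at its head. */
    private final class Run {
        final ObjectInputStream in;
        int remaining;
        T head;

        Run(File f) throws IOException {
            in = new ObjectInputStream(
                    new BufferedInputStream(new FileInputStream(f), 1 << 16));
            remaining = in.readInt();
        }

        /** Loads the next record into head; returns false when exhausted. */
        @SuppressWarnings("unchecked")
        boolean advance() {
            try {
                if (remaining == 0) { in.close(); return false; }
                head = (T) in.readObject();
                remaining--;
                return true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** The merge is deferred until here: a k-way merge over all chunk files. */
    Iterator<T> iterator() {
        PriorityQueue<Run> heap =
                new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        try {
            for (File f : chunks) {
                Run r = new Run(f);
                if (r.advance()) heap.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<T>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public T next() {
                Run r = heap.remove();          // run with the smallest head
                T result = r.head;
                if (r.advance()) heap.add(r);   // re-queue if it has more
                return result;
            }
        };
    }
}
```

You would call add once per batch, e.g. sorter.add(batch), then walk
sorter.iterator() to get the records back in order. With a very large
number of chunks you would run out of file handles, which is where the
intermediate N-way merge passes described above would come in.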