Is there any standard way to sort a disk file?

P

Peter Olcott

I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?
 
A

Alf P. Steinbach

* Peter Olcott:
> [bad cross-posting]

Please don't cross-post to [clc++] and Microsoft-specific groups.

I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

Yes, COBOL does that.

That does not mean you should cross-post to [clc++] and [comp.lang.cobol].

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?

Follow-ups set to the two Microsoft groups.
 
T

Thomas J. Gritzan

Peter said:
I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

Load the key and some index as pair into memory and sort it by key.

After that, you can build a new file from the old file in the right
order using the index.
 
P

Peter Olcott

Thomas J. Gritzan said:
Load the key and some index as pair into memory and sort it by key.

After that, you can build a new file from the old file in the right
order using the index.

That won't work in my case because all the data is used as the key.
 
J

Jerry Coffin

I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?

Yup, that seems pretty much correct. A few points: first of all,
std::merge may simplify writing the merge portion of the sort. It'll
handle most of the details of an individual merge run, so from there
it's mostly a matter of deciding what merge pattern you want.
Traditionally, the polyphase merge was generally considered the best,
but I can't say I've done any recent testing to see how well it does
for hard-drive based sorting. My immediate reaction is that it'll
probably do quite well, but may not improve efficiency enough to be
worth the trouble.

To create your sorted runs, you're probably better off using a
priority_queue than reading data into something like a vector,
sorting it, and then writing it out. In particular, a priority_queue
will typically allow you to create longer runs in your initial sorted
files -- as you write out each record, you can read in another from
the original input. As long as it doesn't precede those that have
already been written out, you can add it to the priority queue, and
from there it'll end up in the same file. On average this can nearly
double the length of the initial runs, which generally helps quite a
bit.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,754
Messages
2,569,522
Members
44,995
Latest member
PinupduzSap

Latest Threads

Top