Sorting records using sort()

E

Elijah Bailey

I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;


which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.


Thanks in advance for your comments,
--Elijah
 
P

Peter Koch Larsen

Elijah Bailey said:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;


which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.


Thanks in advance for your comments,
--Elijah


Make your own iterator class, which really should not be so difficult. Then
you can use all the C++ algorithms on your container, including std::sort.

Kind regards
Peter
 
B

Bo Persson

Elijah Bailey said:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;


which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

Well, if the data is BIG, the standard method IS of course to sort an array
of pointers...

It will be MUCH faster than actually moving all the data around!

Bo Persson
 
R

Ron Natalie

Elijah Bailey said:
Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

You're confusing me. How is it this data is stored if you don't have
pointers to it. Are you telling me it's all in contiguous memory and you
want it sorted in place without using any additional storage? The standard
C routine qsort will do this.

qsort(ptr_to_beginning, m, n, compare_func);

The compare func has to return positive if a > b, 0 if a==b and negative if a<b.
(It takes two void*'s as args).
 
J

Jeff Schwab

Elijah said:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;


which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.


Do all records in "data" have the same size m, or do different records
have different values of m?
 
J

John Potter

"Elijah Bailey" <[email protected]> skrev i en meddelelse
Make your own iterator class, which really should not be so difficult. Then
you can use all the C++ algorithms on your container, including std::sort.

Note his thoughts. Now tell us what the value type of that iterator
will be. Sort will likely want to create a temporary of that type.

John
 
E

Elijah Bailey

Is there no easy way to solve this problem??
I dont want to use qsort from C...! I cant figure
out how to even write a random access iterator
for this problem. What will be its value type?

I would like to swap the records to sort them,
not copy keys and sort them separately...

Thanks in advance for your help,

Thanks ,
--Elijah
 
R

Ron Natalie

Elijah Bailey said:
Is there no easy way to solve this problem??
I dont want to use qsort from C...

Why not? The std::sort isn't guaranteed to work any better.
 
J

Jeff Schwab

Elijah said:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;

which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
I dont want to use qsort from C...! I cant figure
out how to even write a random access iterator
for this problem. What will be its value type?

I would like to swap the records to sort them,
not copy keys and sort them separately...


If you want to use an existing implementation of a sorting algorithm,
you have to provide a way for the implementation to know where records
begin and end, and how to compare them. This issue is fundamental.

Writing your own iterators would not be trivial, but probably would be a
educational. Having such iterators available would make your data
structure compatible with most of the standard library, and might avoid
countless future headaches.

If you just want to sort your records in place, why not use qsort? It's
part of the C++ standard library. The fact that the function was
inherited from C is no reason to avoid qsort. In fact, it sounds like
almost the perfect fit to your problem.

-Jeff
 
R

Ron Natalie

Jeff Schwab said:
Writing your own iterators would not be trivial, but probably would be a
educational. Having such iterators available would make your data
structure compatible with most of the standard library, and might avoid
countless future headaches.

The trouble with writing iterators for this problem is that he needs it to be
variable size at run time and that the iterators have to be derferenceable
and the derefenced types assignable. Add his requirement to not allocate
even enough memory to hold pointers to each object, I can't figure out how
you could do it.
 
E

Elijah Bailey

Thanks Ron, I think that is definitely a problem.
But I still "think" it can be done, dont know how to thou...:)
 
M

Martijn Lievaart

On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.
Thanks Ron, I think that is definitely a problem. But I still "think" it
can be done, dont know how to thou...:)

I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.

OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.

HTH,
M4
 
F

Francis Glassborow

Elijah said:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;


which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.

Sorry to be so slow off the mark.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.
The following trivial program demonstrates that.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main(){
int n, m;
cout << "n and m ";
cin >> n >> m;
vector< vector<char> > a(n, m);
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
a[j] = 'a' - i - j;
}
}

for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[j] << " ";
}
cout << '\n';
}
sort(a.begin(), a.end());
cout << "\n\n";
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[j] << " ";
}
cout << '\n';
}
}
 
J

John Potter

In message <[email protected]>, Elijah

Requirement for no added space proportional to size.
But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.

Just use an array of size structs containing three pointers each.

I think you missed the requirements.

John
 
J

Jeff Schwab

Francis said:
Sorry to be so slow off the mark.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.

This problem was posed by someone who gets handed a char*, and does not
want to copy the data.

I came to almost the same conclusions as Martijn. I don't think
std::sort needs (or uses, in the version shipped with GCC) log_n copies
of data; a few simultaneous copies are, however, needed. I naively
assumed that *no* copies would be needed, and that implementing "swap"
would be sufficient for avoiding the need of any temporaries. Silly me.

Anyway, the code I've got now seems to work *except* that all of the
data get overwritten by one of the values, due to the fact that all
records (even copies made by std::sort) alias the actual data. I'm sure
this could be fixed with a little bit of effort. Please let me know if
there is sufficient interest for me to post the code I've got (546
lines, lots of white space).

-Jeff
 
F

Francis Glassborow

On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.
The following trivial program demonstrates that.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main(){
int n, m;
cout << "N and m";
cin >> n >> m;
vector< vector<char> > a(n, m);
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
a[j] = 'a' - i - j;
}
}

for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[j] << " ";
}
cout << '\n';
}
sort(a.begin(), a.end());
cout << "\n\n";
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[j] << " ";
}
cout << '\n';
}
}
 
E

Elijah Bailey

On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Thanks Martijn. This is a much better description than i came up with.
Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.

Constraints: O(n) space is not allowed. So you can not create a
pointer
to each object and sort them. O(log(n)) space is perfectly ok.
O(log(n)*m) is also allowed if that simplifies things.

My data has n >> m, i.e. n is much greater than m.
I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.

Hopefully, by the end of this thread, we'll have something simple
enough
to implement...:)
OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

Would be interesting to know, what std::sort() is required to have
from
the standard. I think you are perfectly right that sort doesnt use
pointers
to the objects, only iterators and derefernces(*) the iterators...
We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

Usually this is certainly the case for my data.
I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.

First problem to qsort: It's slower than using sort()
Second problem to qsort: Later I might want to use other STL
algorithms on this data!
like for_each()? find()? ...?

Thanks a lot for your comments,
--Elijah
 
M

Martijn Lievaart

On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:

[ csc++ removed, not relevant for what I want to say ]
First problem to qsort: It's slower than using sort()

Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4
 
E

Elijah Bailey

Martijn Lievaart said:
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:

[ csc++ removed, not relevant for what I want to say ]
First problem to qsort: It's slower than using sort()

Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4

Yes, IF there is no 'simple' way to make std::sort() work.

And IF that is the case, then shouldnt there be another version
of std::sort() to do what I want to do. Doesnt it seem natural to
have an interface in c++ that does the same thing that the c
function qsort CAN do, without actually going the "c" way??

Thanks,
--Elijah
 
D

david m-

Sounds like some sort of intermediate class is required in order to spoof a
RandomAccessIterator as required by the STL sort() function.



i.e. you can create these to adapt the dynamic size of the elements to be
sorted to the static requirements of an iterator.



Though whether this is technically possible I'm not sure.



david m.



Elijah Bailey said:
(e-mail address removed) (Martijn Lievaart) wrote in message
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Thanks Martijn. This is a much better description than i came up with.
Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.

Constraints: O(n) space is not allowed. So you can not create a
pointer
to each object and sort them. O(log(n)) space is perfectly ok.
O(log(n)*m) is also allowed if that simplifies things.

My data has n >> m, i.e. n is much greater than m.
I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.

Hopefully, by the end of this thread, we'll have something simple
enough
to implement...:)
OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

Would be interesting to know, what std::sort() is required to have
from
the standard. I think you are perfectly right that sort doesnt use
pointers
to the objects, only iterators and derefernces(*) the iterators...
We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

Usually this is certainly the case for my data.
I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.

First problem to qsort: It's slower than using sort()
Second problem to qsort: Later I might want to use other STL
algorithms on this data!
like for_each()? find()? ...?

Thanks a lot for your comments,
--Elijah
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,276
Latest member
Sawatmakal

Latest Threads

Top