How to sort a bitset vector quickly?

H

halfsearch2

Hi all,
I want to sort a bitset vector ( bit 1 > bit 0 ), while it seems bitset
doesn't have < or > operation. I tried to use a loop to compare each
bit one by one but it's unbearably slow. Any suggestion?
 
J

Jakob Bieling

Hi all,
I want to sort a bitset vector ( bit 1 > bit 0 ), while it seems
bitset
doesn't have < or > operation. I tried to use a loop to compare each
bit one by one but it's unbearably slow. Any suggestion?


You could count the number of zeros and construct a new bitset, with
the number of zeros you counted before and then the rest of ones (exact
number is determined by the size of the original bitset minus the number
of zeros).

hth
 
A

Alf P. Steinbach

* (e-mail address removed):
I want to sort a bitset vector ( bit 1 > bit 0 ), while it seems bitset
doesn't have < or > operation. I tried to use a loop to compare each
bit one by one but it's unbearably slow. Any suggestion?

Assuming this is a std::vector< std::bitset<n> >:

std::sort( v.begin(), v.end(), bitsetLess<n> );

where 'bitsetLess' is e.g.

template< unsigned n >
bool bitsetLess( std::bitset<n> const& left, std::bitset<n> const& right )
{
return left.to_ulong() < right.to_ulong();
}
 
W

Wiseguy

(e-mail address removed) scribbled on the stall wall:
Hi all,
I want to sort a bitset vector ( bit 1 > bit 0 ), while it seems bitset
doesn't have < or > operation. I tried to use a loop to compare each
bit one by one but it's unbearably slow. Any suggestion?

WHY?!

the whole purpose of bitsets is that they are generally position
dependent and lose their meaning when reordered.

rethink your reason for doing this.


--
dual 2.8Ghz Xeon; 2GB RAM; 500GB ATA-133; nVidia powered
Linux 2.6.10; glibc-2.3.5; vendor neutral home-brewed installation

----anything after this line is ANNOYING CRAP that the newsserver adds ----
---directly contact newsfeeds and ISPs that piggy back them to complain----
 
J

Jeff Flinn

Wiseguy said:
(e-mail address removed) scribbled on the stall wall:

WHY?!

the whole purpose of bitsets is that they are generally position
dependent and lose their meaning when reordered.

rethink your reason for doing this.

Perhaps the OP means std::vector< std::bitset<N> >?

Jeff Flinn
 
H

halfsearch2

Hi Alf,
Your code reflects what I really want to do, thanks. :)
The problem is each bitset contains more than 32*10 bits
 
A

Alf P. Steinbach

* (e-mail address removed):
Your code reflects what I really want to do, thanks. :)
The problem is each bitset contains more than 32*10 bits

So?

In the comparision function just loop over the bits till the first
difference.

Alternatively use to_string(), but although that might in practice
be efficient enough it _feels_ so inefficient that no self-respecting
C++ programmer would use it... ;-)
 
H

halfsearch2

Alf said:
In the comparision function just loop over the bits till the first
difference.
This is exactly what I did, it's too slow. The vector contains more
than 10 million bitsets, each bitset contains more than 32*10 bits.
 
A

Alf P. Steinbach

[About sorting a vector of bitsets]

* (e-mail address removed):
This is exactly what I did, it's too slow. The vector contains more
than 10 million bitsets, each bitset contains more than 32*10 bits.

Ah. It's time for the Big Revelation (TM). Tweaking the comparision
function will at best buy you a small constant improvement, whereas
optimizing the _algorithm_ and/or _data structure_ can yield really
impressive results.

I think what you've related so far indicates you have a good handle
on the C++ side of things, but that's probably not where the problem
is. There are a number of techniques that can be applied at the C++
level but I fear that if I mention them they'll spur you to blindly
move ahead on a path that doesn't lead to any real solution. So:

What's it all for, i.e. what's the original problem that this huge
vector of bitsets, and sorting of it, is intended to solve?


Cross-posted to [comp.programming], where this probably belongs! ;-)
 
R

rossum

This is exactly what I did, it's too slow. The vector contains more
than 10 million bitsets, each bitset contains more than 32*10 bits.

Some thoughts:

1 Can you fit all these bitsets into real memory at one time? If not
then you may want to look at doing an in-memory sort of a chunk of the
bitsets that will fit into memory and then merging the chunks to
produce the final result.

2 How about a radix sort? At each stage partition the current
in-memory chunk by looking at a single bit only. The do a
divide-and-conquer on each half of the result, again looking at only
one bit.

After first pass looking at bit[0]:

0xxxxxxx
..
..
0xxxxxxx
1xxxxxxx
..
..
1xxxxxxx

After second pass, looking at bit[1]:

00xxxxxx
..
..
00xxxxxx
01xxxxxx
..
..
01xxxxxx
10xxxxxx
..
..
10xxxxxx
11xxxxxx
..
..
11xxxxxx

and so on.


rossum



The ultimate truth is that there is no ultimate truth
 
I

Ivan Vecerina

This is exactly what I did, it's too slow. The vector contains more
than 10 million bitsets, each bitset contains more than 32*10 bits.


Unfortunately, std::bitset is not intended as a 'big int' library,
and does not support (efficient) numeric computations.
If I were you I would eventually write a custom bitset class,
and optimize the representation (e.g. endianness, separate
storage of exponent, etc) based on the typical distribution
of values you use.
That is, unless one of the existing 'bigint' packages can do
[but those often use heap-allocated storage, which may cause
performance problems in your case].

This said, have you tried a simple sorting based on to_ulong,
just to verify that performance will be at least satisfying
if you find an efficient comparison method?


I hope this helps,
Ivan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top