Comparing two vectors of bools

Discussion in 'C++' started by fgh.vbn.rty@gmail.com, Aug 2, 2008.

  1. Guest

    I know that std::vector<bool> is specialized to store individual bools
    as single bits. I have a question about the comparison operation on a
    pair of vector<bool>s.

    Is the comparison done bit by bit? That would be slow. In theory it's
    possible to just do a bit-wise XOR on all bits and then OR it all
    together and compare the single bit right? That would make it O(1)
    instead of O(n). What's bad about this method?

    Does the size of the vector matter here? For instance if the vectors
    are <= 32bits they can fit into two registers and it's easy to do the
    XOR. But if they are more than that then it's more complicated

    What about comparison on std::bitset? Does compare work the same way
    for it?

    And what about boost::dynamic_bitset? Any difference for that?

    Thanks.
     
    , Aug 2, 2008
    #1
    1. Advertising

  2. Rafał Guest

    wrote:

    > I know that std::vector<bool> is specialized to store individual bools
    > as single bits.


    Actually this is considered a bit of "hack" and AFAIR it will be removed
    soon (in C++0x?).
     
    Rafał, Aug 2, 2008
    #2
    1. Advertising

  3. Guest

    On Aug 1, 10:00 pm, Sam <> wrote:
    > writes:
    > > I know that std::vector<bool> is specialized to store individual bools
    > > as single bits. I have a question about the comparison operation on a
    > > pair of vector<bool>s.

    >
    > > Is the comparison done bit by bit? That would be slow. In theory it's
    > > possible to just do a bit-wise XOR on all bits and then OR it all
    > > together and compare the single bit right? That would make it O(1)
    > > instead of O(n). What's bad about this method?

    >
    > No, it wouldn't be O(1). It's still O(n). This would be faster, of course,
    > but the complexity is still linear, since the number of XOR operations is
    > linearly proportional to the vector's size.
    >


    Hmm, if we have a 32-bit ALU then we could load both vectors into a
    pair of registers and use the comparator to get the result in "one
    cycle" right? In other words, we are doing all XORs in parallel
    followed by one OR. Please correct me if I'm wrong here.

    Of course, I have no idea if the compiler would actually do this.

    Thanks.
     
    , Aug 2, 2008
    #3
  4. Jerry Coffin Guest

    In article <g70loa$4tb$>, lid
    says...
    > wrote:
    >
    > > I know that std::vector<bool> is specialized to store individual bools
    > > as single bits.

    >
    > Actually this is considered a bit of "hack" and AFAIR it will be removed
    > soon (in C++0x?).


    I doubt it. The current draft (N2691) still includes it, with minor
    changes from C++ 03. It now explicitly recommends: "a space optimized
    representation of bits...", whereas C++ 03 merely described the class as
    being provided: "To optimize space allocation..."

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Aug 2, 2008
    #4
  5. James Kanze Guest

    On Aug 2, 6:51 am, wrote:
    > On Aug 1, 10:00 pm, Sam <> wrote:


    > > writes:
    > > > I know that std::vector<bool> is specialized to store
    > > > individual bools as single bits. I have a question about
    > > > the comparison operation on a pair of vector<bool>s.


    > > > Is the comparison done bit by bit? That would be slow. In
    > > > theory it's possible to just do a bit-wise XOR on all bits
    > > > and then OR it all together and compare the single bit
    > > > right? That would make it O(1) instead of O(n). What's bad
    > > > about this method?


    The library is free to implement the function however it wants.
    From a QoI point of view, of course, it would be a very poor
    library that does the comparison bit by bit.

    > > No, it wouldn't be O(1). It's still O(n). This would be
    > > faster, of course, but the complexity is still linear, since
    > > the number of XOR operations is linearly proportional to the
    > > vector's size.


    > Hmm, if we have a 32-bit ALU then we could load both vectors
    > into a pair of registers and use the comparator to get the
    > result in "one cycle" right? In other words, we are doing all
    > XORs in parallel followed by one OR. Please correct me if I'm
    > wrong here.


    I'm not sure I understand. How can the code load both vectors
    into a pair of registers, if, say, the vectors each contain
    several thousand bits.

    What I would expect is that the bits are actually kept in an
    array of unsigned, and that the library compare this. Something
    like:

    return lhs.size() == rhs.size()
    && std::equal( lhs.buffer_start,
    lhs.buffer_end,
    rhs.buffer_start ) ;

    (Some additional logic is necessary to ensure that a possible
    partial word at the end doesn't compare unequal, even though all
    of the significant bits are equal.)

    The algorithm remains O(N), but:

    -- is O(1) is the lengths are different, and
    -- only actually loops N/32 (or whatever) times, rather than N
    (but this doesn't change the complexity, of course).

    > Of course, I have no idea if the compiler would actually do
    > this.


    Typicallyl, the compiler will do whatever the code in the
    library implementation tells it to do. (It doesn't have to;
    it's allowed to build in knowledge of std::vector<bool>, and use
    that knowledge to generate special code. But I don't know of
    any compiler which does this.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Aug 2, 2008
    #5
  6. wrote:
    > Hmm, if we have a 32-bit ALU then we could load both vectors into a
    > pair of registers and use the comparator to get the result in "one
    > cycle" right? In other words, we are doing all XORs in parallel
    > followed by one OR. Please correct me if I'm wrong here.


    If the std::vector<bool> has a size of 1 million, how do you suppose
    it can load all at once into the ALU?

    Btw, what do you need the xors and ors for? If you are comparing for
    equality or inequality, just compare the array elements (eg. if they are
    for example size_t elements) for equality of inequality. If any bit
    differs, the results will be unequal.
     
    Juha Nieminen, Aug 2, 2008
    #6
  7. Rafał wrote:
    > wrote:
    >
    >> I know that std::vector<bool> is specialized to store individual bools
    >> as single bits.

    >
    > Actually this is considered a bit of "hack" and AFAIR it will be removed
    > soon (in C++0x?).


    Exactly how is it a "hack" and why should it be removed? Has a better
    alternative been suggested?

    I certainly would like to have some kind of bool data container where
    each bool indeed takes only one bit, to save memory.

    (OTOH, I assume the dynamic bitvector in boost is more of the kind of
    data container people like rather than std::vector<bool>? I would be
    fine with that too, but will such a data container be included in the
    next standard?)
     
    Juha Nieminen, Aug 2, 2008
    #7
  8. Bo Persson Guest

    Juha Nieminen wrote:
    > Rafal wrote:
    >> wrote:
    >>
    >>> I know that std::vector<bool> is specialized to store individual
    >>> bools as single bits.

    >>
    >> Actually this is considered a bit of "hack" and AFAIR it will be
    >> removed soon (in C++0x?).

    >
    > Exactly how is it a "hack" and why should it be removed? Has a
    > better alternative been suggested?


    It's strange because it is the only container that does not store
    elements of the contained type. It also uses a proxy class for
    vector<bool>::reference, which is not allowed by the standard for any
    other container.

    The real hack is that the standard assumes that everyone benefits from
    a space optimization for vector<bool> and speed optimizations for
    vector<everything_else>. Why?


    It hasn't been removed so far, and as its section has actually been
    edited in the latest draft for the next standard, it is unlikely to go
    away anytime soon. We have also seen that other containers, like
    std::string and std::array, are allowed to somewhat deviate from the
    general container requirements, so why not?


    >
    > I certainly would like to have some kind of bool data container
    > where each bool indeed takes only one bit, to save memory.


    Always, or sometimes? :)

    >
    > (OTOH, I assume the dynamic bitvector in boost is more of the kind
    > of data container people like rather than std::vector<bool>? I
    > would be fine with that too, but will such a data container be
    > included in the next standard?)


    Doesn't seem so (yet).


    Bo Persson
     
    Bo Persson, Aug 2, 2008
    #8
  9. Guest

    On Aug 2, 3:10 am, James Kanze <> wrote:
    > On Aug 2, 6:51 am, wrote:
    >
    >
    > > > No, it wouldn't be O(1). It's still O(n). This would be
    > > > faster, of course, but the complexity is still linear, since
    > > > the number of XOR operations is linearly proportional to the
    > > > vector's size.

    > > Hmm, if we have a 32-bit ALU then we could load both vectors
    > > into a pair of registers and use the comparator to get the
    > > result in "one cycle" right? In other words, we are doing all
    > > XORs in parallel followed by one OR. Please correct me if I'm
    > > wrong here.

    >
    > I'm not sure I understand.  How can the code load both vectors
    > into a pair of registers, if, say, the vectors each contain
    > several thousand bits.


    I was only referring to the case when size <= 32. If it is more than
    that then clearly, the N/32 factor that you have mentioned will come
    in.

    Thanks.
     
    , Aug 2, 2008
    #9
  10. Rafał Guest

    Juha Nieminen wrote:

    > Exactly how is it a "hack" and why should it be removed?


    This container do not act like a container, it has other semantics.
    For example, taking address of N-th element will not work.

    #include <vector>

    template <typename T> void Test() {
    std::vector<T> V(3);
    T& third = V.at(2);
    }

    int main() {
    Test<int>();
    Test<bool>();
    }

    Works for int,
    but for bool:
    x.cpp:5: error: invalid initialization of non-const reference of
    type ‘bool&’ from a temporary of type ‘std::_Bit_reference

    > I certainly would like to have some kind of bool data container where
    > each bool indeed takes only one bit, to save memory.


    Yes, but this should be separate object - because it does not behave like a
    vector<>
     
    Rafał, Aug 2, 2008
    #10
  11. James Kanze Guest

    On Aug 2, 12:13 pm, "Bo Persson" <> wrote:
    > Juha Nieminen wrote:
    > > Rafal wrote:
    > >> wrote:


    > >>> I know that std::vector<bool> is specialized to store individual
    > >>> bools as single bits.


    > >> Actually this is considered a bit of "hack" and AFAIR it will be
    > >> removed soon (in C++0x?).


    > > Exactly how is it a "hack" and why should it be removed? Has a
    > > better alternative been suggested?


    > It's strange because it is the only container that does not
    > store elements of the contained type.


    But it does. At least as far as client code can tell. And
    bitset is in a similar situation. It's strange because it's
    called std::vector<bool>, but it doesn't conform to the
    documented contract of std::vector.

    > It also uses a proxy class for vector<bool>::reference, which
    > is not allowed by the standard for any other container.


    Again: bitset.

    The problem isn't that it's there. The problem is that it is
    called std::vector< bool >, and not something else. (Or
    alternatively, the problem is that the contract for std::vector
    is too constraining. You can argue it both ways. But if you
    take this second approach, you're going to have to change some
    of the basic principles of the STL.)

    > The real hack is that the standard assumes that everyone
    > benefits from a space optimization for vector<bool> and speed
    > optimizations for vector<everything_else>. Why?


    In many cases, the space optimization may also be a speed
    optimization (because of reduced cache misses). The difference
    is that this optimization only applies to vector<bool>, and that
    it isn't transparent, and that there's no way of making it
    transparent, given the design of the STL and the integration of
    std::vector into the STL.

    > It hasn't been removed so far, and as its section has actually
    > been edited in the latest draft for the next standard, it is
    > unlikely to go away anytime soon. We have also seen that other
    > containers, like std::string and std::array, are allowed to
    > somewhat deviate from the general container requirements, so
    > why not?


    Because the name std::vector gives, or should give, certain
    guarantees. std::bitset has a different name, and you don't
    expect it to define the same contract as std::vector, so there's
    no problem with it, and there'd be no problem with an
    std::dynamic_bitset, either.

    > > I certainly would like to have some kind of bool data
    > > container where each bool indeed takes only one bit, to save
    > > memory.


    > Always, or sometimes? :)


    Well, if it's not in a container, a normal bool is fine.

    My own experience is that I've never needed a vector of bool
    (with the interface of vector, or something similar), but if I
    did, I'd want it to behave like vector. My own experience is
    that I often need something like bitset (with the interface of
    bitset), but with dynamic length. Except that there is a lot
    more functionality which is useful: isSubsetOf( other ), etc.;
    bitset doesn't seem to be able to make up its mind what it
    really is either. The result is that I still use (and actively
    maintain) my pre-standard implementations, even today.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Aug 2, 2008
    #11
  12. Bo Persson Guest

    James Kanze wrote:
    > On Aug 2, 12:13 pm, "Bo Persson" <> wrote:
    >> Juha Nieminen wrote:
    >>> Rafal wrote:
    >>>> wrote:

    >
    >>>>> I know that std::vector<bool> is specialized to store individual
    >>>>> bools as single bits.

    >
    >>>> Actually this is considered a bit of "hack" and AFAIR it will be
    >>>> removed soon (in C++0x?).

    >
    >>> Exactly how is it a "hack" and why should it be removed? Has a
    >>> better alternative been suggested?

    >
    >> It's strange because it is the only container that does not
    >> store elements of the contained type.

    >
    > But it does. At least as far as client code can tell. And
    > bitset is in a similar situation. It's strange because it's
    > called std::vector<bool>, but it doesn't conform to the
    > documented contract of std::vector.
    >
    >> It also uses a proxy class for vector<bool>::reference, which
    >> is not allowed by the standard for any other container.

    >
    > Again: bitset.


    Yes, I forgot about that one. :)

    It claims to be an associative container, but doesn't have the
    required interface.

    I should have written that the general container requirements (section
    23.1) doesn't allow for this (reference as a proxy). On the other hand
    it is still required in some places...

    >
    > The problem isn't that it's there. The problem is that it is
    > called std::vector< bool >, and not something else. (Or
    > alternatively, the problem is that the contract for std::vector
    > is too constraining. You can argue it both ways. But if you
    > take this second approach, you're going to have to change some
    > of the basic principles of the STL.)


    The contract is cracking up anyway, as the revisions for the next
    standard introduces new containers and new requirements that just
    don't go well together.

    Just like bitset is (supposedly) an associative container with a fixed
    size (and therefore has a non-conforming different interface), C++0x
    introduces a std::array as a fixed size sequence container which
    cannot implement insert and erase, for example. We also have
    basic_string claiming to be a sequence container, but falling short.

    In comparison, vector<bool> seems like a minor problem. .-)


    Bo Persson
     
    Bo Persson, Aug 3, 2008
    #12
  13. Rafał wrote:
    > Juha Nieminen wrote:
    >
    >> Exactly how is it a "hack" and why should it be removed?

    >
    > This container do not act like a container, it has other semantics.
    > For example, taking address of N-th element will not work.


    Ok, that's *one* example. You write as if there were numerous. Any
    other examples of std::vector<bool> not behaving like other types of
    std::vector?

    (One could argue that not being able to get a pointer to a bool value
    inside the vector is not such a big of a loss...)
     
    Juha Nieminen, Aug 3, 2008
    #13
  14. On 2008-08-03 12:37, Juha Nieminen wrote:
    > Rafał wrote:
    >> Juha Nieminen wrote:
    >>
    >>> Exactly how is it a "hack" and why should it be removed?

    >>
    >> This container do not act like a container, it has other semantics.
    >> For example, taking address of N-th element will not work.

    >
    > Ok, that's *one* example. You write as if there were numerous. Any
    > other examples of std::vector<bool> not behaving like other types of
    > std::vector?
    >
    > (One could argue that not being able to get a pointer to a bool value
    > inside the vector is not such a big of a loss...)


    The loss is quite small until you try to use vector<bool> just like any
    other vector in one of your nice parametrised libraries and stuff no
    longer works and you are forced to rewrite half the library just to get
    it working with vector<bool>.

    --
    Erik Wikström
     
    Erik Wikström, Aug 3, 2008
    #14
  15. Rafał Guest

    Juha Nieminen wrote:

    > Ok, that's *one* example. You write as if there were numerous. Any
    > other examples of std::vector<bool> not behaving like other types of
    > std::vector?
    >
    > (One could argue that not being able to get a pointer to a bool value
    > inside the vector is not such a big of a loss...)


    Well, vector claims to allow to do T& x = &V.at(n); yet it brakes that
    promise.

    This is not USA government, but an official standardized language, so I
    would expect it to keep such "promises".

    It's unfortunate that some programs do rely on this behavior, and now it
    will not be change therefore.

    IMHO they should right away start with some vector_packed class.
     
    Rafał, Aug 3, 2008
    #15
  16. James Kanze Guest

    On Aug 3, 12:27 pm, "Bo Persson" <> wrote:
    > James Kanze wrote:
    > > On Aug 2, 12:13 pm, "Bo Persson" <> wrote:
    > >> It also uses a proxy class for vector<bool>::reference,
    > >> which is not allowed by the standard for any other
    > >> container.


    > > Again: bitset.


    > Yes, I forgot about that one. :)


    > It claims to be an associative container, but doesn't have the
    > required interface.


    And shouldn't have. Basically, the requirements for containers
    don't really mean anything; it doesn't start becoming serious
    until you consider the requirements for sequence.

    [...]
    > The contract is cracking up anyway, as the revisions for the
    > next standard introduces new containers and new requirements
    > that just don't go well together.


    The problem is that the original specifications aren't really
    well designed. They work (more or less) for the containers that
    were present in the original standard (with the exception of
    bitset), but they aren't nearly flexible enough to cover all, or
    even most, of the useful cases.

    > Just like bitset is (supposedly) an associative container with
    > a fixed size (and therefore has a non-conforming different
    > interface), C++0x introduces a std::array as a fixed size
    > sequence container which cannot implement insert and erase,
    > for example. We also have basic_string claiming to be a
    > sequence container, but falling short.


    basic_string should be fixed so that it is a sequence.
    Otherwise, however, I don't see anything wrong with introducing
    containers which aren't sequences, and it seems obvious (to me,
    anyways) that the basic concepts need working on: ) there should
    be a distinction in the notion of mutable---you need containers
    whose values can be modified, but whose topology is fixed (like
    tr1::array), for example, and there are more than a few problems
    with the iterators: why, for example, should an iterator into a
    non-mutable sequence still require operator* to return a
    reference, rather than a value? Or for that matter, why not
    rework the concepts to allow proxies in general?

    > In comparison, vector<bool> seems like a minor problem. .-)


    The problem isn't that it exists. The problem is that it is an
    std::vector, and not something else.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Aug 3, 2008
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ccs

    An array of bools

    ccs, Jun 13, 2004, in forum: C++
    Replies:
    5
    Views:
    567
    AngleWyrm
    Jun 13, 2004
  2. Mike
    Replies:
    3
    Views:
    639
  3. Jonathon McKitrick

    Python style of accessing bools?

    Jonathon McKitrick, Apr 15, 2004, in forum: Python
    Replies:
    3
    Views:
    372
    Roy Smith
    Apr 15, 2004
  4. utab
    Replies:
    6
    Views:
    333
    Richard Herring
    Apr 7, 2006
  5. Angus
    Replies:
    20
    Views:
    715
    Krice
    Jun 26, 2009
Loading...

Share This Page