vector and bool

D

daniel

1)
is C++ smart enough to automatically use "bits" for bool or will
a bool have the size of a charcter (byte).
2) The index of a vector is it an integer (4 byte) or a "long long"
with 8 bytes or something else?

I ask because I want to use

vector<bool> with a few billion entries exceeding the int32 range for
indexing and I want to use as few memory as possible for the whole
vector<bool>

e.g.
vector<bool> B;
B.resize(2^34);

thanks, daniel
 
I

Ivan Vecerina

daniel said:
1)
is C++ smart enough to automatically use "bits" for bool or will
a bool have the size of a charcter (byte).
The standard C++ library is actually required to specialize the
std::vector for bool, and to provide 'compact' storage.
[NB: for many uses, this is actually seen as an inconvenience]
2) The index of a vector is it an integer (4 byte) or a "long long"
with 8 bytes or something else?
This depends on the implementation, but it may well be limited
to size_t, or a 4-byte unsigned integer on many platforms.
I ask because I want to use

vector<bool> with a few billion entries exceeding the int32 range for
indexing and I want to use as few memory as possible for the whole
vector<bool>
May we ask for what type of application or what kind of data?
In many cases, using some form of compression might be better
that allocating gigabytes of RAM. Does your system have enough
memory?


Regards,
Ivan
 
J

JKop

daniel posted:
1)
is C++ smart enough to automatically use "bits" for bool or will
a bool have the size of a charcter (byte).

I believe that sizeof(bool) must be >= 1, which implies that a "bool" is of
one byte. But that doesn't mean that your particular system might do
something different in the background.

It's to do with the actual CPU of the system, what is the smallest unit of
memory that can be accessed, which is called a "byte", which 9 times out of
10 is equal to "8 bits". Code which would store 8 bools in 1 byte would be
slower and (ironically) more memory consumptive as the code which
manipulates the byte would have to be kept in memory.
2) The index of a vector is it an integer (4 byte) or a "long long"
with 8 bytes or something else?

There is no such intrinsic type "long long".

There is a type "long int", which is guaranteed to be atleast 32-bit on all
systems, but that doesn't guarantee that it'll be 4 bytes long. For
instance:

If "char" were 9-Bit, then although "long" would be atleast 32-Bit, it
wouldn't necessarily be 4 bytes long, as 4 bytes = 36 bits on that system.
I ask because I want to use

vector<bool> with a few billion entries exceeding the int32 range for
indexing and I want to use as few memory as possible for the whole
vector<bool>

e.g.
vector<bool> B;
B.resize(2^34);

thanks, daniel

I'm sure there's people here that've done this before. . .

I'm not one of them!


-JKop
 
R

Rolf Magnus

JKop said:
daniel posted:


I believe that sizeof(bool) must be >= 1, which implies that a "bool" is
of one byte. But that doesn't mean that your particular system might do
something different in the background.

AFAIK, some systems use even 4 bytes for a bool, because they can access it
faster.
It's to do with the actual CPU of the system, what is the smallest unit of
memory that can be accessed, which is called a "byte", which 9 times out
of 10 is equal to "8 bits".

However, a CPU byte might be different from a C++ byte. I have heared there
are signal processors that have a smallest memory unit of 24 bits and still
the C++ implementation gives you a char with 8 bit. And there are 4 bit
CPUS which would need to combine two bytes to meet the minimum requirement
of 8 bits for char. I guess there might be C implelemntations, but probably
no C++ ones on such CPUs, but C has the same requirements in that case.
Code which would store 8 bools in 1 byte would be slower and (ironically)
more memory consumptive as the code which manipulates the byte would have
to be kept in memory.


There is no such intrinsic type "long long".

There is a type "long int", which is guaranteed to be atleast 32-bit on
all systems, but that doesn't guarantee that it'll be 4 bytes long.

That's right.
For instance:

If "char" were 9-Bit, then although "long" would be atleast 32-Bit, it
wouldn't necessarily be 4 bytes long, as 4 bytes = 36 bits on that system.

That example isn't really chosen very well. 3 bytes would only be 27 bits,
so you would still need 4 bytes to meet the requirement of at least 32
bits.
 
A

assaarpa

Code which would store 8 bools in 1 byte would be
slower and (ironically) more memory consumptive as the code which
manipulates the byte would have to be kept in memory.

Who gives a damn if the code takes 10 or 30 bytes, when the guy wants to
store billions of bool's and the choises are 128 MB vs. 1024 MB for storing
one billion flags. And you cannot be so sure about the performance either,
very often smaller data translates to better performance in the real world
applications.
 
I

Ioannis Vranos

daniel said:
1)
is C++ smart enough to automatically use "bits" for bool or will
a bool have the size of a charcter (byte).


In C++ it is

1 <= sizeof(bool) <= sizeof(long)


However especially for vector<bool> the standard says:


"To optimize space allocation, a specialization of vector for bool
elements is provided:

....


reference is a class that simulates the behaviour of references of a
single bit in vector<bool>."




"The C++ Programming Language 3rd Edition" may help here:


"16.3.11 Vector<bool>

The specialization (§13.5) vector<bool> is provided as a compact vector
of bool. A bool variable is addressable, so it takes up at least one
byte. However, it is easy to implement vector<bool> so that each element
takes up only a bit.
The usual vector operations work for vector<bool> and retain their usual
meanings. In particular, subscripting and iteration work as expected.

For example:
void f(vector<bool>& v)
{
// iterate using subscripting
for (int i = 0; i<v.size() ; ++i) cin >> v ;

typedef vector<bool>: :const_ iterator VI;

// iterate using iterators
for (VI p = v.begin() ; p!=v.end() ; ++p) cout<<*p;
}

To achieve this, an implementation must simulate addressing of a single
bit. Since a pointer cannot address a unit of memory smaller than a
byte, vector<bool>: :iterator cannot be a pointer. In particular,
one cannot rely on bool* as an iterator for a vector<bool>:

void f(vector<bool>& v)
{
bool* p = v.begin() ; // error: type mismatch
// ...
}

A technique for addressing a single bit is outlined in §17.5.3.

The library also provides bitset as a set of Boolean values with Boolean
set operations (§17.5.3)."









2) The index of a vector is it an integer (4 byte) or a "long long"
with 8 bytes or something else?


It is vector<T>::size_type which is some unsigned integer type.


Example:


vector<int> someVec(100);

for(vector<int>::size_type i=0; i<someVec.size(); ++i)
// ...








I ask because I want to use

vector<bool> with a few billion entries exceeding the int32 range for
indexing and I want to use as few memory as possible for the whole
vector<bool>

e.g.
vector<bool> B;
B.resize(2^34);



Well, it is not guaranteed that bits will be used, although in most
cases this is what should happen.

You may also want to check bitset which uses bits for sure.
 
I

Ioannis Vranos

Ioannis said:
Well, it is not guaranteed that bits will be used, although in most
cases this is what should happen.

You may also want to check bitset which uses bits for sure.


My mistake! It is *guaranteed* that bits are used:


From the standard:

"23.2.5 Class vector<bool>

To optimize space allocation, a specialization of vector for bool
elements is provided:

....

// bit reference:
class reference {
friend class vector;
reference();
public:
˜reference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};

....

void flip(); // flips all bits

....

reference is a class that simulates the behavior of references of a
single bit in vector<bool>."
 
R

Ron Natalie

JKop said:
I believe that sizeof(bool) must be >= 1, which implies that a "bool" is of
one byte.

Actually it implies bool is at least one byte. Many systems
sizeof(bool) == sizeof(int).
But that doesn't mean that your particular system might do
something different in the background.

Actually, for vector it is REQUIRED to do something different. I think
it's stupid but the standard requires vector<bool> to be specialized to
act differently than vectors of anything else. It should have been a
different class.
It's to do with the actual CPU of the system, what is the smallest unit of
memory that can be accessed, which is called a "byte", which 9 times out of
10 is equal to "8 bits".

Actually it has nothing to do with the actual CPU. C++ could just as
easily have bit types. There are machines where the smallest
addressable unit of storage is NOT a byte. A byte is carved up out of
a larger word.

However, getting back on topic, a byte (and it's related C++ type char)
is the smallest unit of memory C++ will address. This unfornuately is
another C/C++ stupidity. There really shouldn't be a hard relationship
between "character" and "atomic memory unit." Of course, this is 30
some years of hindsight speaking.
 
R

Richard Herring

Ioannis Vranos said:
My mistake! It is *guaranteed* that bits are used:

I don't think so.
From the standard:

"23.2.5 Class vector<bool>

To optimize space allocation, a specialization of vector for bool
elements is provided:

...

// bit reference:
class reference {
friend class vector;
reference();
public:
Ëœreference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};

...

void flip(); // flips all bits

...

reference is a class that simulates the behavior of references of a
single bit in vector<bool>."
That just redefines the semantics of operator[] etc. to return a proxy
instead of an actual bool &, thus removing the requirement for elements
to have distinct addresses. Nothing there says that it _must_ use only
use one bit per element.
 
J

Jeff Flinn

daniel said:
1)
is C++ smart enough to automatically use "bits" for bool or will
a bool have the size of a charcter (byte).
2) The index of a vector is it an integer (4 byte) or a "long long"
with 8 bytes or something else?

I ask because I want to use

vector<bool> with a few billion entries exceeding the int32 range for
indexing and I want to use as few memory as possible for the whole
vector<bool>

Use either std::bitset or boost::dynamic_bitset. Both "pack" the bits for
better memory usage. You'll most likely have to modify these classes to
account for indices > 32 bit.

Jeff F
 
I

Ioannis Vranos

Richard said:
reference is a class that simulates the behavior of references of a
single bit in vector<bool>."

That just redefines the semantics of operator[] etc. to return a proxy
instead of an actual bool &, thus removing the requirement for elements
to have distinct addresses. Nothing there says that it _must_ use only
use one bit per element.



Well since it says that

"reference is a class that simulates the behavior of references of a
single bit in vector<bool>"

I think this implies that vector<bool> should be implemented using bits.
 
R

Richard Herring

Ioannis Vranos said:
Richard said:
reference is a class that simulates the behavior of references of a
single bit in vector<bool>."

That just redefines the semantics of operator[] etc. to return a
proxy instead of an actual bool &, thus removing the requirement for
elements to have distinct addresses. Nothing there says that it _must_
use only use one bit per element.

Well since it says that

"reference is a class that simulates the behavior of references of a
single bit in vector<bool>"

I think this implies that vector<bool> should be implemented using bits.
For a weak enough meaning of "should", I agree that that's the
intention. But I don't think anything in the Standard rules out a naive
implementation that makes no attempt to optimise, and simply implements
vector<bool> in terms of an underlying vector<some_int_type>.
 
I

Ioannis Vranos

Richard said:
For a weak enough meaning of "should", I agree that that's the
intention. But I don't think anything in the Standard rules out a naive
implementation that makes no attempt to optimise, and simply implements
vector<bool> in terms of an underlying vector<some_int_type>.


There is not an explicit prohibition on this, however explicit
prohibitions are rare in the standard.

Also the implementer has to implement this particular reference:


// bit reference:
class reference {
friend class vector;
reference();
public:
˜reference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};


with the note:

reference is a class that simulates the behavior of references of a
single bit in vector<bool>.



and this particular member function:

void flip(); // flips all bits




Also vector<bool> is mentioned as a separate case in the standard, and
thus I think the legislator intended this to be implemented only with
bits. :)
 
N

Nonenix (astalavista.net)

hmm not sure but i think somethink lieke that can be done with structures i
mena this:


struct item {

unsigned int IsTrue:1;
unsigned int watever:1;
unsigned int number:14;

}

some cpu dosen't support that or the compiler doesen't use it because of
speed loss

but if it work is should looks like this in the ram:

0 | 1 | 2.............15 |
IsTrue | waterver | number |


you could use it like an int but IMPORTANT

if you forgot the unsignet the int will always be 0

gnu says this:

...cpp: In function 'int main()':
....cpp:13: warning: comparison is always 0 due to width of bitfield


here is also the name they#re called bitfields

hope this can help ;-)
 
D

daniel

Well I found something:
http://www.sgi.com/tech/stl/bit_vector.html states
----------------
A bit_vector is essentially a vector<bool>: it is a Sequence that has
the same interface as vector. The main difference is that bit_vector
is optimized for space efficiency. A vector always requires at least
one byte per element, but a bit_vector only requires one bit per
element.

Warning: The name bit_vector will be removed in a future release of
the STL. The only reason that bit_vector is a separate class, instead
of a template specialization of vector<bool>, is that this would
require partial specialization of templates. On compilers that support
partial specialization, bit_vector is a specialization of
vector<bool>. The name bit_vector is a typedef. This typedef is not
defined in the C++ standard, and is retained only for backward
compatibility.
----------------
The std::bitset seems not to be useful because I cannot change its
size during runtime (no ->resize()).

So that should answer my first question.

Thanks for the help, daniel
 
I

Ioannis Vranos

Nonenix said:
hmm not sure but i think somethink lieke that can be done with structures i
mena this:


struct item {

unsigned int IsTrue:1;
unsigned int watever:1;
unsigned int number:14;

}

some cpu dosen't support that or the compiler doesen't use it because of
speed loss

but if it work is should looks like this in the ram:

0 | 1 | 2.............15 |
IsTrue | waterver | number |


you could use it like an int but IMPORTANT

if you forgot the unsignet the int will always be 0

Why?


gnu says this:

..cpp: In function 'int main()':
...cpp:13: warning: comparison is always 0 due to width of bitfield


May you provide that code?


Actually it could be even like this:


struct item
{
bool a:1;
bool b:1;
bool c:1;
bool d:1;
};
 
R

Richard Herring

Ioannis Vranos said:
There is not an explicit prohibition on this, however explicit
prohibitions are rare in the standard.

Just so. The standard specifies an interface and complexity (speed)
requirements, not the mechanism used to implement them. But it says
nothing prescriptive about the memory requirements for containers.
Also the implementer has to implement this particular reference:


// bit reference:
class reference {
friend class vector;
reference();
public:
Ëœreference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};
Yes, and the fact that this is what operator[] is mandated to return is
what makes it *possible* for the implementor to provide access to data
packed into something smaller than an addressable unit.
with the note:

reference is a class that simulates the behavior of references of a
single bit in vector<bool>.

Yes; note the word "simulates".
and this particular member function:

void flip(); // flips all bits

Also vector<bool> is mentioned as a separate case in the standard, and
thus I think the legislator intended this to be implemented only with
bits.

I'd say "intended it to be _possible_ to implement with bits".
 
J

Jeff Flinn

daniel said:
Well I found something:
http://www.sgi.com/tech/stl/bit_vector.html states
----------------
A bit_vector is essentially a vector<bool>: it is a Sequence that has
the same interface as vector. The main difference is that bit_vector
is optimized for space efficiency. A vector always requires at least
one byte per element, but a bit_vector only requires one bit per
element.

Warning: The name bit_vector will be removed in a future release of
the STL. The only reason that bit_vector is a separate class, instead
of a template specialization of vector<bool>, is that this would
require partial specialization of templates. On compilers that support
partial specialization, bit_vector is a specialization of
vector<bool>. The name bit_vector is a typedef. This typedef is not
defined in the C++ standard, and is retained only for backward
compatibility.


Which is why I also suggested boost::dynamic_bitset. See
http://www.boost.org/libs/dynamic_bitset/dynamic_bitset.html

Jeff F
 
I

Ioannis Vranos

Richard said:
I'd say "intended it to be _possible_ to implement with bits".


Well in the reference type definition



// bit reference:
class reference {
friend class vector;
reference();
public:
Ëœreference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};



the remark for flip() is "flips the bit". Which bit?
 
R

Richard Herring

Ioannis Vranos said:
Well in the reference type definition


// bit reference:
class reference {
friend class vector;
reference();
public:
Ëœreference();
operator bool() const;
reference& operator=(const bool x);
reference& operator=(const reference& x);
void flip(); // flips the bit
};


the remark for flip() is "flips the bit". Which bit?

The one "simulated" by the reference in the note you quoted earlier:
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top