Accessing std::vector data through pointer to first element.

jason.cipriani

As long as I am not inserting/erasing elements (or doing anything else
that may reallocate the memory) in the meantime, is it safe to assume
that the data in a vector is contiguous in memory and starts at
&(thevector[0])?

For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:

vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

And if I want to pass that info to, say, a function that normally
takes a plain old array of TheStructures, can I do this:

void function (TheStructure *data, unsigned);

function(&(s[0]), s.size());

It has worked in the past but is it something that's guaranteed about
a std::vector?

Thanks,
Jason
 
Victor Bazarov

As long as I am not inserting/erasing elements (or doing anything else
that may reallocate the memory) in the meantime, is it safe to assume
that the data in a vector is contiguous in memory and starts at
&(thevector[0])?
Yes.

For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:

vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

The safety of this depends on what 'TheStructure' is. At the end, it
is *better* to have a prototype struct and use it, like so:

vector<TheStructure> s;
static TheStructure const s_proto = {}; // value-initialised
s.resize(100, s_proto);
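A minimal sketch of the prototype approach, assuming a hypothetical `TheStructure` with an `int`, a `double` and a pointer member (the real struct is never shown in the thread). The helper name `resize_with_proto` is also made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real struct (assumed to be a plain C aggregate).
struct TheStructure {
    int id;
    double weight;
    void* handle;
};

// Resizes s to n elements; any new elements are copies of a value-initialised
// prototype, so their members are 0, 0.0 and a null pointer however the
// platform represents those values in memory.
void resize_with_proto(std::vector<TheStructure>& s, std::size_t n) {
    static TheStructure const s_proto = {};  // value-initialised
    s.resize(n, s_proto);
}
```

Note that `resize` only touches elements it adds; existing elements keep their values.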
And if I want to pass that info to, say, a function that normally
takes a plain old array of TheStructures, can I do this:

void function (TheStructure *data, unsigned);

function(&(s[0]), s.size());

It has worked in the past but is it something that's guaranteed about
a std::vector?

Yes, it is.
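To illustrate both answers, here is a sketch that uses a hypothetical C-style function `sum_ids` in place of `function`, a hypothetical wrapper `sum_ids_of`, and a hypothetical two-member `TheStructure`. It relies on the contiguity guarantee (written into the standard in C++03) and guards the empty case, because `&v[0]` is undefined behaviour on an empty vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the real struct.
struct TheStructure {
    int id;
    double weight;
};

// Hypothetical C-style API taking "a plain old array of TheStructures";
// it sums the ids so the effect is easy to observe.
long sum_ids(TheStructure const* data, unsigned n) {
    assert(data != 0 || n == 0);
    long total = 0;
    for (unsigned i = 0; i < n; ++i)
        total += data[i].id;
    return total;
}

// Hands the vector's contiguous storage to the C-style function.
// &v[0] must not be evaluated on an empty vector, so that case is handled
// first. In C++11 and later, v.data() is the usual spelling.
long sum_ids_of(std::vector<TheStructure> const& v) {
    if (v.empty())
        return 0;
    return sum_ids(&v[0], static_cast<unsigned>(v.size()));
}
```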

V
 
jason.cipriani

As long as I am not inserting/erasing elements (or doing anything else
that may reallocate the memory) in the meantime, is it safe to assume
that the data in a vector is contiguous in memory and starts at
&(thevector[0])?
Yes.

For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

The safety of this depends on what 'TheStructure' is. At the end, it
is *better* to have a prototype struct and use it, like so:

vector<TheStructure> s;
static TheStructure const s_proto = {}; // value-initialised
s.resize(100, s_proto);


And if I want to pass that info to, say, a function that normally
takes a plain old array of TheStructures, can I do this:
void function (TheStructure *data, unsigned);
function(&(s[0]), s.size());
It has worked in the past but is it something that's guaranteed about
a std::vector?

Yes, it is.

Great, thanks.

Jason
 
Jim Langston

As long as I am not inserting/erasing elements (or doing anything else
that may reallocate the memory) in the meantime, is it safe to assume
that the data in a vector is contiguous in memory and starts at
&(thevector[0])?

For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:

vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

As Victor said, yes. But instead of resize you may want to give the size
when you create the structure.
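Jim's suggestion, sketched with a hypothetical two-member `TheStructure` and a hypothetical helper `make_zeroed`: the count constructor value-initialises each element, so a POD comes out zeroed without a separate `resize`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real struct.
struct TheStructure {
    int id;
    double weight;
};

// Builds the vector at its final size in one step. The count constructor
// value-initialises each element (in C++03 it copies a value-initialised
// TheStructure()), so every member of a POD starts out as zero.
std::vector<TheStructure> make_zeroed(std::size_t n) {
    return std::vector<TheStructure>(n);
}
```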

And if I want to pass that info to, say, a function that normally
takes a plain old array of TheStructures, can I do this:

void function (TheStructure *data, unsigned);

function(&(s[0]), s.size());

It has worked in the past but is it something that's guaranteed about
a std::vector?

Yes, it is guaranteed.

Although you should take Victor's advice into account about not using
memset.
 
Alf P. Steinbach

* (e-mail address removed):
As long as I am not inserting/erasing elements (or doing anything else
that may reallocate the memory) in the meantime, is it safe to assume
that the data in a vector is contiguous in memory and starts at
&(thevector[0])?
Yes.


For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:

vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

No, you should absolutely not.

If TheStructure is a POD, then resizing zeroes the new elements.

If TheStructure is not a POD, then you don't want zeroing but default
construction, and that's what resizing does.

In short, the resizing does the right thing for you, although it would in
general be even better to declare the vector with the right size at the start.
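A small sketch of the two cases distinguished above, using two made-up types: `Pod`, an aggregate with no constructor, and `NonPod`, which has a user-provided default constructor. The helper `first_after_resize` is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A POD: resize() value-initialises new elements, which zeroes each member.
struct Pod {
    int count;
    double ratio;
};

// Not a POD: resize() runs the user-provided default constructor instead,
// so the members get whatever that constructor chooses (here -1).
struct NonPod {
    int count;
    NonPod() : count(-1) {}
};

// Grows an empty vector to n elements and returns a copy of the first one.
template <class T>
T first_after_resize(std::size_t n) {
    assert(n > 0);
    std::vector<T> v;
    v.resize(n);
    return v[0];
}
```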

The memset call is just idiocy.

And if I want to pass that info to, say, a function that normally
takes a plain old array of TheStructures, can I do this:

void function (TheStructure *data, unsigned);

function(&(s[0]), s.size());

It has worked in the past but is it something that's guaranteed about
a std::vector?

Yes.


Cheers, & hth.,

- Alf
 
jason.cipriani

* (e-mail address removed):
For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

No, you should absolutely not.

If TheStructure is a POD, then resizing zeroes the new elements.

If TheStructure is not a POD, then you don't want zeroing but default
construction, and that's what resizing does.

In short, the resizing does the right thing for you, although it would in
general be even better to declare the vector with the right size at the start.

The memset call is just idiocy.

In my case the structure is a POD, and zeroing it is what I want. Also
the structure is in a C library and not defined by me, so I can't add
a constructor. For memcpy(), the reason I am using it is performance
is important and the compiler I'm using inlines a memcpy() call to a
rep stos instruction on my Intel machine. I'm only using the
std::vector to make memory management more convenient.

Jason
 
jason.cipriani

* (e-mail address removed):
For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());
No, you should absolutely not.
If TheStructure is a POD, then resizing zeroes the new elements.
If TheStructure is not a POD, then you don't want zeroing but default
construction, and that's what resizing does.
In short, the resizing does the right thing for you, although it would in
general be even better to declare the vector with the right size at the start.
The memset call is just idiocy.

In my case the structure is a POD, and zeroing it is what I want. Also
the structure is in a C library and not defined by me, so I can't add
a constructor. For memcpy(), the reason I am using it is performance
is important and the compiler I'm using inlines a memcpy() call to a
rep stos instruction on my Intel machine. I'm only using the
std::vector to make memory management more convenient.

Memset() rather. Also the resize() is done elsewhere, and I need to 0
all of the data repeatedly, even if the size didn't actually change. I
did not know that resize() 0'd the new elements by default, though --
I always assumed it just used the default constructors for objects.
I'll file that away for future reference.
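For the repeated-zeroing case, one memset-free option is to assign a value-initialised object to every element. This is a sketch with a hypothetical `TheStructure` and a hypothetical helper `reset_all`; how its speed compares with memset depends on the compiler and is worth measuring. `s.assign(s.size(), TheStructure())` would be an equivalent alternative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the third-party C struct.
struct TheStructure {
    int id;
    double weight;
    void* handle;
};

// Resets every element to a value-initialised TheStructure (all members
// zero or null) while leaving the size unchanged. It can be called as often
// as needed, whether or not the size changed since the last call, and it
// keeps working if the struct ever gains a constructor, because it uses the
// type's own initialisation and assignment.
void reset_all(std::vector<TheStructure>& s) {
    std::fill(s.begin(), s.end(), TheStructure());
}
```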
 
Victor Bazarov

[..]
In my case the structure is a POD, and zeroing it is what I want.
Also the structure is in a C library and not defined by me, so I
can't add a constructor. For memcpy(), the reason I am using it is
performance is important and the compiler I'm using inlines a
memcpy() call to a rep stos instruction on my Intel machine. I'm
only using the std::vector to make memory management more convenient.

Memset() rather. Also the resize() is done elsewhere, and I need to 0
all of the data repeatedly, even if the size didn't actually change. I
did not know that resize() 0'd the new elements by default, though --
I always assumed it just used the default constructors for objects.
I'll file that away for future reference.

In such dire circumstances (3rd party struct with no access, POD for
sure, repeated need to zero it out), you should be able to get away
with memset. Most of the caution warnings you get from us are very
relevant in *general cases* when you have the entire source in your
possession, and while the struct is POD now, it might lose its
PODness somehow in the future; the use of 'memset' in such case is
a maintenance nightmare. So don't take our comments personally, do
make a mental note of them and follow the right development practices
and you're going to be alright.
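If memset is kept, one way to soften the maintenance risk described above is to make the build fail when the element type stops being trivially copyable. This sketch assumes a C++11 compiler (which postdates this thread); `memset_zero_all` and the two-member `TheStructure` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the third-party C struct.
struct TheStructure {
    int id;
    double weight;
};

// Zeroes every byte of every element. The static_assert turns a future
// change that makes T non-trivially-copyable (for example, adding a
// std::string member) into a compile error instead of silent corruption.
// It still assumes all-bits-zero means "zero" for each member, which is
// true on mainstream platforms but not promised by the standard.
template <class T>
void memset_zero_all(std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset_zero_all requires a trivially copyable type");
    if (!v.empty())
        std::memset(v.data(), 0, v.size() * sizeof(T));
}
```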

Good luck!

V
 
Alf P. Steinbach

* Victor Bazarov:
[..]
In my case the structure is a POD, and zeroing it is what I want.
Also the structure is in a C library and not defined by me, so I
can't add a constructor. For memcpy(), the reason I am using it is
performance is important and the compiler I'm using inlines a
memcpy() call to a rep stos instruction on my Intel machine. I'm
only using the std::vector to make memory management more convenient.
Memset() rather. Also the resize() is done elsewhere, and I need to 0
all of the data repeatedly, even if the size didn't actually change. I
did not know that resize() 0'd the new elements by default, though --
I always assumed it just used the default constructors for objects.
I'll file that away for future reference.

In such dire circumstances (3rd party struct with no access, POD for
sure, repeated need to zero it out), you should be able to get away
with memset. Most of the caution warnings you get from us are very
relevant in *general cases* when you have the entire source in your
possession, and while the struct is POD now, it might lose its
PODness somehow in the future; the use of 'memset' in such case is
a maintenance nightmare. So don't take our comments personally, do
make a mental note of them and follow the right development practices
and you're going to be alright.

Uhm, well my point was the memset is entirely superfluous.

It adds execution time.

And as you say, it subtracts maintainability.


Cheers,

- Alf
 
Victor Bazarov

Alf said:
* Victor Bazarov:
[..]
In my case the structure is a POD, and zeroing it is what I want.
Also the structure is in a C library and not defined by me, so I
can't add a constructor. For memcpy(), the reason I am using it is
performance is important and the compiler I'm using inlines a
memcpy() call to a rep stos instruction on my Intel machine. I'm
only using the std::vector to make memory management more
convenient.
Memset() rather. Also the resize() is done elsewhere, and I need to
0 all of the data repeatedly, even if the size didn't actually
change. I did not know that resize() 0'd the new elements by
default, though -- I always assumed it just used the default
constructors for objects. I'll file that away for future reference.

In such dire circumstances (3rd party struct with no access, POD for
sure, repeated need to zero it out), you should be able to get away
with memset. Most of the caution warnings you get from us are very
relevant in *general cases* when you have the entire source in your
possession, and while the struct is POD now, it might lose its
PODness somehow in the future; the use of 'memset' in such case is
a maintenance nightmare. So don't take our comments personally, do
make a mental note of them and follow the right development practices
and you're going to be alright.

Uhm, well my point was the memset is entirely superfluous.

It's not superfluous when you have to do it repeatedly. What's the
other way of setting all elements of a vector to some initial state
(like zeroing it out) that would be as fast as memset if you have to
do it many times over?
It adds execution time.

To a single resize, yes.

V
 
James Kanze

For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());
No, you should absolutely not.
If TheStructure is a POD, then resizing zeroes the new
elements.
If TheStructure is not a POD, then you don't want zeroing
but default construction, and that's what resizing does.
In short, the resizing does the right thing for you,
although it would in general be even better to declare
the vector with the right size at the start.
The memset call is just idiocy.
In my case the structure is a POD, and zeroing it is what I want.

Memset doesn't zero anything but unsigned char. In practice,
I've never heard of an architecture where it wouldn't also zero
other integral and floating point types (but I'm not sure---what
is the required value of the tag bit on a Unisys MCP?). There
have definitely been machines where it wouldn't zero pointers,
however.

And of course, the vector<>::resize() has already correctly
zeroed the structs.
 
Alf P. Steinbach

* Victor Bazarov:
Alf said:
* Victor Bazarov:
(e-mail address removed) wrote:
[..]
In my case the structure is a POD, and zeroing it is what I want.
Also the structure is in a C library and not defined by me, so I
can't add a constructor. For memcpy(), the reason I am using it is
performance is important and the compiler I'm using inlines a
memcpy() call to a rep stos instruction on my Intel machine. I'm
only using the std::vector to make memory management more
convenient.
Memset() rather. Also the resize() is done elsewhere, and I need to
0 all of the data repeatedly, even if the size didn't actually
change. I did not know that resize() 0'd the new elements by
default, though -- I always assumed it just used the default
constructors for objects. I'll file that away for future reference.
In such dire circumstances (3rd party struct with no access, POD for
sure, repeated need to zero it out), you should be able to get away
with memset. Most of the caution warnings you get from us are very
relevant in *general cases* when you have the entire source in your
possession, and while the struct is POD now, it might lose its
PODness somehow in the future; the use of 'memset' in such case is
a maintenance nightmare. So don't take our comments personally, do
make a mental note of them and follow the right development practices
and you're going to be alright.
Uhm, well my point was the memset is entirely superfluous.

It's not superfluous when you have to do it repeatedly.

Right. But it's superfluous in the original context of this thread, the OP's code

vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());

What's the
other way of setting all elements of a vector to some initial state
(like zeroing it out) that would be as fast as memset if you have to
do it many times over?

Don't know.

It seems that in practice memset is the fastest way to zero a vector of POD's:


<code>
#include <iostream>
#include <ostream>
#include <vector>
#include <algorithm>
#include <cstddef>
#include <cstring>   // standard header for memset (was <memory.h>)
#include <ctime>

struct S
{
    char filler[100];
};

typedef std::vector<S> SVec;

// Zero by discarding all elements and value-initialising them again.
inline void clearUsingResize( SVec& v )
{
    std::size_t const oldSize = v.size();
    v.clear();
    v.resize( oldSize );
}

// Zero by assigning a value-initialised S to every element.
inline void clearUsingCopy( SVec& v )
{
    std::fill( v.begin(), v.end(), S() );
}

// Zero the raw bytes (v must be non-empty for &v[0] to be valid).
inline void clearUsingMemset( SVec& v )
{
    std::memset( &v[0], 0, v.size()*sizeof(S) );
}

// Times 10000 calls of 'clear' on a 1000-element vector.
template< void(*clear)(SVec&) >
void test( char const testName[] )
{
    SVec v( 1000 );
    std::clock_t const startTime = std::clock();
    for( int i = 1; i <= 10000; ++i )
    {
        clear( v );
    }
    std::clock_t const endTime = std::clock();

    std::cout << "Testing " << testName << ":";
    std::cout << double(endTime - startTime)/CLOCKS_PER_SEC << " sec.";
    std::cout << std::endl;
    std::cout << std::endl;
}

#define TEST( f ) test<f>( #f )

int main()
{
    TEST( clearUsingResize );
    TEST( clearUsingCopy );
    TEST( clearUsingMemset );
}
</code>

<output>
V:\> gnuc vc_project.cpp -O2

V:\> a
Testing clearUsingResize:0.937 sec.

Testing clearUsingCopy:0.906 sec.

Testing clearUsingMemset:0.531 sec.


V:\> msvc vc_project.cpp -O2 -o b
vc_project.cpp

V:\> b
Testing clearUsingResize:0.828 sec.

Testing clearUsingCopy:0.75 sec.

Testing clearUsingMemset:0.593 sec.


V:\> _
</output>

Disclaimer: I just took the optimization option for speed from memory, might be
wrong.

However, it's not a very big difference from the safe way of doing it, so it's a
question whether it's really worth it.

As always in cases of optimization, measure measure measure, and identify where
optimization is really needed, and whether the trade-off is really worth it.


Cheers, & hth.,

- Alf
 
jason.cipriani

* (e-mail address removed):
For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());
No, you should absolutely not.
If TheStructure is a POD, then resizing zeroes the new
elements.
If TheStructure is not a POD, then you don't want zeroing
but default construction, and that's what resizing does.
In short, the resizing does the right thing for you,
although it would in general be even better to declare
the vector with the right size at the start.
The memset call is just idiocy.
In my case the structure is a POD, and zeroing it is what I want.

Memset doesn't zero anything but unsigned char. In practice,
I've never heard of an architecture where it wouldn't also zero
other integral and floating point types (but I'm not sure---what
is the required value of the tag bit on a Unisys MCP?).

On a "typical" Intel machine (i.e. the only platform my application is
designed to run on), integral and pointer types don't have any padding
bits or bytes, or special tag values, and so zeroing each byte that
they occupy does set their values to 0:

int a = 23;
int *b = &a;

memset(&a, 0, sizeof(int));
memset(&b, 0, sizeof(int*));
assert(a == 0 && b == NULL);

Also, on the same "typical" Intel machine, zeroing all the bytes of
floats and doubles also sets their values to 0.

float c = 1.0f;
double d = 1.0;

memset(&c, 0, sizeof(c));
memset(&d, 0, sizeof(d));
assert(c == 0.0f && d == 0.0);

It's platform-specific code for a platform-specific application.
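One way to state that platform assumption in the code itself, sketched under the assumption of a C++11 compiler: compile-time checks that `float` and `double` use IEC 559 (IEEE 754) formats, plus a run-time helper, hypothetically named `zero_bytes_give_zero`, that checks whether an all-bits-zero object compares equal to zero. It documents the assumption; it does not make the code portable.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Compile-time statement of the platform assumption: float and double use
// IEC 559 (IEEE 754) formats, in which all-bits-zero encodes +0.0.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEC 559");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEC 559");

// Run-time spot check: does an object whose bytes are all zero compare equal
// to a zero of its type? True on mainstream platforms for integers, floating
// point and pointers, but not something the standard guarantees.
template <class T>
bool zero_bytes_give_zero() {
    T value;
    std::memset(&value, 0, sizeof value);
    return value == T(0);
}
```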

Just out of curiosity:
There
have definitely been machines where it wouldn't zero pointers,
however.

Do you mean machines where this:

struct A {
    unsigned char a;
    void *b;
};

A a;
memset(&a, 0, sizeof(A));

Zeros the memory used by a.a but somehow knows to skip over a.b...? Or
do you mean that there are machines where:

void *ptr;
memset(&ptr, 0, sizeof(void *));

Does the incorrect thing because setting those bytes to 0 doesn't
necessarily correspond to the pointer having a NULL value (like
platforms with tag bits).

Jason
 
James Kanze

* (e-mail address removed):
For example, say I have a vector of structures and I want to set its
size to 100 then zero all the memory; can (not should) I do the
following:
vector<TheStructure> s;
s.resize(100);
memset(&(s[0]), 0, sizeof(TheStructure) * s.size());
No, you should absolutely not.
If TheStructure is a POD, then resizing zeroes the new
elements.
If TheStructure is not a POD, then you don't want zeroing
but default construction, and that's what resizing does.
In short, the resizing does the right thing for you,
although it would in general be even better to declare
the vector with the right size at the start.
The memset call is just idiocy.
In my case the structure is a POD, and zeroing it is what I want.
Memset doesn't zero anything but unsigned char. In practice,
I've never heard of an architecture where it wouldn't also zero
other integral and floating point types (but I'm not sure---what
is the required value of the tag bit on a Unisys MCP?).
On a "typical" Intel machine (i.e. the only platform my
application is designed to run on), integral and pointer types
don't have any padding bits or bytes, or special tag values,
and so zeroing each byte that they occupy does set their
values to 0:

I only know of one machine in production today which uses tag
bits, but I think that they're different for pointers and ints
(so memset will be wrong for one or the other). And in the
past, there have certainly been a number of machines for which
null pointers weren't all bits 0.

Of course, if portability isn't an issue, AND the profiler has
shown that you just don't have the choice...

[...]
Just out of curiosity:
Do you mean machines where this:
struct A {
    unsigned char a;
    void *b;
};
A a;
memset(&a, 0, sizeof(A));
Zeros the memory used by a.a but somehow knows to skip over
a.b...? Or do you mean that there are machines where:
void *ptr;
memset(&ptr, 0, sizeof(void *));
Does the incorrect thing because setting those bytes to 0
doesn't necessarily correspond to the pointer having a NULL
value (like platforms with tag bits).

Even without tag bits. Null pointers don't necessarily have all
bits 0, and there have been quite a few architectures in the
past where they didn't. I know Honeywell had one, and I think
Prime, and some others as well. Arguably, it would have made
more sense on the early Intels to use 0xFFFF:0xF.

Today, off hand, the only "exotic" architectures I know of are
the Unisys mainframes. And while it wouldn't particularly
surprise me if one or both of them used null pointers with not
all bits 0, I don't actually have access to either of them to
test it. (I do have some documentation which indicates that
there is a tag bit in the 48 bit words of the MCP architecture.
And single tag bits were usually used to indicate whether the
contents are a pointer or data, so if the tag bit must be 0 for
int and float, then it must be non-zero for pointers, or vice
versa.)

I suspect that one of the reasons null pointers are usually all
bits 0 is that the standard requires pointers with static
lifetime to be initialized with a null pointer value, and the
easiest solution for that is to ensure that null pointer values
have the same bit pattern as 0 integer values, and use a memset
before program start up, without concern for the types.
(Another reason is probably that there is a lot of code out
there that naïvely supposes it, and compiler vendors don't want
to break it, even if it is formally broken already.)
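The portable alternatives to memset for Jason's `struct A` can be sketched as follows; the helper `make_cleared_a` and the global `g_static_a` are hypothetical names. Value-initialisation and static zero-initialisation are both defined in terms of values, so the pointer member becomes a null pointer whatever its bit pattern, which is exactly what memset cannot promise.

```cpp
#include <cassert>

// Same layout as Jason's struct A.
struct A {
    unsigned char a;
    void* b;
};

// Value-initialisation sets a.a to 0 and a.b to the null pointer value,
// whatever bit pattern the platform uses for null. memset(&x, 0, sizeof x)
// only yields all-bits-zero, which is null on mainstream platforms but was
// not on every machine ever built.
A make_cleared_a() {
    A x = A();
    return x;
}

// Objects with static storage duration are zero-initialised before dynamic
// initialisation, and zero-initialising a pointer gives it the null value.
A g_static_a;
```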
 
Old Wolf

Memset doesn't zero anything but unsigned char. In practice,

All-bits-zero is guaranteed to be a valid
representation for 0 for any integral type,
is it not?
 
jason.cipriani

All-bits-zero is guaranteed to be a valid
representation for 0 for any integral type,
is it not?

I don't think so. I might be wrong, but I can conceive of weird
hypothetical systems where that wouldn't be the case (right?). For
example, a platform that uses different bit layouts for different
sized integers, with non-zero tag bits in various places. On the C++
side of things this is transparent. But then, on such a system, if you
did something like this:

int a;
char *b = (char *)&a;
for (unsigned n = 0; n < sizeof(a); ++n)
    b[n] = 0;

You could end up with the wrong value, possibly even a non-zero value,
in a.

Jason
 
Alf P. Steinbach

* Old Wolf:
All-bits-zero is guaranteed to be a valid
representation for 0 for any integral type,
is it not?

No, not in general.

For types other than char the Holy Standard differentiates between "object
representation" (n bytes where n = sizeof(T)) and "value representation" (m bits
where m <= n*CHAR_BIT), §3.9/4 (and also some relevant stuff in §3.9.1/1).

When n*CHAR_BIT-m > 0 there may be implementation-defined requirements on the
values of those bits. Although I doubt that it's possible to find any C++
implementation where n*CHAR_BIT-m > 0, let alone where those bits can't be zero.
The in-practice, though, is different from the formal, and we should always
strive to support ENIAC in our programs, nicht war? <g>


Cheers, & hth.,

- Alf
 
James Kanze

All-bits-zero is guaranteed to be a valid representation for 0
for any integral type, is it not?

All value bits 0 is guaranteed to be a valid representation for
0 for any integral type. Integral types other than character
types are allowed to have additional, non-value (padding) bits,
however, and those potentially might have to be non-zero to
avoid a trapping representation.
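A way to check this on a given implementation, sketched with a hypothetical helper `padding_bits`: `std::numeric_limits<T>::digits` counts value bits (excluding the sign bit for signed types), so subtracting it and the sign bit from `sizeof(T) * CHAR_BIT` leaves the number of padding bits. Mainstream platforms report 0 for the standard integer types; a nonzero result means the type has bits whose required values all-bits-zero might violate.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Number of bits in T's object representation that are neither value bits
// nor the sign bit, i.e. padding bits. numeric_limits<T>::digits excludes the
// sign bit for signed types, so it is added back separately.
// Meant for the standard integer types (not bool).
template <class T>
int padding_bits() {
    int const object_bits = static_cast<int>(sizeof(T)) * CHAR_BIT;
    int const sign_bits = std::numeric_limits<T>::is_signed ? 1 : 0;
    return object_bits - std::numeric_limits<T>::digits - sign_bits;
}
```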
 
James Kanze

* Old Wolf:
No, not in general.
For types other than char the Holy Standard differentiates
between "object representation" (n bytes where n = sizeof(T))
and "value representation" (m bits where m <= n*CHAR_BIT),
§3.9/4 (and also some relevant stuff in §3.9.1/1).
When n*CHAR_BIT-m > 0 there may be implementation-defined
requirements on the values of those bits. Although I doubt
that it's possible to find any C++ implementation where
n*CHAR_BIT-m > 0,

On a Unisys MCP processor, INT_MAX and UINT_MAX are both equal
to 2^39-1, which of course, supposes 40 value bits for int
(since there must also be a sign bit) and 39 value bits for
unsigned int. sizeof(int) == 6, and CHAR_BIT == 8.

In fact, if I understand the documentation correctly, all of the
unused bits are "must be zero".
let alone where those bits can't be zero.
The in-practice, though, is different from the formal, and we
should always strive to support ENIAC in our programs, nicht
war? <g>

[I'm pretty sure you mean "nicht wahr".]

We should strive to support anything we might someday have to
support. For a general purpose library, I would say that you
should probably strive to support any architecture currently
being sold, and that does mean Unisys MCP. For application
software, of course, you're likely using enough other system
specific features that it doesn't matter.
 
Alf P. Steinbach

* James Kanze:
On a Unisys MCP processor, INT_MAX and UINT_MAX are both equal
to 2^39-1, which of course, supposes 40 value bits for int
(since there must also be a sign bit) and 39 value bits for
unsigned int. sizeof(int) == 6, and CHAR_BIT == 8.

I think that scheme runs afoul of the standard.

§3.9.1/3 "... the value representation of each corresponding signed/unsigned
type shall be the same".

Where "value representation" does not denote a mapping from bitpatterns to
conceptual values (a code), but is a term defined earlier by §3.9/4, "The /value
representation/ of an object is the set of bits in the object representation
that determines a /value/, which is one discrete element of an
implementation-defined set of values."; i.e., the set of bits that represents
the value shall be the same for each corresponding signed/unsigned type.


Cheers,

- Alf
 
