valarray<valarray<T> > efficiency


Victor Bazarov

Ioannis said:
Victor said:
No, it's not. You conveniently omitted the rest of the paragraph
7, following the "type. 65)". Read it. The result of such
conversion is unspecified. That means no guarantees are given. Where you
get your "feeling the guarantee exists", I cannot imagine.
If you find it elsewhere, do post about it. Until you do, the
guarantee that you speak of does NOT exist, and that is expressed in
the second sentence of the paragraph 7 of [expr.reinterpret.cast].


Doesn't the standard mention that all kinds of automatic storage/heap
arrays are stored in a sequence?

What does this have to do with 'reinterpret_cast'?

V
 

Ioannis Vranos

Victor said:
What does this have to do with 'reinterpret_cast'?


It has to do with converting an int * to int (*)[5] and vice versa. If C-style
cast works for this, I think reinterpret_cast should also work, along
with static_cast<int (*)[5]> (static_cast<void *> (&i)); where i is an
int and a member of a sequence.
 

Victor Bazarov

Ioannis said:
Victor said:
What does this have to do with 'reinterpret_cast'?


It has to do with converting an int * to int (*)[5] and vice versa. If
C-style cast works for this,

What do you mean by "works"? Where in the Standard does it say
that a C-style cast would work?
I think reinterpret_cast should also
work, along with static_cast<int (*)[5]> (static_cast<void *> (&i));
where i is an int and a member of a sequence.

Again, should also work for what? Heating your house by lighting
up grandpa's farts works, but you have to feed him too many beans
and open flame is dangerous indoors. Is that the kind of "works"
you advocate?

The effects are unspecified. That means no guarantees exist in
the document that governs the behaviour of C++ programs. Go ahead
and use the mechanism if it works for you, but please don't say
that it "works", because the unconditional "works" means that it
does so in _any C++ program_. Do you not get that?

V
 

kwikius

Again, should also work for what?  Heating your house by lighting
up grandpa's farts works, but you have to feed him too many beans
and open flame is dangerous indoors.

Victor .

Hasn't your long-suffering grandpa suffered enough now from your
misguided experiments to save on heating bills?


;_)


regards
Andy Little
 

Ioannis Vranos

Victor said:
It has to do with converting an int * to int (*)[5] and vice versa. If
C-style cast works for this,

What do you mean by "works"? Where in the Standard does it say
that a C-style cast would work?
I think reinterpret_cast should also
work, along with static_cast<int (*)[5]> (static_cast<void *> (&i));
where i is an int and a member of a sequence.

[emptied trash]

The effects are unspecified. That means no guarantees exist in
the document that governs the behaviour of C++ programs. Go ahead
and use the mechanism if it works for you, but please don't say
that it "works", because the unconditional "works" means that it
does so in _any C++ program_. Do you not get that?

Let's take a step at a time. Is the following guaranteed to work (to
output 50 zeros) always?



#include <cstddef>
#include <iostream>


inline void some_func(int *p, const std::size_t SIZE)
{
using namespace std;

for(size_t i=0; i<SIZE; ++i)
cout<< p[i]<< " ";
}


int main()
{
int array[10][5]= {0};

some_func(array[0], sizeof(array)/sizeof(**array));

std::cout<< std::endl;
}
 

jkherciueh

Ioannis said:
Victor said:
It has to do with converting an int * to int (*)[5] and vice versa. If
C-style cast works for this,

What do you mean by "works"? Where in the Standard does it say
that a C-style cast would work?
I think reinterpret_cast should also
work, along with static_cast<int (*)[5]> (static_cast<void *> (&i));
where i is an int and a member of a sequence.

[emptied trash]

The effects are unspecified. That means no guarantees exist in
the document that governs the behaviour of C++ programs. Go ahead
and use the mechanism if it works for you, but please don't say
that it "works", because the unconditional "works" means that it
does so in _any C++ program_. Do you not get that?

Let's take a step at a time. Is the following guaranteed to work (to
output 50 zeros) always?



#include <iostream>


inline void some_func(int *p, const std::size_t SIZE)
{
using namespace std;

for(size_t i=0; i<SIZE; ++i)
cout<< p[i]<< " ";
}


int main()
{
int array[10][5]= {0};

some_func(array[0], sizeof(array)/sizeof(**array));

std::cout<< std::endl;
}


I think, the code has undefined behavior. See the following thread for a
similar case:

groups.google.com/group/comp.lang.c++/browse_frm/thread/9c501bae821bd406


Best

Kai-Uwe Bux
 

Ioannis Vranos

Let's take a step at a time. Is the following guaranteed to work (to
output 50 zeros) always?



#include <iostream>


inline void some_func(int *p, const std::size_t SIZE)
{
using namespace std;

for(size_t i=0; i<SIZE; ++i)
cout<< p[i]<< " ";
}


int main()
{
int array[10][5]= {0};

some_func(array[0], sizeof(array)/sizeof(**array));

std::cout<< std::endl;
}


I think, the code has undefined behavior. See the following thread for a
similar case:

groups.google.com/group/comp.lang.c++/browse_frm/thread/9c501bae821bd406



AFAIK it does not have undefined behavior. In all kinds of arrays, the members
are stored in sequence. Also all POD objects can be considered as sequences
of chars or unsigned chars (bytes), and we can read them as such.
 

Lionel B

Hi, in TC++PL 3 on pages 674-675 it is mentioned:


"Maybe your first idea for a two-dimensional vector was something like
this:

class Matrix {
valarray< valarray<double> >v;
public:
// ...
};

This would also work (22.9[10]). However, it is not easy to match
efficiency and compatibility required by high performance computations
without dropping to the lower and more conventional level represented by
valarray plus slices".


However since 1998 much time has passed, and I wonder if the current
compiler implementations allow valarray<valarray<T> > to be equally
efficient (or more) than using a valarray with slices/slice_arrays.
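
For reference, a minimal sketch of the "valarray plus slices" layout the book refers to: one contiguous valarray<double> in row-major order, with rows and columns exposed through std::slice. The class name SliceMatrix and its member names are made up for illustration and are not taken from TC++PL.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical row-major matrix stored in one contiguous valarray.
class SliceMatrix {
    std::valarray<double> data_;
    std::size_t rows_, cols_;
public:
    // Note the valarray constructor order: value first, then size.
    SliceMatrix(std::size_t rows, std::size_t cols)
        : data_(0.0, rows * cols), rows_(rows), cols_(cols) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element access through a computed index.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Row r: cols_ elements starting at r * cols_, stride 1.
    std::slice_array<double> row(std::size_t r) {
        assert(r < rows_);
        return data_[std::slice(r * cols_, cols_, 1)];
    }

    // Column c: rows_ elements starting at c, stride cols_.
    std::slice_array<double> column(std::size_t c) {
        assert(c < cols_);
        return data_[std::slice(c, rows_, cols_)];
    }

    // Independent copy of column c.
    std::valarray<double> column_copy(std::size_t c) const {
        assert(c < cols_);
        return data_[std::slice(c, rows_, cols_)];
    }
};
```

With this layout, m.row(1) = 5.0 fills a whole row and m.column(0) = 1.0 fills a whole column, and the storage stays in a single block. Whether this is faster than valarray<valarray<T> > on a given compiler is something only a measurement can tell.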

I have been following this thread with some interest, as I actually use
valarray quite extensively and have used it to design (2 dim) matrix
classes.

The logic behind my using valarray is roughly as follows: my matrix
classes frequently interface to BLAS and LAPACK libraries which, being
Fortran, expect contiguous (C array-style) storage - this rules out the
"array-of-arrays" approach. So I could still use, say std::valarray,
std::vector or simply allocate basic arrays via new or malloc. I have
used all of those without problems (and not, to be honest, with much by
way of discernible performance difference - but see below) but I have
generally plumped for std::valarray on the grounds that - as I understand
it - it is supposed to guarantee non-aliasing of its (internal) arrays
and that this potentially allows compilers to optimise more efficiently.

Whether any modern compilers actually *do* take advantage of this
potential is, as discussed in this thread, a moot point. My suspicion is
that compilers which can vectorise array operations (via some hardware
facility) are starting to do so; in particular, I *think* I can point to
noticeable performance improvements for std::valarray (over std::vector,
etc.) with the Intel compiler ICC on modern Intel architectures and, to a
lesser extent, with later versions of GCC on modern Intel and AMD
architectures (I don't work on Windows platforms, so I can't speak to
Microsoft compilers).

In any case, I don't see any *harm* in preferring std::valarray over
std::vector or simply array allocation (apart from its flaky syntax -
those reversed ctor arguments catch me every time :() so if there is some
chance that compiler writers may be starting to implement the potential
optimisations then it seems reasonable to go with it.

BTW, I take the point someone raised about the dominant performance
overhead of temporary copying in general matrix manipulation, so should
point out that this is not currently an issue for me as I tend to manage
temporary copies (tediously!) "by hand".

I'd be interested in comment,

Regards,
 

James Kanze

(e-mail address removed) wrote:
Let's take a step at a time. Is the following guaranteed to
work (to output 50 zeros) always?
#include <iostream>
inline void some_func(int *p, const std::size_t SIZE)
{
using namespace std;
for(size_t i=0; i<SIZE; ++i)
cout<< p[i]<< " ";
}
int main()
{
int array[10][5]= {0};
some_func(array[0], sizeof(array)/sizeof(**array));
std::cout<< std::endl;
}

I think, the code has undefined behavior. See the following
thread for a similar case:
groups.google.com/group/comp.lang.c++/browse_frm/thread/9c501bae821bd406

AFAIK it does not have undefined behavior.

According to both the C and the C++ standard, it is undefined
behavior.
In all kinds of arrays, the members are stored in sequence. Also all
POD objects can be considered as sequences of chars or
unsigned chars (bytes), and we can read them as such.

Reading an object as a sequence of bytes is a special case, and
I'm not sure how it applies here. In the above code, however,
array[0] is an array of 5 ints. The conversion to int* results
in a pointer to the first element in an array of 5 ints. The C
standard was very carefully worded to allow an implementation to
check this, and there have been implementations (and maybe still
are) which actually checked this. (Checking is not widespread,
because it requires fat pointers---the pointer must include not
only the address, but the legal bounds---, which have a very
definite negative impact on performance.)
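
For comparison, a sketch of a traversal that never steps a pointer past the row it was obtained from, so the question of walking an int* across row boundaries does not arise. The function name print_all and its template parameters are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <ostream>

// Writes every element of a two-dimensional array, one row at a time,
// and returns the number of elements written.
// Each inner loop indexes only within the bounds of its own row.
template <typename T, std::size_t ROWS, std::size_t COLS>
std::size_t print_all(T (&array)[ROWS][COLS], std::ostream& out)
{
    std::size_t count = 0;
    for (std::size_t r = 0; r < ROWS; ++r) {
        for (std::size_t c = 0; c < COLS; ++c) {
            out << array[r][c] << " ";
            ++count;
        }
    }
    return count;
}
```

Called as print_all(array, std::cout) on int array[10][5], this visits all 50 elements. Taking the array by reference keeps both dimensions in the type, so no size argument has to be computed with sizeof.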
 

Ioannis Vranos

Lionel said:
I have been following this thread with some interest, as I actually use
valarray quite extensively and have used it to design (2 dim) matrix
classes.

The logic behind my using valarray is roughly as follows: my matrix
classes frequently interface to BLAS and LAPACK libraries which, being
Fortran, expect contiguous (C array-style) storage - this rules out the
"array-of-arrays" approach. So I could still use, say std::valarray,
std::vector or simply allocate basic arrays via new or malloc. I have
used all of those without problems (and not, to be honest, with much by
way of discernible performance difference - but see below) but I have
generally plumped for std::valarray on the grounds that - as I understand
it - it is supposed to guarantee non-aliasing of its (internal) arrays
and that this potentially allows compilers to optimise more efficiently.

Whether any modern compilers actually *do* take advantage of this
potential is, as discussed in this thread, a moot point. My suspicion is
that compilers which can vectorise array operations (via some hardware
facility) are starting to do so; in particular, I *think* I can point to
noticeable performance improvements for std::valarray (over std::vector,
etc.) with the Intel compiler ICC on modern Intel architectures and, to a
lesser extent, with later versions of GCC on modern Intel and AMD
architectures (I don't work on Windows platforms, so I can't speak to
Microsoft compilers).

In any case, I don't see any *harm* in preferring std::valarray over
std::vector or simply array allocation (apart from its flaky syntax -
those reversed ctor arguments catch me every time :() so if there is some
chance that compiler writers may be starting to implement the potential
optimisations then it seems reasonable to go with it.

BTW, I take the point someone raised about the dominant performance
overhead of temporary copying in general matrix manipulation, so should
point out that this is not currently an issue for me as I tend to manage
temporary copies (tediously!) "by hand".

I'd be interested in comment,


I am currently reading Chapter 22 of TC++PL 3, which includes
valarrays, and from what I have read so far, valarrays are intended to
be heavily optimised (even by using parallel operations on multi-cpu
systems) so there are fewer assumptions we can make in comparison to
other containers, for example "valarrays are assumed to be alias free,
and the introduction of auxiliary types and ==>the elimination of
temporaries is allowed<== as long as the basic semantics are maintained".

So it is the only container that I am not sure we can use pointers and
pointer arithmetic to access and manipulate its data, in addition to
accessing them via the subscript operator[].

Also "22.4.7 Temporaries, Copying and Loops" of TC++PL 3 may be useful
to you, since it describes a method of deferring the various
sub-calculations, until all data for a given calculation are available
to be used in a final_calculation_function, while inlining the
"sub-calculations".
 

jkherciueh

Ioannis said:
Let's take a step at a time. Is the following guaranteed to work (to
output 50 zeros) always?



#include <iostream>


inline void some_func(int *p, const std::size_t SIZE)
{
using namespace std;

for(size_t i=0; i<SIZE; ++i)
cout<< p[i]<< " ";
}


int main()
{
int array[10][5]= {0};

some_func(array[0], sizeof(array)/sizeof(**array));

std::cout<< std::endl;
}


I think, the code has undefined behavior. See the following thread for a
similar case:

groups.google.com/group/comp.lang.c++/browse_frm/thread/9c501bae821bd406



AFAIK it does not have undefined behavior. In all kinds of arrays, the members
are stored in sequence.


True, but irrelevant. The problem is not with memory layout but with pointer
types and conversions.

C++ allows for bounds-checking pointers. The rules for pointer conversions
are crafted so that an implementation could decorate each pointer with the
bounds of the array from which it is obtained and propagate that
information through conversions. Using pointer arithmetic to access objects
outside the stored bounds could then trigger whatever the implementation
sees fit.
Also all POD objects can be considered as sequences
of chars or unsigned chars (bytes), and we can read them as such.

True, but (a) you are not converting pointer to byte and (b) you are not
converting from a pointer to an int[10][5].


Best

Kai-Uwe Bux
 

Erik Wikström

I am currently reading Chapter 22 of TC++PL 3, which includes
valarrays, and from what I have read so far, valarrays are intended to
be heavily optimised (even by using parallel operations on multi-cpu
systems) so there are fewer assumptions we can make in comparison to
other containers, for example "valarrays are assumed to be alias free,
and the introduction of auxiliary types and ==>the elimination of
temporaries is allowed<== as long as the basic semantics are maintained".

Might be of interest to the discussion:
http://www.oonumerics.org/oon/oonstd/archive/0018.html
 

Lionel B

[...]

Might be of interest to the discussion:
http://www.oonumerics.org/oon/oonstd/archive/0018.html

Thanks... perhaps I hadn't realised how "dead" std::valarray really is.

There's also an interesting thread regarding aliasing and the
"restrict" (non-)keyword. Todd Veldhuizen writes:

"Having the NCEG "restrict" keyword would be more useful
than a built-in alias-free array class like valarray<>.
The restrict keyword has apparently been adopted into the C9x
standard, so hopefully it will become part of C++ in the future.
Already several C++ compilers support it (Cray,KAI C++,SGI)."

And this is circa 1988...

GCC at least has a __restrict__ extension.
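
A minimal sketch of how that extension is typically used. The macro NOALIAS and the function axpy are names invented for this example; __restrict__ is the GCC/Clang spelling and __restrict the Microsoft one, and neither is standard C++. Whether a given compiler actually optimises differently because of the qualifier is not something this sketch demonstrates.

```cpp
#include <cstddef>

// Portability shim (hypothetical macro name): pick the compiler-specific
// spelling, or drop the qualifier where no extension is known.
#if defined(__GNUC__) || defined(__clang__)
#  define NOALIAS __restrict__
#elif defined(_MSC_VER)
#  define NOALIAS __restrict
#else
#  define NOALIAS
#endif

// Computes y[i] += a * x[i] for i in [0, n).
// The caller promises that x and y do not overlap; breaking that promise
// is the caller's error, the compiler does not check it.
inline void axpy(std::size_t n, double a,
                 const double* NOALIAS x, double* NOALIAS y)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

The promise is made per function signature, which is the point of Veldhuizen's remark: it applies to any pointer, not only to storage owned by a special array class.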
 

Lionel B

[...]
I am currently reading Chapter 22 of TC++PL 3, which includes
valarrays, and from what I have read so far, valarrays are intended to
be heavily optimised (even by using parallel operations on multi-cpu
systems) so there are fewer assumptions we can make in comparison to
other containers, for example "valarrays are assumed to be alias free,
and the introduction of auxiliary types and ==>the elimination of
temporaries is allowed<== as long as the basic semantics are
maintained".

So it is the only container that I am not sure we can use pointers and
pointer arithmetic to access and manipulate its data, in addition to
accessing them via the subscript operator[].

I'd always understood that &v[0] is guaranteed to point to a contiguous
array (I've always blithely passed it as a Fortran array parameter
without any problems)... is that not the case?
Also "22.4.7 Temporaries, Copying and Loops" of TC++PL 3 may be useful
to you, since it describes a method of deferring the various
sub-calculations, until all data for a given calculation are available
to be used in a final_calculation_function, while inlining the
"sub-calculations".

Thanks, I'll check it out.
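
For anyone without the book to hand, a rough sketch of the deferral idea from 22.4.7 as I understand it: operators return small objects that only record their operands, and the arithmetic happens in a single loop when the result is assigned. The types Vec, Scaled and ScaledPlus are invented for this illustration and are much simpler than the book's matrix example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Vec;

// Records "k * v" without computing anything yet.
struct Scaled {
    double k;
    const Vec& v;
};

// Records "k * v + w" without computing anything yet.
struct ScaledPlus {
    double k;
    const Vec& v;
    const Vec& w;
};

class Vec {
    std::vector<double> d_;
public:
    explicit Vec(std::size_t n, double init = 0.0) : d_(n, init) {}
    std::size_t size() const { return d_.size(); }
    double& operator[](std::size_t i) { return d_[i]; }
    double operator[](std::size_t i) const { return d_[i]; }

    // The whole expression is evaluated here, in one loop,
    // without building a temporary Vec for k * v.
    Vec& operator=(const ScaledPlus& e) {
        assert(e.v.size() == size() && e.w.size() == size());
        for (std::size_t i = 0; i < size(); ++i)
            d_[i] = e.k * e.v[i] + e.w[i];
        return *this;
    }
};

// The operators only bundle their operands.
inline Scaled operator*(double k, const Vec& v)
{
    Scaled s = {k, v};
    return s;
}

inline ScaledPlus operator+(const Scaled& s, const Vec& w)
{
    ScaledPlus e = {s.k, s.v, w};
    return e;
}
```

Writing r = 2.0 * x + y then compiles to a call of the assignment operator above. The recorded references must outlive the expression, which they do when the whole thing is a single statement.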
 

Jerry Coffin

[ ... ]
So it is the only container that I am not sure we can use pointers and
pointer arithmetic to access and manipulate its data, in addition to
accessing them via the subscript operator[].

I'd always understood that &v[0] is guaranteed to point to a contiguous
array (I've always blithely passed it as a Fortran array parameter
without any problems)... is that not the case?

Theoretically, no; practically, yes. The C++ 98 standard didn't require
std::vector to use contiguous memory. All of the publicly available
implementations have done so however, and in C++ 0x, it will be
required.

FWIW, the same is true of std::string.
 

Jerry Coffin

Jerry said:
[ ... ]
So it is the only container that I am not sure we can use pointers
and pointer arithmetic to access and manipulate its data, in
addition to accessing them via the subscript operator[].

I'd always understood that &v[0] is guaranteed to point to a
contiguous array (I've always blithely passed it as a Fortran array
parameter without any problems)... is that not the case?

Theoretically, no; practically, yes. The C++ 98 standard didn't
require std::vector to use contiguous memory. All of the publicly
available implementations have done so however, and in C++ 0x, it
will be required.

It is already required in current [2003] Standard. See [lib.vector],
paragraph 1.

Ah, I hadn't noticed that. Thanks for the heads-up.
Not sure what "the same" you mean here.

That C++ 98 didn't require it to use contiguous storage, but C++ 0x
will.
 

Victor Bazarov

Jerry said:
[ ... ]
So it is the only container that I am not sure we can use pointers
and pointer arithmetic to access and manipulate its data, in
addition to accessing them via the subscript operator[].

I'd always understood that &v[0] is guaranteed to point to a
contiguous array (I've always blithely passed it as a Fortran array
parameter without any problems)... is that not the case?

Theoretically, no; practically, yes. The C++ 98 standard didn't
require std::vector to use contiguous memory. All of the publicly
available implementations have done so however, and in C++ 0x, it
will be required.

It is already required in current [2003] Standard. See [lib.vector],
paragraph 1.
FWIW, the same is true of std::string.

Not sure what "the same" you mean here.

V
 

Ioannis Vranos

Jerry said:
Theoretically, no; practically, yes. The C++ 98 standard didn't require
std::vector to use contiguous memory. All of the publicly available
implementations have done so however, and in C++ 0x, it will be
required.

FWIW, the same is true of std::string.


AFAIK, for vector, string, etc. this was changed, and they are now
guaranteed to use contiguous memory. So in C++03 it is mentioned:

"A vector is a kind of sequence that supports random access iterators.
In addition, it supports (amortized) constant time insert and erase
operations at the end; insert and erase in the middle take linear time.
Storage management is handled automatically, though hints can be given
to improve efficiency.


===> The elements of a vector are stored contiguously, meaning that if v
is a vector<T, Allocator> where T is some type other than bool, then it
obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()".
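
A small sketch of what that identity permits in practice: handing &v[0] to a routine that expects a plain contiguous buffer. The function names are hypothetical, and sum_buffer merely stands in for a C or Fortran routine.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a C or Fortran routine that expects a contiguous buffer.
inline double sum_buffer(const double* p, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}

// Checks the identity &v[n] == &v[0] + n for every valid n.
inline bool is_contiguous(const std::vector<double>& v)
{
    for (std::size_t n = 0; n < v.size(); ++n)
        if (&v[n] != &v[0] + n)
            return false;
    return true;
}

// Passes the vector's storage to the buffer-based routine.
// v[0] does not exist for an empty vector, hence the guard.
inline double sum_vector(const std::vector<double>& v)
{
    return v.empty() ? 0.0 : sum_buffer(&v[0], v.size());
}
```

The quoted paragraph excludes vector<bool>, so the sketch is written for double only.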
 

Ioannis Vranos

Ioannis said:
AFAIK, for vector, string, etc. this was changed, and they are now
guaranteed to use contiguous memory. So in C++03 it is mentioned:

"A vector is a kind of sequence that supports random access iterators.
In addition, it supports (amortized) constant time insert and erase
operations at the end; insert and erase in the middle take linear time.
Storage management is handled automatically, though hints can be given
to improve efficiency.


===> The elements of a vector are stored contiguously, meaning that if v
is a vector<T, Allocator> where T is some type other than bool, then it
obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()".

For std::string etc. the situation is the same, but valarray is
more complicated:

"The class template valarray<T> is a one-dimensional smart array, with
elements numbered sequentially from zero. It is a representation of the
mathematical concept of an ordered set of values. The illusion of higher
dimensionality may be produced by the familiar idiom of computed
indices, together with the powerful subsetting capabilities provided by
the generalized subscript operators.255)".


Elsewhere it is mentioned:

"The expression &a[i+j] == &a[i] + j evaluates as true for all size_t i
and size_t j such that i+j is less than the length of ===> the
non-constant array a. <===
Likewise, the expression &a[i] != &b[j] evaluates as true for any two
==> non-constant arrays <== a and b and for any size_t i and size_t j
such that i is less than the length of a and j is less than the length
of b. This property indicates an absence of aliasing and may be used to
advantage by optimizing compilers.260)".


So as far as I can understand, in the case of the valarray, pointer
arithmetic can be used only for non-constant valarrays.
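
As a sketch, the two properties the Standard states for non-constant valarrays, &a[i+j] == &a[i] + j and &a[i] != &b[j], can be spelled out as checks. The function names are made up for this post, and passing the checks on one implementation proves nothing about constant valarrays.

```cpp
#include <cstddef>
#include <valarray>

// Checks &a[i + j] == &a[i] + j for all i + j < a.size()
// on a non-constant valarray.
inline bool obeys_identity(std::valarray<double>& a)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; i + j < a.size(); ++j)
            if (&a[i + j] != &a[i] + j)
                return false;
    return true;
}

// Checks that two distinct non-constant valarrays share no element,
// which is the no-aliasing property.
inline bool no_shared_elements(std::valarray<double>& a,
                               std::valarray<double>& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            if (&a[i] == &b[j])
                return false;
    return true;
}
```

Both functions take their arguments by non-const reference on purpose, so that the non-const operator[] is the one being exercised.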
 

Lionel B

Ioannis Vranos wrote:

[...]
"The class template valarray<T> is a one-dimensional smart array, with
elements numbered sequentially from zero. It is a representation of the
mathematical concept of an ordered set of values. The illusion of higher
dimensionality may be produced by the familiar idiom of computed
indices, together with the powerful subsetting capabilities provided by
the generalized subscript operators.255)".

Elsewhere it is mentioned:

"The expression &a[i+j] == &a[i] + j evaluates as true for all size_t i
and size_t j such that i+j is less than the length of ===> the
non-constant array a. <===
Likewise, the expression &a[i] != &b[j] evaluates as true for any two
==> non-constant arrays <== a and b and for any size_t i and size_t j
such that i is less than the length of a and j is less than the length
of b. This property indicates an absence of aliasing and may be used to
advantage by optimizing compilers.260)".

So as far as I can understand, in the case of the valarray, pointer
arithmetic can be used only for non-constant valarrays.


I find that a bit bizarre. Why should constant valarrays be hamstrung
with regard to potential optimisability? Or is the intention to
facilitate some different/better optimisation (I can't imagine what)
based on const-ness and which might be impeded by imposing the above
restrictions?
 
