is this portable, conforming to standard, elegant?

r.z.

class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other things
 
Andre Kostur

r.z. said:
class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other
things


Is it possible that the structure may be byte-packed in a different manner
than an array?
 
red floyd

r.z. said:
class vector3
{
public:
union
{
float[3] data;
syntax error.
float data[3];
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other things

Depends on how you use it. Behavior is only defined if you access the
same member of the union that you last modified. In other words, if you
write to x, and try to read it with data[0], you get UB.
 
r.z.

Is it possible that the structure may be byte-packed in a different manner
than an array?

Are you asking whether data[] will always contain the values x, y, z in this
order (as in the structure)? Then the answer is yes.
 
r.z.

Depends on how you use it. Behavior is only defined if you access the
same member of the union that you last modified. In other words, if you
write to x, and try to read it with data[0], you get UB.

bad news :/
 
Kai-Uwe Bux

r.z. said:
class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other
things

If you mix access modes, you get undefined behavior. However, you can fake
the interface as follows:

struct Vector3 {

float data [3];
float & x;
float & y;
float & z;

Vector3 ( void )
: x ( data[0] )
, y ( data[1] )
, z ( data[2] )
{
x = y = z = 0;
}

};

#include <iostream>

int main ( void ) {
Vector3 a;
a.data[1] = 2;
std::cout << a.y << '\n';
}


An obvious drawback is that assignment and copy-construction become tricky.
Also, it is not clear whether the compiler will realize the reference
members as pointers and allocate additional space. Therefore, the following
seems to be better:

struct Vector3 {

float data [3];

float & x ( void ) {
return ( data[0] );
}

float const & x ( void ) const {
return ( data[0] );
}

float & y ( void ) {
return ( data[1] );
}

float const & y ( void ) const {
return ( data[1] );
}

float & z ( void ) {
return ( data[2] );
}

float const & z ( void ) const {
return ( data[2] );
}

};

#include <iostream>

int main ( void ) {
Vector3 a;
a.data[1] = 2;
std::cout << a.y() << '\n';
}


Now, there is still the issue that the user could easily do data[4] and
trigger undefined behavior. Thus, maybe:

#include <cassert>

struct Vector3 {

static unsigned int const size = 3;

private:

float data [size];

public:

Vector3 ( void )
: data ()
{}

float & x ( void ) {
return ( data[0] );
}

float const & x ( void ) const {
return ( data[0] );
}

float & y ( void ) {
return ( data[1] );
}

float const & y ( void ) const {
return ( data[1] );
}

float & z ( void ) {
return ( data[2] );
}

float const & z ( void ) const {
return ( data[2] );
}

float & operator[] ( unsigned i ) {
assert( i < size );
return ( data[i] );
}

float const & operator[] ( unsigned i ) const {
assert( i < size );
return ( data[i] );
}

};

#include <iostream>

int main ( void ) {
Vector3 a;
a[1] = 2;
std::cout << a.y() << '\n';
}



Finally, you may want to consider using double instead of float.


Best

Kai-Uwe Bux
 
r.z.

Depends on how you use it. Behavior is only defined if you access the
same member of the union that you last modified. In other words, if you
write to x, and try to read it with data[0], you get UB.

but data[0] and x are still the same types? aren't they?
 
Kai-Uwe Bux

r.z. said:
Depends on how you use it. Behavior is only defined if you access the
same member of the union that you last modified. In other words, if you
write to x, and try to read it with data[0], you get UB.

but data[0] and x are still the same types? aren't they?

Sure, both are float (given the context from the OP that has been lost due
to editing). But why would that matter? You have undefined behavior anyway.


Best

Kai-Uwe Bux
 
Michael DOUBEZ

red floyd wrote:
r.z. said:
class vector3
{
public:
union
{
float[3] data;
syntax error.
float data[3];
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other things

Depends on how you use it. Behavior is only defined if you access the
same member of the union that you last modified. In other words, if you
write to x, and try to read it with data[0], you get UB.


From the standard draft (2135) §9.5 (but it is the same in the standard):
[Note: one special guarantee is made in order to simplify the use of
unions: If a POD-union contains several POD-structs that share a common
initial sequence (9.2), and if an object of this POD-union type
contains one of the POD-structs, it is permitted to inspect the common
initial sequence of any of POD-struct members; see 9.2. —end note]

Then if you have:
union
{
struct
{
int a;
int b;
float c;
} one;

struct
{
int aa;
int bb;
char foo[42];
} two;
} data;

data.one.a=1;
Then accessing data.two.aa is permitted and equal to data.one.a.

In the case of the OP, the members do not share a common initial sequence
(one is an array, the other a struct), so it is UB, although it looks
tantalizingly close.

Michael
 
peter koch

Kai-Uwe Bux said:
[snipped]


Your solution has the nice property that it is portable and does not
break the standard. The problem is that the references will take up
space on many compilers - or at least did when I tried to use the same
trick.
The "union hack" is formally undefined behaviour, but in practice it
is very portable.

/Peter
 
peter koch

class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};

};

access with data array is convenient for file io and x,y,z for other things

It is neither elegant nor standards-conforming. Still, it is quite
portable and a hack that could be used in one specific situation,
namely the one where you have legacy code using xyz-notation but would
like to interface to code using arrays. Test it on the platform - e.g.
each time your program starts up - and live with it.
In all other situations, you should prefer a not-so-hackish solution.

/Peter
 
Emmanuel Deloget

[snipped a LOT of code]
struct Vector3 {

float data [3];
float & x;
float & y;
float & z;

Vector3 ( void )
: x ( data[0] )
, y ( data[1] )
, z ( data[2] )
{
x = y = z = 0;
}

};


Kai-Uwe Bux

A common trick (which I first saw in a gamedev.net thread) is to use a
static array as a proxy:

class vector3d
{
static float vector3d::* proxy[3];
public:
float x, y, z;
float& operator[](unsigned int i) { return (*this).*proxy[i]; }
};

float vector3d::* vector3d::proxy[3] = { &vector3d::x, &vector3d::y,
&vector3d::z };

I'm not sure about the initialization time of proxy[], but besides that
this code looks OK (I haven't checked it). sizeof(vector3d) is the
expected size, and we get fast access to the data when we use
operator[]. Best of all worlds, IMHO.

Regards,

-- Emmanuel Deloget
 
Sylvester Hesp

r.z. said:
class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};
};

access with data array is convenient for file io and x,y,z for other
things

Aside from the issues the others have already raised, your struct
declaration doesn't declare or define anything - it has no typename and it
is not a member definition. There are compilers that support this kind of
syntax (with the same behaviour as with a union - they insert the members in
the scope in which the union is being defined), but it is not conforming to
standard C++.

- Sylvester
 
Sylvester Hesp

*snap*

The "union hack" is formally undefined behaviour, but in practice it
is very portable.

/Peter

I'm still waiting for the compiler that actually *does* generate code that
formats your harddrive whenever it encounters UB ;)

- Sylvester
 
Michael DOUBEZ

Sylvester Hesp wrote:
I'm still waiting for the compiler that actually *does* generate code that
formats your harddrive whenever it encounters UB ;)

Easy.

If the undefined behaviour causes a bug very difficult to locate, the
tester may well put his hard drive in a microwave just to be rid of the
code and start anew. He may put the backups in it as well.


Michael
 
David O

*snap*



I'm still waiting for the compiler that actually *does* generate code that
formats your harddrive whenever it encounters UB ;)

- Sylvester

It is interesting that union hacking is undefined behavior, whereas
reinterpret_cast has implementation-defined behavior.

In the (non-normative) example in 9.5 para 2 of my copy of the draft
standard (which has probably moved), it shows an anonymous union { int
a; char *p ; } and notes that a and p are ordinary variables which
have the same address. Local variables may be placed in registers -
sometimes even if their address is taken - in which case the compiler
could reasonably place a and p in different registers (especially if
sizeof(int) != sizeof(p)), as that would not affect any
fully-compliant program. For example,

void test( bool set_int, bool get_int )
{
union { int a; char *p ; };
if( set_int ) a = 42;
else p = "hello";

if( get_int ) std::cout << a << std::endl;
else std::cout << p << std::endl;
}

test(true, true) ["42"] and test(false, false) ["hello"] are both
well-defined. test(true, false) is undefined and likely to crash.

test( false, true ) may be expected to print the address of the local
string, but - while it probably won't actually format your hard drive
- there are many "reasonable" types of undefined behavior that a
conscientious (i.e. non-malicious) compiler writer may cause to
happen. These include outputting the string address, outputting "42",
or outputting a random integer; architectures which reserve 0x800..000
as an integer NaN may even helpfully tell that a was uninitialized!

The code may have been effectively rewritten as:

void test( bool set_int, bool get_int )
{
register int a; // = NaN
register char *p ;
if( set_int ) a = 42;
else p = "hello";

if( get_int ) std::cout << a << std::endl;
else std::cout << p << std::endl;
}

which would give a garbage (or uninitialized) integer value.

This may have been optimized to:

void test( bool /*set_int*/, bool get_int )
{
register int a = 42; // only value actually used for a.
register char *p = "hello"; // only value actually used for p.

if( get_int ) std::cout << a << std::endl;
else std::cout << p << std::endl;
}

or equivalently:

void test( bool /*set_int*/, bool get_int )
{
if( get_int ) std::cout << 42 << std::endl;
else std::cout << "hello" << std::endl;
}

Using reinterpret_cast on the data should be more reliable - just read
the compiler documentation ;-).

In practice, I must admit to using the "union hack" in low level code
- accessing hardware registers or implementing communications
protocols - without problem. However, this is library code that is
known to be non-portable, and has a suite of unit tests which get run
on each new compiler version.

I can't think of any reason, outside maintenance situations, where
the union hack would apply to application code.

Best Regards,

David O.
 
Alexei Polkhanov

class vector3
{
public:
union
{
float[3] data;
struct
{
float x, y, z;
};
};

};

access with data array is convenient for file io and x,y,z for other things

Use overloaded operator [] instead, and given that union in C++ is
also a class - we can have code like this:
#include <cstdio>

class BadIndexException
{
public:
BadIndexException(const char* text = 0)
{
// ...
}
};

union U
{
public:
float& operator[](int index)
{
if (index == 0)
return x;
else if (index == 1)
return y;
else if (index == 2)
return z;
else
throw BadIndexException("Invalid index");
}
private:
float data[3];
public:
// Anonymous struct: a common compiler extension, not standard C++
struct
{
float x;
float y;
float z;
};
};


int main(int argc, char* argv[])
{
U u;
u[0] = 2.0;
u[2] = 3.0;
u.x = 4.0;
// sizeof yields a size_t, so cast before printing with %u
std::printf("u[0]= %f, sizeof(u)= %u, sizeof(float[3])= %u", u[0],
static_cast<unsigned>(sizeof(u)), static_cast<unsigned>(sizeof(float[3])));
return 0;
}

This code will produce "u[0]= 4.000000, sizeof(u)= 12,
sizeof(float[3])= 12", from which you can see that desired behavior
and savings of memory are achieved.

---
Alexei Polkhanov
Sr. Consultant/Software Systems Analyst
Tel: (604) 719-2515
E-mail: (e-mail address removed)
http://www.monteaureus.com/
 
Craig Scott

Your code doesn't need the private data[] array at all. Change the
union to an ordinary class, remove the private data[] array and you
get the same result. You could then even change the x,y,z members into
regular class data members instead of nesting them inside an anonymous
struct. This would make the class very simple. The result would look
something like this:

class U
{
// Normally, x,y,z would be private, but OP needs them
// to be public
public:
float x;
float y;
float z;

// operator[] should be public
public:
// Should also provide a const version of this
float& operator[](int index)
{
if (index == 0)
return x;
else if (index == 1)
return y;
else if (index == 2)
return z;
else
// Throw exception or something else appropriate
throw BadIndexException("Invalid index");
}
};

Personally, to me this then becomes the "best compromise" solution to
the original poster's problem. It allows clients to continue using
x,y,z member variables but also access using array index notation. It
even allows array index checking to be done transparently. The one
thing it does not do (to my understanding of the standard) is
guarantee that x,y,z are laid out in memory in an equivalent fashion
to float[3], so if you need that then this solution is probably not
sufficient.
 
kalki70

peter koch said:
Your solution has the nice property that it is portable and does not
break the standard. The problem is that the references will take up
space on many compilers - or at least did when I tried to use the same
trick.
The "union hack" is formally undefined behaviour, but in practice it
is very portable.

/Peter



I'm not sure that it is very portable. I have seen that, depending on
compiler options, members inside a struct can change their relative
position.
I don't think there is anything in the standard that says for the
following x,y and z must have consecutive memory addresses.
union
{
float[3] data;
struct
{
float x, y, z;
};
};

&x == data ??
&y == data + 1??
&z == data + 2 ??

I think the compiler has the freedom to choose any memory addresses,
so we probably can have &x == data + N, 0 < N <= 2

Luis
 
Kai-Uwe Bux

kalki70 said:
I'm not sure that it is very portable. I have seen that, depending on
compiler options, members inside a struct can change their relative
position.


That would require an intervening access specifier [9.2/12]:

Nonstatic data members of a (non-union) class declared without an
intervening access-specifier are allocated so that later members have
higher addresses within a class object. The order of allocation of
nonstatic data members separated by an access-specifier is unspecified
(11.1). Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other; so might
requirements for space for managing virtual functions (10.3) and virtual
base classes (10.1).
I don't think there is anything in the standard that says for the
following x,y and z must have consecutive memory addresses.
union
{
float[3] data;
struct
{
float x, y, z;
};
};

&x == data ??
&y == data + 1??
&z == data + 2 ??

Well, since the standard mentions alignment requirements as the only reason
for gaps in POD types, it gives a strong hint that in the above case, there
are no gaps between the various floats.
I think the compiler has the freedom to choose any memory addresses,
so we probably can have &x == data + N, 0 < N <= 2

Since we are actually no longer talking about the standard, but about
portability from a practical point of view, I would like to ask whether you
know a compiler that chooses N != 0?


Best

Kai-Uwe Bux
 
