How to structure a class that may hold two kinds of values

fabricio.olivetti · Feb 7, 2008

I am designing a class that reads a data file and provides access to
its contents for another class. The dataset may contain either double
or int values, and some datasets may be very large, so I'd like the
class to decide whether to allocate a vector of "char" (which may be
enough for the integral values) or a vector of double.
But I can't see how to do that. Using templates, I still have to
determine, prior to the class declaration, which type the class will
hold.

Of course I could declare something like this:

class foo {
private:
    vector< char > cData;
    vector< double > dData;
    bool type;
public:
    double getData(unsigned i);
};

and always return a double (casting the char when required), using a
flag to determine which of the two vectors the class actually uses.
What would be the most elegant and efficient way of doing that?

Regards,
Fabricio

Victor Bazarov · Feb 7, 2008

Without knowing how 'getData' is supposed to be used there is no way
to tell if it fits the requirements (assumed or specified). I can
easily think of a scenario where I'd like to have two members

char getChar(unsigned i) const;
double getDouble(unsigned i) const;

and have them throw an exception if the type doesn't match the stored
type of the class.
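
For illustration, a minimal sketch of such a pair of typed accessors. The class name typed_store, the two constructors and the choice of std::logic_error are assumptions made for this example, not something specified in the thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical container: holds either chars or doubles, never both.
class typed_store {
public:
    explicit typed_store(const std::vector<char>& v)
        : is_double_(false), cData_(v) {}
    explicit typed_store(const std::vector<double>& v)
        : is_double_(true), dData_(v) {}

    // Throws if the container was built from doubles.
    char getChar(unsigned i) const {
        if (is_double_) throw std::logic_error("container holds double");
        return cData_.at(i);
    }

    // Throws if the container was built from chars.
    double getDouble(unsigned i) const {
        if (!is_double_) throw std::logic_error("container holds char");
        return dData_.at(i);
    }

private:
    bool is_double_;              // which of the two vectors is in use
    std::vector<char> cData_;
    std::vector<double> dData_;
};
```

A caller that asks for the wrong type gets an exception instead of a silently converted value.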

V

fabricio.olivetti · Feb 7, 2008

Let's say getData just returns the element on the 'i'th position of
the vector. And also, let's assume that I want to avoid a check of
what member function to call (don't wanna do a: switch(type) case 0:
getChar(); case 1: getDouble();...)

AnonMail2005 · Feb 7, 2008

Since you haven't specified how large the dataset can be I would just
store doubles. Worry about space optimization later if it indeed
matters.

HTH

Victor Bazarov · Feb 7, 2008

Let's say getData just returns the element on the 'i'th position of
the vector. And also, let's assume that I want to avoid a check of
what member function to call (don't wanna do a: switch(type) case 0:
getChar(); case 1: getDouble();...)

That's not really a design specification. It looks very much like
an implementation detail. Now, the check you are talking about has
to be done inside 'getData' anyway. So, it doesn't really matter
who makes it since it has to be made.
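
To make that point concrete, here is a rough sketch of the two-vector class from the original post with the check placed inside getData. The constructor argument and the push_back member are assumptions added so the example is complete; only the class name, the two vectors and getData come from the post.

```cpp
#include <cassert>
#include <vector>

// Sketch of the two-vector class; only one of the vectors is ever filled.
class foo {
public:
    explicit foo(bool use_double) : use_double_(use_double) {}

    // Hypothetical filling routine: stores the value in the active vector.
    void push_back(double v) {
        if (use_double_) dData_.push_back(v);
        else cData_.push_back(static_cast<char>(v));
    }

    // The type check happens here on every call, hidden from the caller.
    double getData(unsigned i) const {
        return use_double_ ? dData_.at(i)
                           : static_cast<double>(cData_.at(i));
    }

private:
    bool use_double_;
    std::vector<char> cData_;
    std::vector<double> dData_;
};
```

The caller no longer writes a switch, but the branch is still executed once per access.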

Now, perhaps you will think a bit about what the importance of the dual
storage is and what the use of 'getData' is, and then (you don't
really have to tell us) you will have a clearer picture why you
think you need the two vectors and one 'getData' function. Who
stores the vectors? Who sets the 'type'? How are they changed?
And, most importantly, why?

And, if you can help it, please don't quote signatures. Thanks!

V

fabricio.olivetti · Feb 7, 2008

Hmm, let me explain my problem further.
I'm designing a toolbox that holds a collection of clustering
algorithms. So let's say I'm implementing a class for algorithm A,
another one for algorithm B and so on.
I want the data access to be transparent for them (just use something
like data(i,j)) whenever they need direct access to it.

The data container class will have a function to read the data from a
file, store it in the most efficient way and perform some common
calculations required by the algorithms.

So basically I built a GUI that gives the user the option to load data
from a file (at which point the data class is constructed) and choose
an algorithm, and this algorithm will do its thing with the data.

But, unfortunately, I can come across datasets that store either
integer values or double values. Additionally, the data may be dense or
sparse (so I may end up with a sparse matrix representation) and, even
when sparse, the data may be so large that I MUST save memory by
storing it as "char" instead of double!

I guess declaring both types and using an internal variable that
records which type is in use must be the only way to deal with it!

thanks

Victor Bazarov · Feb 7, 2008

I am guessing you don't mind this tennis match, do you?

I take it that your data class in some cases has to store so much data,
and the data themselves are so imprecise, that storing a 'char' value
is OK and you'd like to do it because you want to save space. Well, it
smells like premature optimization, but if you want it that way, fine.

Now, it does seem that your data storage class is essentially dumb and
does not serve any other purpose except for storage and retrieval. Let
me level with you here. I don't think such a class has enough reason to
exist. If your algorithm A (or B, or C) calls for some data to be
lugged around, let the class that handles the data also *store it*.

It makes no sense that the data are stored separately from where they
are processed. If your reading class knows how to read 'char' (or some
other type), fine. But let the processing class allocate the needed
buffer and pass it to the reader for populating with values from the
file.

If you let the processing object hold its own data, then you simply
declare different data types in your different processing classes.
Algorithm A would have 'double', so would Algorithm B, but Algorithm
C, for instance, would have 'char'. No need to extract this into
a separate class and torture yourself trying to squeeze two types
where even one doesn't feel comfortable.
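
A minimal sketch of that arrangement, assuming a whitespace-separated text file. The names read_values and algorithm_c, and the sum member standing in for real processing, are hypothetical and only illustrate who owns the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical reader: fills a caller-owned buffer with the numbers it parses.
template <typename T>
void read_values(std::istream& in, std::vector<T>& out) {
    double v;
    while (in >> v) out.push_back(static_cast<T>(v));
}

// Hypothetical processing class that owns its data, stored as char.
class algorithm_c {
public:
    void load(std::istream& in) { read_values(in, data_); }
    std::size_t size() const { return data_.size(); }

    // Stand-in for the real processing step.
    double sum() const {
        double s = 0;
        for (char c : data_) s += c;
        return s;
    }

private:
    std::vector<char> data_;
};
```

An algorithm that needs double would declare a std::vector<double> member and call the same reader.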

V

fabricio.olivetti · Feb 7, 2008

I don't mind at all

You misunderstood some points of what my program will do.

First, imagine that the dataset is a large matrix of arbitrary values.
Some datasets hold small integer values in the range [0; 5] (it's
not that storing a char is ok, it's just what I need), others may hold
double values in the range [-10.0; 10.0]. There are some special
datasets that are extremely huge, and using double prevents these from
loading into memory (HUUUUUUUUUUGE). When using "char" I can load
those datasets, tho.

This data class must be shared among all the algorithms; there's no
sense in loading the dataset each time I run a different algorithm
(they all perform the same task, i.e., cluster the data). I must be
able to compare the performance among them, so it's most likely that
I'll run two or more algorithms on the same dataset.

The rationale of operation is something like this: load data, run each
algorithm one at a time, compare results.
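
To put rough numbers on the memory argument, a small helper can compute the size of a dense matrix for each element type. The helper name dense_bytes and the 100,000 x 10,000 example are made up for illustration; the thread gives no actual dataset sizes.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a dense rows x cols matrix with elements of type T.
template <typename T>
constexpr unsigned long long dense_bytes(unsigned long long rows,
                                         unsigned long long cols) {
    return rows * cols * sizeof(T);
}
```

For a hypothetical 100,000 x 10,000 matrix, dense_bytes<char> gives 1,000,000,000 bytes, while dense_bytes<double> gives sizeof(double) times as much, which is 8,000,000,000 bytes on platforms where a double occupies 8 bytes.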

Kira Yamato · Feb 8, 2008

// Try Boost::Variant. <http://www.boost.org/doc/html/variant.html>

#include <vector>
#include <iostream>
#include "boost/variant.hpp"

int main()
{
    using namespace std;
    using namespace boost;

    // This allows a vector of mixed values, char or double.
    vector<variant<char, double> > a;

    a.push_back('A');
    a.push_back(3.5);

    cout << "a[0]=" << get<char>(a[0]) << endl;
    cout << "a[1]=" << get<double>(a[1]) << endl;

    return 0;
}
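
One thing to keep in mind with a vector of variants: every element must have room for a double plus a type tag, so it does not reduce memory compared with a plain vector of double. If the whole dataset is of one type, a possible alternative is a single variant holding one of two vectors. The sketch below uses std::variant from C++17 instead of Boost so that it is self-contained; the names storage and get_data are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// One type tag for the whole dataset instead of one per element.
using storage = std::variant<std::vector<char>, std::vector<double>>;

// Returns element i as double, whichever vector is active.
inline double get_data(const storage& s, std::size_t i) {
    return std::visit(
        [i](const auto& v) { return static_cast<double>(v.at(i)); }, s);
}
```

The dispatch on the active type still happens on every call, here inside std::visit.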

Michael Downton · Feb 8, 2008

First, don't strain a class with having to play two roles (even if
similar). Make an interface and two separate classes, one for
optimized storage, one for plain. If you don't want virtual functions,
make the interface implicit and pass the class as a template argument
to the algorithms. (It's a toss-up between general and specific.)
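
A rough sketch of the implicit-interface variant, assuming the algorithms only need size() and get(i). The class names plain_storage and compact_storage and the mean function (a stand-in for a real clustering algorithm) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Two storage classes with the same implicit interface: size() and get(i).
class plain_storage {
public:
    explicit plain_storage(std::vector<double> v) : data_(std::move(v)) {}
    std::size_t size() const { return data_.size(); }
    double get(std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

class compact_storage {
public:
    explicit compact_storage(std::vector<char> v) : data_(std::move(v)) {}
    std::size_t size() const { return data_.size(); }
    double get(std::size_t i) const { return data_[i]; }
private:
    std::vector<char> data_;
};

// Stand-in for an algorithm: works with any type offering size() and get().
template <typename Storage>
double mean(const Storage& s) {
    double sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i) sum += s.get(i);
    return s.size() ? sum / s.size() : 0.0;
}
```

The storage type is fixed at compile time for each instantiation, so the code that loads the file still has to choose which instantiation to call.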

Though if the data sets are that huge, most of your data will be left
in swap anyway. You'd be better off reworking the algorithms to make
intelligent use of smaller buffers, and sorting the data into regions
to increase locality. That way (if the algorithms are intelligent)
misses on your active set of data are rarer. I might even go so far as
to set up a list of tasks that each algorithm performs. Then if a cache
miss does happen (note you'll need to keep track of what is and isn't
loaded yourself), that specific task could be paused. Of course that
last part depends on your algorithms.

cheers
Michael

Kai-Uwe Bux · Feb 8, 2008

The most elegant and probably also most efficient way is to not solve the
problem. Just store doubles as doubles instead of converting back and forth
on the fly.

As for how you _could_ go about the problem of representing different data
transparently, you might have a look at this:

#include <cassert>
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

class foo {
public:

    typedef std::size_t size_type;

private:

    // Abstract storage: values go in and come out as double.
    struct base {

        virtual
        void push_back ( double ) = 0;

        virtual
        double get ( size_type ) const = 0;

        virtual
        base * clone ( void ) const = 0;

        virtual
        ~base ( void ) {}

    };

    // Concrete storage keeping the elements as T (char or double).
    template < typename T >
    struct node : public base {

        std::vector< T > the_data;

        void push_back ( double d ) {
            the_data.push_back( static_cast< T >( d ) );
        }

        double get ( size_type n ) const {
            assert( n < the_data.size() );
            return ( the_data[n] );
        }

        base * clone ( void ) const {
            return new node ( *this );
        }

    };

    base * node_ptr;

public:

    foo ( bool use_double = true )
        : node_ptr ()
    {
        // Both branches are cast to base* so the conditional has one type.
        std::unique_ptr< base > dummy
            ( use_double ? static_cast< base * >( new node<double> () )
                         : static_cast< base * >( new node<char> () ) );
        // put your filling routine here:
        for ( unsigned i = 0; i < 20; ++i ) {
            dummy->push_back( 100 );
        }
        node_ptr = dummy.release();
    }

    double getData ( size_type n ) const {
        return ( node_ptr->get( n ) );
    }

    foo ( foo const & other )
        : node_ptr ( other.node_ptr->clone() )
    {}

    // Copy assignment completes the rule of three for the owned pointer.
    foo & operator= ( foo const & other ) {
        if ( this != &other ) {
            base * copy = other.node_ptr->clone();
            delete ( node_ptr );
            node_ptr = copy;
        }
        return ( *this );
    }

    ~foo ( void ) {
        delete ( node_ptr );
    }

};

int main ( void ) {
    foo f;
    std::cout << f.getData( 5 ) << '\n';
}

Best

Kai-Uwe Bux

Gerhard Fiedler · Feb 8, 2008

Without getting into the details, I guess you can store them in your
optimized vectors, but return a union from getData() (or a struct of a type
identifier plus that union, if that's necessary).
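
As a rough illustration of the second form (a type identifier plus a union), something along these lines could serve as the return type. The names value, make_value and as_double are assumptions for this sketch.

```cpp
#include <cassert>

// Hypothetical return type: a type identifier plus a union of both kinds.
struct value {
    enum kind_t { is_char, is_double } kind;
    union {
        char c;
        double d;
    };
};

inline value make_value(char c) {
    value v;
    v.kind = value::is_char;
    v.c = c;
    return v;
}

inline value make_value(double d) {
    value v;
    v.kind = value::is_double;
    v.d = d;
    return v;
}

// Callers inspect the tag before reading the union.
inline double as_double(const value& v) {
    return v.kind == value::is_char ? static_cast<double>(v.c) : v.d;
}
```

The storage stays compact, but every caller now has to look at the tag, which is the check the original poster wanted to avoid writing in the algorithms.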

Gerhard

fabricio.olivetti · Feb 8, 2008

As for how you _could_ go about the problem of representing different data
transparently, you might have a look at this:

Thanks! That's the way I'll go!
Thank you all!

Daniel T. · Feb 8, 2008

If all the elements of the dataset can be the same type, (once you
determine which type it can be) then:

class foo {
public:
    virtual ~foo() { }
    virtual double getData(unsigned i) = 0;
};

class CharFoo : public foo {
    vector< char > data;
public:
    virtual double getData(unsigned i);
};

class DoubleFoo : public foo {
    vector< double > data;
public:
    virtual double getData(unsigned i);
};

If all of the elements can't be the same type (for example if you can
only represent some of the doubles as chars in a particular dataset,)
then store them all as doubles because the amount of memory you would
need to keep track of which ones were chars and which were doubles would
overwhelm the memory you would be saving by compressing the storage to
begin with.

Also, if your datasets are so big, maybe you should use deque instead of
vector.

Jerry Coffin · Feb 17, 2008

If you want to use the method you originally outlined, I'd carry it out
via inheritance:

class base_container {
public:
    virtual ~base_container() {}
    virtual double getData(unsigned i) = 0;
};

class char_container : public base_container {
    vector<char> data;
public:
    double getData(unsigned i) { return data[i]; }
};

class double_container : public base_container {
    vector<double> data;
public:
    double getData(unsigned i) { return data[i]; }
};

base_container *data;

if (file_contains_chars) {
    data = new char_container;
    read_chars(file, data);
}
else {
    data = new double_container;
    read_doubles(file, data);
}

switch (cluster_method) {
    case KM: k_means_cluster(data); break;
    case SOM: self_ordered_map(data); break;
}
