How to structure a class that may hold two kinds of values

  • Thread starter fabricio.olivetti

fabricio.olivetti

I am designing a class that reads a data file and provides access to it
for another class. The dataset may contain either double or int values,
and some of the datasets may be very large, so I'd like to create a class
that can decide whether to allocate a vector of "char" (which may be
enough for the integral values) or a vector of double.
But I can't see how to do that. Using templates, I still have to
determine, prior to the class declaration, which type this class will
hold.

Of course I could declare something like this:

class foo{

private:
vector< char > cData;
vector< double > dData;
bool type;
public:
double getData(unsigned i);
};

and always return a double (casting the char when required), using a
flag to record which of the two vectors is actually in use by the
class.
What would be the most elegant and efficient way of doing that?

Regards,
Fabricio
 
Victor Bazarov

Without knowing how 'getData' is supposed to be used there is no way
to tell if it fits the requirements (assumed or specified). I can
easily think of a scenario where I'd like to have two members

char getChar(unsigned i) const;
double getDouble(unsigned i) const;

and have them throw an exception if the type doesn't match the stored
type of the class.
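
A minimal sketch of the typed-accessor idea, for illustration only. The class name `typed_store`, the use of `std::logic_error`, and the two constructors are assumptions that do not come from the thread; only the `getChar`/`getDouble` pair does.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: the store remembers which vector is in use, and each
// typed accessor refuses to serve the other type.
class typed_store {
public:
    // Constructing from char data puts the store in "char mode".
    explicit typed_store(const std::vector<char>& values)
        : holds_char_(true), chars_(values) {}

    // Constructing from double data puts the store in "double mode".
    explicit typed_store(const std::vector<double>& values)
        : holds_char_(false), doubles_(values) {}

    char getChar(std::size_t i) const {
        if (!holds_char_)
            throw std::logic_error("getChar called on a store of doubles");
        assert(i < chars_.size());
        return chars_[i];
    }

    double getDouble(std::size_t i) const {
        if (holds_char_)
            throw std::logic_error("getDouble called on a store of chars");
        assert(i < doubles_.size());
        return doubles_[i];
    }

private:
    bool holds_char_;
    std::vector<char> chars_;
    std::vector<double> doubles_;
};
```

With this shape the caller has to know which accessor is valid, which is exactly the check the original poster wants to avoid writing at every call site.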

V
 
fabricio.olivetti

Let's say getData just returns the element at the i-th position of
the vector. Also, let's assume that I want to avoid checking which
member function to call (I don't want to write: switch(type) case 0:
getChar(); case 1: getDouble(); ...)
 
AnonMail2005

Since you haven't specified how large the dataset can be I would just
store doubles. Worry about space optimization later if it indeed
matters.

HTH
 
Victor Bazarov

That's not really a design specification. It looks very much like
an implementation detail. Now, the check you are talking about has
to be done inside 'getData' anyway. So, it doesn't really matter
who makes it since it has to be made.
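
To make the point concrete, here is a rough sketch of a single `getData` that performs the type check internally. The class name `flagged_store` and its `push_back`/`size` members are hypothetical; the two vectors and the flag follow the declaration in the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one flag selects the active vector, and getData
// performs the type check on behalf of every caller.
class flagged_store {
public:
    explicit flagged_store(bool use_char) : use_char_(use_char) {}

    // Values always arrive as double; in char mode they are narrowed,
    // so they must fit into a char.
    void push_back(double v) {
        if (use_char_)
            chars_.push_back(static_cast<char>(v));
        else
            doubles_.push_back(v);
    }

    std::size_t size() const {
        return use_char_ ? chars_.size() : doubles_.size();
    }

    // The branch on use_char_ is the check that "has to be made"; it just
    // lives here instead of at each call site.
    double getData(std::size_t i) const {
        assert(i < size());
        if (use_char_)
            return static_cast<double>(chars_[i]);
        return doubles_[i];
    }

private:
    bool use_char_;
    std::vector<char> chars_;
    std::vector<double> doubles_;
};
```

The inactive vector stays empty, so the memory cost of carrying both members is small compared with the data itself.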

Now, perhaps you will think a bit about what the importance of the dual
storage is and what the use of 'getData' is, and then (you don't
really have to tell us) you will have a clearer picture why you
think you need the two vectors and one 'getData' function. Who
stores the vectors? Who sets the 'type'? How are they changed?
And, most importantly, why?

And, if you can help it, please don't quote signatures. Thanks!

V
 
fabricio.olivetti

Hmm, let me explain my problem further.
I'm designing a toolbox that holds a collection of clustering
algorithms. So let's say I'm implementing a class for algorithm A,
another one for algorithm B and so on.
I want the data access to be transparent for them (just use something
like data(i,j);) whenever they need direct access to it.

The data container class will have a function to read the data from a
file, store it in the most effective way and perform some common
calculations required by the algorithms.

So basically I built a GUI that gives the user the option to load data
from a file (at which point the data class is constructed) and choose
one algorithm, and this algorithm will do its thing with the data.

But, unfortunately, I can come across datasets that store either
integer values or double values. Additionally, the data may be dense or
sparse (so I may end up with a sparse matrix representation) and, even
when sparse, the data may be so large that I MUST save memory by storing
it with "char" type instead of double!

I guess declaring both types and using them internally with a variable
that records which type is in use must be the only way to deal with it!

thanks
 
Victor Bazarov

I am guessing you don't mind this tennis match, do you?

I take it that your data class in some cases has to store so much data,
and the data themselves are so imprecise, that storing a 'char' value
is OK and you'd like to do it because you want to save space. Well, it
smells like premature optimization, but if you want it that way, fine.

Now, it does seem that your data storage class is essentially dumb and
does not serve any other purpose except for storage and retrieval. Let
me level with you here. I don't think such a class has enough reason to
exist. If your algorithm A (or B, or C) calls for some data to be
lugged around, let the class that handles the data also *store it*.

It makes no sense that the data are stored separately from where they
are processed. If your reading class knows how to read 'char' (or some
other type), fine. But let the processing class allocate the needed
buffer and pass it to the reader for populating with values from the
file.

If you let the processing object hold its own data, then you simply
declare different data types in your different processing classes.
Algorithm A would have 'double', so would Algorithm B, but Algorithm
C, for instance, would have 'char'. No need to extract this into
a separate class and torture yourself trying to squeeze two types
where even one doesn't feel comfortable.

V
 
fabricio.olivetti

I don't mind at all :)

You misunderstood some points of what my program will do.

First, imagine that the dataset is a large matrix of arbitrary values.
Some datasets hold small integer values in the range [0; 5] (it's
not that storing a char is merely OK, it's exactly what I need); others
may hold double values in the range [-10.0; 10.0]. There are some special
datasets that are extremely huge, and using double prevents these from
loading into memory (HUUUUUUUUUUGE). When using "char" I can load
those datasets, though.
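
A rough sketch of the arithmetic behind that claim. The function name `dense_bytes` and the matrix dimensions used in the note below are hypothetical; the thread gives no actual sizes.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold a dense rows x cols matrix whose elements each
// occupy element_size bytes. No overflow checking: this is only meant
// for back-of-the-envelope estimates.
inline std::uint64_t dense_bytes(std::uint64_t rows,
                                 std::uint64_t cols,
                                 std::uint64_t element_size) {
    assert(element_size > 0);
    return rows * cols * element_size;
}
```

For an assumed 100,000 x 50,000 matrix, `dense_bytes(100000, 50000, sizeof(char))` is 5,000,000,000 bytes, while the same call with `sizeof(double)` gives `sizeof(double)` times as much, which is 40,000,000,000 bytes on platforms where a double occupies 8 bytes.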

This data class must be shared among all the algorithms, there's no
sense in loading the dataset each time I run a different algorithm
(they all perform the same task, i.e., cluster the data). I must be
able to compare the performance among them, so it's most likely that
I'll run two or more algorithms on the same dataset.

The rationale of operation is something like this: load data, run each
algorithm one at a time, compare results.
 
Kira Yamato

// Try Boost::Variant. <http://www.boost.org/doc/html/variant.html>

#include <vector>
#include <iostream>
#include "boost/variant.hpp"

int main()
{
    using namespace std;
    using namespace boost;

    // This allows a vector of mixed values, char or double.
    vector<variant<char, double> > a;

    a.push_back('A');
    a.push_back(3.5);

    cout << "a[0]=" << get<char>(a[0]) << endl;
    cout << "a[1]=" << get<double>(a[1]) << endl;

    return 0;
}
 
Michael Downton

First, don't strain a class with having to play two roles (even if
similar). Make an interface, then make two separate classes: one for
optimized storage, one for plain. If you don't want virtual functions,
make the interface implicit and pass the class as a template argument
to the algorithms. (It's a toss-up between general and specific.)
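
One way the implicit-interface variant could look, as a sketch under assumptions: `dense_store` and `mean_value` are hypothetical names, the two storage classes are modelled here as two instantiations of one class template, and `mean_value` merely stands in for a real clustering algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage class; dense_store<char> is the compact variant and
// dense_store<double> the plain one. Both expose size() and getData(i).
template <typename T>
class dense_store {
public:
    void push_back(T v) { data_.push_back(v); }

    std::size_t size() const { return data_.size(); }

    double getData(std::size_t i) const {
        assert(i < data_.size());
        return static_cast<double>(data_[i]);
    }

private:
    std::vector<T> data_;
};

// Stand-in for an algorithm. It is written against the implicit interface
// only, so the storage type is resolved at compile time and no virtual
// call is involved.
template <typename Store>
double mean_value(const Store& s) {
    double sum = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += s.getData(i);
    return s.size() ? sum / s.size() : 0.0;
}
```

The trade-off is that the choice between `dense_store<char>` and `dense_store<double>` still has to be made somewhere at run time, typically in one branch right after the file has been inspected, with each branch instantiating the algorithms for its storage type.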

Though if the data sets are that huge, most of your data will be left
in swap anyway. You'd be better off working the algorithms to make
intelligent use of smaller buffers, and sorting the data into regions
to increase locality. That way (if the algorithms are intelligent)
misses on your active set of data are rarer. I might even go so far as
to set up a list of tasks that each algorithm performs. Then if a cache
miss does happen (note you'll need to keep track of what is and isn't
loaded yourself), that specific task could be paused. Of course that
last part depends on your algorithms.

cheers
Michael
 
Kai-Uwe Bux

The most elegant and probably also most efficient way is to not solve the
problem. Just store doubles as doubles instead of converting back and forth
on the fly.

As for how you _could_ go about the problem of representing different data
transparently, you might have a look at this:


#include <cassert>
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

class foo {
public:

  typedef std::size_t size_type;

private:

  // Abstract storage: values go in and come out as double.
  struct base {

    virtual
    void push_back ( double ) = 0;

    virtual
    double get ( size_type ) const = 0;

    virtual
    base * clone ( void ) const = 0;

    virtual
    ~base ( void ) {}

  };

  // Concrete storage holding elements of type T (char or double).
  template < typename T >
  struct node : public base {

    std::vector< T > the_data;

    void push_back ( double d ) {
      the_data.push_back( static_cast< T >( d ) );
    }

    double get ( size_type n ) const {
      assert( n < the_data.size() );
      return ( the_data[n] );
    }

    base * clone ( void ) const {
      return new node ( *this );
    }

  };

  base * node_ptr;

public:

  foo ( bool use_double = true )
    : node_ptr ()
  {
    // std::unique_ptr (C++11) stands in for std::auto_ptr, which was
    // removed in C++17. The two branches have unrelated pointer types,
    // so one is cast to base* to give the conditional a common type.
    std::unique_ptr< base > dummy
      ( use_double ? static_cast< base * >( new node<double> () )
                   : new node<char> () );
    // put your filling routine here:
    for ( unsigned i = 0; i < 20; ++i ) {
      dummy->push_back( 100 );
    }
    node_ptr = dummy.release();
  }

  double getData ( size_type n ) const {
    return ( node_ptr->get( n ) );
  }

  foo ( foo const & other )
    : node_ptr ( other.node_ptr->clone() )
  {}

  // Copy assignment is not implemented in this sketch; deleting it
  // prevents the implicit version from double-deleting node_ptr.
  foo & operator= ( foo const & ) = delete;

  ~foo ( void ) {
    delete ( node_ptr );
  }

};

int main ( void ) {
  foo f;
  std::cout << f.getData( 5 ) << '\n';
}


Best

Kai-Uwe Bux
 
Gerhard Fiedler

Without getting into the details, I guess you can store them in your
optimized vectors, but return a union from getData() (or a struct of a type
identifier plus that union, if that's necessary).
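
A minimal sketch of the "type identifier plus union" return value. The names `value`, `make_value` and `to_double` are hypothetical and only illustrate the shape such a struct could take.

```cpp
#include <cassert>

// Hypothetical return type for getData(): a tag saying which union member
// is valid, plus the union itself.
struct value {
    enum kind_type { is_char, is_double };
    kind_type kind;
    union {
        char c;
        double d;
    } as;
};

// Build a value from a char element.
inline value make_value(char c) {
    value v;
    v.kind = value::is_char;
    v.as.c = c;
    return v;
}

// Build a value from a double element.
inline value make_value(double d) {
    value v;
    v.kind = value::is_double;
    v.as.d = d;
    return v;
}

// Read the active member back as a double, whatever the stored type was.
inline double to_double(const value& v) {
    assert(v.kind == value::is_char || v.kind == value::is_double);
    return v.kind == value::is_char ? static_cast<double>(v.as.c) : v.as.d;
}
```

Only the returned temporary carries the tag; the vectors themselves can still hold plain `char` or `double` elements, so the storage stays compact.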

Gerhard
 
fabricio.olivetti

Thanks, Kai-Uwe! That's the way I'll go!
Thank you all!
 
Daniel T.

If all the elements of the dataset can be the same type, (once you
determine which type it can be) then:

class foo {
public:
    virtual ~foo() { }
    virtual double getData(unsigned i) = 0;
};

class CharFoo : public foo {
    vector< char > data;
public:
    virtual double getData(unsigned i);
};

class DoubleFoo : public foo {
    vector< double > data;
public:
    virtual double getData(unsigned i);
};

If all of the elements can't be the same type (for example if you can
only represent some of the doubles as chars in a particular dataset,)
then store them all as doubles because the amount of memory you would
need to keep track of which ones were chars and which were doubles would
overwhelm the memory you would be saving by compressing the storage to
begin with.
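
As an illustration of that overhead, assuming the most straightforward bookkeeping (one tag stored next to every element), a sketch like the following compares the per-element sizes. The names `tagged_element` and `overhead_per_element` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-element record for a dataset that mixes chars and doubles:
// every element carries its own tag.
struct tagged_element {
    bool is_char;
    union {
        char c;
        double d;
    } value;
};

// Extra bytes each tagged element costs compared with a plain double.
// The union alone is already at least as large as a double, so adding
// the tag can only make the record bigger.
inline std::size_t overhead_per_element() {
    assert(sizeof(tagged_element) > sizeof(double));
    return sizeof(tagged_element) - sizeof(double);
}
```

On common platforms alignment pads such a record to twice the size of a double, so a mixed representation of this kind ends up larger than simply storing everything as double.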

Also, if your datasets are so big, maybe you should use deque instead of
vector.
 
Jerry Coffin

If you want to use the method you originally outlined, I'd carry it out
via inheritance:

class base_container {
public:
    virtual ~base_container() {}
    virtual double getData(unsigned i) = 0;
};

class char_container : public base_container {
    vector<char> data;
public:
    double getData(unsigned i) { return data[i]; }
};

class double_container : public base_container {
    vector<double> data;
public:
    double getData(unsigned i) { return data[i]; }
};

base_container *data;

if (file_contains_chars) {
    data = new char_container;
    read_chars(file, data);
}
else {
    data = new double_container;
    read_doubles(file, data);
}

switch (cluster_method) {
    case KM: k_means_cluster(data); break;
    case SOM: self_ordered_map(data); break;
}
 
