management of many small objects

Discussion in 'C++' started by Bo Peng, Jul 3, 2004.

  1. Bo Peng

    Bo Peng Guest

    Dear List,

    It is not clear what the title means. :) Here is the details:

    I need to manage a big bunch of small objects, like

    struct{
    int a;
    int b[size_undetermined];
    }obj;

    I do not want to use the usual int*, new, delete method since
    1. a million new and delete would be slow
    2. I need to access elements of b cross all objects. There might be
    better ways to loop through obj.b[2].
    3. the size of b is usually small so the overhead of an additional int *
    is too big.

    I am currently using vector<int>( num_objects*(1+size_of_b) ) to store
    the objects and vector<int>::iterator to access elements. But the
    problems are:

    1. I would like to pass reference of obj to other functions but I only
    have vector<int>::iterator, not vector<obj>::iter.
    2. If I would like to expand obj (e.g. adding another variable), my code
    has to be rewritten because of index changes.

    Is there a better way to do it? Can I somehow create a dynamic type
    struct{ int a, int b[size]; } and apply to the existing vector?
    boost::memorypool seems to be a good idea but there is still a waste of
    memory for using pointers.

    Many thanks in advance.
    Bo
    Bo Peng, Jul 3, 2004
    #1
    1. Advertising

  2. Bo Peng

    Mike Wahler Guest

    "Bo Peng" <> wrote in message
    news:cc4ss6$ghc$...
    > Dear List,
    >
    > It is not clear what the title means. :) Here is the details:
    >
    > I need to manage a big bunch of small objects, like
    >
    > struct{


    I'll give your type a name for use below

    struct T {

    > int a;
    > int b[size_undetermined];


    If the size can not be determined at compile time,
    then you can't use this syntax. You'll have to
    use a pointer and dynamic allocation. However,
    I suggest using a container here.

    > }obj;
    >
    > I do not want to use the usual int*, new, delete method since
    > 1. a million new and delete would be slow
    > 2. I need to access elements of b cross all objects. There might be
    > better ways to loop through obj.b[2].
    > 3. the size of b is usually small so the overhead of an additional int *
    > is too big.


    Well, an int is often the same size as a pointer, so there's not
    that much overhead unless your array usually has only zero to three
    elements or so.

    >
    > I am currently using vector<int>( num_objects*(1+size_of_b) )



    Huh? Why? Why not simply:

    vector<struct T>(num_objects);


    > to store
    > the objects and vector<int>::iterator


    You're not storing any object of your struct type, just ints.

    Don't store an iterator in a container. There's no need.

    >to access elements. But the
    > problems are:
    >
    > 1. I would like to pass reference of obj to other functions but I only
    > have vector<int>::iterator, not vector<obj>::iter.


    'obj' is not a type name, so you can't do that. But again,
    why not:

    vector<struct T>


    > 2. If I would like to expand obj (e.g. adding another variable), my code
    > has to be rewritten because of index changes.


    Adding another member to your struct won't change any indexes,
    it will only increase the size of your struct.

    >
    > Is there a better way to do it? Can I somehow create a dynamic type
    > struct{ int a, int b[size]; }


    struct T
    {
    int a;
    std::vector<int> b;
    T(std::vector<int>::size_type size) : b(size) {}
    };

    int main()
    {
    T cont(how_many);
    return 0;
    }

    > and apply to the existing vector?
    > boost::memorypool seems to be a good idea but there is still a waste of
    > memory for using pointers.
    >
    > Many thanks in advance.
    > Bo


    -Mike
    Mike Wahler, Jul 3, 2004
    #2
    1. Advertising

  3. Bo Peng

    Bo Peng Guest

    Hi, Mike,

    Thanks for your reply. I certainly know all the grammar details and what
    I showed was just some kind of psudo-code.

    The core of the problem is that I will have a million or so small
    objects with some fixed elements and an array. Most of the data is
    actually unsigned char (not int) so a pointer is 4 times bigger than it.


    Suppose the length of b is 8 and there will be 1000 objects. The actual
    datasize is (4+8)*1000=12000. Compare:

    solution 1: Most convenient.

    The implmentation of vector in gcc uses three pointers.

    struct T{
    int a; // 4 bytes
    vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes
    };

    vector<T> myarray(size) will use 1000 new and delete (unless STL uses
    something else in their allocator) + 24*1000 memory usage (50% waste).

    Solution 2: Medium difficulty.

    struct T{
    int a; // 4 bytes
    unsigned char * b; // b = new unsigned char[8] , 12 bytes
    }

    vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory
    usage ( 25% waste )

    Solution 3: Most difficult to control. Putting all data in a big block.

    const int blocksize = sizeof(int)/sizeof(unsigned char) + 8;
    vector<unsigned char> ab( blocksize*size)

    2 new and delete + 12*1000 memory usage (0% waste)

    Because of the nature of the program (memory hungry), I can not tolerate
    50% memory waste. I dislike solution 2 since there are too many new and
    deletes and I guess

    for (int i=0; i< 1000; i++)
    myarray.b[2] = 2 // de-reference two pointers each time.

    is slower than that in solution 3.

    for (int i=0; i<1000*blocksize; i+=blocksize)
    ab = 2;


    However, managing a block of memory without data structure is really
    painful so what I was asking is (almost without hope) a way to apply
    data structure to a block of data. For example

    T* myobj = reinterpret_cast<T*> (&ab) ;
    myobj->a = 5;
    myobj->b[2] = 2; // or someway else

    or even better
    vector<T>::iterator it = (some kind of cast of) ab.begin();

    Of course type T here can not be defined since the size of T can not be
    pre-determined......

    There are other solutions hanging around (user-defined fixed-size
    vector, memory pool) but I have not find one with acceptable trade-off
    between memory-usage, performance (less new/delete, quick data access)
    and easy-to-use ...

    Thanks.

    Bo
    Bo Peng, Jul 3, 2004
    #3
  4. "Bo Peng" <> wrote:

    > struct{
    > int a;
    > int b[size_undetermined];
    > }obj;


    Yuck. That's not a very good construct. I think the
    following kind of thing will work much better for you:

    // MyFile.cpp:

    #include <iostream>
    #include <vector>

    using std::vector;

    struct MyStruct
    {
    int a;
    vector<int> b;
    };

    int main(void)
    {
    using std::cout;
    using std::endl;

    // InitSize is initial size of container of objects:
    const int InitSize = 10;

    vector<MyStruct> Objs (InitSize);

    { // Limit scope
    MyStruct Splat; // Temporary object
    Splat.a = 13;
    Splat.b.push_back(22);
    Splat.b.push_back(10);
    Objs[0] = Splat;
    } // Splat is uncreated here

    { // Limit scope
    MyStruct Splat; // Temporary object
    Splat.a = 97;
    Splat.b.push_back(33);
    Splat.b.push_back(84);
    Splat.b.push_back(27);
    Objs[1] = Splat;
    } // Splat is uncreated here

    vector<MyStruct>::size_type i; // Index to Objs
    vector<int>::iterator j; // Iterator to elements of b

    for (i = 0; i <2; ++i)
    {
    cout << "Object Number: " << i << endl;
    cout << "Value of a: " << Objs.a << endl;
    cout << "Elements of b:" << endl;
    for (j = Objs.b.begin(); j != Objs.b.end(); ++j)
    {
    cout << (*j) << endl;
    }
    cout << endl << endl;
    }
    return 0;
    }

    Of course, you'll need to handle issues such as whether to
    start with empty or pre-sized containers, and whether to use
    vectors or some other containers. If you change sizes
    or insert/delete elements often, then a list or deque may work
    better. Depends on the details of your application.

    Hope the above gives you some useful ideas.

    --
    Cheers,
    Robbie Hatley
    Tustin, CA, USA
    email: lonewolfintj at pacbell dot net
    web: home dot pacbell dot net slant earnur slant






    ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
    http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
    ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
    Robbie Hatley, Jul 3, 2004
    #4
  5. * Bo Peng:
    > Dear List,
    >
    > It is not clear what the title means. :) Here is the details:
    >
    > I need to manage a big bunch of small objects, like
    >
    > struct{
    > int a;
    > int b[size_undetermined];
    > }obj;
    >
    > I do not want to use the usual int*, new, delete method since
    > 1. a million new and delete would be slow
    > 2. I need to access elements of b cross all objects. There might be
    > better ways to loop through obj.b[2].
    > 3. the size of b is usually small so the overhead of an additional int *
    > is too big.


    I think you should regard the above struct as an _external
    representation_ (serialization) of an object, not as the object itself.

    That means essentially using the FlyWeight pattern (Google).



    > I am currently using vector<int>( num_objects*(1+size_of_b) ) to store
    > the objects and vector<int>::iterator to access elements.


    Goodie.


    > But the problems are:
    >
    > 1. I would like to pass reference of obj to other functions but I only
    > have vector<int>::iterator, not vector<obj>::iter.


    Construct FlyWeight data from reference to data, pass that object
    around.



    > 2. If I would like to expand obj (e.g. adding another variable), my code
    > has to be rewritten because of index changes.


    Yes, that is a problem. Proper solution depends on access patterns.
    For example, if you only use sequential access (forward and backward)
    then the cursor gap technique (Google, if no ref ask again here) would
    be ideal. Cursor gap also allows random read/write access but not for
    insertion, which is what expanding data entails. If you need random
    access with insertion then a B-tree (Google) seems to be indicated. But
    this is not a C++ problem, so please do ask further questions in a more
    appropriate groups such as e.g. [comp.programming].

    By the way, nowadays "a million" is not very large. Even a "Hello,
    world!" program of 512 byte (byte, not KiB) executable might use a few
    MiB's of memory when loaded. Have you had a look at the small-object
    allocator in Andrei's "Modern C++ Design"?

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
    Alf P. Steinbach, Jul 3, 2004
    #5
  6. On Fri, 02 Jul 2004 22:18:09 -0500, Bo Peng <> wrote:

    > Hi, Mike,
    >
    > Thanks for your reply. I certainly know all the grammar details and what
    > I showed was just some kind of psudo-code.
    >
    > The core of the problem is that I will have a million or so small
    > objects with some fixed elements and an array. Most of the data is
    > actually unsigned char (not int) so a pointer is 4 times bigger than it.
    > Suppose the length of b is 8 and there will be 1000 objects. The actual
    > datasize is (4+8)*1000=12000. Compare:
    >
    > solution 1: Most convenient.
    >
    > The implmentation of vector in gcc uses three pointers.
    >
    > struct T{
    > int a; // 4 bytes
    > vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes
    > };
    >
    > vector<T> myarray(size) will use 1000 new and delete (unless STL uses
    > something else in their allocator) + 24*1000 memory usage (50% waste).
    >
    > Solution 2: Medium difficulty.
    >
    > struct T{
    > int a; // 4 bytes
    > unsigned char * b; // b = new unsigned char[8] , 12 bytes
    > }
    >
    > vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory
    > usage ( 25% waste )
    >
    > Solution 3: Most difficult to control. Putting all data in a big block.
    >
    > const int blocksize = sizeof(int)/sizeof(unsigned char) + 8;
    > vector<unsigned char> ab( blocksize*size)
    >
    > 2 new and delete + 12*1000 memory usage (0% waste)
    >
    > Because of the nature of the program (memory hungry), I can not tolerate
    > 50% memory waste. I dislike solution 2 since there are too many new and
    > deletes and I guess
    >
    > for (int i=0; i< 1000; i++)
    > myarray.b[2] = 2 // de-reference two pointers each time.
    >
    > is slower than that in solution 3.
    >
    > for (int i=0; i<1000*blocksize; i+=blocksize)
    > ab = 2;
    >
    >
    > However, managing a block of memory without data structure is really
    > painful so what I was asking is (almost without hope) a way to apply
    > data structure to a block of data. For example
    >
    > T* myobj = reinterpret_cast<T*> (&ab) ;
    > myobj->a = 5;
    > myobj->b[2] = 2; // or someway else
    >
    > or even better
    > vector<T>::iterator it = (some kind of cast of) ab.begin();
    >
    > Of course type T here can not be defined since the size of T can not be
    > pre-determined......


    When I first read your post I assumed that the size of the block varied
    between objects, but since you seem to be putting all these objects in a
    vector that doesn't seem to be the case. Niether does it seem that
    blocksize only varies each time you compile the code, because if it did
    you would have a simple template based solution.

    So I'm assuming that blocksize is a run time variable, but is fixed for
    all the objects within a particular run. If so then the following
    non-standard but highly portable code would seem to do what you want.

    struct FakeObject
    {
    int a;
    unsigned char b[1];
    };

    // allocate 1000 objects
    size_t blocksize = 8;
    size_t objsize = sizeof(int) + blocksize;
    vector<unsigned char> ab(1000*objsize);

    // get object 100 (say)
    FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]);
    myobj->b[2] = 1;

    No doubt others can tell you the many ways in which this breaks the
    C++ standard but hopefully you are more interested in the fact that it
    works.

    john
    John Harrison, Jul 3, 2004
    #6
  7. Bo Peng

    Bo Peng Guest


    > So I'm assuming that blocksize is a run time variable, but is fixed for
    > all the objects within a particular run.


    Exactly. I should have been clearer about this.

    > if so then the following non-standard but highly portable code
    > would seem to do what you want.
    >
    > struct FakeObject
    > {
    > int a;
    > unsigned char b[1];
    > };
    >
    > // allocate 1000 objects
    > size_t blocksize = 8;
    > size_t objsize = sizeof(int) + blocksize;
    > vector<unsigned char> ab(1000*objsize);
    >
    > // get object 100 (say)
    > FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]);
    > myobj->b[2] = 1;


    This indeed partially works. It would be better if myobj has the correct
    size. I.e. myobj++ can work as expected.

    So, how about the following code?

    class ObjIterator{
    unsigned char * ptr;
    size_t objsize;
    public:
    // return FakeObject * for *ObjIterator
    FakeObject* operator * (){
    return reinterpret_cast<FakeObject*>(ptr);
    }

    // advance
    ObjIterator& operator++(int){ ptr+= objsize; return *this; }

    // and other ++, --(int), --, +=, -=, ==, <=, >=... chore.
    }

    for(ObjIterator it= ( a begin() function) ; it != ( a end() function); it++)
    it->b[2] = 2;

    I would however test

    1. if this is much slower than

    for(i=5; i<... ; i+=objsize)
    ab = 2;

    With everything inline, they hopefully have similar performance.

    2. if STL algorithms work as expected. I mean, if

    ObjIterator it = ab.begin();
    copy( it, it+3, destination)

    will copy 3 blocks of data correctly.


    About other replies:

    I read something about flyweight pattern. It is a wonderful idea but
    does not fit in my case since my objects have identities even orders.
    Also, accessing an object through a key and a map container might be
    very slow.

    It seems that I am approaching a good solution. Thanks!
    Bo
    Bo Peng, Jul 3, 2004
    #7
  8. >
    > This indeed partially works. It would be better if myobj has the correct
    > size. I.e. myobj++ can work as expected.
    >
    > So, how about the following code?
    >
    > class ObjIterator{
    > unsigned char * ptr;
    > size_t objsize;
    > public:
    > // return FakeObject * for *ObjIterator
    > FakeObject* operator * (){
    > return reinterpret_cast<FakeObject*>(ptr);
    > }


    I think it should be this

    FakeObject* operator->(){
    return reinterpret_cast<FakeObject*>(ptr);
    }
    FakeObject& operator*(){
    return *reinterpret_cast<FakeObject*>(ptr);
    }

    otherwise you get some ugly syntax when you use your iterator

    (*i)->b[2] = 2;

    instead of

    i->b[2] = 2;
    (*i).b[2] = 2;


    >
    > // advance
    > ObjIterator& operator++(int){ ptr+= objsize; return *this; }
    >
    > // and other ++, --(int), --, +=, -=, ==, <=, >=... chore.
    > }
    >
    > for(ObjIterator it= ( a begin() function) ; it != ( a end() function);
    > it++)
    > it->b[2] = 2;
    >
    > I would however test
    >
    > 1. if this is much slower than
    >
    > for(i=5; i<... ; i+=objsize)
    > ab = 2;
    >
    > With everything inline, they hopefully have similar performance.
    >
    > 2. if STL algorithms work as expected. I mean, if
    >
    > ObjIterator it = ab.begin();
    > copy( it, it+3, destination)
    >
    > will copy 3 blocks of data correctly.
    >


    No, I don't think that will work. You have to make some compromises
    somewhere.

    john
    John Harrison, Jul 3, 2004
    #8
  9. Bo Peng

    Bo Peng Guest

    John Harrison wrote:

    >> 2. if STL algorithms work as expected. I mean, if
    >>
    >> ObjIterator it = ab.begin();
    >> copy( it, it+3, destination)
    >>
    >> will copy 3 blocks of data correctly.
    >>

    >
    > No, I don't think that will work. You have to make some compromises
    > somewhere.


    I checked the implementation of copy, it uses something like
    *it=*destination,
    maybe I can create a correct copy constructor and/or operator= for
    FakeObject.

    // global, size of object
    int objsize;

    class FakeObj{
    public:
    int a;
    int b[1];

    FakeObj& operator=(FakeObj& rhs){
    a = rhs.a;
    memcpy(b, rhs.b, objsize);
    return *this;
    }

    FakeObj(FakeObj& rhs){
    a = rhs.a;
    memcpy(b, rhs.b, objsize);
    }
    }

    Again, no check for grammar.

    Things are getting more and more complicated but this is no more work
    than the copy constructor of Obj if we use unsigned char * and
    new/delete. Anyway, fake is fake. FakeObject might still fail in some
    way. :-(

    Bo
    Bo Peng, Jul 3, 2004
    #9
  10. On Sat, 03 Jul 2004 11:04:51 -0500, Bo Peng <> wrote:

    > John Harrison wrote:
    >
    >>> 2. if STL algorithms work as expected. I mean, if
    >>>
    >>> ObjIterator it = ab.begin();
    >>> copy( it, it+3, destination)
    >>>
    >>> will copy 3 blocks of data correctly.
    >>>

    >> No, I don't think that will work. You have to make some compromises
    >> somewhere.

    >
    > I checked the implementation of copy, it uses something like
    > *it=*destination,
    > maybe I can create a correct copy constructor and/or operator= for
    > FakeObject.
    >
    > // global, size of object
    > int objsize;
    >
    > class FakeObj{
    > public:
    > int a;
    > int b[1];
    >
    > FakeObj& operator=(FakeObj& rhs){
    > a = rhs.a;
    > memcpy(b, rhs.b, objsize);
    > return *this;
    > }
    >
    > FakeObj(FakeObj& rhs){
    > a = rhs.a;
    > memcpy(b, rhs.b, objsize);
    > }
    > }
    >
    > Again, no check for grammar.
    >
    > Things are getting more and more complicated but this is no more work
    > than the copy constructor of Obj if we use unsigned char * and
    > new/delete. Anyway, fake is fake. FakeObject might still fail in some
    > way. :-(
    >
    > Bo


    That looks OK, but now the compromise is the global variable (and the fact
    that we've left standard C++ far behind).

    john
    John Harrison, Jul 3, 2004
    #10
  11. Bo Peng

    Bo Peng Guest

    John Harrison wrote:

    > That looks OK, but now the compromise is the global variable (and the
    > fact that we've left standard C++ far behind).


    Yeap. I am not happy about this either. All these non-standard things
    will make understanding of my code a nightmare to others... So,
    motivated by Loki's SmallObjectAllocator, the idea is now:

    1. objects do use a pointer

    struct obj{
    int a;
    unsigned char * b;
    obj();
    }

    2. do not use the usual new unsigned char [8]. Rather, use
    new(address) unsigned char[8] where address point to a big chunk of
    memory of size 8 * number_of_objects. For example:

    struct Pool{
    vector<unsigned char> data;
    int cur; // current unused.

    vecrot<unsigned char>::iterator begin(){
    return data.begin();
    }

    vecrot<unsigned char>::iterator end(){
    return data.end();
    }

    Pool(size_t size):data(size),cur(0){};

    unsigned char * malloc(int blocksize){
    int tmp = cur;
    cur += blocksize;
    return &(data[tmp]);
    }
    }

    Pool pool(8*1000);

    When allocating an object:

    obj::eek:bj(){
    b = pool.malloc(8);
    }

    or something like:

    unsigned char * ptr = pool.malloc(8);
    b = new(ptr) unsigned char[8];


    3. Access the elements in two ways:

    for(int i=0; i< 1000; i++)
    objects.b[2] = 2;

    or directly through Pool

    for( vector<unsigned char>::iteator it=pool.begin()+2; it <
    pool.end()+2; it+=8)
    * it = 2;

    The latter should be faster. It is also possible to create slicing
    vector out of Pool.data.

    This looks like a good tradeoff. The interface is now clean and quick;
    there is only one pointer to waste and there is only one new operator to
    call.....


    Potential improvements:
    1. use template to generalize the idea to more complex objs.
    (boost::pool seems to be complicated for my purpose.)
    2. Use Loki's SmallObjectAllocator. His implementation might be better.
    3. Integrate SmallObjectAllocator into STL and keep all these behind the
    scene.

    C++ is hard.
    Bo
    Bo Peng, Jul 3, 2004
    #11
  12. Bo Peng wrote:
    >
    > Hi, Mike,
    >
    > Thanks for your reply. I certainly know all the grammar details and what
    > I showed was just some kind of psudo-code.
    >
    > The core of the problem is that I will have a million or so small
    > objects with some fixed elements and an array.


    Get yourself a copy of
    Scott Meyers 'Effective C++'

    In Item 10 he describes exactly what you seem to be after:
    custom versions of operator new and delete.
    In short: When new is called, it allocates not a single object but
    a whole array of such objects and hands out pointers into the array,
    when needed. Only when this pool of objects is used up, the customized
    version of new walks through the hassle of system memory management
    again by allocating another pool of objects.

    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jul 5, 2004
    #12
  13. Karl Heinz Buchegger wrote:
    >
    > Bo Peng wrote:
    > >
    > > Hi, Mike,
    > >
    > > Thanks for your reply. I certainly know all the grammar details and what
    > > I showed was just some kind of psudo-code.
    > >
    > > The core of the problem is that I will have a million or so small
    > > objects with some fixed elements and an array.

    >
    > Get yourself a copy of
    > Scott Meyers 'Effective C++'
    >
    > In Item 10 he describes exactly what you seem to be after:
    > custom versions of operator new and delete.
    > In short: When new is called, it allocates not a single object but
    > a whole array of such objects and hands out pointers into the array,
    > when needed. Only when this pool of objects is used up, the customized
    > version of new walks through the hassle of system memory management
    > again by allocating another pool of objects.
    >


    Sorry. After reading the follo ups, it seems you are
    heading for a different thing.

    --
    Karl Heinz Buchegger
    Karl Heinz Buchegger, Jul 5, 2004
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Floris van Haaster

    Project management / bug management

    Floris van Haaster, Sep 23, 2005, in forum: ASP .Net
    Replies:
    3
    Views:
    1,237
    Jon Paal
    Sep 23, 2005
  2. pouet
    Replies:
    2
    Views:
    752
    Will Hartung
    Jul 30, 2004
  3. Agent Mulder

    Many Small Components or Few Bigger?

    Agent Mulder, Aug 18, 2004, in forum: Java
    Replies:
    3
    Views:
    336
    marcus
    Aug 19, 2004
  4. Replies:
    2
    Views:
    518
    Keith Thompson
    Oct 23, 2008
  5. oeyvind toft

    Many small images

    oeyvind toft, Dec 5, 2004, in forum: Javascript
    Replies:
    12
    Views:
    136
    oeyvind toft
    Dec 7, 2004
Loading...

Share This Page