How much overhead for a new templated class?

Discussion in 'C++' started by jacob navia, May 21, 2012.

  1. jacob navia

    jacob navia Guest

    I am writing a containers library in C. The generic list container
    adds for a new type:

    code: 2495 bytes
    data: 488 bytes

    for a list container containing around 60 entry points.

    This means that you will pay 2983 bytes for each instatiation
    of the list template (one of the biggest around)

    I would be interested to know the corresponding vaue of the
    C++ stl. How many bytes of code it will cost you to instantiate
    a list template for a new type?

    Thanks in advance.
     
    jacob navia, May 21, 2012
    #1
    1. Advertising

  2. jacob navia

    Ian Collins Guest

    On 05/22/12 08:37 AM, jacob navia wrote:
    > I am writing a containers library in C. The generic list container
    > adds for a new type:
    >
    > code: 2495 bytes
    > data: 488 bytes
    >
    > for a list container containing around 60 entry points.
    >
    > This means that you will pay 2983 bytes for each instatiation
    > of the list template (one of the biggest around)
    >
    > I would be interested to know the corresponding vaue of the
    > C++ stl. How many bytes of code it will cost you to instantiate
    > a list template for a new type?\


    I'm sure you can check that for yourself, but with g++, about 1K to
    instantiate a list<int> and push_back a value.

    --
    Ian Collins
     
    Ian Collins, May 21, 2012
    #2
    1. Advertising

  3. jacob navia

    Dombo Guest

    Op 21-May-12 22:37, jacob navia schreef:
    > I am writing a containers library in C. The generic list container
    > adds for a new type:
    >
    > code: 2495 bytes
    > data: 488 bytes
    >
    > for a list container containing around 60 entry points.
    >
    > This means that you will pay 2983 bytes for each instatiation
    > of the list template (one of the biggest around)
    >
    > I would be interested to know the corresponding vaue of the
    > C++ stl. How many bytes of code it will cost you to instantiate
    > a list template for a new type?


    That depends on the compiler, compiler settings, the type for which the
    list template is instantiated and the other types are instantiated.

    For example:

    #include <list>
    #include <string>

    class A
    {
    public:
    A(): foo(0) {}
    long foo;
    };

    class B
    {
    public:
    B() {}
    std::string bar;
    };

    int main()
    {
    std::list<void*> lpv;
    std::list<A*> lpa;
    std::list<B*> lpb;
    std::list<A> la;
    std::list<B> lb;

    lpv.push_back(new A);
    lpa.push_back(new A);
    lpb.push_back(new B);
    la.push_back(A());
    lb.push_back(B());

    return 0;
    }


    If you compile this code Visual C++ 2010 with optimizations enabled all
    push_back() method calls result in the same code being called regardless
    of the list type (in other words the code is not replicated for each
    template instantiation). The code for the constructors is also shared
    between std::list<void*>, std::list<A*>, std::list<B*> and std::list<A>,
    but not for std::list<B>; std::list<B> has its own constructor code.

    My observation is that template instantiation for pointer types often
    share code with template instantiations for other pointer types. I.e.
    instantiating a list template with another pointer type in itself does
    not add code. With non-pointer types only some parts, if any, are shared
    between template instantiations. Of course YMMV because of all the
    variables involved that can affect the outcome.
     
    Dombo, May 21, 2012
    #3
  4. jacob navia

    Ian Collins Guest

    On 05/22/12 08:37 AM, jacob navia wrote:
    > I am writing a containers library in C. The generic list container
    > adds for a new type:
    >
    > code: 2495 bytes
    > data: 488 bytes
    >
    > for a list container containing around 60 entry points.
    >
    > This means that you will pay 2983 bytes for each instatiation
    > of the list template (one of the biggest around)
    >
    > I would be interested to know the corresponding vaue of the
    > C++ stl. How many bytes of code it will cost you to instantiate
    > a list template for a new type?


    Rather than fussing over a few bytes, how about a performance comparison?

    Here's a simple (related to some work I have been doing recently) benchmark:

    Generate a list of 10,000,000 random longs (call it random)
    use that list to instantiate another list (call it list)
    sort list
    sum and empty list by removing the first entry until empty.

    What would be your containers library solution and how does it compare
    to this C++ solution?

    #include <list>
    #include <iostream>
    #include <stdint.h>
    #include <stdlib.h>

    // For system hi-res timer, add your own here.
    //
    int64_t hiresTime();

    const unsigned entries = 10000000;

    typedef std::list<long> List;

    void fill( List& list )
    {
    srand48(42);
    for( unsigned n = 0; n < entries; ++n )
    list.push_back(lrand48());
    }

    uint64_t get( List& list )
    {
    uint64_t result(0);
    while( !list.empty() )
    {
    result += list.front();
    list.pop_front();
    }
    return result;
    }

    double delta( int64_t start, int64_t now )
    {
    return (now-start) / 1000000000.0;
    }

    int main()
    {
    using std::cout;
    using std::endl;

    List random;

    int64_t start = hiresTime();
    fill( random );
    int64_t now = hiresTime();

    cout << "To generate " << delta(start, now) << 'S' << endl;

    start = now;
    List list(random);
    now = hiresTime();

    cout << "To create " << delta(start, now) << 'S' << endl;

    start = now;
    list.sort();
    now = hiresTime();

    cout << "To sort " << delta(start, now) << 'S' << endl;

    start = now;
    uint64_t n = get( list );
    now = hiresTime();

    cout << "To empty " << delta(start, now) << 'S' << endl;

    cout << n << endl;
    }

    --
    Ian Collins
     
    Ian Collins, May 22, 2012
    #4
  5. jacob navia

    Ian Collins Guest

    On 05/22/12 11:28 AM, Ian Collins wrote:
    > On 05/22/12 08:37 AM, jacob navia wrote:
    >> I am writing a containers library in C. The generic list container
    >> adds for a new type:
    >>
    >> code: 2495 bytes
    >> data: 488 bytes
    >>
    >> for a list container containing around 60 entry points.
    >>
    >> This means that you will pay 2983 bytes for each instatiation
    >> of the list template (one of the biggest around)
    >>
    >> I would be interested to know the corresponding vaue of the
    >> C++ stl. How many bytes of code it will cost you to instantiate
    >> a list template for a new type?

    >
    > Rather than fussing over a few bytes, how about a performance comparison?
    >
    > Here's a simple (related to some work I have been doing recently) benchmark:
    >
    > Generate a list of 10,000,000 random longs (call it random)
    > use that list to instantiate another list (call it list)
    > sort list
    > sum and empty list by removing the first entry until empty.


    I forget to add include error checking for allocation failure.

    --
    Ian Collins
     
    Ian Collins, May 22, 2012
    #5
  6. jacob navia <> wrote:
    > I would be interested to know the corresponding vaue of the
    > C++ stl. How many bytes of code it will cost you to instantiate
    > a list template for a new type?


    It depends on which member functions of the templated class you are
    calling. The compiler will instantiate only those functions that get
    called and skip the rest.

    (This is, in fact, not just an optimization, but a *necessity*. The compiler
    *must* instantiate only what is being called because some templated classes
    actually depend on this behavior.)
     
    Juha Nieminen, May 22, 2012
    #6
  7. Ian Collins <> wrote:
    > Rather than fussing over a few bytes, how about a performance comparison?


    A performance comparison would be quite moot because the biggest bottleneck
    in the whole thing is memory allocation. By far. A performance comparison
    would only make sense if one version uses more memory allocations than the
    other (for example, I have seen "generic" linked lists in C that actually
    make two allocations per element rather than the one that std::list makes,
    hence basically making the C version twice as slow as std::list).

    It can be quite surprising how much time the program spends in memory
    allocation alone. For instance, if you instantiate a std::set and add
    some tens of millions of elements to it, it will take many seconds on
    a typical computer. You'd think that the majority of the time is spent
    re-balancing the tree after each insertion. You'd be wrong. Something
    like 80-90% of the time is spent allocating memory!

    Using a very fast allocator with std::set or std::list can speed it up
    quite considerably, if many insertions and deletions are performed.
    (The great thing about the C++ standard data containers is that they
    support user-created memory allocators, something that's way more
    difficult to do in a "generic" C container; as everything else.)
     
    Juha Nieminen, May 22, 2012
    #7
  8. jacob navia

    jacob navia Guest

    Le 22/05/12 08:05, Juha Nieminen a écrit :
    > Ian Collins<> wrote:
    >> Rather than fussing over a few bytes, how about a performance comparison?

    >
    > A performance comparison would be quite moot because the biggest bottleneck
    > in the whole thing is memory allocation. By far.


    Yes. With a fast allocator, my software runs at the same
    speed than C++. This due to bottleneck is in:

    Memory I/O
    Allocation

    > A performance comparison
    > would only make sense if one version uses more memory allocations than the
    > other (for example, I have seen "generic" linked lists in C that actually
    > make two allocations per element rather than the one that std::list makes,
    > hence basically making the C version twice as slow as std::list).
    >


    My software makes one allocation per element.

    > It can be quite surprising how much time the program spends in memory
    > allocation alone. For instance, if you instantiate a std::set and add
    > some tens of millions of elements to it, it will take many seconds on
    > a typical computer. You'd think that the majority of the time is spent
    > re-balancing the tree after each insertion. You'd be wrong. Something
    > like 80-90% of the time is spent allocating memory!
    >


    That number is correct, that's why I decided to use (for single linked
    lists)

    struct element {
    struct element *Next;
    char data[1];
    };

    > Using a very fast allocator with std::set or std::list can speed it up
    > quite considerably, if many insertions and deletions are performed.
    > (The great thing about the C++ standard data containers is that they
    > support user-created memory allocators, something that's way more
    > difficult to do in a "generic" C container; as everything else.)


    My software supports user supplied allocators. The default allocator
    object is:

    typedef struct tagAllocator {
    void *(*malloc)(size_t); /* Function to allocate memory */
    void (*free)(void *); /* Function to release it */
    void *(*realloc)(void *,size_t);/* Function to resize a block of
    memory */
    void *(*calloc)(size_t,size_t);
    } ContainerAllocator;

    You can replace it with your own allocator. For instance you can
    supply a GC (Boehm's for instance) allocator, or a custom one,
    as you wish.
     
    jacob navia, May 22, 2012
    #8
  9. jacob navia

    Ian Collins Guest

    On 05/22/12 06:05 PM, Juha Nieminen wrote:
    > Ian Collins<> wrote:
    >> Rather than fussing over a few bytes, how about a performance comparison?

    >
    > A performance comparison would be quite moot because the biggest bottleneck
    > in the whole thing is memory allocation. By far. A performance comparison
    > would only make sense if one version uses more memory allocations than the
    > other (for example, I have seen "generic" linked lists in C that actually
    > make two allocations per element rather than the one that std::list makes,
    > hence basically making the C version twice as slow as std::list).


    Hence my simple use of std::list, I just wanted some idea how the two
    compare (particularly with regard to the sort) before any optimisations
    are added.

    --
    Ian Collins
     
    Ian Collins, May 22, 2012
    #9
  10. jacob navia

    Ian Collins Guest

    On 05/22/12 06:45 PM, jacob navia wrote:
    > Le 22/05/12 08:05, Juha Nieminen a écrit :
    >> Ian Collins<> wrote:
    >>> Rather than fussing over a few bytes, how about a performance comparison?

    >>
    >> A performance comparison would be quite moot because the biggest bottleneck
    >> in the whole thing is memory allocation. By far.

    >
    > Yes. With a fast allocator, my software runs at the same
    > speed than C++. This due to bottleneck is in:


    Can you provide a simple, unoptimised example? I'm genuinely curios how
    the two compare both before and after allocator optimisation.

    --
    Ian Collins
     
    Ian Collins, May 22, 2012
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. RA Scheltema
    Replies:
    3
    Views:
    410
    RA Scheltema
    Jan 6, 2004
  2. Marijn
    Replies:
    5
    Views:
    479
    Marijn
    Feb 13, 2004
  3. Replies:
    0
    Views:
    2,247
  4. Amadeus W. M.
    Replies:
    2
    Views:
    406
    Amadeus W. M.
    Jul 4, 2006
  5. chhenning
    Replies:
    5
    Views:
    373
    chhenning
    Feb 13, 2008
Loading...

Share This Page