How to find out memory occupied by vector, set, or map?

Discussion in 'C++' started by Lambda, May 20, 2008.

  1. Lambda

    Lambda Guest

    I'm trying to develop several interesting components
    of a simple search engine follow some text books.

    A book introduces some ways to compress dictionary,
    allow it to stay on main memory.
    In the dictionary, is a list of millions of words
    with related data of several bytes.

    Word / Term Freq(int) / Pointer to disk file
    Hello 100

    One method is:
    concatenate all the words into one long contiguous string
    and an array of 4-byte character pointers is used to access.

    But I think STL map is a suitable data structure for this problem.
    I don't know the detailed internal implementation of map.
    I think it's implemented as a binary search tree.
    Every node occupy just enough memory. (I'm not sure)
    I also need the search feature of BST.

    So I'd like to know if it is a need to build the data structure
    myself.
    How can I know the whole memory occupied by a map,
    as well as other data structures such as vector, set.
    (Or memory occupied by a map element).

    BTW, should I use std::string or char[] to store the millions of
    words?
    How much is the string overhead?
     
    Lambda, May 20, 2008
    #1
    1. Advertising

  2. Lambda

    dizzy Guest

    Lambda wrote:

    > But I think STL map is a suitable data structure for this problem.
    > I don't know the detailed internal implementation of map.
    > I think it's implemented as a binary search tree.
    > Every node occupy just enough memory. (I'm not sure)
    > I also need the search feature of BST.


    Depends on implementation. g++ 4.1.2 comes with a red-black tree based
    implementation.

    > So I'd like to know if it is a need to build the data structure
    > myself.
    > How can I know the whole memory occupied by a map,
    > as well as other data structures such as vector, set.
    > (Or memory occupied by a map element).


    You could instantiate your used containers with your own allocator type that
    would keep counts on the allocated/deallocated memory (as the containers
    are mandated to use the allocator for all allocation needs) and call upon
    the std::allocator for the actual allocation/deallocation.

    Other than that there is no standard API to provide this information.

    You could also aproximate the used memory by assuming some overhead for each
    allocated node (for node based containers, like list, map, set, etc but NOT
    vector, deque) and also considering the size of the element (with sizeof()
    of course).

    > BTW, should I use std::string or char[] to store the millions of
    > words?


    If by char[] you mean an char[N] array type, you must know that such a type
    requires a compile time fixed N value. Which means that unless your words
    do not vary greatly in length you would be wasting alot of memory by having
    a big enough N to hold all possible words. And then again, if you need to
    allocate the words at runtime depending on some runtime input then you
    still need to perform dynamic allocation allocating static arrays somewhere
    in that dynamic allocated memory which kinda beats the purpose of static
    arrays. So just use std::string.

    > How much is the string overhead?


    What kind of overhead? Let's discuss a naive implementation (non reference
    counted) of std::string:
    - it probably has some optimization for empty strings and/or strings with
    short length (that is, when you create emtpy strings of strings of short
    length there isn't any dynamic memory allocation done by the string itself)
    - for all other cases, std::string will use the given allocator to allocate
    memory and put the string contents (we don't know if it does it continously
    or in chunks like std::deque usually does)
    - when you remove data from the string it will update internal counters and
    possibly copy bytes around but almost for sure it won't deallocate any
    memory
    - when you add data to the string (appending) it may allocate more memory to
    hold the new data if it doesn't have enough space yet

    So, from that perspective, what usecase will your std::string have? That is,
    will you create few std::string that constanlty get updated (in contents)?
    Or will you create many std::string that are basically constant? And when
    you compare if something has overhead you do it relative to another thing,
    if you try to imagine your own implementation of a std::string you won't
    come better than what your C++ library has anyway.

    Or otherwise put, make it work, profile, optimize, in that order!

    --
    Dizzy
     
    dizzy, May 20, 2008
    #2
    1. Advertising

  3. Lambda

    James Kanze Guest

    On May 20, 4:58 am, Lambda <> wrote:
    > I'm trying to develop several interesting components
    > of a simple search engine follow some text books.


    > A book introduces some ways to compress dictionary,
    > allow it to stay on main memory.
    > In the dictionary, is a list of millions of words
    > with related data of several bytes.


    > Word / Term Freq(int) / Pointer to disk file
    > Hello 100


    > One method is:
    > concatenate all the words into one long contiguous string
    > and an array of 4-byte character pointers is used to access.


    A classical string pool. It's not a solution to your problem,
    just a very special way to represent strings.

    > But I think STL map is a suitable data structure for this
    > problem. I don't know the detailed internal implementation of
    > map. I think it's implemented as a binary search tree. Every
    > node occupy just enough memory. (I'm not sure) I also need the
    > search feature of BST.


    Every node is dynamically allocated, and contains several
    pointers in addition to the raw data, so it can be fairly
    expensive in terms of memory use. On the other hand, for
    something small, like a couple of million elements, why not.

    Note that access time is O(lg n), and not constant. For more
    than a couple of thousand elements, a hash table will be faster
    (provided you use a good hash function).

    > So I'd like to know if it is a need to build the data
    > structure myself. How can I know the whole memory occupied by
    > a map, as well as other data structures such as vector, set.
    > (Or memory occupied by a map element).


    You can't really, since there are so many hidden overheads. I'd
    just go ahead and implement using std::map, but carefully
    encapsulating it so that I could replace it if it did cause a
    problem later.

    > BTW, should I use std::string or char[] to store the millions
    > of words?


    std::string, definitely.

    > How much is the string overhead?


    Again, it depends on the implementation. A lot of
    implementations today use a small string optimization which
    avoids dynamic allocation for strings shorter than some cut-off
    size; if your words are English, there's a good chance that most
    of them will fall under this limit.

    All sorts of optimizations are possible, once you have the
    application working, and can see if they're necessary.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 20, 2008
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    2
    Views:
    381
    Mark Space
    Jul 7, 2006
  2. Replies:
    8
    Views:
    1,930
    Csaba
    Feb 18, 2006
  3. Ravi

    Show all memory occupied by the program

    Ravi, Jun 3, 2007, in forum: C Programming
    Replies:
    31
    Views:
    807
    Keith Thompson
    Jun 6, 2007
  4. Replies:
    5
    Views:
    365
    Jerry Coffin
    Jun 15, 2008
  5. swapnalimore
    Replies:
    0
    Views:
    698
    swapnalimore
    Jul 12, 2011
Loading...

Share This Page