Re: Linked list takes 10X more memory than expected. Why?

Discussion in 'C++' started by Alf P. Steinbach, Jan 23, 2004.

  1. On Fri, 23 Jan 2004 17:39:30 +0000 (UTC), "Chocawok" <> wrote:

    >this is my first post here so please forgive any faux pas's such as too long
    >a post.


    The main problem with the posting is that you're posting a binary attachement
    in a non-binary group. Don't do that, please. It's not just an annoyance.



    >I have a problem which I am totally baffled by.
    >
    >I have written a program which reads in a text file of words, which looks
    >like this:
    >ABLE
    >ARDVARK
    >.
    >BALLOON
    >.
    >ZYGOTE
    >
    >
    >It converts this into a doubly linked list, with each character of the word
    >as node.


    Not a good idea with 1 character per node.

    You might check out the "fly-weight" pattern if you need to treat
    individual characters as objects.

    Otherwise consider using std::vector<std::string> as the data structure.



    >I then search this structure using a depth first search.
    >
    >The problem is that this structure is taking a HUGE memory foot print.
    >I have tried setting up the structure using 2 different input files. Both
    >use OVER TEN TIMES more memory than I expect.


    So your expectation is incorrect, and the question is why it's incorrect.



    >To try and trouble shoot I have added to my code a "count_node" variable
    >which is increased by 1 every time a node is added.


    Good idea -- hard data is always a prerequisite.


    >Here are the details:
    >
    >First file, 1.5Million characters (170,000 words) and comes out about
    >373,683 nodes.


    Which characters are you _not_ storing in nodes?



    >Second file I have is 3.5Million characters (215,000 words) 1,833,930 nodes
    >are created for this.


    Ditto.



    >Now, my node structure is this:
    >
    >-----------------------------------------------------------
    >
    >struct Node
    >
    >{
    >Node* next;
    >Node* down;
    >char letter;
    >bool terminal;
    >
    >//constructor
    >Node()
    >{
    >down = 0;
    >next = 0;
    >letter = 0;
    >terminal = false; }
    >
    >//constructor
    >Node(char CurrentLetter)
    >{
    >down = 0;
    >next = 0;
    >letter = CurrentLetter;
    >terminal = false;
    >}
    >};
    >
    >------------------------------------------------------------
    >
    >
    >As you can see, the node struct should be about 4 bytes per node.


    That's the absolute minimum size of a node when disregarding size
    requirements from C.

    No current computer will give you that minimum size.

    On a 32-bit system reasonable sizes might range from 10 to 16 bytes,
    in a 64-bit system from 18 to 32 bytes.



    >For the first file 373,683 X 4 = 1.5MegaBytes
    >For the second file 1,833,930 X 4 = 7.3 MegaBytes.
    >
    >Even if the pointers in the struct are long (pointers are not a strong point
    >for me) then these figures should only be about 2.5 and 10 MB.
    >
    >AND YET, after reading in the first file the foot print is 21MB.
    >If I use my other larger file its 100 megabytes of memory used!!!!


    Where and how exactly did you obtain these measurements?



    >Thats A LOT of memory.


    Nope.



    >I have run the file without a dictionary (actually there was a file but it
    >was empty) and the foot print of the standalone app is 0.5 MB.


    But THAT is a lot of memory, if it's true.



    >Has anyone got any ideas on why so much memory is being used?


    Regarding the app size it's probably a linker issue.
     
    Alf P. Steinbach, Jan 23, 2004
    #1
    1. Advertising

  2. Alf P. Steinbach

    Chocawok Guest

    ----- Original Message -----
    From: "Alf P. Steinbach" <>
    > >It converts this into a doubly linked list, with each character of the

    word
    > >as node.

    >
    > Not a good idea with 1 character per node.
    >
    > You might check out the "fly-weight" pattern if you need to treat
    > individual characters as objects.


    I checked this out. Interesting idea that I think I can use elsewhere.
    However, I don't beleive it is suitable here, as each object has to have a
    reference to the "flyweight". This will take up as much (if not more) memory
    than the "char".

    > Which characters are you _not_ storing in nodes?


    Basically, consider the two words DOG and DOT. In my structure, The
    characters that would be stored are 'D','O','G','T', because the first two
    characters are "shared".


    > >As you can see, the node struct should be about 4 bytes per node.

    >
    > That's the absolute minimum size of a node when disregarding size
    > requirements from C.
    >
    > No current computer will give you that minimum size.
    >
    > On a 32-bit system reasonable sizes might range from 10 to 16 bytes,
    > in a 64-bit system from 18 to 32 bytes.


    by using sizeof I found that it is 12 bytes.

    > >AND YET, after reading in the first file the foot print is 21MB.
    > >If I use my other larger file its 100 megabytes of memory used!!!!

    >
    > Where and how exactly did you obtain these measurements?


    I used Task Manager (I'm running WinXP).

    Choca
     
    Chocawok, Jan 25, 2004
    #2
    1. Advertising

  3. On Sun, 25 Jan 2004 17:16:02 +0000 (UTC), "Chocawok" <> wrote:

    >> You might check out the "fly-weight" pattern if you need to treat
    >> individual characters as objects.

    >
    >I checked this out. Interesting idea that I think I can use elsewhere.
    >However, I don't beleive it is suitable here, as each object has to have a
    >reference to the "flyweight". This will take up as much (if not more) memory
    >than the "char".


    In the flyweight pattern the fine-grained data does not contain pointers
    to objects. Instead, you bring a suitable object, from a pool, to the
    data, so to speak. In your case, the character data would be what most
    descriptions of this pattern refer to as "extrinsic state" (and in fact, one
    major application of the flyweight pattern is to handle text).


    >> Which characters are you _not_ storing in nodes?

    >
    >Basically, consider the two words DOG and DOT. In my structure, The
    >characters that would be stored are 'D','O','G','T', because the first two
    >characters are "shared".


    Perhaps it would use less memory and also be much easier (less error-prone)
    to simply store the strings in order in a character array. A separate array
    could then contain pointers into the character array.
     
    Alf P. Steinbach, Jan 25, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Karl Heinz Buchegger
    Replies:
    2
    Views:
    408
    Nils Petter Vaskinn
    Jan 26, 2004
  2. Mr. SweatyFinger
    Replies:
    2
    Views:
    2,124
    Smokey Grindel
    Dec 2, 2006
  3. MrsEntity
    Replies:
    20
    Views:
    499
  4. Steven D'Aprano
    Replies:
    0
    Views:
    130
    Steven D'Aprano
    Dec 23, 2013
  5. Replies:
    3
    Views:
    105
    Gary Herron
    Dec 23, 2013
Loading...

Share This Page