Tree in C++

Discussion in 'C++' started by Ian, Aug 13, 2004.

  1. Ian

    Ian Guest

    Casper wrote:
    > In scanning a drive and comparing file content, I need to remember
    > filepaths to every single file I encounter so I can take actions on
    > certain files later on.
    >
    > Rather than having a huge list enumerating the complete filepath to
    > every file it seems the smarter way (faster, more memmory efficient) is
    > to model the filesystem treestructure in a abstract tree - and having
    > only the filenames & node pointer in an Array.
    >
    > struct tree {
    > struct tree *parent;
    > struct tree *child;
    > CString filename;
    > };
    >

    What's a CString?

    > struct file {
    > CString name;
    > struct tree *path;
    > };
    >
    > Does this sound like a good idea? Specifically, can I utilize an
    > existing data structure? What about stack/heap issues and tradeoffs?
    >
    > Not being awfully experienced with trees and pointers, why won't this run:
    >
    > struct tree *root;
    > root = ( struct tree * ) malloc ( sizeof( struct tree ) );
    > root->filename = "Test";
    >

    Why use malloc?

    tree* root = new tree;

    would be better.

    Ian
     
    Ian, Aug 13, 2004
    #1
    1. Advertising

  2. Ian

    David Hilsee Guest

    "Casper" <> wrote in message
    news:eAVSc.14460$...
    > In scanning a drive and comparing file content, I need to remember
    > filepaths to every single file I encounter so I can take actions on
    > certain files later on.
    >
    > Rather than having a huge list enumerating the complete filepath to
    > every file it seems the smarter way (faster, more memmory efficient) is
    > to model the filesystem treestructure in a abstract tree - and having
    > only the filenames & node pointer in an Array.
    >
    > struct tree {
    > struct tree *parent;
    > struct tree *child;
    > CString filename;
    > };
    >
    > struct file {
    > CString name;
    > struct tree *path;
    > };


    This doesn't make much sense to me. Perhaps you meant something like this:

    struct dir_or_file {
    // Note: "struct" not necessary for C++
    dir_or_file *parent;
    int numChildren;
    // pointer to first element of array of size numChildren
    dir_or_file *child;
    CString name;

    // maybe some flag here indicating whether or not it is a file or a
    directory
    };

    > Does this sound like a good idea? Specifically, can I utilize an
    > existing data structure? What about stack/heap issues and tradeoffs?
    >
    > Not being awfully experienced with trees and pointers, why won't this run:
    >
    > struct tree *root;
    > root = ( struct tree * ) malloc ( sizeof( struct tree ) );
    > root->filename = "Test";


    Your mistake is that you used malloc. The object to which "root" points did
    not have its constructor executed before you attempted to assign the value
    "Test" to one of its members. It it important to execute constructors
    before attempting to use the object, so you should have instead used "new":

    root = new tree();

    This will cause the (compiler-generated) constructor for the object to be
    executed. Now, the object will be properly initialized and the assignment
    should not cause any problems.

    --
    David Hilsee
     
    David Hilsee, Aug 13, 2004
    #2
    1. Advertising

  3. Ian

    Casper Guest

    In scanning a drive and comparing file content, I need to remember
    filepaths to every single file I encounter so I can take actions on
    certain files later on.

    Rather than having a huge list enumerating the complete filepath to
    every file it seems the smarter way (faster, more memmory efficient) is
    to model the filesystem treestructure in a abstract tree - and having
    only the filenames & node pointer in an Array.

    struct tree {
    struct tree *parent;
    struct tree *child;
    CString filename;
    };

    struct file {
    CString name;
    struct tree *path;
    };

    Does this sound like a good idea? Specifically, can I utilize an
    existing data structure? What about stack/heap issues and tradeoffs?

    Not being awfully experienced with trees and pointers, why won't this run:

    struct tree *root;
    root = ( struct tree * ) malloc ( sizeof( struct tree ) );
    root->filename = "Test";

    ---
    Unhandled exception at 0x0041b07b in DubWiz.exe: 0xC0000005: Access
    violation reading location 0xcdcdcdc1.
    In "atlsimpstr.h", GetLength() method (VC++ 7.0 compiler using MFC/ATL)
    ---

    Thanks in advance,
    Casper
     
    Casper, Aug 13, 2004
    #3
  4. Ian

    Casper Guest

    David Hilsee wrote:
    > This doesn't make much sense to me. Perhaps you meant something like this:
    >
    > struct dir_or_file {
    > // Note: "struct" not necessary for C++
    > dir_or_file *parent;
    > int numChildren;
    > // pointer to first element of array of size numChildren
    > dir_or_file *child;
    > CString name;
    >
    > // maybe some flag here indicating whether or not it is a file or a
    > directory
    > };


    Actually that would mix path and filename which is not optimum for my
    problem - but I can understand why my proposed structures seem weird to
    the uninvited.
    Basically what I do is to collect ALL filenames and its size in a list.
    The list uses insertion sort so that after scanning, I will be able to
    determine which files are *exactly* the same size. When I have similar
    sized files I do a binary compare (or CRC not sure yet) to verify they
    are identical. When dublets are found I will delete all but one copy and
    create hard links for the remaining copies. But to be able to delete
    (and know where to point the hard link at) I need the path part of this
    list item - and that's where the tree comes in - the list item holds a
    pointer to a node in my directory tree.

    Its all part pf a space saver experiment of mine.

    > Your mistake is that you used malloc. The object to which "root" points did
    > not have its constructor executed before you attempted to assign the value
    > "Test" to one of its members. It it important to execute constructors
    > before attempting to use the object, so you should have instead used "new":
    >
    > root = new tree();
    >
    > This will cause the (compiler-generated) constructor for the object to be
    > executed. Now, the object will be properly initialized and the assignment
    > should not cause any problems.


    Indeed that did the trick. I will continue my experiment then, thanks
    for the feedback! :)

    Casper
     
    Casper, Aug 13, 2004
    #4
  5. "Casper" <> wrote in message
    news:eek:0XSc.18846$...
    > David Hilsee wrote:
    > > This doesn't make much sense to me. Perhaps you meant something like

    this:
    > >
    > > struct dir_or_file {
    > > // Note: "struct" not necessary for C++
    > > dir_or_file *parent;
    > > int numChildren;
    > > // pointer to first element of array of size numChildren
    > > dir_or_file *child;
    > > CString name;
    > >
    > > // maybe some flag here indicating whether or not it is a file or a
    > > directory
    > > };

    >
    > Actually that would mix path and filename which is not optimum for my
    > problem - but I can understand why my proposed structures seem weird to
    > the uninvited.
    > Basically what I do is to collect ALL filenames and its size in a list.
    > The list uses insertion sort so that after scanning, I will be able to
    > determine which files are *exactly* the same size. When I have similar
    > sized files I do a binary compare (or CRC not sure yet) to verify they
    > are identical. When dublets are found I will delete all but one copy and
    > create hard links for the remaining copies. But to be able to delete
    > (and know where to point the hard link at) I need the path part of this
    > list item - and that's where the tree comes in - the list item holds a
    > pointer to a node in my directory tree.
    >
    > Its all part pf a space saver experiment of mine.
    >
    > > Your mistake is that you used malloc. The object to which "root" points

    did
    > > not have its constructor executed before you attempted to assign the

    value
    > > "Test" to one of its members. It it important to execute constructors
    > > before attempting to use the object, so you should have instead used

    "new":
    > >
    > > root = new tree();
    > >
    > > This will cause the (compiler-generated) constructor for the object to

    be
    > > executed. Now, the object will be properly initialized and the

    assignment
    > > should not cause any problems.

    >
    > Indeed that did the trick. I will continue my experiment then, thanks
    > for the feedback! :)
    >
    > Casper
    >


    Why not store the file size information in the tree as well, and sort by
    that? The performance should be much better than either a linked list or an
    array (insertion sort); the tree should be reasonably well balanced.
     
    Bill Thompson, Aug 13, 2004
    #5
  6. Ian

    David Fisher Guest

    "Bill Thompson" wrote:

    >> Basically what I do is to collect ALL filenames and its size in a list.
    >> The list uses insertion sort so that after scanning, I will be able to
    >> determine which files are *exactly* the same size. When I have similar
    >> sized files I do a binary compare (or CRC not sure yet) to verify they
    >> are identical. When dublets are found I will delete all but one copy and
    >> create hard links for the remaining copies. But to be able to delete
    >> (and know where to point the hard link at) I need the path part of this
    >> list item - and that's where the tree comes in - the list item holds a
    >> pointer to a node in my directory tree.

    >
    > Why not store the file size information in the tree as well, and sort by
    > that? The performance should be much better than either a linked list or

    an
    > array (insertion sort); the tree should be reasonably well balanced.


    Sounds like a job for a priority queue keyed on the file size ...

    No need for post-sorting afterwards (and the hard link could even be created
    during the creation of the priority queue, whenever a duplicate key is
    found). Needs to handle multiple entries with the same file size, though -
    maybe a priority queue of linked lists ? Pointing into another (tree)
    structure which stores the filenames in a memory effecient way ?

    http://www.dinkumware.com/manuals/reader.aspx?lib=cpl&h=queue.html#priority_queue

    David Fisher
    HSA Systems
     
    David Fisher, Aug 13, 2004
    #6
  7. Ian

    Casper Guest


    > Why not store the file size information in the tree as well, and sort by
    > that? The performance should be much better than either a linked list or an
    > array (insertion sort); the tree should be reasonably well balanced.
    >

    Sounds very very interesting! I am not 100% sure I understand how a
    fixed/immutable nested tree structure can be sorted by a node data
    member?! Would it work as a secondard access path to the structure? I
    would love to see a prototype example of the struct you have in mind!

    Sincerely,
    Casper
     
    Casper, Aug 13, 2004
    #7
  8. "Casper" <> wrote in message
    news:9a_Sc.25209$...
    >
    > > Why not store the file size information in the tree as well, and sort by
    > > that? The performance should be much better than either a linked list

    or an
    > > array (insertion sort); the tree should be reasonably well balanced.
    > >

    > Sounds very very interesting! I am not 100% sure I understand how a
    > fixed/immutable nested tree structure can be sorted by a node data
    > member?! Would it work as a secondard access path to the structure? I
    > would love to see a prototype example of the struct you have in mind!
    >
    > Sincerely,
    > Casper
    >


    My apologies for the poorly worded statement: when building the tree, use
    the filesize as the sort field.

    You can find C++ code for simple binary trees very easily on the internet.
    Without looking too hard I found this link:
    http://www.cs.bu.edu/teaching/cs112/spring-2000/binary-tree/

    The code would have to be modified to deal with duplicate file sizes.

    It appears that you aren't particularly interested in sorting by the
    filename. After building the tree, you can traverse it in order by file
    size; as you traverse you can pull out sets of files with the same size and
    deal with them as you see fit.

    Another possibility is finding and using a database api. Or, easier still,
    writing a program to translate the file system into a file that can be
    imported into a database, processing the data, export a file back out, and
    write another program to deal with the exported data.

    As you've indicated, you could also build two trees, one sorted by name, the
    other by size; all refering to the same bank of file data.
     
    Bill Thompson, Aug 13, 2004
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ramkumar Menon

    B+ Tree versus Ternary Search Tree

    Ramkumar Menon, Aug 16, 2005, in forum: Java
    Replies:
    2
    Views:
    1,650
    Roedy Green
    Aug 16, 2005
  2. Ramkumar Menon

    B+ Tree versus Ternary Search Tree

    Ramkumar Menon, Aug 16, 2005, in forum: Java
    Replies:
    0
    Views:
    479
    Ramkumar Menon
    Aug 16, 2005
  3. Ramkumar Menon

    B+ Tree versus Ternary Search Tree

    Ramkumar Menon, Aug 16, 2005, in forum: Java
    Replies:
    1
    Views:
    481
    Andrew Thompson
    Aug 16, 2005
  4. Joris Gillis
    Replies:
    2
    Views:
    1,568
    Joris Gillis
    Jun 16, 2006
  5. Stub

    B tree, B+ tree and B* tree

    Stub, Nov 12, 2003, in forum: C Programming
    Replies:
    3
    Views:
    10,248
Loading...

Share This Page