Tree in C++

I

Ian

Casper said:
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};
What's a CString?
struct file {
CString name;
struct tree *path;
};

Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";
Why use malloc?

tree* root = new tree;

would be better.

Ian
 
D

David Hilsee

Casper said:
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};

struct file {
CString name;
struct tree *path;
};

This doesn't make much sense to me. Perhaps you meant something like this:

struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};
Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";

Your mistake is that you used malloc. The object to which "root" points did
not have its constructor executed before you attempted to assign the value
"Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":

root = new tree();

This will cause the (compiler-generated) constructor for the object to be
executed. Now, the object will be properly initialized and the assignment
should not cause any problems.
 
C

Casper

In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};

struct file {
CString name;
struct tree *path;
};

Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";
 
C

Casper

David said:
This doesn't make much sense to me. Perhaps you meant something like this:

struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};

Actually that would mix path and filename which is not optimum for my
problem - but I can understand why my proposed structures seem weird to
the uninvited.
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.

Its all part pf a space saver experiment of mine.
Your mistake is that you used malloc. The object to which "root" points did
not have its constructor executed before you attempted to assign the value
"Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":

root = new tree();

This will cause the (compiler-generated) constructor for the object to be
executed. Now, the object will be properly initialized and the assignment
should not cause any problems.

Indeed that did the trick. I will continue my experiment then, thanks
for the feedback! :)

Casper
 
B

Bill Thompson

Casper said:
Actually that would mix path and filename which is not optimum for my
problem - but I can understand why my proposed structures seem weird to
the uninvited.
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.

Its all part pf a space saver experiment of mine.


Indeed that did the trick. I will continue my experiment then, thanks
for the feedback! :)

Casper

Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.
 
D

David Fisher

Bill Thompson said:
Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.

Sounds like a job for a priority queue keyed on the file size ...

No need for post-sorting afterwards (and the hard link could even be created
during the creation of the priority queue, whenever a duplicate key is
found). Needs to handle multiple entries with the same file size, though -
maybe a priority queue of linked lists ? Pointing into another (tree)
structure which stores the filenames in a memory effecient way ?

http://www.dinkumware.com/manuals/reader.aspx?lib=cpl&h=queue.html#priority_queue

David Fisher
HSA Systems
 
C

Casper

Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.
Sounds very very interesting! I am not 100% sure I understand how a
fixed/immutable nested tree structure can be sorted by a node data
member?! Would it work as a secondard access path to the structure? I
would love to see a prototype example of the struct you have in mind!

Sincerely,
Casper
 
B

Bill Thompson

Casper said:
Sounds very very interesting! I am not 100% sure I understand how a
fixed/immutable nested tree structure can be sorted by a node data
member?! Would it work as a secondard access path to the structure? I
would love to see a prototype example of the struct you have in mind!

Sincerely,
Casper

My apologies for the poorly worded statement: when building the tree, use
the filesize as the sort field.

You can find C++ code for simple binary trees very easily on the internet.
Without looking too hard I found this link:
http://www.cs.bu.edu/teaching/cs112/spring-2000/binary-tree/

The code would have to be modified to deal with duplicate file sizes.

It appears that you aren't particularly interested in sorting by the
filename. After building the tree, you can traverse it in order by file
size; as you traverse you can pull out sets of files with the same size and
deal with them as you see fit.

Another possibility is finding and using a database api. Or, easier still,
writing a program to translate the file system into a file that can be
imported into a database, processing the data, export a file back out, and
write another program to deal with the exported data.

As you've indicated, you could also build two trees, one sorted by name, the
other by size; all refering to the same bank of file data.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top