Representing a hierarchical file system

J

jacob navia

Hi

Within the context of the containers library I would like to be able to
represent a file system, i.e. directories and files within a tree
structure.

Has this problem been solved in a public available way?

Are there any known implementations?

For instance:

typedef struct tagFile{
char *Name;
size_t size;
FileTime last_save;
} file;

typedef struct tagDirectory {
char *Name;
size_t NumberOfChildren;
struct tagDirectory **Subdirs;
size_t NumberOfFiles;
file **Files;
}Directory;

This is of course incomplete. Are there any available
implementations of such a data structure?
 
I

Ian Collins

Hi

Within the context of the containers library I would like to be able to
represent a file system, i.e. directories and files within a tree
structure.

Has this problem been solved in a public available way?

I'm not aware of any portable C solutions, but C++ has boost.filesystem
which may be of interest.
 
K

Keith Thompson

jacob navia said:
Within the context of the containers library I would like to be able to
represent a file system, i.e. directories and files within a tree
structure.
[...]

It's not clear that all filesystems can be straightforwardly
represented as "directories and files within a tree structure".
Take a look at OpenVMS, for example -- or Windows, for that matter.

But I suppose you could artificially impose a single-tree structure.

A number of languages provide something like this. I seem to recall
that there's a Lisp dialect (Scheme?) that has a pretty good portable
filesystem representation.
 
N

Nobody

Within the context of the containers library I would like to be able to
represent a file system, i.e. directories and files within a tree
structure.
For instance:

typedef struct tagFile{
char *Name;
size_t size;
FileTime last_save;
} file;

typedef struct tagDirectory {
char *Name;
size_t NumberOfChildren;
struct tagDirectory **Subdirs;
size_t NumberOfFiles;
file **Files;
}Directory;

This is quite different to typical filesystems.

E.g. on Unix, you have the inode, which contains the size,
owner, permissions, modified/accessed times, etc, plus the list of blocks
containing the data. Note that the inode /doesn't/ contain the name. Every
file, directory, symlink and device has (or, really, is) an inode. The
stat() call basically returns a copy of the inode, minus information which
is "internal" (e.g. the list of blocks).

Then you have directories, where the data consists of a mapping from names
to inode numbers. On early systems, a directory was simply a list of
<name,inode#> pairs. Modern systems use e.g. a btree to facilitate random
access (looking up a specific name in a directory is far more common than
enumerating the contents of a directory).

DOS' FAT filesystem places everything into the directory, so rather than
just a name and an inode number, the entire "inode" is included as part of
the directory entry. The main drawback compared to Unix is you can't have
hard links (multiple directory entries referring to the same inode
number). Also, on VFAT and NTFS, filenames are Unicode wide strings, not
byte strings. The inefficiencies of FAT are circumvented by building more
efficient representations in memory.

Assuming that you're not trying to implement a real filesystem, I'd
suggest replacing the file and directory structures with a common
"node" structure which includes a union member for the
file/directory/other parts. For a directory, the data would be a map
(btree, hashtable, etc) with names as keys and either nodes or references
(pointers, indices) to nodes. Separate maps for files and subdirectories
may or may not be a good idea.

If you are trying to implement a real filesystem, you need to figure out
how to make modifications efficiently (i.e. not contiguous blocks which
need to be realloc()d whenever something changes).
 
J

jacob navia

Le 15/08/11 12:36, Ian Collins a écrit :
I'm not aware of any portable C solutions, but C++ has boost.filesystem
which may be of interest.
Thanks, that is very useful
 
J

jacob navia

No, I do not want to implement a file system.

I want some portable representation of a file system structure in memory
 
J

jacob navia

Le 15/08/11 19:35, China Blue Tip Wrench a écrit :
Unix doesn't distinguish files and subdirectories in a directory. To get that
you need an additional call for each entry (stat). If you're going to stat, you
might want to save the information to avoid redundant stat calls. Also the
directory graph can be cyclic when you follow soft links.

I would rather ignore links, and avoid anything Unix specific (or DOS
specific)

Just a simple hierarchical representation with some attibutes like size
and time

I would like it to be as portable as possible
 
H

Heinrich Wolf

....
I would rather ignore links, and avoid anything Unix specific (or DOS
specific)

Just a simple hierarchical representation with some attibutes like size
and time

I would like it to be as portable as possible

Be careful!
NTFS also has hard links like Unix, not only *.lnk files
 
I

Ian Collins

Le 15/08/11 19:35, China Blue Tip Wrench a écrit :

I would rather ignore links, and avoid anything Unix specific (or DOS
specific)

Just a simple hierarchical representation with some attibutes like size
and time

I would like it to be as portable as possible

Your first and last statements appear contradictory; you can't ignore
filesystem features and by portable!

I have an application that maintains a directory tree in memory (to
enable us to search and index 30 million files). I use is a file object
and a directory object which is a file with an array of file pointers.
Following the POSIX model, all meta-data is in the file object.

Because it doesn't care what a directory entry is, the POSIX interface
is probably the most portable you will find. Unix/Linux systems can
access DOS and windows filesystems though the standard interface.
 
N

Nobody

I would rather ignore links, and avoid anything Unix specific (or DOS
specific)

Just a simple hierarchical representation with some attibutes like size
and time

I would like it to be as portable as possible

These are self-contradictory.

If you /really/ ignore links then, on Unix, your filesystem will be empty;
a Unix filesystem is nothing but links. A common Unix noob mistake is
asking how to identify or distinguish a hard link (i.e. something created
by link()) from a "real" file. This is a meaningless question; a hard link
is just a directory entry; the links created by "ln" are no different to
those created when the file itself was originally created with open() or
creat().

You can only create a "hierarchical representation" of a Unix filesystem
by losing the distinction between a (hard) link and the file to which it
refers, so you can't distinguish between N links to a single file and N
distinct files. On the bright side, modern Unices don't normally allow
creating hard links to directories, so you can't have cycles (assuming
that you ignore the "." and ".." links).

If you just discard symbolic links, many filenames will simply be invalid
according to your representation, which is likely to impair its utility.
OTOH, if you treat symbolic links as synonymous with the object to which
they refer, not only will your representation not be hierarchical, it will
probably be cyclic (if you actually manage to construct it without going
into infinite recursion).

[Actually, you can end up going into infinite recursion anyhow, if you try
to traverse a directory tree without explicitly checking for loops, as
directories can be renamed (i.e. moved) at any time.]

If you want portability, you really have to support the key features of
all of the target systems (i.e. the common superset, not the common
subset). So e.g. if you want to support Unix, the data model has to
support the concepts of hard and symbolic links. On Windows, these will
still exist as concepts, but you just won't encounter any in practice.
 
B

BGB

I would rather ignore links, and avoid anything Unix specific (or DOS
specific)

Just a simple hierarchical representation with some attibutes like size
and time

I would like it to be as portable as possible

These are self-contradictory.

If you /really/ ignore links then, on Unix, your filesystem will be empty;
a Unix filesystem is nothing but links. A common Unix noob mistake is
asking how to identify or distinguish a hard link (i.e. something created
by link()) from a "real" file. This is a meaningless question; a hard link
is just a directory entry; the links created by "ln" are no different to
those created when the file itself was originally created with open() or
creat().

You can only create a "hierarchical representation" of a Unix filesystem
by losing the distinction between a (hard) link and the file to which it
refers, so you can't distinguish between N links to a single file and N
distinct files. On the bright side, modern Unices don't normally allow
creating hard links to directories, so you can't have cycles (assuming
that you ignore the "." and ".." links).

If you just discard symbolic links, many filenames will simply be invalid
according to your representation, which is likely to impair its utility.
OTOH, if you treat symbolic links as synonymous with the object to which
they refer, not only will your representation not be hierarchical, it will
probably be cyclic (if you actually manage to construct it without going
into infinite recursion).

[Actually, you can end up going into infinite recursion anyhow, if you try
to traverse a directory tree without explicitly checking for loops, as
directories can be renamed (i.e. moved) at any time.]

If you want portability, you really have to support the key features of
all of the target systems (i.e. the common superset, not the common
subset). So e.g. if you want to support Unix, the data model has to
support the concepts of hard and symbolic links. On Windows, these will
still exist as concepts, but you just won't encounter any in practice.

note: on newer Windows one may actually encounter links, as Windows has
added them and does actually use them for some things.


how I would probably do things:

typedef struct DirEntry_s DirEntry;
typedef struct DirHead_s DirHead;

struct DirEntry_s
{
char *name;
DirHead *dir; //holds something if a directory
int64_t size; //file size
....
};

struct DirHead_s
{
DirEntry **ents; //directory entries
size_t n_files; //num files in directory
size_t m_files; //current max size of ents list
....
};

or, alternatively, use a linked list:

struct DirEntry_s
{
DirEntry *next;
char *name;
DirHead *dir; //holds something if a directory, NULL for file
int64_t size; //file size
....
};

struct DirHead_s
{
DirEntry *first; //directory entries
....
};


note that implementing a trivial file-system is not necessarily all that
much different (although, more common is to have space reserved for the
filename in-struct and use an endianess-agnostic means of representing
values, typically so that the dir-ents can map 1:1 with their on-disk
format, ... but all this is technically optional, and a more filesystem
agnostic means may have merits as well).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top