In need of a virtual filesystem / archive

Enigma Curry

I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

The easy solution would be to use a zip file or a tar file. Python has
good standard modules for accessing those types. However, I would tend
to think that modifying or deleting files in the archive would require
rewriting the entire archive.

Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Does anything like this exist? If nothing exists for Python, is there
something written in C maybe that I could wrap (preferably you won't
suggest wrapping the ext2 filesystem driver.. ;) ?
 
Paul Rubin

Enigma Curry said:
Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Does anything like this exist?

Yes, what you want is called a database. Try the bsddb module or
something with MySQL depending on your requirements.
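
As a rough illustration of the key-value database idea, here is a minimal sketch using the standard library's ``dbm`` module as a stand-in (an assumption: ``bsddb`` is no longer part of the standard library in Python 3, and whether the store ends up as exactly one file on disk depends on the backend ``dbm`` selects). The function name and the keys are hypothetical.

```python
import dbm
import os
import tempfile


def demo_dbm_archive(path):
    """Create, modify, and delete entries in a key-value store."""
    # "c" opens the store for reading and writing, creating it if missing.
    with dbm.open(path, "c") as db:
        db[b"notes/readme.txt"] = b"first draft\n"
        db[b"notes/todo.txt"] = b"buy milk\n"
        # Overwrite a single entry without touching the others.
        db[b"notes/readme.txt"] = b"second draft\n"
        # Delete a single entry.
        del db[b"notes/todo.txt"]
    # Reopen read-only and return what is left.
    with dbm.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}


# Hypothetical location for the store, inside a fresh temporary directory.
result = demo_dbm_archive(os.path.join(tempfile.mkdtemp(), "archive"))
```

Keys and values are bytes, so file names would need to be encoded and file contents read in binary mode.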
 
Steven D'Aprano

Enigma said:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

The easy solution would be to use a zip file or a tar file. Python has
good standard modules for accessing those types. However, I would tend
to think that modifying or deleting files in the archive would require
rewriting the entire archive.

Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.

Yes. I believe your common or garden variety file
manager can handle this task, by storing files in an
archive called "a directory". For example, many mail
systems use the "maildir" archive for storing email
while still being able to access it quickly and robustly.

Do you really need to store your files in a single
meta-file? Do you need compression? How much overhead
for the archive structure are you prepared to carry? Do
you expect the archive to shrink when you delete a file
from the middle?

I suspect you can pick any two of the following three:

1. single file
2. space used for deleted files is reclaimed
3. fast performance

Using a proper database will give you 2 and 3, but at
the cost of a lot of overhead, and typically a
relational database is not a single file.
 
bonono

Steven said:
I suspect you can pick any two of the following three:

1. single file
2. space used for deleted files is reclaimed
3. fast performance

Using a proper database will give you 2 and 3, but at
the cost of a lot of overhead, and typically a
relational database is not a single file.

SQLite can give you all three. It does have overhead, but whether that is
worth it depends on your own judgement of features, usage patterns, etc.
I think monotone uses it.
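
A minimal sketch of what that could look like with the standard ``sqlite3`` module, assuming one row per stored file. The table name ``files`` and the helper functions are made up for illustration, not taken from any existing tool.

```python
import sqlite3


def open_archive(path):
    """Open (or create) a single-file SQLite archive."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put_file(conn, name, data):
    """Add a file, or replace it if the name already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (name, data),
        )


def get_file(conn, name):
    """Return the stored bytes, or None if the name is unknown."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])


def delete_file(conn, name):
    """Remove one file; the other rows are left untouched."""
    with conn:
        conn.execute("DELETE FROM files WHERE name = ?", (name,))


def compact(conn):
    """Rebuild the database file so that freed pages are given back."""
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")


conn = open_archive(":memory:")  # use a real path for an on-disk archive
put_file(conn, "notes/readme.txt", b"first draft\n")
put_file(conn, "notes/readme.txt", b"second draft\n")
put_file(conn, "notes/todo.txt", b"buy milk\n")
delete_file(conn, "notes/todo.txt")
```

Space freed by a delete is reused by later inserts. The file on disk only shrinks after a ``VACUUM``, which rewrites the database, so that step is best run occasionally rather than after every delete.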
 
Rene Pijlman

Enigma Curry:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.

Use the file system. That's what it's for.
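
For comparison, a minimal sketch of the same create, modify, and delete operations on a plain directory, using ``pathlib``. The directory layout and function name are hypothetical.

```python
import tempfile
from pathlib import Path


def demo_directory_archive(root):
    """Treat a directory tree as the archive and list what remains."""
    root = Path(root)
    (root / "notes").mkdir(parents=True, exist_ok=True)
    readme = root / "notes" / "readme.txt"
    todo = root / "notes" / "todo.txt"

    # Create two files.
    readme.write_text("first draft\n", encoding="utf-8")
    todo.write_text("buy milk\n", encoding="utf-8")

    # Modify one file; nothing else in the tree is rewritten.
    with readme.open("a", encoding="utf-8") as f:
        f.write("second line\n")

    # Delete the other; the file system reclaims the space.
    todo.unlink()

    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    remaining = demo_directory_archive(tmp)
```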
 
Ivan Vilata i Balaguer

Enigma Curry wrote:
I need to store a large number of files in an archive. From Python, I
need to be able to create an archive, put files into it, modify files
that are already in it, and delete files already in it.
[...]
Is there any archive format that can allow Python to modify a file in
the archive *in place*? That is to say if my archive is 2GB large and I
have a small text file in the archive I want to be able to modify that
small text file (or delete it) without having to rewrite the entire
archive to disk.
[...]

Although it is not its main usage, PyTables_ can be used to store
ordinary files in a single HDF5_ file. HDF5 files have a hierarchical
structure of nodes and groups which maps quite well to files and
directories. You can create, read, modify, copy, move and remove nodes
at will, freed space is reclaimed, and HDF5 is very efficient no matter
how large the data is.

For working with the files, PyTables includes a FileNode_ module which
offers Python file semantics for nodes in an HDF5 file. You can also
keep nodes transparently compressed, or you may repack the whole HDF5
file to defragment it or (de)compress its nodes, which may make it a
reasonable alternative to a compressed archive.

I will be pleased to give more information. Hope that helps.

.. _PyTables: http://www.pytables.org/
.. _HDF5: http://hdf.ncsa.uiuc.edu/HDF5/
.. _FileNode: http://pytables.sourceforge.net/html-doc/usersguide6.html

import disclaimer

::

Ivan Vilata i Balaguer >qo< http://www.carabos.com/
Cárabos Coop. V. V V Enjoy Data
""


 
Enigma Curry

Thanks for all the suggestions!

I realized a few minutes after I posted that a database would work.. I
just wasn't in that "mode" of thinking when I posted.

PyTables also looks very interesting, especially because apparently I
can read a file in the archive like a normal Python file, i.e. one line
at a time.

Could I do the same using SQL? I'm assuming I would get the whole file
back when I did my SELECT statement. I guess I could chunk the file out
and store it in multiple rows, but that sounds complicated.
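
One possible answer for small text files, sketched with the standard ``sqlite3`` module as an assumed stand-in for "SQL": the SELECT does return the whole value at once, but wrapping it in ``io.StringIO`` gives the usual line-by-line iteration without chunking rows. The ``files`` table and the ``iter_lines`` helper are hypothetical.

```python
import io
import sqlite3


def iter_lines(conn, name):
    """Yield the lines of a stored text file, like iterating a file object."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    # The whole value is in memory here; StringIO only splits it into lines.
    yield from io.StringIO(row[0].decode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB NOT NULL)")
conn.execute(
    "INSERT INTO files (name, data) VALUES (?, ?)",
    ("log.txt", "alpha\nbeta\ngamma".encode("utf-8")),
)
conn.commit()

lines = list(iter_lines(conn, "log.txt"))
```

This keeps the file in memory for the duration of the read, so it suits small files; very large entries would still call for chunked storage or incremental blob access.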
 
