S
samwyse
I'm a relative newbie to Python, so please bear with me. There are
currently two standard modules used to access archived data: zipfile
and tarfile. The interfaces are completely different. In particular,
script wanting to analyze different types of archives must duplicate
substantial pieces of logic. The problem is not limited to method
names; it includes how stat-like information is accessed.
I think it would be a good thing if a standardized interface existed,
similar to PEP 247. This would make it easier for one script to access
multiple types of archives, such as RAR, 7-Zip, ISO, etc. In
particular, a single factory class could produce PEP 302 import hooks
for future as well as current archive formats.
I think that an archive module adhering to the standard should adopt a
least-common-denominator approach, initially supporting read-only access
without seek, i.e. tar files on actual tape. For applications that
require a seek method (such as importers) a standard wrapper class could
transparently cache archive members in temp files; this would fit in
well with Python 3000's rewrite of the I/O interface to support
stackable interfaces. To this end, we'd need is_seekable and
is_writable attributes for both the module and instances (moduel level
would declare if something is possible, not if it is always true).
Most importantly, all archive modules should provide a standard API for
accessing their individual files via a single archive_content class that
provides a standard 'read' method. Less importantly but nice to have
would be a way for archives to be auto-magically scanned during walks of
directories.
Feedback?
currently two standard modules used to access archived data: zipfile
and tarfile. The interfaces are completely different. In particular,
script wanting to analyze different types of archives must duplicate
substantial pieces of logic. The problem is not limited to method
names; it includes how stat-like information is accessed.
I think it would be a good thing if a standardized interface existed,
similar to PEP 247. This would make it easier for one script to access
multiple types of archives, such as RAR, 7-Zip, ISO, etc. In
particular, a single factory class could produce PEP 302 import hooks
for future as well as current archive formats.
I think that an archive module adhering to the standard should adopt a
least-common-denominator approach, initially supporting read-only access
without seek, i.e. tar files on actual tape. For applications that
require a seek method (such as importers) a standard wrapper class could
transparently cache archive members in temp files; this would fit in
well with Python 3000's rewrite of the I/O interface to support
stackable interfaces. To this end, we'd need is_seekable and
is_writable attributes for both the module and instances (moduel level
would declare if something is possible, not if it is always true).
Most importantly, all archive modules should provide a standard API for
accessing their individual files via a single archive_content class that
provides a standard 'read' method. Less importantly but nice to have
would be a way for archives to be auto-magically scanned during walks of
directories.
Feedback?