Python 3000: Standard API for archives?

Discussion in 'Python' started by samwyse, Jun 4, 2007.

  1. samwyse

    samwyse Guest

    I'm a relative newbie to Python, so please bear with me. There are
    currently two standard modules used to access archived data: zipfile
    and tarfile. Their interfaces are completely different. In particular,
    a script that wants to analyze different types of archives must
    duplicate substantial pieces of logic. The problem is not limited to
    method names; it includes how stat-like information is accessed.
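
    To illustrate the duplication, here is a minimal sketch that normalizes
    member metadata from both modules. MemberStat and iter_stats are
    hypothetical names made up for this example; only the zipfile and
    tarfile calls are real standard-library API.

```python
import tarfile
import zipfile
from collections import namedtuple
from datetime import datetime

# Hypothetical common record; not part of the standard library.
MemberStat = namedtuple("MemberStat", "name size mtime")


def iter_stats(path):
    """Yield a MemberStat for every regular file in a tar or zip archive."""
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            # TarInfo exposes .name, .size and .mtime (seconds since the epoch).
            for info in tf:
                if info.isfile():
                    yield MemberStat(info.name, info.size,
                                     datetime.fromtimestamp(info.mtime))
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # ZipInfo exposes .filename, .file_size and .date_time (a 6-tuple).
            for info in zf.infolist():
                if not info.is_dir():
                    yield MemberStat(info.filename, info.file_size,
                                     datetime(*info.date_time))
    else:
        raise ValueError(f"not a tar or zip archive: {path}")
```

    Even for three fields, every attribute name differs between the two
    modules, and the timestamp needs a different conversion in each branch.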

    I think it would be a good thing if a standardized interface existed,
    similar to PEP 247. This would make it easier for one script to access
    multiple types of archives, such as RAR, 7-Zip, ISO, etc. In
    particular, a single factory class could produce PEP 302 import hooks
    for future as well as current archive formats.

    I think that an archive module adhering to the standard should adopt a
    least-common-denominator approach, initially supporting read-only access
    without seek, i.e. tar files on actual tape. For applications that
    require a seek method (such as importers) a standard wrapper class could
    transparently cache archive members in temp files; this would fit in
    well with Python 3000's rewrite of the I/O interface to support
    stackable interfaces. To this end, we'd need is_seekable and
    is_writable attributes for both the module and instances (the module
    level would declare whether something is possible, not whether it is
    always true).
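
    A rough sketch of the caching wrapper idea, assuming the stream follows
    the seekable() convention that Python 3's io module ended up with. The
    function name make_seekable and the max_memory default are invented for
    this example.

```python
import shutil
import tempfile


def make_seekable(stream, max_memory=1 << 20):
    """Return a seekable file object with the same contents as `stream`.

    A stream that already reports seekable() is returned unchanged.
    Otherwise its remaining contents are copied into a spooled temporary
    file, which stays in memory up to `max_memory` bytes and spills to
    disk beyond that.
    """
    seekable = getattr(stream, "seekable", None)
    if callable(seekable) and seekable():
        return stream
    cache = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(stream, cache)
    cache.seek(0)
    return cache
```

    The cost is one full read of the member up front, which is the price of
    offering seek on top of something like a tape.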

    Most importantly, all archive modules should provide a standard API for
    accessing their individual files via a single archive_content class that
    provides a standard 'read' method. Less importantly but nice to have
    would be a way for archives to be auto-magically scanned during walks of
    directories.
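
    Something like the following is what I have in mind; treat it as a
    sketch only. ArchiveContent, Archive, ZipArchive, TarArchive and
    open_archive are all hypothetical names, and the adapters cover just
    the read-only case.

```python
import tarfile
import zipfile
from abc import ABC, abstractmethod


class ArchiveContent:
    """One archive member, readable through a uniform read() method."""

    def __init__(self, name, size, reader):
        self.name = name
        self.size = size
        self._reader = reader  # zero-argument callable returning bytes

    def read(self):
        # Only valid while the owning archive is still open.
        return self._reader()


class Archive(ABC):
    """Hypothetical common interface: iterate members, then close."""

    @abstractmethod
    def __iter__(self): ...

    @abstractmethod
    def close(self): ...

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class ZipArchive(Archive):
    def __init__(self, path):
        self._zf = zipfile.ZipFile(path)

    def __iter__(self):
        for info in self._zf.infolist():
            if not info.is_dir():
                yield ArchiveContent(info.filename, info.file_size,
                                     lambda i=info: self._zf.read(i))

    def close(self):
        self._zf.close()


class TarArchive(Archive):
    def __init__(self, path):
        self._tf = tarfile.open(path)

    def __iter__(self):
        for info in self._tf:
            if info.isfile():
                yield ArchiveContent(info.name, info.size,
                                     lambda i=info: self._tf.extractfile(i).read())

    def close(self):
        self._tf.close()


def open_archive(path):
    """Factory: pick an adapter by sniffing the file, tar first."""
    if tarfile.is_tarfile(path):
        return TarArchive(path)
    if zipfile.is_zipfile(path):
        return ZipArchive(path)
    raise ValueError(f"unsupported archive format: {path}")
```

    A caller would then write "with open_archive(path) as ar: for member in
    ar: data = member.read()" without caring which format is underneath.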

    Feedback?
    samwyse, Jun 4, 2007
    #1

  2. samwyse

    Chuck Rhode Guest

    samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
    below.

    > I think it would be a good thing if a standardized interface
    > existed, similar to PEP 247. This would make it easier for one
    > script to access multiple types of archives, such as RAR, 7-Zip,
    > ISO, etc.


    Gee, it would be great to be able to open an archive member for update
    I/O. This is kind of hard to do now. If it were possible, though, it
    would obscure the difference between file directories and archives,
    which would be kind of neat. Furthermore, you could navigate archives
    of archives (zips of tars and other abominations).

    --
    ... Chuck Rhode, Sheboygan, WI, USA
    ... Weather: http://LacusVeris.com/WX
    ... 62° — Wind N 7 mph — Sky overcast. Mist.
    Chuck Rhode, Jun 4, 2007
    #2

  3. samwyse

    Tim Golden Guest

    Chuck Rhode wrote:
    > samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
    > below.
    >
    >> I think it would be a good thing if a standardized interface
    >> existed, similar to PEP 247. This would make it easier for one
    >> script to access multiple types of archives, such as RAR, 7-Zip,
    >> ISO, etc.
    >
    > Gee, it would be great to be able to open an archive member for update
    > I/O. This is kind of hard to do now. If it were possible, though, it
    > would obscure the difference between file directories and archives,
    > which would be kind of neat. Furthermore, you could navigate archives
    > of archives (zips of tars and other abominations).


    FWIW, there's no need to get hung up on Python-3000 or
    any other release. Just put together a module
    called "archive" or whatever, which exposes the kind of
    API you're thinking of, offering support across zip, bz2
    and whatever else you want. Put it up on the Cheeseshop,
    announce it on c.l.py.ann and anywhere else which seems
    apt. See if it gains traction. Take it from there.

    NB This has the advantage that you can start small, say
    with zip and bz2 support and maybe see if you get
    contributions for less common formats, even via 3rd
    party libs. If you were to try to get it into the stdlib
    it would need to be much more fully specified up front,
    I suspect.

    TJG
    Tim Golden, Jun 4, 2007
    #3
  4. samwyse

    Chuck Rhode Guest

    Tim Golden wrote this on Mon, 04 Jun 2007 15:55:30 +0100. My reply is
    below.

    > Chuck Rhode wrote:


    >> samwyse wrote this on Mon, 04 Jun 2007 12:02:03 +0000. My reply is
    >> below.


    >>> I think it would be a good thing if a standardized interface
    >>> existed, similar to PEP 247. This would make it easier for one
    >>> script to access multiple types of archives, such as RAR, 7-Zip,
    >>> ISO, etc.


    >> Gee, it would be great to be able to open an archive member for
    >> update I/O. This is kind of hard to do now. If it were possible,
    >> though, it would obscure the difference between file directories
    >> and archives, which would be kind of neat. Furthermore, you could
    >> navigate archives of archives (zips of tars and other
    >> abominations).


    > Just put together a module called "archive" or whatever,
    > which exposes the kind of API you're thinking of, offering support
    > across zip, bz2 and whatever else you want. Put it up on the
    > Cheeseshop, announce it on c.l.py.ann and anywhere else which seems
    > apt. See if it gains traction. Take it from there.


    > NB This has the advantage that you can start small, say with zip and
    > bz2 support and maybe see if you get contributions for less common
    > formats, even via 3rd party libs. If you were to try to get it into
    > the stdlib it would need to be much more fully specified up front, I
    > suspect.


    Yeah, this is in the daydreaming stages. I'd like to maintain
    not-just-read-only libraries of geographic shapefiles, which are
    available free from governmental agencies and which are riddled with
    obvious errors. Typically these are published in compressed archives
    within which every subdirectory is likewise compressed (apparently for
    no other purpose than a rather vain attempt at flattening the
    directory structure, which must be reconstituted on the User's end
    anyway). Building a comprehensive index to what member name(s) the
    different map layers (roads, political boundaries, watercourses) have
    in various political districts of varying geographic resolutions is
    much more than merely frustrating. I've given it up. However, I
    believe that once I've located something usable, the thing to do is
    save a grand unified reference locator (GURL) for it. The GURL would
    specify a directory path to the highest-level archive, followed by a
    potential cascade of member names (one per enclosed archive) ending
    at the data file(s) to be operated on. Unpacking and repacking would
    happen behind the scenes. Updates (via FTP) of non-local resources would be
    transparent, too. I think, though, that notes about the publication
    date, publisher, resolution, area covered, and format of the map or
    map layer ought to be kept out of the GURL.
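
    A read-only sketch of how such a locator might be resolved, under some
    loud assumptions: the locator is just a list (outer path, then member
    names), the archive type is guessed from the name suffix, and every
    level is held in memory. read_member and resolve are made-up names, and
    nothing here handles updates, repacking or FTP.

```python
import io
import tarfile
import zipfile


def read_member(data, archive_name, member):
    """Return the bytes of `member` inside the archive whose raw bytes are `data`."""
    buf = io.BytesIO(data)
    # Assumption: a ".zip" suffix means zip; anything else is handed to tarfile,
    # which detects plain, gzip, bz2 and xz compressed tars by itself.
    if archive_name.lower().endswith(".zip"):
        with zipfile.ZipFile(buf) as zf:
            return zf.read(member)
    with tarfile.open(fileobj=buf) as tf:
        extracted = tf.extractfile(member)
        if extracted is None:
            raise KeyError(f"{member!r} is not a regular file")
        return extracted.read()


def resolve(locator):
    """Follow [outer_path, member, member, ...] and return the innermost bytes."""
    path, *members = locator
    with open(path, "rb") as fh:
        data = fh.read()
    name = str(path)
    for member in members:
        data = read_member(data, name, member)
        name = member  # the member just read is the archive for the next step
    return data
```

    For example, resolve(["state.zip", "county.tar", "roads.shp"]) would
    unpack county.tar out of state.zip and then roads.shp out of that.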

    My whole appetite for this sort of thing would vanish if access to the
    shapefiles were more tractable to begin with.

    --
    ... Chuck Rhode, Sheboygan, WI, USA
    ... 1979 Honda Goldwing GL1000 (Geraldine)
    ... Weather: http://LacusVeris.com/WX
    ... 52° — Wind N 9 mph — Sky overcast.
    Chuck Rhode, Jun 5, 2007
    #4
