[PyWart 1001] Inconsistencies between zipfile and tarfile APIs

rantingrick · Jul 21, 2011

I may have found the mother of all inconsistency warts when comparing
the zipfile and tarfile modules. Not only are the APIs different, but
the entry and exits are different AND zipfile/tarfile do not behave
like proper file objects should.

--------------------------------------------------
1. Zipfile and tarfile entry exit.
--------------------------------------------------
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
zf = zipfile.open(ZIP_PATH)
AttributeError: 'module' object has no attribute 'open'
<tarfile.TarFile object at 0x02B3B850>

*COMMENT*
As you can see, the tarfile module exports an open function and
zipfile does not. Actually i would prefer that neither export an open
function and instead only expose a class for instantiation.
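
For readers following along on a current Python 3, the two entry points can be compared with a short sketch. The archive contents and the helper names `make_zip` and `make_tar` are made up for illustration and are not part of the original session.

```python
import io
import tarfile
import zipfile


def make_zip():
    """Build a tiny zip archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    buf.seek(0)
    return buf


def make_tar():
    """Build a tiny uncompressed tar archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    data = b"hello"
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


# tarfile offers a module-level open(); with zipfile you instantiate the class.
has_tar_open = hasattr(tarfile, "open")

with zipfile.ZipFile(make_zip()) as zf:
    zip_names = zf.namelist()

with tarfile.open(fileobj=make_tar()) as tf:
    tar_names = tf.getnames()

print(has_tar_open, zip_names, tar_names)
```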

*COMMENT*
Since a zipfile object is a file object, asking for the tf object
after the file is closed should show a proper message!
Traceback (most recent call last):
File "<pyshell#72>", line 1, in <module>
tf = tarfile.TarFile(TAR_PATH)
File "C:\Python27\lib\tarfile.py", line 1572, in __init__
self.firstmember = self.next()
File "C:\Python27\lib\tarfile.py", line 2335, in next
raise ReadError(str(e))
ReadError: invalid header
Traceback (most recent call last):
File "<pyshell#75>", line 1, in <module>
tf.fp
AttributeError: 'TarFile' object has no attribute 'fp'
True

*COMMENT*
Tarfile is missing the attribute "fp" and instead exposes a boolean
"closed". This mismatching API is asinine! Both tarfile and zipfile
should behave EXACTLY like file objects
<closed file 'C:\text.txt', mode 'r' at 0x02B26F98>
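
As a rough illustration of how the two objects report a closed state on Python 3 (an assumption; the session above is Python 2.7), the sketch below closes both archives and then tries to use them. The archive contents are invented for the example.

```python
import io
import tarfile
import zipfile

# Build two tiny in-memory archives (illustrative data only).
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hello")

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo("hello.txt")
    info.size = 5
    tf.addfile(info, io.BytesIO(b"hello"))

# Both objects are closed once their "with" block exits.
tar_closed = tf.closed  # the boolean mentioned in the comment above

# Using a closed ZipFile raises ValueError.
try:
    zf.read("hello.txt")
    zip_error = None
except ValueError as exc:
    zip_error = exc

# Using a closed TarFile raises OSError.
try:
    tf.getnames()
    tar_error = None
except OSError as exc:
    tar_error = exc

print(tar_closed, type(zip_error).__name__, type(tar_error).__name__)
```

Neither object prints a "closed" repr the way the built-in file object does; the state only shows up through the attribute or the exception.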

--------------------------------------------------
2. Zipfile SPECIFIC entry exit
--------------------------------------------------
None

*COMMENT*
As you can see, unlike tarfile zipfile cannot handle a passed path.

--------------------------------------------------
3. Zipfile and Tarfile obj API differences.
--------------------------------------------------

zf.namelist() -> tf.getnames()
zf.getinfo(name) -> tf.getmember(name)
zf.infolist() -> tf.getmembers()
zf.printdir() -> tf.list()

*COMMENT*
Would it have been too difficult to make these names match? Really?
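
The mapping above can be checked side by side. This is a minimal sketch on Python 3 with an invented member name `a.txt`; it only exercises the calls listed in the table.

```python
import io
import tarfile
import zipfile

payload = b"some data"

# Illustrative in-memory archives holding the same single member.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", payload)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo("a.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
tar_buf.seek(0)  # tarfile starts reading at the current position

with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()                  # list of member names
    zip_size = zf.getinfo("a.txt").file_size   # one ZipInfo by name
    zip_count = len(zf.infolist())             # all ZipInfo objects

with tarfile.open(fileobj=tar_buf) as tf:
    tar_names = tf.getnames()                  # list of member names
    tar_size = tf.getmember("a.txt").size      # one TarInfo by name
    tar_count = len(tf.getmembers())           # all TarInfo objects

print(zip_names, tar_names, zip_size, tar_size)
```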

--------------------------------------------------
4. Zipfile and Tarfile infoobj API differences.
--------------------------------------------------

zInfo.filename -> tInfo.name
zInfo.file_size -> tInfo.size
zInfo.date_time -> tInfo.mtime

*COMMENT*
Note the inconsistencies in naming conventions of the zipinfo methods.

*COMMENT*
Not only is modified time named different between zipinfo and tarinfo,
they even return completely different values of time.
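
To make the difference concrete: `ZipInfo.date_time` is a six-element tuple of calendar fields, while `TarInfo.mtime` is a number of seconds since the epoch. The sketch below (Python 3, with an arbitrary timestamp and an invented member name) converts both to `datetime` objects so they can be compared.

```python
import io
import tarfile
import zipfile
from datetime import datetime

# Arbitrary local time; even seconds because zip stores 2-second resolution.
stamp = datetime(2011, 7, 21, 23, 46, 4)

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zinfo_in = zipfile.ZipInfo("a.txt", date_time=stamp.timetuple()[:6])
    zf.writestr(zinfo_in, "data")

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    tinfo_in = tarfile.TarInfo("a.txt")
    tinfo_in.size = 4
    tinfo_in.mtime = int(stamp.timestamp())  # seconds since the epoch
    tf.addfile(tinfo_in, io.BytesIO(b"data"))
tar_buf.seek(0)

with zipfile.ZipFile(zip_buf) as zf:
    zinfo = zf.getinfo("a.txt")
with tarfile.open(fileobj=tar_buf) as tf:
    tinfo = tf.getmember("a.txt")

# Normalise both representations to datetime objects.
zip_dt = datetime(*zinfo.date_time)
tar_dt = datetime.fromtimestamp(tinfo.mtime)

print(zinfo.date_time, tinfo.mtime, zip_dt == tar_dt)
```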

--------------------------------------------------
Conclusion:
--------------------------------------------------
It is very obvious that these modules need some consistency between
not only themselves but also collectively. People, when emulating a
file type always be sure to emulate the built-in python file type as
closely as possible.

PS: I will be posting more warts very soon. This stdlib is a gawd
awful mess!

Corey Richardson · Jul 22, 2011

Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:

I may have found the mother of all inconsistency warts when comparing
the zipfile and tarfile modules. Not only are the APIs different, but
the entry and exits are different AND zipfile/tarfile do not behave
like proper file objects should.

I agree, actually.
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln

rantingrick · Jul 22, 2011

Excerpts from rantingrick's message of Thu Jul 21 23:46:05 -0400 2011:

I agree, actually.

Unfortunately i know what the "powers that be" are going to say about
fixing this wart.

PTB: "Sorry we cannot break backwards compatibility"
Rick: But what about Python 3000?
PTB: "Oh, well, umm, let's see. Well, that was then and this is now!"

Maybe i can offer a solution. A NEW module called "archive.py" (could
even be a package!) which exports both the zip and tar file classes.
However, unlike the current situation this archive module will be
consistent with its API.
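
One possible shape for such a module, sketched on Python 3. The class name `Archive` and its methods `names`, `read`, and `close` are hypothetical and do not correspond to any existing stdlib API; the sketch only wraps the existing zipfile and tarfile calls behind one set of names.

```python
import io
import tarfile
import zipfile


class Archive:
    """Hypothetical uniform wrapper over a zip or tar file object."""

    def __init__(self, fileobj):
        is_zip = zipfile.is_zipfile(fileobj)
        fileobj.seek(0)  # is_zipfile() moves the file position
        self._zf = zipfile.ZipFile(fileobj) if is_zip else None
        self._tf = None if is_zip else tarfile.open(fileobj=fileobj)

    def names(self):
        """Return member names, whichever format is underneath."""
        if self._zf is not None:
            return self._zf.namelist()
        return self._tf.getnames()

    def read(self, name):
        """Return the bytes of one regular-file member."""
        if self._zf is not None:
            return self._zf.read(name)
        return self._tf.extractfile(name).read()

    def close(self):
        (self._zf if self._zf is not None else self._tf).close()


# Illustrative data: the same member stored in both formats.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", b"payload")

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo("a.txt")
    info.size = 7
    tf.addfile(info, io.BytesIO(b"payload"))

results = []
for buf in (zip_buf, tar_buf):
    archive = Archive(buf)
    results.append((archive.names(), archive.read("a.txt")))
    archive.close()

print(results)
```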

Corey Richardson · Jul 22, 2011

Excerpts from rantingrick's message of Fri Jul 22 00:48:37 -0400 2011:

Maybe i can offer a solution. A NEW module called "archive.py" (could
even be a package!) which exports both the zip and tar file classes.
However, unlike the current situation this archive module will be
consistent with its API.

I have nothing to do this weekend, I might as well either write my own or
twist around the existing implementations in the hg repo.
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln

Terry Reedy · Jul 22, 2011

Hmm. Archives are more like directories than files. Windows, at least,
seems to partly treat zipfiles more or less as such. Certainly, 7zip
presents a directory interface. So opening a zipfile/tarfile would be
like opening a directory, which we normally do not do. On the other
hand, I am not sure I like python's interface to directories that much.

It would be more sensible to open files within the archives. Certainly,
it would be nice to have the result act like file objects as much as
possible.
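
Both modules already allow something along these lines. A minimal sketch on Python 3, with an invented member name `notes.txt`: `ZipFile.open()` and `TarFile.extractfile()` each return a file-like object for a single member that supports `readline()` and `read()`.

```python
import io
import tarfile
import zipfile

text = b"line one\nline two\n"

# Illustrative in-memory archives with one text member each.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("notes.txt", text)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    info = tarfile.TarInfo("notes.txt")
    info.size = len(text)
    tf.addfile(info, io.BytesIO(text))
tar_buf.seek(0)

# Open the member inside the zip archive as a file-like object.
with zipfile.ZipFile(zip_buf) as zf:
    with zf.open("notes.txt") as member:
        zip_first = member.readline()
        zip_rest = member.read()

# Same idea for tar; extractfile() returns None for non-file members.
with tarfile.open(fileobj=tar_buf) as tf:
    member = tf.extractfile("notes.txt")
    tar_first = member.readline()
    tar_rest = member.read()

print(zip_first, tar_first)
```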

Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
each. So I think some people would care more about fixing bugs than
adjusting the interfaces. Of course, some of the issues may be about the
interface and increasing consistency where it can be done without
compatibility issues. However, I do not think there are any active
developers focused on those two modules.

Unfortunately i know what the "powers that be" are going to say about
fixing this wart.

PTB: "Sorry we cannot break backwards compatibility"

Do you propose we break compatibility more than we do? You are not the
only Python ranter. People at Google march into Guido's office to
complain instead of posting here.

Rick: But what about Python 3000?
PTB: "Oh, well, umm, let's see. Well, that was then and this is now!"

The changes made for 3.0 were more than enough for some people to
discourage migration to Py3. And we *have* made additional changes
since. So the resistance to incompatible feature changes has increased.

Maybe i can offer a solution. A NEW module called "archive.py" (could
even be a package!) which exports both the zip and tar file classes.
However, unlike the current situation this archive module will be
consistent with its API.

Not a bad idea. Put it on PyPI and see how much support you can get.

Ryan Kelly · Jul 22, 2011

Hmm. Archives are more like directories than files. Windows, at least,
seems to partly treat zipfiles more or less as such. Certainly, 7zip
presents a directory interface. So opening a zipfile/tarfile would be
like opening a directory, which we normally do not do. On the other
hand, I am not sure I like python's interface to directories that much.

Indeed. Actually, I'd say that archives are more like *entire
filesystems* than either files or directories.

We have a pretty nice ZipFS implementation as part of the PyFilesystem
project:

http://packages.python.org/fs/

If anyone cares enough to whip up a TarFS implementation it would be
gratefully merged into trunk. (There may even be the start of one in
the bugtracker somewhere, I don't recall...)

Cheers,

Ryan

--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
(e-mail address removed) | http://www.rfk.id.au/ramblings/gpg/ for details

rantingrick · Jul 22, 2011

I have nothing to do this weekend, I might as well either write my own or
twist around the existing implementations in the hg repo.

My hat is off to you Mr. Richardson. I've even considered creating my
own clean versions of these two modules, because heck, it is not that
difficult to do! However we must stop fixing these warts on a local
level Corey. We MUST clean up this damn python stdlib once and for
all.

I am willing and you are willing; that's two people. However, can we
convince the powers that be to upgrade these modules? Sure, if we get
enough people shouting for it to happen they will notice. So come on
people make your voices heard. Chime in and let the devs know we are
ready to unite and tackle these problems in our stdlib.

What this community needs (first and foremost) is some positive
attitudes. If you don't want to write the code fine. But at least
chime in and say... "Hey guys, that's a good idea! I would like to see
some of these APIs cleaned up too. good luck! +1"

Now, even if we get one hundred people chanting... "Yes, Yes, Fix This
Mess!"... i know Guido and company are going to frown because of
backwards incompatibility. But let me tell you something people, the
longer we put off these changes the more painful they are going to
be.

Python 3000 would have been the perfect time to introduce a more
intuitive and unified zip/tar archive module however that did not
happen. So now we need to think about adding a duplicate module
"archive.py" and deprecating zipfile.py and tarfile.py. We can remove
the old modules when Python 4000 rolls out.

That's just step one people, we have a long way to go!

rantingrick · Jul 22, 2011

Hmm. Archives are more like directories than files. Windows, at least,
seems to partly treat zipfiles more or less as such.

Yes but a zipfile is just a file not a directory. This is not the
first time Microsoft has "misled" people you know. ;-)

Certainly, 7zip
presents a directory interface. So opening a zipfile/tarfile would be
like opening a directory, which we normally do not do. On the other
hand, I am not sure I like python's interface to directories that much.

I don't think we should make comparisons between applications and
API's.

It would be more sensible to open files within the archives. Certainly,
it would be nice to have the result act like file objects as much as
possible.

Well you still need to start at the treetop (which is the zip/tar
file) because lots of important information is exposed at that level:

* compressed file listing
* created, modified times
* adding / deleting
* etc.

I'll admit you could think of it as a directory but i would not want
to do that. People need to realize that tar and zip files are FILES
and NOT folders.

Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
each. So I think some people would care more about fixing bugs than
adjusting the interfaces. Of course, some of the issues may be about the
interface and increasing consistency where it can be done without
compatibility issues.

Yes i agree! If we can at least do something as meager as this it
would be a step forward. However i still believe the current API is
broken beyond repair so we must introduce a new "archive" module.
That's my opinion anyway.

However, I do not think there are any active
developers focused on those two modules.

We need some fresh blood infused into Python-dev. I have been trying
to get involved for a long time. We as a community need to realize
that this community is NOT a homogeneous block. We need to be a little
more accepting of new folks and new ideas. I know this language would
evolve much quicker if we did.

Do you propose we break compatibility more than we do? You are not the
only Python ranter. People at Google march into Guido's office to
complain instead of posting here.

Well, i do feel for Guido because i know he's taking holy hell over
this whole Python 3000 thing. If you guys don't remember i was a
strong opponent of almost all the changes a few years ago (search the
archives). However soon after taking a "serious" look at the changes
and considering the benefits i was convinced. I believe we are moving
in the correct direction with the language HOWEVER the library is
growing stale by the second. I want to breathe new life into this
library and i believe many more people like myself exist but they
don't know how to get involved. I can tell everyone who is listening
the easiest first step is simply to speak up and make a voice for
yourself. Don't be afraid to state your opinions. You can start right
now by chiming in on this thread. Anybody is welcome to offer opinions
no matter what experience level.

The changes made for 3.0 were more than enough for some people to
discourage migration to Py3. And we *have* made additional changes
since. So the resistance to incompatible feature changes has increased.

Yes i do understand these changes have been very painful for some
folks (me included). However there is only but one constant in this
universe and that constant is change. I believe we can improve many of
these API's starting with zip/tar modules. By the time Python 4000
gets here (and it will be much sooner than you guys realize!) we need
to have this stdlib in pristine condition. That means:

* Removing style guide violations.
* Removing inconsistencies in existing API's.
* Making sure doc strings and comments are everywhere.
* Cleaning up the IDLE library (needs a complete re-write!)
* Cleaning up Tkinter.
* And more

Baby steps are the key to winning this battle. We hit all the easy
stuff first (doc-strings and style guide) and save the painful stuff
for Python 4000. Meanwhile we introduce new modules and deprecate the
old stuff. However we need to start the python 4000 migration now. We
cannot keep putting off what should have already been done in Python
3000.

Not a bad idea. Put it on PyPI and see how much support you can get.

Thanks, I might just do that!

Corey Richardson · Jul 22, 2011

Excerpts from rantingrick's message of Fri Jul 22 02:40:51 -0400 2011:

Yes but a zipfile is just a file not a directory. This is not the
first time Microsoft has "misled" people you know. ;-)

Ehh...yes and no. Physically, it is a file and nothing more. But its actual
use and contents could reflect that of a directory. Are files and directories
that different, after all? I don't believe so. They are both an expression
of the same thing. Both contain data, one just contains others of itself.
Of course, treating a zipfile as a directory will certainly have a performance
cost. But here in Linux-land (and elsewhere I'm sure) I can mount, for example,
a disk image to a mountpoint anywhere. It's a useful thing to do!

I don't think we should make comparisons between applications and
API's.

Ehh...yes and no again. Maybe the applications are on to something? Whether
the filesystem is physically on disk or is just a representation of a
filesystem on a file in a filesystem on disk, treating them both as a
filesystem is a useful abstraction (NOT the only one available?)

Well you still need to start at the treetop (which is the zip/tar
file) because lots of important information is exposed at that level:

* compressed file listing
* created, modified times
* adding / deleting
* etc.

I'll admit you could think of it as a directory but i would not want
to do that. People need to realize that tar and zip files are FILES
and NOT folders.

I think it's a useful abstraction to think of an archive as a directory.
They ARE files, yes. But must their physical representation impact their
semantics? I think not! It doesn't matter if Python's list object is a
linked-list down under or if it isn't. Or any sequence, for that matter!
It's a useful abstraction to treat them all as sequences, uniform interface
etc, even though one sequence might be a linked list in a C module, or
a row from a database, or whatever!

Yes i agree! If we can at least do something as meager as this it
would be a step forward. However i still believe the current API is
broken beyond repair so we must introduce a new "archive" module.
That's my opinion anyway.

Checking if such a thing exists already may be more useful. I saw someone
mention a project similar?

We need some fresh blood infused into Python-dev. I have been trying
to get involved for a long time. We as a community need to realize
that this community is NOT a homogeneous block. We need to be a little
more accepting of new folks and new ideas. I know this language would
evolve much quicker if we did.

Yes i do understand these changes have been very painful for some
folks (me included). However there is only but one constant in this
universe and that constant is change. I believe we can improve many of
these API's starting with zip/tar modules. By the time Python 4000
gets here (and it will be much sooner than you guys realize!) we need
to have this stdlib in pristine condition. That means:

* Removing style guide violations.
* Removing inconsistencies in existing API's.
* Making sure doc strings and comments are everywhere.
* Cleaning up the IDLE library (needs a complete re-write!)
* Cleaning up Tkinter.
* And more

All noble goals. I think the fact that everyone* knows that the stdlib is
a mess and not the epitome of Good Python is kinda sad...

* for some definition of "everyone"
--
Corey Richardson
"Those who deny freedom to others, deserve it not for themselves"
-- Abraham Lincoln

Lars Gustäbel · Jul 22, 2011

I may have found the mother of all inconsistency warts when comparing
the zipfile and tarfile modules. Not only are the APIs different, but
the entry and exits are different AND zipfile/tarfile do not behave
like proper file objects should.

There is a reason why these two APIs are different. When I wrote tarfile,
zipfile had already existed for maybe 8 years and I didn't like its
interface very much. So, I came up with a different one for tarfile that in my
opinion was more general and better suited the format and the kind of things I
wanted to do with it. In the meantime the zipfile API got a lot of attention
and some portions of tarfile's API were ported to zipfile.

*COMMENT*
As you can see, the tarfile module exports an open function and
zipfile does not. Actually i would prefer that neither export an open
function and instead only expose a class for instantiation.

So that is your preference.

*COMMENT*
Since a zipfile object is a file object, asking for the tf object
after the file is closed should show a proper message!

It is no file object.

*COMMENT*
Tarfile is missing the attribute "fp" and instead exposes a boolean
"closed". This mismatching API is asinine! Both tarfile and zipfile
should behave EXACTLY like file objects

No, they don't. Because they have not much in common with file objects. I am
not sure what you are trying to prove here. And although I must admit that you
have a point overall you seem to get the details wrong. If tarfile and zipfile
objects behave "EXACTLY" like file objects, what does the read() method return?
What does seek() do? And readline()?

What do you prove when you say that tarfile has no "fp" attribute? You're not
supposed to use the tarfile's internal file object, there is nothing productive
you could do with it.

*COMMENT*
As you can see, unlike tarfile zipfile cannot handle a passed path.

Hm, I don't know what you mean.

zf.namelist() -> tf.getnames()
zf.getinfo(name) -> tf.getmember(name)
zf.infolist() -> tf.getmembers()
zf.printdir() -> tf.list()

*COMMENT*
Would it have been too difficult to make these names match? Really?

As I already stated above, I didn't want to adopt the zipfile API because I
found it unsuitable. So I came up with an entirely new one. I thought that
being incompatible was better than using an API that did not fit exactly.

*COMMENT*
Note the inconsistencies in naming conventions of the zipinfo methods.

*COMMENT*
Not only is modified time named different between zipinfo and tarinfo,
they even return completely different values of time.

See above.

It is very obvious that these modules need some consistency between
not only themselves but also collectively. People, when emulating a
file type always be sure to emulate the built-in python file type as
closely as possible.

See above.

PS: I will be posting more warts very soon. This stdlib is a gawd
awful mess!

I do not agree. Although I come across one or two odd things myself from time
to time, I think the stdlib as a whole is great, usable and powerful.

The stdlib surely needs our attention. Instead of answering your post, I should
have been writing code and fixing bugs ...

--
Lars Gustäbel
(e-mail address removed)

Seek simplicity, and distrust it.
(Alfred North Whitehead)

Lars Gustäbel · Jul 22, 2011

My hat is off to you Mr. Richardson. I've even considered creating my
own clean versions of these two modules, because heck, it is not that
difficult to do! However we must stop fixing these warts on a local
level Corey. We MUST clean up this damn python stdlib once and for
all.

One could get the impression that you are leading a grass-roots movement
fighting a big faceless corporation. Instead, what you're dealing with is this
warm and friendly Python community you could as well be a part of if you are a
reasonable guy and write good code.

I am willing and you are willing; that's two people. However, can we
convince the powers that be to upgrade these modules? Sure, if we get
enough people shouting for it to happen they will notice. So come on
people make your voices heard. Chime in and let the devs know we are
ready to unite and tackle these problems in our stdlib.

Yeah, great. Please write code. Or a PEP.

What this community needs (first and foremost) is some positive
attitudes. If you don't want to write the code fine. But at least
chime in and say... "Hey guys, that's a good idea! I would like to see
some of these APIs cleaned up too. good luck! +1"
+1

Now, even if we get one hundred people chanting... "Yes, Yes, Fix This
Mess!"... i know Guido and company are going to frown because of
backwards incompatibility. But let me tell you something people, the
longer we put off these changes the more painful they are going to
be.

And backwards compatibility is bad why? Tell me, what exactly is your view
towards this? Should there be none?

Python 3000 would have been the perfect time to introduce a more
intuitive and unified zip/tar archive module however that did not
happen. So now we need to think about adding a duplicate module
"archive.py" and deprecating zipfile.py and tarfile.py. We can remove
the old modules when Python 4000 rolls out.

That's just step one people, we have a long way to go!

archive.py is no new idea. Unfortunately, to this day, nobody had the time to
come up with an implementation.

Let me say it again: less false pathos, more code. Please.

--
Lars Gustäbel
(e-mail address removed)

To a man with a hammer, everything looks like a nail.
(Mark Twain)

Thomas Jollans · Jul 22, 2011

PS: I will be posting more warts very soon. This stdlib is a gawd
awful mess!

Please don't. Not here.

There's a wonderful bug tracker at python.org. Use that. That's where
this kind of thing belongs. And, please, be concise.

What's the point of shouting it out here anyway? Just fix what you think
needs fixing! Sure, you can come here to ask for comments on your new
and improved API. Sure, when you've got something presentable, come here
and show us.

But nobody needs this kind of rant, rantingrick.

Tim Chase · Jul 22, 2011

What do you prove when you say that tarfile has no "fp"
attribute? You're not supposed to use the tarfile's internal
file object, there is nothing productive you could do with
it.

While I've needed access to such a fp object, it's been limited
to cases where I passed a file-like object to the constructor
instead of a path-name:

tf = tarfile.open(fileobj=foo, ...)

so I had access to "foo" without reaching into the
tarfile/zipfile object for the internal fp. Usually this
involves using a StringIO object or a temp-file that then gets
cleaned up when complete.
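
A small sketch of that pattern on Python 3, where `io.BytesIO` plays the role of StringIO for binary data. The variable name `foo` follows the post; the archive contents are invented. The point is that the caller keeps its own handle and `tarfile` leaves an externally supplied file object open.

```python
import io
import tarfile

# The caller owns this file-like object and keeps a reference to it.
foo = io.BytesIO()

tf = tarfile.open(fileobj=foo, mode="w")
info = tarfile.TarInfo("hello.txt")
info.size = 5
tf.addfile(info, io.BytesIO(b"hello"))
tf.close()

# tarfile does not close a file object it was handed from outside,
# so the raw archive bytes are still reachable through "foo".
still_open = not foo.closed
raw = foo.getvalue()

# The same buffer can be rewound and read back as an archive.
foo.seek(0)
with tarfile.open(fileobj=foo) as tf2:
    names = tf2.getnames()

print(still_open, len(raw), names)
```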

-tkc

rantingrick · Jul 22, 2011

There is a reason why these two APIs are different. When I wrote tarfile,
zipfile had already existed for maybe 8 years and I didn't like its
interface very much. So, I came up with a different one for tarfile that in my
opinion was more general and better suited the format and the kind of things I
wanted to do with it. In the meantime the zipfile API got a lot of attention
and some portions of tarfile's API were ported to zipfile.

Well i'll admit that i do like the tarfile's API much better; so
kudos to you kind sir.

So that is your preference.

Wrong! It is more than just a MERE preference. Tarfile and zipfile
are BOTH archive modules and as such should present a consistent API.
I really don't care so much about the actual details AS LONG AS THE
APIs ARE CONSISTENT!

It is no file object.

Then why bother to open and close it like a file object? If we are not
going to treat it as a file object then we should not have API methods
open and close.

If tarfile and zipfile
objects behave "EXACTLY" like file objects, what does the read() method return?
What does seek() do? And readline()?

I am not suggesting that these methods become available. What i was
referring to is the fact that the instance does not return its current
state like a true file object would. But just for academic sake we
could apply these three methods in the following manner:

* read() -> extract the entire archive.
* readline() -> extract the Nth archive member.
* seek() -> move to the Nth archive member.

Not that i think we should however.

What do you prove when you say that tarfile has no "fp" attribute?

My point is that the API's between tarfile and zipfile should be
consistent. "fp" is another example of inconsistency. If we are going
to have an "fp" method in one, we should have it in the other.

Hm, I don't know what you mean.

Sorry that comment was placed in the wrong position. I also apologize
for sending the message three times; it seems my finger was a little
shaky that night. What i was referring to is that tarfile does not
allow a path to be passed to the constructor whereas zipfile does:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
tf = tarfile.TarFile('c:\\tar.tar')
File "C:\Python27\lib\tarfile.py", line 1572, in __init__
self.firstmember = self.next()
File "C:\Python27\lib\tarfile.py", line 2335, in next
raise ReadError(str(e))
ReadError: invalid header
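
The thread does not say what `c:\tar.tar` contained, so the following is only one possible explanation, sketched on Python 3 under the assumption that the file was compressed. The `TarFile` constructor accepts a path but reads it as a plain tar, whereas `tarfile.open()` detects compression. The file names `plain.tar` and `demo.tar.gz` are invented for the example.

```python
import io
import os
import tarfile
import tempfile


def write_archive(path, mode):
    """Write a one-member archive to 'path' (illustrative data only)."""
    with tarfile.open(path, mode) as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = 5
        tf.addfile(info, io.BytesIO(b"hello"))


with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.tar")
    gz_path = os.path.join(tmp, "demo.tar.gz")
    write_archive(plain_path, "w")
    write_archive(gz_path, "w:gz")

    # The constructor does take a path when the tar is uncompressed.
    with tarfile.TarFile(plain_path) as tf:
        plain_names = tf.getnames()

    # On compressed data the constructor fails while reading the header.
    try:
        tarfile.TarFile(gz_path)
        constructor_error = None
    except tarfile.ReadError as exc:
        constructor_error = exc

    # tarfile.open() detects the compression transparently.
    with tarfile.open(gz_path) as tf:
        gz_names = tf.getnames()

print(plain_names, type(constructor_error).__name__, gz_names)
```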

As I already stated above, I didn't want to adopt the zipfile API because I
found it unsuitable. So I came up with an entirely new one. I thought that
being incompatible was better than using an API that did not fit exactly.

I agree with you. Now if we can ONLY change the zipfile API to match
then we would be golden!

I do not agree. Although I come across one or two odd things myself from time
to time, I think the stdlib as a whole is great, usable and powerful.

And that's why we find ourselves in this current dilemma. This stdlib
IS a mess and yours and everyone else's denials about it is not
helping the situation.

The stdlib surely needs our attention. Instead of answering your post, I should
have been writing code and fixing bugs ...

Will you be starting with the zipfile API migration?

Chris Angelico · Jul 22, 2011

Wrong! It is more than just a MERE preference. Tarfile and zipfile
are BOTH archive modules and as such should present a consistent API.
I really don't care so much about the actual details AS LONG AS THE
APIs ARE CONSISTENT!

Python and C++ are BOTH programming languages and as such should
present a consistent API. I really don't care so much about the actual
details <caps>as long as the APIs (standard libraries) are
consistent!</caps>

Chris Angelico

Chris Angelico · Jul 22, 2011

Oh, and:

Will you be starting with the zipfile API migration?

Will you?

Rick, quit ranting and start coding. If you want things to happen, the
best way is to do them. If you make a post on the dev list WITH A
PATCH, or submit your patch on the bug tracker, then people might
start taking you seriously.

In other words, put up or shut up.

ChrisA

rantingrick · Jul 22, 2011

One could get the impression that you are leading a grass-roots movement
fighting a big faceless corporation. Instead, what you're dealing with is this
warm and friendly Python community you could as well be a part of if you are a
reasonable guy and write good code.

Sometimes i do feel as if i am fighting against an evil empire. I am a
reasonable guy and i do write -good-, no excellent code.

Yeah, great. Please write code. Or a PEP.

I am not about to just hop through all the hoops of PEP and PEP8 code
just to have someone say "Sorry, we are not going to include your
code". What i want at this point is to get feedback from everyone
about this proposed archive.py module. Because unlike other people, i
don't want to ram MY preferred API down others throats.

Step one is getting feedback on the idea of including a new archive
module. Step two is hammering out an acceptable API spec. Step three
is actually writing the code and finally getting it accepted into
the stdlib.

Not only do i need feedback from everyday Python scripters, i need
feedback from Python-dev. I even need feedback from the great GvR
himself! (maybe not right away but eventually).

+1

Thank you! Now, can you convince your comrades at pydev to offer their
opinions here also? Even if all they do is say "+1".

And backwards compatibility is bad why? Tell me, what exactly is your view
towards this? Should there be none?

First let me be clear that "backwards-compatibility" (BC) is very
important to any community. We should always strive for BC. However
there is no doubt we are going to make mistakes along the way and at
some point SOME APIs will need to be broken in the name of consistency
or some other important reason.

As i've said before Py3000 would have been the PERFECT opportunity to
fix this broken API within the current zipfile and tarfile modules.
Since that did not happen, we must now introduce a new module
"archive.py" and deprecate the zip and tar modules immediately. We
shall remove them forever in Python4000.

If you guys think we are done breaking BC, you are in for big
surprises! Py3000 was just the beginning of clean-ups. Py4000 is
going to be a game changer! And when we finally get to Py4000 and
remove all these ugly warts python is going to be a better language
for it. Mark my words people!

archive.py is no new idea. Unfortunately, to this day, nobody had the time to
come up with an implementation.

It's time to change;
Can't stay the same;
Rev-o-lu-tion is MY name!

We can never become complacent and believe we have reached perfection
because we never will.

Terry Reedy · Jul 22, 2011

On Jul 22, 12:45 am, Terry Reedy wrote:

Let me give some overall comments rather than respond point by point.

Python-dev is a volunteer *human* community, not a faceless corporation,
with an ever-changing composition (a very mutable set;-).
It is too small, really, for the current size of the project.

Python 3 was mostly about syntax cleanup. Python-dev was not large
enough to also do much stdlib cleanup. With the syntax moratorium,
attention *was* focused on the stdlib and problems were found. Some
function names were actively incorrect (due to the shift from str-unicode to
bytes-strings). Some functions were undocumented and ambiguous as to
their public/private status. Some deprecations were made that will take
effect in 3.3 or 3.4.

This introduced the problem that upgrading to Python 3 is no longer a
single thing. We really need 2to3.1 (the current 2to3), 2to3.2, 2to3.3,
etc, but someone would have to make the new versions, but no one,
currently, has the energy and interest to do that. So people who did not
port their 2.x code early now use the problem of multiple Python 3
targets as another excuse not to do so now. (Actually, most 2.x code
should not be ported, but there are more libraries that we do need in 3.x.)

The way to revamp a module is to introduce a new module. And anything
now must be released first on PyPI. This has precedent. In 2.x days,
urllib2 was an upgrade to urllib, though I do not know if it was on PyPI.

For 3.x, Steven Bethard's argparse supersedes optparse, but the latter
remains with the notice in red: "Deprecated since version 2.7: The
optparse module is deprecated and will not be developed further;
development will continue with the argparse module." Argparse was first
released on PyPI, and versions compatible with releases earlier than 2.7
and 3.2 remain there.

The new 3.3 module 'packaging' is a renamed distutils2. It is now on
PyPI, where it has been tested with current and earlier versions and it
will remain there even after 3.3 is released.

An archive module should be released or at least listed on PyPI. It will
thus be available whether or not incorporated into the stdlib. (Many
useful modules never are, partly because the authors recognize that
there are disadvantages as well as advantages to being in the stdlib.)
It should be compatible with at least 3.1+ so that people can use it and
be compatible with multiple 3.x versions. Starting with a version < 1.0
implies that the api is subject to change with user experience.

This does not preclude *also* making compatible changes in stdlib
modules. And as I mentioned before, there are already a lot of bug and
feature requests on the tracker. Merely putting a new face (api) on a
sick pig is not enough.

Terry Reedy · Jul 22, 2011

I do not agree. Although I come across one or two odd things myself from time
to time, I think the stdlib as a whole is great, usable and powerful.

The stdlib surely needs our attention. Instead of answering your post, I should
have been writing code and fixing bugs ...

I am glad you posted, both to get your rebuttal and know you are still
active. I had presumed that the two modules were written by different
people at different times and hence the different apis. I do not know
the details of either well enough to know how consistent they could be.

You are right that discussing can be a distraction from coding;-).

Ned Deily · Jul 22, 2011

This introduced the problem that upgrading to Python 3 is no longer a
single thing. We really need 2to3.1 (the current 2to3), 2to3.2, 2to3.3,
etc, but someone would have to make the new versions, but no one,
currently, has the energy and interest to do that. So people who did not
port their 2.x code early now use the problem of multiple Python 3
targets as another excuse not to do so now. (Actually, most 2.x code
should not be ported, but there are more libraries that we do need in 3.x.)

I don't quite understand this. Since 2to3 is included with Python 3,
there are, in fact, separate releases of 2to3 for each release of Python
3 so far. And, unlike with Python 2 with a large installed base across
a number of versions, Python 3 version support can be and is much more
focused now in its early releases. Support for 3.0 was terminated
immediately upon release of 3.1. And 3.1 is now in security-fix mode
only. So, except for a brief overlap after the initial release of 3.2,
there has only been one Python 3 release that needs to be targeted. Of
course, that will change over time as adoption continues and mainstream
OS's include specific Python 3 releases. But, for now, it's easy: just
target the most recent Python 3 release, currently 3.2.1. Don't worry
about earlier releases.
