filecmp.cmp() cache

Mattias Brändström · Feb 15, 2007

Hello!

I have a question about filecmp.cmp(). The short code snippet below
does not behave as I would expect:

import filecmp

f0 = "foo.dat"
f1 = "bar.dat"

f = open(f0, "w")
f.write("1:2")
f.close()

f = open(f1, "w")
f.write("1:2")
f.close()

print "cmp 1: " + str(filecmp.cmp(f0, f1, False))

f = open(f1, "w")
f.write("2:3")
f.close()

print "cmp 2: " + str(filecmp.cmp(f0, f1, False))

I would expect the second comparison to return False instead of True.
Looking at the docs for filecmp.cmp() I found the following: "This
function uses a cache for past comparisons and the results, with a
cache invalidation mechanism relying on stale signatures.". I guess
that this is the reason for my test case failing.
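The stale result can be reproduced deterministically. In CPython's implementation, filecmp stores each comparison together with a signature of both files (file type, size and modification time). "2:3" has the same length as "1:2", so if bar.dat keeps its old mtime the cached True is reused. The sketch below forces that situation by restoring bar.dat's timestamps with os.utime() after rewriting it. It assumes Python 3 and uses a temporary directory instead of the working directory; the helper name reproduce_stale_result is only for illustration, and the signature contents are an implementation detail.

```python
import filecmp
import os
import tempfile


def reproduce_stale_result():
    """Return both filecmp.cmp() results from the scenario above."""
    with tempfile.TemporaryDirectory() as tmp:
        f0 = os.path.join(tmp, "foo.dat")
        f1 = os.path.join(tmp, "bar.dat")
        for path in (f0, f1):
            with open(path, "w") as f:
                f.write("1:2")

        # Contents are equal; the result is stored in filecmp's cache.
        first = filecmp.cmp(f0, f1, shallow=False)

        # Remember bar.dat's timestamps, rewrite it with same-length data,
        # then put the old timestamps back so its signature is unchanged.
        st = os.stat(f1)
        with open(f1, "w") as f:
            f.write("2:3")
        os.utime(f1, ns=(st.st_atime_ns, st.st_mtime_ns))

        # Same paths and same signatures, so the cached True is returned.
        second = filecmp.cmp(f0, f1, shallow=False)
        return first, second


print(reproduce_stale_result())  # (True, True)
```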

Is there someone here that can tell me how I should invalidate this
cache? If that is not possible, what workaround could I use? I guess
that I can write my own file comparison function, but I would not like
to have to do that since we have filecmp.

Any ideas?

Regards,
Mattias

Peter Otten · Feb 15, 2007

You can clear the cache with

filecmp._cache = {}

as a glance into the filecmp module would have shown.
If you don't want to use the cache at all (untested):

class NoCache:
    def __setitem__(self, key, value):
        pass
    def get(self, key):
        return None
filecmp._cache = NoCache()

Alternatively, an update to Python 2.5 might work, as the type of
os.stat(filename).st_mtime was changed from int to float and now offers
subsecond resolution.

Peter

Mattias Brändström · Feb 15, 2007

> You can clear the cache with
>
> filecmp._cache = {}
>
> as a glance into the filecmp module would have shown.

You are right, a quick glance would have enlightened me. Next time I
will RTFS first.

> If you don't want to use the cache at all (untested):
>
> class NoCache:
>     def __setitem__(self, key, value):
>         pass
>     def get(self, key):
>         return None
> filecmp._cache = NoCache()

Just one small thought/question. How likely am I to run into trouble
because of this? I mean, by setting _cache to another value I'm
mucking about in filecmp's implementation details. Is this generally
considered OK when dealing with Python's standard library?

:.:: mattias

Peter Otten · Feb 15, 2007

Mattias said:
> Just one small thought/question. How likely am I to run into trouble
> because of this? I mean, by setting _cache to another value I'm
> mucking about in filecmp's implementation details. Is this generally
> considered OK when dealing with Python's standard library?

I think it's a feature that Python lends itself to monkey-patching, but
still there are a few things to consider:

- Every hack increases the likelihood that your app will break in the next
version of Python.
- You take some responsibility for the "patched" code. It's no longer the
tried and tested module as provided by the core developers.
- The module may be used elsewhere in the standard library or third-party
packages, and failures (or in the above example: performance degradation)
may ensue.

For a script and a relatively obscure module like 'filecmp' monkey-patching
is probably OK, but for a larger app or a module like 'os' that is heavily
used throughout the standard lib I would play it safe and reimplement.
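If you take the reimplementation route, a comparison that never caches fits in a few lines. This is a minimal sketch, not a drop-in replacement for filecmp.cmp(): the name files_equal and the 8192-byte chunk size are arbitrary choices, and it always compares contents (there is no shallow mode). It assumes Python 3.

```python
import os
import tempfile


def files_equal(path_a, path_b, bufsize=8192):
    """Compare two files byte by byte, without any caching."""
    # Files of different sizes can never be equal, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(bufsize)
            chunk_b = fb.read(bufsize)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True


# The scenario from the original post, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    f0 = os.path.join(tmp, "foo.dat")
    f1 = os.path.join(tmp, "bar.dat")
    for path in (f0, f1):
        with open(path, "w") as f:
            f.write("1:2")
    before = files_equal(f0, f1)
    with open(f1, "w") as f:
        f.write("2:3")
    after = files_equal(f0, f1)

print(before, after)  # True False
```

Because nothing is remembered between calls, every comparison costs a full read of both files; that is the trade-off the cache was meant to avoid.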

Peter

Mattias Brändström · Feb 15, 2007

Thanks for the insight! Right now I need this for a unit test, so in
this case I'm quite happy to use the NoCache solution you suggested.
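In a unit test, one way to keep the patch from leaking into other tests is to install it with unittest.mock.patch.object, which puts the original filecmp._cache back when the test finishes. The sketch below also makes NoCache a dict subclass that drops every write, on the assumption that newer filecmp versions may use more of the dict interface than get() (recent CPython, for instance, checks len() of the cache); since nothing is ever stored, every lookup misses. The class and test names are illustrative, and _cache is still a private detail that can change between releases. It assumes Python 3.

```python
import filecmp
import os
import tempfile
import unittest
from unittest import mock


class NoCache(dict):
    """A dict that silently discards writes, so lookups always miss."""

    def __setitem__(self, key, value):
        pass  # throw away the comparison result instead of caching it


class StaleCacheTest(unittest.TestCase):
    def setUp(self):
        # Replace filecmp._cache for this test only; patcher.stop()
        # restores the original object during cleanup.
        patcher = mock.patch.object(filecmp, "_cache", NoCache())
        patcher.start()
        self.addCleanup(patcher.stop)
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.f0 = os.path.join(tmp.name, "foo.dat")
        self.f1 = os.path.join(tmp.name, "bar.dat")

    def write(self, path, text):
        with open(path, "w") as f:
            f.write(text)

    def test_rewritten_file_is_detected(self):
        self.write(self.f0, "1:2")
        self.write(self.f1, "1:2")
        self.assertTrue(filecmp.cmp(self.f0, self.f1, shallow=False))
        self.write(self.f1, "2:3")
        self.assertFalse(filecmp.cmp(self.f0, self.f1, shallow=False))
```

Run it with python -m unittest as usual; outside the test, filecmp keeps its normal caching behaviour.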

:.:: brasse

Steve Holden · Feb 16, 2007

It would probably be a good idea to add a clear_cache() function to the
module API for 2.6 to avoid such issues.
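A public function along these lines did eventually arrive, though in Python 3.4 rather than 2.6: filecmp.clear_cache() empties the module's comparison cache. A minimal sketch of the original scenario using it, assuming Python 3.4 or later and a temporary directory in place of the working directory:

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    f0 = os.path.join(tmp, "foo.dat")
    f1 = os.path.join(tmp, "bar.dat")
    for path in (f0, f1):
        with open(path, "w") as f:
            f.write("1:2")
    first = filecmp.cmp(f0, f1, shallow=False)

    with open(f1, "w") as f:
        f.write("2:3")
    # Drop every cached result before comparing again, so a rewrite that
    # kept bar.dat's size and mtime cannot produce a stale True.
    filecmp.clear_cache()
    second = filecmp.cmp(f0, f1, shallow=False)

print(first, second)  # True False
```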

regards
Steve
