os.listdir unwanted behaviour


Chris Adamson

Hello,

I am writing code that cycles through files in a directory and for each
file it writes out another file with info in it. It appears that as I am
iterating through the list returned by os.listdir it is being updated
with the new files that are being added to the directory. This occurs
even if I reassign the list to another variable.

Here is my code:

fileList = os.listdir(temporaryDirectory)

for curFile in fileList:
    # print the file list to see if it is indeed growing
    print FileList
    fp = file(os.path.join(temporaryDirectory, "." + curFile), 'w')
    # write stuff
    fp.close()

Here is the output:

['a', 'b', 'c']
['a', 'b', 'c', '.a']
['a', 'b', 'c', '.a', '.b']
['a', 'b', 'c', '.a', '.b', '.c']

So the list is growing and eventually curFile iterates through the list
of files that were created. I don't want this to happen and it seems
like a bug because the fileList variable should be static, i.e. not
updated after being assigned.
Even if I assign fileList to another variable this still happens. Any ideas?

Chris.
 

Peter Otten

Chris said:
I am writing code that cycles through files in a directory and for each
file it writes out another file with info in it. It appears that as I am
iterating through the list returned by os.listdir it is being updated
with the new files that are being added to the directory. This occurs
even if I reassign the list to another variable.

My guess is that this has nothing to do with os.listdir():
>>> import os
>>> files = os.listdir(".")
>>> files
['b', 'a']
>>> os.system("touch c")
0
>>> files # look Ma, no automatic updates!
['b', 'a']
>>> os.listdir(".")
['b', 'c', 'a']

It is normal Python behaviour that assignment doesn't copy a list; it just
creates another reference:
>>> a = [1]
>>> b = a
>>> id(a) == id(b)
True
>>> b.append(2)
>>> a
[1, 2]

Use slicing to make an actual copy:
>>> b = a[:] # b = list(a) would work, too
>>> id(a) == id(b)
False
>>> b.append(3)
>>> a
[1, 2]
>>> b
[1, 2, 3]

Chris said:
Here is my code:

No, it's not. If you post a simplified version it is crucial that you don't
remove the parts that actually cause the undesired behaviour. In your case
there has to be a mutating operation on the list like append() or extend().

Peter
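Peter's diagnosis can be reproduced with a small sketch. It is written for Python 3 (unlike the Python 2 code in the thread), and the names `write_info_files`, `make_demo_dir` and the `buggy` flag are hypothetical; the `append()` call is an assumed stand-in for whatever mutation was left out of the posted code. With `buggy=True` the loop visits the files it just created, giving the same growing list Chris printed; without it, the list returned by `os.listdir()` never changes.

```python
import os
import tempfile


def write_info_files(directory, buggy=False):
    """Write a hidden '.name' info file for each plain file in `directory`.

    Returns the names the loop visited, in order. With buggy=True the loop
    also appends each new name to the list it is iterating over, a
    hypothetical stand-in for the mutation missing from the posted code.
    """
    file_list = sorted(os.listdir(directory))  # an ordinary list, built once
    visited = []
    for cur_file in file_list:
        visited.append(cur_file)
        if cur_file.startswith("."):
            continue  # never write info files about info files
        new_name = "." + cur_file
        with open(os.path.join(directory, new_name), "w") as fp:
            fp.write("info about %s\n" % cur_file)
        if buggy:
            # A for loop over a list re-checks the list's length on every
            # step, so an element appended here is visited later on.
            file_list.append(new_name)
    return visited


def make_demo_dir(tmp):
    """Create the empty files 'a', 'b' and 'c' used in the thread."""
    for name in ("a", "b", "c"):
        open(os.path.join(tmp, name), "w").close()


if __name__ == "__main__":
    for buggy in (True, False):
        with tempfile.TemporaryDirectory() as tmp:
            make_demo_dir(tmp)
            print(buggy, write_info_files(tmp, buggy))
    # True ['a', 'b', 'c', '.a', '.b', '.c']
    # False ['a', 'b', 'c']
```

Because `fileList` and any second name bound to it refer to the same list object (Peter's `id(a) == id(b)` point), appending through either name has this effect. Iterating over a copy such as `list(fileList)`, or collecting the new names in a separate list, avoids it.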
 

Steven D'Aprano

Chris said:
Hello,

I am writing code that cycles through files in a directory and for each
file it writes out another file with info in it. It appears that as I am
iterating through the list returned by os.listdir it is being updated
with the new files that are being added to the directory. This occurs
even if I reassign the list to another variable.

Here is my code:

fileList = os.listdir(temporaryDirectory)

for curFile in fileList:
    # print the file list to see if it is indeed growing
    print FileList
    fp = file(os.path.join(temporaryDirectory, "." + curFile), 'w')
    # write stuff
    fp.close()

Are you sure this is the code you're using? Where is FileList defined?
It's not the same as fileList.

What you describe is impossible -- os.listdir() returns an ordinary list,
it isn't a lazy iterator that updates automatically as the directory
changes. (At least not in Python2.5 -- I haven't checked Python 3.1.)
This is what happens when I try it:

>>> import os
>>> os.listdir('.')
['a', 'c', 'b']
>>> filelist = os.listdir('.')
>>> for curFile in filelist:
...     print filelist
...     fp = file(os.path.join('.', "."+curFile), 'w')
...     fp.close()
...
['a', 'c', 'b']
['a', 'c', 'b']
['a', 'c', 'b']


I think the bug is in your code -- you're probably inadvertently updating
fileList somehow.
 

Piet van Oostrum

Steven D'Aprano said:
SD> What you describe is impossible -- os.listdir() returns an ordinary list,
SD> it isn't a lazy iterator that updates automatically as the directory
SD> changes. (At least not in Python2.5 -- I haven't checked Python 3.1.)

He's not using Python3, see the print statement and the file function.
But even with the appropriate changes the behaviour will be the same in
3.1 as in 2.x.
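Piet's point can be checked with a minimal Python 3 port of the posted loop, assuming the Python 2 `print` statement becomes `print()` and the `file()` builtin becomes `open()`. The helper name `port_of_original_loop` is hypothetical; it records the list length on each pass so the result can be checked without relying on the order `os.listdir()` returns, which is arbitrary.

```python
import os
import tempfile


def port_of_original_loop(directory):
    """Python 3 version of the posted loop (hypothetical helper name)."""
    file_list = os.listdir(directory)  # a plain list, computed once
    lengths = []
    for cur_file in file_list:
        # print the file list to see if it is indeed growing
        print(file_list)
        lengths.append(len(file_list))
        with open(os.path.join(directory, "." + cur_file), "w") as fp:
            fp.write("stuff\n")  # write stuff
    return file_list, lengths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b", "c"):
            open(os.path.join(tmp, name), "w").close()
        names, lengths = port_of_original_loop(tmp)
        print(lengths)               # [3, 3, 3]
        print(len(os.listdir(tmp)))  # 6: the originals plus '.a', '.b', '.c'
```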
 

Tim Chase

Piet said:
He's not using Python3, see the print statement and the file function.
But even with the appropriate changes the behaviour will be the same in
3.1 as in 2.x.

I think Steven may be remembering the conversation here on c.l.p
a month or two back where folks were asking to turn os.listdir()
into an iterator (or create an os.xlistdir() or os.iterdir()
function) because directories with lots of files were causing
inordinate slowdown. Yes, listdir() returns a list in both 2.x and
3.x, whereas such a proposed iterator version could be
changed on the fly by interim file/directory creation.

-tkc
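An iterator of this kind was later added in Python 3.5 as `os.scandir()` (PEP 471). Its documentation leaves it unspecified whether a file added to the directory after the iterator is created will show up, so a cautious sketch copies the names into a list before writing anything. It assumes Python 3.6+ (for the `with` form), and `write_info_files_scandir` is a hypothetical helper name.

```python
import os
import tempfile


def write_info_files_scandir(directory):
    """Snapshot a lazy directory iterator before modifying the directory."""
    # Whether entries created during the scan are yielded is unspecified,
    # so finish the scan and keep the names in a list first.
    with os.scandir(directory) as entries:  # context manager: Python 3.6+
        names = sorted(e.name for e in entries
                       if e.is_file() and not e.name.startswith("."))
    # Only now create new files; `names` can no longer change.
    for name in names:
        with open(os.path.join(directory, "." + name), "w") as fp:
            fp.write("info about %s\n" % name)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b", "c"):
            open(os.path.join(tmp, name), "w").close()
        print(write_info_files_scandir(tmp))  # ['a', 'b', 'c']
```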
 

Hendrik van Rooyen

Tim said:
I think Steven may be remembering the conversation here on c.l.p
a month or two back where folks were asking to turn os.listdir()
into an iterator (or create an os.xlistdir() or os.iterdir()
function) because directories with lots of files were causing
inordinate slowdown. Yes, listdir() returns a list in both 2.x and
3.x, whereas such a proposed iterator version could be
changed on the fly by interim file/directory creation.

Is os.walk not the right thing to use for this kind of stuff?

- Hendrik
 

Tim Chase

Hendrik said:
Is os.walk not the right thing to use for this kind of stuff?

Behind the scenes os.walk() calls listdir(), which has the same
problems in directories with lots of files. But yes, I believe
there was discussion in that thread of having a generator that
behaved like os.walk() but called the proposed xlistdir() or
iterdir() function instead. However, no such beast exists yet
(in stock Python).

-tkc
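On Hendrik's question: for a single directory, one option is to take only the first triple from `os.walk()`. Its `filenames` entry is an ordinary list that is filled in before the triple is handed back, so files written afterwards do not appear in it; as Tim notes, this does nothing for speed. A sketch, assuming Python 3 and a hypothetical helper name `write_info_files_walk`:

```python
import os
import tempfile


def write_info_files_walk(directory):
    """Process only the top level of `directory` using os.walk()."""
    # next() raises StopIteration if `directory` cannot be listed,
    # because os.walk() ignores listing errors by default.
    _dirpath, _dirnames, filenames = next(os.walk(directory))
    names = sorted(n for n in filenames if not n.startswith("."))
    for name in names:
        # `filenames` was already built, so these new files don't join it.
        with open(os.path.join(directory, "." + name), "w") as fp:
            fp.write("info about %s\n" % name)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a", "b", "c"):
            open(os.path.join(tmp, name), "w").close()
        print(write_info_files_walk(tmp))  # ['a', 'b', 'c']
```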
 
