Count Files in a Directory

hokiegal99

I'm trying to count the number of files within a directory, but I
don't really understand how to go about it. This code:

for root, dirs, files in os.walk(path):
    for fname in files:
        x = str.count(fname)
        print x

Produces this error:

TypeError: count() takes at least 1 argument (0 given)

for root, dirs, files in os.walk(path):
    for fname in files:
        x = list.count(files)
        print x

TypeError: count() takes exactly one argument (0 given)

I also wondered why the error messages are inconsistent (numeric "1" vs.
"one"). I'm using Python 2.3.0.

Thanks!!!
 
John Roth

hokiegal99 said:
I'm trying to count the number of files within a directory, but I
don't really understand how to go about it. This code:

for root, dirs, files in os.walk(path):
    for fname in files:
        x = str.count(fname)
        print x

I'm not sure why you're trying to use str.count() here;
it counts occurrences of a substring within a string.

(untested)
for root, dirs, files in os.walk(path):
    print "files in '%s': '%s'" % (root, len(files))

This should give you a list of the directories
under 'path,' with the number of files (not
directories!) in each one.

hokiegal99 said:
Produces this error:

TypeError: count() takes at least 1 argument (0 given)

for root, dirs, files in os.walk(path):
    for fname in files:
        x = list.count(files)
        print x

TypeError: count() takes exactly one argument (0 given)

Also wondered why the inconsistency in error messages (numeric 1 vs.
one)??? Using 2.3.0

One function (str.count) accepts optional arguments (a start and an end
position after the substring); the other one (list.count) doesn't.
That's why the 'at least' versus the 'exactly.'
The two cases are processed slightly differently.
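
To make the difference in signatures concrete, here is a small sketch. It is written in Python 3 syntax (an assumption; the thread itself uses 2.3), and the sample strings and variable names are made up for illustration. It also shows why calling the method through the type, as in str.count(fname), leaves the substring argument missing: the first argument is taken as the string being searched.

```python
# str.count(sub[, start[, end]]): one required argument, two optional ones.
word = "banana"
all_a = word.count("a")          # search the whole string
late_a = word.count("a", 2)      # search from index 2 onward
mid_a = word.count("a", 2, 5)    # search only within word[2:5], i.e. "nan"

# Calling through the type: the first argument plays the role of the string
# itself, so the substring still has to be passed as a second argument.
via_type = str.count(word, "a")

# list.count(value): exactly one argument, nothing optional.
names = ["test", "blah", "test"]
test_hits = names.count("test")

print(all_a, late_a, mid_a, via_type, test_hits)
```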

John Roth
 
Martin v. Loewis

hokiegal99 said:
I'm trying to count the number of files within a directory, but I
don't really understand how to go about it.

You mean, you want the number of files in a directory? The number
of immediate files, or the number including files in nested
directories as well?

If you want the number of immediate files, you can use
os.listdir(path) to produce a list of files in a directory.
You can then use len(L) to compute the number of items in
a list. IOW,

print len(os.listdir(path))

does what you might want.
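
Note that os.listdir() returns subdirectory names as well as file names, so the length of its result counts both. If only regular files are wanted, a sketch along the following lines may help. It uses Python 3 syntax (an assumption; the thread uses 2.3), and the helper name count_immediate and the throwaway directory tree are hypothetical.

```python
import os
import tempfile

def count_immediate(path):
    """Return (file_count, dir_count) for the entries directly inside path."""
    names = os.listdir(path)  # file and subdirectory names, without '.' or '..'
    file_count = sum(1 for n in names if os.path.isfile(os.path.join(path, n)))
    dir_count = sum(1 for n in names if os.path.isdir(os.path.join(path, n)))
    return file_count, dir_count

# Throwaway tree: two files and one subdirectory that holds a nested file.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", "b.txt", os.path.join("sub", "nested.txt")):
        open(os.path.join(top, rel), "w").close()
    total_entries = len(os.listdir(top))   # counts a.txt, b.txt and sub
    immediate = count_immediate(top)       # the nested file is not included

print(total_entries, immediate)
```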

Regards,
Martin
 
Jay Dorsey

I'm trying to count the number of files within a directory, but I
don't really understand how to go about it. This code:

Do you want the total number of files under one directory, or
the number of files under each directory?

for root, dirs, files in os.walk(path):
    for fname in files:
        x = str.count(fname)
        print x

Produces this error:

TypeError: count() takes at least 1 argument (0 given)

str.count() is a string method used to obtain the number of
occurrences of a substring in a string (try help(str.count))
1


for root, dirs, files in os.walk(path):
    for fname in files:
        x = list.count(files)
        print x

TypeError: count() takes exactly one argument (0 given)

Also wondered why the inconsistency in error messages (numeric 1 vs.
one)??? Using 2.3.0

Similar problem here, for a list method (try help(list.count)).
>>> n = ["test", "blah", "bleh"]
>>> n.count("test")
1

In your example, fname would be an individual file name within a
directory list of files (in your example, the variable files).

What you probably want is len(), not count().

If you want the number of files in each directory try:
>>> for root, dirs, files in os.walk(path):
...     print len(files)

Or, for the total number of files:
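
A minimal sketch of one way to total the files, in Python 3 syntax (an assumption; the thread itself uses 2.3). The helper name count_files and the throwaway tree are made up for illustration.

```python
import os
import tempfile

def count_files(path):
    """Total number of files under path, nested directories included."""
    return sum(len(files) for root, dirs, files in os.walk(path))

# Throwaway tree with three files spread over three levels.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub", "deeper"))
    for rel in ("a.txt",
                os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt")):
        open(os.path.join(top, rel), "w").close()
    # One count per directory visited, then the grand total.
    per_dir = sorted(len(files) for root, dirs, files in os.walk(top))
    total = count_files(top)

print(per_dir, total)
```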

hth
 
Anand Pillai

If you are using Python 2.3, the new os.walk() function
is a better and faster choice.

Here is some sample code for counting files & sub-dirs recursively.

---------<snip>------<snip>-------------------

# dircount.py
import os

dir_count, file_count = 0, 0
for root, dirs, files in os.walk('.'):
    dir_count += len(dirs)
    file_count += len(files)

print 'Found', dir_count, 'sub-directories in cwd'
print 'Found', file_count, 'files in cwd'
print 'Found', dir_count + file_count, 'files & sub-directories in cwd'
---------<snip>------<snip>-------------------

-Anand

 
hokieghal99

Jay said:
Do you want the total number of files under one directory, or
the number of files under each directory?

Thanks for the tip... this bit of code seems to work:

def fs_object_count(path):
    file_count = 0
    dir_count = 0
    for root, dirs, files in os.walk(path):
        for fname in files:
            file_count += len(fname)
        for dname in dirs:
            dir_count += len(dname)
fs_object_count()
 
Peter Otten

hokieghal99 said:
Thanks for the tip... this bit of code seems to work:

def fs_object_count(path):
    file_count = 0
    dir_count = 0
    for root, dirs, files in os.walk(path):
        for fname in files:
            file_count += len(fname)
        for dname in dirs:
            dir_count += len(dname)
    # should return something, e. g:
    return dir_count, file_count
fs_object_count()

No! You are calculating the total number of characters. Suppose path has two
subdirectories "alpha" and "beta". Your function would calculate a
dir_count of 9 instead of 2.
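
The arithmetic can be checked in isolation. This is only an illustrative sketch in Python 3 syntax (an assumption; the thread uses 2.3), with the two directory names from the example hard-coded rather than read from disk.

```python
dirs = ["alpha", "beta"]

# What the inner loop does: len() of a name is its number of characters.
char_total = 0
for dname in dirs:
    char_total += len(dname)   # adds 5 for "alpha", then 4 for "beta"

# What was intended: len() of the list is the number of entries.
entry_total = len(dirs)

print(char_total, entry_total)
```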

Instead of the inner for loops just add the length of the dirs list which is
equal to the number of subdirectories:

#untested
def fs_object_count(path):
    file_count = 0
    dir_count = 0
    for root, dirs, files in os.walk(path):
        file_count += len(files)
        dir_count += len(dirs)
    return dir_count, file_count
fs_object_count("/home/alone/II")

Peter
 
hokieghal99

Wow!!! What an obvious error. I've drunk too much coffee this morning.
We would have caught something this bad in testing. Thanks for the
help... you've saved me a bit of embarrassment.
 
hokieghal99

def fs_object_count(setpath):
    file_count = 0
    dir_count = 0
    for root, dirs, files in os.walk(setpath):
        file_count += len(files)
        dir_count += len(dirs)
    return file_count, dir_count

I'm using this. Thanks to all for the tips and corrections! It works
very well on small sets of data (<= 5GB), where it is very accurate.
On larger (> 20GB) filesystems, the count is off by 30 to 40
objects. For example, on one 22GB filesystem the OS thinks there are
100,222 objects in the path, while Python thinks there are 100,260. Any
ideas on this difference?

Thanks!!!
 
hokieghal99

Whoops... I spoke too soon. Just caught this difference on 4.4GB of data.
The OS thinks there are 12,204 objects in the path while Python
thinks there are 12,205 objects.
 
Robin Munn

hokieghal99 said:
Woops... I spoke too soon. Just caught this difference on 4.4GB of data.
The os thinks there are 12,204 objects in the path while Python
thinks there are 12,205 objects.

I wonder if symlinks or hard links might account for the difference?

Here's an idea: modify your script to print (on stdout) the name of each
filesystem object found. Do the same with the OS's functions. Then
compare the two. I.e., on a Unix system, something like:

(Modify Python script to print full paths to stdout)
python myscript.py /some/path > python_file_list.txt
find /some/path -print > find_file_list.txt
sort python_file_list.txt > python_file_list_sorted.txt
sort find_file_list.txt > find_file_list_sorted.txt
diff -u find_file_list_sorted.txt python_file_list_sorted.txt
rm find_file_list*txt python_file_list*txt

That should tell you precisely which file is being found by the Python
script but isn't being found when you get the OS to do the work.
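
As a rough sketch of the "print full paths" step, something like the following could serve as myscript.py. It is written in Python 3 syntax (an assumption; the thread uses 2.3), and the generator name iter_objects is made up. It yields the starting directory first because find prints that entry as well. The exact path spelling may still differ from find's output in corner cases such as a trailing slash on the argument.

```python
import os
import sys

def iter_objects(top):
    """Yield top itself, then every directory and file os.walk reports under it."""
    yield top  # find /some/path -print lists the starting directory too
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            yield os.path.join(root, name)

# Usage: python myscript.py /some/path > python_file_list.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in iter_objects(sys.argv[1]):
        print(path)
```

Each object appears exactly once in the output, so after sorting, the diff should isolate whichever entries only one of the two tools reports.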
 
Jay Dorsey

Woops... I spoke too soon. Just caught this difference on 4.4GB of data.
The os thinks there are 12,204 objects in the path while Python
thinks there are 12,205 objects.

What command are you using to find out how many objects are in the path?

Also, what order do you run the commands in? Maybe there *are* 12,205 objects
in the path when you run the Python script. Maybe a tmp file is being created,
or processes are occurring that add and remove files on the system.

If you could check a non-system path (one where the OS won't read/write to) with
large amounts of data that would probably be best. What OS is this, and what's
the path that you pass in to your function?
 
Peter Otten

hokieghal99 said:
The os thinks there are 12,204 objects in the path while Python
thinks there are 12,205 objects.

If it's always one more directory then it could be the initial directory
setpath (ugly name, by the way) that is counted by the os but not by your
function. To fix it, you could initialize

dir_count = 1 # instead of 0

Peter
 
Anand Pillai

os.walk() does not follow symlinks to avoid infinite recursions.
This could be the reason for the difference in reported count.

Quoting from python docs for 2.3,

"""Note: On systems that support symbolic links, links to
subdirectories appear in dirnames lists, but walk() will not visit
them (infinite loops are hard to avoid when following symbolic links).
To visit linked directories, you can identify them with
os.path.islink(path), and invoke walk(path) on each directly."""

You can do something like this.

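#untested code!
import os

def dircount(path):
    dir_count, file_count = 0, 0
    for root, dirs, files in os.walk(path):
        dir_count += len(dirs)
        file_count += len(files)
        for dname in dirs:
            full = os.path.join(root, dname)
            # walk() lists linked directories in dirs but does not visit them.
            # Beware: a link that points back up the tree recurses forever.
            if os.path.islink(full):
                sub_dirs, sub_files = dircount(full)
                dir_count += sub_dirs
                file_count += sub_files
    return dir_count, file_count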
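
To see which entries are affected before changing any counts, a sketch like the one below lists the directory links that os.walk() reports but does not descend into. It assumes Python 3 syntax (the thread uses 2.3) and a POSIX system where os.symlink() is permitted. The helper name linked_dirs and the throwaway tree are hypothetical.

```python
import os
import tempfile

def linked_dirs(top):
    """Return the entries that os.walk lists in dirs but that are symbolic links."""
    found = []
    for root, dirs, files in os.walk(top):  # links are not followed by default
        for name in dirs:
            full = os.path.join(root, name)
            if os.path.islink(full):
                found.append(full)
    return found

# Throwaway tree: one real directory with a file, plus a symlink pointing at it.
with tempfile.TemporaryDirectory() as top:
    real = os.path.join(top, "real")
    os.mkdir(real)
    open(os.path.join(real, "f.txt"), "w").close()
    os.symlink(real, os.path.join(top, "link"))
    links = [os.path.relpath(p, top) for p in linked_dirs(top)]
    # Directories actually visited: the link is listed but never entered.
    visited = sorted(os.path.relpath(r, top) for r, d, f in os.walk(top))

print(links, visited)
```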

HTH.

-Anand
 
hokieghal99

Robin said:
I wonder if symlinks or hard links might account for the difference?

Here's an idea: modify your script to print (on stdout) the name of each
filesystem object found. Do the same with the OS's functions. Then
compare the two. I.e., on a Unix system, something like:

(Modify Python script to print full paths to stdout)
python myscript.py /some/path > python_file_list.txt
find /some/path -print > find_file_list.txt
sort python_file_list.txt > python_file_list_sorted.txt
sort find_file_list.txt > find_file_list_sorted.txt
diff -u find_file_list_sorted.txt python_file_list_sorted.txt
rm find_file_list*txt python_file_list*txt

That should tell you precisely which file is being found by the Python
script but isn't being found when you get the OS to do the work.

This is very good. I will test it out today. Not sure exactly where the
difference is coming from. I noticed last night that when the script is
run as root, Python counts many more files than the OS does... maybe
it is a permission issue too. Thanks again!!!
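
One way to test the permissions theory: by default os.walk() silently skips any directory it cannot list, but its onerror argument receives the resulting OSError. The sketch below collects those errors alongside the counts. It uses Python 3 syntax (an assumption; the thread uses 2.3), and the function name count_with_errors and the demo path are made up. The demo walks a path assumed not to exist, simply to trigger an error without needing special permissions.

```python
import os

def count_with_errors(top):
    """Return (file_count, dir_count, errors) for the tree under top."""
    errors = []
    file_count = dir_count = 0
    # Without onerror, directories that cannot be listed are skipped quietly.
    for root, dirs, files in os.walk(top, onerror=errors.append):
        file_count += len(files)
        dir_count += len(dirs)
    return file_count, dir_count, errors

# Hypothetical path that is assumed not to exist, so listing it fails.
result = count_with_errors(os.path.join(os.curdir, "no-such-directory-for-demo"))
print(result[0], result[1], len(result[2]))
```

A non-empty errors list for a run as a normal user, and an empty one for a run as root, would point at unreadable directories as the source of the difference.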
 
