How To Do It Faster?!?

andrea_gavana

Hello Jeremy & NG,
* Poke around in the Windows API for a function that does what you want,
and hope it can do it faster due to being in the kernel.

I could try it, but I think I have to explain my problem a little more.
If you post more information about how you are using this data, I can try
to help you.

Basically, I have to scan a really BIG directory: essentially, it is a UNIX
file system where all our projects reside, with thousands and thousands of
files and more than 1 TB of data. About 200-300 of us share this space.
This is what I do now and would like to improve:

1) For a particular user (1 and only 1 at a time), I would like to scan
all directories and subdirectories in order to find which FILES are owned
by this user (I am NOT interested in directory owners, only files). Note
that I am searching for only 1 user, whose disk quota is around 20-30 GB,
or something like this;
2) My application is a GUI designed with wxPython. It runs on Windows at
the moment (this is why I am asking about Windows user IDs and similar; on
Unix it is much simpler);
3) While scanning the directories (using os.walk), I process the results
of my command "dir /q /-c /a-d MyDirectory" and I display these results in
a wxListCtrl (a list viewer) of wxPython in my GUI;
4) I would rather not use the suggested command "dir /S" in a DOS shell because,
even though it scans all directories recursively, I am NOT able to process intermediate
results: the command does not return until it has finished scanning
ALL directories (and for 1 TB of files, that can take a LOT of time);
5) For all the files in each directory scanned, I do:
   - IF a file belongs to that particular user THEN:
       Get the file name;
       Get the file size;
       Get the last modification date;
       Display the result on my wxListCtrl
   - ELSE:
       Disregard the information;
   - END

I get the file owner using the /Q switch of the DIR command, and I exclude
the subdirectories a priori using the /a-d switch. That is because I am using
os.walk().
6) All of our users can see this big UNIX directory on their PCs, mapped
as E:\ or F:\ or whatever. However, I cannot use UNIX commands from DOS, and
I cannot use rsh to communicate with the UNIX machine and then run something
like "find . -name etc".
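
For reference, the scan described in the steps above could be sketched roughly as follows. This is only a minimal illustration, not the actual application code: the names files_owned_by and owner_of are made up for the example, and owner_of stands in for whatever lookup is really used (for instance, parsing the output of "dir /q"). It is written as a generator so that each match can be shown as soon as it is found, instead of waiting for the whole tree.

```python
import os


def files_owned_by(root, user, owner_of):
    """Yield (path, size_in_bytes, mtime) for each file under root owned by user.

    owner_of is a caller-supplied function that maps a file path to an owner
    name. It is a placeholder (an assumption of this sketch), not a real API.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:          # files only; directory owners are ignored
            path = os.path.join(dirpath, name)
            try:
                if owner_of(path) != user:
                    continue            # ELSE branch: disregard the information
                info = os.stat(path)
            except OSError:
                continue                # file vanished or could not be read
            yield path, info.st_size, info.st_mtime
```

Each tuple it yields maps to one row of the list control. In a GUI the loop would probably have to run in a worker thread and hand rows back to the main thread (for example with wx.CallAfter), so the window stays responsive during a long scan.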

I hope to have been clearer this time...

I really welcome all your suggestions.

Andrea.
 
Jeremy Bowers

Hello Jeremy & NG,
...
I hope to have been clearer this time...

I really welcome all your suggestions.

Yes, clearer, though I still don't know what you're *doing* with that data :)

Here's an idea to sort of come at the problem from a different angle. Can
you run something on the file server itself, and use RPC to access it?

The reason I mention this is a lot of UNIXes have an API to detect file
changes live; for instance, google "python fam". It would be easy to hook
something up to scan the files at startup and maintain your totals live,
and then use one of the many extremely easy Python RPC mechanisms to
request the data as the user wants it, which would most likely come back
at network speeds (fast).

This would be orders of magnitude faster, and no scanning system could
compete with it.
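
To make the idea a bit more concrete, here is a rough sketch of the server side, assuming you are allowed to run a Python process on the UNIX file server. It walks the tree once, groups files by numeric owner id, and exposes the result over XML-RPC from the standard library. The names build_index, make_server and files_for_uid, and the port number, are made up for illustration, and the live-update part (FAM or similar) is left out entirely.

```python
import os
from collections import defaultdict
from xmlrpc.server import SimpleXMLRPCServer


def build_index(root):
    """Walk root once and group non-directory entries by owner id (st_uid)."""
    index = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)   # lstat: do not follow symlinks
            except OSError:
                continue                # entry vanished or is unreadable
            # XML-RPC integers are limited to 32 bits, so the size is sent
            # as a string to survive files larger than 2 GB.
            index[info.st_uid].append((path, str(info.st_size), info.st_mtime))
    return index


def make_server(index, host="0.0.0.0", port=8000):
    """Return an XML-RPC server exposing files_for_uid(uid) (hypothetical name)."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(lambda uid: index.get(uid, []), "files_for_uid")
    return server
```

On the server you would call make_server(build_index("/projects")).serve_forever(), where "/projects" is a placeholder path. The Windows client would then ask for one user's files with xmlrpc.client.ServerProxy("http://fileserver:8000").files_for_uid(uid), where the host name is again a placeholder. The client only transfers the rows for that one user, instead of walking 1 TB over the network share.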
 
