How To Do It Faster?!?

andrea_gavana

Hello Jeremy & NG,
Yes, clearer, though I still don't know what you're *doing* with that data :)

Every user of this big directory works on big studies regarding oil fields.
Knowing the amount of data (and number of files) we have to deal with (produced
by simulators, visualization tools, and so on), and knowing that users are
usually lazy about cleaning up unused/old files, this is a way for one of us
to quickly scan all the directories and identify which files belong to him.
Having them in an organized, size-sorted wxPython list, the user can decide
whether he wants to delete some files (which he almost surely forgot even
existed...) or not. It is as easy as a button click (retrieve the
data --> delete the files).
Here's an idea to sort of come at the problem from a different angle. Can
you run something on the file server itself, and use RPC to access it?

I don't even know what RPC is... I'll have to look at it.

The reason I mention this is a lot of UNIXes have an API to detect file
changes live; for instance, google "python fam". It would be easy to hook
something up to scan the files at startup and maintain your totals live,
and then use one of the many extremely easy Python RPC mechanisms to
request the data as the user wants it, which would most likely come back
at network speeds (fast).

I am not sure if my new explanation fits with your last information... as
above, I didn't even know about FAM... I've read a little, but I am probably
too much of a newbie to see the link between it and my goal. Do you think
there is one? It would be nice to have something that tracks the file status
on the whole file system, but it is probably a LOT of work compared to what
my app should be able to do.
Anyway, thanks for the hints! If my new explanation has changed anything, can
anyone post some more comments?

Thanks to you all.

Andrea.
 
Jeremy Bowers

Hello Jeremy & NG,
Every user of this big directory works on big studies regarding oil fields.
Knowing the amount of data (and number of files) we have to deal with (produced
by simulators, visualization tools, and so on), and knowing that users are
usually lazy about cleaning up unused/old files, this is a way for one of us
to quickly scan all the directories and identify which files belong to him.
Having them in an organized, size-sorted wxPython list, the user can decide
whether he wants to delete some files (which he almost surely forgot even
existed...) or not. It is as easy as a button click (retrieve the
data --> delete the files).

Got it. A good idea!
I don't even know what RPC is... I'll have to look at it.

RPC stands for "remote procedure call". The idea is that you do something
that looks like a normal function call, except it happens on a remote
server. Complexity varies widely.

Given your situation, and if running something on the UNIX server is a
possibility, I'd recommend downloading and playing with Pyro; it is
Python-specific, so I think it would be the best thing for you, being
powerful, well integrated with Python, and easy to use.

Then, on your client machine in Windows, ultimately you'd make some sort
of call to your server like

fileList = server.getFileList(user)

and you'd get the file list for that user, returning whatever you want for
your app; a list of tuples, objects, whatever you want. Pyro will add no
constraints to your app.
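
To make that concrete, here is a minimal sketch assuming the classic Pyro
3.x API (later Pyro releases differ); FileServer, scan_results and
"unixhost" are placeholder names of mine, not anything Pyro provides:

# server.py -- runs on the UNIX file server (classic Pyro 3.x API)
import Pyro.core

scan_results = {}  # user -> list of (path, size, mtime), filled by your scan

class FileServer(Pyro.core.ObjBase):
    def getFileList(self, user):
        # looks like a local call on the client, but executes here
        return scan_results.get(user, [])

Pyro.core.initServer()
daemon = Pyro.core.Daemon()
daemon.connect(FileServer(), "fileserver")  # register under a name
daemon.requestLoop()                        # serve requests forever

# client.py -- runs on your Windows box
import Pyro.core

# 7766 is Pyro 3's default port; "unixhost" is your server's hostname
server = Pyro.core.getProxyForURI("PYROLOC://unixhost:7766/fileserver")
fileList = server.getFileList("andrea")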
I am not sure if my new explanation fits with your last information... as
above, I didn't even know about FAM... I've read a little, but I am probably
too much of a newbie to see the link between it and my goal. Do you think
there is one? It would be nice to have something that tracks the file status
on the whole file system, but it is probably a LOT of work compared to what
my app should be able to do.

Maybe, maybe not. I've never used FAM. Perhaps someone who has can chime
in about the ease of use; I've changed the subject to try to attract such
a person. It also depends on whether FAM works on your UNIX.

My point is that you can do one scan at startup (can't avoid this), but
then as the file system monitor tells you that a change has occurred, you
update your data structures to account for the change. That way, your data
is always in sync. (For safety's sake, you might set the server to
terminate itself and re-start every night.) Since it's always in sync, you
can send this data back instead of scanning the file system.
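
To sketch what I mean (untested, since as I said I've never used FAM; the
_fam calls below are taken from the python-fam binding's docs, so treat the
details as assumptions):

import os, pwd, _fam

sizes = {}   # path -> (owner, size), seeded by the one-off startup scan
totals = {}  # owner -> total bytes; this is what the server hands back

def apply_change(path, code):
    # drop the old contribution, then re-add it if the file still exists
    if path in sizes:
        owner, size = sizes.pop(path)
        totals[owner] = totals.get(owner, 0) - size
    if code != "deleted" and os.path.isfile(path):
        st = os.stat(path)
        owner = pwd.getpwuid(st.st_uid).pw_name
        sizes[path] = (owner, st.st_size)
        totals[owner] = totals.get(owner, 0) + st.st_size

fm = _fam.open()
# FAM watches one directory per monitor; real code would add a monitor
# for every subdirectory found during the startup scan
fm.monitorDirectory("/bigdir", None)   # "/bigdir" is a placeholder path
while True:
    ev = fm.nextEvent()    # blocks until famd reports a change
    code = ev.code2str()   # "created", "changed", "deleted", ...
    if code in ("created", "changed", "deleted"):
        apply_change(os.path.join("/bigdir", ev.filename), code)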

At this point, my suggestion would be to consider whether you want to
spend the effort to speed it up like this, which is something only you
(and presumably your managers) are in a position to know, given that you
have an existing tool (at least, you speak as if you have a functional
one). If you do, then I'd take some time and work a bit with
Pyro and FAM, and *then* re-evaluate where you stand. By then you'll
probably be able to ask better questions, too, and like I said above,
perhaps someone will share their experiences with FAM.

Good luck, and have fun; seriously, that's important here.
 
Simo Melenius

Every user of this big directory works on big studies regarding oil
fields. Knowing the amount of data (and number of files) we have to
deal with (produced by simulators, visualization tools, and so on),
and knowing that users are usually lazy about cleaning up unused/old
files, this is a way for one of us to quickly scan all the
directories and identify which files belong to him. Having them in
an organized, size-sorted wxPython list, the user can decide whether
he wants to delete some files (which he almost surely forgot even
existed...) or not. It is as easy as a button click (retrieve the
data --> delete the files).

Correct me if I'm wrong, but since it _seems_ that the listing doesn't
need to be up to date to the minute or hour, as the users will be looking
primarily for old/unused files, why not have a daily cronjob on the
Unix server produce an appropriate file list in e.g. the root
directory of your file server?

Your Python client would then load that (possibly compressed) text
file from the network share and find the needed bits in there.

Note that if some "old/unneeded" files are missing today, they'll show
right up the following day.

For example, running the GNU find command like this:

$ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt

produces a file where each line contains the last modified time,
username, size and path for one file. Dead easy to parse with Python,
and you'll only have to set up the cronjob _once_ on the Unix server.

(If the file becomes too big, grep can be additionally used to split
the file e.g. per each user.)
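
Parsing it is then only a few lines; the share path and username below are
placeholders:

# load the nightly listing produced by the find command above
results = []
for line in open(r"\\yourserver\share\files.txt"):
    # each line: "<mtime> <user> <size> <path, which may contain spaces>"
    mtime, user, size, path = line.rstrip("\n").split(" ", 3)
    results.append((float(mtime), user, int(size), path))

# e.g. one user's files, biggest first, ready for the wxPython list
mine = sorted((r for r in results if r[1] == "andrea"),
              key=lambda r: r[2], reverse=True)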


br,
S
 
