How to structure a perl program to include and exclude files?

Henry Law

I'm implementing this in Perl but I recognise that there's a strong
element of language independent program design in the question. Hope
there's enough perlishness to keep me afloat.

I am writing a Perl program which will process a file tree and allow
the user to specify which directories and subdirectories are to be
included or excluded. (Anyone who uses xxcopy in Win will know
immediately what I mean). I plan to have the users describe the files
to include and exclude by means of strict Perl regex's. So a control
file might look something like this

include /a # Do files in a and all subdirs
exclude a/b/~?temp\d* # Except for temp files in a/b

.... and so on. I haven't worked out the full grammar yet (do I allow
indefinite series of include..exclude..include? I don't know). But
I'm having more trouble with conceptualising how to write the program
in Perl. Current idea is to write a recursive function to process all
the files in a single directory, calling itself for sub-directories.
It would slurp in the control file regex's and sort them
alphabetically into two arrays, one for "include" and one for
"exclude", and then implement logic like

my $do_this_one = 0;
foreach my $regex (@includes) {
    if ($current_file =~ $regex) {
        $do_this_one = 1;
    }
}
foreach my $regex (@excludes) {
    if ($current_file =~ $regex) {
        $do_this_one = 0;
    }
}
do_the_stuff() if $do_this_one;

But doing that lot for every file looks very laborious; for example if
the control file is a simple "include /a and all subdirectories" then
I don't want to look at the regex more than once. And it's not very
Perl-ish either, come to that.

Questions:
(1) Is there a module that will help me? Or some code that I could
copy?
(2) If not, is there a better way of structuring the do-we-do-this-one
logic to make it more elegant and efficient?

Henry Law <>< Manchester, England
 
Gunnar Hjalmarsson

Henry said:
I am writing a Perl program which will process a file tree and
allow the user to specify which directories and subdirectories are
to be included or excluded. (Anyone who uses xxcopy in Win will
know immediately what I mean). I plan to have the users describe
the files to include and exclude by means of strict Perl regex's.
So a control file might look something like this

include /a # Do files in a and all subdirs
exclude a/b/~?temp\d* # Except for temp files in a/b

... and so on. I haven't worked out the full grammar yet (do I
allow indefinite series of include..exclude..include? I don't
know). But I'm having more trouble with conceptualising how to
write the program in Perl. Current idea is to write a recursive
function to process all the files in a single directory, calling
itself for sub-directories.

Why not use File::Find?

use File::Find 'find';

my @found;
find(
    sub {
        # $File::Find::name holds the full path; $_ is only the basename.
        my $name = $File::Find::name;
        push @found, $name if $name =~ /$include/ and $name !~ /$exclude/;
    },
    $path
);
 
Kan Yabumoto

Henry Law said:
I'm implementing this in Perl but I recognise that there's a strong
element of language independent program design in the question. Hope
there's enough perlishness to keep me afloat.

I am writing a Perl program which will process a file tree and allow
the user to specify which directories and subdirectories are to be
included or excluded. (Anyone who uses xxcopy in Win will know
immediately what I mean). I plan to have the users describe the files
to include and exclude by means of strict Perl regex's. So a control
file might look something like this

include /a # Do files in a and all subdirs
exclude a/b/~?temp\d* # Except for temp files in a/b

... and so on. I haven't worked out the full grammar yet (do I allow
indefinite series of include..exclude..include? I don't know). But
I'm having more trouble with conceptualising how to write the program
in Perl. Current idea is to write a recursive function to process all
the files in a single directory, calling itself for sub-directories.

Henry,

I think I can give you some advice on this issue since I've
been thinking about it for many years.

Even though I'm extremely knowledgeable about XXCOPY, I'm not
sure exactly what you are trying to do. Are you trying to create
a perl script so that something similar to XXCOPY can be made
available in Linux (or other) environments?

Currently, XXCOPY's support for inclusion is very limited:
it accepts only variations in the "last name" (e.g.,
/IN:*.mp3 /IN:*.doc /IN:abc*). Other than this exception,
XXCOPY's file-selection mechanisms are all exclusive in nature.
There is good reason for this design. Exclusion specifiers
(in the form of date-range specifications, and filesize-specifications
in addition to file/directory pattern specifications) can all
be treated in an additive manner. As long as the file-selection
parameters (switches in XXCOPY command line) are exclusive
in nature, both the implementation and user-understanding
are very easy. Similar or dissimilar file-selection switches
won't contradict each other. They can overlap (some files
can be excluded for two or more reasons).
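The additive property can be illustrated with a short sketch. This is not XXCOPY's code: the filter list, the record fields (name, size, mtime) and the is_excluded helper are all hypothetical. Each filter only ever says "exclude", so the filters can be applied in any order and can overlap without contradicting each other.

```perl
use strict;
use warnings;

# Each filter returns true when a file should be EXCLUDED.
# The record fields (name, size, mtime) are hypothetical.
my @exclusion_filters = (
    sub { $_[0]{name} =~ /\.tmp$/i },      # pattern-based exclusion
    sub { $_[0]{size} > 1_000_000 },       # size-based exclusion
    sub { $_[0]{mtime} < 946_684_800 },    # date-based: before 2000-01-01 UTC
);

# A file is excluded as soon as any one filter rejects it;
# the order of the filters does not affect the result.
sub is_excluded {
    my ($file, @filters) = @_;
    for my $filter (@filters) {
        return 1 if $filter->($file);
    }
    return 0;
}
```

A file that is both too large and matches the name pattern is simply excluded "for two reasons"; no precedence rule is needed.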

On the other hand, if you design command rules that allow
both exclusion and inclusion, you really have to
decide which one will have precedence over the other,
since they are contradictory in nature (not only in the
definition of the command rules, but also for user understanding).

I think it is helpful to put what you are trying to do
into plain English. If you can express what you (the user)
want to do and how you (the programmer) will implement and
document the program actions in plain English with clarity,
you may proceed. But if you are confused about what you are
trying to achieve, you can't program it regardless of the
language you choose.

Let me go back to how XXCOPY presents its capability with
regard to the inclusion and exclusion. The truth is that
the inclusion feature in XXCOPY is really an exclusion
operation in disguise.

1. If there is no inclusion switch (/IN:...), XXCOPY will
not exclude anything.

xxcopy \src_dir\ ...

This is equivalent to

xxcopy \src_dir\*

Which is really

xxcopy \src_dir\ /IN:*

2. If the source specifier contains the lastname pattern,

xxcopy \src_dir\*.mp3

This is equivalent to

xxcopy \src_dir\ /X:(everything except *.mp3)

3. If the command contains two or more inclusion specifiers

xxcopy \src_dir\ /IN:*.mp3 /IN:*.jpg

This is equivalent to

xxcopy \src_dir\ /X:(everything except *.mp3 and *.jpg)

-------------

The above examples illustrate how XXCOPY transforms the
inclusion specifiers into exclusion actions inside.
As a matter of fact, date-specifier, size-specifier and
all other forms of file-selection mechanisms are treated
as exclusionary actions which can easily be implemented
as "filters" here and there inside the program. Since
exclusion actions can be applied repeatedly without
concern for precedence, etc., the implementation is
quite simple and the documentation is also straightforward.
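To illustrate the three cases above in Perl terms, here is a rough sketch of turning a set of inclusion patterns into a single exclusion filter. It is only an illustration of the idea, not how XXCOPY is written; glob_to_regex and exclusion_from_includes are hypothetical names, and the sketch assumes simple wildcards (* and ?) matched case-insensitively against the last name only.

```perl
use strict;
use warnings;

# Convert a simple wildcard pattern (* and ?) into an anchored regex.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;    # escape everything, including * and ?
    $re =~ s/\\\*/.*/g;          # escaped * becomes "any run of characters"
    $re =~ s/\\\?/./g;           # escaped ? becomes "any one character"
    return qr/\A$re\z/i;
}

# Build ONE exclusion filter from zero or more inclusion patterns:
# "exclude everything except names matching one of the patterns".
sub exclusion_from_includes {
    my @includes = map { glob_to_regex($_) } @_;
    # Case 1: no inclusion pattern means nothing is excluded.
    return sub { return 0 } unless @includes;
    # Cases 2 and 3: exclude whatever matches none of the patterns.
    return sub {
        my ($lastname) = @_;
        for my $re (@includes) {
            return 0 if $lastname =~ $re;
        }
        return 1;
    };
}
```

The returned code reference can then sit alongside the date, size and pattern filters as just another exclusion.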

The reason why XXCOPY does not support such a simple thing
as a "list of filenames to process" in a text file
is that it is really an unrestricted form of inclusion
operation. This may not go well with XXCOPY's one-source,
one-destination view of file management operations.

In the future, we plan to implement a full inclusion
feature (even an "inclusion list" supplied as a text file)
in XXCOPY. When we do support such a feature, we plan
to resolve the inclusion-exclusion precedence as follows:

1. First, gather all inclusion-specifiers (list of files and
directories) and define what will be
included (this can even be thought of as an exclusion
list in reverse).

2. Next, apply all other (exclusionary) specifiers.

This will give the exclusion specifiers the precedence.
Note that the precedence in this context does not mean
which one will be evaluated first. Rather, the last
one to be evaluated will prevail (have the lasting effect).
Therefore, in this case, the exclusion specifiers will
have overriding power over the inclusion specifiers.
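The two-step rule can be sketched in a few lines of Perl. This is only an illustration under assumptions: matches_any and select_files are hypothetical names, and the patterns are assumed to be precompiled regexes (qr//). The early return in matches_any also avoids testing the remaining patterns once one has matched.

```perl
use strict;
use warnings;

# True if $name matches at least one of the given regexes;
# stops at the first match.
sub matches_any {
    my ($name, @patterns) = @_;
    for my $re (@patterns) {
        return 1 if $name =~ $re;
    }
    return 0;
}

# Step 1: gather everything the inclusion patterns select.
# Step 2: apply the exclusion patterns last, so they prevail.
sub select_files {
    my ($files, $includes, $excludes) = @_;
    my @included = grep { matches_any($_, @$includes) } @$files;
    return grep { !matches_any($_, @$excludes) } @included;
}

# Hypothetical sample data.
my @candidates = qw(a/song.mp3 a/pic.jpg a/notes.txt a/tmp/old.mp3);
my @includes   = (qr/\.mp3$/, qr/\.jpg$/);
my @excludes   = (qr{/tmp/});

print "$_\n" for select_files(\@candidates, \@includes, \@excludes);
```

Here a/tmp/old.mp3 matches an inclusion pattern but is still dropped, because the exclusion step runs last.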

Here, I think the rules are clear. When exclusion
and inclusion are mixed, unless you simplify the way
they are treated, the user will be totally confused,
you, the designer, will be confused, and you will not
have a working program whose behavior makes sense
to anyone.

I'm not necessarily offering this idea as advice
for making a product for sale, which requires formal
documentation. Even if this project is for your own
personal use, you as the programmer and you as the
user have to come to a clear understanding. When you
start talking about "recursion" in the design of
inclusion and exclusion, I think you are clouding your
thoughts. Give one of the two unconditional
precedence over the other. Otherwise, you may never make
something concrete out of your nebulous idea.

Kan Yabumoto,
The author of XXCopy
 
Henry Law

Kan Yabumoto said:
Even though I'm extremely knowledgeable about XXCOPY, I'm not
[...]
Kan Yabumoto,
The author of XXCopy

Isn't usenet wonderful; I cite one of my favourite programs as an
example, and the author reads my post and gives me advice! Kan,
you're absolutely right that I haven't got the basic functions clear
in my mind: I need to think more about that. Your description of how
you do your includes and excludes is very helpful; I had sort of got
to the point where I recognised that includes and excludes can't go on
indefinitely.

But this has now become positively off-topic so I'll leave it at that.
To write more would be to risk Anno's or Tad's hand appearing out of
the monitor in 3D and hitting me on the nose. (With justification ...)

Henry Law <>< Manchester, England
 
