How to structure a perl program to include and exclude files?

Discussion in 'Perl Misc' started by Henry Law, Jul 18, 2004.

  1. Henry Law

    Henry Law Guest

    I'm implementing this in Perl but I recognise that there's a strong
    element of language independent program design in the question. Hope
    there's enough perlishness to keep me afloat.

    I am writing a Perl program which will process a file tree and allow
    the user to specify which directories and subdirectories are to be
    included or excluded. (Anyone who uses xxcopy in Win will know
    immediately what I mean). I plan to have the users describe the files
    to include and exclude by means of strict Perl regex's. So a control
    file might look something like this

    include /a # Do files in a and all subdirs
    exclude a/b/~?temp\d* # Except for temp files in a/b

    .... and so on. I haven't worked out the full grammar yet (do I allow
    indefinite series of include..exclude..include? I don't know). But
    I'm having more trouble with conceptualising how to write the program
    in Perl. Current idea is to write a recursive function to process all
    the files in a single directory, calling itself for sub-directories.
    It would slurp in the control file regex's and sort them
    alphabetically into two arrays, one for "include" and one for
    "exclude", and then implement logic like

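    my $do_this_one = 0;
    # Any matching include regex selects the file ...
    foreach my $regex (@includes) {
        if ($current_file =~ $regex) {
            $do_this_one = 1;
        }
    }
    # ... and any matching exclude regex then deselects it again.
    foreach my $regex (@excludes) {
        if ($current_file =~ $regex) {
            $do_this_one = 0;
        }
    }
    do_the_stuff() if $do_this_one;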

    But doing that lot for every file looks very laborious; for example if
    the control file is a simple "include /a and all subdirectories" then
    I don't want to look at the regex more than once. And it's not very
    Perl-ish either, come to that.
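
    One way to avoid re-reading the rules for every file is to compile
    each control-file regex once, up front. The following is only a
    sketch: parse_rules is a hypothetical helper name, and it assumes
    that whitespace followed by "#" starts a comment, so a pattern
    cannot itself contain " #".

```perl
use strict;
use warnings;

# Hypothetical helper: turn control-file lines such as
#   include /a              # Do files in a and all subdirs
#   exclude a/b/~?temp\d*   # Except for temp files in a/b
# into two lists of precompiled regexes.
sub parse_rules {
    my @lines = @_;
    my %rules = ( include => [], exclude => [] );
    for my $line (@lines) {
        $line =~ s/(?:^|\s+)#.*//;       # strip a comment (assumed syntax)
        next unless $line =~ /\S/;       # skip blank lines
        my ($verb, $pattern) = $line =~ /^\s*(include|exclude)\s+(\S+)\s*$/
            or die "Bad control line: $line\n";
        push @{ $rules{$verb} }, qr/$pattern/;   # compile once, reuse per file
    }
    return \%rules;
}

my $rules = parse_rules(
    'include /a # Do files in a and all subdirs',
    'exclude a/b/~?temp\d* # Except for temp files in a/b',
);
printf "%d include rule(s), %d exclude rule(s)\n",
    scalar @{ $rules->{include} }, scalar @{ $rules->{exclude} };
```

    The two array references can then be handed to whatever code walks
    the tree, so the control file is read and compiled only once.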

    Questions:
    (1) Is there a module that will help me? Or some code that I could
    copy?
    (2) If not, is there a better way of structuring the do-we-do-this-one
    logic to make it more elegant and efficient?

    Henry Law <>< Manchester, England
     
    Henry Law, Jul 18, 2004
    #1

  2. Henry Law wrote:
    > I am writing a Perl program which will process a file tree and
    > allow the user to specify which directories and subdirectories are
    > to be included or excluded. (Anyone who uses xxcopy in Win will
    > know immediately what I mean). I plan to have the users describe
    > the files to include and exclude by means of strict Perl regex's.
    > So a control file might look something like this
    >
    > include /a # Do files in a and all subdirs
    > exclude a/b/~?temp\d* # Except for temp files in a/b
    >
    > ... and so on. I haven't worked out the full grammar yet (do I
    > allow indefinite series of include..exclude..include? I don't
    > know). But I'm having more trouble with conceptualising how to
    > write the program in Perl. Current idea is to write a recursive
    > function to process all the files in a single directory, calling
    > itself for sub-directories.


    Why not use File::Find?

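    use File::Find 'find';

    # $include, $exclude and $path are assumed to be set beforehand
    my @found;
    find (
        sub {
            # match against the full name, not just the basename
            local $_ = $File::Find::name;
            push @found, $_ if /$include/ and !/$exclude/;
        }, $path
    );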
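
    The same idea can be stretched to lists of regexes. What follows
    is a sketch rather than a tested tool: select_files is a
    hypothetical name, and it assumes that excluding a directory
    excludes its whole subtree, which lets File::Find skip that
    subtree instead of testing every file in it. The demo tree is
    built in a temporary directory purely for illustration.

```perl
use strict;
use warnings;
use File::Find 'find';
use File::Temp 'tempdir';
use File::Path 'make_path';

# Hypothetical helper: return the files under $path whose full name matches
# at least one include regex and no exclude regex.
sub select_files {
    my ($path, $includes, $excludes) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # paths stay valid relative to the start dir
            wanted   => sub {
                my $name = $File::Find::name;
                if (-d $name) {
                    # Assumption: an excluded directory is skipped
                    # together with everything below it.
                    $File::Find::prune = 1 if grep { $name =~ $_ } @$excludes;
                    return;
                }
                return if grep { $name =~ $_ } @$excludes;
                push @found, $name if grep { $name =~ $_ } @$includes;
            },
        },
        $path
    );
    my @sorted = sort @found;
    return @sorted;
}

# Build a small throwaway tree and run the selection over it.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b", "$root/c");
for my $file ('a/keep.txt', 'a/b/temp1', 'a/b/data.txt', 'c/other.txt') {
    open my $fh, '>', "$root/$file" or die "Cannot create $file: $!";
    close $fh;
}
my @picked = select_files($root, [qr{/a/}], [qr{a/b/~?temp\d*}]);
print "$_\n" for @picked;
```

    Note that pruning changes the result for anchored excludes such
    as \.bak$ when a directory name happens to match, so drop the
    prune line if directories should not be treated that way.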

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 18, 2004
    #2

  3. Kan Yabumoto

    Kan Yabumoto Guest

    Henry Law <> wrote in message news:<>...
    > I'm implementing this in Perl but I recognise that there's a strong
    > element of language independent program design in the question. Hope
    > there's enough perlishness to keep me afloat.
    >
    > I am writing a Perl program which will process a file tree and allow
    > the user to specify which directories and subdirectories are to be
    > included or excluded. (Anyone who uses xxcopy in Win will know
    > immediately what I mean). I plan to have the users describe the files
    > to include and exclude by means of strict Perl regex's. So a control
    > file might look something like this
    >
    > include /a # Do files in a and all subdirs
    > exclude a/b/~?temp\d* # Except for temp files in a/b
    >
    > ... and so on. I haven't worked out the full grammar yet (do I allow
    > indefinite series of include..exclude..include? I don't know). But
    > I'm having more trouble with conceptualising how to write the program
    > in Perl. Current idea is to write a recursive function to process all
    > the files in a single directory, calling itself for sub-directories.


    Henry,

    I think I can give you some advice on this issue, since I've
    been thinking about it for many years.

    Even though I'm extremely knowledgeable about XXCOPY, I'm not
    sure exactly what you are trying to do. Are you trying to create
    a perl script so that something similar to XXCOPY can be made
    available in Linux (or other) environments?

    Currently, XXCOPY's support for inclusion is very limited: it
    accepts only variations in the "last name" (e.g.,
    /IN:*.mp3 /IN:*.doc /IN:abc*). Other than this exception,
    XXCOPY's file-selection mechanisms are all exclusive in nature.
    There is good reason for this design. Exclusion specifiers
    (in the form of date-range specifications, and filesize-specifications
    in addition to file/directory pattern specifications) can all
    be treated in an additive manner. As long as the file-selection
    parameters (switches in XXCOPY command line) are exclusive
    in nature, both the implementation and user-understanding
    are very easy. Similar or dissimilar file-selection switches
    won't contradict each other. They can overlap (some files
    can be excluded for two or more reasons).

    On the other hand, if you design command rules that allow
    both exclusion and inclusion, you really have to decide
    which one will take precedence over the other, since they
    are contradictory in nature (not only in the definition of
    the command rules, but also for user understanding).

    I think it is helpful to put what you are trying to do
    into plain English. If you can express what you (the user)
    want to do and how you (the programmer) will implement and
    document the program actions in plain English with clarity,
    you may proceed. But if you are confused about what you are
    trying to achieve, you can't program it regardless of the
    language you choose.

    Let me go back to how XXCOPY presents its capability with
    regard to the inclusion and exclusion. The truth is that
    the inclusion feature in XXCOPY is really an exclusion
    operation in disguise.

    1. If there is no inclusion switch (/IN:...), XXCOPY will
    not exclude anything.

    xxcopy \src_dir\ ...

    This is equivalent to

    xxcopy \src_dir\*

    Which is really

    xxcopy \src_dir\ /IN:*

    2. If the source specifier contains the lastname pattern,

    xxcopy \src_dir\*.mp3

    This is equivalent to

    xxcopy \src_dir\ /X:(everything except *.mp3)

    3. If the command contains two or more inclusion specifiers

    xxcopy \src_dir\ /IN:*.mp3 /IN:*.jpg

    This is equivalent to

    xxcopy \src_dir\ /X:(everything except *.mp3 and *.jpg)

    -------------

    The above examples illustrate how XXCOPY transforms the
    inclusion specifiers into exclusion actions internally.
    As a matter of fact, date specifiers, size specifiers and
    all other forms of file-selection mechanisms are treated
    as exclusionary actions, which can easily be implemented
    as "filters" here and there inside the program. Since
    exclusion actions can be applied repeatedly without any
    concern for precedence, the implementation is quite
    simple and the documentation is also straightforward.
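
    In Perl terms, the "filters" idea might look like the sketch
    below. This is not XXCOPY's code: the record layout (name, size,
    mtime), the thresholds and the survives helper are all made up
    for illustration. Each filter answers only "should this file be
    excluded?", so filters can be added or reordered freely.

```perl
use strict;
use warnings;

# Each filter returns true when the candidate should be EXCLUDED.
# A candidate is a hypothetical hash with name, size (bytes) and mtime.
my @filters = (
    sub { $_[0]{name} !~ /\.(?:mp3|jpg)$/i },   # inclusion of *.mp3, *.jpg, inverted
    sub { $_[0]{size} > 10_000_000 },           # size limit (made-up value)
    sub { $_[0]{mtime} < 1_000_000_000 },       # date limit in epoch seconds
);

# A file survives only if no filter excludes it; the order of the
# filters does not change the outcome.
sub survives {
    my ($file, @chain) = @_;
    for my $filter (@chain) {
        return 0 if $filter->($file);
    }
    return 1;
}

my @candidates = (
    { name => 'song.mp3',  size => 4_000_000,  mtime => 1_090_000_000 },
    { name => 'notes.doc', size => 20_000,     mtime => 1_090_000_000 },
    { name => 'huge.jpg',  size => 50_000_000, mtime => 1_090_000_000 },
);
print "$_->{name}\n" for grep { survives($_, @filters) } @candidates;
```

    Here the pair of "inclusion" patterns is expressed as a single
    exclusion of everything else, in the same spirit as the /X:
    equivalents listed above.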

    The reason why XXCOPY does not support such a simple thing
    as a "list of filenames to process" in a text file
    is that it is really an unrestricted form of inclusion
    operation. This may not go well with XXCOPY's one-source,
    one-destination view of file management operations.

    In the future, we plan to implement a full inclusion
    feature (even an "inclusion list" supplied as a text file)
    in XXCOPY. When we do support such a feature, we plan
    to resolve the inclusion-exclusion precedence as follows:

    1. Gather all inclusion-specifiers (list of files and
    directories) first and define what will be
    included (this can even be thought of as an exclusion
    list in reverse).

    2. Apply all other (exclusionary) specifiers, next.

    This will give the exclusion specifiers the precedence.
    Note that precedence in this context does not mean
    which one will be evaluated first. Rather, the last
    one to be evaluated will prevail (have the lasting effect).
    Therefore, in this case, the exclusion specifiers will
    have overriding power over the inclusion specifiers.
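
    A minimal Perl sketch of this two-step rule follows. The helper
    names combine and is_selected are hypothetical, and treating
    "no include rule" as "everything is a candidate" is an assumption
    carried over from the /IN:* equivalence above. Joining each list
    into one alternation also means every file is tested against at
    most two compiled regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: join a list of patterns into one compiled
# alternation. An empty list yields undef, meaning "no rule given".
sub combine {
    my @patterns = @_;
    return unless @patterns;
    my $joined = join '|', map { "(?:$_)" } @patterns;
    return qr/$joined/;
}

# Step 1: the include rules define the candidate set.
# Step 2: the exclude rules are applied last and therefore win.
sub is_selected {
    my ($name, $include, $exclude) = @_;
    return 0 if defined $include && $name !~ $include;
    return 0 if defined $exclude && $name =~ $exclude;
    return 1;
}

my $include = combine('/a');
my $exclude = combine('a/b/~?temp\d*');
for my $name ('/a/report.txt', '/a/b/~temp42', '/c/report.txt') {
    printf "%-16s %s\n", $name,
        is_selected($name, $include, $exclude) ? 'process' : 'skip';
}
```

    Because the exclude test runs last, a file that matches both an
    include and an exclude rule is always skipped, whatever order the
    rules were written in.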

    Here, I think the rules are clear. When exclusion
    and inclusion are mixed, unless you simplify the way
    they are treated, the user will be totally confused,
    you, the designer, will be confused, and you will not
    have a working program whose behavior makes sense
    to anyone.

    I'm not offering this advice only for the case of
    a product for sale, which requires formal
    documentation. Even if this project is for your own
    personal use, you as a programmer and you as the
    user have to come to a clear understanding. When you
    start talking about "recursion" in the design of
    inclusion and exclusion, I think you are clouding your
    thoughts. Give one of the two unconditional
    precedence over the other. Otherwise, you may never make
    something concrete out of your nebulous idea.

    Kan Yabumoto,
    The author of XXCopy
     
    Kan Yabumoto, Jul 22, 2004
    #3
  4. Henry Law

    Henry Law Guest

    On 21 Jul 2004 23:23:10 -0700, (Kan Yabumoto) wrote:

    >Henry Law <> wrote in message news:<>...
    >> included or excluded. (Anyone who uses xxcopy in Win will know
    >> immediately what I mean). I plan to have the users describe the files


    >Even though I'm extremely knowledgeable about XXCOPY, I'm not


    >Kan Yabumoto,
    >The author of XXCopy


    Isn't usenet wonderful; I cite one of my favourite programs as an
    example, and the author reads my post and gives me advice! Kan,
    you're absolutely right that I haven't got the basic functions clear
    in my mind: I need to think more about that. Your description of how
    you do your includes and excludes is very helpful; I had sort of got
    to the point where I recognised that includes and excludes can't go on
    indefinitely.

    But this has now become positively off-topic so I'll leave it at that.
    To write more would be to risk Anno's or Tad's hand appearing out of
    the monitor in 3D and hitting me on the nose. (With justification ...)

    Henry Law <>< Manchester, England
     
    Henry Law, Jul 22, 2004
    #4
