# Better way of checking file/directory spec in Win32?

Discussion in 'Perl Misc' started by Henry Law, Sep 15, 2005.

1. ### Henry Law (Guest)

Running under Activestate on Windows XP I have need to feed a series of
directory and file specifications to a program, which then interprets
them. Specifications can be directories, single file names, or wildcard
combinations like c:\foo\bar\*.pdf which result in lists of matching files.

The specifications need to be validated to weed out those that don't
refer to an existing directory, or that result in no matching files.
I've Googled and searched CPAN for a module that does this, since it
must be a common requirement, but without success. Can someone suggest one?

Here's the subroutine I wrote to do the job, so you'll see what I'm
trying to do. If it needs to exist then comments on how to smarten it
up would be welcome.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use constant VALID_DIRECTORY => -1;
    use constant VALID_FILE => -2;

    print "Enter spec: ";
    my $spec = <STDIN>;
    chomp $spec;
    while ($spec) {
        my $ret = interpret_spec($spec);
        if ($ret == VALID_DIRECTORY) {
            print "'$spec' is a valid directory\n";
        } elsif ($ret == VALID_FILE) {
            print "'$spec' is an existing file\n";
        } elsif ($ret) {
            print "'$spec' results in $ret files\n";
        } else {
            print "'$spec' doesn't match any file or directory\n";
        }
        print "=> ";
        $spec = <STDIN>;
        chomp $spec;
    }

    sub interpret_spec {
        # Checks a file/directory specification. Returns
        #   0       Spec matches nothing
        #   -1      It's a valid directory specification
        #   -2      It's a valid single file specification
        #   number  It's a glob, resulting in "number" files

        my $spec = shift;
        return VALID_DIRECTORY if (-d $spec);
        return VALID_FILE if (-f $spec);
        my @globs = glob $spec;
        if (scalar @globs == 1) {
            return ((-f $globs[0]) ? 1 : 0);
        } else {
            return scalar @globs;
        }
    }
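
For comparison, here is a rough sketch of a variant that decides up front whether a spec counts as a wildcard pattern, and that counts only expansions that exist as plain files. The name `classify_spec`, the (kind, count) return convention and the rule "any of `* ? [ ] { }` makes it a pattern" are assumptions made for illustration, not part of the posted code. It also uses `bsd_glob` from File::Glob in place of plain `glob`.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical variant of interpret_spec(): returns a (kind, count) pair
# instead of overloading one number with several meanings.
sub classify_spec {
    my ($spec) = @_;

    # Explicit decision (an assumption of this sketch): any spec that
    # contains a wildcard or brace character is treated as a pattern.
    if ($spec =~ /[*?\[\]{}]/) {
        # Count only expansions that exist as plain files.
        my @files = grep { -f $_ } bsd_glob($spec);
        return ('glob', scalar @files);
    }

    return ('directory', 1) if -d $spec;
    return ('file', 1)      if -f $spec;
    return ('nothing', 0);
}

# Classify every spec given on the command line.
for my $spec (@ARGV) {
    my ($kind, $count) = classify_spec($spec);
    print "'$spec': $kind ($count)\n";
}
```

Whether a pattern that matches only directories should count as "nothing" is a choice the caller would still have to make; this sketch counts plain files only.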

Henry Law, Sep 15, 2005

2. ### Anno Siegel (Guest)

Henry Law <> wrote in comp.lang.perl.misc:
> Running under Activestate on Windows XP I have need to feed a series of
> directory and file specifications to a program, which then interprets
> them. Specifications can be directories, single file names, or wildcard
> combinations like c:\foo\bar\*.pdf which result in lists of matching files.
>
> The specifications need to be validated to weed out those that don't
> refer to an existing directory, or that result in no matching files.

My first idea when I read this was (untested):

my @valid = grep -e, map glob, @specs;

but you seem to want a more detailed analysis of each entry.
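
Spelled out with explicit blocks, the one-liner above might look like the sketch below. The helper name `existing_matches` is made up for illustration, and `bsd_glob` from File::Glob is swapped in for plain `glob` because the core `glob` splits its argument on whitespace, which matters for paths that contain spaces.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper (not from the thread): expand each spec and keep
# only the names that actually exist on disk.
sub existing_matches {
    my @specs = @_;
    # bsd_glob treats the whole string as one pattern, so a path that
    # contains spaces is not split into several patterns.
    return grep { -e $_ } map { bsd_glob($_) } @specs;
}

# Print every existing match for the specs given on the command line.
my @valid = existing_matches(@ARGV);
print "$_\n" for @valid;
```

The `-e` filter is what drops names that an expansion produces without consulting the file system.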

> I've Googled and searched CPAN for a module that does this, since it
> must be a common requirement, but without success. Can someone suggest one?
>
> Here's the subroutine I wrote to do the job, so you'll see what I'm
> trying to do. If it needs to exist then comments on how to smarten it
> up would be welcome.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use constant VALID_DIRECTORY => -1;
> use constant VALID_FILE => -2;

[code that uses interpret_spec() snipped]

> sub interpret_spec{
> # Checks a file/directory specification. Returns
> # 0 Spec matches nothing
> # -1 It's a valid directory specification
> # -2 It's a valid single file specification
> # number It's a glob, resulting in "number" files
>
> my $spec = shift;
> return VALID_DIRECTORY if (-d $spec);
> return VALID_FILE if (-f $spec);
> my @globs = glob $spec;
> if (scalar @globs == 1){

      ^^^^^^
You don't need "scalar", it's already in scalar context.
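
A small illustration of that point, with made-up sample data: a numeric comparison such as `==` already evaluates the array in scalar context, so it yields the element count without `scalar`.

```perl
use strict;
use warnings;

my @globs = ('report.pdf');    # made-up sample data

# "==" is a numeric operator, so @globs is evaluated in scalar
# context here and yields its element count.
print "exactly one match\n" if @globs == 1;

# Assigning an array to a scalar does the same.
my $count = @globs;
print "count is $count\n";
```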

> return ((-f $globs[0]) ? 1 : 0);
> } else {
> return scalar @globs;
> }
> }

There are issues with some forms of glob() that need looking at.

Under Unix, it is quite possible to have a file named like a glob
pattern: "touch '*.foo'" creates a file with an actual asterisk in
its name. The logic of interpret_spec() would detect the file first
and never try to expand the glob (which may hit several more files).
This problem may be void under windows, I don't know.

A different problem occurs with globs of the form "foo.{a,b,c}".
This expands to "foo.a", "foo.b" and "foo.c" with no check against
the file system, that is, it doesn't matter whether "foo.a" etc.
exist as files. That may result in unwanted answers.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

Anno Siegel, Sep 15, 2005

3. ### Henry Law (Guest)

Anno Siegel wrote:
> My first idea when I read this was (untested):
>
> my @valid = grep -e, map glob, @specs;
>
> but you seem to want a more detailed analysis of each entry.

Hmm; I've a lot still to learn. That wouldn't have been my first idea;
nor my twenty-first either.

>> if (scalar @globs == 1){

Yes, of course. Laziness, unfortunately.

> There are issues with some forms of glob() that need looking at.
>
> Under Unix, it is quite possible to have a file named like a glob
> pattern: "touch '*.foo'" creates a file with an actual asterisk in
> its name.

How horrible. Fortunately nobody other than me is going to run this
thing and I'll just stay away from things like that.

> This problem may be void under windows, I don't know.

It is; "*" is not a valid character in Redmond's file names.

> A different problem occurs with globs of the form "foo.{a,b,c}".
> This expands to "foo.a", "foo.b" and "foo.c" with no check against
> the file system

Once again I can avoid those and get round the problem. But thank you
for pointing out the breadth and depth of the problem I am trying to solve!

Henry Law, Sep 15, 2005

4. ### Anno Siegel (Guest)

Henry Law <> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> > My first idea when I read this was (untested):
> >
> > my @valid = grep -e, map glob, @specs;
> >
> > but you seem to want a more detailed analysis of each entry.
>
> Hmm; I've a lot still to learn. That wouldn't have been my first idea;
> nor my twenty-first either.

Ah, but it isn't hard at all once you get into the habit of considering
data structures (like arrays and lists, in this case) as units that are
the objects of operations. In Perl, there are only three "operators",
map, grep, and sort, that transform one list into another, but they are
powerful because they all take a user-supplied function that works on
list elements.

OO takes that way of thinking a step further and lets you define your
own operations on complex data structures. Working with objects for a
while is habit-forming that way. You start to see would-be objects all
over your normal code as well, which is a productive way of looking at
things.

> >> if (scalar @globs == 1){
> Yes, of course. Laziness, unfortunately.

Ah, the kind of laziness that results in more text rather than less.
Who was it who didn't have time for a short letter, so wrote a long
one instead?

> > There are issues with some forms of glob() that need looking at.
> >
> > Under Unix, it is quite possible to have a file named like a glob
> > pattern: "touch '*.foo'" creates a file with an actual asterisk in
> > its name.
> How horrible. Fortunately nobody other than me is going to run this
> thing and I'll just stay away from things like that.
>
> > This problem may be void under windows, I don't know.
> It is; "*" is not a valid character in Redmond's file names.

Yes, I learned that in the meantime too.

> > A different problem occurs with globs of the form "foo.{a,b,c}".
> > This expands to "foo.a", "foo.b" and "foo.c" with no check against
> > the file system
> Once again I can avoid those and get round the problem. But thank you
> for pointing out the breadth and depth of the problem I am trying to solve!

I'm afraid it is not so much depth, but a certain murkiness in the
problem that I'm pointing out. The question whether a certain string
"is" a glob or a plain filename can be answered variously. As a
programmer, you must decide. It seems to me that the decision is
rather implicit in the code. If it mattered (as it probably doesn't)
it would be advisable to make the decision explicit.

Anno

Anno Siegel, Sep 15, 2005

5. ### Anno Siegel (Guest)

Jim Gibson <> wrote in comp.lang.perl.misc:
> In article <dgcgjv$m5b\$-Berlin.DE>, Anno Siegel
> <-berlin.de> wrote:
> >
> > Ah, the kind of laziness that results in more text rather than less.
> > Who was it who didn't have time for a short letter, so wrote a long
> > one instead?

>
> Mark Twain

Thanks!

Anno