FileList and FileFilter and regular expressions


P.Hill

Greetings Java Folks,

I was writing a class to process a list of files
matching an arbitrary file spec that looked like:
/incoming/projectData*.dat
The last splat in this case is actually a timestamp.

But now I need to complicate things by searching for
/incoming/*/projectData*.dat

where the first splat is a userDir.

Anyone know of code that can help me with this?

Using generic stuff out of the JDK, I was playing with
File.listFiles and a FileFilter, creating my own special
FileFilter whose accept method looks like:

public boolean accept(File file) {
    String filepath = file.getPath();
    // Accept paths that start with the text before the '*'
    // and end with the text after it.
    return filepath.startsWith(this.lefthand)
            && filepath.endsWith(this.righthand);
}
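For reference, here is one way the surrounding class might look. This is a minimal sketch, not the poster's actual code: the class name SplatFileFilter and its constructor are hypothetical, and it assumes the spec contains exactly one '*' and uses the same separator that File.getPath() returns on your platform. It also adds a length check the snippet above lacks, so that the prefix and suffix cannot overlap.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical class: accepts paths matching a spec with exactly one '*'.
final class SplatFileFilter implements FileFilter {
    private final String lefthand;   // text before the '*'
    private final String righthand;  // text after the '*'

    SplatFileFilter(String spec) {
        int star = spec.indexOf('*');
        if (star < 0 || spec.indexOf('*', star + 1) >= 0) {
            throw new IllegalArgumentException("spec must contain exactly one '*': " + spec);
        }
        this.lefthand = spec.substring(0, star);
        this.righthand = spec.substring(star + 1);
    }

    @Override
    public boolean accept(File file) {
        String filepath = file.getPath();
        // The length check keeps the prefix and suffix from overlapping,
        // e.g. the spec "ab*ba" must not accept "aba".
        return filepath.length() >= lefthand.length() + righthand.length()
                && filepath.startsWith(this.lefthand)
                && filepath.endsWith(this.righthand);
    }
}
```

With this sketch, new SplatFileFilter("/incoming/projectData*.dat") accepts /incoming/projectData20040101.dat, while the constructor rejects /incoming/*/projectData*.dat outright, which is exactly the limitation behind the question.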

Simple enough, but am I missing something somewhere else
for doing this type of generic filtering?

Why do I ask? Well, we've realized we want to change
the code to deal with /incoming/*/projectData*.dat

Now I COULD reuse the existing code by having
this second spec put both the username
and the date into the last splat, but for external
reasons (other code) I think I'll stick to
the form /incoming/*/projectData*.dat

Is there anything in the JDK or elsewhere that I've overlooked that would help
me do a fancier job of such wildcard matching than the FileFilter above?
How about combining FileFilter with regular expressions? I've never used them,
but I imagine it would look something like:

Pattern pattern = Pattern.compile( "/incoming/*/projectData*.dat" );
this.matcher = pattern.matcher( "" ); // start with no candidate string.

....
and then in accept( File file) I do:
String filepath = file.getPath();
this.matcher.reset( filepath );
boolean goodFile = this.matcher.matches();
if ( goodFile ) return true;

Does that look good to everyone?
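A sketch of the regex idea as a complete FileFilter follows; the class name RegexFileFilter is hypothetical. Note that in a regular expression * is a quantifier on the preceding character, not a wildcard, so the spec as written above will not match /incoming/alice/projectData1.dat. The sketch therefore assumes a translated pattern such as "/incoming/[^/]+/projectData.*\\.dat" (the form John Bollinger suggests in his reply). It compiles the Pattern once and creates a fresh Matcher per call, because a Pattern can be shared between threads while a Matcher cannot.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.regex.Pattern;

// Hypothetical class: accepts files whose whole path matches a regex.
final class RegexFileFilter implements FileFilter {
    private final Pattern pattern;

    RegexFileFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean accept(File file) {
        // Normalize separators to '/' so one regex also works with Windows paths.
        String filepath = file.getPath().replace(File.separatorChar, '/');
        // matches() requires the WHOLE path to match, not just a substring.
        return pattern.matcher(filepath).matches();
    }
}
```

A single reused Matcher, as in the question, also works as long as only one thread uses the filter at a time.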

It seems pretty good, but then I also have to find all
dirs in the /incoming/ directory and loop through those
to get a file list from each.

None of this is rocket science, but it seems to be a well-traveled
path. Does anyone know of code that does more of the work for me, instead
of my writing two nested rounds of listFiles calls?
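In Java 7 and later, the JDK covers much of this directly: java.nio.file.Files.walk lists a whole tree, and a PathMatcher obtained from FileSystem.getPathMatcher understands glob syntax. The sketch below (class and method names are hypothetical) matches each regular file's path relative to the root, so the spec becomes */projectData*.dat. In glob syntax * does not cross directory boundaries, while ** does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper built on the JDK's own glob support.
final class GlobSearch {
    // Returns the regular files under root whose path *relative to root*
    // matches the glob, e.g. "*/projectData*.dat".
    static List<Path> find(Path root, String glob) throws IOException {
        PathMatcher matcher = root.getFileSystem().getPathMatcher("glob:" + glob);
        // try-with-resources closes the directory handles opened by walk()
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(root.relativize(p)))
                    .collect(Collectors.toList());
        }
    }
}
```

Usage would be along the lines of GlobSearch.find(Paths.get("/incoming"), "*/projectData*.dat"). For a large tree, Files.walk(root, 2) limits the walk to the two levels this spec needs.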

Let me know if you do, or if you have other ideas. Meanwhile, I'll
be coding the solution that lists the dirs first and then lists
the files in each dir.

TIA,
-Paul
 

nos

If it were me (and it isn't), I would use split().
Then x[0] is a directory,
x[1] is a directory,
x[2] is a filename.
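A quick illustration of the split() idea; the class and method names are hypothetical. Because the spec is absolute, the leading '/' produces an empty first element, so for /incoming/*/projectData*.dat the two directory parts and the file name land in x[1], x[2] and x[3].

```java
import java.util.Arrays;

// Hypothetical demo of splitting a file spec into its path levels.
final class SplitSpec {
    // String.split takes a regular expression; '/' is not special in a regex,
    // so this simply breaks the spec at every slash.
    static String[] splitSpec(String spec) {
        return spec.split("/");
    }

    public static void main(String[] args) {
        String[] x = splitSpec("/incoming/*/projectData*.dat");
        System.out.println(Arrays.toString(x));  // prints [, incoming, *, projectData*.dat]
    }
}
```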
 

John C. Bollinger

P.Hill said:
Greetings Java Folks,

Greetings, P.Hill.
[...]

Simple enough, but am I missing something somewhere else
for doing this type of generic filtering?

You are looking in the right places, but I don't think you're going
about it in the right way.

For a solution to the specific case you raise
(/incoming/*/projectData*.dat), I would do something like this:

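File incomingDir = new File("/incoming");
File[] contents = incomingDir.listFiles();
List<File> files = new ArrayList<>();

if (contents != null) {   // listFiles() returns null on an I/O error
    for (int i = 0; i < contents.length; i++) {
        if (contents[i].isDirectory()) {
            // myFileFilter only needs to check the name part
            File[] goodFiles = contents[i].listFiles(myFileFilter);
            if (goodFiles != null) {
                files.addAll(Arrays.asList(goodFiles));
            }
        }
    }
}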

where myFileFilter is an instance of a FileFilter that checks the name
part of the file. You don't need to worry about the path there, because
you need (or at least want) to have checked that already while choosing
the directories (and in this case the spec is "*" anyway).

To do this in a more general way, you should incorporate the filtration
into your tree walk. For instance, with something like:

List<File> findFiles(File path, FileFilter smartFilter) {
    List<File> files = new ArrayList<>();
    File[] contents = path.listFiles(smartFilter);
    if (contents == null) {
        return files;   // not a directory, or an I/O error
    }

    for (int i = 0; i < contents.length; i++) {
        if (contents[i].isDirectory()) {
            // smartFilter accepted this directory, so descend into it
            files.addAll(findFiles(contents[i], smartFilter));
        } else {
            files.add(contents[i]);
        }
    }

    return files;
}

void doSomething() {
    ...
    List<File> filesToProcess =
            findFiles(new File("/incoming"), new MySmartFileFilter());
    ...
}

Here all the intelligence is built into one FileFilter, which must both
select which directories to traverse and select which regular files are
accepted. A more flexible approach might use a prebuilt chain of
FileFilters, one per tree level, or even FileFilters at each level that
can tell you which filters to use for the next level.
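The one-filter-per-level idea might be sketched as follows. LevelSearch and its methods are hypothetical names, and the sketch assumes the spec has already been broken into one FileFilter per path level, with the last filter applied to the files themselves. A directory is only descended into when the filter for its level accepts it, so the tree walk and the filtering happen together.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: levels.get(0) filters the entries directly under root,
// levels.get(1) the entries one level further down, and so on. Only regular
// files accepted by the LAST filter are returned.
final class LevelSearch {
    static List<File> findFiles(File root, List<FileFilter> levels) {
        List<File> result = new ArrayList<>();
        if (!levels.isEmpty()) {
            collect(root, levels, 0, result);
        }
        return result;
    }

    private static void collect(File dir, List<FileFilter> levels, int depth, List<File> out) {
        File[] matches = dir.listFiles(levels.get(depth));
        if (matches == null) {
            return; // dir is not a directory, or an I/O error occurred
        }
        boolean lastLevel = depth == levels.size() - 1;
        for (File f : matches) {
            if (lastLevel) {
                if (f.isFile()) {
                    out.add(f);
                }
            } else if (f.isDirectory()) {
                collect(f, levels, depth + 1, out);
            }
        }
    }
}
```

For /incoming/*/projectData*.dat the chain would be two filters: File::isDirectory for the userDir level, then a name check such as f -> f.getName().startsWith("projectData") && f.getName().endsWith(".dat").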

[...]
Is there anything in the JDK or elsewhere that I've overlooked that would help
me do a fancier job of such wildcard matching than the FileFilter above?
How about combining FileFilter with regular expressions? I've never used them,
but I imagine it would look something like:

Pattern pattern = Pattern.compile( "/incoming/*/projectData*.dat" );

I think you'd want:

Pattern pattern = Pattern.compile("/incoming/[^/]+/projectData.*\\.dat");
this.matcher = pattern.matcher( "" ); // start with no candidate string.

...
and then in accept( File file) I do:
String filepath = file.getPath();
this.matcher.reset( filepath );
boolean goodFile = this.matcher.matches();
if ( goodFile ) return true;

Does that look good to everyone?

The pattern (as amended) could work, but it seems like more effort than
you want, especially if the next change is to support different or more
complex filename patterns. If there is any chance of that, then you
would be well advised to build a more general solution now.

It seems pretty good, but then I also have to find all
dirs in the /incoming/ directory and loop through those
to get a file list from each.

You have to do this no matter what if you are processing files from
multiple directories. It's not so hard, though. (Vide supra)

Good luck,

John Bollinger
 

P.Hill

nos said:
if it was me, and it isn't, i would use split()
then x[0] is a directory
x[1] is a directory
x[2] is a filename

Hi Nos,

Yes, that is how I originally populated my FileFilter.
I used split and built my simple filter from the LHS and RHS
(left and right).

-Paul
 

P.Hill

John said:
For a solution to the specific case you raise
(/incoming/*/projectData*.dat), I would do something like this:

[...]


Actually, I ended up implementing a DirectoryTreeSearch class,
which has the equivalent of some of the code you mention.
It uses two FileFilters: a DirFileFilter and a RegExpFileFilter.
The DirFileFilter does NOT handle the GENERAL case you mention,
which goes to any depth like the ** operator in Ant:
/java/src/**/*.java
But I did implement DirectoryTreeSearch
so that it handles /incoming/prePart*PostPart/projectDataPre*PostPart.dat
which means I can force a certain pattern on the dir and still allow various
extras in the filename.

The hard part was making sure I escaped dots and backslashes (in
Windows file specs) to correctly convert my dir/ls-like wildcard
spec into a proper java.util.regex (Perl-like) regex. A * had to
be converted to \w* (a run of [a-zA-Z_0-9] characters) in the Java
pattern (it's fine for me that this excludes the $).
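A minimal sketch of that wildcard-to-regex conversion follows. It is not the DirectoryTreeSearch code itself: the class name WildcardToRegex is hypothetical, only * is treated as a wildcard, and instead of escaping dots and backslashes one by one it wraps every literal run in Pattern.quote, which escapes all regex metacharacters at once.

```java
import java.util.regex.Pattern;

// Hypothetical converter from a dir/ls-style wildcard spec to a java.util.regex pattern.
final class WildcardToRegex {
    // Assumption: '*' is the only wildcard. It becomes \w* (letters, digits, '_'),
    // so it never matches '/', '\', '.', or '$'.
    static String toRegex(String spec) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : spec.toCharArray()) {
            if (c == '*') {
                flush(literal, regex);
                regex.append("\\w*");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Quote the pending literal text so '.', '\' etc. match themselves.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }
}
```

Because \w* cannot match '/', Pattern.matches(WildcardToRegex.toRegex("/incoming/prePart*/data*.dat"), path) accepts /incoming/prePartABC/data123.dat but not /incoming/prePartABC/prePartXYZ/data123.dat.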

That was fun! I even wrote a bunch of test cases for it.

Combining both doesn't sound very useful for a search that should
find
/incoming/prePartABC/data123.dat
OR
/incoming/prePartABC/prePartXYZ/data123.dat
given
/incoming/prePart*/data*.dat

On the other hand, I happened to stumble on the thought that
a partial dir spec might be useful in my case.

Maybe an alternate entry point (or the ** syntax) would be
a great way to provide both!

BUT I'm too agile-oriented (i.e. I don't implement what I don't need),
and I have a delivery this week, so I think I'll pass on your
interesting suggestion.

thanks for the input,
-Paul
 

Alan Moore

Greetings Java Folks,

[...]

Anyone know of code that can help me with this?

Have you ever checked out JRegex? Its filesystem utilities seem to be
a very close match for what you're trying to do:

http://jregex.sourceforge.net/gstarted.html#filesystem
 

P.Hill

Alan said:
Have you ever checked out JRegex? Its filesystem utilities seem to be
a very close match for what you're trying to do:

http://jregex.sourceforge.net/gstarted.html#filesystem

Alan,

You are right. In fact that is pretty much an exact match for
everything both John and I suggested.

1. The jregex.util.io.WildcardFilter class.

[...]
File dir=...;
String[] htmlFiles=dir.list(new WildcardFilter("*.html"));


2. The jregex.util.io.PathPattern class.
[...]
The path pattern can be both relative and absolute and may contain the following
wildcards:
? - any-character
* - any-string
** - any-path

I should have waited a day while others were responding.
I will definitely make a note of these classes.

thanks,
-Paul
 
