Parse a filename (this SHOULD be easy, right?)

usenet

This sounds easy, but it has puzzled me. I want to parse a filename
into "path", "basename", and "suffix", where the terms are defined thus:
PATH - everything up to the last (or only) forward slash.
SUFFIX - anything after the last (or only) dot in the final path component.
BASENAME - everything between the path and the dot before the suffix.
Discard the trailing slash in the path and the dot before the suffix.
It may be assumed that the parser will only process names of plain
files (so '/foo' is a file named 'foo' in '/', not a directory).

The parser should work if the filename has no path and/or no suffix
(those values should resolve to undef if not present in the filename).
I don't know what suffixes it might encounter (so using File::Basename
is not so obvious, though I've tried a qr// without much success,
because I can't figure out how to curb the greediness of the
expressions in this context).
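
The greediness problem described above can be seen in isolation. This is only a minimal sketch: it assumes the fileparse($fullname, @suffix_patterns) interface of File::Basename on a Unix-style system, and the helper name suffix_for is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: return whatever fileparse() strips off as the suffix
# when given a single suffix pattern.
sub suffix_for {
    my ($file, $pattern) = @_;
    my (undef, undef, $suffix) = fileparse($file, $pattern);
    return $suffix;
}

my $file = 'tmp.xyz/foo.bar.txt';

# \..* begins at the first dot of the last path component and runs to the end.
print suffix_for($file, qr{\..*}), "\n";      # prints ".bar.txt"

# \.[^.]* cannot cross another dot, so only the last extension can match.
print suffix_for($file, qr{\.[^.]*}), "\n";   # prints ".txt"
```

Excluding the dot from what follows it is one way to keep the pattern from swallowing earlier extensions; a lazy quantifier alone would not help here, because the match still starts at the first dot.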

I've been playing around with some code within this test framework
using multiple possible styles of filenames:

#!/usr/bin/perl
use strict;
use File::Basename;

foreach my $file (<DATA>) {
    chomp $file;
    #my ($path, $name, $suffix) =
    #    ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    #my ($name,$path,$suffix) = fileparse($file, qr{\..*});
    #Gotta come up with SOMETHING here!!!
    printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
}

__DATA__
/PATH/NAME.SUFFIX
/foo
/foo.txt
/tmp.xyz/foo.txt
tmp.xyz/foo.bar.txt
/tmp.xyz/foo
tmp/foo.txt
../tmp/foo.bar.txt
/tmp/foo
foo.txt
foo.bar.txt
foo

###### DESIRED OUTPUT #########################

/PATH/NAME.SUFFIX    /PATH     NAME      SUFFIX
/foo                 /         foo
/foo.txt             /         foo       txt
/tmp.xyz/foo.txt     /tmp.xyz  foo       txt
tmp.xyz/foo.bar.txt  tmp.xyz   foo.bar   txt
/tmp.xyz/foo         /tmp.xyz  foo
tmp/foo.txt          tmp       foo       txt
../tmp/foo.bar.txt   ../tmp    foo.bar   txt
/tmp/foo             /tmp      foo
foo.txt                        foo       txt
foo.bar.txt                    foo.bar   txt
foo                            foo

I can ALMOST get it to work, but not quite... if I fix one test case, I
break another. I appreciate any suggestions...
 
Big and Blue

This sounds easy, but it has puzzled me.
foreach my $file (<DATA>) {
    chomp $file;
    #my ($path, $name, $suffix) =
    #    ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    #my ($name,$path,$suffix) = fileparse($file, qr{\..*});

    # prepend "./" so that every name has a directory part
    my ($name,$path,$suffix) = fileparse("./$file", qr{\.[^.]*});
    # strip the "./" again
    $path = substr($path, 2);
    #Gotta come up with SOMETHING here!!!
    printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
}

Seems to work. Note the corrected regex (which matches only the last
dot-delimited component) and the "./" prepended for processing, which
is then removed by the substr.
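
Why the prefix helps may be easier to see side by side. A rough sketch, assuming Unix-style fileparse() behaviour, where a name with no directory part is reported with the path "./"; the helper names plain_path and prefixed_path are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: the path fileparse() reports for the name as given.
sub plain_path {
    my ($file) = @_;
    my (undef, $path) = fileparse($file);
    return $path;
}

# Hypothetical helper: the prepend-then-strip approach from the post above.
sub prefixed_path {
    my ($file) = @_;
    my (undef, $path) = fileparse("./$file");
    return substr($path, 2);    # remove the "./" added before parsing
}

for my $file ('foo.txt', './foo.txt', '/foo.txt', 'tmp/foo.txt') {
    printf "%-12s plain='%s' prefixed='%s'\n",
        $file, plain_path($file), prefixed_path($file);
}
```

Without the prefix, 'foo.txt' and './foo.txt' both come back with the path './', so a missing path cannot be told apart from an explicit one. With the prefix, the bare name yields an empty path and the explicit './' survives.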
 
usenet

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...
 
attn.steven.kuo

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...


Actually, now that I'm looking at the solution posted by
"Big and Blue", I prefer his regular expression to mine.
 
usenet

Actually, now that I'm looking at the solution posted by
"Big and Blue", I prefer his regular expression to mine.

B&B's solution is good, but it doesn't discard the trailing slash on
the path or the leading dot on the extension. That could easily be done
with a couple of extra statements, but it also seems to be getting
unwieldy for what seems like a simple task.
 
attn.steven.kuo

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...


If you prefer a solution using regular expressions, then how about:

while (<DATA>)
{
    chomp;

    # captures that did not take part in the match become empty strings
    my @matches = map
        defined $_ ? $_ : '',
        m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#;

    # check for successful match omitted ...

    # at most one of the two path captures is non-empty
    my $path = join '', splice(@matches, 0, 2);
    my ($file, $suffix) = @matches;

    printf("%-21s%-17s%-9s%-4s\n", $_, $path, $file, $suffix);
}
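
For readers who find the one-line pattern hard to follow, here is the same pattern spread out with /x and wrapped in a sub. Treat it as a sketch rather than a finished parser: the sub name split_filename is hypothetical, and returning undef for a missing path or suffix is an assumption taken from the original question, not from the code above, which substitutes empty strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper; returns (path, name, suffix), or an empty list
# when the pattern does not match at all.
sub split_filename {
    my ($file) = @_;
    my ($dir, $root, $name, $suffix) = $file =~ m{
        (?: (.+) /            # group 1: directory, without its final slash
          | ^ (/)             # group 2: a lone leading slash (file sits in the root)
          |                   # or no directory part at all
        )
        ( [^/]+? )            # group 3: base name, as short as possible
        (?: \. ( [^.]* ) )?   # group 4: optional suffix after the last dot
        $
    }x or return;

    # Only one of the two directory groups can have matched.
    my $path = defined $dir ? $dir : $root;
    return ($path, $name, $suffix);
}

for my $file ('/foo', 'tmp.xyz/foo.bar.txt', 'foo') {
    my @parts = map { defined $_ ? $_ : '(undef)' } split_filename($file);
    print join(' | ', $file, @parts), "\n";
}
```

The lazy quantifier on the base name is what lets the optional suffix group claim the text after the last dot; with a greedy base name the suffix group would always be skipped.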
 
Big and Blue

B&B's solution is good, but it doesn't discard the trailing slash on
the path or the leading dot on the extension.

Hmmmm - I'm sure it handled these at some point....

It could be easily done
with a couple of extra statements, but that also seems to be getting
unwieldy for what seems like a simple task.

It becomes 4 lines.

my ($name,$path,$suffix) = fileparse("//$file", qr{\.[^.]*});
$suffix = substr($suffix || ' ', 1);    # drop the leading dot
$path = substr($path, 2);               # drop the "//" prepended above
do {local $/='/'; chomp $path unless ($path eq '/')};   # drop the trailing slash

The oddity on the substr of the suffix is to avoid warnings when there
isn't one.
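
The four statements can also be wrapped in a sub that reports missing parts as undef, as the original question asked. This is a sketch under assumptions: the sub name parse_filename is hypothetical, an explicit length test replaces the substr($suffix || ' ', 1) idiom, and a substitution replaces the local $/ and chomp trick.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical wrapper around the fileparse() approach shown above.
sub parse_filename {
    my ($file) = @_;
    my ($name, $path, $suffix) = fileparse("//$file", qr{\.[^.]*});

    $path = substr($path, 2);                 # drop the "//" prepended above
    $path =~ s{/\z}{} unless $path eq '/';    # drop the trailing slash, but keep a bare "/"

    # Report "not present" as undef rather than as an empty string.
    $path   = undef unless length $path;
    $suffix = length $suffix ? substr($suffix, 1) : undef;

    return ($path, $name, $suffix);
}

for my $file ('/foo', '../tmp/foo.bar.txt', 'foo') {
    my @parts = map { defined $_ ? $_ : '(undef)' } parse_filename($file);
    print join(' | ', $file, @parts), "\n";
}
```

Testing the length of the suffix before calling substr avoids the warning without needing a placeholder character.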
 
Ilya Zakharevich

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

This puzzles me too - for more than 10 years now... I have no idea
why File::Basename has so lousy an API.

On the other hand, if it were critical, somebody would have fixed
it...

Hope this helps,
Ilya
 
