Parse a filename (this SHOULD be easy, right?)

usenet · Dec 15, 2005

This sounds easy, but it has puzzled me. I want to parse a filename
into "path", "basename", and "suffix" where terms are defined thus:
PATH - everything up to the last (or only) forward-slash.
SUFFIX - anything after the last (or only) dot
BASENAME - all between the path and the dot before the suffix.
Discard the trailing slash in the path and the dot before the suffix.
It may be assumed that the parser will only process names of plain
files (so '/foo' is a file named 'foo' in '/', not a directory).

The parser should work if the filename has no path and/or no suffix
(those values should resolve to undef if not present in the filename).
I don't know what suffixes it might encounter (so using File::Basename
is not so obvious, though I've tried a qr// without much success,
because I can't figure out how to curb the greediness of the
expressions in this context).

I've been playing around with some code within this test framework
using multiple possible styles of filenames:

#!/usr/bin/perl
use strict;
use File::Basename;

foreach my $file (<DATA>) {
    chomp $file;
    #my ($path, $name, $suffix) =
    #    ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    #my ($name,$path,$suffix) = fileparse($file, qr{\..*});
    #Gotta come up with SOMETHING here!!!
    printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
}

__DATA__
/PATH/NAME.SUFFIX
/foo
/foo.txt
/tmp.xyz/foo.txt
tmp.xyz/foo.bar.txt
/tmp.xyz/foo
tmp/foo.txt
../tmp/foo.bar.txt
/tmp/foo
foo.txt
foo.bar.txt
foo

###### DESIRED OUTPUT #########################

/PATH/NAME.SUFFIX    /PATH     NAME     SUFFIX
/foo                 /         foo
/foo.txt             /         foo      txt
/tmp.xyz/foo.txt     /tmp.xyz  foo      txt
tmp.xyz/foo.bar.txt  tmp.xyz   foo.bar  txt
/tmp.xyz/foo         /tmp.xyz  foo
tmp/foo.txt          tmp       foo      txt
../tmp/foo.bar.txt   ../tmp    foo.bar  txt
/tmp/foo             /tmp      foo
foo.txt                        foo      txt
foo.bar.txt                    foo.bar  txt
foo                            foo

I can ALMOST get it to work, but not quite... if I fix one test case, I
break another. I appreciate any suggestions...

Big and Blue · Dec 15, 2005

This sounds easy, but it has puzzled me.

foreach my $file (<DATA>) {
    chomp $file;
    #my ($path, $name, $suffix) =
    #    ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    #my ($name,$path,$suffix) = fileparse($file, qr{\..*});

    my ($name,$path,$suffix) = fileparse("./$file", qr{\.[^.]*});
    $path = substr($path, 2);

    #Gotta come up with SOMETHING here!!!
    printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
}

Seems to work. Note the corrected regex (which matches only the last
component) and the "./" prepended for processing, which is then removed by
the substr.
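
For anyone following along, here is a minimal sketch of why the suffix pattern matters. It assumes only the documented behaviour of fileparse(), which strips whatever the pattern matches at the end of the basename; suffix_for is a hypothetical helper name, not something from the posts above.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: return the suffix that fileparse() strips
# from $file when given the suffix pattern $pattern.
sub suffix_for {
    my ($file, $pattern) = @_;
    my (undef, undef, $suffix) = fileparse($file, $pattern);
    return $suffix;
}

# qr{\..*} starts at the first dot and swallows everything after it.
print suffix_for('foo.bar.txt', qr{\..*}), "\n";

# qr{\.[^.]*} cannot cross another dot, so only the last component matches.
print suffix_for('foo.bar.txt', qr{\.[^.]*}), "\n";
```

The first call should report ".bar.txt" and the second ".txt", which is the difference between the original attempt and the corrected pattern.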

usenet · Dec 15, 2005

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...

attn.steven.kuo · Dec 15, 2005

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...

Actually, now that I'm looking at the solution posted by
"Big and Blue", I prefer his regular expression to mine.

usenet · Dec 15, 2005

Actually, now that I'm looking at the solution posted by
"Big and Blue", I prefer his regular expression to mine.

B&B's solution is good, but it doesn't discard the trailing slash on
the path or the leading dot on the extension. It could be easily done
with a couple of extra statements, but that also seems to be getting
unwieldy for what seems like a simple task.

attn.steven.kuo · Dec 15, 2005

That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

If I get close to my screen and take a whiff, it smells like a problem
that needs to be sprayed with a regex. But the only bottle of regex
skills that I have is not very full...

If you prefer a solution using regular expressions, then how about:

while (<DATA>)
{
    chomp;

    my @matches = map
        defined $_ ? $_ : '',
        m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#;

    # check for successful match omitted ...

    my $path = join '', splice(@matches, 0, 2);
    my ($file, $suffix) = @matches;

    printf("%-21s%-17s%-9s%-4s\n", $_, $path, $file, $suffix);
}
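
As a rough sketch, the same regex can be wrapped in a sub that leaves a missing path or suffix as undef, which is what the original question asked for. The name split_filename and the choice to return an empty list when nothing matches are assumptions made for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above. Returns (path, name, suffix),
# with undef for a missing path or suffix, or an empty list on no match.
sub split_filename {
    my ($file) = @_;
    my ($dir, $root, $name, $suffix) =
        $file =~ m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#
        or return;
    # At most one of $dir and $root is set: $dir for "some/dir/file",
    # $root for a file that sits directly in "/".
    my $path = defined $dir ? $dir : $root;
    return ($path, $name, $suffix);
}

for my $file ('/tmp.xyz/foo.bar.txt', '/foo', 'foo.txt') {
    my @parts = map { defined $_ ? $_ : 'undef' } split_filename($file);
    print join(' | ', $file, @parts), "\n";
}
```

The lazy ([^/]+?) is what keeps the basename from swallowing the final dot: it grows one character at a time until the optional suffix group can reach the end of the string.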

Big and Blue · Dec 15, 2005

B&B's solution is good, but it doesn't discard the trailing slash on
the path or the leading dot on the extension.

Hmmmm - I'm sure it handled these at some point....

It could be easily done
with a couple of extra statements, but that also seems to be getting
unwieldy for what seems like a simple task.

It becomes 4 lines.

my ($name,$path,$suffix) = fileparse("//$file", qr{\.[^.]*});
$suffix = substr($suffix || ' ', 1);
$path = substr($path, 2);
do {local $/='/'; chomp $path unless ($path eq '/')};

The oddity on the substr of the suffix is to avoid warnings when there
isn't one.
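
A minimal sketch of those four lines wrapped in a sub and run against a few of the test names from the original post. The name parse_name is hypothetical. Note that this version returns empty strings, not undef, when the path or suffix is missing.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical wrapper around the four lines above.
sub parse_name {
    my ($file) = @_;
    my ($name, $path, $suffix) = fileparse("//$file", qr{\.[^.]*});
    $suffix = substr($suffix || ' ', 1);    # drop the leading dot, if any
    $path   = substr($path, 2);             # drop the "//" added above
    # Temporarily make "/" the record separator so chomp strips one
    # trailing slash, except when the path is the root itself.
    do { local $/ = '/'; chomp $path unless $path eq '/' };
    return ($path, $name, $suffix);
}

for my $file (qw(/foo /tmp.xyz/foo.txt ../tmp/foo.bar.txt foo)) {
    printf "%-21s%-10s%-9s%-4s\n", $file, parse_name($file);
}
```

Prepending "//" guarantees that fileparse() always sees a directory part, so it never falls back to reporting "./" for a bare filename.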

Ilya Zakharevich · Dec 16, 2005


That works perfectly, thanks. But it does seem like a whole lot of
code to throw at what seems like a fairly simple problem.

This puzzles me too - for more than 10 years now... I have no idea
why File::Basename has so lousy an API.

On the other hand, if it were critical, somebody would have fixed
it...

Hope this helps,
Ilya
