Parse a filename (this SHOULD be easy, right?)

Discussion in 'Perl Misc' started by usenet@DavidFilmer.com, Dec 15, 2005.

  1. Guest

    This sounds easy, but it has puzzled me. I want to parse a filename
    into "path", "basename", and "suffix", where the terms are defined thus:
    PATH - everything up to the last (or only) forward-slash.
    SUFFIX - anything after the last (or only) dot in the file's own name
    (a dot in a directory name does not count).
    BASENAME - everything between the path and the dot before the suffix.
    Discard the trailing slash in the path and the dot before the suffix.
    It may be assumed that the parser will only process names of plain
    files (so '/foo' is a file named 'foo' in '/', not a directory).

    The parser should work if the filename has no path and/or no suffix
    (those values should resolve to undef if not present in the filename).
    I don't know what suffixes it might encounter (so using File::Basename
    is not so obvious, though I've tried a qr// without much success,
    because I can't figure out how to curb the greediness of the
    expressions in this context).
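The three definitions can also be implemented literally with `rindex` and `substr`, with no regex at all. What follows is a minimal sketch, not code from this thread; it assumes Unix-style `/` separators, and `parse_filename` is a hypothetical helper name. Missing parts come back as undef, as requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_filename() is a hypothetical helper, not a library function.
# It applies the PATH / BASENAME / SUFFIX definitions literally.
sub parse_filename {
    my ($file) = @_;
    my ($path, $name, $suffix);

    # PATH: everything before the last slash, or '/' itself when that
    # slash is the first character (a file in the root directory).
    my $slash = rindex($file, '/');
    if ($slash >= 0) {
        $path = $slash == 0 ? '/' : substr($file, 0, $slash);
        $name = substr($file, $slash + 1);
    }
    else {
        $name = $file;    # no slash: no path, $path stays undef
    }

    # SUFFIX: everything after the last dot of the final component only,
    # so dots in directory names (/tmp.xyz/foo) are ignored.
    my $dot = rindex($name, '.');
    if ($dot >= 0) {
        $suffix = substr($name, $dot + 1);
        $name   = substr($name, 0, $dot);
    }

    return ($path, $name, $suffix);
}

for my $file (qw(/PATH/NAME.SUFFIX /foo tmp.xyz/foo.bar.txt ../tmp/foo.bar.txt foo)) {
    # Print undef parts as empty columns.
    my @parts = map { defined $_ ? $_ : '' } parse_filename($file);
    printf "%-21s%-10s%-10s%s\n", $file, @parts;
}
```

Taken literally, the definitions give a name such as `.bashrc` an empty basename and the suffix `bashrc`; the test data in this post contains no such names.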

    I've been playing around with some code within this test framework
    using multiple possible styles of filenames:

    #!/usr/bin/perl
    use strict;
    use File::Basename;

    foreach my $file (<DATA>) {
        chomp $file;
        # Attempt 1: a single regex (greediness is the problem here)
        my ($path, $name, $suffix) =
            ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
        # Attempt 2: File::Basename
        #my ($name, $path, $suffix) = fileparse($file, qr{\..*});
        # Gotta come up with SOMETHING here!!!
        printf("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
    }

    __DATA__
    /PATH/NAME.SUFFIX
    /foo
    /foo.txt
    /tmp.xyz/foo.txt
    tmp.xyz/foo.bar.txt
    /tmp.xyz/foo
    tmp/foo.txt
    ../tmp/foo.bar.txt
    /tmp/foo
    foo.txt
    foo.bar.txt
    foo

    ###### DESIRED OUTPUT #########################

    /PATH/NAME.SUFFIX    /PATH     NAME      SUFFIX
    /foo                 /         foo
    /foo.txt             /         foo       txt
    /tmp.xyz/foo.txt     /tmp.xyz  foo       txt
    tmp.xyz/foo.bar.txt  tmp.xyz   foo.bar   txt
    /tmp.xyz/foo         /tmp.xyz  foo
    tmp/foo.txt          tmp       foo       txt
    ../tmp/foo.bar.txt   ../tmp    foo.bar   txt
    /tmp/foo             /tmp      foo
    foo.txt                        foo       txt
    foo.bar.txt                    foo.bar   txt
    foo                            foo
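Both attempts in the test framework above pick the wrong dot: the greedy `(.*)` in the regex swallows the suffix, and `qr{\..*}` passed to `fileparse` matches from the first dot in the name. The sketch below runs the original regex, the original `fileparse` call, and one possible lazy variant over three names. The lazy pattern is an assumption of this sketch, not a pattern from the thread, and `show` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Attempt 1 from the post: (.*) is greedy and the suffix group is optional,
# so (.*) takes the whole final component and the suffix group matches nothing.
my $greedy = qr!^(?:(.*)/)?(.*)(?:\.(.*))?$!;

# One possible adjustment (a sketch, not from the thread): a non-greedy name
# and a suffix that may not contain '.' or '/', so only the text after the
# LAST dot can become the suffix.
my $lazy = qr!^(?:(.*)/)?(.+?)(?:\.([^./]*))?$!;

# Hypothetical helper: print path, name, and suffix, showing undef explicitly.
sub show {
    my ($label, $file, @parts) = @_;
    printf "%-9s %-13s path=%-6s name=%-12s suffix=%s\n", $label, $file,
        map { defined $_ ? $_ : 'undef' } @parts;
}

for my $file ('foo.txt', 'foo.bar.txt', '/foo') {
    show('greedy', $file, $file =~ $greedy);
    show('lazy',   $file, $file =~ $lazy);
    # Attempt 2 from the post: qr{\..*} matches from the FIRST dot onward.
    my ($name, $path, $suffix) = fileparse($file, qr{\..*});
    show('fileparse', $file, $path, $name, $suffix);
}
```

The greedy pattern reports `foo.txt` and `foo.bar.txt` as names with no suffix, and `fileparse` with `qr{\..*}` splits `foo.bar.txt` into `foo` and `.bar.txt`. The lazy variant gets the name and suffix right, but `/foo` still yields an empty path instead of `/`, so the root directory needs a case of its own.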

    I can ALMOST get it to work, but not quite... if I fix one test case, I
    break another. I appreciate any suggestions...

    --
    http://DavidFilmer.com
    , Dec 15, 2005
    #1

  2. Big and Blue Guest

    wrote:
    >
    > This sounds easy, but it has puzzled me.


    > foreach my $file(<DATA>) {chomp $file;
    > #my ($path, $name, $suffix) =
    > ($file =~ m!^(?:(.*)/)?(.*)(?:\.(.*))?$!);
    > #my ($name,$path,$suffix) = fileparse($file, qr{\..*});


    my ($name,$path,$suffix) = fileparse("./$file", qr{\.[^.]*});
    $path = substr($path, 2);

    > #Gotta come up with SOMETHING here!!!
    > printf ("%-19s%-7s%-9s%-4s\n", $file, $path, $name, $suffix);
    > }


    Seems to work. Note the corrected regex (so that it matches only the
    last dot-separated component) and the "./" prepended for processing,
    which the substr then removes.
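To see what the `"./"` prefix buys, the sketch below prints what `fileparse` returns with and without it. The loop reuses the `fileparse` call and `substr` from the post above; the sample names and the printing are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Without a prefix, fileparse() reports './' as the directory of a bare
# file name, which cannot be told apart from a real "./foo.txt".
my (undef, $dir) = fileparse('foo.txt', qr{\.[^.]*});
print "bare foo.txt -> dir '$dir'\n";

# The trick from the post: prefix "./" so there is always a directory part,
# then strip those two characters off the returned directory again.
for my $file ('foo.txt', '/foo', 'tmp/foo.bar.txt') {
    my ($name, $path, $suffix) = fileparse("./$file", qr{\.[^.]*});
    $path = substr($path, 2);
    printf "%-16s path='%s' name='%s' suffix='%s'\n",
        $file, $path, $name, $suffix;
}
```

For `tmp/foo.bar.txt` this yields path `tmp/` and suffix `.txt`, so the trailing slash and the leading dot are still attached, and a file with no suffix gets an empty string rather than undef.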

    --
    Just because I've written it doesn't mean that
    either you or I have to believe it.
    Big and Blue, Dec 15, 2005
    #2

  3. Guest

    wrote:
    >>> some code


    That works perfectly, thanks. But it does seem like a whole lot of
    code to throw at what seems like a fairly simple problem.

    If I get close to my screen and take a whiff, it smells like a problem
    that needs to be sprayed with a regex. But the only bottle of regex
    skills that I have is not very full...

    --
    http://DavidFilmer.com
    , Dec 15, 2005
    #3
  4. Guest

    wrote:
    > wrote:
    > >>> some code

    >
    > That works perfectly, thanks. But it does seem like a whole lot of
    > code to throw at what seems like a fairly simple problem.
    >
    > If I get close to my screen and take a whiff, it smells like a problem
    > that needs to be sprayed with a regex. But the only bottle of regex
    > skills that I have is not very full...
    >



    Actually, now that I'm looking at the solution posted by
    "Big and Blue", I prefer his regular expression
    to mine.

    --
    Regards,
    Steven
    , Dec 15, 2005
    #4
  5. Guest

    wrote:
    > Actually, now that I'm looking at the solution posted by
    > "Big and Blue", I prefer his regular expression to mine.


    B&B's solution is good, but it doesn't discard the trailing slash on
    the path or the leading dot on the extension. It could be easily done
    with a couple of extra statements, but that also seems to be getting
    unwieldy for what seems like a simple task.
    , Dec 15, 2005
    #5
  6. Guest

    wrote:
    > wrote:
    > >>> some code

    >
    > That works perfectly, thanks. But it does seem like a whole lot of
    > code to throw at what seems like a fairly simple problem.
    >
    > If I get close to my screen and take a whiff, it smells like a problem
    > that needs to be sprayed with a regex. But the only bottle of regex
    > skills that I have is not very full...
    >
    > --
    > http://DavidFilmer.com



    If you prefer a solution using
    regular expressions, then
    how about:


    while (<DATA>)
    {
        chomp;

        my @matches = map
            defined $_ ? $_ : '',
            m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#;

        # check for successful match omitted ...

        my $path = join '', splice(@matches, 0, 2);
        my ($file, $suffix) = @matches;

        printf("%-21s%-17s%-9s%-4s\n", $_, $path, $file, $suffix);
    }
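The same regex can be wrapped in a sub that returns undef for a missing path or suffix, which is what the original question asked for, instead of mapping them to empty strings. This is a sketch under that assumption; `split_name` is a hypothetical name, and the comments break the pattern into its three parts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_name() is a hypothetical wrapper around the regex from the post above.
# Pieces of the pattern:
#   (?:(.+)/|^(/)|)   path: everything before the last '/', or a lone
#                     leading '/' (file in the root), or nothing at all
#   ([^/]+?)          basename: as short as possible, so that ...
#   (?:\.([^.]*))?$   ... the optional suffix takes the text after the last dot
sub split_name {
    my ($file) = @_;
    my ($dir, $root, $name, $suffix) =
        $file =~ m#(?:(.+)/|^(/)|)([^/]+?)(?:\.([^.]*))?$#
        or return;    # no match, e.g. an empty string
    # At most one of $dir and $root is defined; both are undef for a bare name.
    my $path = defined $dir ? $dir : $root;
    return ($path, $name, $suffix);
}

for my $file (qw(/PATH/NAME.SUFFIX /foo ../tmp/foo.bar.txt foo)) {
    my @parts = map { defined $_ ? $_ : 'undef' } split_name($file);
    printf "%-21s%-10s%-10s%s\n", $file, @parts;
}
```

Because `(.+)/` needs at least one character before the slash, `/foo` falls through to the `^(/)` alternative, which is a separate capture group. That is why the path has to be assembled from two captures afterwards, here with `defined` and in the post with `join` and `splice`.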

    --
    Regards,
    Steven
    , Dec 15, 2005
    #6
  7. Big and Blue Guest

    wrote:
    >
    > B&B's solution is good, but it doesn't discard the trailing slash on
    > the path or the leading dot on the extension.


    Hmmmm - I'm sure it handled these at some point....

    > It could be easily done
    > with a couple of extra statements, but that also seems to be getting
    > unwieldy for what seems like a simple task.


    It becomes 4 lines.

    my ($name,$path,$suffix) = fileparse("//$file", qr{\.[^.]*});
    $suffix = substr($suffix || ' ', 1);
    $path = substr($path, 2);
    do {local $/='/'; chomp $path unless ($path eq '/')};

    The oddity in the substr of the suffix is there to avoid a "substr
    outside of string" warning when there isn't a suffix.
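The four lines above can be checked against the full test data from the original question by wrapping them in a sub. A minimal sketch; `parse_bb` is a hypothetical name, and the comments describe what each of the four lines does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# parse_bb() is a hypothetical wrapper around the four lines from the post.
sub parse_bb {
    my ($file) = @_;
    # Prefix '//' so fileparse() always finds a directory part.
    my ($name, $path, $suffix) = fileparse("//$file", qr{\.[^.]*});
    # Drop the leading dot; ' ' stands in for a missing (empty) suffix so
    # substr never starts past the end of the string.
    $suffix = substr($suffix || ' ', 1);
    # Remove the '//' prefix again.
    $path = substr($path, 2);
    # Strip one trailing '/', except when the path is the root directory.
    do { local $/ = '/'; chomp $path unless ($path eq '/') };
    return ($path, $name, $suffix);
}

my @files = qw(
    /PATH/NAME.SUFFIX /foo /foo.txt /tmp.xyz/foo.txt tmp.xyz/foo.bar.txt
    /tmp.xyz/foo tmp/foo.txt ../tmp/foo.bar.txt /tmp/foo foo.txt
    foo.bar.txt foo
);
for my $file (@files) {
    printf "%-21s%-10s%-10s%s\n", $file, parse_bb($file);
}
```

For each test name this produces the path, basename, and suffix given by the definitions in the first post; the one difference from what was asked is that a missing path or suffix comes back as an empty string rather than undef.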

    --
    Just because I've written it doesn't mean that
    either you or I have to believe it.
    Big and Blue, Dec 15, 2005
    #7
  8. [A complimentary Cc of this posting was sent to

    <>], who wrote in article <>:
    > wrote:
    > >>> some code

    >
    > That works perfectly, thanks. But it does seem like a whole lot of
    > code to throw at what seems like a fairly simple problem.


    This puzzles me too - for more than 10 years now... I have no idea
    why File::Basename has so lousy an API.

    On the other hand, if it were critical, somebody would have fixed
    it...

    Hope this helps,
    Ilya
    Ilya Zakharevich, Dec 16, 2005
    #8

Similar Threads
  1. Joe

    Extract filename from a filename typed by user

    Joe, Aug 23, 2004, in forum: ASP .Net
    Replies: 1, Views: 1,002
    Travis Murray, Aug 24, 2004
  2. Replies: 1, Views: 1,431
    Roland de Ruiter, Jun 15, 2006
  3. Ed
    Replies: 10, Views: 45,727
    alok000707, Jul 13, 2010
  4. Beauregard T. Shagnasty

    Re: filename.gif or filename.gif.jpg?

    Beauregard T. Shagnasty, May 30, 2008, in forum: HTML
    Replies: 1, Views: 733
    Jonathan N. Little, May 30, 2008
  5. Bergamot
    Replies: 0, Views: 433
    Bergamot, May 30, 2008