Splitting a filename

Discussion in 'Perl Misc' started by Noel Sant, Mar 7, 2007.

  1. Noel Sant

    Noel Sant Guest

    I want to split a filename into the name itself and the extension. This:

    ($name, $extension) = split /\./, $input_file;

    works fine, providing there's only one dot in the filename, but if there are
    more I just get the first two bits of the name. I really want to get the last
    bit into $extension and all the rest, including dots, into $name.

    I suppose I could use an array on the left-hand side, find out how many
    elements there are and just build up $name from all the elements bar the last,
    but this seems long-winded. I tried using "split /\.$/, ..." but then I got
    everything in $name, and $extension was undefined. As though it's just
    looking at the end of the string and saying "Nope! no dot there" and not
    going any further back. Obviously I don't understand what $ does.

    How do I say "just match on the last dot", please?
    Noel Sant, Mar 7, 2007
    #1

  2. Noel Sant

    DJ Stunks Guest

    On Mar 7, 10:10 am, "Noel Sant" <> wrote:
    > I want to split a filename into the name itself and the extension


    perldoc File::Basename

    -jp
    DJ Stunks, Mar 7, 2007
    #2

  3. Noel Sant

    Paul Lalli Guest

    On Mar 7, 1:10 pm, "Noel Sant" <> wrote:
    > How do I say "just match on the last dot", please?


    The answer to your actual question is to only split on a dot that's
    not followed by anything which includes dots:

    my ($name, $ext) = split /\.(?!.*\.)/, $file;
    (read about lookaheads in `perldoc perlre`)
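
    As a rough illustration of the lookahead split above (the sub name
    split_last_dot and the sample filenames are made up for this sketch):
    "split /\.$/" only matches a dot that is the very last character, because
    $ anchors the pattern at the end of the string, whereas (?!.*\.) accepts
    any dot that has no further dot after it.

```perl
use strict;
use warnings;

# Split on the last dot only: the lookahead (?!.*\.) rejects any dot
# that still has another dot somewhere after it.
sub split_last_dot {
    my ($file) = @_;
    return split /\.(?!.*\.)/, $file;
}

for my $file (qw(report.txt archive.tar.gz README)) {
    my ($name, $ext) = split_last_dot($file);
    printf "%-16s name=%-12s ext=%s\n",
        $file, $name, defined($ext) ? $ext : '(none)';
}
```

    A name without any dot comes back as a single element, so $ext stays
    undefined in that case.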

    However, the correct answer to the problem you're actually trying to
    solve is to stop reinventing the wheel:
    perldoc File::Basename
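
    For reference, a minimal sketch of the module route. The path and the
    suffix pattern qr/\.[^.]*/ are assumptions chosen for this example; the
    pattern treats the last dot and everything after it as the suffix.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# fileparse returns (filename, directory, suffix). The suffix pattern is
# matched at the end of the filename, so only the last ".xxx" is removed.
my ($name, $dir, $suffix) = fileparse('/tmp/archive.tar.gz', qr/\.[^.]*/);

print "dir=$dir name=$name suffix=$suffix\n";
```

    Note that the suffix returned by fileparse keeps its leading dot.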

    Paul Lalli
    Paul Lalli, Mar 7, 2007
    #3
  4. Noel Sant

    Mirco Wahab Guest

    Noel Sant wrote:
    > How do I say "just match on the last dot", please?


    There's a module for it, as Paul and DJS said,
    so use it if possible.

    But, for learning purposes, you can split
    simple filenames with simple regular
    expressions.

    Say we have these 4 "splendid variants":

    my @names = qw'
    fi.le.ext
    file.file.
    file
    .ext
    ';

    (note the dots). Then we can "split them apart"
    with, e.g.,

    my $rg = qr/ (.*) \. (.*) $ | (.*) /x;


    In this case, the 'filename component' is in $1 or in $3
    ($3 if there was no dot at all), so let's extract that:

    print
    map +($_->[0] || $_->[2] || '(undef)') ."\t". ($_->[1] || '(undef)') ."\n",
    map [ /$rg/g ],
    @names;

    The second (short) map expression applies the
    regular expression and converts ($1,$2,$3) to
    an array reference.
    The "complicated looking" first map expression
    only serves to produce some fancy
    output, in our case:

    fi.le       ext
    file.file   (undef)
    file        (undef)
    (undef)     ext

    Here we see which parts are non-empty and which
    are not (|| prints '(undef)' for an empty string
    as well as for undef).
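
    The same idea unrolled into a plain loop, as a sketch only: the helper
    names split_name and show are invented here, and defined() is used
    instead of || so that an empty part can be told apart from a missing one.

```perl
use strict;
use warnings;

# The pattern from the post, unchanged: $1/$2 are set when a dot is
# present, $3 when there is no dot at all.
my $rg = qr/ (.*) \. (.*) $ | (.*) /x;

sub split_name {
    my ($file) = @_;
    my ($name, $ext, $whole) = $file =~ $rg;   # no /g: first match only
    $name = $whole unless defined $name;
    return ($name, $ext);
}

# Render a value so that '' and undef look different in the output.
sub show { defined($_[0]) ? "'$_[0]'" : '(undef)' }

for my $file (qw(fi.le.ext file.file. file .ext)) {
    my ($name, $ext) = split_name($file);
    printf "%-12s name=%-12s ext=%s\n", $file, show($name), show($ext);
}
```

    With this variant "file.file." reports an empty (but defined) extension,
    while "file" reports an undefined one.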

    Regards

    M.
    Mirco Wahab, Mar 7, 2007
    #4
  5. Noel Sant

    Uri Guttman Guest

    >>>>> "PV" == Petr Vileta <> writes:

    PV> I use my own function. Maybe stupid but 100% functional ;-)

    very stupid!

    PV> sub parsename {
    PV> my $filename = shift;
    PV> my ($name, $ext) = ('') x2;

    why the initialization of both? $name is ALWAYS set to something below.

    PV> if($filename =~ m/\./) {$filename =~ s/^(.*)(\..*)$/$1$2/; $name =

    are you allowing an empty file name with just a .suffix? what about a
    name with a dot but no suffix?

    PV> $1;

    why the destruction and rebuilding of $filename with s///? $filename is
    never used again inside that block. you can use the m// to get the same
    $1 and $2 as the s///.

    PV> $ext = $2; }

    your $ext always has the . which is not typical when breaking up a
    filename and extension.

    why save $1 and $2 when you can just return them?

    PV> else {$name = $filename;}

    your indenting is either very bad or your usenet program ruined it.

    PV> return ($name, $ext);
    PV> }

    this is almost just (untested) this one line sub:

    return $_[0] =~ /^(.*)(\..*)$/ ;

    other than making sure both parts are '' if not matched. that could be
    fixed easily too.
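
    One way that fix might look, as a sketch (like the one-liner above, it
    keeps the dot in the extension; the fallback to ($filename, '') is the
    assumed "both parts" behaviour for names without a dot):

```perl
use strict;
use warnings;

# Returns (name, extension-with-dot). When the filename contains no dot
# the match fails, so fall back to the whole name and an empty extension.
sub parsename {
    my ($filename) = @_;
    my @parts = $filename =~ /^(.*)(\..*)$/;
    return @parts ? @parts : ($filename, '');
}

my ($name, $ext) = parsename('fi.le.ext');
print "name=$name ext=$ext\n";
```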

    stick with the module.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Mar 8, 2007
    #5
  6. Noel Sant

    Noel Sant Guest

    Wow!

    As you say, I'll stop trying to re-invent the wheel, and use fileparse.

    In answer to the query, I do use strict in programs, but not for that
    example.

    Many thanks.
    Noel Sant, Mar 8, 2007
    #6
  7. Noel Sant

    -berlin.de Guest

    Michele Dondi <> wrote in comp.lang.perl.misc:
    > On Thu, 8 Mar 2007 01:19:54 +0100, "Petr Vileta"
    > <> wrote:


    [...]

    > Just a rewrite of your code with the same semantics but a saner
    > syntax:
    >
    > sub parsename {
    > local $_ = shift;
    > return($_, '') unless /\./;
    > /^(.*)(\..*)$/;
    > }


    As a side note, I'd avoid localizing $_ if possible. There's a bug
    lurking that bites when $_ is aliased to a value in a tied hash (or
    something exotic like that). Ask Brian McCauley about it, he's
    the resident expert on that bug :)

    Sometimes a one-shot "for" can do the trick:

    sub parsename {
        for ( shift ) {
            return($_, '') unless /\./;
            return /^(.*)(\..*)$/;
        }
    }

    Anno
    -berlin.de, Mar 8, 2007
    #7
  8. Noel Sant

    -berlin.de Guest

    Michele Dondi <> wrote in comp.lang.perl.misc:
    > On Thu, 08 Mar 2007 21:46:42 +0100, Michele Dondi
    > <> wrote:
    >
    > >> sub parsename {
    > >> for ( shift ) {
    > >> return($_, '') unless /\./;
    > >> return /^(.*)(\..*)$/;
    > >> }
    > >> }

    > >
    > >Oh, I'm a big fan of one shot C<for>s. But then I heard about lexical

    >
    > And of course that could even be cast in the form of a single
    > statement:
    >
    > sub parsename { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }
    >
    > Although just as obviously I would call that an *abuse*.
    > :)


    It's not entirely equivalent, however. The sub body is parsed as

    return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;

    so it returns a spurious third value (an empty string) when an extension
    is present. This would do:

    return /\./ ? /^(.*)(\..*)$/ : ( $_, '') for shift;
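
    The difference is easy to see by counting the returned values. In this
    small sketch the two sub names are invented to keep the versions apart:

```perl
use strict;
use warnings;

# As posted: ?: binds tighter than the comma, so '' is always appended.
sub parsename_posted { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }

# With the extra parentheses, '' belongs to the "no dot" branch only.
sub parsename_fixed  { return /\./ ? /^(.*)(\..*)$/ : ( $_, '' ) for shift }

my @posted = parsename_posted('notes.txt');
my @fixed  = parsename_fixed('notes.txt');
printf "posted: %d values, fixed: %d values\n", scalar @posted, scalar @fixed;
```

    Assigning the result to a two-element list hides the problem, since the
    extra '' is silently dropped.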

    It's probably better to re-write the regex to capture the right stuff
    in both cases. Also, it shouldn't include the "." in the extension,
    but that's secondary.

    [an embarrassing amount of time passes]

    This isn't as easy as I thought. I haven't found a single regex
    that captures first the name, and then the extension or an empty
    string if there is none, in all cases. I'll leave the solution as
    an exercise.

    It is possible to use two regexes only one of which ever matches,
    but that's no improvement.

    sub parsename { return ( /(.*)\.(.*)/, /^([^.]*)()$/) for shift }
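
    One possible answer to the exercise, assuming Perl 5.10 or later is
    available: the branch-reset group (?|...) described in perlre numbers
    the captures in each alternative from 1 again, so a single regex can
    return the name first and then the extension or an empty string. This
    is only a sketch, not something proposed in the thread.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs Perl 5.10 or later

# Both alternatives number their groups from 1, so the first capture is
# always the name and the second is the extension, or '' without a dot.
sub parsename {
    my ($filename) = @_;
    return $filename =~ /^(?|(.*)\.([^.]*)|(.*)())$/s;
}

for my $file (qw(fi.le.ext file.file. file .ext)) {
    my ($name, $ext) = parsename($file);
    print "$file => name='$name' ext='$ext'\n";
}
```

    Here the extension comes back without its dot, and a trailing dot
    yields an empty extension just like a name with no dot at all.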

    Anno
    -berlin.de, Mar 10, 2007
    #8
  9. On 10 Mar 2007 00:02:42 GMT, -berlin.de wrote:

    >It's not entirely equivalent, however. The sub body is parsed as
    >
    > return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;


    That's what I thought, too. Then, to avoid embarrassing myself after
    posting, I did some tests *before*: funnily enough I checked
    with

    my ($name, $ext) = parsename $_;

    and it worked as expected, just by throwing away the spurious ''. Thus,
    somewhat to my own surprise, I wrongly concluded that it did the right
    thing altogether. Had I checked with -MO=Deparse,-p I would have known
    better.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Mar 10, 2007
    #9