Splitting a filename


Noel Sant

I want to split a filename into the name itself and the extension. This:

($name, $extension) = split /\./, $input_file;

works fine, providing there's only one dot in the filename, but if there are
more I just get the first two bits of name. I really want to get the last
bit into $extension and all the rest, including dots, into $name.

I suppose I could use an array on the left-hand side, find out how many
elements there are and just build up $name from all the elements bar the last,
but this seems long-winded. I tried using "split /\.$/, ..." but then I got
everything in $name, and $extension was undefined. As though it's just
looking at the end of the string and saying "Nope! no dot there" and not
going any further back. Obviously I don't understand what $ does.

How do I say "just match on the last dot", please?
 

Paul Lalli

The answer to your actual question is to only split on a dot that's
not followed by anything which includes dots:

my ($name, $ext) = split /\.(?!.*\.)/, $file;
(read about lookaheads in `perldoc perlre`)
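
As a rough illustration of why the two patterns behave differently, here is a small sketch. The filename is made up for the example and is not from the original question.

```perl
use strict;
use warnings;

# Hypothetical filename, chosen only for illustration.
my $file = 'archive.tar.gz';

# /\.$/ matches only a dot that is the very last character of the string,
# so nothing is split off here and $ext_a stays undefined.
my ($name_a, $ext_a) = split /\.$/, $file;

# /\.(?!.*\.)/ matches a dot with no further dot after it, i.e. the last dot.
my ($name_b, $ext_b) = split /\.(?!.*\.)/, $file;

printf "%s | %s\n", $name_a, defined $ext_a ? $ext_a : '(undef)';
printf "%s | %s\n", $name_b, $ext_b;
```

The `$` anchor does not mean "search from the end"; it only requires that the match finish at the end of the string, which is why the first split finds nothing to split on.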

However, the correct answer to the problem you're actually trying to
solve is to stop reinventing the wheel:
perldoc File::Basename
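
A minimal sketch of what that might look like with fileparse from File::Basename. The path and the suffix pattern below are assumptions chosen for illustration, not something given in the thread.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical path, for illustration only.
my $path = '/tmp/fi.le.ext';

# The second argument is a suffix pattern; qr/\.[^.]*/ treats everything
# from the last dot onward as the suffix.
my ($name, $dirs, $suffix) = fileparse($path, qr/\.[^.]*/);

print "name=$name suffix=$suffix\n";
```

Note that the suffix returned this way keeps its leading dot, and that the directory part comes back as a separate value.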

Paul Lalli
 

Mirco Wahab

There's a module for it, as Paul and DJS said,
so try to use it if possible.

But, for learning purposes, you could also
split simple filenames with simple regular
expressions.

Say we have these four "splendid variants":

my @names = qw'
fi.le.ext
file.file.
file
.ext
';

(note the dots). Then we can "split them apart"
with, e.g.,

my $rg = qr/ (.*) \. (.*) $ | (.*) /x;


In this case, the 'filename component' is in $1 or in $3
($3 => if there was no dot at all), so let's extract that:

print
map +($_->[0] || $_->[2] || '(undef)') ."\t". ($_->[1] || '(undef)') ."\n",
map [ /$rg/ ],
@names;

The second (short) map expression applies the
regular expression and converts ($1,$2,$3) to
a list (reference).
The "complicated looking" first map expression
only serves the purpose to give some fancy
output, in our case:

fi.le ext
file.file (undef)
file (undef)
(undef) ext

We want to see which parts are 'defined'
and which are not. (Because of the ||, an empty
string is also shown as '(undef)'.)
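
If the goal is to tell an undefined part from an empty one, a variant using the defined-or operator might look like the sketch below. It assumes Perl 5.10 or later for `//`; the helper name `parts` is hypothetical.

```perl
use strict;
use warnings;

my $rg = qr/ (.*) \. (.*) $ | (.*) /x;

# Hypothetical helper: returns (name, extension), where the extension is
# undef only when the input contains no dot at all.
sub parts {
    my @p = $_[0] =~ $rg;            # ($1, $2, $3)
    return ($p[0] // $p[2], $p[1]);  # $3 is set only when there is no dot
}

for my $file (qw(fi.le.ext file.file. file .ext)) {
    my ($base, $ext) = parts($file);
    print "[$base]\t[", $ext // '(undef)', "]\n";
}
```

With this version "file.file." reports an empty extension, while "file" reports an undefined one.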

Regards

M.
 

Uri Guttman

PV> I use my own function. Maybe stupid but 100% functional ;-)

very stupid!

PV> sub parsename {
PV> my $filename = shift;
PV> my ($name, $ext) = ('') x2;

why the initialization of both? $name is ALWAYS set to something below.

PV> if($filename =~ m/\./) {$filename =~ s/^(.*)(\..*)$/$1$2/; $name =

are you allowing an empty file name with just a .suffix? what about a
name with a dot and no suffix?

PV> $1;

why the destruction and rebuilding of $filename with s///? $filename is
never used again inside that block. you can use the m// to get the same
$1 and $2 as the s///.

PV> $ext = $2; }

your $ext always has the . which is not typical when breaking up a
filename and extension.

why save $1 and $2 when you can just return them?

PV> else {$name = $filename;}

your indenting is either very bad or your usenet program ruined it.

PV> return ($name, $ext);
PV> }

this is almost just (untested) this one line sub:

return $_[0] =~ /^(.*)(\..*)$/ ;

other than making sure both parts are '' if not matched. that could be
fixed easily too.
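
one way that fix might look, as a sketch only: the same regex idea, but with the dot left out of the extension and a fallback of '' when there is no dot. this is an illustration, not code posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical completion of the one-liner above: split on the last dot,
# and fall back to (filename, '') when the name contains no dot.
sub parsename {
    my ($filename) = @_;
    my ($name, $ext) = $filename =~ /^(.*)\.(.*)$/s;
    return defined $name ? ($name, $ext) : ($filename, '');
}

print join('|', parsename($_)), "\n" for qw(fi.le.ext file.file. file .ext);
```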

stick with the module.

uri
 

Noel Sant

Wow!

As you say, I'll stop trying to re-invent the wheel, and use fileparse.

In answer to the query, I do use strict in programs, but not for that
example.

Many thanks.
 

anno4000

Michele Dondi said:

Just a rewrite of your code with the same semantics but a saner
syntax:

sub parsename {
local $_ = shift;
return($_, '') unless /\./;
/^(.*)(\..*)$/;
}

As a side note, I'd avoid localizing $_ if possible. There's a bug
lurking that bites when $_ is aliased to a value in a tied hash (or
something exotic like that). Ask Brian McCauley about it, he's
the resident expert on that bug :)

Sometimes a one-shot "for" can do the trick:

sub parsename {
for ( shift ) {
return($_, '') unless /\./;
return /^(.*)(\..*)$/;
}
}

Anno
 

anno4000

Michele Dondi said:
And of course that could even be cast in the form of a single
statement:

sub parsename { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }

Although just as obviously I would call that an *abuse*.
:)

It's not entirely equivalent, however. The sub body is parsed as

return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;

so it returns a spurious third value (an empty string) when an extension
is present. This would do:

return /\./ ? /^(.*)(\..*)$/ : ( $_, '') for shift;

It's probably better to re-write the regex to capture the right stuff
in both cases. Also, it shouldn't include the "." in the extension,
but that's secondary.

[an embarrassing amount of time passes]

This isn't as easy as I thought. I haven't found a single regex
that captures first the name, and then the extension or an empty
string if there is none, in all cases. I'll leave the solution as
an exercise.
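
One possible answer to the exercise, offered as a sketch rather than as anything posted in the thread: a branch reset group, which assumes Perl 5.10 or later. Both alternatives number their captures from $1, so either branch returns exactly two values.

```perl
use strict;
use warnings;

# (?| ... ) is a branch reset group: the captures in each alternative
# are numbered from $1, so the match always yields (name, extension).
# The empty () in the second branch captures '' when there is no dot.
sub parsename {
    return /^(?|(.*)\.(.*)|(.*)())$/s for shift;
}

print join('|', parsename($_)), "\n" for qw(fi.le.ext file.file. file .ext);
```

This version also leaves the "." out of the extension.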

It is possible to use two regexes only one of which ever matches,
but that's no improvement.

sub parsename { return ( /(.*)\.(.*)/, /^([^.]*)()$/) for shift }

Anno
 

Michele Dondi

It's not entirely equivalent, however. The sub body is parsed as

return ( /\./ ? /^(.*)(\..*)$/ : $_), '' for shift;

That's what I thought, too. Then, rather than risk being ashamed
after posting, I ran some tests *before*. Funnily enough, I checked
with

my ($name, $ext) = parsename $_;

and it worked as expected, simply by throwing away the spurious ''. So,
somewhat to my own surprise, I wrongly concluded that it did the right
thing altogether. Had I checked with -MO=Deparse,-p I would have known
better.
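
A small sketch of how the extra value can go unnoticed: assigning to two scalars drops it, while assigning to an array keeps it. The test filename is made up for the example.

```perl
use strict;
use warnings;

# The single-statement version from earlier in the thread, unchanged.
sub parsename { return /\./ ? /^(.*)(\..*)$/ : $_, '' for shift }

# Hypothetical input, for illustration only.
my ($name, $ext) = parsename('file.ext');   # the extra '' is silently dropped
my @all          = parsename('file.ext');   # keeps every returned value

print "name=$name ext=$ext count=", scalar(@all), "\n";
```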


Michele
 
