How is 'split' working here?

dn.perl

Code :
#!/usr/bin/perl

use strict ;

my $fname = "My-range-20080511-20080514.txt" ;
my ($pattern1, $pattern2);
$pattern1 = '(.*)-(\d+)-(\d+).txt$';

my @parts = split ( /$pattern1/, $fname) ;
print "$parts[0] Z $parts[1] Z $parts[2] Z $parts[3] Z\n" ;
#######################


Output :
 Z My-range Z 20080511 Z 20080514 Z

Why is $parts[0] a blank string? Besides, (\d+) is the second bracket
of pattern1, so I would expect it to be the second element, or
$parts[1], of @parts. Yet $parts[1] is something else. I am confused
as to how 'split' is working here.

Please advise. Thanks in advance.
 
Paul Lalli

You are splitting the string "My-range-20080511-20080514.txt". The
pattern on which you are splitting actually matches the entire
string. Therefore, the split itself produces exactly two fields - an
empty string in front of the match, and an empty string behind it.
By default, however, split() drops trailing empty fields.

Make it simpler: Say my pattern is actually /-!-/. Here's what I
would get for splitting each of these strings:

'foo-!-bar' => ('foo', 'bar')

'foo-!--!-bar' => ('foo', '', 'bar')

'-!-bar' => ('', 'bar')

'-!-bar-!-' => ('', 'bar') # remember, trailing empty fields are dropped

'-!-' => () # every field is empty, so they all count as trailing and are dropped

Your example is the equivalent of the last example. Your split
pattern matches the entire string, so the field in front of the match
is an empty string. On its own, that empty field would be dropped
along with the one behind the match, and you would get nothing back.
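
If you want to check those cases yourself, here is a minimal sketch. The show() helper is just something made up for display, not part of Perl. The last call passes a negative LIMIT, which as far as I know tells split to keep trailing empty fields, so you can see the two empty fields that are normally dropped:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical display helper: render a list as ('a', 'b').
sub show {
    return '(' . join(', ', map { "'$_'" } @_) . ')';
}

# Split each sample string on the literal sequence -!-
for my $str ('foo-!-bar', 'foo-!--!-bar', '-!-bar', '-!-bar-!-', '-!-') {
    my @fields = split(/-!-/, $str);
    printf "%-16s => %s\n", "'$str'", show(@fields);
}

# A negative LIMIT keeps trailing empty fields instead of dropping them.
my @kept = split(/-!-/, '-!-', -1);
print "with LIMIT -1    => ", show(@kept), "\n";
```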

HOWEVER, you did something else - you used capturing parentheses
within your pattern. When you do that, split returns not only the
pieces of the string that have been split, but also whatever was
captured. Let's take another look. This time, pretend your pattern
is /-(!)-/. That is, the same pattern, but now you're capturing the
exclamation point:

'foo-!-bar' => ('foo', '!', 'bar')

'foo-!--!-bar' => ('foo', '!', '', '!', 'bar')

'-!-bar' => ('', '!', 'bar')

'-!-bar-!-' => ('', '!', 'bar', '!') # remember, trailing empty fields are dropped

'-!-' => ('', '!') # again, trailing empty fields are dropped


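This is what you did. Your pattern matches the entire string, so you
get an empty leading field, followed by one result for each pair of
capturing parentheses. Because the captured values come after it,
that empty field is no longer trailing, so it is kept as $parts[0]
and your first capture lands in $parts[1].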
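
Here is a rough sketch of that with the pattern from the question. The split_fname() wrapper is a made-up name so the LIMIT can be varied; a LIMIT of 0 should behave the same as leaving it out, and -1 keeps the trailing empty field so you can see where it would have been:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: split a filename on the pattern from the question.
# $limit is handed to split(); 0 behaves like omitting it.
sub split_fname {
    my ($fname, $limit) = @_;
    return split(/(.*)-(\d+)-(\d+).txt$/, $fname, $limit);
}

my $fname = 'My-range-20080511-20080514.txt';

# Default: empty leading field, then the three captures.
print join('|', map { "[$_]" } split_fname($fname, 0)), "\n";
# prints []|[My-range]|[20080511]|[20080514]

# LIMIT -1: the empty field behind the match is kept as well.
print join('|', map { "[$_]" } split_fname($fname, -1)), "\n";
# prints []|[My-range]|[20080511]|[20080514]|[]
```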

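If you don't want those "extra" results, use non-capturing
parentheses. Be aware that with this particular pattern split then
returns an empty list, because the only fields left are the empty
ones on either side of the match. (The dot before txt is escaped
here so that it matches only a literal dot.)
$pattern1 = '(?:.*)-(?:\d+)-(?:\d+)\.txt$';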
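
To see the difference between the two kinds of parentheses, here is a small sketch. split_with() is a hypothetical helper that takes a compiled pattern, and the filename pattern below escapes the dot before txt, which is my own tweak rather than something from the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $str on the compiled pattern $re.
sub split_with {
    my ($re, $str) = @_;
    return split($re, $str);
}

# Capturing group: the captured '!' is returned between the fields.
my @capturing = split_with(qr/-(!)-/, 'foo-!-bar');
print scalar(@capturing), " elements: @capturing\n";   # 3 elements: foo ! bar

# Non-capturing group: only the fields themselves come back.
my @grouped = split_with(qr/-(?:!)-/, 'foo-!-bar');
print scalar(@grouped), " elements: @grouped\n";       # 2 elements: foo bar

# Non-capturing version of the filename pattern: nothing is left at all.
my @parts = split_with(qr/(?:.*)-(?:\d+)-(?:\d+)\.txt$/,
                       'My-range-20080511-20080514.txt');
print scalar(@parts), " elements\n";                   # 0 elements
```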

Alternatively, if what you're actually trying to do is find just those
captured parts, and your confusion is why that empty string appeared
in the first place, the answer is that you shouldn't be using split at
all. You should just be using the normal =~ operator, like so:

my @parts = ($fname =~ /$pattern1/);
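
A slightly fuller sketch of that approach, in case it helps. parse_fname() is a made-up name, and I have anchored the pattern with ^ and escaped the dot, which are my own assumptions about what you want rather than anything in your original pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (name, start, end) captured from a filename,
# or an empty list when the filename does not match.
sub parse_fname {
    my ($fname) = @_;
    # In list context the match operator returns the captured groups.
    my @parts = $fname =~ /^(.*)-(\d+)-(\d+)\.txt$/;
    return @parts;
}

my $fname = 'My-range-20080511-20080514.txt';

if (my ($name, $start, $end) = parse_fname($fname)) {
    print "name=$name start=$start end=$end\n";
    # prints name=My-range start=20080511 end=20080514
}
else {
    print "'$fname' did not match\n";
}
```

Here $parts[0] is the first capture, with no empty string in front, because nothing is being split.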

Hope that helps,
Paul Lalli
 
