Problem with split

Ted

I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I
expected the call to split would give me the three components of the
file name. It doesn't. When I run this scriptlet, I find
$file_root, $bad_date, and $fext remain empty and the print statement in
the conditional block gives me the original file name rather than the
root concatenated with the extension (i.e. the result should be, but
isn't, the original file name with the six digit date removed).

Any ideas as to what I missed?

Ted
===============================================
$some_dir = "C:/FVA/data/univeris/univeris0608";

opendir (DIR, $some_dir) || die "can't opendir $some_dir\n";
@fnames = readdir(DIR);
closedir DIR;

$count = @fnames;
print $count;print "\n";
$c = 0;
my %file_names;
for ( $i = 0 ; $i < $count ; ++$i ) {
++$c;
print "$fnames[$i]\n";
($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);
print "file root = $file_root\nbad date = $bad_date\nfile extension = $fext\n";
if ( length($file_root) > 0) {
$file_names{$fnames[$i]} = "$file_root.$fext";
print $file_names{$fnames[$i]};print "\n\n";
}
}
print "\nThere are $c files in $some_dir.\n";
 
xhoster

Ted said:
I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period.

Split takes a regex. In regexes, an unescaped period matches any
character (other than \n). So you are splitting on every single
character.
($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);
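
As a minimal sketch of the fix this implies, escaping the period makes split break only on literal dots. The file name below is made up for illustration; it merely follows the root.date.extension shape described in the question.

```perl
use strict;
use warnings;

# Hypothetical file name in the shape described above: root.date.ext
my $fname = 'report.060815.csv';

# An escaped dot matches only a literal period, not "any character".
my ($file_root, $date, $fext) = split /\./, $fname;

print "root=$file_root date=$date ext=$fext\n";   # root=report date=060815 ext=csv
print "$file_root.$fext\n";                       # report.csv
```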


Xho
 
Dr.Ruud

Ted wrote:
I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I
expected the call to split would give me the three components of the
file name. It doesn't. When I run this scriptlet, I find
$file_root, $bad_date, and $fext remain empty and the print statement
in the conditional block gives me the original file name rather than
the root concatenated with the extension (i.e. the result should be,
but isn't, the original file name with the six digit date removed).

Any ideas as to what I missed?

Ted
===============================================

Missing:

use strict ;
use warnings ;

(and sprinkle some 'my')
$some_dir = "C:/FVA/data/univeris/univeris0608";

opendir (DIR, $some_dir) || die "can't opendir $some_dir\n";
@fnames = readdir(DIR);
closedir DIR;

$count = @fnames;
print $count;print "\n";
$c = 0;
my %file_names;
for ( $i = 0 ; $i < $count ; ++$i ) {

You don't need the $i anywhere in the for-loop, so make that

for my $fname (@fnames) { ... }

++$c;
print "$fnames[$i]\n";
($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);

The dot inside /./ does not mean what you think it means.

There will be a '.' and '..' in your dir, you can use -f to skip those.

print "file root = $file_root\nbad date = $bad_date\nfile extension = $fext\n";
if ( length($file_root) > 0) {
$file_names{$fnames[$i]} = "$file_root.$fext";
print $file_names{$fnames[$i]};print "\n\n";
}
}
print "\nThere are $c files in $some_dir.\n";
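
Putting the suggestions above together, one possible cleaned-up version might look like the sketch below. It is not the original poster's script: the sub name strip_dates is hypothetical, the file names are made up, and a scratch directory from File::Temp stands in for the real data directory so the example can run anywhere. Unlike the original, it counts only the regular files it actually processed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Map each "root.date.ext" file name in $dir to "root.ext".
# (strip_dates is a hypothetical helper name, not from the thread.)
sub strip_dates {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!\n";
    my @fnames = readdir $dh;
    closedir $dh;

    my %file_names;
    for my $fname (@fnames) {
        # readdir returns bare names, so -f needs the directory prepended.
        # This also skips the '.' and '..' entries.
        next unless -f "$dir/$fname";
        my ($file_root, $date, $fext) = split /\./, $fname;
        # Skip names that do not have all three parts.
        next unless defined $fext && length $file_root;
        $file_names{$fname} = "$file_root.$fext";
    }
    return %file_names;
}

# Demo on a scratch directory with made-up file names.
my $dir = tempdir(CLEANUP => 1);
for my $name ('alpha.060815.csv', 'beta.060816.txt') {
    open(my $fh, '>', "$dir/$name") or die "can't create $name: $!\n";
    close $fh;
}

my %file_names = strip_dates($dir);
print "$_ => $file_names{$_}\n" for sort keys %file_names;
print "\nThere are ", scalar(keys %file_names), " files in $dir.\n";
```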
 
Justin C

I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I

[snip]

Disclaimer: I know nothing!

I had something similar today, I just escaped the '.':

my ( $a, $b, $c ) = split /\./ ;

But there are probably a million reasons *not* to do that... one of
which will be along any minute now.


Justin.
 
A. Sinan Unur

....

my ( $a, $b, $c ) = split /\./ ;

But there are probably a million reasons *not* to do that...

Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a literal period.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
DJ Stunks

A. Sinan Unur said:
Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a literal period.

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]
C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]
C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Wacky? I would have expected /./ to function essentially like //....

This is perl, v5.8.7 built for MSWin32-x86-multi-thread

-jp
 
DJ Stunks

DJ said:
A. Sinan Unur said:
Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a literal period.

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]
C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]
C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Wacky? I would have expected /./ to function essentially like //....


Nevermind, I understand now.... I should have included a couple more
test cases:

C:\>perl -MList::MoreUtils=uniq -e "print uniq split //,'some.string'"
some.tring
C:\>perl -e "print qq{[$_]} for split /[some.tring]/,'some.string'"

C:\>perl -e "print qq{[$_]} for split /[some.ring]/,'some.string'"
[][][][][][][t]

Pardon the interruption...

-jp
 
A. Sinan Unur

DJ Stunks wrote:
....
Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]

's' is a separator. split preserves empty leading fields.

C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]

'.' is a separator.

C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Every character is a field because the separator is the empty pattern.

Very logical.

Nevermind, I understand now.... I should have included a couple more
test cases:

I thought a verbal explanation might be useful to some.

Sinan

 
John W. Krahn

A. Sinan Unur said:
DJ Stunks wrote:
...
Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.

Actually there are a lot of fields; it's just that empty trailing fields aren't
returned from split unless you use a negative third argument:

$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]
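
The same comparison can be sketched as a short script that counts the fields instead of printing brackets. The 11-character string has 11 separators under /./, hence 12 (empty) fields when nothing is discarded.

```perl
use strict;
use warnings;

my $str = 'some.string';               # 11 characters

my @default = split /./, $str;         # trailing empty fields are dropped
my @all     = split /./, $str, -1;     # negative LIMIT keeps them

printf "default:  %d fields\n", scalar @default;   # default:  0 fields
printf "LIMIT -1: %d fields\n", scalar @all;       # LIMIT -1: 12 fields
```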



John
 
Ian Wilson

John said:
A. Sinan Unur said:
DJ Stunks wrote:
...


Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.


Actually there are a lot of fields; it's just that empty trailing fields aren't
returned from split unless you use a negative third argument:

Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.
$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]

Empty "trailing" fields *can* be returned without using a negative third
argument. Perhaps the following were too obvious to be worth mentioning?

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 12'
[][][][][][][][][][][][]

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 99'
[][][][][][][][][][][][]

I can see the negative value is more generally useful, but I note that
it isn't the only value that can return empty "trailing" fields.
 
anno4000

[snip]
Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.

The documentation is a bit misleading in explicitly saying that "leading
empty fields are preserved". What actually happens is that trailing
empty fields are removed while leading ones are not treated specially
at all. When the leading empty fields also happen to be trailing ones,
as is the case here, all fields are removed.
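
A small sketch of that distinction, using comma-separated strings made up for illustration: leading empty fields survive only when a non-empty field follows them, because otherwise they are also trailing. The show helper is a hypothetical convenience for making empty fields visible.

```perl
use strict;
use warnings;

# Format a list of fields as [a][b]... so empty fields stay visible.
sub show { return join '', map("[$_]", @_) }

my @mixed = split /,/, ',,a,,';   # two leading and two trailing empty fields
my @empty = split /,/, ',,,';     # every field is empty, so all are "trailing"

print show(@mixed), "\n";             # [][][a]
print scalar(@empty), " fields\n";    # 0 fields
```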

Anno
 
John W. Krahn

Ian said:
John said:
Actually there are a lot of fields; it's just that empty trailing fields aren't
returned from split unless you use a negative third argument:

Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.
$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]

Empty "trailing" fields *can* be returned without using a negative third
argument. Perhaps the following were too obvious to be worth mentioning?

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 12'
[][][][][][][][][][][][]

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 99'
[][][][][][][][][][][][]

I can see the negative value is more generally useful, but I note that
it isn't the only value that can return empty "trailing" fields.

The problem with using positive numbers is that you have to know beforehand
how much the regular expression will match and how long the string is. Using
a negative number will always return all empty trailing fields.
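
To illustrate with a made-up comma-separated string: a positive LIMIT that is too small leaves the rest of the string unsplit in the last field, so it only works when you already know how many fields to expect. The show helper is a hypothetical convenience, not part of split.

```perl
use strict;
use warnings;

# Format a list of fields as [a][b]... so empty fields stay visible.
sub show { return join '', map("[$_]", @_) }

my $str = 'a,b,,,';

my @three = split /,/, $str, 3;    # LIMIT too small: remainder stays unsplit
my @five  = split /,/, $str, 5;    # LIMIT happens to be large enough here
my @all   = split /,/, $str, -1;   # negative LIMIT: no need to know the count

print show(@three), "\n";   # [a][b][,,]
print show(@five),  "\n";   # [a][b][][][]
print show(@all),   "\n";   # [a][b][][][]
```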


John
 
Ted

Thanks one and all. I not only learned what I missed in my regular
expression, and a lot of other stuff about regular expressions, but
also some additional Perl idioms.

Thanks.

Ted
 
