Problem with split

Ted · Jul 4, 2006

I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I
expected the call to split would give me the three components of the
file name. It doesn't. When I run this scriptlet, I find
$file_root, $bad_date, and $fext remain empty and the print statement in
the conditional block gives me the original file name rather than the
root concatenated with the extension (i.e. the result should be, but
isn't, the original file name with the six digit date removed).

Any ideas as to what I missed?

Ted
===============================================
$some_dir = "C:/FVA/data/univeris/univeris0608";

opendir (DIR, $some_dir) || die "can't opendir $some_dir\n";
@fnames = readdir(DIR);
closedir DIR;

$count = @fnames;
print $count;print "\n";
$c = 0;
my %file_names;
for ( $i = 0 ; $i < $count ; ++$i ) {
    ++$c;
    print "$fnames[$i]\n";
    ($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);
    print "file root = $file_root\nbad date = $bad_date\nfile extention = $fext\n";
    if ( length($file_root) > 0) {
        $file_names{$fnames[$i]} = "$file_root.$fext";
        print $file_names{$fnames[$i]};print "\n\n";
    }
}
print "\nThere are $c files in $some_dir.\n";

xhoster · Jul 4, 2006

Ted said:
I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period.

Split takes a regex. In regexes, an unescaped period matches any
character (other than \n). So you are splitting on every single
character.

($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);

Xho
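A minimal sketch of the difference, assuming a hypothetical file name in the root.date.ext layout Ted describes (the name itself is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical file name: root, six digit date, three character extension.
my $fname = 'report.060704.csv';

# /./ matches any character, so every character is a separator
# and no fields come back.
my @wrong = split /./, $fname;

# /\./ matches only a literal period.
my ($file_root, $date, $fext) = split /\./, $fname;

print scalar(@wrong), " fields from /./\n";   # 0 fields from /./
print "$file_root.$fext\n";                   # report.csv
```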

Dr.Ruud · Jul 4, 2006

Ted wrote:

I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I
expected the call to split would give me the three components of the
file name. It doesn't. When I run this scriptlet, I find
$file_root, $bad_date, and $fext remain empty and the print statement
in the conditional block gives me the original file name rather than
the root concatenated with the extension (i.e. the result should be,
but isn't, the original file name with the six digit date removed).

Any ideas as to what I missed?

Ted
===============================================

Missing:

use strict ;
use warnings ;

(and sprinkle some 'my')

$some_dir = "C:/FVA/data/univeris/univeris0608";

opendir (DIR, $some_dir) || die "can't opendir $some_dir\n";
@fnames = readdir(DIR);
closedir DIR;

$count = @fnames;
print $count;print "\n";

$c = 0;
my %file_names;
for ( $i = 0 ; $i < $count ; ++$i ) {

You don't need the $i anywhere in the for-loop, so make that

for my $fname (@fnames) { ... }

++$c;
print "$fnames[$i]\n";
($file_root,$bad_date,$fext) = split(/./,$fnames[$i]);

The dot inside /./ does not mean what you think it means.

There will be a '.' and '..' in your dir, you can use -f to skip those.

print "file root = $file_root\nbad date = $bad_date\nfile extention = $fext\n";
if ( length($file_root) > 0) {
$file_names{$fnames[$i]} = "$file_root.$fext";
print $file_names{$fnames[$i]};print "\n\n";
}
}
print "\nThere are $c files in $some_dir.\n";
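Putting those suggestions together, the loop might look something like the sketch below. The sub name strip_dates and the check for a missing extension are assumptions added for illustration; they are not part of Ted's script.

```perl
use strict;
use warnings;

# Hypothetical helper: map each file name in $dir to the same name
# with the middle (date) component removed.
sub strip_dates {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "can't opendir $dir: $!\n";
    my @fnames = readdir $dh;
    closedir $dh;

    my %file_names;
    for my $fname (@fnames) {
        # -f is false for '.' and '..' (and for any other directory).
        next unless -f "$dir/$fname";

        # Escape the dot so that only literal periods separate fields.
        my ($file_root, $date, $fext) = split /\./, $fname;

        # Assumption: names without all three parts are skipped.
        next unless defined $fext;

        $file_names{$fname} = "$file_root.$fext";
    }
    return %file_names;
}
```

It would be called as my %file_names = strip_dates($some_dir); with $some_dir set as in the original script.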

Justin C · Jul 4, 2006

I have appended a simple test script.

The context is this. There is a directory in which all of the files
have been created with file names consisting of a root, a six digit
date, and a 3 character extension, all separated by a period. I

[snip]

Disclaimer: I know nothing!

I had something similar today, I just escaped the '.':

my ( $a, $b, $c ) = split /\./ ;

But there are probably a million reasons *not* to do that... one of
which will be along any minute now.

Justin.

A. Sinan Unur · Jul 5, 2006

....

my ( $a, $b, $c ) = split /\./ ;

But there are probably a million reasons *not* to do that...

Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a period.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

DJ Stunks · Jul 5, 2006

A. Sinan Unur said:
Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a period.

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]
C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]
C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Wacky? I would have expected /./ to function essentially like //....

This is perl, v5.8.7 built for MSWin32-x86-multi-thread

-jp

DJ Stunks · Jul 5, 2006

DJ said:
A. Sinan Unur said:

Nope, in this context, that is the right thing to do, because the first
argument of split is a regex, and . is a special character in regexes, so
it needs to be escaped when you want it to match only a period.

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]
C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]
C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Wacky? I would have expected /./ to function essentially like //....

Nevermind, I understand now.... I should have included a couple more
test cases:

C:\>perl -MList::MoreUtils=uniq -e "print uniq split //,'some.string'"
some.tring
C:\>perl -e "print qq{[$_]} for split /[some.tring]/,'some.string'"

C:\>perl -e "print qq{[$_]} for split /[some.ring]/,'some.string'"
[][][][][][][t]

Pardon the interruption...

-jp

A. Sinan Unur · Jul 5, 2006

DJ Stunks wrote:
....

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.

C:\>perl -Mstrict -we "print qq{[$_]} for split /s/,'some.string'"
[][ome.][tring]

's' is a separator. split preserves empty leading fields.

C:\>perl -Mstrict -we "print qq{[$_]} for split /\./,'some.string'"
[some][string]

'.' is a separator.

C:\>perl -Mstrict -we "print qq{[$_]} for split //,'some.string'"
[s][o][m][e][.][s][t][r][i][n][g]

Every character is a field because the separator is the empty pattern.

Very logical.

Nevermind, I understand now.... I should have included a couple more
test cases:

I thought a verbal explanation might be useful to some.

Sinan

Joe Smith · Jul 5, 2006

DJ said:
Question - when he was splitting on /./ why did he get an empty list?

The first argument to split is what _not_ to return.

Dr.Ruud · Jul 5, 2006

Joe Smith wrote:

DJ Stunks:

The first argument to split is what _not_ to return.

Unless capturing parentheses are used; see perldoc -f split.
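A short illustration of that exception, using a made-up file name: when the pattern contains a capturing group, the captured separator text is returned between the fields.

```perl
use strict;
use warnings;

# Hypothetical file name, for illustration only.
my $fname = 'root.060704.txt';

# No captures: the separators are discarded.
my @plain = split /\./, $fname;

# Capturing group: each matched separator is returned as its own element.
my @with_sep = split /(\.)/, $fname;

print join('|', @plain), "\n";      # root|060704|txt
print join('|', @with_sep), "\n";   # root|.|060704|.|txt
```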

John W. Krahn · Jul 5, 2006

A. Sinan Unur said:
DJ Stunks wrote:
...

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.

Actually there are a lot of fields, it's just that empty trailing fields aren't
returned from split unless you use a negative third argument:

$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]

John
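The same comparison as a small script, counting the fields rather than printing them:

```perl
use strict;
use warnings;

my $str = 'some.string';   # 11 characters, each one matched by /./

# Default: empty trailing fields are dropped, which here means all of them.
my @default = split /./, $str;

# Negative limit: empty trailing fields are kept.
my @negative = split /./, $str, -1;

print scalar(@default), "\n";    # 0
print scalar(@negative), "\n";   # 12 (11 separators delimit 12 empty fields)
```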

Ian Wilson · Jul 5, 2006

John said:
A. Sinan Unur said:

DJ Stunks wrote:
...

Question - when he was splitting on /./ why did he get an empty list?

Observe:
C:\>perl -Mstrict -we "print qq{[$_]} for split /./,'some.string'"

Every character is a separator. Therefore, there are no fields.

Actually there are a lot of fields, it's just that empty trailing fields aren't
returned from split unless you use a negative third argument:

Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.

$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]

Empty "trailing" fields *can* be returned without using a negative third
argument. Perhaps the following were too obvious to be worth mentioning?

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 12'
[][][][][][][][][][][][]

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 99'
[][][][][][][][][][][][]

I can see the negative value is more generally useful, but I note that
it isn't the only value that can return empty "trailing" fields.

anno4000 · Jul 5, 2006

[snip]

Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.

The documentation is a bit misleading in explicitly saying that "leading
empty fields are preserved". What actually happens is that trailing
empty fields are removed while leading ones are not treated specially
at all. When the leading empty fields also happen to be trailing ones,
as is the case here, all fields are removed.

Anno
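A small sketch of that reading, with a made-up comma-separated string: leading and interior empty fields survive, trailing ones are dropped, and when every field is empty they all count as trailing.

```perl
use strict;
use warnings;

# Hypothetical input: one leading, one interior and two trailing empty fields.
my $str = ',a,,b,,';

my @default = split /,/, $str;       # ('', 'a', '', 'b')
my @all     = split /,/, $str, -1;   # ('', 'a', '', 'b', '', '')

# Nothing but separators: every field is empty, so every field is trailing.
my @none = split /,/, ',,,';         # ()

print scalar(@default), " ", scalar(@all), " ", scalar(@none), "\n";   # 4 6 0
```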

John W. Krahn · Jul 5, 2006

Ian said:
John said:

Actually there are a lot of fields, it's just that empty trailing fields
aren't returned from split unless you use a negative third argument:

Indeed, on my system, `perldoc -f split` says

split Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted.

It seems that when there are no non-empty fields, all the empty fields
are (arbitrarily?) deemed to be trailing ones, and are thus deleted from
the result.

$ perl -le 'print map qq{[$_]}, split /./, q[some.string]'

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], -1'
[][][][][][][][][][][][]

Empty "trailing" fields *can* be returned without using a negative third
argument. Perhaps the following were too obvious to be worth mentioning?

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 12'
[][][][][][][][][][][][]

$ perl -le 'print map qq{[$_]}, split /./, q[some.string], 99'
[][][][][][][][][][][][]

I can see the negative value is more generally useful, but I note that
it isn't the only value that can return empty "trailing" fields.

The problem with using positive numbers is that you have to know beforehand
how much the regular expression will match and how long the string is. Using
a negative number will always return all empty trailing fields.

John
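To illustrate with a made-up string: a positive limit that is too small stops splitting early, one that happens to be large enough behaves like a negative limit, and only the negative limit works without knowing the field count in advance.

```perl
use strict;
use warnings;

# Hypothetical input: two non-empty fields followed by three empty ones.
my $str = 'a,b,,,';

my @two = split /,/, $str, 2;    # ('a', 'b,,,')  limit too small
my @big = split /,/, $str, 99;   # ('a', 'b', '', '', '')
my @neg = split /,/, $str, -1;   # ('a', 'b', '', '', '')

print scalar(@two), " ", scalar(@big), " ", scalar(@neg), "\n";   # 2 5 5
```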

Ted · Jul 5, 2006

Thanks one and all. I not only learned what I missed in my regular
expression, and a lot of other stuff about regular expressions, but
also some additional Perl idioms.

Thanks.

Ted
