Where is "split on '.'" documented?

Tim Shoppa · Jan 4, 2004

I've found that the code fragment

my @p = split ".",$s;

doesn't behave as I would expect it.

For example:

print (join "\n",(split ".","some.text.with.dots.in.between"));

produces no output, and this

print scalar split(".","some.text.with.dots.in.between");

gives me a fat 0. If I replace all the dots with commas or other
"ordinary" characters then it works like I expect.

Now, when I do read the documentation on split it mentions that
split " " works like awk, throwing away leading and trailing spaces
and otherwise matching on /\s+/. But that's not my case.

I suspect that split is special-casing the "." into some special-
purpose pattern but I cannot find documentation on what it is doing.
Can anyone point me in the right direction?

This is perl 5.8.2, if it matters. (Same thing when I go back to 5.6.0,
as far as I can tell.)

Tim.

Marc Bissonnette · Jan 4, 2004

Shouldn't it be

my @p = split /\./, $s;

?

Web Surfer · Jan 4, 2004

[This followup was posted to comp.lang.perl.misc]

Try this:

print scalar split(/\./,"some.text.with.dots.in.between");

On my Windows XP system running Perl 5.8.0 it produces the desired
result.

You have to remember that "." is a "special" character as far as regular
expressions are concerned, so if you want to ACTUALLY match a "." you
have to "escape" it with a backslash.

John W. Krahn · Jan 4, 2004

The first argument to split is a regular expression, or will be converted
to one by split, except for ' ', which is a special case. In a regular
expression the . character matches any character except the newline
character (like [^\n]). Since your string contains no newlines, every
character is a delimiter and every field between them is an empty string.
When no LIMIT (or a LIMIT of 0) is given, split removes trailing empty
fields, so nothing is left and you get an empty list.

$ perl -le'$x = "one two"; @x = split /./, $x; print scalar @x'
0
$ perl -le'$x = "one two"; @x = split /./, $x, -1; print scalar @x'
8

John
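A minimal, self-contained sketch of the behaviour described above, using the string from the original post (the variable names are illustrative only). It shows that the string "." and the regex /./ behave the same, that a negative LIMIT keeps the empty fields, and that an escaped dot gives the expected words:

```perl
use strict;
use warnings;

my $s = "some.text.with.dots.in.between";    # 30 characters

# A plain string as the first argument is compiled as a regex, so "."
# means "any character except newline", exactly like /./.
my @as_string = split ".", $s;
my @as_regex  = split /./, $s;

# Every character is a delimiter, so every field is an empty string;
# with no LIMIT, trailing empty fields are discarded, leaving nothing.
print scalar(@as_string), " ", scalar(@as_regex), "\n";    # 0 0

# A negative LIMIT keeps the trailing empty fields:
# 30 delimiters produce 31 empty fields.
my @kept = split /./, $s, -1;
print scalar(@kept), "\n";                                 # 31

# Escaping the dot matches a literal "." and yields the words.
my @words = split /\./, $s;
print join("|", @words), "\n";    # some|text|with|dots|in|between
```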

Ben Morrow · Jan 4, 2004

print scalar split(".","some.text.with.dots.in.between");

perldoc -f split
| Use of split in scalar context is deprecated, however, because it
| clobbers your subroutine arguments.

If you must, you can use

print scalar @{[ split /\./, "some.text.with.dots" ]};

which calls split in list context and then applies scalar() to that.

Ben
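As a sketch of the list-context approach, assuming the same test string as the original post: the fields are collected first and counted afterwards, either through a named array or through the anonymous-array idiom Ben gives.

```perl
use strict;
use warnings;

my $s = "some.text.with.dots.in.between";

# Call split in list context, then count the resulting array.
my @fields = split /\./, $s;
my $count  = @fields;    # an array in scalar context gives its length

# The one-liner form: build an anonymous array and count its elements.
my $count2 = scalar @{[ split /\./, $s ]};

print "$count $count2\n";    # 6 6
```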

Uri Guttman · Jan 4, 2004

MG> print scalar split(/\./,"some.text.with.dots.in.between");
MG> print scalar split('\.',"some.text.with.dots.in.between");
MG> print scalar split("\\.","some.text.with.dots.in.between");

MG> I can see how a person might stumble over the OP's problem,
MG> though. It would probably be useful to explain in the
MG> documentation why the above all work.

split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/

what is it about PATTERN that needs more explanation? it doesn't say
expression or string.

Anything matching PATTERN is taken to be a delimiter separating
the fields.

is that enough to explain PATTERN?

The pattern "/PATTERN/" may be replaced with an expression to
specify patterns that vary at runtime. (To do runtime
compilation only once, use "/$variable/o".)

and that?

what more would you want? the first argument to split is a PATTERN. also
known as a regex, regular expression

uri
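The runtime-expression form quoted above can be sketched as follows; the input line and the patterns are made-up examples. A string held in a variable is compiled as a regex when split runs, and a qr// object works as well:

```perl
use strict;
use warnings;

my $line = "a.b,c.d";

# Hypothetical patterns chosen at runtime: a string containing an
# escaped dot, a plain comma, and a precompiled qr// character class.
my @patterns = ('\.', ',', qr/[.,]/);

my @results;
for my $pattern (@patterns) {
    my @fields = split $pattern, $line;    # the expression is the pattern
    push @results, join("|", @fields);
}

for my $i (0 .. $#patterns) {
    print "$patterns[$i] => $results[$i]\n";
}
```

Note that the single-quoted '\.' keeps its backslash, so the regex built from it is \. and matches only a literal dot.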

Matt Garrish · Jan 5, 2004

René Larsen said:
You haven't read far enough. *All* the examples from "perldoc -f split"
show a pattern as the first argument.

print scalar split(/\./,"some.text.with.dots.in.between");
print scalar split('\.',"some.text.with.dots.in.between");
print scalar split("\\.","some.text.with.dots.in.between");

I can see how a person might stumble over the OP's problem, though. It would
probably be useful to explain in the documentation why the above all work.

Matt
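A sketch, assuming the same test string, of why the three spellings above agree: each one ends up handing the regex engine \. (a backslash followed by a dot). The fourth call, with a single backslash inside double quotes, is added only for contrast: Perl turns "\." into a plain "." before split sees it, which brings back the original problem.

```perl
use strict;
use warnings;

my $s = "some.text.with.dots.in.between";

# Three spellings from the thread that all end up as the regex \.
my @a = split /\./,  $s;    # regex literal
my @b = split '\.',  $s;    # single quotes keep the backslash
my @c = split "\\.", $s;    # double quotes: \\ becomes one backslash

# In double quotes "\." is just ".", so this is /./ again.
my @d = split "\.", $s;

print join("|", @a), "\n";    # some|text|with|dots|in|between
print scalar(@d), "\n";       # 0
```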

Rafael Garcia-Suarez · Jan 5, 2004

Michele said:
Said this, I cannot see any mention of *other* "non-pattern-split()s",
and I guess that no pre-defined behaviour is documented. However it
seems to be perfectly legal since -wMstrict doesn't complain; OTOH

perl -MO=Deparse -le 'print split ".", "a.b.c"'

yields:

print split(/./, 'a.b.c', 0);
-e syntax OK

so it seems that strings other than ' ' are automatically converted to
regexen in the "obvious" (or "simplest"?[*]) way.

That's correct.

[*] It would be more obvious IMHO to convert $string to /\Q$string/.

That's not going to happen, since that would break a LOT of existing
code.

Some time ago I proposed to emit a warning when the first argument to
split is a plain string, but there are good arguments against it as
well. (And moreover I haven't produced a good patch for it either.)
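The /\Q$string/ idea can be applied by hand. A minimal sketch, with a hypothetical helper named split_literal (not part of Perl), that escapes regex metacharacters with quotemeta before splitting:

```perl
use strict;
use warnings;

# Hypothetical helper: split on a delimiter that is meant literally,
# by escaping regex metacharacters with quotemeta first.
sub split_literal {
    my ($delim, $string) = @_;
    my $pattern = quotemeta $delim;    # same as "\Q$delim\E"
    return split /$pattern/, $string;
}

my @dots   = split_literal('.', "a.b.c");    # ("a", "b", "c")
my @pipes  = split_literal('|', "x|y|z");    # ("x", "y", "z")

# \Q...\E also works directly inside a regex literal.
my @inline = split /\Q.\E/, "a.b.c";         # ("a", "b", "c")
```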

Matt Garrish · Jan 5, 2004

Uri Guttman said:
The pattern "/PATTERN/" may be replaced with an expression to
specify patterns that vary at runtime. (To do runtime
compilation only once, use "/$variable/o".)

Which makes the issue as clear as mud, as the saying goes. The issue isn't
what Uri understands, but what someone trying to understand how the function
works for the first time would understand. The first sentence in the above
paragraph, for example, is a horrible abuse of the English language (how
much wood could a woodchuck chuck...). It's not a question of whether you
can eventually decipher the usage from the information but one of clarity,
and the explanation for split is sorely lacking on that particular aspect of
its usage (IMO).

Matt

Michele Dondi · Jan 6, 2004

You haven't read far enough. *All* the examples from "perldoc -f split"
show a pattern as the first argument.

I think *you* have not read far enough. The following fragment shows
that indeed *not* all examples from 'perldoc -f split' "show a pattern
as the first argument":

As a special case, specifying a PATTERN of space (' ') will
split on white space just as "split" with no arguments does.
Thus, "split(' ')" can be used to emulate awk's default
behavior, whereas "split(/ /)" will give you as many null
initial fields as there are leading spaces.

Said this, I cannot see any mention of *other* "non-pattern-split()s",
and I guess that no pre-defined behaviour is documented. However it
seems to be perfectly legal since -wMstrict doesn't complain; OTOH

perl -MO=Deparse -le 'print split ".", "a.b.c"'

yields:

print split(/./, 'a.b.c', 0);
-e syntax OK

so it seems that strings other than ' ' are automatically converted to
regexen in the "obvious" (or "simplest"?[*]) way.

[*] It would be more obvious IMHO to convert $string to /\Q$string/.

Michele
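The ' ' special case quoted from perlfunc is easiest to see side by side with / /. A small sketch with a made-up input string:

```perl
use strict;
use warnings;

my $text = "  leading and   trailing  ";

# ' ' is the one string split special-cases: awk-style splitting on
# runs of whitespace, with leading whitespace ignored.
my @awk = split ' ', $text;      # ("leading", "and", "trailing")

# / / is an ordinary regex matching a single space, so each leading
# space and each extra inner space yields an empty field. Trailing
# empty fields are still dropped because no LIMIT is given.
my @plain = split / /, $text;    # ("", "", "leading", "and", "", "", "trailing")
```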

Anno Siegel · Jan 6, 2004

Matt Garrish said:
Which makes the issue as clear as mud, as the saying goes. The issue isn't
what Uri understands, but what someone trying to understand how the function
works for the first time would understand. The first sentence in the above
paragraph, for example, is a horrible abuse of the English language (how
much wood could a woodchuck chuck...). It's not a question of whether you
can eventually decipher the usage from the information but one of clarity,
and the explanation for split is sorely lacking on that particular aspect of
its usage (IMO).

Patches are welcome, bitching less so.

Anno

Eric Schwartz · Jan 6, 2004

Patches are welcome, bitching less so.

A few minutes thought produced:

--- perlfunc.pod 2004-01-06 11:10:45.000000000 -0700
+++ /usr/share/perl/5.8.2/pod/perlfunc.pod 2003-11-15 01:07:45.000000000 -0700
@@ -4927,10 +4927,9 @@
$header =~ s/\n\s+/ /g; # fix continuation lines
%hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);

-To specify patterns that vary at runtime, replace /PATTERN/ with an
-expression that returns the pattern to split on. (Use C</$variable/o>
-to compile the pattern only once-- see L<perlretut> for detail on the /o
-option.)
+The pattern C</PATTERN/> may be replaced with an expression to specify
+patterns that vary at runtime. (To do runtime compilation only once,
+use C</$variable/o>.)

As a special case, specifying a PATTERN of space (S<C<' '>>) will split on
white space just as C<split> with no arguments does. Thus, S<C<split(' ')>> can

Comments welcome.

-=Eric

Ben Morrow · Jan 6, 2004

Eric Schwartz said:
--- perlfunc.pod 2004-01-06 11:10:45.000000000 -0700
+++ /usr/share/perl/5.8.2/pod/perlfunc.pod 2003-11-15 01:07:45.000000000 -0700
@@ -4927,10 +4927,9 @@
$header =~ s/\n\s+/ /g; # fix continuation lines
%hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);

-To specify patterns that vary at runtime, replace /PATTERN/ with an
-expression that returns the pattern to split on. (Use C</$variable/o>
-to compile the pattern only once-- see L<perlretut> for detail on the /o
-option.)
+The pattern C</PATTERN/> may be replaced with an expression to specify
+patterns that vary at runtime. (To do runtime compilation only once,
+use C</$variable/o>.)

The pattern may be varied at runtime by using an expression instead of
a literal C</PATTERN/>. (If the result of that expression will not in
fact vary, use C</o>: see L<perlretut> for details.)

As a special case, specifying a PATTERN of space (S<C<' '>>) will split on
white space just as C<split> with no arguments does. Thus, S<C<split(' ')>> can

Ben

Eric Schwartz · Jan 6, 2004

Ben Morrow said:
The pattern may be varied at runtime by using an expression instead of
a literal C</PATTERN/>. (If the result of that expression will not in
fact vary, use C</o>: see L<perlretut> for details.)

I abhor passive voice in documentation, but generally I like your
version better. When tightened up a bit, it's more succinct and more
expressive. I also think it's best to explicitly specify that /o
applies to a regex (yes, it's redundant, but a little redundancy isn't
bad):

To vary the pattern at runtime, use an expression instead of a literal
C</PATTERN/>. (If the result of that expression will not in fact vary,
use C</EXPRESSION/o>: see L<perlretut> for details.)

-=Eric

Ben Morrow · Jan 6, 2004

Eric Schwartz said:
To vary the pattern at runtime, use an expression instead of a literal
C</PATTERN/>. (If the result of that expression will not in fact vary,
use C</EXPRESSION/o>: see L<perlretut> for details.)

Yup, except that the usual term in perl documentation is EXPR rather
than EXPRESSION.

Ben

Walter Roberson · Jan 6, 2004

:To vary the pattern at runtime, use an expression instead of a literal
:C</PATTERN/>. (If the result of that expression will not in fact vary,
:use C</EXPRESSION/o>: see L<perlretut> for details.)

I know you are trying to make things concise, but I would have to
say that I find that parenthetical expression misses an important
nuance. The way it is written, someone who did not already know
how /o worked might read it as saying that /o may be used only
if the result of the expression is the same every time, that is,
only if the expression's components are invariant within that
section of code. But that is, of course, not how /o works: it
evaluates the expression the first time the expression is encountered
and "locks in" that result, even if the result of the expression
does in fact vary.

Sorry, I do not have alternative wording to offer at the moment.
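To make the /o behaviour described above concrete, here is a minimal sketch with two hypothetical helpers that differ only in the /o flag. Because /o compiles the interpolated pattern the first time that particular split runs, later calls keep using the first separator even when a different one is passed in:

```perl
use strict;
use warnings;

# Hypothetical helpers: split on a separator supplied at runtime,
# recompiling the pattern on every call versus only once (/o).
sub split_every_time { my ($sep, $str) = @_; return split /$sep/,  $str }
sub split_once       { my ($sep, $str) = @_; return split /$sep/o, $str }

my $str = "a,b;c";

# Without /o the current separator is always used.
my @fresh = split_every_time(';', $str);    # ("a,b", "c")

# With /o the first call compiles /,/ and that pattern is locked in.
my @first  = split_once(',', $str);         # ("a", "b;c")
my @locked = split_once(';', $str);         # still /,/: ("a", "b;c")
```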

Eric Schwartz · Jan 6, 2004

:To vary the pattern at runtime, use an expression instead of a literal
:C</PATTERN/>. (If the result of that expression will not in fact vary,
:use C</EXPRESSION/o>: see L<perlretut> for details.)

I know you are trying to make things concise,

Conciseness is a nice-to-have, but is never a main goal in
documentation to me. It's a side effect of good writing.

but I would have to
say that I find that parenthetical expression misses an important
nuance. The way it is written, someone who did not already know
how /o worked might read it as saying that /o may be used only
if the result of the expression is the same every time, that is,
only if the expression's components are invariant within that
section of code. But that is, of course, not how /o works: it
evaluates the expression the first time the expression is encountered
and "locks in" that result, even if the result of the expression
does in fact vary.

There's a reason I referred to perlretut there. It's best not to have
two divergent explanations of /o; if someone comes up with a
particularly sexy one for perlretut, then the one in perlfunc may well
suffer.

Sorry, I do not have alternative wording to offer at the moment.

One option is to assume that people will read perlretut; I don't know
how likely that is. Normally I'd say "not very", but if they're
already reading doco, then it's not a huge stretch to think they may
read more. Another is to leave out the reference to /o entirely, but
I'm not sure if that's more or less useful.

-=Eric

pkent · Jan 6, 2004

I've found that the code fragment

my @p = split ".",$s;

doesn't behave as I would expect it.

In the docs, perldoc -f split, it says:

split /PATTERN/,EXPR

split() operates with a _regular expression_ pattern, not a simple
string. The '.' is a special character in regexes, so you need to
escape it:

my @p = split /\./, $s;

P

Matt Garrish · Jan 7, 2004

Michele Dondi said:
Then the current documentation, even if *not* incorrect or misleading
(as someone else suggested), could be updated to mention that
(generic) plain strings can be used and explain how they will be
treated.

I'll assume that someone is me, and just say that I never meant to imply
that the current documentation is incorrect or misleading. As you've noted,
and as was my point in replying, I think it could be made clearer on this
issue.

Matt

Michele Dondi · Jan 8, 2004

The paragraph about ' ' and awk compatibility looks more to me like an
explanation than an example.

Hmmm... I agree with you that it *is* more of an explanation than an
example. Indeed, being an explanation, it *yields* an example.
Precisely, an example of the usage of split() with *a* (particular!)
string as a first argument instead of a pattern.

The original claim, quoted hereafter, was:

You haven't read far enough. *All* the examples from "perldoc -f split"
show a pattern as the first argument.

This plainly doesn't seem true to me.

I wrote that comment because in the beginning I had thought myself
that the OP's article was more naive than in fact it was.

Michele

empty first element after split	7	Jul 11, 2008
where are isinstance types documented?	12	Sep 26, 2006
Newbie question on split, and also awk.	4	May 15, 2006
empty leading field from split()	1	Nov 2, 2006
split inconsistency- why?	24	Aug 9, 2004
Numerically sort a file on a given column where column is a $var	4	Jul 17, 2008
Split entries from LDAP	2	Oct 11, 2008
Do people think this is logical behavior from the string split method?	15	Oct 23, 2008
