String#split(' ') and whitespace (perl user's surprise)


Mike Stok

I have to confess that I use a lot of Perl, and some of its idioms are
deeply embedded in my mind.

In the course of parsing some data in Ruby I used a fragment of code

rules.each_line do |rule|
  sku, price, special = rule.chomp.split(' ', 3)
  # [...]
end

where rules was composed of lines of the form

A 50 3 for 130

Coming from a Perl background, I would have expected 'A', '50' and '3 for 130'
to have been assigned to sku, price and special. As it turned out,
special had leading whitespace.

Maybe this is the difference Hal alludes to in From Perl to Ruby in his
book The Ruby Way: "Also, note that split also behaves slightly

Is this intentional behaviour? I think that the Perl behaviour is more
useful (that is trimming the leading whitespace off the limit-th element

In the Perl debugger:

DB<1> $s = ' A 50 3 for 130'

DB<2> @l = split(' ', $s, 3)

DB<3> x @l
0 'A'
1 50
2 '3 for 130'

In irb:
' A 50  3 for 130'.split(' ', 3)
=> ["A", "50", " 3 for 130"]

What I'd like Ruby to do:
' A 50  3 for 130'.split(' ', 3)
=> ["A", "50", "3 for 130"]





W> This seems like a ruby bug: limit seems to be counting whitespace
W> characters and not whitespace "runs" when ' ' is given as the pattern.

W> But maybe it's a documentation bug. Who knows?

Well, to see what it does:

svg% ruby -e 'p "a b".split(" ", 2); p "a  b".split(" ", 2);'
["a", "b"]
["a", " b"]

It removes just the first space when it has a limit parameter.

Guy Decoux
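
Guy's experiment can be repeated as a small script. The sketch below assumes the second string has two spaces between the letters, and it puts a regexp pattern next to the ' ' pattern for contrast. The regexp line is an addition for illustration and does not come from the thread.

```ruby
s = "a  b"  # assumption: two spaces between the letters

# No limit: ' ' selects the whitespace-run mode, so the run is one separator.
p s.split(" ")      # → ["a", "b"]

# With a limit: this is the call the thread is about. Whether the second
# element keeps a leading space depends on the Ruby version in use.
p s.split(" ", 2)

# Regexp pattern: every single space is a separator, so the remainder
# keeps the second space.
p s.split(/ /, 2)   # → ["a", " b"]
```

If the middle line prints the same thing as the regexp line, the limit is being applied per whitespace character rather than per run.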

Mike Stok

In irb:
' A 50  3 for 130'.split(' ', 3)

=> ["A", "50", " 3 for 130"]

What I'd like Ruby to do
' A 50  3 for 130'.split(' ', 3)

=> ["A", "50", "3 for 130"]

I should have mentioned that as a workaround, you can do this:

' A 50 3 for 130'.split(' ', 3).map { |x| x.strip }

I agree you shouldn't have to do that, but it will get you going with
minimal changes.

Thanks. A minor nit is that this will destroy trailing spaces on the
third field (not relevant in this case, and maybe useful in many cases).

I think that my real point was that if there's a special case for
splitting a string on ' ' then it should behave the same way as Perl's
special case.
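
One way to get the Perl-like result without touching trailing whitespace is to left-strip only the last field. This is a minimal sketch: split_fields is a hypothetical helper name, not part of Ruby, and the sample line with runs of spaces is made up for illustration.

```ruby
# Hypothetical helper (not part of Ruby): split on whitespace runs with a
# limit, then drop leading whitespace from the final field only.
def split_fields(line, limit)
  fields = line.split(' ', limit)
  # lstrip leaves trailing whitespace alone, unlike strip.
  fields[-1] = fields[-1].lstrip unless fields.empty?
  fields
end

# Made-up input: three spaces before the third field, two trailing spaces.
p split_fields(" A 50   3 for 130  ", 3)  # → ["A", "50", "3 for 130  "]
```

On a Ruby where split already skips the whole run, the lstrip is a no-op, so the helper should be harmless either way.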




At Thu, 26 Jun 2003 22:45:37 +0900,
Michael Campbell wrote:
> Does that sound like a bug to you?

Yes, to me.

Index: string.c
RCS file: /cvs/ruby/src/ruby/string.c,v
retrieving revision 1.161
diff -u -2 -p -r1.161 string.c
--- string.c 26 Jun 2003 18:24:58 -0000 1.161
+++ string.c 26 Jun 2003 18:26:27 -0000
@@ -2582,4 +2582,5 @@ rb_str_split_m(argc, argv, str)
end = beg+1;
skip = 0;
+ if (!NIL_P(limit) && lim <= i) break;
@@ -2589,5 +2590,5 @@ rb_str_split_m(argc, argv, str)
skip = 1;
beg = end + 1;
- if (!NIL_P(limit) && lim <= ++i) break;
+ if (!NIL_P(limit)) ++i;
else {
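
To make the effect of the patch easier to follow, here is a rough Ruby rendering of the loop it touches. This is a sketch and not the actual implementation: the real code is C in string.c, the name awk_split is hypothetical, and edge cases such as a limit of 1 or empty trailing fields are ignored. The only point is where the limit check sits: it runs when the next field starts, not when a separator is seen, so the whitespace run in front of the last field is skipped first.

```ruby
# Illustrative only: a simplified Ruby rendering of whitespace-run splitting.
# awk_split is a hypothetical name; Ruby's real implementation is in C.
def awk_split(str, limit = nil)
  result = []
  beg = 0      # start index of the current field
  skip = true  # true while consuming a run of whitespace
  count = 1    # number of the field currently being built

  str.each_char.with_index do |ch, idx|
    if skip
      if ch =~ /\s/
        beg = idx + 1
      else
        skip = false
        # Limit check as placed by the patch: stop only once the final
        # field has actually started.
        break if limit && limit <= count
      end
    elsif ch =~ /\s/
      result << str[beg...idx]
      skip = true
      beg = idx + 1
      count += 1 if limit
    end
  end

  # Whatever remains from beg onwards is the last field.
  result << str[beg..-1] if beg < str.length
  result
end

p awk_split(" A 50  3 for 130", 3)  # → ["A", "50", "3 for 130"]
```

Moving the break back into the separator branch (the pre-patch position) makes the loop stop right after the first space of the run, which would leave the remaining spaces at the front of the last field.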



Yukihiro Matsumoto

In message "Re: String#split(' ') and whitespace (perl user's surprise)"

|At Thu, 26 Jun 2003 22:45:37 +0900,
|Michael Campbell wrote:
|> Does that sound like a bug to you?
|Yes, to me.

To me, as well. Commit the fix, please.

