[bug] String#split returns extra empty string

Simon Strandgaard · May 31, 2004

While extending my own regexp-engine with a split method,
I discovered something odd about Ruby's split.

irb(main):001:0> 'ab1ab'.split(/\D+/)
=> ["", "1"]

It's asymmetric: it has a special case for eliminating
the last empty string, but apparently not the first empty string.

I would have expected the above to be symmetric and output:
=> ["1"]

Simon Strandgaard · May 31, 2004

[10 minutes of experimenting later]
I wasn't aware that Ruby inserts subcaptures this way.

irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
=> ["", "ab", "2cd3"]

Because of subcapture insertion, it makes sense to keep the
first empty string.

I withdraw this bug-report.
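
A small sketch of why the leading empty string matters once captures are involved. The variable names are only for illustration; the point is that with a capturing group, fields land on even indices and separators on odd ones, and that pairing would break if the leading "" were dropped.

```ruby
# Without a capture group the separators are discarded.
plain = "ab2cd3".split(/\D+/)

# With a capture group each matched separator is inserted into the result.
captured = "ab2cd3".split(/(\D+)/)

# Fields sit at even indices, separators at odd indices.
pairs      = captured.each_slice(2).to_a
fields     = pairs.map { |pair| pair[0] }
separators = pairs.map { |pair| pair[1] }.compact

p plain       # ["", "2", "3"]
p captured    # ["", "ab", "2", "cd", "3"]
p fields      # ["", "2", "3"]
p separators  # ["ab", "cd"]
```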

Robert Klemme · May 31, 2004

But what about:

'ab'.split(/\D+/)
=> []

You would at least expect one empty string in the result since there is at
least one separator. This strikes me as odd.

robert

Simon Strandgaard · May 31, 2004

Guy Decoux very recently explained that to me.

When split is given no limit, it wipes trailing empty strings.

In your case you would have expected it to output [""], but
because that empty string is in the tail, it gets wiped.

def split(pattern, limit=0)
  ...
  if limit == 0 # no limit given: wipe trailing elements which are empty
    result.pop while !result.empty? && result.last.empty?
  end
  result
end
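
To make the wiping step concrete, here is a minimal runnable sketch. sketch_split is a hypothetical helper, not Ruby's actual implementation: it assumes a Regexp pattern, ignores capture groups and positive limits, and only distinguishes "limit is 0" (wipe the tail) from anything else (keep it).

```ruby
# Hypothetical helper for illustration only; not how String#split is written.
# Splits str on a Regexp and, when limit is 0, strips trailing empty strings.
def sketch_split(str, pattern, limit = 0)
  result = []
  pos = 0
  while (m = pattern.match(str, pos))
    break if m.end(0) == m.begin(0)    # this sketch ignores empty matches
    result << str[pos...m.begin(0)]    # text before the separator
    pos = m.end(0)                     # continue after the separator
  end
  result << str[pos..-1]               # whatever follows the last separator

  # The step discussed above: with no limit, wipe empty strings from the tail.
  result.pop while limit == 0 && !result.empty? && result.last.empty?
  result
end

p sketch_split('ab1ab', /\D+/)      # ["", "1"]
p sketch_split('ab', /\D+/)         # []
p sketch_split('ab', /\D+/, -1)     # ["", ""]
```

The leading empty string survives simply because the loop only ever pops from the end of the array.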

Robert Klemme · May 31, 2004

But I thought it would strip only trailing empty strings; what about the leading
empty string in my example? I'd expect that to be preserved.

Hm...

robert

Simon Strandgaard · May 31, 2004

Robert said:
But I thought it would strip only trailing empty strings; what about the leading
empty string in my example? I'd expect that to be preserved.

Let's take another example with both leading and trailing empty strings.

irb(main):005:0> '34ab34'.split(/\d+/, 10)
=> ["", "ab", ""]
irb(main):006:0> '34ab34'.split(/\d+/)
=> ["", "ab"]

When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.

In your case there are no non-empty strings at all, so Ruby wipes everything.

irb(main):001:0> 'ab'.split(/\D+/)
=> []
irb(main):002:0> 'ab'.split(/\D+/, 10)
=> ["", ""]

FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?
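
For reference, a short sketch of how the limit argument changes the result for the strings used in this thread. The negative limit does not come up in the posts above; it keeps every field, including trailing empty ones, without capping the count. The variable names are just for illustration.

```ruby
s = '34ab34'

no_limit   = s.split(/\d+/)       # trailing empty string is wiped
zero_limit = s.split(/\d+/, 0)    # 0 behaves like no limit
negative   = s.split(/\d+/, -1)   # nothing is wiped, no cap on fields
capped     = s.split(/\d+/, 2)    # at most 2 fields; the rest stays unsplit

p no_limit     # ["", "ab"]
p zero_limit   # ["", "ab"]
p negative     # ["", "ab", ""]
p capped       # ["", "ab34"]

# The all-separators case from earlier in the thread:
p 'ab'.split(/\D+/)       # []
p 'ab'.split(/\D+/, -1)   # ["", ""]
```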

David Alan Black · Jun 1, 2004

Hi --

Simon Strandgaard said:
FYI: I have no idea when this wiping of empty tail elements is useful.
Any ideas?

Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don't need an argument to split at all I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]

David

Florian Gross · Jun 1, 2004

Moin!

Hm, I think that it causes more trouble than it's worth. It's very easy
to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

Maybe it would be better to create a reject_at_end/at_start or something
similar?

Regards,
Florian Gross
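
A rough sketch of what such helpers might look like. The names reject_at_end and reject_at_start are hypothetical (taken from the suggestion above); nothing like them exists in Ruby's core Array API, and this version is built on Array#drop_while purely as an assumption about the intended behaviour.

```ruby
# Hypothetical helpers; not part of Ruby's core library.

# Drops leading elements for which the block returns true.
def reject_at_start(items)
  items.drop_while { |item| yield(item) }
end

# Drops trailing elements for which the block returns true.
def reject_at_end(items)
  items.reverse.drop_while { |item| yield(item) }.reverse
end

# A negative limit keeps all empty strings so there is something to trim.
parts = "!!one!!two!!".split("!", -1)

p parts                                   # ["", "", "one", "", "two", "", ""]
p reject_at_start(parts) { |s| s.empty? } # ["one", "", "two", "", ""]
p reject_at_end(parts) { |s| s.empty? }   # ["", "", "one", "", "two"]
```

Unlike reject, both helpers leave empty strings in the middle untouched.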

David A. Black · Jun 1, 2004

Hi --

Florian Gross said:

Hm, I think that it causes more trouble than it's worth.

I'm not sure what you mean; what trouble does it cause?

It's very easy to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

It's even easier than that:

"one!two!three!".split("!").grep(/\S/)

though I'm still not sure what's undesirable about having split do
different things.

Maybe it would be better to create a reject_at_end/at_start or something
similar?

That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods.)

David
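
One caveat worth sketching: the two filters quoted above are not quite equivalent. reject with empty? drops only zero-length strings, while grep(/\S/) also drops strings that consist solely of whitespace. The sample string is made up for illustration.

```ruby
parts = "one!! !three!".split("!")

without_empty = parts.reject { |item| item.empty? }
without_blank = parts.grep(/\S/)

p parts          # ["one", "", " ", "three"]
p without_empty  # ["one", " ", "three"]
p without_blank  # ["one", "three"]
```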
