[bug] String#split returns extra empty string

Simon Strandgaard

While extending my own regexp-engine with a split method,
I discovered something odd about Ruby's split.

irb(main):001:0> 'ab1ab'.split(/\D+/)
=> ["", "1"]

It's asymmetric: it has a special case for eliminating
the last empty string, but apparently not the first one.

I would have expected the above to be symmetric, and to output:
=> ["1"]
 
Simon Strandgaard

[10 minutes of experimenting later]
I wasn't aware that Ruby inserts subcaptures this way.

irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
=> ["", "ab", "2cd3"]

Because of subcapture insertion, it makes sense to keep the
first empty string.

I withdraw this bug-report.
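A small illustration of why the leading empty string is worth keeping (plain Ruby, core String#split only): when the pattern contains a capture group, the captured separators are interleaved with the fields, so in this sketch the fields land at even indices and the separators at odd ones. Dropping the leading "" would shift that alternation. The variable names are just for illustration.

```ruby
# With a capture group, split interleaves fields and captured separators.
parts = "ab2cd3".split(/(\D+)/)
p parts # prints ["", "ab", "2", "cd", "3"]

# Because the leading "" is kept, fields sit at even indices and the
# captured separators at odd indices, so pairing them up is trivial.
fields     = parts.each_slice(2).map(&:first)
separators = parts.each_slice(2).map { |pair| pair[1] }.compact
p fields     # prints ["", "2", "3"]
p separators # prints ["ab", "cd"]
```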
 
Robert Klemme

Simon Strandgaard said:
I withdraw this bug-report.

But what about:
'ab'.split(/\D+/)
=> []

You would at least expect one empty string in the result since there is at
least one separator. This strikes me as odd.

robert
 
Simon Strandgaard

Robert Klemme said:
But what about:
'ab'.split(/\D+/)
=> []

You would at least expect one empty string in the result since there is at
least one separator. This strikes me as odd.

Guy Decoux very recently explained that to me.

When split has no limit, it wipes trailing empty strings.

In your case you would have expected it to output [""], but
because it's an empty string at the tail, it gets wiped.

def split(pattern, limit = 0)
  # ... (the splitting itself, which builds `result`, is elided)
  if limit == 0 # wipe trailing elements which are empty
    result.pop while !result.empty? && result.last.empty?
  end
  result
end
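The wiping rule described above can be checked against the built-in behaviour. This is a sketch, not Ruby's actual implementation: the helper name `split_dropping_trailing_empties` is hypothetical, and it relies on the documented behaviour that a negative limit suppresses no fields at all, so popping trailing empties afterwards should reproduce the no-limit result.

```ruby
# Hypothetical helper: mimic split with the limit omitted (zero) by
# splitting with a negative limit, which keeps every field, and then
# dropping empty strings from the end only.
def split_dropping_trailing_empties(str, pattern)
  result = str.split(pattern, -1) # negative limit: nothing is removed
  result.pop while !result.empty? && result.last.empty?
  result
end

p "ab".split(/\D+/, -1)                            # prints ["", ""]
p split_dropping_trailing_empties("ab", /\D+/)     # prints []
p split_dropping_trailing_empties("ab1ab", /\D+/)  # prints ["", "1"]
p split_dropping_trailing_empties("34ab34", /\d+/) # prints ["", "ab"]
```

Only the tail is touched, which is why the leading "" in ["", "1"] survives.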
 
Robert Klemme

Simon Strandgaard said:
In your case you would have expected it to output [""], but
because it's an empty string at the tail, it gets wiped.

def split(pattern, limit=0)
...
unless limit # lets wipe tailing elements which are empty
result.pop while result.size > 0 and result.last.empty?
end
result
end

But I thought it would strip trailing empty strings. What about the leading
empty string in my example? I'd expect that to be preserved.

Hm...

robert
 
Simon Strandgaard

Robert said:
But I thought it would strip trailing empty strings. What about the leading
empty string in my example? I'd expect that to be preserved.

Let's take another example, with both leading and trailing empty strings.

irb(main):005:0> '34ab34'.split(/\d+/, 10)
=> ["", "ab", ""]
irb(main):006:0> '34ab34'.split(/\d+/)
=> ["", "ab"]


When no limit is specified, Ruby wipes the trailing empty strings
until it reaches a non-empty string.


In your case there are no non-empty strings, so Ruby wipes everything.

irb(main):001:0> 'ab'.split(/\D+/)
=> []
irb(main):002:0> 'ab'.split(/\D+/, 10)
=> ["", ""]
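To compare the limit modes side by side on the same string, a quick sketch using only core String#split: an omitted (zero) limit drops trailing empties, a negative limit keeps everything, and a positive limit caps the number of fields, leaving the unsplit remainder in the last one.

```ruby
s = "34ab34"
p s.split(/\d+/)     # limit omitted (0): trailing "" removed, prints ["", "ab"]
p s.split(/\d+/, -1) # negative: every field kept, prints ["", "ab", ""]
p s.split(/\d+/, 2)  # positive: at most 2 fields, prints ["", "ab34"]
p s.split(/\d+/, 1)  # limit 1: whole string as one field, prints ["34ab34"]
```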


FYI: I have no idea when wiping the empty tail elements is useful.
Any ideas?
 
David Alan Black

Hi --

Simon Strandgaard said:
FYI: I have no idea when wiping the empty tail elements is useful.
Any ideas?

Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don't need an argument to split at all, I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]


David
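One more situation where dropping trailing empties plausibly helps, assuming newline-terminated text such as the contents of a file: the final terminator would otherwise leave a spurious "" at the end. As the last line of this sketch shows, only trailing empties go; an empty interior field is kept.

```ruby
text = "alpha\nbeta\ngamma\n"
p text.split("\n")     # prints ["alpha", "beta", "gamma"]
p text.split("\n", -1) # prints ["alpha", "beta", "gamma", ""]

# Interior empty fields survive; only the trailing ones are removed.
p "a\n\nb\n".split("\n") # prints ["a", "", "b"]
```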
 
Florian Gross

Moin!

David said:
Maybe a case like:

irb(main):006:0> "one two three ".split(" ")
=> ["one", "two", "three"]

(though there you don't need an argument to split at all I guess) or
something like:

irb(main):016:0> "one!two!three!".split("!")
=> ["one", "two", "three"]

Hm, I think that it causes more trouble than it's worth. It's very easy
to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

Maybe it would be better to create a reject_at_end/at_start or something
similar?

Regards,
Florian Gross
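In the spirit of the reject_at_end/at_start idea, here is a hypothetical pair of helpers built on Array#drop_while. The names `reject_empty_at_start` and `reject_empty_at_end` are made up for this sketch and are not part of Ruby. Unlike a plain reject, they leave interior empty strings alone.

```ruby
# Hypothetical helpers: drop empty strings at one end only.
def reject_empty_at_start(items)
  items.drop_while(&:empty?)
end

def reject_empty_at_end(items)
  items.reverse.drop_while(&:empty?).reverse
end

parts = "!a!!b!".split("!", -1)
p parts                                             # prints ["", "a", "", "b", ""]
p reject_empty_at_end(reject_empty_at_start(parts)) # prints ["a", "", "b"]
p parts.reject { |item| item.empty? }               # prints ["a", "b"]
```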
 
David A. Black

Hi --

Florian Gross said:
Hm, I think that it causes more trouble than it's worth.

I'm not sure what you mean; what trouble does it cause?
It's very easy to remove empty elements anyway:

"one!two!three!".split("!").reject { |item| item.empty? }

It's even easier than that :)

"one!two!three!".split("!").grep(/\S/)

though I'm still not sure what's undesirable about having split do
different things.
Maybe it would be better to create a reject_at_end/at_start or something
similar?

That seems like an awfully specific case for a whole separate method.
(I admit, though, that I'm somewhat conservative about proliferation
of methods :)


David
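The two one-liners above are close but not identical, as this sketch suggests: `reject { |item| item.empty? }` removes only truly empty strings, while `grep(/\S/)` also drops elements made up entirely of whitespace.

```ruby
parts = "a! !b".split("!")
p parts                               # prints ["a", " ", "b"]
p parts.reject { |item| item.empty? } # prints ["a", " ", "b"]
p parts.grep(/\S/)                    # prints ["a", "b"]
```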
 
