[bug] String#split returns extra empty string

Discussion in 'Ruby' started by Simon Strandgaard, May 31, 2004.

  1. While extending my own regexp-engine with a split method,
    I discovered something odd about Ruby's split.

    irb(main):001:0> 'ab1ab'.split(/\D+/)
    => ["", "1"]

    It's asymmetric: it has a special case for eliminating
    the last empty string, but apparently not the first empty string.

    I would have expected the above to be symmetric, and to output:
    => ["1"]

    --
    Simon Strandgaard
     
    Simon Strandgaard, May 31, 2004
    #1

  2. Simon Strandgaard wrote:
    > While extending my own regexp-engine with a split method,
    > I discovered something odd about Ruby's split.
    >
    > irb(main):001:0> 'ab1ab'.split(/\D+/)
    > => ["", "1"]
    >
    > It's asymmetric: it has a special case for eliminating
    > the last empty string, but apparently not the first empty string.
    >
    > I would have expected the above to be symmetric, and to output:
    > => ["1"]
    >


    [10 minutes of experimenting later]
    I wasn't aware that Ruby inserts subcaptures this way.

    irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
    => ["", "ab", "2cd3"]

    Because of subcapture insertion, it makes sense to keep the
    first empty string.

    I withdraw this bug-report.
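
    A minimal sketch of the subcapture insertion described above. The helper
    names are hypothetical and exist only for this illustration: without a
    group, split returns only the pieces between separators; with a group,
    each matched separator is inserted into the result as well.

```ruby
# Hypothetical helper names, for illustration only.

# Splits on runs of non-digits; the separators themselves are discarded.
def pieces_only(str)
  str.split(/\D+/)
end

# Same pattern wrapped in a capture group; each matched separator is kept.
def pieces_and_separators(str)
  str.split(/(\D+)/)
end

p pieces_only("ab2cd3")            # => ["", "2", "3"]
p pieces_and_separators("ab2cd3")  # => ["", "ab", "2", "cd", "3"]
```

    With the separators in the result, the leading "" marks that the string
    began with a separator, so dropping it would lose positional information.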

    --
    Simon Strandgaard
     
    Simon Strandgaard, May 31, 2004
    #2

  3. "Simon Strandgaard" <> wrote in message
    news:...
    >
    > [10 minutes of experimenting later]
    > I wasn't aware that Ruby inserts subcaptures this way.
    >
    > irb(main):001:0> "ab2cd3".split(/(\D+)/, 2)
    > => ["", "ab", "2cd3"]
    >
    > Because of subcapture insertion, it makes sense to keep the
    > first empty string.
    >
    > I withdraw this bug-report.


    But what about:

    >> 'ab'.split(/\D+/)

    => []

    You would at least expect one empty string in the result since there is at
    least one separator. This strikes me as odd.

    robert
     
    Robert Klemme, May 31, 2004
    #3
  4. "Robert Klemme" <> wrote:
    > But what about:
    >
    > >> 'ab'.split(/\D+/)

    > => []
    >
    > You would at least expect one empty string in the result since there is at
    > least one separator. This strikes me as odd.
    >


    Guy Decoux very recently explained that to me.

    When split is given no limit, it wipes trailing empty strings.

    In your case you would have expected it to output [""], but
    because it's an empty string in the tail, it gets wiped.

    def split(pattern, limit=0)
      ...
      # 0 is truthy in Ruby, so test for it explicitly.
      if limit == 0 # no limit given: wipe trailing elements which are empty
        result.pop while result.size > 0 and result.last.empty?
      end
      result
    end
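
    The rule can be modeled in a few lines on top of the built-in method.
    This is a sketch, not Ruby's actual implementation: split_sketch is a
    hypothetical name, and it relies on a negative limit making String#split
    keep every field, including trailing empty ones.

```ruby
# Hypothetical model of the trimming rule; not Ruby's real implementation.
def split_sketch(str, pattern, limit = 0)
  # A negative limit asks String#split to keep all fields, empties included.
  result = str.split(pattern, limit.zero? ? -1 : limit)
  if limit.zero?
    # No limit given: drop empty strings from the tail only.
    result.pop while !result.empty? && result.last.empty?
  end
  result
end

p split_sketch("ab", /\D+/)          # => []
p split_sketch("34ab34", /\d+/)      # => ["", "ab"]
p split_sketch("34ab34", /\d+/, 10)  # => ["", "ab", ""]
```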

    --
    Simon Strandgaard
     
    Simon Strandgaard, May 31, 2004
    #4
  5. "Simon Strandgaard" <> wrote in message
    news:...
    >
    > Guy Decoux very recently explained that to me.
    >
    > When split is given no limit, it wipes trailing empty strings.
    >
    > In your case you would have expected it to output [""], but
    > because it's an empty string in the tail, it gets wiped.
    >
    > def split(pattern, limit=0)
    >   ...
    >   if limit == 0 # no limit given: wipe trailing elements which are empty
    >     result.pop while result.size > 0 and result.last.empty?
    >   end
    >   result
    > end


    But I thought it would strip only trailing empty strings - what about the
    leading empty string in my example? I'd expect that to be preserved.

    Hm...

    robert
     
    Robert Klemme, May 31, 2004
    #5
  6. Robert Klemme wrote:
    > But I thought it would strip only trailing empty strings - what about the
    > leading empty string in my example? I'd expect that to be preserved.
    >


    Let's take another example, with both leading and trailing empty strings.

    irb(main):005:0> '34ab34'.split(/\d+/, 10)
    => ["", "ab", ""]
    irb(main):006:0> '34ab34'.split(/\d+/)
    => ["", "ab"]


    When no limit is specified, Ruby wipes the trailing empty strings
    until it reaches a non-empty string.


    In your case there are zero non-empty strings, so Ruby wipes everything.

    irb(main):001:0> 'ab'.split(/\D+/)
    => []
    irb(main):002:0> 'ab'.split(/\D+/, 10)
    => ["", ""]
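
    One practical consequence of the wiping shown above: without a limit,
    split followed by join does not always reproduce the original string. A
    small sketch, assuming a non-empty literal separator other than " ";
    round_trips? is a hypothetical helper name.

```ruby
# Hypothetical helper: true when splitting and re-joining reproduces the input.
# Assumes sep is a non-empty literal String other than " ".
def round_trips?(str, sep, limit = 0)
  str.split(sep, limit).join(sep) == str
end

p round_trips?("one!two!three!", "!")      # => false (trailing "" was wiped)
p round_trips?("one!two!three!", "!", -1)  # => true  (negative limit keeps it)
```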


    FYI: I have no idea when this wiping of empty tail elements is useful.
    Any ideas?

    --
    Simon Strandgaard
     
    Simon Strandgaard, May 31, 2004
    #6
  7. Hi --

    Simon Strandgaard <> writes:

    > FYI: I have no idea when this wiping of empty tail elements is useful.
    > Any ideas?


    Maybe a case like:

    irb(main):006:0> "one two three ".split(" ")
    => ["one", "two", "three"]

    (though there you don't need an argument to split at all I guess) or
    something like:

    irb(main):016:0> "one!two!three!".split("!")
    => ["one", "two", "three"]


    David

    --
    David A. Black
     
    David Alan Black, Jun 1, 2004
    #7
  8. David Alan Black wrote:
    > Hi --


    Moin!

    >>FYI: I have no idea when this wiping of empty tail elements is useful.
    >>Any ideas?

    >
    > Maybe a case like:
    >
    > irb(main):006:0> "one two three ".split(" ")
    > => ["one", "two", "three"]
    >
    > (though there you don't need an argument to split at all I guess) or
    > something like:
    >
    > irb(main):016:0> "one!two!three!".split("!")
    > => ["one", "two", "three"]


    Hm, I think that it causes more trouble than it's worth. It's very easy
    to remove empty elements anyway:

    "one!two!three!".split("!").reject { |item| item.empty? }

    Maybe it would be better to create a reject_at_end/at_start or something
    similar?
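
    A rough sketch of what such methods might look like. The names
    reject_at_end and reject_at_start are taken from the suggestion above;
    they are hypothetical and not part of Ruby's core library. A negative
    limit is passed to split so that there is something left to trim.

```ruby
# Hypothetical helpers; not part of Ruby's core library.

# Returns a copy of items without the trailing run of elements
# for which the block returns true.
def reject_at_end(items)
  result = items.dup
  result.pop while !result.empty? && yield(result.last)
  result
end

# Returns a copy of items without the leading run of elements
# for which the block returns true.
def reject_at_start(items)
  items.drop_while { |item| yield(item) }
end

fields = "!one!!two!".split("!", -1)  # ["", "one", "", "two", ""]
p reject_at_end(fields, &:empty?)     # => ["", "one", "", "two"]
p reject_at_start(fields, &:empty?)   # => ["one", "", "two", ""]
```

    Unlike a plain reject, both helpers leave empty strings in the middle of
    the array alone.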

    Regards,
    Florian Gross
     
    Florian Gross, Jun 1, 2004
    #8
  9. Hi --

    On Tue, 1 Jun 2004, Florian Gross wrote:

    > Hm, I think that it causes more trouble than it's worth.


    I'm not sure what you mean; what trouble does it cause?

    > It's very easy to remove empty elements anyway:
    >
    > "one!two!three!".split("!").reject { |item| item.empty? }


    It's even easier than that :)

    "one!two!three!".split("!").grep(/\S/)

    though I'm still not sure what's undesirable about having split do
    different things.
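
    A small sketch comparing the two filters. They agree only when no field
    consists solely of whitespace, since /\S/ requires at least one
    non-whitespace character; drop_empty and drop_blank are hypothetical
    helper names.

```ruby
# Hypothetical helper names, for illustration only.

# Removes only zero-length strings.
def drop_empty(fields)
  fields.reject(&:empty?)
end

# Removes every string without a non-whitespace character,
# which includes whitespace-only strings as well as empty ones.
def drop_blank(fields)
  fields.grep(/\S/)
end

fields = "one! !three!".split("!")  # ["one", " ", "three"]
p drop_empty(fields)                # => ["one", " ", "three"]
p drop_blank(fields)                # => ["one", "three"]
```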

    > Maybe it would be better to create a reject_at_end/at_start or something
    > similar?


    That seems like an awfully specific case for a whole separate method.
    (I admit, though, that I'm somewhat conservative about proliferation
    of methods :)


    David

    --
    David A. Black
     
    David A. Black, Jun 1, 2004
    #9