String#split(' ') and whitespace (perl user's surprise)

Discussion in 'Ruby' started by Mike Stok, Jun 26, 2003.

  1. Mike Stok

    Mike Stok Guest

    I have to confess that I use a lot of Perl, and some of its idioms are
    deeply embedded in my mind.

    In the course of parsing some data in Ruby I used a fragment of code
    like

    rules.each_line do |rule|
    sku, price, special = rule.chomp.split(' ', 3)
    # [...]
    end

    where rules was composed of lines of the form

    A 50 3 for 130

    Coming from a Perl background, I would have expected 'A', '50' and
    '3 for 130' to have been assigned to sku, price and special. As it
    turned out, special had leading whitespace.
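
    A minimal, self-contained sketch of the loop above. The sample rules
    (including the runs of spaces used as padding) are hypothetical, and the
    lstrip call is an assumed defensive step rather than part of the original
    fragment. It gives the same fields whether or not split(' ', 3) leaves
    whitespace at the front of the last one.

    ```ruby
    # Hypothetical sample data; columns are padded with runs of spaces.
    rules = "A  50   3 for 130\nB  30   2 for 45\n"

    parsed = rules.each_line.map do |rule|
      sku, price, special = rule.chomp.split(' ', 3)
      # lstrip is a no-op where split already drops the whole whitespace run,
      # and removes the leftover padding where it does not.
      [sku, price, special.to_s.lstrip]
    end

    parsed.each { |fields| p fields }
    ```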

    Maybe this is the difference Hal alludes to in From Perl to Ruby in his
    book The Ruby Way: "Also, note that split also behaves slightly
    differently."

    Is this intentional behaviour? I think that the Perl behaviour is more
    useful (that is trimming the leading whitespace off the limit-th element
    returned.)

    In the perl debugger:

    DB<1> $s = ' A 50 3 for 130'

    DB<2> @l = split(' ', $s, 3)

    DB<3> x @l
    0 'A'
    1 50
    2 '3 for 130'

    In irb:

    >> ' A 50 3 for 130'.split(' ', 3)

    => ["A", "50", " 3 for 130"]

    What I'd like Ruby to do

    >> ' A 50 3 for 130'.split(' ', 3)

    => ["A", "50", "3 for 130"]

    Mike
    --
    | The "`Stok' disclaimers" apply.
    http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
    | Fingerprint 0570 71CD 6790 7C28 3D60
    http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA
     
    Mike Stok, Jun 26, 2003
    #1

  2. ts

    ts Guest

    >>>>> "W" == Wesley J Landaker <> writes:

    W> This seems like a ruby bug: limit seems to be counting whitespace
    W> characters and not whitespace "runs" when ' ' is given as the pattern.

    W> But maybe it's a documentation bug. Who knows?

    Well, to see what it does:

    svg% ruby -e 'p "a b".split(" ", 2); p "a  b".split(" ", 2);'
    ["a", "b"]
    ["a", " b"]
    svg%


    It removes just the first space when it has a limit parameter.


    Guy Decoux
     
    ts, Jun 26, 2003
    #2

  3. Mike Stok

    Mike Stok Guest

    In article <>,
    Wesley J Landaker <> wrote:
    >On Thursday 26 June 2003 6:14 am, Mike Stok wrote:
    >
    >> In irb:
    >> >> ' A 50 3 for 130'.split(' ', 3)
    >>
    >> => ["A", "50", " 3 for 130"]
    >>
    >> What I'd like Ruby to do
    >>
    >> >> ' A 50 3 for 130'.split(' ', 3)
    >>
    >> => ["A", "50", "3 for 130"]
    >
    >I should have mentioned that as a workaround, you can do this:
    >
    >' A 50 3 for 130'.split(' ', 3).map { |x| x.strip }
    >
    >I agree you shouldn't have to do that, but it will get you going with
    >minimal changes.


    Thanks. A minor nit is that this will destroy trailing spaces on the
    third field (not relevant in this case, and maybe useful in many
    cases :)
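
    As a hedged illustration of that nit, the sketch below compares the
    strip workaround with one possible alternative: remove the leading
    whitespace once, then split on a whitespace-run regexp, so each run
    counts as a single separator and the last field keeps its trailing
    spaces. The input string and variable names are made up for the
    example.

    ```ruby
    line = " A 50  3 for 130  "   # hypothetical input with trailing padding

    # Stripping every field also removes the trailing spaces of the last one.
    stripped = line.split(' ', 3).map { |x| x.strip }

    # Splitting on /\s+/ consumes each whole whitespace run as one separator,
    # so only the leading whitespace of the line has to be removed first.
    kept = line.lstrip.split(/\s+/, 3)

    p stripped
    p kept
    ```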

    I think that my real point was that if there's a special case for
    splitting a string on ' ' then it should behave the same way as Perl's
    special case.

    Mike

    --
    | The "`Stok' disclaimers" apply.
    http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
    | Fingerprint 0570 71CD 6790 7C28 3D60
    http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA
     
    Mike Stok, Jun 26, 2003
    #3
  4. Mike Stok

    Guest

    Hi,

    At Thu, 26 Jun 2003 22:45:37 +0900,
    Michael Campbell wrote:
    > Does that sound like a bug to you?


    Yes, to me.


    Index: string.c
    ===================================================================
    RCS file: /cvs/ruby/src/ruby/string.c,v
    retrieving revision 1.161
    diff -u -2 -p -r1.161 string.c
    --- string.c 26 Jun 2003 18:24:58 -0000 1.161
    +++ string.c 26 Jun 2003 18:26:27 -0000
    @@ -2582,4 +2582,5 @@ rb_str_split_m(argc, argv, str)
    end = beg+1;
    skip = 0;
    + if (!NIL_P(limit) && lim <= i) break;
    }
    }
    @@ -2589,5 +2590,5 @@ rb_str_split_m(argc, argv, str)
    skip = 1;
    beg = end + 1;
    - if (!NIL_P(limit) && lim <= ++i) break;
    + if (!NIL_P(limit)) ++i;
    }
    else {
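
    One way to read the patch is through a simplified Ruby model of the
    whitespace branch. This is a sketch reconstructed from the diff, not the
    actual C code: the method name awk_split, the fixed: flag and the
    variable names are all hypothetical, and limit is assumed to be a
    positive integer. The only difference between the two modes is where
    the limit check happens: right after the first whitespace character
    (before the patch) or once the next field has actually started (after
    it).

    ```ruby
    # Hypothetical model of splitting on ' ' with a limit.
    def awk_split(str, limit, fixed:)
      fields = []
      beg = 0       # start index of the current field
      skip = true   # true while inside a whitespace run
      count = 1     # number of the field currently being collected
      pos = 0
      while pos < str.length
        if skip
          if str[pos] =~ /\s/
            beg = pos + 1
          else
            skip = false
            # Patched check: stop only once the final field has started.
            break if fixed && limit <= count
          end
        elsif str[pos] =~ /\s/
          fields << str[beg...pos]
          skip = true
          beg = pos + 1
          count += 1
          # Original check: stop right after the first whitespace character.
          break if !fixed && limit <= count
        end
        pos += 1
      end
      fields << str[beg..-1] if beg < str.length
      fields
    end

    p awk_split("a  b", 2, fixed: false)
    p awk_split("a  b", 2, fixed: true)
    ```

    In this model the unpatched mode leaves one space at the front of the
    last field whenever the separator is longer than one character, which
    matches the output reported earlier in the thread.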


    --
    Nobu Nakada
     
    , Jun 26, 2003
    #4
  5. In message "Re: String#split(' ') and whitespace (perl user's surprise)"
    on 03/06/27, <> writes:

    |At Thu, 26 Jun 2003 22:45:37 +0900,
    |Michael Campbell wrote:
    |> Does that sound like a bug to you?
    |
    |Yes, to me.

    To me, as well. Commit the fix, please.

    matz.
     
    Yukihiro Matsumoto, Jun 27, 2003
    #5
