String#split(' ') and whitespace (perl user's surprise)

Discussion in 'Ruby' started by Mike Stok, Jun 26, 2003.

  1. Mike Stok

    Mike Stok Guest

    I have to confess that I use a lot of Perl, and some of its idioms are
    deeply embedded in my mind.

    In the course of parsing some data in Ruby I used a fragment of code
    like

    rules.each_line do |rule|
      sku, price, special = rule.chomp.split(' ', 3)
      # [...]
    end

    where rules was composed of lines of the form

    A 50 3 for 130

    Coming from a Perl background, I would have expected 'A', '50' and '3 for 130'
    to have been assigned to sku, price and special. As it turned out
    special had leading whitespace.

    Maybe this is the difference Hal alludes to in From Perl to Ruby in his
    book The Ruby Way: "Also, note that split also behaves slightly
    differently."

    Is this intentional behaviour? I think that the Perl behaviour is more
    useful (that is trimming the leading whitespace off the limit-th element
    returned.)

    In the perl debugger:

    DB<1> $s = ' A 50 3 for 130'

    DB<2> @l = split(' ', $s, 3)

    DB<3> x @l
    0 'A'
    1 50
    2 '3 for 130'

    In irb:
    => ["A", "50", " 3 for 130"]

    What I'd like Ruby to do
    => ["A", "50", "3 for 130"]

    Mike
     
    Mike Stok, Jun 26, 2003
    #1
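
    For readers who hit the same surprise, here is a minimal sketch of two
    ways to get the Perl-style result. It assumes the sample line really
    contains runs of several spaces between fields (the forum rendering
    collapses them). split_fields is a hypothetical helper name, not part
    of Ruby. The first call relies on an interpreter that already includes
    the fix discussed in this thread; the helper splits on a regex instead,
    so it does not depend on the ' ' special case at all.

```ruby
# Sample rule line; the runs of spaces between fields are an assumption.
line = "A  50  3 for 130\n"

# On a Ruby that includes the fix, the ' ' special case skips the whole
# run of whitespace before the last (limit-th) field.
sku, price, special = line.chomp.split(' ', 3)

# Hypothetical helper that avoids the special case entirely.
# lstrip prevents an empty leading field; /\s+/ consumes each whole run,
# so the last field never starts with whitespace. Trailing whitespace on
# the last field is left untouched.
def split_fields(line, limit)
  line.chomp.lstrip.split(/\s+/, limit)
end

p [sku, price, special]
p split_fields(line, 3)
```

    Using lstrip rather than strip is deliberate: it keeps any trailing
    spaces on the last field instead of discarding them.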

  2. Mike Stok

    ts Guest

    W> This seems like a ruby bug: limit seems to be counting whitespace
    W> characters and not whitespace "runs" when ' ' is given as the pattern.

    W> But maybe it's a documentation bug. Who knows?

    Well, to see what it does:

    svg% ruby -e 'p "a b".split(" ", 2); p "a  b".split(" ", 2);'
    ["a", "b"]
    ["a", " b"]
    svg%


    It removes just the first space when it is given a limit parameter.


    Guy Decoux
     
    ts, Jun 26, 2003
    #2

  3. Mike Stok

    Mike Stok Guest

    Thanks. A minor nit is that this will destroy trailing spaces on the
    third field (not relevant in this case, and maybe useful in many cases.
    :)

    I think that my real point was that if there's a special case for
    splitting a string on ' ' then it should behave the same way as Perl's
    special case.

    Mike
     
    Mike Stok, Jun 26, 2003
    #3
  4. Mike Stok

    nobu.nokada Guest

    Hi,

    At Thu, 26 Jun 2003 22:45:37 +0900,
    Michael Campbell wrote:
    > Does that sound like a bug to you?

    Yes, to me.


    Index: string.c
    ===================================================================
    RCS file: /cvs/ruby/src/ruby/string.c,v
    retrieving revision 1.161
    diff -u -2 -p -r1.161 string.c
    --- string.c 26 Jun 2003 18:24:58 -0000 1.161
    +++ string.c 26 Jun 2003 18:26:27 -0000
    @@ -2582,4 +2582,5 @@ rb_str_split_m(argc, argv, str)
             end = beg+1;
             skip = 0;
    +        if (!NIL_P(limit) && lim <= i) break;
         }
     }
    @@ -2589,5 +2590,5 @@ rb_str_split_m(argc, argv, str)
             skip = 1;
             beg = end + 1;
    -        if (!NIL_P(limit) && lim <= ++i) break;
    +        if (!NIL_P(limit)) ++i;
         }
         else {
     
    nobu.nokada, Jun 26, 2003
    #4
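
    The effect of the patch can be modelled in plain Ruby. The sketch
    below is a simplified, hypothetical port of the whitespace-scanning
    loop, not the actual C code: awk_split_model and its patched: flag are
    invented for illustration, it assumes a positive limit, and it does
    not reproduce every edge case of String#split (trailing empty fields,
    for example). It only shows where the limit check sits before and
    after the change.

```ruby
# Simplified model of whitespace splitting with a limit (assumption: limit >= 1).
# patched: false checks the limit right after a separator (old behaviour);
# patched: true checks it when the next field actually starts (new behaviour).
def awk_split_model(str, limit, patched: true)
  fields = []
  beg = 0          # index where the current field starts
  count = 1        # number of the field currently being collected
  skipping = true  # true while consuming a run of whitespace

  str.each_char.with_index do |ch, i|
    space = ch =~ /\s/
    if skipping
      if space
        beg = i + 1                  # still inside the run, move the start
      else
        skipping = false
        break if patched && limit <= count
      end
    elsif space
      fields << str[beg...i]         # a field just ended
      skipping = true
      beg = i + 1
      count += 1
      break if !patched && limit <= count
    end
  end

  # Whatever remains from beg onwards becomes the last field.
  fields << str[beg..-1] if beg < str.length
  fields
end

p awk_split_model("a  b", 2, patched: false)  # => ["a", " b"]
p awk_split_model("a  b", 2, patched: true)   # => ["a", "b"]
```

    With the check moved, the loop keeps skipping whitespace until the
    last field begins, which is why the leading spaces disappear.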
  5. In message "Re: String#split(' ') and whitespace (perl user's surprise)"

    |At Thu, 26 Jun 2003 22:45:37 +0900,
    |Michael Campbell wrote:
    |> Does that sound like a bug to you?
    |
    |Yes, to me.

    To me, as well. Commit the fix, please.

    matz.
     
    Yukihiro Matsumoto, Jun 27, 2003
    #5