String#split(/\s+/) vs. String#split(/(\s+)/)

Discussion in 'Ruby' started by Sam Kong, Aug 12, 2006.

  1. Sam Kong

    Sam Kong Guest

    Hello Rubyists,

    I'm reading Ruby Cookbook.
    The first chapter is about String.
    One of the examples shows the difference between String#split(/\s+/) and
    String#split(/(\s+)/) without much explanation.
    I understand what sub-grouping is in regex.
    But I don't understand what role that plays in String#split.

    s = "one two three"

    p s.split(/\s+/) #=> ["one", "two", "three"]
    p s.split(/(\s+)/) #=> ["one", " ", "two", " ", "three"]


    Could anybody explain it, please?

    Thanks,
    Sam

    --
    Posted via http://www.ruby-forum.com/.
    Sam Kong, Aug 12, 2006
    #1

  2. Sam Kong

    Guest

    Hi --


    When you use (), you get the delimiter (the thing you're splitting on)
    back in the array, along with the items between the delimiters. An
    example without spaces might make it clearer:

    "aaaXXXbbbXXXccc".split(/XXX/)   #=> ["aaa", "bbb", "ccc"]
    "aaaXXXbbbXXXccc".split(/(XXX)/) #=> ["aaa", "XXX", "bbb", "XXX", "ccc"]

    In your example, the delimiter is \s+, which is of variable length;
    that's why each delimiter shows up in the final array exactly as it
    was matched, whether that is one whitespace character or several.
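
    A minimal sketch of the variable-length case, assuming a made-up
    sample string (not the one from the thread) with one space and then
    three spaces between the words. The captured delimiters come back at
    whatever length they were matched:

```ruby
# Hypothetical sample string: one space, then three spaces.
s = "one two   three"

without_group = s.split(/\s+/)    # delimiters are discarded
with_group    = s.split(/(\s+)/)  # captured delimiters are kept

p without_group #=> ["one", "two", "three"]
p with_group    #=> ["one", " ", "two", "   ", "three"]

# Nothing is thrown away, so joining the pieces rebuilds the input.
p with_group.join == s #=> true
```

    This is handy when the text has to be put back together with its
    original spacing after the words are processed.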


    David

    --
    http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
    ----> SEE SPECIAL DEAL FOR RUBY/RAILS USERS GROUPS! <-----
    http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
    http://www.manning.com/black => book, Ruby for Rails
    http://www.rubycentral.org => Ruby Central, Inc.
    , Aug 12, 2006
    #2

  3. Why does using the parentheses cause the separator string/character
    to be placed into the resulting array?

    -ken

    Ken & Deb Allen, Aug 12, 2006
    #3
  4. Sam Kong wrote:
    > Hello Rubyists,
    >
    > I'm reading Ruby Cookbook.
    > The first chapter is about String.
    > One of the examples shows the difference between String#split(/\s+/) and
    > String#split(/(\s+)/) without much explanation.
    > I understand what sub-grouping is in regex.
    > But I don't understand what role that plays in String#split.
    >
    > s = "one two three"
    >
    > p s.split(/\s+/) #=> ["one", "two", "three"]
    > p s.split(/(\s+)/) #=> ["one", " ", "two", " ", "three"]


    # Try this one
    p s.split /((((\s+))))/



    --
    Posted via http://www.ruby-forum.com/.
    Eero Saynatkari, Aug 12, 2006
    #4
  5. Sam Kong

    Jan Svitok Guest

    On 8/12/06, Eero Saynatkari <> wrote:
    > # Try this one
    > p s.split /((((\s+))))/


    Seems like all groups in the separator regex are output to the result array.
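
    That observation can be checked with a short sketch. The (?:...)
    non-capturing group and the comma example are not from the thread;
    they are added here only for contrast, on the assumption that
    standard Ruby regexp syntax is available:

```ruby
s = "one two three"

# Each capture group contributes one entry per match, so two nested
# groups around \s+ put every delimiter into the result twice.
nested = s.split(/((\s+))/)
p nested #=> ["one", " ", " ", "two", " ", " ", "three"]

# A non-capturing group (?:...) still groups but adds nothing.
non_capturing = s.split(/(?:\s+)/)
p non_capturing #=> ["one", "two", "three"]

# Only the captured part of the delimiter is kept, not the whole match.
partial = "a, b, c".split(/(,)\s*/)
p partial #=> ["a", ",", "b", ",", "c"]
```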

    I wonder where it is documented, other than in the source itself?
    (string.c, rb_str_split_m())
    Jan Svitok, Aug 12, 2006
    #5
  6. On 8/12/06, Jan Svitok <> wrote:
    > Seems like all groups in the separator regex are output to the result array.
    >
    > I wonder where it is documented, other than in the source itself?
    > (string.c, rb_str_split_m())


    Well, the Pickaxe (2nd ed.) says so:

    "If pattern is a Regexp, str is divided where the pattern matches.
    Whenever the pattern matches a zero-length string, str is split into
    individual characters. If pattern includes groups, these groups will
    be included in the returned values."

    Ruby-doc.org doesn't have that last sentence in either the 1.8 or
    the 1.9 documentation.
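
    The two behaviours described in the quoted passage can be tried out
    with a short sketch. The sample strings are made up for illustration
    and do not come from the thread or the book:

```ruby
# An empty pattern matches the zero-length string between characters,
# so the string is split into individual characters.
chars = "abc".split(//)
p chars #=> ["a", "b", "c"]

# A group in the pattern is included in the returned values.
parts = "2006-08-12".split(/(-)/)
p parts #=> ["2006", "-", "08", "-", "12"]
```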

    --
    Rick DeNatale
    http://talklikeaduck.denhaven2.com
    Rick DeNatale, Aug 12, 2006
    #6