Break apart a string by kind of characters

Daniel Waite

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

I did a search on the forums and came up with this regex:

'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }

Which is pretty close, but it groups on a change of character, so I
would get:

[ 'a', '1', '000', 'aa' ]

I tried playing around with the regex (e.g. swapping the . for (\d|\w))
but to no avail.

Any ideas?
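
For what it's worth, here is a small sketch of why that pattern splits '1000' in two. The inner group `(.)` captures one character and `\2*` only matches further copies of that exact character, so a run ends whenever the character changes, not when its kind changes. The variable names are just for illustration.

```ruby
s = 'a1000aa'

# scan returns one [outer, inner] pair per match: the outer group is the
# whole run, the inner group is the single character being repeated.
pairs = s.scan(/((.)\2*)/)
runs  = pairs.map { |outer, _inner| outer }

p pairs  # [["a", "a"], ["1", "1"], ["000", "0"], ["aa", "a"]]
p runs   # ["a", "1", "000", "aa"]
```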
 
Daniel Waite

I figured out one possible solution. Granted, it's not as elegant as a
single regex, but it works and I understand it. Here goes...

First, I opened up class String to add some convenience and make things
a bit shorter:

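class String

  # True if the string starts with an ASCII letter.
  # (String#first only exists with ActiveSupport, so anchor the regex instead.)
  def letter?
    !!(self =~ /\A[A-Za-z]/)
  end

  # True if the string starts with a digit.
  def digit?
    !!(self =~ /\A[0-9]/)
  end

end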

And my method:

def break_apart_rule_increment
  groups = Array.new
  string = 'a1000aa'

  string.each_char do |character|
    # Put the first character into a group.
    groups << character and next if groups.empty?

    # If this character is of the same kind as the last,
    # add it to the group, otherwise, create a new group
    # and put it there.
    if (groups.last.letter? and character.letter?) or
       (groups.last.digit? and character.digit?)
      groups.last << character
    else
      groups << character
    end
  end

  groups
end
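
The same idea can probably be written without reopening String. This is a minimal sketch, assuming Ruby 2.2 or later for `Enumerable#slice_when`. The `char_kind` and `break_apart` names are hypothetical helpers, not something from the thread.

```ruby
# Classify a single character; :other covers anything that is neither
# an ASCII letter nor a digit (hypothetical helper).
def char_kind(char)
  case char
  when /[A-Za-z]/ then :letter
  when /\d/       then :digit
  else                 :other
  end
end

# Start a new slice wherever two neighbouring characters differ in kind,
# then glue each slice of characters back into a string.
def break_apart(string)
  string.each_char
        .slice_when { |a, b| char_kind(a) != char_kind(b) }
        .map(&:join)
end

p break_apart('a1000aa')      # ["a", "1000", "aa"]
p break_apart('11aa1000aaa')  # ["11", "aa", "1000", "aaa"]
```

Unlike the loop above, this version takes the string as an argument and puts punctuation or whitespace in groups of their own.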
 
Phrogz

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]
 
Phrogz

Hi all, I've an interesting problem. Imagine the following string:

I want to break it apart like so:
[ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Or, if you want multiple types of character groupings:

irb(main):001:0> s = 'hello world, you crazy world!'
=> "hello world, you crazy world!"

irb(main):003:0> s.scan( /[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+/ )
=> ["h", "e", "ll", "o", " ", "w", "o", "rld", ", ", "y", "ou", " ",
"cr", "a", "zy", " ", "w", "o", "rld", "!"]
 
Daniel Waite

Gavin said:
irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

WOW! Freakin' awesome!

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.

Thanks, Gavin!
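
On the blank element: the string starts with a match for the delimiter, so split returns the (empty) text that sits before that first match. One way to drop it, as a small sketch (the variable names are only for illustration):

```ruby
s = '11aa1000aaa'

parts = s.split(/(\d+)/)
p parts    # ["", "11", "aa", "1000", "aaa"]

# Drop the empty piece that sits before the leading match.
cleaned = parts.reject(&:empty?)
p cleaned  # ["11", "aa", "1000", "aaa"]
```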
 
James Edward Gray II

Gavin said:
irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

WOW! Freakin' awesome!

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.

If you just want digits and non-digits, I suggest:
=> ["11", "aa", "1000", "aaa"]
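
The expression that produced this result did not survive in this copy of the thread, so what follows is only a guess at something of that shape, not necessarily James's actual code. A scan that alternates between runs of digits and runs of non-digits gives the output shown:

```ruby
s = '11aa1000aaa'

# \d+ grabs a run of digits, \D+ a run of anything else; scan returns
# the runs in order, so no empty strings appear.
groups = s.scan(/\d+|\D+/)
p groups  # ["11", "aa", "1000", "aaa"]
```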

James Edward Gray II
 
Daniel Waite

James said:
If you just want digits and non-digits, I suggest:
=> ["11", "aa", "1000", "aaa"]


I LOVE it! I gotta brush up on my regex skills. Wait, I need to get some
regex skills first. :)

Thanks, Edward; that made my night.
 
Lloyd Linklater

Gavin said:
Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin, how in the WORLD does this bit of black magic work and how did
you ever figure it out???
 
James Edward Gray II

Gavin said:
Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Lloyd said:
Gavin, how in the WORLD does this bit of black magic work

I'm not Gavin, but... captures in a Regexp passed to split() are
returned as part of the result.

Lloyd said:
and how did you ever figure it out???

Interestingly, the documentation doesn't seem to mention it. I guess
I knew it was there because Perl works the same way and I tried it
sometime.

James Edward Gray II
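
To make the capture behaviour concrete, here is a short side-by-side sketch (the variable names are only for illustration):

```ruby
s = 'a1000aa'

# Without a capture group the matched delimiter is thrown away.
without_capture = s.split(/\d+/)
p without_capture  # ["a", "aa"]

# With a capture group the delimiter text is kept in the result.
with_capture = s.split(/(\d+)/)
p with_capture     # ["a", "1000", "aa"]
```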
 
Yossef Mendelssohn

I'm not Gavin, but...
Ditto

Interestingly, the documentation doesn't seem to mention it. I guess
I knew it was there because Perl works the same way and I tried it
sometime.
Ditto

James Edward Gray II

Not ditto
 
