Question about split method

Glenn

[Note: parts of this message were removed to make it a legal post.]

Hello,

I'm wondering if there is a method for the String class that splits a string on some characters and keeps the split characters in the elements of the resulting array?

The split method drops the characters it splits on, as in this example:

p "This is a sentence. This is a sentence! This is a sentence?".strip.split(/\.|\?|\!/)


["This is a sentence", " This is a sentence", " This is a sentence"]

The three sentences in the above string have very different meanings, but lose those meanings without the punctuation, so I'd like to keep the punctuation. I'd like a method that keeps the split characters and returns this array:

["This is a sentence.", " This is a sentence!", " This is a sentence?"]

Does such a method exist? If not, would it be possible to modify the split method to produce that result?

I'm running Ruby 1.8.6 on Windows.

Thanks for your help.
 
ThoML

> Does such a method exist? If not, would it be possible to modify the split method to produce that result?

If you put the pattern in a group, it will be included in the array --
but not quite in the way you described:

irb(main):001:0> p "This is a sentence. This is a sentence! This is a sentence?".strip.split(/(\.|\?|\!)/)
["This is a sentence", ".", " This is a sentence", "!", " This is a sentence", "?"]
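
If the goal is the array from the original question, the chunk/delimiter pairs from the grouped split can be glued back together. This is only a minimal sketch, assuming a more recent Ruby than the 1.8.6 mentioned in the question (so that each_slice can be chained without a block); the variable names are just for illustration:

```ruby
text = "This is a sentence. This is a sentence! This is a sentence?"

# The capture group makes split keep each delimiter as its own element.
parts = text.split(/([.?!])/)

# Pair every chunk with the delimiter that followed it, then join each pair.
sentences = parts.each_slice(2).map { |pair| pair.join }

p sentences
```

If the text ends without punctuation, the last slice holds only one element and is joined unchanged.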

Regards,
Thomas.
 
Robert Klemme

2008/2/25 said:
I'm running Ruby 1.8.6 on Windows.

Hm, you could do it with lookbehind on 1.9. On 1.8 you only have
lookahead, which gives you this:

irb(main):002:0> "a. b.".split /(?=\.\s+)/
=> ["a", ". b."]

Not quite what you wanted. :)
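
For reference, the lookbehind version could look roughly like this. It is a sketch that assumes Ruby 1.9 or later, where the regex engine supports (?<=...); it will not run on 1.8.6:

```ruby
text = "This is a sentence. This is a sentence! This is a sentence?"

# (?<=...) is a zero-width lookbehind: the split happens right after each
# terminator without consuming it, so the punctuation stays with its sentence.
sentences = text.split(/(?<=[.?!])/)

p sentences
```

The leading spaces on the second and third elements are kept, which matches the array asked for in the question.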

But here's an alternative approach which works with 1.8:

irb(main):005:0> "a. b. c! d? e.".scan /.*?[.!?](?:\s|$)/
=> ["a. ", "b. ", "c! ", "d? ", "e."]
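
Applied to the string from the question, a scan-based variant might look like the sketch below. The helper name sentences_with_punctuation is made up for this example, and the pattern is a simplified one that keeps the leading space with each sentence instead of the trailing one:

```ruby
# Hypothetical helper: collect each run of non-terminator characters
# together with the single terminator that ends it.
def sentences_with_punctuation(text)
  text.scan(/[^.!?]*[.!?]/)
end

text = "This is a sentence. This is a sentence! This is a sentence?"
sentences = sentences_with_punctuation(text)

p sentences
```

One caveat of this pattern: any text after the last terminator is silently dropped, since scan only returns what matched.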

Kind regards

robert
 
James Gray

2008/2/25 said:

Hm, you could do it with lookbehind on 1.9. On 1.8 you only have
lookahead, which gives you this:

irb(main):002:0> "a. b.".split /(?=\.\s+)/
=> ["a", ". b."]

Not quite what you wanted. :)

We can turn look-ahead into look-behind, though it's not pretty:

$ ruby -ve 'p "This is a sentence. This is a sentence! This is a sentence?".reverse.split(/(?=(?:\A|\s+)[.!?])/).map { |s| s.reverse }.reverse'
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-darwin9.1.0]
["This is a sentence. ", "This is a sentence! ", "This is a sentence?"]
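
Broken into steps, the one-liner does roughly the following. This is the same technique with intermediate variables added; their names are only for illustration:

```ruby
text = "This is a sentence. This is a sentence! This is a sentence?"

# 1. Reverse the string so each terminator now comes before its sentence.
reversed = text.reverse

# 2. Split with look-ahead just before "whitespace + terminator"
#    (or before a terminator at the very start of the reversed string).
pieces = reversed.split(/(?=(?:\A|\s+)[.!?])/)

# 3. Un-reverse each piece, then restore the original sentence order.
sentences = pieces.map { |s| s.reverse }.reverse

p sentences
```

Because the whitespace sits inside the look-ahead, it ends up at the end of each sentence rather than at the start of the next one.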

James Edward Gray II
 
Power One

This post is old, but I landed on it from Google while searching for
something else, and I think I have a better solution than the
suggestions posted above, so I hope you guys don't mind my
resurrecting the thread.

My solution is this:

puts "This is a sentence. This is a sentence! This is a sentence?".strip.split(/\b(\.|\?|\!)\b/)

If you try out the code above in irb, it will print:
This is a sentence. This is a sentence! This is a sentence?
=> nil

:) Try that in irb!
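
One thing worth checking before relying on this: puts prints each element of an array on its own line, so a single line of output suggests that nothing was actually split. In this pattern the trailing \b requires a word character immediately after the punctuation, and here every mark is followed by a space or by the end of the string. A small check, using p instead of puts so the array structure is visible (variable names are just for illustration):

```ruby
text = "This is a sentence. This is a sentence! This is a sentence?"
pattern = /\b(\.|\?|\!)\b/

# The trailing \b needs a word character right after the punctuation,
# which never happens in this text, so the pattern finds no match.
matched = !(text =~ pattern).nil?

# With no match, split returns the whole string as a single element.
result = text.split(pattern)

p matched
p result
```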
 
