Splitting A String

Andrew Stewart

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy Stewart
 
Jan Friedrich

Andrew said:
What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?


require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')


regards
Jan
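
On recent Ruby versions the separator is passed as the `col_sep` keyword rather than as a second positional argument. A minimal sketch of the same idea, assuming the csv library that ships with current Ruby (the variable names are illustrative only):

```ruby
require 'csv'

line = 'a quick "brown fox" jumped "over the lazy" dog'

# Treat a single space as the column separator; double quotes group a field.
words = CSV.parse_line(line, col_sep: ' ')

p words
```

Because every space counts as a separator here, runs of several spaces would likely produce empty fields, so this suits input with single spaces between words.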
 
James Edward Gray II

Andrew said:
What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?


require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Wow, that's mighty clever. I didn't even think of trying that. Nice
job.

James Edward Gray II
 
James Edward Gray II

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy Stewart

Do this -
'a quick "brown fox" jumped "over the lazy" dog'.split

Not quite the same. Look again. ;)

James Edward Gray II
 
Andrew Stewart

Hello Jan,

Andrew said:
What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?


require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Nice!

Thank you!
Andy Stewart
 
Jan Friedrich

Satish said:
'a quick "brown fox" jumped "over the lazy" dog'.split
This was also my first idea, but

['a', 'quick', '"brown', 'fox"', 'jumped', '"over', 'the', 'lazy"',
'dog'] != ['a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog']

regards
Jan
 
Wolfgang Nádasi-Donner

James said:
Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy Stewart

Do this -
'a quick "brown fox" jumped "over the lazy" dog'.split

Not quite the same. Look again. ;)

James Edward Gray II

But this works:

irb(main):001:0> 'a quick "brown fox" jumped "over the lazy" dog'.split(/[ "]+/)
=> ["a", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

Wolfgang Nádasi-Donner
 
Xavier Noria

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

Can quotes be escaped? If not there's a simple regexp that does the job:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

You can also handle slashes, but it gets uglier.

-- fxn
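
For the escaped-quote case raised in this post, one possible sketch follows. The helper name `split_quoted` is hypothetical, and it assumes a backslash escapes the next character inside double quotes. It also uses `\S+` rather than `\w+` for bare words, so punctuation stays attached to them:

```ruby
# Hypothetical helper: split on whitespace, keeping double-quoted runs together
# and allowing backslash escapes (such as \") inside the quotes.
def split_quoted(str)
  str.scan(/"((?:\\.|[^"\\])*)"|(\S+)/).map do |quoted, bare|
    # Exactly one of the two capture groups is set for each match.
    quoted ? quoted.gsub(/\\(.)/) { $1 } : bare
  end
end

p split_quoted('a quick "brown fox" jumped "over the lazy" dog')
p split_quoted('say "he said \"hi\"" now')
```

The `gsub` at the end removes the escaping backslashes once the quoted run has been captured.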
 
Xavier Noria

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

Can quotes be escaped? If not there's a simple regexp that does the
job:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

Heh, reading it I recalled there's a more specific idiom for that
last select:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact

-- fxn
 
Logan Capaldo

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?
Here's yet another way:

#!/usr/bin/env ruby
require 'test/unit'
require 'strscan'

class TestScan < Test::Unit::TestCase
  def test_splitter
    assert_equal(%w(a b c), splitter(%q{a b c}))
    assert_equal(["the", "\"quick brown\"", "fox", "jumped", "over", "the", "lazy", "dog"],
                 splitter(%{the "quick brown" fox jumped over the lazy dog}))
  end
end

def splitter(s)
  res = []
  scanner = StringScanner.new(s)
  scanner.skip(/\s*/)
  until scanner.eos?
    if scanner.scan(/"/)
      # quoted string: read up to and including the closing quote
      scanner.scan(/([^"]*")/)
      res << '"' + scanner[1]
    elsif scanner.scan(/(\S+)/)
      # bare word
      res << scanner[1]
    end
    scanner.skip(/\s*/)
  end
  res
end
__END__
 
Xavier Noria

puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact

Sorry, that regexp was part of a test and got copied by accident. I
just meant to clean up the select, that's:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

-- fxn
 
Robert Klemme

puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact

Sorry, that regexp was part of a test and got copied by accident. I just
meant to clean up the select, that's:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

And solutions with #inject:

irb(main):010:0> require 'enumerator'
=> true
irb(main):011:0> str = 'a quick "brown fox" jumped "over the lazy" dog'
=> "a quick \"brown fox\" jumped \"over the lazy\" dog"
irb(main):012:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,m| a << m.compact!.shift}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]
irb(main):013:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,(m,n)| a << (m||n)}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]

But honestly, I found Jan's solution much more elegant. Great stuff!

Kind regards

robert
 
Xavier Noria

s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.split(/"([^"]+)"/).map{|s|s.strip}

Preserving part of the separator is a good trick, but the split
itself is wrong:

["a quick", "brown fox", "jumped", "over the lazy", "dog"]

-- fxn
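
One way the split-with-capture trick could be completed is to split the unquoted pieces a second time. This is only a sketch, assuming balanced quotes and no escaping:

```ruby
s = 'a quick "brown fox" jumped "over the lazy" dog'

# The capture group makes split return the quoted text as well, so the
# pieces alternate: unquoted, quoted, unquoted, quoted, ...
pieces = s.split(/"([^"]+)"/)

words = pieces.each_with_index.flat_map do |piece, i|
  # Odd positions came from inside quotes and stay whole;
  # even positions are ordinary text and are split on whitespace.
  i.odd? ? [piece] : piece.split
end

p words
```

An empty pair of quotes would not match `[^"]+`, so it would not be treated as a quoted piece.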
 
Kyle Schmitt

OK this was my answer, but it didn't quite work...
I need to work harder on my regexes I guess
s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.gsub(/("[a-z]*) ([a-z ]*")/i,"#{$1}_#{$2}")
a.each_index{|i| a.gsub!('_',' ')}
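
One likely reason this attempt misbehaves: inside a double-quoted replacement string, `#{$1}` and `#{$2}` are interpolated before `gsub` runs, so they do not refer to the current match. In addition, the pattern only protects the first space inside each quoted run. A sketch of the same placeholder idea using the block form of `gsub` follows. The method name and the `"\x01"` placeholder are assumptions made for illustration, and the input is assumed not to contain that character:

```ruby
# Hypothetical helper illustrating the placeholder approach.
def split_with_placeholder(str, placeholder = "\x01")
  # Hide the spaces inside each quoted run (and drop the quotes themselves).
  hidden = str.gsub(/"([^"]*)"/) { $1.tr(' ', placeholder) }
  # Split on whitespace, then restore the hidden spaces.
  hidden.split.map { |word| word.tr(placeholder, ' ') }
end

p split_with_placeholder('a quick "brown fox" jumped "over the lazy" dog')
```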
 
Louis J Scoras

Not a good solution by any means, but somebody might find it
interesting. Assumes balanced quotes, no escaping, etc.

require 'enumerator'

def my_split s
  s.split('"') .                # Just split on the quote
    to_enum(:each_slice, 2) .   # ... and deal w/ pairs
    inject([]) { |a,(e,o)|
      a.concat(
        e.split(' ') +          # Split unquoted parts
        [o]                     # Stuff in quotes is okay as is
      )
    } .
    compact                     # Finally remove nils
end
 
George Ogata

Hello,

What's a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

If you're looking for shell-quoting-like behavior, I actually think
it's more appropriate to use shellwords for this:

irb(main):001:0> require 'shellwords'
=> true
irb(main):002:0> Shellwords.shellwords 'a quick "brown fox" jumps "over the lazy" dog'
=> ["a", "quick", "brown fox", "jumps", "over the lazy", "dog"]

It will also handle sloshing:

irb(main):003:0> Shellwords.shellwords 'a\ b c'
=> ["a b", "c"]

And distinguish between single and double quotes (for better or for worse):

irb(main):004:0> Shellwords.shellwords %{"a\\"a"}
=> ["a\"a"]
irb(main):005:0> Shellwords.shellwords %{'a\\'}
=> ["a\\"]

Shellwords is in the standard library.

Regards,
George.
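
The same library also offers `Shellwords.split` and, once required, `String#shellsplit`. A short sketch outside irb, with illustrative variable names:

```ruby
require 'shellwords'

line = 'a quick "brown fox" jumped "over the lazy" dog'

words = Shellwords.split(line)   # module-level form
same  = line.shellsplit          # String#shellsplit is added by the library

p words

# Unbalanced quotes raise ArgumentError rather than being guessed at.
begin
  Shellwords.split('a "b c')
rescue ArgumentError => e
  warn "could not split: #{e.message}"
end
```

The error on unbalanced quotes is worth handling if the input comes from users.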
 
Andrew Stewart

If you're looking for shell-quoting-like behavior, I actually think
it's more appropriate to use shellwords for this: [snip]
Shellwords is in the standard library.

Thanks for the pointer -- I didn't know about Shellwords.

Regards,
Andy Stewart
 
