Splitting a string with escapable separator?

Michael Schuerig

I'm trying to come up with an *elegant* way to split a string into an
array at a separator with the additional feature that the separators
can be escaped. It should work like this

"Hello\, World,Hi".split_escapable(',' '\')
# => ["Hello, World", "Hi"]

Through a number of permutations with regexps, scan and the rest of the
family, I was unable to find a solution. I could parse the given string
myself, going through it character by character, but I'd prefer a less
pedestrian approach.

Michael
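For comparison, the character-by-character parse Michael would rather avoid is short in Ruby. This is a minimal sketch, assuming single-character separator and escape strings; the method name `split_escapable_by_hand` is hypothetical.

```ruby
# Hypothetical helper: split by walking the string one character at a time.
# Assumes separator and escape are single-character strings.
def split_escapable_by_hand(str, separator = ',', escape = '\\')
  fields = []
  current = ''.dup
  escaped = false
  str.each_char do |ch|
    if escaped
      current << ch          # the character after an escape is taken literally
      escaped = false
    elsif ch == escape
      escaped = true         # drop the escape itself, flag the next character
    elsif ch == separator
      fields << current      # an unescaped separator ends the current field
      current = ''.dup
    else
      current << ch
    end
  end
  current << escape if escaped  # keep a lone escape at the very end as-is
  fields << current
end

p split_escapable_by_hand('Hello\, World,Hi')  # => ["Hello, World", "Hi"]
p split_escapable_by_hand('Hello\\\\,World')   # => ["Hello\\", "World"]
```

Because the character after an escape is always taken literally, an escaped backslash before a comma ends the field. Unlike String#split, this version keeps trailing empty fields (`'a,'` gives `["a", ""]`).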
 
Gavin Kistner

> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of the
> family, I was unable to find a solution.

Your above example is missing a couple of \, but I assume I know what
you meant.

Is the following elegant or not?

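class String
  def split_escapable( separator, escape_char=nil )
    results = []
    # Lazily take text up to a separator whose preceding character (captured
    # separately) is not the escape char, or up to the end of the line.
    re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}])" : ''}#{separator}|$)/
    self.scan( re ){ |str,last_char|
      # Glue the captured character back onto the field.
      results << str + last_char.to_s
    }
    results
  end
end

p "Hello\\, World,Hi".split_escapable( ',', '\\' )
#=> ["Hello\\, World", "Hi"]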

Note that the above does not account for the case of:
Hello \\,World
(where an escaped backslash is intended to end the first entry)
but if that was important, that's just a matter of a bit of odd/even
backslash counting.

Something like (untested):
re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}](\\#{escape_char}\\#{escape_char})*)" : ''}#{separator}|$)/
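Dropping Gavin's untested pattern into his method gives the sketch below. It has been traced only against the inputs in the `p` lines. Like the original, it leaves the escape characters in the results. The inner group for the escape pairs is a third capture, which the two-parameter block simply ignores.

```ruby
class String
  # Gavin's method with his untested odd/even variant of the pattern dropped in.
  # A separator only counts if it follows a non-escape character plus an even
  # number (possibly zero) of escape characters. Escapes are NOT stripped.
  def split_escapable(separator, escape_char = nil)
    results = []
    re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}](\\#{escape_char}\\#{escape_char})*)" : ''}#{separator}|$)/
    scan(re) { |str, last_char| results << str + last_char.to_s }
    results
  end
end

p 'Hello\\,World,Hi'.split_escapable(',', '\\')  # => ["Hello\\,World", "Hi"]
p 'Hello\\\\,World'.split_escapable(',', '\\')   # => ["Hello\\\\", "World"]
```

As with the first version, a one-character field such as the `a` in `"a,b"` is not split off. The pattern needs one character for `(.+?)` and another for the captured character before the separator.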
 
William James

Michael said:
> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of the
> family, I was unable to find a solution. I could parse the given string
> myself, going through it character by character, but I'd prefer a less
> pedestrian approach.


class String
  def split_escapable( splitter, escaper )
    escaper = escaper*2 if escaper=='\\'
    re = %r{ \G
      # Make sure at least 1 character remains.
      (?= . )
      (
        (?:
          [^#{ splitter }#{ escaper }]
          |
          (?: #{ escaper } . )
        ) *
      )
      (?:
        #{ splitter }
        |
        \Z
      )

    }xm
    scan( re ).map{|x| x.first.gsub( /#{escaper}(.)/, '\1' ) }
  end
end

s = <<HERE
Hello@, World!,Hi.
Alarm rings@, lights flash.,One escaper @@
HERE
s.split("\n").each {|x|a=x.split_escapable(',','@');p a; puts a}
puts "----"

s = <<'HERE'
Hello\, World!,Hi.
Alarm rings\, lights flash.,One escaper \\
HERE
s.split("\n").each {|x|a=x.split_escapable(',','\\');p a; puts a}
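William's version is easier to follow with the regex annotated. The sketch below uses the same `\G`/`scan` technique with comments. Two details are assumptions, not part of his post: the separator and escape go through `Regexp.escape` instead of a hand-doubled backslash, and the name `split_escapable_escaped` is hypothetical.

```ruby
class String
  # Hypothetical name; William's \G + scan technique, annotated. Regexp.escape
  # (an assumption, not in his post) replaces the hand-doubled backslash, so
  # separators such as '|' or '.' are treated literally.
  def split_escapable_escaped(splitter, escaper)
    sep = Regexp.escape(splitter)
    esc = Regexp.escape(escaper)
    re = %r{
      \G                  # start exactly where the previous match ended
      (?=.)               # at least one character must remain
      (                   # group 1: one field, built from
        (?:
          [^#{sep}#{esc}] #   a char that is neither separator nor escape
          |
          #{esc}.         #   or an escape followed by any char
        )*
      )
      (?:#{sep}|\z)       # then a separator or the end of the string
    }xm
    # Keep only the field, then drop each escape and keep the char after it.
    scan(re).map { |(field)| field.gsub(/#{esc}(.)/m, '\1') }
  end
end

p 'Hello\\, World!,Hi.'.split_escapable_escaped(',', '\\')  # => ["Hello, World!", "Hi."]
p 'Hello\\\\,World'.split_escapable_escaped(',', '\\')      # => ["Hello\\", "World"]
p 'a|b\\|c|d'.split_escapable_escaped('|', '\\')            # => ["a", "b|c", "d"]
```

Because an escape always consumes the character after it, an escaped backslash before a separator ends the field, which covers the odd/even case Gavin raised. `\G` anchors each match where the previous one ended, so the fields always tile the string from the start instead of being picked out of arbitrary positions.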
 
Jason Sweat

> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of the
> family, I was unable to find a solution. I could parse the given string
> myself, going through it character by character, but I'd prefer a less
> pedestrian approach.
>
> Michael

With the new Regex engine in cvs ruby you can use a negative lookbehind
assertion in your split:
=> ["Hello\\, World", " Hi"]


$ ruby --v
ruby 1.9.0 (2005-09-08) [i686-linux]


Regards,
Jason
http://blog.casey-sweat.us/
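Jason's split expression itself is missing from this copy of the thread; only its output survives. A negative-lookbehind split of the kind he describes might look like this in Ruby 1.9 or later. This is a reconstruction, not necessarily his exact code.

```ruby
# Needs Ruby 1.9 or later: Ruby 1.8 regexps have no lookbehind.
# Split on commas that are not immediately preceded by a backslash.
fields = "Hello\\, World,Hi".split(/(?<!\\),/)
p fields                                   # => ["Hello\\, World", "Hi"]

# The escape is still in the field, so remove it in a second pass.
p fields.map { |f| f.gsub('\\,', ',') }    # => ["Hello, World", "Hi"]
```

A fixed-width lookbehind only inspects the single preceding character. It therefore cannot tell an escaped backslash before a comma (`\\,`) from an escaped comma (`\,`); that needs the odd/even counting Gavin mentions.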
 
Michael Schuerig

Jason said:
>> I'm trying to come up with an *elegant* way to split a string into an
>> array at a separator with the additional feature that the separators
>> can be escaped. It should work like this
>>
>> "Hello\, World,Hi".split_escapable(',' '\')
>> # => ["Hello, World", "Hi"]
>>
>> Through a number of permutations with regexps, scan and the rest of
>> the family, I was unable to find a solution. I could parse the given
>> string myself, going through it character by character, but I'd prefer
>> a less pedestrian approach.
>>
>> Michael
>
> With the new Regex engine in cvs ruby you can use a negative lookbehind
> assertion in your split:
> => ["Hello\\, World", " Hi"]

That must be the most elegant solution. Unfortunately I can't use cvs
ruby and can't wait for it either.

Michael
 
Michael Schuerig

William said:
> Michael said:
>> I'm trying to come up with an *elegant* way to split a string into an
>> array at a separator with the additional feature that the separators
>> can be escaped.
>> [snip]
>
> class String
> def split_escapable( splitter, escaper ) [snip]
> end
> end

Thanks, that appears to work indeed, although I can't claim to
understand how or why.

Michael
 
Michael Schuerig

Gavin Kistner wrote:
> [snip]
> Note that the above does not account for the case of:
> Hello \\,World
> (where an escaped backslash is intended to end the first entry)
> but if that was important, that's just a matter of a bit of odd/even
> backslash counting.

That's a thing I'd need. Opportunistically, I'll go with William's
suggestion from a sibling post.

> Something like (untested):
> re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}](\\#{escape_char}\\#{escape_char})*)" : ''}#{separator}|$)/

Thanks.

Michael
 
Han Holl

> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]

None of the offerings so far had support for the second (limit) argument of split.
The following is probably quite efficient if the chance of an occurrence of
an escaped character is low.
(I haven't benchmarked anything, though.) Also, it's not suitable for binary
strings.

class String
  def split_escapable(separator, escape_char, *args)
    istr = dup
    impossible = "\x01"
    replace = "#{escape_char}#{separator}"
    changed = istr.gsub!(replace, impossible)
    fields = istr.split(separator, *args)
    if changed
      fields.each do |f|
        f.gsub!(impossible, separator)
      end
    end
    fields
  end
end
a = "Hello\\, World,Hi"
puts a.split_escapable( ',', '\\' )

Cheers,

Han Holl
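Han's point about split's second argument is easiest to see with a limit. The sketch below repeats his method, changed only in indentation so it runs on its own, and adds a few illustrative calls. The extra arguments are passed straight through to String#split.

```ruby
class String
  # Han Holl's placeholder approach: hide escaped separators behind a
  # character assumed never to occur in the text, split, then restore them.
  def split_escapable(separator, escape_char, *args)
    istr = dup
    impossible = "\x01"
    replace = "#{escape_char}#{separator}"
    changed = istr.gsub!(replace, impossible)
    fields = istr.split(separator, *args)
    if changed
      fields.each do |f|
        f.gsub!(impossible, separator)
      end
    end
    fields
  end
end

# The extra arguments go straight through to String#split:
p "a\\,b,c,d".split_escapable(',', '\\', 2)   # => ["a,b", "c,d"]
p "a,b,,".split_escapable(',', '\\')          # => ["a", "b"]
p "a,b,,".split_escapable(',', '\\', -1)      # => ["a", "b", "", ""]
```

Because the backslash-comma pair is hidden before splitting, an escaped escape followed by a separator (`\\,`) does not end a field with this approach.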

 
email55555

Without 1.9's lookbehind, you could try:
'Hello\,World,Hi'.scan(/(?:\\,|[^,])+/).map {|e| e.tr('\\','')}
=> ["Hello,World", "Hi"]
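`e.tr('\\','')` removes every backslash, and the `\\,` alternative means an escaped backslash followed by a comma is never split. The following variant of the same one-liner is an adjustment, not the poster's code. It treats a backslash as escaping whatever character follows and removes only the escaping backslash:

```ruby
# A token is a run of either "backslash + any character" or a character that is
# neither a comma nor a backslash; then only the escaping backslashes are removed.
fields = 'Hello\,World,a\\\\,b'.scan(/(?:\\.|[^,\\])+/).map { |e| e.gsub(/\\(.)/, '\1') }
p fields  # => ["Hello,World", "a\\", "b"]
```

Like the original one-liner, it drops empty fields, because `+` needs at least one character per token.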
 
