UTF-8 aware chop for 1.8?


Ammar Ali

Hello,

Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Thanks,
Ammar
 

James Edward Gray II

> Is there an easy way to chop (as in String#chop) a string that can
> potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Well, it should be this simple:

str.gsub(/.\z/mu, "")
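For illustration, here is the one-liner wrapped in a small helper. The name `utf8_chop` is hypothetical, and `sub` is used because a pattern anchored with `\z` can match at most once. The last lines approximate the 1.8 problem on a modern Ruby by taking a binary copy of the string with `String#b` (Ruby 2.0+), so that `chop` drops a byte instead of a character. Treat that part as a stand-in for 1.8 behavior, not 1.8 itself.

```ruby
# Drop the last character of a UTF-8 string.
# /u treats the pattern as UTF-8, so . consumes a whole multibyte character;
# /m lets . match "\n" as well.
def utf8_chop(str) # hypothetical helper name, not from the thread
  str.sub(/.\z/mu, "")
end

puts utf8_chop("one two three") # prints: one two thre
puts utf8_chop("naïve café")    # prints: naïve caf

# Stand-in for byte-oriented chopping (needs String#b, Ruby 2.0+):
# removing one byte from "café" leaves half of the two-byte "é" behind.
broken = "café".b.chop
puts broken.bytesize                                # prints: 4
puts broken.force_encoding("UTF-8").valid_encoding? # prints: false
```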

James Edward Gray II
 

Adam Prescott


I was going to say
=> "one two thre"

I guess I overthought it, huh!
 

Ammar Ali

> Well, it should be this simple:
>
> str.gsub(/.\z/mu, "")
>
> => "one two thre"


Beautiful. Thank you both.

It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life :)

By the way, the m option seems superfluous in James' version. I get
the same results without it.

Thanks again,
Ammar
 

James Edward Gray II

>> Well, it should be this simple:
>>
>> str.gsub(/.\z/mu, "")
>
> Beautiful. Thank you both.
>
> It was a good exercise for me, so I don't necessarily feel that I
> wasted 30 minutes of my life :)
>
> By the way, the m option seems superfluous in James' version. I get
> the same results without it.

It's not:
=> ""
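The role of `/m` can be checked directly: in Ruby regexps, `/m` lets `.` match a newline, so without it a string ending in `"\n"` has no final character the pattern can remove. A short sketch with arbitrary sample strings:

```ruby
# Without /m, . cannot match "\n", so a trailing newline is left in place.
p "\n".sub(/.\z/u, "")        # prints: "\n"
p "\n".sub(/.\z/mu, "")       # prints: ""
p "line\n".sub(/.\z/u, "")    # prints: "line\n"
p "line\n".sub(/.\z/mu, "")   # prints: "line"

# One remaining difference from String#chop: chop removes "\r\n" as a pair.
p "line\r\n".chop             # prints: "line"
p "line\r\n".sub(/.\z/mu, "") # prints: "line\r"
```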

Using gsub() over sub() was a dumb mistake on my part though. sub() is
all you need, since it can only match once.
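Since `\z` matches only at the very end of the string, the pattern has at most one match, so `sub` and `gsub` agree; `sub` just stops after that first match. A quick check on an arbitrary string:

```ruby
s = "one two three"

# The anchored pattern finds exactly one match, so both calls agree.
p s.scan(/.\z/mu)     # prints: ["e"]
p s.sub(/.\z/mu, "")  # prints: "one two thre"
p s.gsub(/.\z/mu, "") # prints: "one two thre"
```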

James Edward Gray II
 

Ammar Ali

> It's not:
>
> => ""
>
> Using gsub() over sub() was a dumb mistake on my part though. sub()
> is all you need, since it can only match once.

Thanks for the clarification.

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")     # everything except the last character
  last = s.scan(/.\z/mu).first  # the last character, or nil for ""
  last = '' unless last

  [lead, last]
end

Short and sweet.

Cheers,
Ammar
 

James Edward Gray II

> My method now looks like:
>
> def chop_utf8(s)
>   return unless s
>
>   lead = s.sub(/.\z/mu, "")
>   last = s.scan(/.\z/mu).first
>   last = '' unless last

The two lines above can be replaced with the more efficient:

  last = s[/.\z/mu] || ''

>   [lead, last]
> end
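`String#[]` with a Regexp returns the first matching substring, or `nil` when nothing matches (as with an empty string), which is why the `|| ''` fallback stays; it also skips the intermediate Array that `scan` returns. A small sketch comparing the two forms on a few arbitrary inputs:

```ruby
["abc", "café", ""].each do |s|
  via_scan  = s.scan(/.\z/mu).first || "" # Array of matches, then its first element
  via_index = s[/.\z/mu] || ""            # first match directly, or nil
  raise "mismatch for #{s.inspect}" unless via_scan == via_index
end

p ""[/.\z/mu]    # prints: nil
p "abc"[/.\z/mu] # prints: "c"
```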

James Edward Gray II
 

Ammar Ali

>> My method now looks like:
>>
>> def chop_utf8(s)
>>   return unless s
>>
>>   lead = s.sub(/.\z/mu, "")
>>   last = s.scan(/.\z/mu).first
>>   last = '' unless last
>
> The two lines above can be replaced with the more efficient:
>
>   last = s[/.\z/mu] || ''

At this rate the method is going to disappear. :)

I updated the gist accordingly:

https://gist.github.com/661257
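With that substitution applied, the method would read roughly as below. This is a sketch assembled from the code quoted in this thread, not a copy of the gist, which may differ:

```ruby
# Split s into [all but the last character, last character], treating s as UTF-8.
# Returns nil when s is nil.
def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "") # s with its final character removed
  last = s[/.\z/mu] || ""   # the final character, or "" for an empty string

  [lead, last]
end

p chop_utf8("abc") # prints: ["ab", "c"]
p chop_utf8("")    # prints: ["", ""]
p chop_utf8(nil)   # prints: nil
```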

Thanks again,
Ammar
 
