UTF-8 aware chop for 1.8?

Ammar Ali · Nov 3, 2010

Hello,

Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Thanks,
Ammar

Ammar Ali · Nov 3, 2010

Ended up making my own. Posting it here for the benefit of others, and
maybe some feedback.

https://gist.github.com/661217

Regards,
Ammar

James Edward Gray II · Nov 3, 2010

Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Well, it should be this simple:

str.gsub(/.\z/mu, "")

James Edward Gray II
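
For later readers, here is a minimal sketch of that one-liner in use. The helper name `utf8_chop` and the sample strings are made up for illustration and are not from the thread. The accented string is built from code points with `pack("U*")` so the example does not depend on the encoding of the source file.

```ruby
# Hypothetical helper wrapping the one-liner from the post above.
# /u makes the regex UTF-8 aware, so "." matches a whole character
# rather than a single byte; /m lets "." match a newline as well.
def utf8_chop(str)
  str.gsub(/.\z/mu, "")
end

ascii = "one two three"
cafe  = [0x63, 0x61, 0x66, 0xE9].pack("U*")  # "café" built from code points

chopped_ascii = utf8_chop(ascii)  # last character "e" removed
chopped_cafe  = utf8_chop(cafe)   # the two-byte "é" removed as one character
```

An empty string has no final character to match, so it comes back unchanged.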

Adam Prescott · Nov 3, 2010

[Note: parts of this message were removed to make it a legal post.]

I was going to say
=> "one two thre"

I guess I overthought it, huh!

Ammar Ali · Nov 3, 2010

Well, it should be this simple:

str.gsub(/.\z/mu, "")

=> "one two thre"

Beautiful. Thank you both.

It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life.

By the way, the m option seems superfluous in James' version. I get
the same results without it.

Thanks again,
Ammar

James Edward Gray II · Nov 3, 2010

Well, it should be this simple:

str.gsub(/.\z/mu, "")

Beautiful. Thank you both.

It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life.

By the way, the m option seems superfluous in James' version. I get
the same results without it.

It's not:
=> ""

Using gsub() over sub() was a dumb mistake on my part though. sub() is
all you need, since it can only match once.

James Edward Gray II
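
The code from the post above was stripped by the list gateway, so the following is only a sketch of the kind of case where the m option matters, not a reconstruction of the original example. Without /m, "." refuses to match a newline, so a string ending in "\n" is left unchanged.

```ruby
with_newline = "abc\n"

# Without /m, "." cannot match the trailing "\n", the pattern fails,
# and sub returns the string unchanged.
without_m = with_newline.sub(/.\z/u, "")

# With /m, "." also matches "\n", so the newline is chopped.
with_m = with_newline.sub(/.\z/mu, "")
```

Here `without_m` is still "abc\n" while `with_m` is "abc".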

Ammar Ali · Nov 3, 2010

It's not:

=> ""

Using gsub() over sub() was a dumb mistake on my part though. sub()
is all you need, since it can only match once.

Thanks for the clarification.

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last

  [lead, last]
end

Short and sweet.

Cheers,
Ammar

James Edward Gray II · Nov 3, 2010

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last

The two lines above can be replaced with the more efficient:

  last = s[/.\z/mu] || ''

  [lead, last]
end

James Edward Gray II
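
Putting the suggestion together with the method from the earlier post gives roughly the following. This is a sketch assembled from the snippets in the thread, not a copy of the gist, and the sample input is made up for illustration.

```ruby
# Splits a string into [everything but the last character, last character].
# Returns nil for nil input and ["", ""] for an empty string.
def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")   # string without its final character
  last = s[/.\z/mu] || ''     # the final character, or '' if there is none

  [lead, last]
end

cafe = [0x63, 0x61, 0x66, 0xE9].pack("U*")  # "café" built from code points
lead, last = chop_utf8(cafe)                # lead is "caf", last is "é"
```

`String#[]` with a regex returns the matched text or nil, which is why the `|| ''` fallback can replace the scan and the nil check.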

Ammar Ali · Nov 3, 2010

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last

The two lines above can be replaced with the more efficient:

last = s[/.\z/mu] || ''

At this rate the method is going to disappear.

I updated the gist accordingly:

https://gist.github.com/661257

Thanks again,
Ammar

botp · Nov 4, 2010

last = s[/.\z/mu] || ''

I updated the gist accordingly:
https://gist.github.com/661257

Can we make that a single pass?

str =~ /.\z/mu
[$`, $&]

best regards -botp
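
A sketch of how the single-pass idea might be wrapped in a method. The name `chop_utf8_one_pass` and the fallback for a failed match are assumptions added here: when the string is empty the pattern does not match, and both `` $` `` and `$&` are nil, so that case has to be handled separately.

```ruby
# Hypothetical single-pass variant: one regex match supplies both parts.
def chop_utf8_one_pass(s)
  return unless s

  if s =~ /.\z/mu
    [$`, $&]   # $` is the text before the match, $& is the matched character
  else
    [s, '']    # no final character to remove (empty string)
  end
end

pair = chop_utf8_one_pass("one two three")
```

Because the match is anchored at the end of the string, the pre-match is exactly the string minus its last character.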

Brian Candler · Nov 4, 2010

Ammar Ali wrote in post #959047:

By the way, the m option seems superfluous in James' version. I get
the same results without it.

=> "abc\n"

Ammar Ali · Nov 4, 2010

Ammar Ali wrote in post #959047:
=> "abc\n"

James clarified this earlier. But thanks for chiming in nonetheless.

Cheers,
Ammar
