A Code Point's Tale: There and Back Again

Terry Michaels · Apr 30, 2011

This is probably obvious in the docs and I'm just missing it, but here
goes: So, I see there is str.each_codepoint, which I want to use in a
function to convert Unicode Strings to a list of Unicode code points.
But what can I do if I have a list of Unicode code points and want to
convert them back into a String?

xcr xcr · Apr 30, 2011

I hope this is what you are looking for:
http://ruby-unicode.rubyforge.org/doc/

Markus Fischer · Apr 30, 2011

Hi,

I think you can use Array#pack for that:

$ irb
ruby-1.9.2-p180 :001 > "fë€€oÃ¶bÃŸ".each_codepoint.to_a
=> [102, 45056, 111, 246, 98, 223]
ruby-1.9.2-p180 :002 > "fë€€oÃ¶bÃŸ".each_codepoint.to_a.pack("U*")
=> "fë€€oÃ¶bÃŸ"
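For reference, here is a minimal sketch of the same round trip outside irb. The helper names are made up for illustration; only Array#pack and String#unpack come from Ruby itself, and the input is assumed to be a valid UTF-8 string.

```ruby
# Hypothetical helper names; only pack/unpack are Ruby's own.

# Decode a UTF-8 string into an array of integer code points.
def string_to_codepoints(str)
  str.unpack("U*")
end

# Encode an array of integer code points as a UTF-8 string.
def codepoints_to_string(codes)
  codes.pack("U*")
end

original = "f\u00F6\u20AC"   # "f", o-umlaut, euro sign, written with \u escapes
codes = string_to_codepoints(original)

p codes
p codepoints_to_string(codes) == original
```

The "U*" template means "any number of UTF-8 characters", so pack and unpack act as inverses here.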

cheers

7stud -- · May 1, 2011

#encoding: UTF-8
#That comment tells ruby to treat string literals in my source code, like
#the one below, as utf-8 encoded.

str = "\xE2\x82\xAC\xE2\x82\xAC"

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")

--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)
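The encoding argument to Integer#chr matters in the code above. As a rough sketch (the helper name is hypothetical, and the RangeError assumes no default internal encoding is set, which is the usual case), a code point above 255 cannot be converted without it:

```ruby
# Hypothetical helper: join code points into one UTF-8 string.
def codepoints_to_utf8(codes)
  codes.map { |code| code.chr(Encoding::UTF_8) }.join
end

p codepoints_to_utf8([8364, 8364]) == "\u20AC\u20AC"

# Without an encoding argument, chr has no way to represent 8364
# unless Encoding.default_internal is set, so it normally raises.
begin
  8364.chr
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```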

I don't know where you are getting your string, but you can always do this:

str = "\xE2\x82\xAC\xE2\x82\xAC"
str.force_encoding("UTF-8")

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")

--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)
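One caveat with force_encoding: it only relabels the bytes and never checks them. A small sketch of a guard follows. The helper name is invented for illustration, and String#b assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper: tag raw bytes as UTF-8 and list their code points.
def utf8_codepoints(bytes)
  # force_encoding changes only the encoding label; the bytes stay the same.
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  # each_codepoint would raise on malformed input, so check first.
  raise ArgumentError, "not valid UTF-8" unless str.valid_encoding?
  str.each_codepoint.to_a
end

raw = "\xE2\x82\xAC\xE2\x82\xAC".b   # binary (ASCII-8BIT) copy of the bytes
p utf8_codepoints(raw)

begin
  utf8_codepoints("\xFF".b)          # 0xFF never appears in valid UTF-8
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```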

7stud -- · May 1, 2011

You will never learn ruby unicode by reading the docs. Head over to
James Edward Gray II's website for some lessons:

http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings

Someone else blogged in great detail about all the intricacies of ruby
unicode and its problems, but I can't find the link now.

7stud -- · May 1, 2011

Maybe each_char() will work for you? Take a look at the following code.

str = "\xE2\x82\xAC\xE2\x82\xAC"
puts str.encoding

str.force_encoding("UTF-8")
puts str.encoding

chars = str.each_char.to_a
p chars

puts chars[0].encoding

puts chars.join

--output:--
ASCII-8BIT
UTF-8
["\u20AC", "\u20AC"]
UTF-8
€€

(You should see two euro symbols as the last line of output.)

The output implies that a string with unicode escapes is given a UTF-8
encoding by default. And that seems to be the case:

str = "\u20AC\u20AC"
puts str.encoding

--output:--
UTF-8
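To tie the two spellings together, here is a small sketch (hex_bytes is a made-up helper name) showing that the \u escape and the hand-tagged \x bytes produce the same UTF-8 string:

```ruby
# Hypothetical helper: show a string's bytes as two-digit hex values.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }
end

escaped = "\u20AC"                                    # \u escape
tagged  = "\xE2\x82\xAC".dup.force_encoding("UTF-8")  # raw bytes, tagged by hand

puts escaped.encoding
p hex_bytes(escaped)
p escaped == tagged
```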
