A Code Point's Tale: There and Back Again

Terry Michaels

This is probably obvious in the docs and I'm just missing it, but here
goes: So, I see there is str.each_codepoint, which I want to use in a
function to convert Unicode Strings to a list of Unicode code points.
But what can I do if I have a list of Unicode code points and want to
convert them back into a String?
 
Markus Fischer

Hi,

I think you can use Array#pack for that:

$ irb
ruby-1.9.2-p180 :001 > "f뀀oöbß".each_codepoint.to_a
=> [102, 45056, 111, 246, 98, 223]
ruby-1.9.2-p180 :002 > "f뀀oöbß".each_codepoint.to_a.pack("U*")
=> "f뀀oöbß"
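For a full round trip, String#unpack with the same "U*" template is the inverse of the pack call above. Here is a minimal sketch, assuming UTF-8 input; the helper names to_codepoints and from_codepoints are made up for illustration and are not part of Ruby:

```ruby
# Hypothetical helper names; only unpack/pack are Ruby's own methods.
def to_codepoints(str)
  # For a UTF-8 string this yields the same integers as each_codepoint.to_a.
  str.unpack("U*")
end

def from_codepoints(codes)
  # The "U" directive writes each integer as one UTF-8 encoded character.
  codes.pack("U*")
end

original = "f\u00F6\u00DF"          # "f", o-umlaut, sharp s, written with \u escapes
codes    = to_codepoints(original)
restored = from_codepoints(codes)

p codes
puts restored.encoding
puts restored == original
```

The string that pack("U*") returns is tagged UTF-8, so it compares equal to the original without any further force_encoding step.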

cheers
 
7stud --

#encoding: UTF-8
#That comment tells ruby to treat string literals in my source code, like
#the one below, as utf-8 encoded.

str = "\xE2\x82\xAC\xE2\x82\xAC"

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")

--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)


I don't know where you are getting your string, but you can always do this:

str = "\xE2\x82\xAC\xE2\x82\xAC"
str.force_encoding("UTF-8")

codes = str.each_codepoint.to_a

p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join(" ")


--output:--
[8364, 8364]
€ €

(You should see two euro symbols as the last line of output.)
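Note that join(" ") puts a space between the characters, so the result is not byte-for-byte the string you started with. If the goal is to get the original string back, a sketch along these lines should do it (the variable names are just for illustration):

```ruby
codes = [8364, 8364]   # U+20AC EURO SIGN, twice

# Integer#chr with an explicit encoding turns each code point into a
# one-character UTF-8 string; join with no separator glues them together.
rebuilt = codes.map { |code| code.chr(Encoding::UTF_8) }.join

puts rebuilt.encoding
puts rebuilt.length                        # characters, not bytes
puts rebuilt.each_codepoint.to_a == codes  # did we get the same code points back?
```

Passing the encoding to chr matters here: the values are above 255, so chr needs to know which encoding to build the character in.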

--
Posted via http://www.ruby-forum.com/.
 
7stud --

Maybe each_char() will work for you? Take a look at the following code.

str = "\xE2\x82\xAC\xE2\x82\xAC"
puts str.encoding

str.force_encoding("UTF-8")
puts str.encoding

chars = str.each_char.to_a
p chars

puts chars[0].encoding

puts chars.join

--output:--
ASCII-8BIT
UTF-8
["\u20AC", "\u20AC"]
UTF-8
€€

(You should see two euro symbols as the last line of output.)

The output implies that a string with unicode escapes is given a UTF-8 encoding by default. And that seems to be the case:

str = "\u20AC\u20AC"
puts str.encoding

--output:--
UTF-8
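To see what force_encoding actually does, here is a small sketch that starts from raw bytes instead of a literal, so the result should not depend on the source file's encoding. The byte values are the UTF-8 encoding of one euro sign; the variable names are only for illustration:

```ruby
# Build three raw bytes; pack("C*") returns a binary (ASCII-8BIT) string.
raw = [0xE2, 0x82, 0xAC].pack("C*")
puts raw.encoding == Encoding::BINARY   # BINARY is an alias for ASCII-8BIT
puts raw.length                         # each byte counts as one "character"

# force_encoding only changes the label on the string; the bytes stay the same.
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text.valid_encoding?               # are the bytes well-formed UTF-8?
puts text.length                        # now counted as UTF-8 characters
p text.each_codepoint.to_a
p text.bytes == raw.bytes
```

Checking valid_encoding? after force_encoding is a cheap way to catch bytes that were not really UTF-8 before calling each_codepoint, which raises ArgumentError on invalid sequences.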
