Decode/encode Unicode

Thomas B.

Kless said:
How to decode a String type to Unicode?
And, to encode Unicode to String?

Unicode is not fully supported in Ruby 1.8.x; it will be in Ruby 1.9.
 
James Gray

How to decode a String type to Unicode?

$ ruby -KU -e 'p "Résumé".unpack("U*")'
[82, 233, 115, 117, 109, 233]

And, to encode Unicode to String?

$ ruby -KU -e 'p [82, 233, 115, 117, 109, 233].pack("U*")'
"Résumé"

Hope that helps.

James Edward Gray II
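
The round trip above can also be written as a small script. This is only a sketch: the helper names decode_to_codepoints and encode_from_codepoints are made up for illustration, and the "\u00e9" escape assumes Ruby 1.9 or later (on 1.8 you would type the literal character and run with -KU as shown above).

```ruby
# Decode: UTF-8 string -> array of Unicode code points.
# (Hypothetical helper name, not part of Ruby.)
def decode_to_codepoints(str)
  str.unpack("U*")
end

# Encode: array of Unicode code points -> UTF-8 string.
# (Hypothetical helper name, not part of Ruby.)
def encode_from_codepoints(codepoints)
  codepoints.pack("U*")
end

# "Résumé", written with escapes so the source file's encoding does not matter.
resume = "R\u00e9sum\u00e9"

codes = decode_to_codepoints(resume)
p codes                                    # => [82, 233, 115, 117, 109, 233]
p encode_from_codepoints(codes) == resume  # => true
```

On Ruby 1.9 and later, resume.codepoints.to_a should give the same array as unpack("U*").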
 
Stephen Boisvert

How to decode a String type to Unicode?
And, to encode Unicode to String?

This is a trickier question than you probably realize.
You need to know which encoding you are converting your string from
and which one you are converting it to. Sometimes you may not know:
MP3 ID3 tag info is supposed to be UTF-8, for example, but often isn't.
If you don't specify an encoding, the one that gets used may or may not
be taken from your locale settings.
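
To make the "know your source encoding" point concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where String#force_encoding, String#valid_encoding? and String#encode are built in; the helper names to_utf8 and valid_utf8? are invented for this example. The same six bytes are not valid UTF-8, but they convert cleanly once you declare them as ISO-8859-1.

```ruby
# Six raw bytes: "Résumé" as stored in ISO-8859-1 (Latin-1),
# where é is the single byte 0xE9.
LATIN1_BYTES = [0x52, 0xE9, 0x73, 0x75, 0x6D, 0xE9]

# Hypothetical helper: tag raw bytes with their real encoding,
# then transcode the result to UTF-8.
def to_utf8(bytes, source_encoding)
  raw = bytes.pack("C*")               # binary string, no text encoding assumed yet
  raw.force_encoding(source_encoding)  # declare what the bytes actually are
  raw.encode("UTF-8")                  # convert to UTF-8
end

# Hypothetical helper: do these bytes form well-formed UTF-8?
def valid_utf8?(bytes)
  bytes.pack("C*").force_encoding("UTF-8").valid_encoding?
end

p valid_utf8?(LATIN1_BYTES)                                   # => false
p to_utf8(LATIN1_BYTES, "ISO-8859-1") == "R\u00e9sum\u00e9"   # => true
```

A validity check like this only tells you the bytes are not UTF-8. It cannot tell you which encoding they really are, so the source encoding still has to come from somewhere else.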

Things I have seen mentioned:

iconv - a Unix library for translating between different
encodings. It requires you to tell it what you are decoding from and
encoding to.
http://wiki.rubyonrails.org/rails/pages/iconv

unidecode - I have used this to translate Unicode strings to ASCII by
simple character mapping. It works about 99% of the time, so you will
need error handling for when it goes bonk.

Setting KCODE:

$KCODE = 'UTF8'
or
$KCODE = 'u'
or
use -Ku at the command line. This tells Ruby that you want to be
using UTF-8 encoding.

require 'jcode'

will give you access to some code developed to deal with Japanese
text encodings. It addresses some of the String class's struggles with
Unicode characters by adding methods like jlength. Always keep in mind
that a lot of the string methods in Ruby 1.8 do not work properly with
Unicode because they count bytes rather than characters.
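
The counting problem can be seen with nothing more than unpack. This is a rough sketch with made-up helper names (byte_count and char_count); the "\u00e9" escape assumes Ruby 1.9 or later, so on 1.8 you would type the literal character instead.

```ruby
# Hypothetical helper: number of bytes in the string.
def byte_count(str)
  str.unpack("C*").size   # one array entry per byte
end

# Hypothetical helper: number of Unicode code points in a UTF-8 string.
def char_count(str)
  str.unpack("U*").size   # one array entry per code point
end

resume = "R\u00e9sum\u00e9"   # "Résumé"; each é takes two bytes in UTF-8

p byte_count(resume)  # => 8
p char_count(resume)  # => 6
```

Note that this counts code points, so a letter followed by a separate combining accent would still count as two.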

Good luck.

Stephen Boisvert
http://blog.ennuyer.net
 
Kless

Thanks! It will help until Ruby 1.9 is more widespread.

 
