Decode/encode Unicode

Discussion in 'Ruby' started by Kless, Aug 28, 2008.

  1. Kless

    Kless Guest

    How to decode a String type to Unicode?
    And, to encode Unicode to String?
    Kless, Aug 28, 2008
    #1

  2. Thomas B.

    Thomas B. Guest

    Kless wrote:
    > How to decode a String type to Unicode?
    > And, to encode Unicode to String?


    Unicode is not fully supported in Ruby 1.8.x; it will be in Ruby 1.9.
    --
    Posted via http://www.ruby-forum.com/.
    Thomas B., Aug 28, 2008
    #2

  3. James Gray

    James Gray Guest

    On Aug 28, 2008, at 3:51 AM, Kless wrote:

    > How to decode a String type to Unicode?


    $ ruby -KU -e 'p "Résumé".unpack("U*")'
    [82, 233, 115, 117, 109, 233]

    > And, to encode Unicode to String?


    $ ruby -KU -e 'p [82, 233, 115, 117, 109, 233].pack("U*")'
    "Résumé"
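
    The two one-liners above can be combined into a small round-trip script. This is only a sketch: it assumes the file is saved as UTF-8 (and, on Ruby 1.8, run with -KU), and the variable names are made up for illustration.

```ruby
# Decode: UTF-8 string -> array of Unicode code points.
text = "Résumé"
codepoints = text.unpack("U*")
p codepoints

# Encode: array of code points -> UTF-8 string.
round_trip = codepoints.pack("U*")
puts round_trip

# The round trip should give back the original string.
puts round_trip == text
```

    This should print the same code point array as the first command, then the original string, then true.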

    Hope that helps.

    James Edward Gray II
    James Gray, Aug 28, 2008
    #3
  4. Stephen Boisvert

    On Thu, Aug 28, 2008 at 9:51 AM, Kless wrote:
    > How to decode a String type to Unicode?
    > And, to encode Unicode to String?
    >
    >


    This is a trickier question than you probably realize.
    You need to know which encoding you are converting your string
    from and to. Sometimes you will not know: MP3 ID3 tag info, for
    example, is supposed to be UTF-8 but often isn't. If you don't specify
    an encoding, the string may or may not end up being encoded based on
    your locale settings.
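
    When the encoding is unknown, one common heuristic is to check whether the bytes are valid UTF-8 and fall back to another encoding if they are not. The sketch below assumes Ruby 1.9 or later (force_encoding, valid_encoding? and encode do not exist in 1.8). The helper name to_utf8 and the Latin-1 fallback are assumptions for illustration, not something from this thread.

```ruby
# Hypothetical helper: treat the bytes as UTF-8 if they are valid UTF-8,
# otherwise assume they are in the fallback encoding and convert them.
def to_utf8(bytes, fallback = "ISO-8859-1")
  candidate = bytes.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?
  bytes.dup.force_encoding(fallback).encode("UTF-8")
end

codepoints = [82, 233, 115, 117, 109, 233]   # "Résumé"

utf8_bytes   = codepoints.pack("U*")   # already UTF-8
latin1_bytes = codepoints.pack("C*")   # one byte per character, as in Latin-1

p to_utf8(utf8_bytes).unpack("U*")
p to_utf8(latin1_bytes).unpack("U*")
```

    Both calls should print the same code points. Keep in mind this is only a guess: some non-UTF-8 data happens to be valid UTF-8 and will be misread.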

    Things I have seen mentioned:

    iconv - a Unix-based library for translating between different
    encodings. It requires you to tell it what you are decoding from and
    encoding to.
    http://wiki.rubyonrails.org/rails/pages/iconv

    unidecode - I have used this to translate Unicode strings to ASCII by
    simple character mapping. It works about 99% of the time, so you will
    need error handling for when it fails.

    Setting KCODE:

    $KCODE = 'UTF8'
    or
    $KCODE = 'u'
    or
    use -Ku at the command line. This tells Ruby that you want to be
    using UTF-8 encoding.

    require 'jcode'

    will give you access to some code originally developed for Japanese
    multibyte text. It addresses some of the problems the String class
    has with Unicode characters, for example by adding jlength. Always
    keep in mind that many of Ruby's string methods do not work properly
    with Unicode because they count bytes rather than characters.
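
    To illustrate the counting problem without jcode, here is a small sketch that compares byte and character counts using only unpack and a /u regex. It assumes the file is saved as UTF-8 (and, on Ruby 1.8, run with -Ku); the variable names are made up.

```ruby
text = "Résumé"

# Each "é" takes two bytes in UTF-8, so the byte count is larger.
byte_count = text.unpack("C*").length
char_count = text.unpack("U*").length
puts byte_count   # 8
puts char_count   # 6

# Split into characters with a UTF-8 aware regex, then reverse safely.
chars = text.scan(/./mu)
puts chars.reverse.join   # émuséR
```

    Working on the character array avoids splitting a multibyte character in half, which is what a byte-oriented reverse would do.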

    Good luck.

    Stephen Boisvert
    http://blog.ennuyer.net
    Stephen Boisvert, Aug 28, 2008
    #4
  5. Kless

    Kless Guest

    Thanks! That will help until Ruby 1.9 is more widely adopted.

    On Aug 28, 1:11 pm, James Gray wrote:
    > On Aug 28, 2008, at 3:51 AM, Kless wrote:
    >
    > > How to decode  a String type to Unicode?

    >
    > $ ruby -KU -e 'p "Résumé".unpack("U*")'
    > [82, 233, 115, 117, 109, 233]
    >
    > > And, to encode Unicode to String?

    >
    > $ ruby -KU -e 'p [82, 233, 115, 117, 109, 233].pack("U*")'
    > "Résumé"
    >
    > Hope that helps.
    >
    > James Edward Gray II
    Kless, Aug 28, 2008
    #5
