Encoding issue for special characters on Windows

Discussion in 'Ruby' started by Nicolas Gaiffe, Jan 9, 2009.

  1. Hi,

    I am facing an issue with special character handling inside a Ruby
    script running on Windows and am sure some of you could help me on
    this.

    This script copies files such as "<English_name>.txt" to
    "<Other_language_name>.txt". But once translated, the new filename may
    have special characters. 'ä' for instance.

    Running
    puts 'ä'
    in a Ruby script gives
    'õ'
    as an output, whereas the same code in irb gives
    'ä'

    There must be an encoding issue at some point in my script but I
    didn't manage to fix it (tried different values of '#encoding:'
    without success). Any clue ?

    Many thanks in advance
    Best regards

    Nicolas
     
    Nicolas Gaiffe, Jan 9, 2009
    #1

  2. Nicolas Gaiffe <> writes:

    > Hi,
    >
    > I am facing an issue with special characters handling inside a Ruby
    > script running on Windows and am sure some of you could help me on
    > this.
    >
    > This script copies files such as "<English_name>.txt" to
    > "<Other_language_name>.txt". But once translated, the new filename may
    > have special characters. 'ä' for instance.
    >
    > Running
    > puts 'ä'
    > in a Ruby script gives
    > 'õ'
    > as an output, whereas the same code in irb gives
    > 'ä'
    >
    > There must be an encoding issue at some point in my script but I
    > didn't manage to fix it (tried different values of '#encoding:'
    > without success). Any clue ?


    I use emacs. In emacs, you'd just put:

    #!/usr/bin/ruby
    # -*- coding:utf-8 -*-
    puts "ä"

    to have the script encoded in UTF-8 and therefore outputting a UTF-8 byte stream.
    Then of course, you need a UTF-8 terminal:



    [pjb@simias :0.0 tmp]$ chmod 755 test.rb
    [pjb@simias :0.0 tmp]$ export LC_CTYPE=en_US.UTF-8
    [pjb@simias :0.0 tmp]$ ./test.rb
    ä
    [pjb@simias :0.0 tmp]$ cat test.rb
    #!/usr/bin/ruby
    # -*- coding:utf-8 -*-
    puts "ä"
    [pjb@simias :0.0 tmp]$

    Notice that in irb, with a UTF-8 terminal, "ä".length == 2
    (Ruby 1.8 counts bytes, and ä takes two bytes in UTF-8).
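
The length remark can be checked with a minimal sketch, assuming Ruby 1.9 or later, where String#length counts characters and String#bytesize counts bytes. The variable names are only illustrative, and "\u00E4" is used instead of a literal ä so the result does not depend on how the file itself was saved.

```ruby
# "\u00E4" is the character ä; the \u escape always yields a UTF-8 string.
s = "\u00E4"

char_count = s.length      # number of characters (Ruby 1.9+)
byte_count = s.bytesize    # number of bytes in the UTF-8 encoding
utf8_bytes = s.bytes.to_a  # the individual byte values

puts "#{char_count} character, #{byte_count} bytes: #{utf8_bytes.inspect}"
```

On Ruby 1.8 there is no per-string encoding, so length reports the byte count instead.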


    Of course, you can choose to use iso-8859-1 or iso-8859-15 instead: just
    put that name in place of utf-8 in the coding line.
    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Jan 9, 2009
    #2

  3. F. Senault

    On January 9, 2009 at 10:10, Nicolas Gaiffe wrote:

    > There must be an encoding issue at some point in my script but I
    > didn't manage to fix it (tried different values of '#encoding:'
    > without success). Any clue ?


    It depends. If you are trying to echo something to the console, you'll
    have to use CP850.

    The character for ä is 228 in the ISO8859-1 [1] encoding that your file
    seems to use, and that corresponds to the õ character in CP850 [2].
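
The byte mapping described above can be reproduced with a short sketch, assuming Ruby 1.9 or later (per-string encodings) and that the interpreter knows the encoding name "CP850". It takes the single byte 228 and decodes it once as ISO-8859-1 and once as CP850.

```ruby
# One raw byte with value 228 (0xE4), as found in an ISO-8859-1 file.
byte = [228].pack("C")

# Same byte, two interpretations, both converted to UTF-8 for comparison.
as_latin1 = byte.dup.force_encoding("ISO-8859-1").encode("UTF-8")
as_cp850  = byte.dup.force_encoding("CP850").encode("UTF-8")

puts as_latin1  # the character the script author meant
puts as_cp850   # the character a CP850 console shows for that byte
```

Nothing is wrong with the byte itself; only the table used to display it differs.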

    Now, if you're only writing to the screen for control or debugging
    while manipulating files, don't convert the data that goes into your
    resulting file to CP850! You'd better stay in ISO-8859-1, or maybe
    even in UTF-8, depending on what your real goal is (website, internal
    application, database, etc).
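
One way to follow that advice is to convert only at the moment of printing. This is a sketch under assumptions: Ruby 1.9 or later, a console that really uses CP850, and a hypothetical helper named for_console that is not part of any library.

```ruby
# Assumption: the console code page is CP850; adjust if yours differs.
CONSOLE_ENCODING = "CP850"

# Hypothetical helper: returns a copy of the text encoded for the console,
# replacing characters that CP850 cannot represent with "?".
def for_console(text)
  text.encode(CONSOLE_ENCODING, invalid: :replace, undef: :replace, replace: "?")
end

filename = "\u00E4.txt"                  # stays in UTF-8 for file operations
console_copy = for_console(filename)     # used for display only
console_bytes = console_copy.bytes.to_a  # ä becomes the single byte 132 in CP850

puts console_bytes.inspect
```

The original string is left untouched, so whatever is written to disk keeps its own encoding.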

    Fred
    [1] : http://en.wikipedia.org/wiki/ISO/IEC_8859-1
    [2] : http://en.wikipedia.org/wiki/Code_page_850
    --
    I don't need no arms around me I don't need no drugs to calm me
    I have seen the writing on the wall Don't think I need anything at all
    No, don't think I'll need anything at all
    (Pink Floyd, Another Brick in The Wall part 3)
     
    F. Senault, Jan 10, 2009
    #3
  4. On Jan 10, 16:24, "F. Senault" <> wrote:
    > It depends.  If you are trying to echo something to the console, you'll
    > have to use CP850.
    >
    > The character for ä is 228 in the ISO8859-1 [1] encoding that your file
    > seems to use, and that corresponds to the õ character in CP850 [2].
    >
    > Now, if you're writing something on the screen as a means of control or
    > debug while manipulating files, don't convert your output to CP850 in
    > your resulting file !  You'd better stay in ISO


    Hi and sorry for the delay,

    You were right. The screen output was the only thing affected by the
    issue. The result in the filesystem was all right. So everything is
    working as expected, since I have no need to display the filenames once
    in production.

    Thanks to both of you
     
    Nicolas Gaiffe, Jan 13, 2009
    #4