Unicode in irb on windows (respectively script/console in instantrails)

Discussion in 'Ruby' started by michael.raidel@gmail.com, Nov 7, 2006.

  1. Guest

    Hi everyone!

    I have a problem with Unicode in irb on Windows. I noticed it when
    trying to save an attribute of an ActiveRecord model with an umlaut
    (for example "ü") in script/console. If the database connection is
    encoded in utf8, everything after the umlaut gets truncated; in the
    default encoding I get funny characters back. It doesn't matter whether
    $KCODE is set to UTF8 or NONE, the character count stays the same
    (also in plain irb)!
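
One plausible reading of the truncation, sketched in modern Ruby (1.9 or later, so not the `$KCODE`-era interpreter from the post): a console that sends "ü" as a single byte produces data that is not valid UTF-8, and a UTF-8 connection may cut the value at that byte. The variable names are illustrative, and the actual database behaviour is an assumption here.

```ruby
# "ü" written as an escape so the example does not depend on file encoding.
umlaut = "\u00FC"

utf8_bytes   = umlaut.bytes                        # => [195, 188]
latin1_bytes = umlaut.encode("ISO-8859-1").bytes   # => [252]

# Take Latin-1 bytes and label them as UTF-8 without converting them,
# which is roughly what happens when console input is passed on unchanged.
latin1_name = "M\u00FCller".encode("ISO-8859-1")
mislabelled = latin1_name.dup.force_encoding("UTF-8")

puts utf8_bytes.inspect
puts latin1_bytes.inspect
puts mislabelled.valid_encoding?   # => false
```

The lone byte 0xFC cannot start a valid UTF-8 sequence, which would explain why everything from the umlaut onward is rejected or dropped.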

    Does anyone have a hint on how to solve this? Of course I could try
    things such as Cygwin, but I am trying to find an elegant solution for
    Windows users, which could eventually be merged into the next
    InstantRails release, if Curt agrees.

    Thanks a lot,

    Michael
     
    Guest, Nov 7, 2006
    #1

  2. On 11/7/06, <> wrote:
    > I have a problem with Unicode in irb on Windows. I recognized it when
    > trying to save an attribute of an ActiveRecord-Model with an umlaut
    > (for example "ü") in script/console. If the database connection is
    > encoded in utf8, everything after the umlaut gets truncated, in the
    > default encoding I get funny characters back. It doesn't matter if the
    > $KCODE is set to UTF8 or NONE, the character number stays the same
    > (also on plain irb)!


    The Windows console -- also used by Cygwin -- doesn't recognise UTF-8.
    (That is, it's not possible to properly display UTF-8 in cmd.exe, at
    least so far as I can tell.)

    -austin
     
    Austin Ziegler, Nov 7, 2006
    #2

  3. Re: Unicode in irb on windows (respectively script/console in instantrails)

    A DOS console displays characters according to the OEM code page. Here is
    an example showing how to properly display a string with 8-bit chars
    (e.g. characters with diacritics, or accent marks)...

    # file: oemCodePage.rb

    require 'chilkat'

    # (The CkString class is freeware)
    myStr = Chilkat::CkString.new()

    # A DOS console does NOT display this correctly:
    print "é ô à ç\n"

    # What we need is the OEM (DOS) code page...
    # OEM code pages are listed here:
    # http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp
    myStr.appendAnsi("é ô à ç\n")

    # Emit the string in the character encoding of your choice:
    # ibm850 is the OEM code page for Latin1
    print myStr.getEnc("ibm850")

    # Chilkat supports these:
    # us-ascii
    # unicode
    # unicodefffe
    # iso-8859-1
    # iso-8859-2
    # iso-8859-3
    # iso-8859-4
    # iso-8859-5
    # iso-8859-6
    # iso-8859-7
    # iso-8859-8
    # iso-8859-9
    # iso-8859-13
    # iso-8859-15
    # windows-874
    # windows-1250
    # windows-1251
    # windows-1252
    # windows-1253
    # windows-1254
    # windows-1255
    # windows-1256
    # windows-1257
    # windows-1258
    # utf-7
    # utf-8
    # utf-32
    # utf-32be
    # shift_jis
    # gb2312
    # ks_c_5601-1987
    # big5
    # iso-2022-jp
    # iso-2022-kr
    # euc-jp
    # euc-kr
    # macintosh
    # x-mac-japanese
    # x-mac-chinesetrad
    # x-mac-korean
    # x-mac-arabic
    # x-mac-hebrew
    # x-mac-greek
    # x-mac-cyrillic
    # x-mac-chinesesimp
    # x-mac-romanian
    # x-mac-ukrainian
    # x-mac-thai
    # x-mac-ce
    # x-mac-icelandic
    # x-mac-turkish
    # x-mac-croatian
    # asmo-708
    # dos-720
    # dos-862
    # ibm037
    # ibm437
    # ibm500
    # ibm737
    # ibm775
    # ibm850
    # ibm852
    # ibm855
    # ibm857
    # ibm00858
    # ibm860
    # ibm861
    # ibm863
    # ibm864
    # ibm865
    # cp866
    # ibm869
    # ibm870
    # cp875
    # koi8-r
    # koi8-u



     
    Chilkat Software, Nov 7, 2006
    #3
  4. On 11/7/06, Austin Ziegler <> wrote:
    > The windows console -- also used by cygwin -- doesn't recognise UTF-8.
    > (That is, it's not possible to properly display UTF-8 in cmd.exe, at
    > least so far as I can tell.)


    Ack, my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8)
    with:

    chcp 65001

    There are some caveats, of course:

    http://blogs.msdn.com/michkap/archive/2006/03/06/544251.aspx

    -austin
     
    Austin Ziegler, Nov 8, 2006
    #4
  5. Austin Ziegler wrote:
    > Ack my bad. I had forgotten: you can specify the UTF-8 codepage
    > (CP_UTF8) with:
    >
    > chcp 65001
    >
    > There are some caveats, of course:
    >
    > http://blogs.msdn.com/michkap/archive/2006/03/06/544251.aspx


    Also the good old combo of "mode con codepage select=65001".

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp
    lists pretty much all the numbers you can use. (The pain of navigating
    to that on the MSDN website.)

    Amusingly enough, none of those are even present anymore on WinXP Pro
    x64. For yet more hilarity, the console is by default set to the DOS OEM
    codepage of the given locale, instead of the newer ANSI ones that are
    ISO extensions, which causes great fun when trying to use software
    that's ever so smart and autodetects my locale as my preferred language
    (Postgres, assorted GNU stuff being too clever by half) instead of using
    the OS language version.
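
The OEM versus ANSI mismatch can be sketched in modern Ruby (1.9 or later). This assumes code page 850 as the OEM page and Windows-1252 as the ANSI page, which is typical for Western European locales but not universal.

```ruby
umlaut = "\u00FC"   # "ü"

oem_bytes  = umlaut.encode("CP850").bytes          # => [129]
ansi_bytes = umlaut.encode("Windows-1252").bytes   # => [252]

# A program writes the ANSI byte, but the console interprets it as OEM text.
misread = [0xFC].pack("C").force_encoding("CP850").encode("UTF-8")

puts oem_bytes.inspect
puts ansi_bytes.inspect
puts misread   # prints the superscript three, U+00B3
```

The same character has a different byte value in each code page, so output written for one and displayed in the other turns into unrelated symbols.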

    And "there are some caveats" is an understatement: the UTF-8 support in
    the console is a sham. I couldn't get a trivial C program, using
    arbitrary combinations of tchar.h, wchar.h, -DUNICODE, cmd.exe, the
    Windows console, a Cygwin and an MSYS rxvt, to do something as daunting
    as input random characters that aren't shared between the Latin1 and
    Latin2 codepages, store them as multibyte internally, and then write
    them out to a text file and to the console successfully without one
    step breaking. The fact that the whole of CMD broke down in tears from
    changing that setting is also worth noting. IIRC, it had problems doing
    output redirection to a file and whatnot (I can't play around with this
    without setting up a virtual machine with a 32-bit XP). Basically, the
    Path Less Annoying is to only use the console for working in your
    "native" codepage, and use a non-console tool for everything else.

    end # of rant

    David Vallner


     
    David Vallner, Nov 8, 2006
    #5
  6. Guest

    > Ack my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:
    >
    > chcp 65001


    Thank you Austin for the nice hint!

    The problem is that as soon as I switch the codepage, irb (and also
    script/console) stops working: it doesn't even start anymore, it just
    quits immediately without an error message.

    Michael
     
    Guest, Nov 8, 2006
    #6
  7. On 11/8/06, <> wrote:
    > Thank you Austin for the nice hint!
    >
    > The problem is, that as soon as I switch the codepage, irb (and also
    > script/console) stops working (it doesn't even start anymore, it just
    > quits immediately without an error-message).


    That's one of the caveats mentioned: batch files no longer work.
    I don't know why. However, if you have Ruby installed in C:\Ruby, you can do:

    copy C:\Ruby\bin\irb C:\Ruby\bin\irb.rb
    irb.rb

    Or:

    ruby C:\Ruby\bin\irb

    And you'll get a working irb.
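
The second workaround can also be expressed from Ruby itself, without hard-coding the install path. This is only a sketch: `irb_command` is a hypothetical helper, and it assumes the `irb` script sits in the interpreter's `bindir`, as it does in the C:\Ruby layout described above.

```ruby
require "rbconfig"

# Hypothetical helper: build a command that runs the irb script through the
# Ruby interpreter directly, so that no batch-file wrapper is involved.
def irb_command(bindir = RbConfig::CONFIG["bindir"])
  [RbConfig.ruby, File.join(bindir, "irb")]
end

puts irb_command.join(" ")

# To actually start the session from a script:
#   system(*irb_command)
```

`RbConfig.ruby` returns the path of the running interpreter, so the command starts the same Ruby that built it.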

    -austin
     
    Austin Ziegler, Nov 9, 2006
    #7