Encoding issue for special characters on Windows

Nicolas Gaiffe

Hi,

I am having trouble handling special characters in a Ruby script
running on Windows, and I am sure some of you can help me with this.

This script copies files such as "<English_name>.txt" to
"<Other_language_name>.txt". Once translated, the new filename may
contain special characters, such as 'ä'.

Running
puts 'ä'
in a Ruby script gives
'õ'
as output, whereas the same code in irb gives
'ä'

There must be an encoding issue at some point in my script but I
didn't manage to fix it (tried different values of '#encoding:'
without success). Any clue ?
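A first step in narrowing this down is to print the encodings involved, so you can see whether the bytes of the literal and the encoding the console expects agree. A minimal sketch, assuming Ruby 1.9 or later (Ruby 1.8 has no `Encoding` class):

```ruby
# Print the encodings that affect how a literal like 'ä' reaches the console.
# Assumes Ruby 1.9+; Encoding and String#encoding do not exist in Ruby 1.8.
s = 'ä'

puts "source encoding:  #{__ENCODING__}"                      # set by the magic comment (UTF-8 by default in Ruby 2.0+)
puts "literal encoding: #{s.encoding}"                        # literals inherit the source encoding
puts "literal bytes:    #{s.bytes.inspect}"                   # the raw bytes actually stored
puts "default_external: #{Encoding.default_external}"         # what Ruby assumes external text uses
puts "STDOUT encoding:  #{STDOUT.external_encoding.inspect}"  # nil means Ruby writes the bytes unchanged
```

If the literal's bytes are in one encoding and the console interprets them in another, you get exactly the kind of substitution described above.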

Many thanks in advance
Best regards

Nicolas
 
Pascal J. Bourguignon

Nicolas Gaiffe said:
There must be an encoding issue at some point in my script but I
didn't manage to fix it (tried different values of '#encoding:'
without success). Any clue ?

I use Emacs. In Emacs, you would just put:

#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"

to have the script encoded in UTF-8 and therefore output a UTF-8 byte stream.
Then, of course, you need a UTF-8 terminal:

[pjb@simias :0.0 tmp]$ chmod 755 test.rb
[pjb@simias :0.0 tmp]$ export LC_CTYPE=en_US.UTF-8
[pjb@simias :0.0 tmp]$ ./test.rb
ä
[pjb@simias :0.0 tmp]$ cat test.rb
#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"
[pjb@simias :0.0 tmp]$

Notice that in irb, with a UTF-8 terminal, "ä".length == 2.
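The `length == 2` result reflects Ruby 1.8, where strings are plain byte sequences. In Ruby 1.9 and later a string carries its encoding, so `length` counts characters while `bytesize` counts bytes. A small sketch, assuming Ruby 1.9+ and a source file saved as UTF-8:

```ruby
s = "ä"          # one character, stored as two bytes in UTF-8

puts s.length    # prints 1 on Ruby 1.9+ (characters)
puts s.bytesize  # prints 2 (bytes), the value Ruby 1.8's length reported
```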


Of course, you can choose ISO-8859-1 or ISO-8859-15 instead: just put it in place of utf-8 in the coding comment (and save the file in that encoding).
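With either ISO encoding, ä becomes a single byte instead of two. A sketch using Ruby 1.9+ transcoding to compare the two ISO variants; the euro sign is only an illustration of where they differ:

```ruby
a_umlaut = "ä"   # UTF-8 literal: bytes [195, 164]

# ISO-8859-1 and ISO-8859-15 both encode ä as the single byte 228 (0xE4).
p a_umlaut.encode("ISO-8859-1").bytes   # prints [228]
p a_umlaut.encode("ISO-8859-15").bytes  # prints [228]

# They differ elsewhere: ISO-8859-15 puts the euro sign at 0xA4 (164),
# while ISO-8859-1 has no euro sign at all.
euro = "€"
p euro.encode("ISO-8859-15").bytes      # prints [164]
begin
  euro.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "ISO-8859-1 cannot encode the euro sign (#{e.class})"
end
```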
 
F. Senault

On 9 January 2009 at 10:10, Nicolas Gaiffe wrote:
There must be an encoding issue at some point in my script but I
didn't manage to fix it (tried different values of '#encoding:'
without success). Any clue ?

It depends. If you are trying to echo something to the console, you'll
have to use CP850.

In the ISO-8859-1 [1] encoding that your file seems to use, ä is byte
228 (0xE4), and that byte is the õ character in CP850 [2].
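The mix-up can be reproduced in Ruby 1.9+ by taking the ISO-8859-1 byte for ä and reading it back as CP850, which is roughly what the console does with the script's output. A sketch, assuming your Ruby ships the CP850 (IBM850) encoding and its transcoder:

```ruby
# The byte ISO-8859-1 uses for ä...
byte = "ä".encode("ISO-8859-1")
p byte.bytes                                     # prints [228]

# ...reinterpreted as CP850 (the console code page), then converted
# to UTF-8 so it can be displayed and compared here.
as_console = byte.dup.force_encoding("CP850").encode("UTF-8")
puts as_console                                  # prints õ
```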

Now, if you're writing something to the screen as a means of control or
debugging while manipulating files, don't convert what goes into your
resulting file to CP850! You'd better stay with ISO-8859-1, or maybe even
UTF-8, depending on what your real goal is (website, internal application,
database, etc.).
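Following that advice, the conversion can be confined to what is printed while the file name itself stays UTF-8. A sketch assuming Ruby 1.9+ and a console that really uses CP850; `console_safe` and `copy_translated` are hypothetical helpers, and the file names are made up for illustration:

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: convert text for display on a CP850 console only.
# Characters CP850 cannot represent are replaced with "?" instead of raising.
def console_safe(str, console_encoding = "CP850")
  str.encode(console_encoding, invalid: :replace, undef: :replace, replace: "?")
end

# Hypothetical helper: copy a file to its translated name. The name stays
# UTF-8 on disk; only the progress message is converted for the console.
def copy_translated(source, target)
  FileUtils.cp(source, target)
  puts console_safe("copied to #{File.basename(target)}")
  target
end

# Usage with made-up file names in a throwaway directory.
Dir.mktmpdir do |dir|
  source = File.join(dir, "hello.txt")
  File.write(source, "content\n")
  copy_translated(source, File.join(dir, "hällo.txt"))
end
```

On a terminal that is not CP850 the converted message will look wrong, which is expected: only the display is adapted, never the data.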

Fred
[1] : http://en.wikipedia.org/wiki/ISO/IEC_8859-1
[2] : http://en.wikipedia.org/wiki/Code_page_850
 
Nicolas Gaiffe

Hi, and sorry for the delay.

You were right. Only the screen output was affected by the issue; the
files written to the file system were all right. So everything works
as expected, since I don't need to display the filenames in production.

Thanks to both of you
 
