Unicode strings in ruby code

Idan Miller

Hi everyone,

I'm trying to get ruby to run a script that has a hebrew string.
I'm using a string with just one letter - aleph, the first in the
hebrew alphabet.

This code:

a = "א"

which I wrote in SciTE and saved as UTF-8, gives these errors:
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\357' in expression
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\273' in expression
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\277' in expression

Now, if I save this as ANSI in SciTE it allows it, and then it runs.
However, if I use eclipse ruby plugin, it won't allow me to save aleph
as ANSI (rightfully).

Also, if I read aleph from a file everything works fine when the ruby
script is UTF-8.

What is wrong here?
Can ruby just not get strings in the code that aren't ANSI?

Thanks,
Idan Miller.
 
7stud --


You can always do this:

a = "\xD7\x90"
puts a


If what you are trying to do is avoid having to type UTF-8 characters in
hexadecimal notation, then try putting this at the top of your file:

$KCODE = "UTF-8"

and see if that allows you to enter the actual character with your
editor.
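
For reference, the two escapes above are simply the UTF-8 bytes of aleph (U+05D0). A minimal sketch, using only the core Array#pack and String#unpack methods, that shows the correspondence (the variable names are illustrative only):

```ruby
# Aleph is code point U+05D0; pack("U") encodes it as UTF-8.
aleph = [0x05D0].pack("U")

# The UTF-8 encoding of aleph is two bytes: 0xD7 0x90.
bytes = aleph.unpack("C*")

# Show the bytes in the same escape notation used in the string literal.
puts bytes.map { |b| "\\x%02X" % b }.join   # prints \xD7\x90
```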
 
Idan Miller

Hi,

I am trying to avoid the hexadecimal notation since it is obviously
hard to write and not readable...
The KCODE statement doesn't help.

Idan.
 
Arlen Cuss

Hm. It works for me:

celtic@sohma:~$ file hebrew.rb
hebrew.rb: UTF-8 Unicode text
celtic@sohma:~$ cat hebrew.rb
a = "א"
puts a

celtic@sohma:~$ ruby hebrew.rb
א
celtic@sohma:~$ ruby -v
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
celtic@sohma:~$

So I can't recreate your problem - I wonder if it's the Windows
distribution.

Arlen
 
Idan Miller

*feels stupid*
I see you're running 1.8.6 as well
How can I move this issue forward?

It must be a windows issue...
Which ruby do you run?
I'm running 1.8.6

Arlen said:
Hm. It works for me:
celtic@sohma:~$ file hebrew.rb
hebrew.rb: UTF-8 Unicode text
celtic@sohma:~$ cat hebrew.rb
a = "א"
puts a
celtic@sohma:~$ ruby hebrew.rb
א
celtic@sohma:~$ ruby -v
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
celtic@sohma:~$
So I can't recreate your problem - I wonder if it's the Windows
distribution.
Arlen
 
Alex Fenton

Idan said:
This code:

a = "א"

that I write using SciTE and saved as UTF-8 gives these errors:
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\357'
in expression
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\273'
in expression
C:/Documents and Settings/idan/Desktop/test.rb:1: Invalid char `\277'
in expression

Now, if I save this as ANSI in SciTE it allows it, and then it runs.

The problem is partly that you're using SciTE. It saves a "Byte
Order Mark" (BOM) at the beginning of UTF-8 files, and Ruby fails to run
scripts that start with a BOM. The three invalid chars in your error
messages (\357, \273, \277 in octal) are the BOM bytes EF BB BF.

The solution is to find a setting that saves UTF-8 without a BOM. I don't know
how it's done in SciTE, but any decent code editor should be able to do
this. If not, find another.

As others have said, you should put $KCODE='u' in your script or run ruby
with -Ku if your script is encoded in UTF-8.

alex
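
If the editor cannot be configured, the BOM can also be removed after the fact. The following is only a sketch: strip_utf8_bom is a hypothetical helper, not part of Ruby or SciTE, and it assumes Ruby 1.9 or later (it relies on String#force_encoding, which the 1.8.6 in this thread does not have).

```ruby
# Hypothetical helper (not a Ruby built-in): drop a leading UTF-8 BOM.
# Assumes Ruby 1.9 or later because it uses String#force_encoding.
UTF8_BOM = "\xEF\xBB\xBF".dup.force_encoding("BINARY")

def strip_utf8_bom(data)
  # Work on raw bytes so the comparison does not depend on the string's encoding.
  bytes = data.dup.force_encoding("BINARY")
  bytes.start_with?(UTF8_BOM) ? bytes[3..-1] : bytes
end

# Octal \357 \273 \277 from the error messages are the same three BOM bytes.
source  = "\357\273\277a = 1\n"
cleaned = strip_utf8_bom(source)
puts cleaned
```

To apply this to a script on disk, one option would be to read it in binary mode, pass the contents through the helper, and write the result back in binary mode.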
 
Tadashi Saito

Hi,

Idan said:
I am trying to avoid the hexadecimal notation since it is obviously
hard to write and not readable...
The KCODE statement doesn't help.

The -Ku option would.
 
Idan Miller

When I run with -KU, Ruby complains that a certain method is missing, and
its name looks Japanese:

Desktop/test.rb:1: undefined local variable or method `∩╗┐' for
main:Object (NameError)
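
The odd-looking name is probably not Japanese. Assuming the Windows console is using code page 437 (an assumption; the thread does not say), the three UTF-8 BOM bytes EF BB BF display as exactly these characters. A small sketch, assuming Ruby 1.9 or later for the encoding methods:

```ruby
# The UTF-8 BOM bytes, reinterpreted as code page 437
# (assumed here to be the console's encoding).
bom = "\xEF\xBB\xBF".dup.force_encoding("IBM437")

# Transcode to UTF-8 to see which characters the console would show.
shown = bom.encode("UTF-8")
puts shown
```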
 
Justin Collins

Idan said:
When I run with -KU I'm expected to have a certian method missing in
Japanese:

Desktop/test.rb:1: undefined local variable or method `∩╗┐' for
main:Object (NameError)

I think you might still be having issues with how your editor is saving
the file. Try creating a fresh test file with something like Notepad and
see if you have the same problems.

-Justin
 
Wolfgang Nádasi-Donner

Justin said:
I think you might still be having issues with how your editor is saving
the file. Try creating a fresh test file with something like Notepad and
see if you have the same problems.

If you start the utf-8 encoded file with BOM with the following line...

=nil

..., Ruby 1.8 will have no problems.

Wolfgang Nádasi-Donner
 
