Can't require UTF-8 files.

O01eg Oleg

When I require a file with UTF-8 encoding, I get an error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting keyword_end

When I simply assign a Unicode string to a variable, I don't get any error.
In the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can't find out
how.
P.S. I tried to use $KCODE, but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I tried requiring 'jcode', as recommended on the Internet, but it doesn't exist. I also tried
adding a u prefix to the string, but that causes an error even in evaluation:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object
 
Caleb Clausen

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8
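A minimal sketch of that fix, assuming a hypothetical stand-in for game.rb that is written to a temporary directory and then required. Both files carry the magic comment on line 1: the required file because it contains a Cyrillic literal, and the sketch itself for the same reason. On Ruby 2.0 and later the default source encoding is already UTF-8, so there the comment is redundant but harmless.

```ruby
# encoding: utf-8
# The magic comment must be line 1 (or line 2 after a "#!" shebang line).
# This file needs it too, because it contains a UTF-8 literal below.
require "tmpdir"

Dir.mktmpdir do |dir|
  # Hypothetical stand-in for game.rb from the question.
  path = File.join(dir, "game.rb")
  File.open(path, "w:UTF-8") do |f|
    f.puts "# encoding: utf-8"        # magic comment on line 1 of game.rb
    f.puts 'GREETING = "привет"'      # non-ASCII literal on line 2
  end
  # With the magic comment, the file is parsed as UTF-8 instead of US-ASCII.
  require path
end

puts GREETING            # => привет
puts GREETING.encoding   # => UTF-8
```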
 
O01eg Oleg

Caleb said:
Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8
Thanks, it works.
 
Fernando Perez

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.
 
Gary Wright

Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

If the encoding declaration isn't in the file itself, then where exactly
would you store it? If it isn't in the file, then it has to be in some
OS- or filesystem-specific metadata store or in yet another file, all
of which increases the likelihood that the file and its metadata will
get out of sync or won't stay together when the file is copied or
transferred somewhere else.

Placing the encoding information in the file itself seems like the most
practical solution. The encoding declaration could of course be
incorrect, but that is always a possibility no matter where you store
the info.

Gary Wright
 
Clifford Heath

This is not a good solution for library code.

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.
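Detecting a BOM that is present is the easy half: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A minimal sketch with a hypothetical helper `utf8_bom?` (not part of Ruby's API), which reads in binary mode so nothing is transcoded before the comparison. A false result only means no BOM is present; it says nothing about whether the rest of the file is UTF-8.

```ruby
require "tempfile"

# U+FEFF encoded as UTF-8, built from raw bytes so the string is binary.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack("C*")

# Hypothetical helper: does the file at +path+ start with a UTF-8 BOM?
# read returns nil for an empty file, and nil == UTF8_BOM is simply false.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(UTF8_BOM.bytesize) == UTF8_BOM }
end

# Usage: one scratch file with a BOM and one without.
with_bom = Tempfile.new("with_bom")
with_bom.binmode
with_bom.write(UTF8_BOM + "puts 1\n")
with_bom.close

without_bom = Tempfile.new("without_bom")
without_bom.write("puts 1\n")
without_bom.close

p utf8_bom?(with_bom.path)     # => true
p utf8_bom?(without_bom.path)  # => false
```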
 
Phillip Gawlowski

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?

The use of a byte order mark is optional. Bit hard to detect what
isn't there, is it?

Here's a (short) discussion on auto-detecting Unicode:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx

 
Iñaki Baz Castillo

2011/2/14 Fernando Perez said:
Is there a way to avoid adding this magic encoding line in each file?

That's really metadata and does not belong in the source code.

If it's metadata, why are you using "require 'file'" instead of
"File.read('file.rb')"?

 
Jeremy Bopp

Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

Using a BOM would break shebang processing. It's not a problem for
Windows users of Ruby since the shebang line is ignored there, but it
would break things for all Unix-like platforms (including Cygwin) where
a script can be run directly as a program:

http://en.wikipedia.org/wiki/Shebang_(Unix)#As_magic_number
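The failure mode can be shown without involving the kernel at all: the shebang "magic number" is the two bytes 0x23 0x21 ("#!") at offset 0, and a BOM pushes them to offset 3. This sketch, with a hypothetical helper `starts_with_shebang?`, models only that first two-byte check, not the rest of how a Unix kernel runs scripts.

```ruby
# Hypothetical helper: true only when the first two bytes are "#!"
# (0x23 0x21), the magic number a Unix kernel looks for.
def starts_with_shebang?(bytes)
  bytes.unpack("C2") == [0x23, 0x21]
end

plain    = "#!/usr/bin/env ruby\nputs 1\n"
# The same script with a UTF-8 BOM (EF BB BF) prepended.
with_bom = [0xEF, 0xBB, 0xBF].pack("C*") + plain

p starts_with_shebang?(plain)      # => true
p starts_with_shebang?(with_bom)   # => false, the first bytes are EF BB
```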

My personal preference would be for a single multi-byte encoding to be
selected for all Ruby files. This would make it easier to configure an
editor or source visualizer to handle a file appropriately without the
need to replicate Ruby's encoding detection. One downside, though, is
that existing scripts in other encodings could break under this
hypothetical Ruby.

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy
 
Florian Gilcher

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

I usually recommend not using UTF-8 in source at all and
pushing all UTF-8 strings into localization files (either using a
heavyweight solution like i18n or just a plain YAML file, if you don't
want a dependency).
This also circumvents the problem of headers and is good practice.
For scripts of smaller scope, I usually skip that rule ;).[2]
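Keeping user-facing text in a locale file lets the Ruby source itself stay pure ASCII. A minimal sketch using only the standard yaml library (not the i18n gem); the file name ru.yml, the greeting key and the MESSAGES constant are hypothetical, and the Cyrillic text is written with YAML \u escapes purely so that the sketch contains no non-ASCII bytes. A real locale file would simply contain the UTF-8 text.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical locale file contents. %q keeps the backslashes literal, so
# YAML (not Ruby) decodes the \u escapes into Cyrillic characters.
RU_YAML = %q(greeting: "\u043f\u0440\u0438\u0432\u0435\u0442") + "\n"

# Write the locale file to a scratch directory and load it back.
# Dir.mktmpdir removes the directory afterwards and returns the block's value.
MESSAGES = Dir.mktmpdir do |dir|
  path = File.join(dir, "ru.yml")
  File.open(path, "w:UTF-8") { |f| f.write(RU_YAML) }
  YAML.load_file(path)
end

puts MESSAGES["greeting"]            # the Cyrillic greeting
puts MESSAGES["greeting"].encoding   # => UTF-8
```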

Ruby still assumes source code to be US-ASCII by default, which I think
is a good choice for compatibility reasons.[1]

Regards,
Florian

[1] Which is also the assumption that Ruby 1.8 had, but not as explicit.
[2] A neat trick is the following:

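require "yaml"
# DATA is a File object opened on the script itself and positioned just
# after the __END__ line; that trailing text is never parsed as Ruby code.
puts YAML.load(DATA).inspect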

__END__
 
Fernando Perez

I usually recommend not using UTF-8 in source at all and
push all UTF-8 strings into localization files (Either using a
heavyweight solution like i18n or just a plain YAML file, if you don't
want a dependency).

This makes the views (in RoR) unreadable, and we also lose the text
editor's HTML autocompletion inside the YAML file.
 
Florian Gilcher

This makes the views (in RoR) unreadable, also we somehow lose
autocompletion by the text-editor of html in the yaml file.

I think at least the "unreadable" part is debatable. Autocompletion might
be handy, but the features of your editor should not factor into the
organization of your code.

Also, ERB templates are loaded with #read, which takes the external-encoding
setting into account, and are then evaluated using #eval, which does take
the encoding of the string into account. Other templating libraries like
Haml have a setting for the default template encoding. So templates are not
really the problem, as you can already use UTF-8 pretty freely without
marking it.
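The ERB behaviour described above can be sketched in a few lines: the template is read with an explicit external encoding and rendered with ERB#result, and no magic comment appears in the template. The file name greeting.html.erb, the TEMPLATE text and the name variable are hypothetical, and the \u escapes only keep this sketch ASCII-only.

```ruby
require "erb"
require "tmpdir"

# Hypothetical template; "\u" escapes make this a UTF-8 string in Ruby.
TEMPLATE = "<p>\u043f\u0440\u0438\u0432\u0435\u0442, <%= name %>!</p>"

html = Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting.html.erb")
  File.open(path, "w:UTF-8") { |f| f.write(TEMPLATE) }

  # Reading with an explicit external encoding tags the string as UTF-8...
  source = File.read(path, encoding: "UTF-8")
  name = "Oleg"
  # ...and ERB compiles and evaluates it in that encoding.
  ERB.new(source).result(binding)
end

puts html            # the rendered paragraph with the Cyrillic greeting
puts html.encoding   # => UTF-8
```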

Regards,
Florian
 
