Can't require UTF-8 files.

O01eg Oleg · Apr 30, 2010

When I require a file with UTF-8 encoding, I get an error:

irb(main):001:0> require '/tmp/share/mudserver/game.rb'
SyntaxError: /tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: invalid multibyte char (US-ASCII)
/tmp/share/mudserver/game.rb:2: syntax error, unexpected $end, expecting keyword_end

When I simply assign a Unicode string to a variable, I don't get any error.
In the C API I have the same problem with rb_require and rb_eval_string.
I think I have to set the encoding for required files, but I can't find
out how.
P.S. I tried using $KCODE, but it no longer works:

irb(main):006:0> $KCODE = 'u'
(irb):6: warning: variable $KCODE is no longer effective; ignored

I also tried requiring 'jcode', as recommended on the Internet, but it doesn't
exist. Adding a u prefix to the string causes an error even on evaluation:

irb(main):005:0> intro = u"привет"
NoMethodError: undefined method `u' for main:Object

Caleb Clausen · Apr 30, 2010

Are you using Ruby 1.9? If so, you need to add a magic encoding
comment as the first line (or the second, if the first is a shebang line)
of your source file, like this:

# encoding: utf-8
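
A minimal sketch of the magic comment in action. The file name game.rb and the constants INTRO and INTRO_SOURCE_ENCODING are made up for illustration. The script writes a small UTF-8 source file that starts with the comment, requires it, and prints the encoding the parser used. Note that later Ruby versions (2.0 and up) already default to UTF-8 source, so the comment only changes the outcome on 1.9.

```ruby
require "tmpdir"

# Hypothetical file contents: the magic comment must be the first line
# (or the second, after a shebang line).
source = <<~RUBY
  # encoding: utf-8
  INTRO = "привет"
  INTRO_SOURCE_ENCODING = __ENCODING__
RUBY

Dir.mktmpdir do |dir|
  path = File.join(dir, "game.rb")
  File.write(path, source)
  require path   # parsed as UTF-8 because of the magic comment
end

puts INTRO                  # the string loaded from the required file
puts INTRO_SOURCE_ENCODING  # the source encoding the parser used
```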

O01eg Oleg · Apr 30, 2010

Caleb said:
Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Thanks, it works.

Fernando Perez · Feb 14, 2011

Are you using ruby 1.9? If so, then you need to add a magic encoding
line as the first line (or second if the first is a shebang line) of
your source file, like this:
# encoding: utf-8

Is there a way to avoid adding this magic encoding line in each file?
That's really metadata and does not belong in the source code.

Josh Cheek · Feb 14, 2011

Is there a way to avoid adding this magic encoding line in each file?

That's really a metadata and does not belong to the source code.

Run with the -Ku flag.

https://gist.github.com/825626
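
One way to see what the flag does, as a rough sketch: start a child interpreter with -Ku and ask it which source encoding it assumes. This assumes the Ruby you are running still accepts the -K option. RbConfig.ruby is used only to locate the current interpreter, and the variable name reported is made up for the example.

```ruby
require "rbconfig"

# Run a child interpreter with -Ku and have it print the source encoding
# it assumes for code passed with -e.
reported = IO.popen([RbConfig.ruby, "-Ku", "-e", "print __ENCODING__"], &:read)

puts reported
```

On the command line the equivalent would be something like `ruby -Ku script.rb`, which applies to every file loaded in that run.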

Gary Wright · Feb 14, 2011

Is there a way to avoid adding this magic encoding line in each file?

That's really a metadata and does not belong to the source code.

If the encoding declaration isn't in the file itself then where exactly
would you store it? If it isn't in the file then it has to be in some
OS or filesystem specific meta-data store or in yet another file. All
of which increases the likelihood that the file and its meta-data will
get out of synch or won't stay together when the file is copied or
transferred somewhere else.

Placing the encoding information in the file itself seems like the most
practical solution. The encoding declaration could of course be
incorrect, but that is always a possibility no matter where you store
the info.

Gary Wright

Eric Hodel · Feb 14, 2011

Run with -Ku flag.

This is not a good solution for library code.

Clifford Heath · Feb 14, 2011

This is not a good solution for library code.

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

Phillip Gawlowski · Feb 15, 2011

Right. Is there a good reason why Ruby can't just detect a UTF-8 BOM?

The use of a byte order mark is optional. Bit hard to detect what
isn't there, is it?

Here's a (short) discussion on auto-detecting Unicode:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
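
For reference, checking for a BOM only takes a look at the first three bytes of the file. A small sketch of that check follows; the helper name utf8_bom? and the file names are hypothetical. It can only tell you when a BOM is present. A file without one may still be UTF-8.

```ruby
require "tmpdir"

UTF8_BOM = "\xEF\xBB\xBF".b  # the three bytes a UTF-8 BOM occupies on disk

# Returns true when the file at +path+ begins with a UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

results = Dir.mktmpdir do |dir|
  with_bom    = File.join(dir, "with_bom.rb")
  without_bom = File.join(dir, "without_bom.rb")
  File.binwrite(with_bom, UTF8_BOM + "puts 1\n")
  File.binwrite(without_bom, "puts 1\n")

  [utf8_bom?(with_bom), utf8_bom?(without_bom)]
end

p results
```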

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

Iñaki Baz Castillo · Feb 15, 2011

2011/2/14 Fernando Perez said:
Is there a way to avoid adding this magic encoding line in each file?

That's really a metadata and does not belong to the source code.

If it's metadata, why are you using "require 'file'" instead of
"File.read('file.rb')"?

--
Iñaki Baz Castillo

Jeremy Bopp · Feb 15, 2011

Is there a good reason why Ruby can't just detect a UTF-8 BOM?
It's still "metadata" but a lot of tools deal with it.

Using a BOM would break shebang processing. It's not a problem for
Windows users of Ruby since the shebang line is ignored there, but it
would break things for all Unix-like platforms (including Cygwin) where
a script can be run directly as a program:

http://en.wikipedia.org/wiki/Shebang_(Unix)#As_magic_number
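
To make the shebang point concrete, here is a small sketch (the variable names are made up) comparing the leading bytes of a script with and without a BOM. A loader that looks for "#!" at the very start of the file finds the BOM bytes there instead.

```ruby
script   = "#!/usr/bin/env ruby\nputs 'hello'\n"
with_bom = "\uFEFF" + script   # the same script saved with a leading UTF-8 BOM

# Compare at the byte level, since that is what a program loader looks at.
plain_starts_with_shebang = script.b.start_with?("#!")
bom_starts_with_shebang   = with_bom.b.start_with?("#!")
first_bytes               = with_bom.bytes.first(3)

puts plain_starts_with_shebang
puts bom_starts_with_shebang
p first_bytes
```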

My personal preference would be for a single multi-byte encoding to be
selected for all Ruby files. This would make it easier to configure an
editor or source visualizer to handle a file appropriately without the
need to replicate Ruby's encoding detection. One downside though is
that existing scripts encoded differently may be broken for this
hypothetical Ruby's consumption.

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

Florian Gilcher · Feb 15, 2011

Using the magic comment to mark the encoding is probably the least
disruptive solution overall.

-Jeremy

I usually recommend not using UTF-8 in source at all and pushing
all UTF-8 strings into localization files (either a heavyweight
solution like i18n or just a plain YAML file, if you don't want a
dependency). This also circumvents the problem of headers and is good
practice. For scripts of smaller scope, I usually skip that rule.[2]

Ruby still assumes source code to be US-ASCII by default, which I think
is a good choice for compatibility reasons.[1]

Regards,
Florian

[1] Which is also the assumption that Ruby 1.8 had, but not as explicit.
[2] A neat trick is the following:

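require "yaml"
# DATA reads everything after __END__. The parser stops at __END__, so
# non-ASCII text below it needs no magic comment.
puts YAML.load(DATA).inspect

__END__
# Sample entry for illustration only; the data from the original post was not preserved.
greeting: "привет"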

Fernando Perez · Feb 17, 2011

I usually recommend not using UTF-8 in source at all and
push all UTF-8 strings into localization files (Either using a
heavyweight solution like i18n or just a plain YAML file, if you
don't want a dependency).

This makes the views (in RoR) unreadable. We also lose the text editor's
HTML autocompletion inside the YAML file.

Florian Gilcher · Feb 17, 2011

This makes the views (in RoR) unreadable, also we somehow lose
autocompletion by the text-editor of html in the yaml file.

I think at least the "unreadable" part is debatable. Autocompletion might
be handy, but the features of your editor should not factor into the
organization of your code.

Also, ERB templates are read with #read, which takes the external-encoding
setting into account, and then evaluated using #eval, which takes the
encoding of the string into account. Other templating libraries like Haml
have a setting for the default template encoding. So templates are not
really the problem, as you can already use UTF-8 pretty freely without
marking it.
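
A rough sketch of that read-then-evaluate path, using a hypothetical template file greeting.erb and a made-up variable name. The encoding is passed to File.read explicitly here instead of relying on the external-encoding setting, so the example behaves the same under any locale. result_with_hash needs a reasonably recent ERB (Ruby 2.5 or later).

```ruby
require "erb"
require "tmpdir"

template, result = Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting.erb")          # hypothetical template file
  File.binwrite(path, "<%= name %>: привет\n")   # UTF-8 bytes, no encoding marker

  # Tag the bytes as UTF-8 while reading; ERB then evaluates the template
  # using the encoding carried by the string.
  text = File.read(path, encoding: "UTF-8")
  [text, ERB.new(text).result_with_hash(name: "Oleg")]
end

puts template.encoding
puts result
```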

Regards,
Florian
