Multibyte and Gems

Martin Hess

I've tracked down a problem with a Gem I am trying to use. It turns
out that it has some non-ASCII characters in it; for example, the
second quote in the regular expression below is not an ASCII character:

parts = self.split( %r/( [:.;?!][ ] | (?:[ ]|^)["“] )/x )

It produces errors like this:

:in `require':
/opt/local/lib/ruby1.9/gems/1.9.1/gems/webby-0.9.4/lib/webby/core_ext/string.rb:14: invalid multibyte char (US-ASCII) (SyntaxError)

I fixed it by adding the following to the top of the offending file:

# encoding: utf-8
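
For readers following along, here is a minimal sketch of a file that carries the magic comment. The constant name PATTERN is made up for this example; the regular expression is the one quoted above, with the curly quote written out. The comment only takes effect on the first line of the file (or the second, when the first is a shebang). On Ruby 2.0 and later the default source encoding is already UTF-8, so the error above is specific to 1.9.

```ruby
# encoding: utf-8

# PATTERN is a hypothetical name; the regex is the one from the post.
# It contains a non-ASCII character (U+201C, left double quotation mark).
PATTERN = %r/( [:.;?!][ ] | (?:[ ]|^)["“] )/x

# __ENCODING__ reports the source encoding the parser used for this file.
puts __ENCODING__

# A regex literal containing non-ASCII characters takes the source encoding.
puts PATTERN.encoding
```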

My questions:

* Is this the preferred fix?

* Is there a way to work around this problem without modifying the Gem?

* Is there an easy way to see if gems have non-ASCII source files but
haven't included an encoding comment? Some kind of Ruby warning, for
instance.
 
Eric Hodel

I've tracked down a problem with a Gem I am trying to use. It turns
out that it has some non-ASCII characters in it; for example, the
second quote in the regular expression below is not an ASCII
character:

parts = self.split( %r/( [:.;?!][ ] | (?:[ ]|^)["“] )/x )

It produces errors like this:

:in `require':
/opt/local/lib/ruby1.9/gems/1.9.1/gems/webby-0.9.4/lib/webby/core_ext/string.rb:14: invalid multibyte char (US-ASCII) (SyntaxError)

I fixed it by adding the following to the top of the offending file:

# encoding: utf-8

My questions:

* Is this the preferred fix?

Yes.

* Is there a way to work around this problem without modifying the
Gem?

File a bug with the author and have them release a new version;
otherwise, no.

* Is there an easy way to see if gems have non-ASCII source files
but haven't included an encoding comment? Some kind of Ruby warning,
for instance.

ruby -c will do this for you.
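
As a complement to running ruby -c on each file, the check can be approximated with a short script. This is only a sketch under stated assumptions: the method names and the MAGIC_COMMENT pattern are made up for illustration and are a simplification of what the Ruby parser accepts. It flags .rb files that contain non-ASCII bytes but have no coding comment in their first two lines.

```ruby
# Hypothetical helper, not part of RubyGems or Ruby itself.
# Approximates Ruby's magic-comment rule: a comment containing
# "coding:" or "coding=" followed by an encoding name.
MAGIC_COMMENT = /\A\s*#.*coding[:=]\s*[\w.-]+/

# True when the source has non-ASCII bytes and no coding comment
# in its first two lines (the second line covers the shebang case).
def missing_encoding_comment?(source)
  bytes = source.dup.force_encoding(Encoding::BINARY)
  return false if bytes.ascii_only?

  bytes.lines.first(2).none? { |line| line =~ MAGIC_COMMENT }
end

# Returns the sorted paths of all .rb files under dir that fail the check.
def files_missing_encoding_comment(dir)
  Dir.glob(File.join(dir, "**", "*.rb")).sort.select do |path|
    missing_encoding_comment?(File.binread(path))
  end
end
```

Calling files_missing_encoding_comment on a gem's lib directory returns the paths worth a closer look; ruby -c on those files remains the authoritative check.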
 
Martin Hess

So is it considered best practice to put an encoding comment at the
beginning of all your files nowadays? Such as:

# encoding: utf-8

or whatever encoding you like. Is this what people are doing, or are
they doing it one-off for the files that have non-ASCII characters?

It seems to me that if you have a modern editor it isn't too hard to
accidentally slip in some non-ASCII characters, resulting in some pain
down the road.


 
Michael Fellinger

So is it considered best practice to put an encoding comment at the beginning
of all your files nowadays? Such as:

# encoding: utf-8

or whatever encoding you like. Is this what people are doing, or are they
doing it one-off for the files that have non-ASCII characters?

It seems to me that if you have a modern editor it isn't too hard to
accidentally slip in some non-ASCII characters, resulting in some pain down
the road.

It isn't hard to mess up any code in a lot of ways, so as usual, try
to run/test it before you release/deploy :)
That also means that using Ruby 1.9.1 for your daily coding might be a
better choice, otherwise you'll have to use multiruby.



--
Michael Fellinger
CTO, The Rubyists, LLC
972-996-5199
 
Eric Hodel

It isn't hard to mess up any code in a lot of ways, so as usual, try
to run/test it before you release/deploy :)
That also means that using Ruby 1.9.1 for your daily coding might be a
better choice, otherwise you'll have to use multiruby.

With hoe, it's as easy as:

multiruby_setup the_usual # only once
rake multi
 
