RDoc and encoding


Claus Folke Brobak

Hi,

Running Ruby/JRuby 1.8.7 on Windows XP.

Until now I have been using the RDoc version built into the Ruby
Standard Library. That is version 1.0.1. Now I am trying out RDoc 3.4,
installed via a gem.

I have run into a problem with the double quote character. Example code:

RDoc 1.0.1

require 'rdoc/markup/simple_markup'
require 'rdoc/markup/simple_markup/to_html'

sm = SM::SimpleMarkup.new()
th = SM::ToHtml.new()
puts sm.convert('æææ"øøø"ååå', th)

Output:

<p>
æææ&quot;øøø&quot;ååå
</p>

RDoc 3.4

require 'rubygems'
require 'rdoc/markup/to_html'

puts RDoc::Markup::ToHtml.new().convert('æææ"øøø"ååå')

Output:

<p>æææâ€œøøøâ€ååå</p>

It seems as if RDoc 3.4 is adding a double quote in UTF-8 encoding
instead of "&quot;". Running on Windows XP, the normal encoding is
Windows-1252. If I look at the HTML and tell the browser that it is
UTF-8 encoded, the double quotes are displayed correctly. Then, however,
the Danish national characters (æøå) are not displayed as
they should.
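
The "â€œ" in the output above can be reproduced in isolation. The following is a minimal sketch, assuming Ruby 1.9 or later (String#force_encoding and String#encode do not exist in 1.8.7); the variable names are only illustrative. It takes the UTF-8 bytes of the curly opening quote U+201C and reads them as Windows-1252:

```ruby
# Sketch only; assumes Ruby 1.9+ (not available on 1.8.7).

# The curly opening double quote, U+201C, stored as UTF-8.
curly = "\u201C"

# Its three UTF-8 bytes, shown as hex.
utf8_bytes = curly.bytes.map { |b| format("%02X", b) }.join(" ")
puts utf8_bytes   # E2 80 9C

# Re-label the same bytes as Windows-1252, then convert to UTF-8 so the
# misreading can be printed: each byte becomes one separate character.
misread = curly.dup.force_encoding("Windows-1252").encode("UTF-8")
puts misread      # â€œ
```

This is consistent with the observation that the quotes look right when the page is read as UTF-8, while the Windows-1252 letters around them do not.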

Do you think I have hit a bug in RDoc 3.4, or am I missing something?

Claus

 

Eric Hodel

Claus Folke Brobak wrote:
Do you think I have hit a bug in RDoc 3.4, or am I missing something?

Transcoding is not supported in RDoc on Ruby 1.8.7. Upgrade to Ruby
1.9.

My primary platform for developing RDoc is Ruby 1.9. Ruby 1.8.6 is
unsupported, and 1.8.7 gets second-tier status and will not support
transcoding.
 

Claus Folke Brobak

Eric Hodel wrote in post #973761:
Transcoding is not supported in RDoc on ruby 1.8.7. Upgrade to Ruby
1.9.

I don't think it is a matter of transcoding. I would have thought the
output would remain in the Windows-1252 encoding of the input.

As far as I can figure out, RDoc always "thinks" the input is in UTF-8
encoding. This is probably rarely the case on Windows.
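
To illustrate the mismatch, here is a small sketch, assuming Ruby 1.9 or later, of what happens when Windows-1252 bytes are treated as UTF-8, and how they could be converted explicitly beforehand. The byte string is a made-up stand-in for text read from a Windows console or file; this is not what RDoc itself does.

```ruby
# Sketch only; assumes Ruby 1.9+ (not available on 1.8.7).

# Hypothetical input: "æøå" as raw Windows-1252 bytes.
raw = "\xE6\xF8\xE5".dup.force_encoding("Windows-1252")

# The same bytes are not a valid UTF-8 sequence.
looks_like_utf8 = raw.dup.force_encoding("UTF-8").valid_encoding?
puts looks_like_utf8   # false

# Converting explicitly yields proper UTF-8 text.
utf8 = raw.encode("UTF-8")
puts utf8              # æøå
```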

Can you explain the use of a double quote in UTF-8 encoding instead of
"&quot;" in the generated HTML?

Claus
 

Eric Hodel

Claus Folke Brobak wrote:
I don't think it is a matter of transcoding. I would have thought the
output would remain in the Windows-1252 encoding of the input.

As far as I can figure out, RDoc always "thinks" the input is in UTF-8
encoding. This is probably rarely the case on Windows.

With Ruby 1.8 this is true. If you upgrade to Ruby 1.9, RDoc 3 can
automatically determine the output encoding and transcode for you. You
can also override it with --encoding.

Claus Folke Brobak wrote:
Can you explain the use of a double quote in UTF-8 encoding instead of
"&quot;" in the generated HTML?

RDoc now performs "prettier" replacements of characters, such as matching
opening and closing quotes. Such characters are not available in all
output encodings, so transcoding is performed.
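
As a rough illustration of why the output encoding matters, the sketch below (assuming Ruby 1.9 or later) transcodes a UTF-8 string containing curly quotes to two target encodings. It uses plain String#encode and is not taken from RDoc's source. The replacement of unavailable characters with a straight quote is an assumption chosen for the example.

```ruby
# Sketch only; assumes Ruby 1.9+ and is not RDoc's actual implementation.

# "æææ", a curly opening quote, "øøø", a curly closing quote, "ååå" in UTF-8.
pretty = "\u00E6\u00E6\u00E6\u201C\u00F8\u00F8\u00F8\u201D\u00E5\u00E5\u00E5"

# Windows-1252 has both curly quotes (bytes 0x93 and 0x94), so this works.
win = pretty.encode("Windows-1252")
win_hex = win.bytes.map { |b| format("%02X", b) }.join(" ")
puts win_hex      # E6 E6 E6 93 F8 F8 F8 94 E5 E5 E5

# ISO-8859-1 has no curly quotes. Without options, encode would raise
# Encoding::UndefinedConversionError; here they fall back to a straight quote.
latin1 = pretty.encode("ISO-8859-1", :undef => :replace, :replace => '"')
latin1_hex = latin1.bytes.map { |b| format("%02X", b) }.join(" ")
puts latin1_hex   # E6 E6 E6 22 F8 F8 F8 22 E5 E5 E5
```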
 
