French sentences appearing weird in Rails Website

Ritvvij Parrikh · May 15, 2013

I have a Rails app. One of my clients is importing French text, which
is appearing garbled. See the example below:

1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

Can someone assist please?

I am thinking along the following lines:

2. str = str.gsub('"', '')

3. **Need to add a line here that replaces \\ in the str above with just \ (a single backslash).**

4. str = str.force_encoding("iso-8859-1")

5. str = str.encode('UTF-8')

In step 3, I was thinking of something like

str = str.gsub(/\\\\/, "\\")

Or, if possible, somehow push the output of puts (or a similar function)
back into str. For example:

puts str

---
French: 3. Combien de r\xC3\xA9gions y a-t-il au Cameroon?
English: 3. How many regions are there in Cameroon?

but even that doesn't work. Can someone please assist?
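A note on step 3: in the literal above, \\x is how Ruby writes a single backslash followed by "x", so the string in memory already contains single backslashes and gsub(/\\\\/, "\\") finds nothing to replace (the puts output confirms this). What is actually needed is to turn each four-character \xNN sequence into the byte it names. A minimal sketch, assuming every \x in the text is such an escape; unescape_hex_bytes is a hypothetical helper name:

```ruby
# Hypothetical helper: turn literal "\xNN" escape text (a backslash, an "x"
# and two hex digits) back into the single byte it names.
def unescape_hex_bytes(str)
  # Work on a binary (ASCII-8BIT) copy so decoded bytes such as 0xC3 can be
  # spliced in without encoding-compatibility errors.
  decoded = str.b.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
  # C3 A9 is the UTF-8 encoding of "é": relabel the bytes, don't transcode.
  decoded.force_encoding(Encoding::UTF_8)
end

str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"
fixed = unescape_hex_bytes(str)
puts fixed.valid_encoding?  # => true
puts fixed
```

With this in place, steps 4 and 5 are unnecessary; applying them to these bytes would turn "é" into "Ã©".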

Simon Krahnke · May 17, 2013

* Ritvvij Parrikh said:
> I have a Rails app. One of my clients is importing French text, which
> is appearing garbled. See the example below:
>
> 1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"
>
> Can someone assist please?
>
> I am thinking along the following lines:
>
> 2. str = str.gsub('"', '')
>
> 3. **Need to add a line here that replaces \\ in the str above with just \ (a single backslash).**
>
> 4. str = str.force_encoding("iso-8859-1")

No, "\xc3\xa9" is UTF-8, not ISO-8859-1. At least, it makes much more
sense read as UTF-8, where those two bytes are the single character "é".
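To make that concrete, a small sketch (Ruby 2.x assumed; variable names are illustrative) that interprets the same two bytes both ways:

```ruby
# The two bytes from the question, with no character encoding attached.
bytes = "\xC3\xA9".b

# Interpreted as UTF-8 they form a single character: "é" (U+00E9).
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)

# Interpreted as ISO-8859-1 each byte is its own character; converting that
# to UTF-8 gives the familiar mojibake "Ã©" (U+00C3 U+00A9).
as_latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts as_utf8.length    # => 1
puts as_latin1.length  # => 2
```

The second result is what steps 4 and 5 of the original plan would produce.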

> 5. str = str.encode('UTF-8')
>
> In step 3, I was thinking of something like
>
> str = str.gsub(/\\\\/, "\\")

Yeah.

mfg, simon .... l

Charles Calvert · May 29, 2013

> I have a Rails app. One of my clients is importing French text, which
> is appearing garbled. See the example below:
>
> 1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

As Simon said, this text is encoded in UTF-8. You need to process it
as such. Are you using 1.8 or 1.9?

[snip rest]

Simon Krahnke · May 30, 2013

* Charles Calvert said:
> On Wed, 15 May 2013 04:30:45 -0700 (PDT), Ritvvij Parrikh
>
> As Simon said, this text is encoded in UTF-8. You need to process it
> as such. Are you using 1.8 or 1.9?

Are there versions of 1.8 that support encodings for strings?

mfg, simon .... l

Charles Calvert · May 30, 2013

> Are there versions of 1.8 that support encodings for strings?

For file i/o, the only option of which I'm aware is the iconv library
(http://ruby-doc.org/stdlib-1.8.7/libdoc/iconv/rdoc/Iconv.html).

1.9, on the other hand, has built-in support for encoded strings and
conversion for file i/o. Here's some demo code that I wrote for a
talk that I gave on Unicode in Ruby:

#!/usr/bin/env ruby
# encoding: UTF-8

# Write a string literal (UTF-8, per the magic comment above) to a file.
File.open('utf8.txt', 'w') do |file|
  puts "Writing a UTF-8 file"
  file.write('Tomás')
  puts ""
end

# Read it back, declaring UTF-8 as the file's external encoding.
File.open('utf8.txt', 'r:UTF-8') do |file|
  puts "Reading the UTF-8 file"
  puts "File external encoding: #{file.external_encoding}"
  puts "File contains:"
  line_count = 1
  file.each_line do |line|
    puts "#{line_count}: #{line}"
    line_count += 1
    puts ""
  end
end

# Read it again, transcoding from UTF-8 on disk to UTF-16LE in memory.
File.open('utf8.txt', 'r:UTF-8:UTF-16LE') do |file|
  puts "Reading the UTF-8 file and storing in memory as UTF-16 little endian"
  puts "File external encoding: #{file.external_encoding}"
  puts "File internal encoding: #{file.internal_encoding}"
  puts "In memory representation contains:"
  line_count = 1
  file.each_line do |line|
    puts "#{line_count}: contains #{line.size} characters and #{line.bytesize} bytes in encoding #{line.encoding.name}"
    line_count += 1
  end
  puts ""
end

Simon Krahnke · May 31, 2013

* Charles Calvert said:
> For file i/o, the only option of which I'm aware is the iconv library
> (http://ruby-doc.org/stdlib-1.8.7/libdoc/iconv/rdoc/Iconv.html).
>
> 1.9, on the other hand, has built-in support for encoded strings and
> conversion for file i/o. Here's some demo code that I wrote for a
> talk that I gave on Unicode in Ruby:
>
> #!/usr/bin/env ruby
> # encoding: UTF-8
>
> File.open('utf8.txt', 'w') do |file|
>   puts "Writing a UTF-8 file"
>   file.write('Tomás')

That String is UTF-8 because of the default encoding specified in the
encoding magic comment above.

But why is the file written in UTF-8? For the same reason?

Thanks for the examples.

mfg, simon .... l

Charles Calvert · Jun 1, 2013

[snip]

>> 1.9, on the other hand, has built-in support for encoded strings and
>> conversion for file i/o. Here's some demo code that I wrote for a
>> talk that I gave on Unicode in Ruby:
>>
>> #!/usr/bin/env ruby
>> # encoding: UTF-8
>>
>> File.open('utf8.txt', 'w') do |file|
>>   puts "Writing a UTF-8 file"
>>   file.write('Tomás')

> That String is UTF-8 because of the default encoding specified in the
> encoding magic comment above.
Correct.

> But why is the file written in UTF-8? For the same reason?

I believe so, though I haven't checked the source to verify.

> Thanks for the examples.

You're welcome.

Simon Krahnke · Jun 2, 2013

* Charles Calvert said:
>> That String is UTF-8 because of the default encoding specified in the
>> encoding magic comment above.
> Correct.
>
>> But why is the file written in UTF-8? For the same reason?
>
> I believe so, though I haven't checked the source to verify.

But you can make it explicit, like you did for reading, can't you? I
think that would be a good idea, to keep things local. Someone might
change the encoding of the file, and then the file will have a different
encoding. Some other application might try to read the file as UTF-8,
though.
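For writing, the mode string accepts an external encoding just as it does for reading, and Ruby then transcodes strings on the way out. A minimal sketch, assuming Ruby 2.x or later; bytes_written is a hypothetical helper that writes to a temporary file and returns the raw bytes:

```ruby
require 'tmpdir'

# Hypothetical helper: write `text` through an IO opened with `mode` and
# return the raw bytes that ended up on disk.
def bytes_written(text, mode)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'out.txt')
    File.open(path, mode) { |f| f.write(text) }
    File.binread(path).bytes
  end
end

name = "Tom\u00E1s"  # "Tomás" as a UTF-8 string

# With an explicit external encoding, the string is transcoded on write.
p bytes_written(name, 'w:UTF-8')       # => [84, 111, 109, 195, 161, 115]
p bytes_written(name, 'w:ISO-8859-1')  # => [84, 111, 109, 225, 115]
```

Only the "á" differs: two bytes (C3 A1) in UTF-8, one byte (E1) in ISO-8859-1.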

For string literals there is no way to declare the encoding locally.
Let's just hope that whoever changes the encoding doesn't think it is
magically done just by changing the comment.

mfg, simon .... l

Charles Calvert · Jun 5, 2013

> But you can make it explicit, like you did for reading, can't you?

Yes, as well as specifying an in-memory encoding that is different
from the file's encoding on disk.

> I think that would be a good idea, to keep things local. Someone
> might change the encoding of the file, and then the file will have
> a different encoding.

Except that specifying the encoding doesn't transform the data if the
actual encoding is something other than what you specified. Maybe I
misunderstood you.
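A small sketch of that point, assuming Ruby 2.x or later; read_back is a hypothetical helper. It stores the ISO-8859-1 bytes for "Tomás" in a temporary file and reads them back with two different mode strings:

```ruby
require 'tmpdir'

# Hypothetical helper: put `bytes` in a temporary file, then read them back
# through an IO opened with `mode`.
def read_back(bytes, mode)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'in.txt')
    File.binwrite(path, bytes)
    File.open(path, mode) { |f| f.read }
  end
end

latin1 = "Tom\xE1s".b  # "Tomás" as ISO-8859-1 bytes

# Declaring the wrong external encoding only labels the data: nothing is
# converted, so we get a UTF-8 string holding an invalid byte sequence.
wrong = read_back(latin1, 'r:UTF-8')
puts wrong.encoding         # => UTF-8
puts wrong.valid_encoding?  # => false

# Declaring the real external encoding plus an internal one transcodes.
right = read_back(latin1, 'r:ISO-8859-1:UTF-8')
puts right == "Tom\u00E1s"  # => true
```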

> Some other application might try to read the file as UTF-8, though.

Yes. You have to be careful with encodings.

> For string literals there is no way to declare the encoding locally.

No, but you can escape them (e.g. "\x00\x50\x00\x65\x00\xF1\x00\x61")
if you need a literal in an encoding other than the default.
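Those escapes happen to be the UTF-16BE bytes for "Peña". A sketch (Ruby 2.x assumed; variable names are illustrative) of checking that and turning the literal into a usable string:

```ruby
# Each escaped pair is one UTF-16BE code unit:
# 0x0050 "P", 0x0065 "e", 0x00F1 "ñ", 0x0061 "a".
raw = "\x00\x50\x00\x65\x00\xF1\x00\x61"

# The literal itself is still tagged with the source file's encoding.
puts raw.encoding

# Relabel a copy as UTF-16BE (no bytes change), then transcode to UTF-8.
pena = raw.dup.force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
puts pena == "Pe\u00F1a"  # => true

# Going the other way reproduces the same bytes as the escaped literal.
puts "Pe\u00F1a".encode(Encoding::UTF_16BE).bytes == raw.bytes  # => true
```

Working on a dup avoids mutating the literal itself.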

> Let's just hope that whoever changes the encoding doesn't think it is
> magically done just by changing the comment.

True.

Simon Krahnke · Jun 6, 2013

I've looked through the code, and it looks to me like the default is
Encoding.default_external, which seems to be initialized from the
locale, not from the source file's encoding. I can't find a way to get
the source file's encoding from within Ruby.
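One thing that may help here: the __ENCODING__ keyword (Ruby 1.9+) returns the encoding Ruby assumed for the current source file, i.e. the one set by the magic comment, or UTF-8 by default from Ruby 2.0 on. A short sketch that prints it next to the locale-derived defaults:

```ruby
# Encoding Ruby assumed for this source file: set by the magic comment,
# otherwise UTF-8 by default (Ruby 2.0+; US-ASCII in 1.9).
puts "Source encoding:  #{__ENCODING__}"

# Locale-derived default used to label data read from IO objects.
puts "default_external: #{Encoding.default_external}"

# nil unless set (e.g. with -E ext:int or -U); when set, reads are
# transcoded into it.
puts "default_internal: #{Encoding.default_internal.inspect}"

# Plain string literals are tagged with the source encoding, not the locale.
puts "Literal encoding: #{'Tomas'.encoding}"
```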

> Yes, as well as specifying an in-memory encoding that is different
> from the file's encoding on disk.

puts and the like seem to just dump that internal encoding out, right?

> Except that specifying the encoding doesn't transform the data if the
> actual encoding is something other than what you specified. Maybe I
> misunderstood you.

That was based on false premises anyway. The internal encoding doesn't
inform the default encoding of files written; the locale does.

> Yes. You have to be careful with encodings.

Which, too, should expect to find the file encoded in whatever the
locale says.

> No, but you can escape them (e.g. "\x00\x50\x00\x65\x00\xF1\x00\x61")
> if you need a literal in an encoding other than the default.

But that string will still have an encoding attached to it that says
the file's encoding.

> True.

I've seen people on Usenet who seemed to think that.

mfg, simon .... l

Charles Calvert · Jun 10, 2013

> I've looked through the code, and it looks to me like the default is
> Encoding.default_external, which seems to be initialized from the
> locale, not from the source file's encoding. I can't find a way to get
> the source file's encoding from within Ruby.

That makes sense from what I've seen. Detecting the encoding of a
file without a BOM is a tricky process, and there are libraries to do
it, so building it into the core seems like overkill.
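For illustration only, here is a deliberately naive guess, not what real detection libraries do: trust a byte-order mark if there is one, otherwise accept UTF-8 if the bytes are valid UTF-8, otherwise fall back to ISO-8859-1. guess_encoding and BOMS are hypothetical names:

```ruby
# Hypothetical table of byte-order marks and the encodings they announce.
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFF\xFE".b     => Encoding::UTF_16LE,
  "\xFE\xFF".b     => Encoding::UTF_16BE
}.freeze

# Deliberately naive guess: BOM first, then "is it valid UTF-8?", then
# ISO-8859-1, which accepts any byte sequence (so it never fails, but it
# can easily be wrong).
def guess_encoding(data)
  bytes = data.b
  BOMS.each { |bom, enc| return enc if bytes.start_with?(bom) }
  return Encoding::UTF_8 if bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  Encoding::ISO_8859_1
end

p guess_encoding("Tom\xC3\xA1s".b)  # => #<Encoding:UTF-8>
p guess_encoding("Tom\xE1s".b)      # => #<Encoding:ISO-8859-1>
```

For the common UTF-8 case, Ruby can also skip a leading BOM itself when the file is opened with the mode string 'r:BOM|UTF-8'.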

> puts and the like seem to just dump that internal encoding out, right?

The internal encoding of the string, yes.

> That was based on false premises anyway. The internal encoding doesn't
> inform the default encoding of files written; the locale does.

From my testing, it appears to be the encoding of the string written
to the file, rather than the locale.
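A sketch of that test, assuming Ruby 2.x or later and that Encoding.default_internal is nil (no -E or -U flags); with a default internal encoding set, Ruby may transcode on write and the result can differ. plain_write_bytes is a hypothetical helper:

```ruby
require 'tmpdir'

# Hypothetical helper: write `text` with a plain 'w' mode (no encoding in
# the mode string) and return the raw bytes that reached the disk.
def plain_write_bytes(text)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'plain.txt')
    File.open(path, 'w') { |f| f.write(text) }
    File.binread(path).bytes
  end
end

name = "Tom\u00E1s"  # UTF-8 string

# With no external encoding set on the IO, the string's own bytes are
# written unchanged, whatever the locale says.
p plain_write_bytes(name)                               # => [84, 111, 109, 195, 161, 115]
p plain_write_bytes(name.encode(Encoding::ISO_8859_1))  # => [84, 111, 109, 225, 115]
```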

> Which, too, should expect to find the file encoded in whatever the
> locale says.

I never assume when it comes to user input.

> But that string will still have an encoding attached to it that says
> the file's encoding.

String#force_encoding is useful there.
