Sylvester T Cat
Hi, I'm using Ruby 1.9.2.
I'm reading a CSV file that has some non-US-ASCII characters. I want
to parse each value in each row and strip any leading/trailing
whitespace.
However, when I come across some unusual characters, I get "invalid
byte sequence in UTF-8".
Here is an example:
irb(main):041:0* a = "\xFF"
=> "\xFF"
irb(main):042:0> a.encoding
=> #<Encoding:UTF-8>
irb(main):043:0> a.strip
ArgumentError: invalid byte sequence in UTF-8
from (irb):43:in `strip'
from (irb):43
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands/console.rb:44:in `start'
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands/console.rb:8:in `start'
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands.rb:23:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'
# So next I tried changing the encoding, but this doesn't work either:
irb(main):044:0> a.encode!("ASCII-8BIT", undef: :replace)
Encoding::InvalidByteSequenceError: "\xFF" on UTF-8
from (irb):44:in `encode!'
from (irb):44
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands/console.rb:44:in `start'
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands/console.rb:8:in `start'
from /usr/local/lib/ruby/gems/1.9.1/gems/railties-3.0.3/lib/rails/commands.rb:23:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'
Is there any way to strip out these invalid bytes while keeping the
string in UTF-8 encoding?
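
For reference, a minimal sketch of one common workaround, not a definitive answer. The encode! call above passes only undef: :replace, which covers characters that have no equivalent in the target encoding. Invalid bytes in the source string are governed by the separate invalid: option. The sketch assumes it is acceptable to drop the bad bytes silently, and the helper name clean_utf8 is hypothetical. It round-trips through UTF-16BE because on Ruby 1.9 a UTF-8 to UTF-8 encode is a no-op.

```ruby
# Hypothetical helper (not part of Ruby or of the original post):
# drops bytes that are not valid UTF-8, then strips surrounding whitespace.
def clean_utf8(value)
  # invalid: :replace handles bytes that are malformed in the *source*
  # encoding; replace: "" removes them instead of substituting U+FFFD.
  # Valid multibyte characters survive the round trip unchanged.
  value.encode("UTF-16BE", invalid: :replace, replace: "")
       .encode("UTF-8")
       .strip
end

# " caf\xC3\xA9" is "café" written as raw UTF-8 bytes; \xFF is never valid UTF-8.
raw = " caf\xC3\xA9 \xFF ".dup.force_encoding("UTF-8")

cleaned = clean_utf8(raw)
puts cleaned.encoding        # the result stays in UTF-8
puts cleaned.valid_encoding? # the invalid byte is gone
puts cleaned
```

On Ruby 2.1 and later, String#scrub does the same cleanup directly (for example value.scrub("").strip), but it is not available on 1.9.2.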