Unicode escaping fun & games


Andrew S. Townley

Hi folks,

After my last question, I finally sat down and figured out how to easily
do the kinds of conversions I wanted (at least the Unicode UTF-8 part).
Here's what I came up with in the event that it may be useful to others
having to exchange encoded 7-bit data across environments.

excalibur$ cat utf8.rb
# Created: Thu Apr 23 17:03:23 IST 2009
#
# This is some quick code to deal with UTF-8 manipulation and
# serialization of 7-bit ASCII representations.

$KCODE='u'
require 'jcode'

def utf8_escape(str)
  s = ""
  str.each_char do |c|
    x = c.unpack("C")[0]
    if x < 128
      s << c
    else
      s << "\\u%04x" % c.unpack("U")[0]
    end
  end
  s
end

def utf8_unpack(str)
  str.gsub(/\\u([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])/) do
    [ $1.hex ].pack("U*")
  end
end

Running it:

excalibur$ irb
irb(main):001:0> require 'utf8'
=> true
irb(main):002:0> s = "Hello €!"
=> "Hello €!"
irb(main):003:0> t = utf8_escape(s)
=> "Hello \\u20ac!"
irb(main):004:0> u = utf8_unpack(t)
=> "Hello €!"
irb(main):005:0> s == u
=> true

excalibur$ irb
irb(main):001:0> s =3D "=C3=A0cA=E7=BB=8Bf=C3=A9=C3=A0"
=3D> "\303\240cA\347\273\213f\303\251\303\240"
irb(main):002:0> require 'utf8'
=3D> true
irb(main):003:0> s =3D "=C3=A0cA=E7=BB=8Bf=C3=A9=C3=A0"
=3D> "=C3=A0cA=E7=BB=8Bf=C3=A9=C3=A0"
irb(main):004:0> t =3D utf8_escape(s)
=3D> "\\u00e0cA\\u7ecbf\\u00e9\\u00e0"
irb(main):005:0> u =3D utf8_unpack(t)
=3D> "=C3=A0cA=E7=BB=8Bf=C3=A9=C3=A0"
irb(main):006:0> s =3D=3D u
=3D> true
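
One case the examples above do not exercise is a character outside the Basic Multilingual Plane (code point above U+FFFF). With the code as posted, "%04x" produces five hex digits for such a character (e.g. \u1f600), and utf8_unpack then reads only the first four, so the round trip breaks. A minimal sketch of one way around this, assuming the receiving side accepts UTF-16 surrogate pairs the way JSON and Java do (that is an assumption, as are the hypothetical names utf8_escape_astral and utf8_unpack_astral):

```ruby
# Sketch only: hypothetical variants of utf8_escape/utf8_unpack that write
# code points above U+FFFF as a UTF-16 surrogate pair (\uD8xx\uDCxx).
# Uses unpack("U*")/pack("U") so it does not depend on $KCODE or jcode.

def utf8_escape_astral(str)
  str.unpack("U*").map do |cp|
    if cp < 128
      [cp].pack("U")                      # plain ASCII passes through
    elsif cp <= 0xFFFF
      "\\u%04x" % cp                      # BMP: one \uXXXX escape
    else
      v = cp - 0x10000                    # split into high/low surrogates
      "\\u%04x\\u%04x" % [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
    end
  end.join
end

def utf8_unpack_astral(str)
  # Try a surrogate pair first, then fall back to a single \uXXXX escape.
  pattern = /\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})|\\u([0-9a-f]{4})/i
  str.gsub(pattern) do
    if $3
      [$3.hex].pack("U")
    else
      hi = $1.hex - 0xD800
      lo = $2.hex - 0xDC00
      [0x10000 + (hi << 10) + lo].pack("U")
    end
  end
end

smiley = [0x1F600].pack("U")              # a code point above U+FFFF
escaped = utf8_escape_astral("a" + smiley + "b")
puts escaped
puts utf8_unpack_astral(escaped) == "a" + smiley + "b"
```

A lone (unpaired) surrogate escape is not treated specially here, and literal backslashes in the input are still not escaped, so this is no more bullet-proof than the original in those respects.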

It may not be 100% bullet-proof, but it works for some simple examples
that I could find, so this may be as far as I need to go with that part.
The next step is to roll this into a one-pass string escaping routine so
you don't need to do a bunch of gsub calls.
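
For what it's worth, here is a rough sketch of what such a one-pass routine might look like. It walks the code points once and decides per character whether to emit a short escape, a \uXXXX escape, or the character itself. The SIMPLE_ESCAPES table and the name escape_one_pass are hypothetical, and the set of characters to escape is an assumption that depends on the target format:

```ruby
# Sketch only: one pass over the code points instead of several gsub calls.
# SIMPLE_ESCAPES is a hypothetical table; adjust it to the target format.
SIMPLE_ESCAPES = {
  0x5c => "\\\\",   # backslash, escaped so that decoding is unambiguous
  0x22 => "\\\"",   # double quote
  0x0a => "\\n",    # newline
  0x0d => "\\r",    # carriage return
  0x09 => "\\t"     # tab
}

def escape_one_pass(str)
  str.unpack("U*").map do |cp|
    if SIMPLE_ESCAPES.key?(cp)
      SIMPLE_ESCAPES[cp]
    elsif cp > 0xFFFF
      # Four hex digits cannot hold these; refuse rather than emit bad data.
      raise ArgumentError, "code point U+%X is above U+FFFF" % cp
    elsif cp < 0x20 || cp > 0x7e
      "\\u%04x" % cp                      # control and non-ASCII characters
    else
      [cp].pack("U")                      # printable ASCII passes through
    end
  end.join
end

puts escape_one_pass("say \"hi\"\n")
```

Escaping the backslash itself is a deliberate choice in this sketch: without it, input that already contains the text \u20ac could not be told apart from an escaped euro sign when decoding.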

Any suggestions, comments and improvements are welcome.

Cheers,

ast
-- 
Andrew S. Townley <[email protected]>
http://atownley.org