Improving hexadecimal escaping performance

Iñaki Baz Castillo · Feb 23, 2009

Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

def self::hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The "hex_escape" method is copied from the CGI lib, and sincerely I don't
much like its approach using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.

--
Iñaki Baz Castillo

7stud -- · Feb 23, 2009

Iñaki Baz Castillo said:
I don't much like its approach using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

pickaxe2, p. 23:
------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------

Is there a more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9

Robert Klemme · Feb 23, 2009

2009/2/23 Iñaki Baz Castillo said:
Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

def self::hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The "hex_escape" method is copied from the CGI lib, and sincerely I don't
much like its approach using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, didn't you? What are your results, and what are your
performance requirements?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Iñaki Baz Castillo · Feb 23, 2009

2009/2/23 Robert Klemme said:
Then I am sure you _measured_ it and came to the conclusion that it is
too slow, didn't you? What are your results, and what are your
performance requirements?

I ran Benchmark.realtime on the hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly
in C, so it can't be made faster.
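
A minimal sketch of how such a measurement could be repeated, assuming Ruby 2.0 or later (for String#b). The names PATTERN, ESCAPE_TABLE, hex_escape_sprintf and hex_escape_table are hypothetical, and the table-based variant is only an illustration of one way to avoid a formatting call per byte, not something proposed in the thread. Whether it is actually faster has to be measured on the target Ruby.

```ruby
require "benchmark"

# Hypothetical names; only the sprintf-based approach comes from the thread.
PATTERN = /[^a-zA-Z0-9_\-.]/

# One precomputed "%XX" string per possible byte value (0..255).
ESCAPE_TABLE = (0..255).map { |byte| format("%%%02X", byte) }.freeze

# sprintf-based variant, escaping one byte at a time on a binary copy.
def hex_escape_sprintf(str)
  str.b.gsub(PATTERN) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

# Table-based variant: same result, but a table lookup instead of sprintf.
def hex_escape_table(str)
  str.b.gsub(PATTERN) { |m| ESCAPE_TABLE[m.getbyte(0)] }
end

# Arbitrary sample input and iteration count, chosen only for illustration.
sample = "name=Iñaki & price=10€" * 10
iterations = 10_000

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
t_sprintf = Benchmark.realtime { iterations.times { hex_escape_sprintf(sample) } }
t_table   = Benchmark.realtime { iterations.times { hex_escape_table(sample) } }

puts "sprintf: #{t_sprintf} s for #{iterations} calls"
puts "table:   #{t_table} s for #{iterations} calls"
```

Timing many iterations rather than a single call tends to give steadier numbers than one Benchmark.realtime call on its own.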

Thanks.

--
Iñaki Baz Castillo

Robert Klemme · Feb 23, 2009

2009/2/23 Iñaki Baz Castillo said:
I ran Benchmark.realtime on the hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly
in C, so it can't be made faster.

Well, you can at least do this in 1.8:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m[0]) }
end

And this in 1.9:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end
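
For readers on a current Ruby, here is a hedged sketch of the same pair of methods as a round trip, assuming Ruby 2.0 or later (for String#b). The module name HexCodec and the constant UNSAFE are hypothetical. It works on a binary copy of the input so that each byte of a multibyte character is escaped individually, and it assumes the caller wants results tagged with the input's encoding.

```ruby
# Hypothetical module name, used only for this sketch.
module HexCodec
  UNSAFE = /[^a-zA-Z0-9_\-.]/

  # Escape every byte outside the safe set as %XX.
  def self.hex_escape(str)
    # str.b returns a binary (ASCII-8BIT) copy, so the regexp sees single bytes.
    escaped = str.b.gsub(UNSAFE) { |m| sprintf("%%%02X", m.getbyte(0)) }
    # The result is pure ASCII; tag it with the input's encoding (an assumption).
    escaped.force_encoding(str.encoding)
  end

  # Turn each %XX sequence back into the byte it stands for.
  def self.hex_unescape(str)
    bytes = str.b.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr(Encoding::BINARY) }
    # Assumption: the decoded bytes are valid in the input's encoding.
    bytes.force_encoding(str.encoding)
  end
end

escaped = HexCodec.hex_escape("?<>é")
puts escaped                         # %3F%3C%3E%C3%A9
puts HexCodec.hex_unescape(escaped)  # ?<>é
```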

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Iñaki Baz Castillo · Feb 23, 2009

2009/2/23 Robert Klemme said:
Well, you can at least do this in 1.8:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m[0]) }
end

And this in 1.9:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

Thanks. Do you mean that "m[0]" in Ruby 1.9 behaves differently
than in 1.8? Maybe in 1.9 "m[0]" returns the first character (even if
it's more than two bytes, such as "ñ", "€"...) while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it doesn't have a
"getbyte()" method for String.

Thanks a lot.

--
Iñaki Baz Castillo

Robert Klemme · Feb 23, 2009

2009/2/23 Iñaki Baz Castillo said:
Thanks. Do you mean that "m[0]" in Ruby 1.9 behaves differently
than in 1.8? Maybe in 1.9 "m[0]" returns the first character (even if
it's more than two bytes, such as "ñ", "€"...) while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it doesn't have a
"getbyte()" method for String.

15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$
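
The same difference can be written as a small script. This is a sketch assuming Ruby 1.9.1 or later, where String#getbyte is available; the variable names are illustrative only.

```ruby
s = "foo"

# Three ways to read the first byte as an Integer.
first_byte_a = s.getbyte(0)       # direct byte access
first_byte_b = s.unpack("C")[0]   # the approach used in the CGI-derived code
first_byte_c = s.bytes.first      # via the byte sequence

p first_byte_a  # 102
p first_byte_b  # 102
p first_byte_c  # 102

# On 1.9 and later, indexing returns a one-character String, not an Integer.
first_char = s[0]
p first_char    # "f"
```

Of these, unpack("C") is the one that also behaves the same way on 1.8, which is presumably why the CGI-derived code used it.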

robert

Iñaki Baz Castillo · Feb 23, 2009

2009/2/23 Robert Klemme said:
15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$

Clear now, thanks

--
Iñaki Baz Castillo

Simon Krahnke · Feb 23, 2009

* Iñaki Baz Castillo said:
I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

For what exactly is 40 microseconds too slow?

Regards, simon .... l

Iñaki Baz Castillo · Feb 23, 2009

2009/2/23 Simon Krahnke said:
For what exactly is 40 microseconds too slow?

I don't mean that, but it's strange that the inverse method takes
double the time, isn't it?

--
Iñaki Baz Castillo

Simon Krahnke · Feb 24, 2009

* Iñaki Baz Castillo said:
I don't mean that, but it's strange that the inverse method takes
double the time, isn't it?

How would you implement these at the core level?

Regards, simon .... l
