Improving hexadecimal escaping performance

  • Thread starter Iñaki Baz Castillo

Iñaki Baz Castillo

Hi, I've a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

def self::hex_unescape(str)
str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The "hex_escape" method is copied from the CGI lib, and sincerely I don't much
like its approach using "sprintf". Is there another, more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.



--
Iñaki Baz Castillo
 

7stud --

Iñaki Baz Castillo said:
I don't much like its approach using "sprintf". Is there another, more elegant way?
(Performance is the most important requirement anyway.)

pickaxe2, p. 23:
------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------

Is there another, more elegant way?

def hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
# On Ruby 1.8, match[0] is the byte value (an integer) of the matched character.
"%%%02X" % match[0]
end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9
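
The "%" form above is String#%, which applies the same format rules as "sprintf" (and its alias "format"). A minimal sketch of that equivalence follows. It assumes the byte value is obtained with unpack("C"), so the result does not depend on what "match[0]" returns in a given Ruby version; the variable names are illustrative only.

```ruby
# Take the byte value of "?" in a way that behaves the same on Ruby 1.8 and 1.9+.
byte = "?".unpack("C")[0]

via_sprintf = sprintf("%%%02X", byte)
via_format  = format("%%%02X", byte)   # Kernel#format is an alias of sprintf
via_percent = "%%%02X" % byte          # String#% with a single argument

puts via_sprintf, via_format, via_percent
```

All three calls should produce the same escaped string, so the choice between them is a matter of style rather than behaviour.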
 

Robert Klemme

2009/2/23 Iñaki Baz Castillo said:
Hi, I've a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

def self::hex_unescape(str)
str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The "hex_escape" method is copied from the CGI lib, and sincerely I don't much
like its approach using "sprintf". Is there another, more elegant way?
(Performance is the most important requirement anyway.)

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?

Cheers

robert


--
remember.guy do |as, often| as.you_can - without end
 

Iñaki Baz Castillo

2009/2/23 Robert Klemme said:
Then I am sure you _measured_ it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?

I did a Benchmark.realtime comparing the hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly in C,
so it can't be made any faster.

Thanks.
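
A comparison like the one described could be reproduced roughly as follows. This is only a sketch: the sample string and the iteration count are arbitrary assumptions, the methods are defined at top level rather than in the module, and the input is kept ASCII-only so the result does not depend on string encoding. Benchmark.realtime returns elapsed wall-clock time in seconds.

```ruby
require "benchmark"

# The two methods as posted in the thread, defined at top level for brevity.
def hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

sample  = "a b&c=d/e?f"          # hypothetical ASCII-only input
escaped = hex_escape(sample)     # "a%20b%26c%3Dd%2Fe%3Ff"
runs    = 10_000                 # arbitrary iteration count

# Time many calls and divide, since a single call is in the microsecond range.
escape_time   = Benchmark.realtime { runs.times { hex_escape(sample) } }
unescape_time = Benchmark.realtime { runs.times { hex_unescape(escaped) } }

printf("hex_escape:   %.2e s per call\n", escape_time / runs)
printf("hex_unescape: %.2e s per call\n", unescape_time / runs)
```

Averaging over many calls smooths out the timer noise that a single measurement of a few microseconds would show.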

--
Iñaki Baz Castillo
 

Robert Klemme

2009/2/23 Iñaki Baz Castillo said:
I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

Anyway I've realized right now that "sprintf" is directly implemented
as C code so it can't be faster.

Well, you can at least do this in 1.8

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end
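
Both variants format a single byte taken from each match. A sketch for more recent Rubies follows, assuming Ruby 2.0 or later: String#b is not mentioned in the thread and is used here to get a binary copy of the input, so the pattern matches one byte at a time and a multi-byte character such as "é" yields one escape per byte. Without that step a UTF-8 string may be matched character by character, in which case only the first byte of each character would be escaped.

```ruby
def hex_escape(str)
  # Work on a binary (ASCII-8BIT) copy so that every match is exactly one byte.
  str.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

# "\u00E9" is "é", encoded as the two bytes C3 A9 in UTF-8.
puts hex_escape("?<>\u00E9")   # prints %3F%3C%3E%C3%A9
```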

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
 

Iñaki Baz Castillo

2009/2/23 Robert Klemme said:
Well, you can at least do this in 1.8

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9

def self::hex_escape(str)
str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end


Thanks, do you mean that "m[0]" in Ruby 1.9 behaves differently than in 1.8?
Maybe in 1.9 "m[0]" returns the first character (even if it's more than two
bytes, as "ñ", "€"...) while in 1.8 it returns just the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it has no "getbyte()"
method for String.

Thanks a lot.



--
Iñaki Baz Castillo
 

Robert Klemme

2009/2/23 Iñaki Baz Castillo said:
Thanks, do you mean that "m[0]" in Ruby 1.9 behaves differently than in 1.8?
Maybe in 1.9 "m[0]" returns the first character (even if it's more than two
bytes, as "ñ", "€"...) while in 1.8 it returns just the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it has no "getbyte()"
method for String.

15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$
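
The session above uses an ASCII-only string. For the multi-byte case asked about, here is a small sketch for Ruby 1.9 or later, assuming the string is UTF-8 encoded (the "\u00F1" escape stands for "ñ" so the snippet does not depend on the source file encoding; the variable names are illustrative):

```ruby
s = "\u00F1"                 # "ñ": one character, two bytes in UTF-8

first_char = s[0]            # the whole character, not a byte
first_byte = s.getbyte(0)    # 195 (0xC3), the first byte only
all_bytes  = s.bytes.to_a    # [195, 177]

p first_char, first_byte, all_bytes
```

So on 1.9 and later "m[0]" yields the whole first character, while "getbyte" and "bytes" give access to the individual bytes.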


robert
 

Iñaki Baz Castillo

2009/2/23 Robert Klemme said:
15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$

Clear now, thanks :)
--
Iñaki Baz Castillo
 

Simon Krahnke

* Iñaki Baz Castillo said:
I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

For what exactly is 40 microseconds too slow?

mfg, simon .... l
 

Simon Krahnke

* Iñaki Baz Castillo said:
I don't mean that, but it's strange that the inverse method takes
double the time, isn't it?

How would you implement these at the core level?

mfg, simon .... l
 
