performance stats of String#scan, strscan and a homemade approach

  • Thread starter Simon Strandgaard

Simon Strandgaard

I have recently been working on a Ruby syntax colorer, so I needed
to know more about the performance of #scan and whether there are
faster alternatives. In this benchmark, String#scan is the fastest
of the three approaches.

Maybe this will come in handy for others.

--
Simon Strandgaard


bash-2.05b$ ruby h.rb
                          user     system      total        real
String#scan           0.810000   0.020000   0.830000 (  0.937981)
strscan               1.110000   0.040000   1.150000 (  1.255724)
homemade slicer       2.420000   0.130000   2.550000 (  2.648530)
true
true
bash-2.05b$ expand -t2 h.rb
require 'strscan'

def strscan(string, re)
  tokens = []
  ss = StringScanner.new(string)
  until ss.eos?
    m = ss.scan(re)
    break unless m
    tokens << m
  end
  tokens
end

def slicer(string, re)
  tokens = []
  while string.size > 0
    m = re.match(string)
    break unless m
    token = string.slice!(0, m.end(0))
    tokens << token
  end
  tokens
end

re_src = '\d+|\s+|.'
n = 10000
require 'benchmark'
Benchmark.bm(20) do |b|
  # Exercise String#scan
  re1 = Regexp.new(re_src)
  lines = IO.readlines(__FILE__)
  result1 = []
  GC.disable
  b.report("String#scan") do
    n.times do |i|
      result1 << lines[i%lines.size].scan(re1)
    end
  end
  GC.enable

  # Exercise strscan
  lines = IO.readlines(__FILE__)
  result2 = []
  GC.disable
  b.report("strscan") do
    n.times do |i|
      result2 << strscan(lines[i%lines.size], re1)
    end
  end
  GC.enable

  # Exercise homemade slicer
  re2 = Regexp.new('\A(?:'+re_src+')')
  lines = IO.readlines(__FILE__)
  result3 = []
  GC.disable
  b.report("homemade slicer") do
    n.times do |i|
      result3 << slicer(lines[i%lines.size].clone, re2)
    end
  end
  GC.enable

  # check that output was correct
  p((result1 == result2), (result1 == result3))
end
bash-2.05b$
 

Michael Neumann

Simon said:
I have recently been working on a Ruby syntax colorer, so I needed
to know more about the performance of #scan and whether there are
faster alternatives. In this benchmark, String#scan is the fastest
of the three approaches.

Maybe this will come in handy for others.

--
Simon Strandgaard



In your case there is no advantage to using strscan. Try storing the
positions of the tokens rather than the scanned strings themselves (the
tokens), i.e. use StringScanner#skip. That might not work for you, though.

Regards,

Michael
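
For anyone who wants to try the StringScanner#skip suggestion, here is a minimal sketch. The method name token_spans is made up for illustration and is not part of the original h.rb. It records [start, length] pairs instead of token strings, so no String is allocated per token. Whether that is actually faster than String#scan has not been measured here.

```ruby
require 'strscan'

# Hypothetical helper (not in the original h.rb): collect the position
# and length of each token instead of the token string itself.
def token_spans(string, re)
  spans = []
  ss = StringScanner.new(string)
  until ss.eos?
    start = ss.pos
    len = ss.skip(re)            # length of the match at the scan pointer, or nil
    break unless len && len > 0  # stop on no match or a zero-length match
    spans << [start, len]
  end
  spans
end

line = "ab 12\n"
spans = token_spans(line, /\d+|\s+|./)
p spans

# Tokens can still be recovered on demand. StringScanner#pos is a byte
# offset, so byteslice is used rather than slice.
p spans.map { |start, len| line.byteslice(start, len) }
```

The trade-off is that the consumer has to work with offsets into the original line, which may or may not suit a syntax colorer.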
 
