For gsub/gsub!, instead of replacing one or more spaces with a single
space, try replacing two or more spaces with a single space. This skips
the pointless work of replacing a lone space with itself. I.e., instead
of gsub(/ +/, ' '), try gsub(/  +/, ' ') or gsub(/ {2,}/, ' ') and
benchmark them (they should be faster). Of course, it's still not going
to be as fast as split.join or squeeze.strip (depending on the Ruby
version, at least: older Ruby versions may put squeeze.strip markedly
slower than split.join).
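
To make the comparison concrete, here is a small sketch of what each variant returns. The sample string is made up for illustration (it is not the benchmark input), with runs of spaces inside and padding at both ends.

```ruby
# Hypothetical sample input: runs of spaces inside, padding at both ends.
sample = "  This   is  a test    string.  "

results = {
  "gsub +"        => sample.gsub(/ +/, ' '),      # one or more spaces
  "gsub {2,}"     => sample.gsub(/ {2,}/, ' '),   # two or more spaces
  "split.join"    => sample.split.join(' '),      # split on any whitespace
  "squeeze.strip" => sample.squeeze(' ').strip    # collapse spaces, then trim
}

results.each { |label, out| puts "#{label.ljust(15)} #{out.inspect}" }
# gsub +          " This is a test string. "
# gsub {2,}       " This is a test string. "
# split.join      "This is a test string."
# squeeze.strip   "This is a test string."
```

Note that the results are not identical: the gsub variants leave one space at each end, while split.join and squeeze.strip trim it. Also, split with no arguments breaks on any whitespace (tabs and newlines included), whereas squeeze(' ') only collapses spaces, so pick based on the input you expect.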

    #!/usr/bin/ruby
    require 'benchmark'

    n = 1_000_000
    Benchmark.bm(15) do |x|
      x.report("gsub") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub(/ +/, ' ')
        end
      end
      x.report("gsub!") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub!(/ +/, ' ')
        end
      end
      x.report("gsub2") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub(/ {2,}/, ' ')
        end
      end
      x.report("gsub!2") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub!(/ {2,}/, ' ')
        end
      end
      x.report("gsub s") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub(/\s+/, ' ')
        end
      end
      x.report("gsub! s") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub!(/\s+/, ' ')
        end
      end
      x.report("gsub2 s") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub(/\s{2,}/, ' ')
        end
      end
      x.report("gsub!2 s") do
        n.times do
          a_string = "This is a test string."
          a_string.gsub!(/\s{2,}/, ' ')
        end
      end
      x.report("split.join") do
        n.times do
          a_string = "This is a test string."
          a_string.split.join(' ')
        end
      end
      x.report("squeeze.strip") do
        n.times do
          a_string = "This is a test string."
          a_string.squeeze(' ').strip
        end
      end
    end

    $ ./stringcleanup.rb
                         user     system      total        real
    gsub             7.250000   0.120000   7.370000 (  8.386002)
    gsub!            7.240000   0.130000   7.370000 (  9.007479)
    gsub2            7.110000   0.140000   7.250000 (  9.302915)
    gsub!2           6.830000   0.150000   6.980000 (  9.309362)
    gsub s           7.410000   0.140000   7.550000 ( 10.864572)
    gsub! s          7.400000   0.130000   7.530000 ( 10.286886)
    gsub2 s          7.020000   0.100000   7.120000 (  8.977424)
    gsub!2 s         6.860000   0.100000   6.960000 (  8.421220)
    split.join       4.420000   0.110000   4.530000 (  5.716684)
    squeeze.strip    3.120000   0.110000   3.230000 (  3.651918)

I haven't run it enough to do any statistical analysis, but after
5 runs it seems like the various flavors of gsub/gsub! don't really
perform any differently from one another. Splitting and squeezing
look like much better options.
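
If you want something firmer than eyeballing five runs, one option is to time each variant several times and look at the mean and spread. A rough sketch, assuming Ruby 2.4+ (for Array#sum); the helper names sample_times and mean_and_stddev and the run/iteration counts are mine, picked for illustration rather than taken from the benchmark above:

```ruby
require 'benchmark'

# Hypothetical helper: time the given block `runs` times, each run calling it
# `iterations` times, and return the wall-clock seconds of every run.
def sample_times(runs: 5, iterations: 20_000)
  Array.new(runs) do
    Benchmark.realtime { iterations.times { yield } }
  end
end

# Hypothetical helper: mean and sample standard deviation of a list of timings.
def mean_and_stddev(times)
  mean = times.sum / times.size
  variance = times.sum { |t| (t - mean)**2 } / (times.size - 1)
  [mean, Math.sqrt(variance)]
end

# A few of the variants from the benchmark, wrapped as lambdas.
candidates = {
  "gsub"          => -> { "This is a test string.".gsub(/ +/, ' ') },
  "gsub2"         => -> { "This is a test string.".gsub(/ {2,}/, ' ') },
  "split.join"    => -> { "This is a test string.".split.join(' ') },
  "squeeze.strip" => -> { "This is a test string.".squeeze(' ').strip }
}

candidates.each do |label, work|
  mean, sd = mean_and_stddev(sample_times(&work))
  printf("%-15s mean %.4fs  sd %.4fs\n", label, mean, sd)
end
```

If the difference between two means is small compared to the standard deviations, the variants are probably not distinguishable at that sample size, which would match the impression that the gsub flavors perform about the same.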